nate berkopec's latest activity

Author, The Complete Guide to Rails Performance. Co-maintainer of Puma. speedshop.co (he/him)

Here's a demonstration of how IO/CPU interact with the GVL to affect the throughput of your Puma or Sidekiq application. Give it a run (gem install parallel first) and see what happens! You can also try removing the GVL by making Parallel use processes instead of threads.

gist.github.com/nateberkopec/b
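A minimal version of the same experiment, using bare threads instead of the parallel gem (workload sizes here are illustrative, not from the gist):

```ruby
require "benchmark"

# CPU-bound work holds the GVL: only one thread runs Ruby code at a time.
cpu_work = -> { 200_000.times { Math.sqrt(rand) } }
# IO-bound work (here, sleep) releases the GVL, so threads overlap fully.
io_work  = -> { sleep 0.1 }

def run_threads(n, work)
  Benchmark.realtime { Array.new(n) { Thread.new(&work) }.each(&:join) }
end

# Four 100ms sleeps overlap: wall time stays near 0.1s, not 0.4s.
# Four CPU jobs serialize on the GVL: wall time is roughly 4x one job.
puts format("io: %.2fs  cpu: %.2fs", run_threads(4, io_work), run_threads(4, cpu_work))
```

Swapping Thread.new for fork (or Parallel's in_processes:) takes the GVL out of the picture and lets the CPU-bound case scale too.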

This is why your load test is a lie.

This is what real prod traffic looks like: 200 rps one second, 500 rps the next, ping-ponging around from moment to moment. Uneven arrivals like this are much harder to deal with than fake, synthetic load-test requests.

If you've got *1 million* concurrent users, saving $2 million/year in infra is hopefully not the most important thing for that business.

Had a little “lost my yubikey” scare. Now I’ve done what I should have done in the first place: made it hard to misplace and have two to begin with!

The easiest way to spout bullshit about performance is to talk in relative terms only (this is 3x faster than before!) without reference to the absolute.

Great, your new code is 3x faster. But it runs at 3 million iterations/sec and we only call it once.

Is there a compelling argument for _not_ always using YJIT locally/in development?

I think most people aren't using it.
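One low-friction way to try it locally is the RUBY_YJIT_ENABLE=1 environment variable (Ruby 3.2+). You can verify what you're actually running with a quick check like this:

```ruby
# Report whether this interpreter was built with YJIT, and whether it's on.
if defined?(RubyVM::YJIT)
  puts RubyVM::YJIT.enabled? ? "YJIT enabled" : "YJIT available but off (set RUBY_YJIT_ENABLE=1)"
else
  puts "this Ruby was built without YJIT"
end
```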

Imagine that one of the DBs for your app suddenly had 100ms added to every call. You currently access this DB 1 to 30 times per transaction.

What would you do to compensate for this added latency?

I've written a ~500 line web application load simulator in Ruby. You give it the number of servers, processes, threads, p50 and p95 response times, # of db VCPU, and I/O wait %, and it Monte Carlo simulates your maximum possible req/sec.

Deploying as a tool for retainer clients soon.
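Not the tool itself, but the core idea fits in a few lines: fit a distribution to the p50/p95, sample service times, and divide worker count by the mean. Everything below (the function name, the lognormal choice, the parameters) is my illustrative assumption, not the speedshop simulator:

```ruby
# Toy Monte Carlo throughput estimate. Assumes lognormal service times
# fitted to two percentiles; z(0.95) is about 1.645.
def estimated_max_rps(workers:, p50:, p95:, samples: 20_000)
  mu    = Math.log(p50)                  # lognormal median = exp(mu)
  sigma = (Math.log(p95) - mu) / 1.645   # solve for sigma from the p95
  total = samples.times.sum do
    # Box-Muller transform: two uniforms -> one standard normal.
    z = Math.sqrt(-2 * Math.log(1 - rand)) * Math.cos(2 * Math::PI * rand)
    Math.exp(mu + sigma * z)             # one sampled service time, seconds
  end
  workers / (total / samples)            # each worker does 1/mean req/sec
end

puts estimated_max_rps(workers: 16, p50: 0.05, p95: 0.3).round
```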

Underrated/missed change from Dima Fatko to basecamp/marginalia:

github.com/basecamp/marginalia

I'd profiled the previous version, which used caller, and concluded that capturing line numbers was too expensive as a result. caller_locations is a newer API, and this change should make a big difference!
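The difference is easy to see in a microbenchmark: caller formats every frame into a String, while caller_locations returns lightweight Thread::Backtrace::Location objects (the recursion depth and iteration counts below are arbitrary):

```ruby
require "benchmark"

# Build a deep call stack, then capture a backtrace at the bottom of it.
def deep(n, &blk)
  n.zero? ? blk.call : deep(n - 1, &blk)
end

# caller allocates a formatted String per frame; caller_locations does not.
strings = Benchmark.realtime { 5_000.times { deep(50) { caller } } }
locs    = Benchmark.realtime { 5_000.times { deep(50) { caller_locations } } }
puts format("caller: %.3fs  caller_locations: %.3fs", strings, locs)
```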

The cost of setting pools too low is obvious: high latency, caused by concurrent threads blocking while checking out a connection.

The cost of setting them too high is that you don't catch leaks. But leaks have been far less of an issue in recent years, and there are probably better ways to detect them.
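The too-low cost is easy to reproduce with a toy pool: a Queue standing in for ActiveRecord's connection pool, with sizes chosen purely for illustration:

```ruby
require "benchmark"

pool = Queue.new
2.times { |i| pool << "conn-#{i}" }   # a pool of size 2

# Four threads each hold a connection for 100ms. With only two
# connections, two threads block on checkout, doubling wall time.
elapsed = Benchmark.realtime do
  Array.new(4) {
    Thread.new do
      conn = pool.pop    # checkout: blocks when the pool is empty
      sleep 0.1          # the "query"
      pool << conn       # checkin
    end
  }.each(&:join)
end
puts format("%.2fs", elapsed)   # roughly 0.2s here; ~0.1s with a pool of 4
```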

sorry, RMT = RAILS_MAX_THREADS or whatever you use to set your puma/sidekiq concurrency

I'm wondering if database pools should always be set to 25 conns.

Puma/Sidekiq is not the only source of concurrency. load_async, Parallel, Thread.new, fibers, etc. So RMT + 5 doesn't make sense.

25 is low enough to catch leaks, high enough to allow concurrency

Check out this before/after shot of our retainer client deploying a bunch of missing foreign key indexes identified by ids_must_be_indexed.

github.com/speedshop/ids_must_

mosh --predict=experimental is CRAZY good for removing latency on SSH connections. I will probably never use ssh again.

If you want to limit concurrency to an external HTTP API, create a remote gateway class and put the limiter THERE, not on background jobs that access the API.

It's really common for teams to end up with a spaghetti of locks on jobs that end up over or under throttling the API calls. Have one lock, in one place, not on the job.
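A sketch of the shape I mean, with a hypothetical gateway class and limit, and a SizedQueue of tokens standing in for whatever limiter you actually use:

```ruby
# Hypothetical gateway: the ONLY place that talks to the external API,
# and therefore the only place the concurrency limit lives.
class WeatherApiGateway
  TOKENS = SizedQueue.new(5)    # at most 5 calls in flight
  5.times { TOKENS << :token }

  def self.fetch(city)
    token = TOKENS.pop          # blocks when 5 calls are already in flight
    # A real Net::HTTP call would go here; we fake the response.
    { city: city, temp_c: 21 }
  ensure
    TOKENS << token if token    # always return the token
  end
end

# Jobs just call the gateway; none of them carry their own lock.
puts WeatherApiGateway.fetch("Lisbon")
```

Because every caller funnels through one choke point, adding a new job class can never over- or under-throttle the API.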

You should know about hyperfine:

github.com/sharkdp/hyperfine
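A typical invocation, comparing two commands with warmup runs (the commands shown are just examples):

```shell
# Warm caches with 3 untimed runs, then benchmark each command;
# hyperfine reports mean and standard deviation, and which command
# is how many times faster.
hyperfine --warmup 3 'ruby -e ""' 'ruby --yjit -e ""'
```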

VERY common error with newbies and profiling:

They don't check that the thing they're profiling actually does what they think it does.

You end up profiling a command and accidentally profiling an error pathway instead of the real thing. ALWAYS check the output!

TIL: Bundler's job parallelization uses threads, not processes

github.com/rubygems/rubygems/b

The default is "the number of available processors", but processor count has nothing to do with the optimal number here: because of the GVL, only one processor will ever be used.
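Since installs are mostly network-bound, you can tune the job count to your network rather than your core count, e.g.:

```shell
# A thread count well above the core count is fine for network-bound
# installs; the GVL means they'd never use more than one core anyway.
bundle install --jobs 8

# Or persist it for this project:
bundle config set --local jobs 8
```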

1
Share
Share on Mastodon
Share on Twitter
Share on Facebook
Share on Linkedin
Replies