To beguile is to charm or enchant someone in a deceptive way; the narrator is intrigued by the bird, to the point of being amused by it. He seems to be teetering here between fear and an unstable kind of joy (or mania).

Rap Genius runs on Heroku. In fact, with nearly 15 million monthly unique visitors, we’re one of their largest customers.

This is the story of a quiet change they made to their infrastructure that radically changed – for the worse – the scaling properties of Rails applications on their platform.

Heroku responded with an apology and a technical review.

The full source is available on GitHub.

Brief summary
Our goal is to populate a table whose rows represent the lifecycle of every web request within a five-minute window. There are four columns: the request’s start time, its duration, the index of the dyno it was assigned to, and the time it spent in a dyno queue. All times are given in milliseconds.

# one row per request: start time, duration, assigned dyno, time queued
results = matrix(c(start_times,
                   req_durations_in_ms,
                   dyno_assignments,
                   rep(0, total_requests)), nrow = total_requests, ncol = 4)
colnames(results) = c("start_time", "request_duration", "dyno", "time_in_queue")

To fill out the first column – the list of start times – we imagined that in each millisecond of the simulation, the number of new requests is drawn from a Poisson distribution, so more than one request can spawn per millisecond. The idea is to average 9,000 requests per minute. (For simplicity you could get away with distributing the requests uniformly, but a Poisson process is truer to life.)

# one Poisson draw per millisecond: how many requests arrive in that tick
reqs_starting = rpois(simulation_length_in_ms, reqs_per_ms)
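
The matrix in the brief summary also needs a flat vector of start times and a total request count, which the excerpt never derives from these per-millisecond counts. A minimal sketch of the omitted glue, assuming the five-minute window and 9,000-requests-per-minute rate described above (the first two parameters would be set before the rpois call):

simulation_length_in_ms = 5 * 60 * 1000   # five-minute window, in ms
reqs_per_ms = 9000 / 60000                # 9,000 requests per minute = 0.15 per ms

total_requests = sum(reqs_starting)
# expand the per-millisecond counts into one start time per request
start_times = rep(1:simulation_length_in_ms, times = reqs_starting)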

The second column we also pre-calculated. We sampled request times from real data given to us by Heroku, summarized in the table in the main text.

(Note that these times do not include in-dyno queuing, just the amount of time actually spent by the app processing each request. The whole goal of the simulation is to back out the queue times.)

# empirical request durations from Heroku, one per line
rq = read.table("~/Downloads/request_times.txt", header=FALSE)$V1
# resample with replacement to get one duration per simulated request
req_durations_in_ms = sample(rq, total_requests, TRUE)
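
Two more inputs appear in the code but are never defined in the excerpt: the naive regime's random dyno pre-assignments (the matrix's third column) and each dyno's "next available" clock. A plausible reconstruction; the dyno count here is purely illustrative, since the post sweeps it across a range:

n_dynos = 75   # illustrative; the post varies this number
# naive routing pre-assigns every request a dyno uniformly at random
dyno_assignments = sample(1:n_dynos, total_requests, replace = TRUE)
# per-dyno clock: the millisecond at which each dyno next becomes free
dyno_next_available = rep(0, n_dynos)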

The last two columns will be filled out by the main loop of the program. That is, for each request we’re trying to (a) assign it to a dyno and (b) figure out how long it will queue.

In the naive routing regime, assigning a dyno is trivial: we just choose one randomly. In the intelligent regime, we ask each dyno when it’s going to be next available, and choose the dyno with the best (soonest) answer. Modulo some processing time (<20ms), this is equivalent to buffering requests at the router and dispatching them only when a dyno frees up.

for (i in 1:nrow(results)) {
  row = results[i, ]
  st = row["start_time"]
  duration = row["request_duration"]
  if (router_mode == "naive") {
    dyno = row["dyno"]                       # use the random pre-assignment
  } else {
    dyno = which.min(dyno_next_available)    # intelligent: pick the soonest-free dyno
  }
  # queue-time bookkeeping (the three lines shown below) goes here
}

To calculate time spent queuing, we subtract the request’s start time – the current millisecond – from the moment its assigned dyno next becomes available, flooring the result at zero for dynos that are already free. Then we push that dyno’s “next available” time forward to account for the time required to service the current request. Rinse and repeat until we’ve run out of requests. (These three lines sit inside the loop above.)

queue_time = max(0, dyno_next_available[dyno] - st)   # zero if the dyno is already free
results[i, "time_in_queue"] = queue_time
dyno_next_available[dyno] = st + queue_time + duration

When we’re done, we have a large table of requests, and for each one we know whether and for how long it queued.
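
A couple of one-liners show how that table can be interrogated (illustrative, not from the original post):

mean(results[, "time_in_queue"] > 0)   # share of requests that queued at all
median(results[results[, "time_in_queue"] > 0, "time_in_queue"])   # typical queue time when queuing happened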

(Compare this style of simulation to, say, running a bunch of Ruby threads that sleep to mimic dynos processing a request. Here, once we have our distribution of request times and request start times, our calculations are precise and precisely replicable. We treat each dyno as a vector of milliseconds and painstakingly figure out whether it will be processing or not at each tick. “Queues” of requests are represented by adjacent strings of “yes I’m working” marks in a dyno’s lifetime (its list of 1ms ticks). At any moment of the program’s execution we can interrogate each dyno to see precisely which request it’s serving, which are queued, etc. This is what allows us to make the graphs you see below, and gives us confidence in the correctness of the results.)

We found that a Weibull distribution with shape parameter 0.46 is a reasonable approximation.

So if you want to run this at home, you should do:

wshape = 0.46
wlambda = 50 / (log(2) ^ (1 / wshape))   # scale solved so the Weibull median is 50ms
# nreqs = however many durations you need; clamp to 10ms–30s (Heroku times out requests at 30 seconds)
req_durations_in_ms = pmin(30000, pmax(10, ceiling(rweibull(nreqs, wshape, wlambda))))
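
A quick sanity check on the synthetic durations (not from the post): the median should land near the 50ms the scale was solved for, with a long tail truncated at 30 seconds.

median(req_durations_in_ms)           # ~50, by construction of wlambda
quantile(req_durations_in_ms, 0.99)   # the heavy tail, capped at 30000ms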

If that seems like an outrageous figure, recall the birthday paradox, which shows that you only need to get 23 people in a room before there’s a 50% chance that two of them will share a birthday. The birthday paradox is called a paradox because nobody quite believes the number’s going to be that small; our minds vastly underestimate the probability of hash collisions.
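
The birthday arithmetic is one line of R (a quick check, not part of the original post):

1 - prod((365 - 0:22) / 365)   # ~0.507: odds that at least two of 23 people share a birthday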

See also the graph below, which plots the probability of all dynos being open (i.e., no collisions/queuing) given a certain number of simultaneous requests. This is with a naive router. (With an intelligent router, of course, this would look like a flat line at 1.0, since you’d never have any queuing until simultaneous requests outnumbered dynos.)
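
That curve is just the birthday computation generalized to d dynos; a sketch of how it could be produced (the dyno count here is an illustrative assumption):

# P(r simultaneous requests all land on distinct dynos) under naive random routing
p_all_open = function(r, d) prod((d - 0:(r - 1)) / d)
p_all_open(10, 75)                                    # ~0.54 with 75 dynos, 10 concurrent requests
plot(sapply(1:40, p_all_open, d = 75), type = "l")    # the whole curve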

Here’s another “intuition pump.” Imagine you’re playing a game of Russian roulette. You’re holding a revolver with n chambers. All but one are loaded.

Before you put the barrel to your temple and pull the trigger, you’re given a choice between two methods for spinning the cylinder: (A) spin until you reach the first empty chamber, or (B) spin randomly. Which do you choose?

Unless you have a death wish, you choose A. Of course you choose A. And the larger n gets, the more you want to choose A, because as n grows, so too does the probability that a random spin will land on a bullet.

The analogy back to Heroku’s two routing regimes should be clear.

Whole Foods saw a big improvement in wait times at its busiest stores when it implemented a checkout line management system exactly analogous to Heroku’s old intelligent routing scheme.
