Brief summary
Our goal is to populate a table whose rows represent the lifecycle of every web request within a five-minute window. There are four columns: the request’s start time, its duration, the index of the dyno it was assigned to, and the time it spent in a dyno queue. All times are given in milliseconds.
results = matrix(c(start_times,
req_durations_in_ms,
dyno_assignments,
rep(0, total_requests)), nrow = total_requests, ncol = 4)
To fill out the first column – the list of start times – we imagined that in each millisecond of the simulation, a new request will spawn with probability determined by a Poisson process. More than one request can be spawned per millisecond. The idea is to have 9,000 requests per minute on average. (For simplicity you could get away with distributing the requests uniformly, but a Poisson distribution is truer to life.)
reqs_starting = rpois(simulation_length_in_ms, reqs_per_ms)
The second column we also pre-calculated. We sampled request times from real data given to us by Heroku, summarized in the table in the main text.
(Note that these times do not include in-dyno queuing, just the amount of time actually spent by the app processing each request. The whole goal of the simulation is to back out the queue times.)
rq = read.table("~/Downloads/request_times.txt", header=FALSE)$V1
req_durations_in_ms = sample(rq, total_requests, TRUE)
The last two columns will be filled out by the main loop of the program. That is, for each request we’re trying to (a) assign it to a dyno and (b) figure out how long it will queue.
In the naive routing regime, assigning a dyno is trivial: we just choose one randomly. In the intelligent regime, we ask each dyno when it’s going to be next available, and choose the dyno with the best (soonest) answer. Modulo some processing time (<20ms), this is equivalent to buffering requests at the router and dispatching them only when a dyno frees up.
for(i in 1:nrow(results)) {
row = results[i,]
st = row["start_time"]
duration = row["request_duration"]
if (router_mode == "naive") {
dyno = row["dyno"]
} else {
dyno = which.min(dyno_next_available)
}
}
To calculate time spent queuing, we simply subtract our chosen start time – the time that this request’s assigned dyno will become available – from the current millisecond. And we update each dyno’s “next available” time to account for the time required to service the current request. Rinse and repeat until we’ve run out of requests.
queue_time = dyno_next_available[dyno] – st
results[i, "time_in_queue"] = queue_time
dyno_next_available[dyno] = st + queue_time + duration
When we’re done, we have a large table of requests, and for each one we know whether and for how long it queued.
(Compare this style of simulation to, say, running a bunch of Ruby threads that sleep
to mimic dynos processing a request. Here, once we have our distribution of request times, and request start times, our calculations are precise and precisely replicable. We treat each dyno as a vector of milliseconds and painstakingly figure out whether it will be processing or not at each tick. “Queues” of requests are represented by adjacent strings of “yes I’m working” marks in a dyno’s lifetime (its list of 1ms ticks). At any moment of the program’s execution we can interrogate each dyno to see precisely which request it’s serving, which are queued, etc. This is what allows us to make the graphs you see below, and gives us confidence in the correctness of the results.)
101