This was often egged on by Heroku. See this ticket between me and a Heroku engineer from over a year ago, in which the engineer advised me to upgrade my database plan to solve my H12 problems. Those problems were actually caused by in-dyno queuing, which at the time I thought was impossible.

Read it yourself in the full transcript of our email chain:

You can’t access https://help.heroku.com/tickets/75238 unless you work at Heroku, but here’s a screenshot of the whole thing.

We made it look this way by patching New Relic. You can do this too if you install the heroku-true-relic gem that we wrote.

Oren was the author of Heroku’s first apology blog post.

Download heroku-true-relic today!

These new, accurate queue numbers confirm the results of our simulations: we are currently running 250 dynos (monthly bill: $27,000) with an average throughput of ~11,000 requests per minute.

A simulation with those numbers estimates that the average queue time should be around 290ms, which is very close to the 324ms average New Relic now reports. We don’t have a ton of data yet with the accurate request queueing, but that’s pretty close!
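For readers who want to try this kind of estimate themselves, here is a minimal sketch of a random-routing simulation. It is not the simulation from our original article. It assumes Poisson arrivals, single-threaded dynos, and exponentially distributed request times, and the function name and the `mean_service_s` parameter are made up for illustration. The dyno count and throughput are the only inputs taken from the numbers above.

```python
import random

def simulate_random_routing(num_dynos, requests_per_minute, mean_service_s,
                            duration_s=600.0, seed=0):
    """Return the mean queue time in seconds under random routing.

    Assumptions (not from the article): Poisson arrivals, one request at a
    time per dyno, exponentially distributed service times.
    """
    rng = random.Random(seed)
    arrival_rate = requests_per_minute / 60.0  # requests per second
    free_at = [0.0] * num_dynos  # when each dyno finishes its backlog
    now = 0.0
    total_wait = 0.0
    count = 0
    while True:
        now += rng.expovariate(arrival_rate)  # time of the next arrival
        if now > duration_s:
            break
        dyno = rng.randrange(num_dynos)       # router picks a dyno at random
        start = max(now, free_at[dyno])       # wait if that dyno is busy
        total_wait += start - now
        free_at[dyno] = start + rng.expovariate(1.0 / mean_service_s)
        count += 1
    return total_wait / count if count else 0.0

if __name__ == "__main__":
    # mean_service_s=0.3 is a hypothetical value, not a measured one.
    wait = simulate_random_routing(250, 11000, mean_service_s=0.3)
    print(f"mean queue time: {wait * 1000:.0f} ms")
```

The result depends heavily on the assumed distribution of request times, so a real estimate should plug in the app's own measured response times rather than an exponential guess.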

Justin George
February 18th, 2013

I used to work on the New Relic Agent, and this is not entirely a correct solution. Many machines on AWS experience clock skew, leading to greater (or negative) queue time length; that’s what the queue wait parameter was a workaround for.

The correct solution would be to have the parameter added at the edge of the dyno, not at the beginning in the front-end routing framework. This is not a user-accessible area within Heroku, however.

Just be aware that your queue time may be (perhaps massively) over or underreported using this method.

awarner
February 18th, 2013

Great point! This is far from a perfect solution, but it’s the best we’ve got now. It seems roughly accurate in the aggregate, at least for us, based on http://rapgenius.com/1506509. We’re continuing this discussion over on GitHub for anybody who is curious: https://github.com/RapGenius/heroku-true-relic/issues/1.

Our main goal is to get people the right information, so that they can work on optimizing the right code paths in their app. Thanks Justin!
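To make the clock skew caveat from this thread concrete, here is a small sketch. It assumes queue time is computed by subtracting a timestamp stamped by the router from the dyno's own clock reading, which is what the discussion above implies. The function names and the sample numbers are hypothetical.

```python
def reported_queue_ms(router_stamp_ms, dyno_receive_ms):
    """Queue time as computed from two clocks on different machines."""
    return dyno_receive_ms - router_stamp_ms

def reported_with_skew(true_queue_ms, skew_ms):
    """What gets reported when the dyno clock runs skew_ms ahead of the router.

    A negative skew_ms means the dyno clock is behind the router clock.
    """
    router_stamp = 1_000_000  # arbitrary router clock reading, in ms
    dyno_receive = router_stamp + true_queue_ms + skew_ms
    return reported_queue_ms(router_stamp, dyno_receive)

if __name__ == "__main__":
    print(reported_with_skew(300, 0))     # 300: clocks agree
    print(reported_with_skew(300, 200))   # 500: overreported
    print(reported_with_skew(300, -400))  # -100: negative queue time
```

The skew is added directly to every measurement from that router and dyno pair, so the reported numbers are only trustworthy to the extent that the two clocks agree.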

Read more about the simulations in our original article

Read the thread yourself

Tim also wrote a blog post documenting the experience:

The admin section of the app I recently moved over to Heroku is used daily by 20 or so employees. Their workflow has them making a few longer-running requests to the app for report generation, sending emails, and file uploads. Most of these requests don’t take longer than 5-10 seconds and that was never a problem, but now it is. If the app has 5 dynos and one request takes 15 seconds, in the first second 20% of the requests to the app will have a 15 second delay. The next second, 20% of the app’s requests will have a 14 second delay and so on. The other 4 dynos may be available, but that one dyno will have a large and growing backlog. A simple request to the front page of the site that should take ~200ms could take over 15s.

At this point I have 3 options if I want to remain on Heroku. First, optimize these report generators to the point they all take less than 1s (easier said than done). Second, have the request send the report to Delayed::Job, which saves the report output to S3 (this introduces more lag for the employee). Third, duplicate the app on Heroku and send all admin requests to this second app that the public never hits.

Heroku is a great service and the purpose of this post is not to speak badly of them, but to highlight the current backlog queue situation and provide an explanation for anyone else researching the same strange behavior. I do hope this may encourage Heroku to update their documentation and focus on getting a new backlog queue in place.
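The arithmetic in Tim's 5-dyno example can be sketched as follows. This is a simplified illustration, not code from his post: it assumes uniform random routing and counts only the time left on the one long request, ignoring the extra waiting caused by other requests queued behind it. The function names are hypothetical.

```python
def blocked_dyno_delays(num_dynos, long_request_s):
    """For each second of a long request, list (second, share, delay_s).

    share is the fraction of incoming requests routed to the busy dyno;
    delay_s is how long those requests wait for the long request to finish.
    """
    share = 1.0 / num_dynos
    return [(t, share, long_request_s - t) for t in range(long_request_s)]

def average_added_delay_s(num_dynos, long_request_s):
    """Mean extra delay per request arriving while the long request runs."""
    rows = blocked_dyno_delays(num_dynos, long_request_s)
    delays = [delay for _, _, delay in rows]
    return (1.0 / num_dynos) * (sum(delays) / len(delays))

if __name__ == "__main__":
    rows = blocked_dyno_delays(5, 15)
    print(rows[0])  # (0, 0.2, 15): 20% of requests wait 15 seconds
    print(rows[1])  # (1, 0.2, 14): 20% of requests wait 14 seconds
    print(average_added_delay_s(5, 15))
```

Under these assumptions the average request arriving during that 15 second window picks up about 1.6 seconds of delay, even though four of the five dynos are idle. The real figure would be higher, since requests stuck behind the long one also delay each other.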
