I'm a queueing theory researcher, just completing my PhD. My advisor, Mor Harchol-Balter, wrote one of the most popular textbooks in the area: http://performancemodeling.org/. If anyone has questions about the field, please ask, I'd love to answer any questions.
That really is a great book, and very approachable! All the examples are practical, results derived clearly, and a nice build up.
Here's my question for you: queueing theory is a nice tool, but most of the classic results are about systems (like M/M/c) that don't match the real world in important ways. Are there good rules of thumb for thinking about how different changes (e.g. burstier-than-Poisson arrivals, seasonality, constrained queue lengths, etc.) map onto these results?
Obviously, simulation is a powerful tool for more general systems, but being able to reason about effects quickly is super useful.
For thinking about bursty arrivals, a good rule of thumb is to look at the variability of the interarrival times. The key number is the squared coefficient of variation, C_a^2: the variance of the interarrival times divided by the square of their mean. The mean waiting time in a system with bursty arrivals will be roughly larger than in the M/M/c by this multiplicative factor. Kingman's formula is the equivalent for the single-server setting: https://en.wikipedia.org/wiki/Kingman%27s_formula
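Kingman's approximation is easy to compute directly. Here's a minimal sketch in Python (the rates and SCVs are made-up example numbers):

```python
def kingman_wait(lam, mu, ca2, cs2):
    """Kingman's G/G/1 approximation for mean waiting time in queue.

    lam: arrival rate; mu: service rate;
    ca2: squared coefficient of variation (SCV) of interarrival times;
    cs2: SCV of service times.
    """
    rho = lam / mu           # utilization; the formula requires rho < 1
    assert rho < 1
    return (rho / (1 - rho)) * ((ca2 + cs2) / 2) * (1 / mu)

# Exponential interarrivals and services both have SCV = 1,
# so this recovers the exact M/M/1 mean waiting time:
w_mm1 = kingman_wait(lam=0.8, mu=1.0, ca2=1.0, cs2=1.0)      # 4.0

# Bursty arrivals with SCV = 4 inflate waiting by (4 + 1) / 2 = 2.5x:
w_bursty = kingman_wait(lam=0.8, mu=1.0, ca2=4.0, cs2=1.0)   # 10.0
```

Note that relative to the exponential-arrivals baseline, Kingman's inflation factor works out to (C_a^2 + 1)/2 rather than C_a^2 itself, which is where the "roughly" above comes in.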
For seasonality, if the arrival rates fluctuate over a long time period relative to the typical waiting time, it makes sense to just do separate calculations for the different conditions you experience. If the fluctuation is very fast, just use the average arrival rate.
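To make that concrete with a toy M/M/1 example (the rates are invented for illustration): in the slow regime you compute each condition separately and combine; in the fast regime you just plug in the average rate.

```python
def mm1_wait(lam, mu):
    """Mean time in queue for an M/M/1 system (requires lam < mu)."""
    rho = lam / mu
    assert rho < 1
    return rho / (mu * (1 - rho))

mu = 1.0
lam_peak, lam_off = 0.9, 0.3   # half the time busy, half the time quiet

# Slow fluctuation: analyze each regime at its own rate, then average
# (time-averaged here; weighting by arrivals would give the per-job view)
w_slow = 0.5 * mm1_wait(lam_peak, mu) + 0.5 * mm1_wait(lam_off, mu)

# Fast fluctuation: just use the average arrival rate
w_fast = mm1_wait(0.5 * (lam_peak + lam_off), mu)

# The slow-fluctuation system is far worse, dominated by its peak periods
```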
For constrained queue lengths, there are a lot of theoretical results in this area, such as the M/M/c/c model: https://en.wikipedia.org/wiki/M/M/c_queue. The second "c" caps the total number of jobs in the system, so there is no waiting buffer at all: arrivals that find every server busy are simply lost (this is the Erlang loss model). With a nonzero buffer, the generalization is M/M/c/K, where K is the total system capacity.
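For the no-waiting-room case (M/M/c/c, the Erlang loss model), the fraction of arrivals that get dropped has a classic closed form, the Erlang B formula. A minimal sketch using the standard numerically stable recurrence (the load numbers are made up):

```python
def erlang_b(c, a):
    """Blocking probability in an M/M/c/c (Erlang loss) system.

    a = lambda / mu is the offered load in Erlangs. Uses the recurrence
    B(0) = 1, B(k) = a*B(k-1) / (k + a*B(k-1)), which avoids the
    overflow that the direct factorial formula hits for large c.
    """
    b = 1.0
    for k in range(1, c + 1):
        b = a * b / (k + a * b)
    return b

# One server with offered load a is blocked a/(1+a) of the time:
p1 = erlang_b(1, 2.0)        # 2/3
# 10 servers offered 8 Erlangs: roughly 12% of arrivals are lost
p10 = erlang_b(10, 8.0)
```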
> For seasonality, if the arrival rates fluctuate over a long time period relative to the typical waiting time, it makes sense to just do separate calculations for the different conditions you experience. If the fluctuation is very fast, just use the average arrival rate.
Thanks, that makes sense. More quantitatively, about where would you set the bar on "very fast"? Is it ~1x the mean interarrival time, or ~1 million x?
By the way, I really enjoyed your "Nudge" paper from last year. The result about FCFS was very surprising to me!
There's a transition zone from "fast" to "slow" fluctuations, at timescales around the mean waiting time, that's more complicated and is an area of active research. If the fluctuation timescale is 5x shorter than the mean waiting time, I'd guess the effects of the fluctuation will be gone.
One more: my understanding is that in Nudge I need to know processing time, but in FCFS I don't. How sensitive is your optimality result to errors in processing time estimates? (I don't recall this being covered in the paper, but if it is, feel free to tell me.)
In the cloud services and databases settings, we seldom have accurate processing time estimates until we're quite far down processing a request (post-auth, post-parse, at least, but also for databases post-query-plan).
The only thing we use processing time for in the Nudge paper is to classify jobs as "Large", "Small", or "Other". If instead of exact processing times we had estimates, the result would still hold as long as a job estimated to be Large was typically longer than a job estimated to be Small. So Nudge totally works in these more realistic settings.
If the estimates were super noisy, you might be better off using Nudge very sparingly, only when you're more confident about the relative sizes.
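A hypothetical sketch of that sparing-use idea (the cutoffs and margin below are invented for illustration, not from the paper, and the real Nudge policy has more structure than this):

```python
def classify(est_size, small_cutoff=1.0, large_cutoff=10.0):
    """Bucket a job by its *estimated* size (cutoffs are made up)."""
    if est_size <= small_cutoff:
        return "Small"
    if est_size >= large_cutoff:
        return "Large"
    return "Other"

def should_nudge(est_arriving, est_last_in_queue, margin=2.0):
    """Swap the arriving job ahead of the job at the back of the queue
    only when we're confident it's genuinely smaller: it must be classed
    Small, the other Large, and the estimates separated by a margin."""
    return (classify(est_arriving) == "Small"
            and classify(est_last_in_queue) == "Large"
            and est_last_in_queue >= margin * est_arriving)
```

With noisier estimates you'd widen the margin (or the Other band), trading away some of Nudge's gain for safety.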
I had actually bought Mor’s book, and she’s absolutely one of the best authors in the field. She takes the notions of giants like Moshe Haviv and Gallager and translates them in a way that is super intuitive and simple. She’s one of my role models in academia. Pass her my greetings. I intend to teach a course based on her textbook after I’m done with my PhD (though my area publishes mostly in IEEE venues, so not real science like ACM’s SIGs. I wish my PhD were in the awesome topics you guys are working on. I see the papers from Mor and Ziv Scully and the whole lab, really awesome work! Kudos to you guys. I won’t hide my envy of having Mor as a supervisor :D)
What are the most common types of queues for large data center applications (e.g. a stateless app that handles RPCs, running on N machines)? Are they mostly M/M/N? For example, is memcached different from nginx or Postgres?
While an M/M/n is a good starting point, there are several extra important features to capture.
First, many of these applications have a load-balancing step, where arriving jobs are dispatched to queues at each of the machines. The performance will depend on how this load-balancing is done.
Second, some applications will parallelize each job across many cores or machines, while others will run each job on a single core or machine. This obviously has major implications for performance.
Third, your application may have high variance in interarrival times or in job sizes (service times). This is also important to measure and incorporate into a model. Something like Kingman's formula can be useful: https://en.wikipedia.org/wiki/Kingman%27s_formula
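As a rough illustration of the load-balancing point, here's a small simulation sketch comparing random dispatch against sending each job to the server with the least remaining work (a simple stand-in for join-the-shortest-queue); all parameters are made up:

```python
import random

def simulate(dispatch, n=10, lam_per_server=0.9, jobs=200_000, seed=1):
    """Mean waiting time for n FCFS servers under a dispatch rule.

    Each server's state is its work backlog (time needed to drain its
    queue). Poisson arrivals at total rate n*lam_per_server, exp(1) sizes.
    """
    rng = random.Random(seed)
    backlog = [0.0] * n
    total_wait = 0.0
    total_rate = n * lam_per_server
    for _ in range(jobs):
        gap = rng.expovariate(total_rate)          # time since last arrival
        backlog = [max(0.0, b - gap) for b in backlog]
        i = dispatch(backlog, rng)                 # pick a server
        total_wait += backlog[i]                   # FCFS wait = its backlog
        backlog[i] += rng.expovariate(1.0)         # add the job's size
    return total_wait / jobs

def random_pick(backlog, rng):
    return rng.randrange(len(backlog))

def least_work(backlog, rng):
    return min(range(len(backlog)), key=backlog.__getitem__)

w_random = simulate(random_pick)  # each server is its own M/M/1 at rho=0.9
w_lwl = simulate(least_work)      # dramatically lower waiting
```

Under random dispatch each server behaves like an independent M/M/1 at its own load, so the pooling benefit is lost; smarter dispatch recovers most of it.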
Here's my website, if you want to see more of what I do: https://isaacg1.github.io/