https://blog.danslimmon.com/2016/08/26/the-most-important-thing-to-understand-about-queues/

The most important thing to understand about queues
By Dan Slimmon, 2016/08/26 (updated 2022/03/08), in Statistics and probability

[Header image: Concert line (Felix Pagaimo via Flickr)]

You only need to learn a little bit of queueing theory before you start getting that ecstatic "everything is connected!" high that good math always evokes. So many damn things follow the same set of abstract rules. Queueing theory lets you reason effectively about an enormous class of diverse systems, all with a tiny number of theorems.

I want to share with you the most important fundamental fact I have learned about queues. It's counterintuitive, but once you understand it, you'll have deeper insight into the behavior not just of CPUs and database thread pools, but also grocery store checkout lines, ticket queues, highways - really just a mind-blowing collection of systems.

Okay. Here's the Important Thing: As you approach maximum throughput, average queue size - and therefore average wait time - approaches infinity.

"Wat," you say? Let's break it down.

Queues

A queueing system is exactly what you think it is: a processor that processes tasks whenever they're available, and a queue that holds tasks until the processor is ready to process them. When a task is processed, it leaves the system.

[Animation: A queueing system with a single queue and a single processor. Tasks (yellow circles) arrive at random intervals and take different amounts of time to process.]

The Important Thing I stated above applies to a specific type of queue called an M/M/1/∞ queue. That blob of letters and numbers and symbols is a description in Kendall's notation. I won't go into the M/M/1 part, as it doesn't really end up affecting the Important Thing I'm telling you about. What does matter, however, is that '∞'.
The '∞' in "M/M/1/∞" means that the queue is unbounded: there's no limit to the number of tasks that can be waiting in the queue at any given time. Like most infinities, this doesn't reflect any real-world situation. It would be like a grocery store with infinite waiting space, or an I/O scheduler that can queue infinitely many operations. But really, all we're saying with this '∞' is that the system under study never hits its maximum occupancy.

Capacity

The processor (or processors) of a queueing system have a given total capacity: the number of tasks per unit time that can be processed. For an I/O scheduler, the capacity might be measured in IOPS (I/O operations per second). For a ticket queue, it might be measured in tickets per sprint.

Consider a simple queueing system with a single processor, like the one depicted in the animation above. If, on average, 50% of the system's capacity is in use, that means the processor is working on a task 50% of the time. At 0% utilization, no tasks are ever processed. At 100% utilization, the processor is always busy.

Let's think about 100% utilization a bit more. Imagine a server with all 4 of its CPUs constantly crunching numbers, or a dev team that's working on high-priority tickets every hour of the day. Or a devops engineer who's so busy putting out fires that he never has time to blog. When a system has steady-state 100% utilization, its queue will grow to infinity. It might shrink over short time scales, but over a sufficiently long time it will grow without bound.

Why is that? If the processor is always busy, that means there's never even an instant when a newly arriving task can be assigned immediately to the processor. That means that, whenever a task finishes processing, there must always be a task waiting in the queue; otherwise the processor would have an idle moment.
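This blow-up near full utilization can also be quantified with the standard closed-form results for the M/M/1 queue. These are textbook formulas, not derived in this post: with arrival rate λ, service rate μ, and utilization ρ = λ/μ, the mean time a task spends in the system is W = 1/(μ − λ), and the mean number of tasks in the system is L = ρ/(1 − ρ). Both diverge as ρ approaches 1. A quick sketch:

```python
# Textbook M/M/1 steady-state formulas (assumes Poisson arrivals and
# exponential service times, the same assumptions used later in the post).
mu = 1.0  # processor capacity: 1 task per second

for rho in [0.5, 0.8, 0.9, 0.96, 0.998]:
    lam = rho * mu                 # arrival rate that yields this utilization
    W = 1 / (mu - lam)             # mean time in system (wait + service)
    L = rho / (1 - rho)            # mean number of tasks in the system
    print(f"utilization {rho:.1%}: mean time in system {W:.1f}s, mean occupancy {L:.1f}")
```

Note how gently the numbers grow at first, and how violently at the end: 50% utilization costs you 2 seconds per task, but 99.8% utilization costs you 500.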
And by induction, we can show that a system that's forever at 100% utilization will exceed any finite queue size you care to pick:

- No good. Processor idle, so utilization can't be 100%.
- No good. As soon as the task finishes, we're back to the above picture.
- No good. As soon as the current task finishes, we're back to the above picture, which we've already decided is no good.
- You get the idea.

So basically, the statements "average capacity utilization is 100%" and "the queue size is growing without bound" are equivalent. And remember: by Little's Law, we know that average wait time is directly proportional to average queue size. That means that a queueing system with 100% average capacity utilization will have wait times that grow without bound. If all the developers on a team are constantly working on critical tickets, then the time it takes to complete tickets in the queue will approach infinity. If all the CPUs on a server are constantly grinding, load average will climb and climb. And so on.

Now you may be saying to yourself, "100% capacity utilization is a purely theoretical construct. There's always some amount of idle capacity, so queues will never grow toward infinity forever." And you're right. But you might be surprised at how quickly things start to look ugly as you get closer to maximum throughput.

What about 90%?

The problems start well before you get to 100% utilization. Why? Sometimes a bunch of tasks will just happen to show up at the same time. These tasks will get queued up. If there's not much capacity available to crunch through that backlog, then it'll probably still be around when the next random bunch of tasks shows up. The queue will just keep getting longer until there happens to be a long enough period of low activity to clear it out.

I ran a qsim simulation to illustrate this point. It simulates a single queue that can process, on average, one task per second.
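A minimal sketch of such a simulation in Python (an illustrative stand-in, not the author's actual qsim code): tasks arrive via a Poisson process, take exponentially distributed time to process, and each task's wait is the gap between its arrival and the moment the processor frees up.

```python
import random

def mm1_mean_wait(arrival_rate, service_rate=1.0, n_tasks=100_000, seed=1):
    """Simulate a single-queue, single-processor system and return the
    mean time tasks spend waiting in the queue (not counting service)."""
    rng = random.Random(seed)
    clock = 0.0        # arrival time of the current task
    free_at = 0.0      # time at which the processor next becomes idle
    total_wait = 0.0
    for _ in range(n_tasks):
        clock += rng.expovariate(arrival_rate)       # Poisson arrivals
        start = max(clock, free_at)                  # wait if processor is busy
        total_wait += start - clock
        free_at = start + rng.expovariate(service_rate)  # exponential service
    return total_wait / n_tasks

for rho in [0.5, 0.8, 0.9, 0.96]:
    print(f"utilization {rho:.0%}: mean wait ≈ {mm1_mean_wait(rho):.1f}s")
```

With the processor handling one task per second on average, pushing the arrival rate from 0.5 to 0.96 tasks per second takes the mean wait from around one second to over twenty.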
I ran the simulation for a range of arrival rates to get different average capacity usages. (For the wonks: I used a Poisson arrival process and exponentially distributed processing times.) Here's the plot:

[Plot: average wait time vs. capacity utilization, up to 96%]

Notice how things start to get a little crazy around 80% utilization. By the time you reach 96% utilization (the rightmost point on the plot), average wait times are already 20 seconds. And what if you go further and run at 99.8% utilization?

[Plot: average wait time vs. capacity utilization, extended to 99.8%]

Check out that Y axis. Crazy. You don't want any part of that.

This is why overworked engineering teams get into death spirals. It's also why, when you finally log in to that server that's been slow as a dog, you'll sometimes find that it has a load average of like 500. Queues are everywhere.

What can be done about it

To reiterate: As you approach maximum throughput, average queue size - and therefore average wait time - approaches infinity.

What's the solution, then? We only have so many variables to work with in a queueing system. Any solution for the exploding-queue-size problem is going to involve some combination of these three approaches:

1. Increase capacity
2. Decrease demand
3. Set an upper bound for the queue size

I like number 3. It forces you to acknowledge the fact that queue size is always finite, and to think through the failure modes that a full queue will engender.

Stay tuned for a forthcoming blog post in which I'll examine some of the common methods for dealing with this property of queueing systems. Until then, I hope I've helped you understand a very important truth about many of the systems you encounter on a daily basis.

13 thoughts on "The most important thing to understand about queues"

1. Pingback: The Latency/Throughput Tradeoff: Why Fast Services Are Slow And Vice Versa - Dan Slimmon

2.
Pingback: Latency- and Throughput-Optimized Clusters Under Load - Dan Slimmon

3. Antoine (2021/01/08 at 23:01):

I am not certain of how rigorous this paragraph is: "If the processor is always busy, that means there's never even an instant when a newly arriving task can be assigned immediately to the processor. That means that, whenever a task finishes processing, there must always be a task waiting in the queue; otherwise the processor would have an idle moment. And by induction, we can show that a system that's forever at 100% utilization will exceed any finite queue size you care to pick:" If the queue processes x elements per second, and there is initially 1 element in the queue plus x elements arriving per second, it seems to me that the queue won't be infinite, but the processor will still have 100% utilization.

   1. Dan Slimmon (2021/03/14 at 23:03):

   You're correct that it's not stated rigorously. What I mean by "will exceed any finite queue size you care to pick" is that the queue size will increase without bound. Given enough time, it will exceed any arbitrary size.

   2. Arthur (2021/04/23 at 12:26):

   It is not only lacking rigour, it is simply wrong. What prevents the system from bouncing between pictures 3 and 4, while maintaining 100% utilization? The author rushed away from this possibility to "prove" his point.

      1. Dan Slimmon (2021/04/23 at 13:45):

      > What prevents the system from bouncing between pictures 3 and 4, while maintaining 100% utilization?

      What's to prevent the system from bouncing between "1 task in processor & 1 task in queue" and "1 task in processor & 2 tasks in queue" while maintaining 100% utilization? Nothing! That could totally happen in a queueing system. However, the arrival process would need to be tuned quite precisely to the processing rate. You would need to watch the status of the processor and, when a task finishes, only then insert a new task into the queue.
      But this implies a shell game: if you always have a task ready to put into the queue when it needs to be padded back out, then isn't that task in a sense already "queued"? Instead, in queueing theory we usually assume a random arrival process: sometimes a minute may pass between arrivals into the queue; sometimes only a second. So the system can't bounce for arbitrarily long between 1-in-queue and 2-in-queue states. Eventually, one of two things will randomly occur:

      1. From the 1-in-queue state, the active task finishes processing before a new task arrives, bringing the queue size to 0.
      2. From the 2-in-queue state, a new task arrives in the queue before the active task finishes, causing the queue size to grow to 3.

         1. Arthur (2021/04/23 at 17:00):

         > So the system can't bounce for arbitrarily long between 1-in-queue and 2-in-queue states

         Now that addresses the unbounded size. The example of the article excluded this possibility of bouncing between two states for an extended time, and seemed to exchange "unbounded size" for "bounded to grow", which is totally not the case.

            1. Arthur (2021/04/23 at 17:02):

            bound to grow*

4. Pingback: The most important thing to understand about queues (2016) - The web development company Lzo Media - Senior Backend Developer

5. Graham (2022/03/08 at 13:40):

I like the analogy of an overflowing sink: https://ferd.ca/queues-don-t-fix-overload.html While backpressure is important, it's also useful to understand what is in the queue. Some tasks don't mind if they are completed an hour late, whereas others might as well be discarded if queued for too long (e.g. if the client has already timed out).

6. Dan (2022/03/08 at 13:44):

some about of idle capacity -> some amount of idle capacity

   1. Dan Slimmon (2022/03/08 at 20:33):

   Fixed. Thank you!

7.
Pingback: The most important thing to understand about queues - blog.wuyuansheng.com