[HN Gopher] The most important thing to understand about queues ...
       ___________________________________________________________________
        
       The most important thing to understand about queues (2016)
        
       Author : dhotson
       Score  : 161 points
       Date   : 2022-03-08 13:08 UTC (9 hours ago)
        
 (HTM) web link (blog.danslimmon.com)
 (TXT) w3m dump (blog.danslimmon.com)
        
       | ncmncm wrote:
        | Adding just a little processing capacity makes the need for queue
        | capacity collapse.
       | 
        | But if arrival rates are not well specified, then what "a little"
        | means isn't either, and rejecting additions to the queue is the
        | only answer.
        
       | mwcampbell wrote:
       | > but once you understand it, you'll have deeper insight into the
       | behavior not just of CPUs and database thread pools, but also
       | grocery store checkout lines, ticket queues, highways - really
       | just a mind-blowing collection of systems.
       | 
       | ...and karaoke singer rotations. In 6 years of frequently singing
       | karaoke at bars, I've never known a host to turn away new
       | singers, unless it's almost time to end the show. So you could
       | say the queue is unbounded, with predictable results for the few
       | singers who show up early and then get frustrated with the ever-
       | increasing time until their next turn as more singers arrive. I
       | don't know what the best solution is.
        
         | bombcar wrote:
         | That's the same load shedding as grocery stores use - if
         | everything gets too crowded people start leaving (not
         | queueing).
         | 
         | Now that may actually be suboptimal for the business (if they
         | can only let 10 people in they'd rather let the 10 who will
         | spend the most, say) which is why things like reservations, etc
         | come into play. I wonder if restaurants that do both
         | reservations and a queue see one group pay more on average ...
        
           | claytonjy wrote:
           | I've never managed a restaurant, but thinking about my own
           | habits I'd bet that a restaurant that takes reservations but
           | also some walk-ins almost surely makes more on the
           | reservation guests. The meal is the main attraction for them,
           | often a special occasion, while walk-ins are more likely to
           | spread their money over other businesses (shopping before,
           | drinks elsewhere after, etc.). I bet group size is also
           | higher for reservations; most people know a reservation is
           | all-but-required for larger groups.
        
           | Animats wrote:
            | _That's the same load shedding as grocery stores use - if
           | everything gets too crowded people start leaving (not
           | queueing)._
           | 
           | Yes. That's called the "rejection rate".
           | 
           | Unless, some of the time, queue length is zero, you will have
           | a nonzero rejection rate. This is worth bringing up with
           | store managers who want to run understaffed checkouts. One of
           | the things retail consultants do is point out how sales are
           | being lost that way, both in customers who leave and
           | customers who never come back.
           | 
           | Much of my early work on network congestion was based on
           | that. In the early days of networking, everyone was thinking
           | Poisson arrivals, where arrivals are unaffected by queue
           | lengths. This is partly because the original analysis for the
            | ARPANET, by Leonard Kleinrock, was done that way. It was done
           | that way because his PhD thesis was based on analyzing
           | Western Union Plan 55-A, which handled telegrams. (Think of
           | Plan 55-A as a network of Sendmail servers, but with queues
            | made from paper tape punches feeding paper tape readers; the
            | queue was a bin between punch and reader. See [1], at 7:00.)
            | Queue length was invisible to people sending telegrams, so
            | senders did behave like Poisson arrivals. That's still true
            | of email today.
           | 
           | The IP layer is open loop with rejection. Transport protocols
           | such as TCP are closed loop systems. The two have to be
           | considered together. Everybody gets this now, but it was a
           | radical idea in 1985.[2]
           | 
           | [1] https://archive.org/details/Telegram1956
           | 
           | [2] https://datatracker.ietf.org/doc/rfc970/
        
             | bombcar wrote:
             | This is a big part of self checkout - it reduces rejection
             | rate.
             | 
             | And that can be done with queues in general, if alternate
             | paths are available at some point intelligent operators
             | will change path (this can be problematic also, when the
             | option to change isn't available - I'd normally like to
             | connect to a server near me but if it is overloaded give me
             | the option to take a further away one).
        
         | testudovictoria wrote:
          | Focus on the bottleneck. The show has a finite amount of time;
          | extend it to allow more participants.
         | 
         | There's also putting restrictions on the input. Have a maximum
         | song length.
         | 
         | Otherwise, require additional constraints before queueing like
         | making it computationally expensive or having a priority system
         | based on criteria. Charge a cover (with increasing costs) or
         | have a membership club that gives more/better access.
        
       | galaxyLogic wrote:
       | Here's another article about the same issue I think
       | https://ferd.ca/queues-don-t-fix-overload.html .
       | 
       | Solution: "Step 1. Identify the bottleneck. Step 2: ask the
       | bottleneck for permission to pile more data in"
        
         | hammock wrote:
         | Love that solution. It's plainly unfair but in my experience so
         | critical to getting things done in business, even setting aside
         | engineering.
        
         | bombcar wrote:
         | This is wonderful and shows the problem very clearly.
         | 
          |  _If your system doesn't have a way to shed load, it will
         | eventually overload_.
         | 
         | The problem for many is that when it does finally shed load,
         | the load shedder gets blamed (and turned off). See how many
         | people look for ways to turn off the OOMKiller and how few look
         | to figure out how to get more RAM.
        
           | dcow wrote:
           | This is also wonderful (emphasis mine):
           | 
           | > All of a sudden, the buffers, queues, whatever, can't deal
           | with it anymore. You're in a critical state where you can see
           | smoke rising from your servers, _or if in the cloud, things
           | are as bad as usual, but more!_
        
         | cecilpl2 wrote:
         | This is a fantastic article, thank you for sharing!
        
         | prometheus76 wrote:
         | I work in a custom fabrication environment, and this solution
         | doesn't really apply, because what happens in a dynamic system
         | is that the bottleneck shifts all around the shop dynamically.
         | It's never just one operation that is the constant bottleneck.
        
           | owenmarshall wrote:
           | How doesn't it? I've lived in a world of coordinated
           | microservices before, where a nice linear progression of
           | queues only happens at the atomic level; when you zoom out
           | you've got a digraph full of cycles, nodes that come & go in
           | response to various events & schedules, and the transitory
           | nature of those services entering could cause a smooth-
           | sailing pipeline to suddenly seize up because some downstream
           | dependency is servicing higher-priority requests.
           | 
           | The _only_ way we could stay sane was aggressively following
            | Ferd's rule - nobody got to surface an API unless it
           | surfaced HTTP 429s on overload[1], and if you called a
           | downstream API without checking for and responding to a 429,
            | _tough noogies_, your requests got dropped and you had to
           | figure out what to do to fix it.
           | 
           | Funny enough, at the time I did software for a manufacturing
           | facility that did high-throughput manufacturing for many
           | varying products. I was lucky enough to do a "gemba walk"
           | where they made us engineers put on hard-hats and eye
           | protection and reminded us who actually made the company
           | money, and by god those guys were doing exactly what we did:
           | they had JIT delivery of raw materials / sub-components to
           | assembly lines, and the delivery of resources to the line was
           | always constrained by a fixed buffer, quite literally "how
           | many units you could put on a shelf". You couldn't just toss
           | more product at a line that wasn't ready for it, it would
           | literally _not fit_.
           | 
           | Sure, they used different terms than we did - muda and muri
           | and value-stream mapping - but the outcome was very similar.
           | Their version of the NOC was a team of managers who watched
           | the cycle time of each line, how long it took material to be
           | transported, which areas were bottlenecking and which were at
           | risk of running into resource starvation, and they had the
           | ability to balance the hot spots by bursting product in,
           | caching upstream components into hold docks, releasing held
           | work product to downstream lines, or even turning on a flex
           | line to work around longer-term constraints. They rarely had
           | fixed, static bottlenecks, because they optimized the life
           | out of them. Their bottlenecks arose when a shipment didn't
           | arrive (and you better believe those were tracked just as
           | aggressively), a machine broke down, an operator got sick,
           | things like that.
           | 
           | But at the end of the day? Same principle: downstream
           | components would only accept as much as they could process
           | from upstream components; you asked before you pushed.
           | 
           | [1]: and if you said "I swear, you can't overload this
           | service, we don't need to send 429s" a swarm of senior
            | engineers descended upon you and didn't leave until you
            | convinced them they were wrong, or - way more likely - they
            | convinced you.
        
             | prometheus76 wrote:
             | Manufacturing and fabrication are two different worlds.
             | Manufacturing is all about reducing/eliminating variance,
             | whereas fabrication (high-mix, low volume) is all about
             | absorbing variance. Most of the things we build have never
             | been built before, and most of them will never be built
              | again. So the estimate of how long it will take to build
              | one is just that: an estimate. And the process of
             | fabricating it is affected by MANY variables, but I'll just
             | list a few. Materials with long lead times. Materials not
             | arriving on time. Wrong materials sent. A pareto
             | distribution of time it will take to finish an operation
             | depending on which employee I give it to. What else is in
             | the queue for an operation. Whether customers put a hold on
             | the part. And so on, and so on.
             | 
             | Every manufacturing person I've met thinks we're just
             | idiots. Then they come to our shop and it takes them about
             | 1-2 years of trying to "we just need to implement Lean,
             | guys!" before they realize that fabrication has very
             | little, if anything, in common with manufacturing and the
             | only lean principle that really applies is keeping your
             | work area tidy and your tools organized.
             | 
             | Our business model is built around our ability to absorb
             | and manage variance, rather than eliminate it.
        
               | owenmarshall wrote:
               | Very interesting - thanks!
        
               | galaxyLogic wrote:
               | Would this apply to "Software Fabrication" as well?
        
               | prometheus76 wrote:
               | Yes, we have much more in common with software
               | development than we do with manufacturing. We have more
               | stages of production than software, and we have a
               | challenging situation of shared resources between
               | projects (we are generally working anywhere from 25-50
               | projects at once in our shop using shared resources), but
               | the variance factor of 1) not knowing exactly how long it
               | will take to make something because it's never been made
               | before, and 2) different employees taking vastly
               | different amounts of time to accomplish the same task are
               | both shared challenges that we face along with software
               | development.
        
       | dcow wrote:
       | I like the author's solution (always use bounded queues) because
       | it usually forces you to confront back-pressure up front. It
       | doesn't matter how big your queue is, your system must be
       | designed to work at peak throughput essentially without a queue,
       | and thus your system must handle the possibility that an event
       | fails to be processed and must be retried/dropped. Queues _only_
       | serve to mitigate bursts.
       | 
       | It's annoying but also understandable how often people just dump
       | stuff into an unbounded queue and punt on making sure things work
       | until the system is falling down. Often queues are more of a tool
       | developers and operators can use to navigate unknown performance
       | characteristics and scale later than they are a requirement for
       | the actual system itself.
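        | 
        | A rough sketch of what confronting that up front can look like, in
        | Python (illustrative only; the retry/drop policy belongs to the
        | caller and is not prescribed here):
        | 
        |   import queue
        | 
        |   tasks = queue.Queue(maxsize=1000)   # bounded: overflow is a case
        |                                       # you must handle on day one
        | 
        |   def submit(task):
        |       try:
        |           tasks.put_nowait(task)      # never block or grow silently
        |           return True
        |       except queue.Full:
        |           # back-pressure: the caller decides to retry later,
        |           # degrade, or drop
        |           return False
        | 
        | The point is that the queue.Full branch exists from day one instead
        | of being discovered in production.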
        
         | mwcampbell wrote:
         | I wonder which combination of HTTP reverse proxy (e.g. nginx,
         | cloud load balancer) and application server (e.g. Python WSGI
         | server) is best at using bounded queues and signaling
         | backpressure (e.g. HTTP 503) rather than just letting requests
         | pile up.
        
           | slrz wrote:
           | I guess that basically everything you'd consider for
           | production use allows you to configure queue sizes and
           | related behaviour however you fancy.
        
         | sdevonoes wrote:
         | > It's annoying but also understandable how often people just
         | dump stuff into an unbounded queue and punt on making sure
         | things work until the system is falling down.
         | 
         | It's annoying if it is done by the infrastructure team (I mean,
         | they should know the details of the queue they are managing).
         | It's understandable if it is done by product developers (they
         | are more into "inheritance vs composition" kind of things).
        
           | capn_duck wrote:
           | I've never met an application developer who was unaware of
           | what a "queue" was or the problems they purport to solve.
           | Pretty sure stacks/queues are among the first "data
           | structures" that students create, which inevitably leads to
           | the question of, "what to do when full?" i.e. circular
           | queues, double ended queues, priority queues. I know that the
           | enterprise grade queuing systems we're talking about are a
           | lot more involved than that, but to suggest that developers
           | don't grok queues is pretty disingenuous. And the
           | implications of rate in > rate out is pretty obvious for
           | anyone that's ever had a clogged sink.
        
           | dcow wrote:
           | I've seen plenty of product engineers not understand this
           | fundamental aspect of queues and just add them because it
           | "felt right" or "for scale" or something silly...
        
             | sparsely wrote:
             | There's a lot of "Let's add a queue to deal with scaling
             | issues" type thoughts which don't really work in practice.
             | Like 95% of the time the system works better without one.
        
               | treis wrote:
               | The joke I like is:
               | 
               | You have a problem so you implement a queue. Now you have
               | two problems.
               | 
                | It succinctly illustrates the problem because you should
               | build your application to account for the queue being
               | down. So you still have your original problem of what do
               | I do if I can't process everything I need to.
        
         | jorangreef wrote:
         | > I like the author's solution (always use bounded queues)
         | because it usually forces you to confront back-pressure up
         | front.
         | 
         | I work on a system that can process a million financial
         | transactions a second, and this principle of "always use
         | bounded queues" has definitely made an incredible impact on the
         | design.
         | 
         | We use bounded queues everywhere -- and everything is also
         | static allocation: memory, network resources, disk too, even
         | all the messages that might possibly be needed to execute
         | different variants of the distributed consensus protocol. It's
         | really nice coding on it, because all the limits are clearly
         | defined.
         | 
         | It forces a little more upfront thinking. This thinking can be
         | a little difficult for a day or two, but the net result is a
         | better design.
         | 
         | I love Little's law.
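          | 
          | For anyone unfamiliar: Little's law says L = lambda * W, i.e. the
          | average number of items in the system equals the arrival rate
          | times the average time each item spends in the system. A toy
          | calculation (the numbers are made up, not the figures from any
          | real system):
          | 
          |   arrival_rate = 1_000_000   # tasks per second (illustrative)
          |   time_in_system = 0.002     # seconds queued + in service
          |   items_in_system = arrival_rate * time_in_system
          |   print(items_in_system)     # 2000.0 -> a ceiling to size
          |                              # statically allocated buffers against
          | 
          | That is essentially how a static-allocation design can bound its
          | queues ahead of time.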
        
         | konha wrote:
         | > and thus your system must handle the possibility that an
         | event fails to be processed and must be retried/dropped
         | 
         | Isn't a retry just extending the queue to the caller?
        
           | ggrrhh_ta wrote:
           | That's the best scenario; regardless of whether the event
           | reaches the destination or not, in most cases (except
           | continuous monitoring of some parameter which admits losses)
            | the source will still keep the information or the way to
           | generate that information, so the amount of state that the
           | queue at the source requires is still being used. But
           | moreover, the source can throttle the generation of events,
           | or it can just decide that that event is stale and can be
           | dropped, or can pack that event with the next events, or can
           | signal other processes about what is happening, or it can
            | choose another path, or maybe it has some out-of-band signaling
           | capability, or....
        
           | dcow wrote:
           | That's why I said "retry/drop". Handling back-pressure means
           | you must propagate it backwards to a point in your program
            | where there is enough context for a decision to be made
           | on how to handle the failure. Also, a retry can wait some
           | amount of time until the system is less busy (and this can be
           | enforced by some other part if needed). Further, you may have
           | many callers and one central thing processing requests so
           | propagating back-pressure to callers also allows you to push
           | the problem back to a point where there are potentially more
           | resources. In a silly example, if someone spamming your queue
           | starts to experience a sluggish UI or run out of RAM because
           | back-pressure has forced the queue to grow on their machine
            | and not your infrastructure, perhaps they'll get the message
           | and cool it (ideally the program handles this gracefully too
           | and tells the user a maximum has been hit, but even without
           | that back-pressure is useful).
           | 
           | So, yes. But that's the point.
        
           | pornel wrote:
           | Even worse, it multiplies the amount of requests hitting the
           | overloaded processor. You can create a "metastable failure"
           | situation that will prevent an overloaded service from ever
           | recovering.
        
             | toast0 wrote:
             | It helps if your client retries have backoff (but that's
             | not always possible), but you do need to have a way to drop
             | requests much faster than you typically process them and
             | it's nice to have a sensible way to drop requests that are
             | unlikely to be successful. Sometimes you can drop based on
             | time in queue, sometimes you can drop based on percentage,
             | sometimes you can track backend success and drop requests
             | to a failing backend, sometimes you have to just flush the
             | whole queue periodically if overloaded. Facebook has
             | published something about switching from a queue to a stack
             | during heavy load, which can work, although that stack
             | needs bounds, too.
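              | 
              | A hedged sketch of the "drop based on time in queue" idea
              | (the names and the 2-second budget are illustrative, not
              | from any particular system):
              | 
              |   import collections, time
              | 
              |   pending = collections.deque()
              |   MAX_AGE = 2.0   # seconds; older requests have likely
              |                   # already timed out client-side
              | 
              |   def enqueue(request):
              |       pending.append((time.monotonic(), request))
              | 
              |   def next_request():
              |       # shed stale work cheaply, before doing real work
              |       while pending:
              |           enqueued_at, request = pending.popleft()
              |           if time.monotonic() - enqueued_at <= MAX_AGE:
              |               return request
              |       return None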
        
             | dahfizz wrote:
             | > Even worse, it multiplies the amount of requests hitting
             | the overloaded processor.
             | 
             | How is that? The queue goes between the "client" and the
             | "server". The client thread does the work of trying to
             | enqueue something, while the server continues to churn
             | through existing requests completely independently.
        
             | sly010 wrote:
             | "backpressure" is an unfortunate term because it's not
             | something the queue does. "back-off" would probably be a
             | better term, since it's the behavior of the producer. The
             | producer has to be aware of it.
        
               | dcow wrote:
               | "backpressure" quite literally refers to a queue blocking
               | when full. Exerting backpressure on a system is
               | absolutely something a queue does. Backing off or
               | "loadshedding" is something the producer does in response
               | to backpressure from a queue.
        
           | im3w1l wrote:
           | Yes but that queue is bounded too. And if you follow the
            | systems backward eventually you end up with a real person,
            | who is then forced to prioritize away less important tasks.
        
         | marcosdumay wrote:
         | > Queues only serve to mitigate bursts.
         | 
         | Well, they also serve to hold your state while you add
         | processing capacity.
         | 
          | If you look at supermarket queues, when they start to grow,
          | management diverts people from non-time-sensitive tasks to
          | work the registers. The queues keep things working during the
          | change.
          | 
          | Equivalent situations happen in software too.
        
           | dcow wrote:
           | It's two sides of the same coin. In your example management
           | only adds more cashiers if the queues have become
           | unacceptably long (they've tripped a soft limit). The queues
           | are serving to accommodate irregular shopper purchasing
           | patterns because shoppers don't arrive consistently
           | throughout the day. So really they are serving to smooth over
           | bursts. Adding more cashiers is simply a response to queues
           | hitting an unacceptably long length. If management had
           | infinite cashiers and you could add them instantly then you
            | wouldn't need queues. But, as is especially obvious when all
            | registers are staffed, you still need queues to handle
            | bursts. Costco is a good
           | place to experience this.
        
         | lanstin wrote:
          | This philosophy - all queues have explicit, tuned limits, with
          | a drop log and a metric incremented on queue full - was used
          | thoroughly at AOL in the 90s. It worked very well, and let us
         | use hardware at much higher loadings than is popular currently.
         | Our goal was ~90% CPU at peak (systems I have worked with more
         | recently start to die around fifty percent). Also of course
         | there was a meeting each Friday to look at all queue depths and
         | queue latencies and to see where we needed more capacity coming
         | up. We did have so many subscribers that traffic was fairly
          | smooth overall.
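          | 
          | A small sketch of that philosophy in Python (the counter and
          | logger stand in for whatever metrics/logging system is actually
          | in use; the limit of 512 is arbitrary):
          | 
          |   import logging, queue
          | 
          |   log = logging.getLogger("work_queue")
          |   work = queue.Queue(maxsize=512)   # explicit, tuned limit
          |   dropped = 0                       # export as a metric
          | 
          |   def offer(item):
          |       global dropped
          |       try:
          |           work.put_nowait(item)
          |       except queue.Full:
          |           dropped += 1
          |           log.warning("queue full, dropping (total=%d)", dropped)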
        
       | lliamander wrote:
       | This explains why every JIRA backlog I've seen always grows until
       | it becomes unwieldy.
       | 
       | I wonder if anyone here has experience with a fixed-size backlog
       | for work tickets.
        
         | dcow wrote:
         | Shape Up [1] (from basecamp) rejects the notion of backlogs
         | essentially for this reason. We've leaned into adopting it
         | without using basecamp software specifically and it's pretty
         | refreshing.
         | 
         | [1]: https://basecamp.com/shapeup
        
         | jrockway wrote:
         | What do you do when the queue overflows? Block work on backlog
         | capacity? Drop the new item on the floor?
         | 
         | The problem is that people add things to the backlog during
         | their ordinary work, so as you scale up the number of workers,
         | the pressure on the queue increases.
         | 
         | That said, my experience in doing bug triage is that some large
         | percentage of tickets are never going to be done. Those should
         | just be dropped on the floor.
        
           | dcow wrote:
           | Yeah you drop the old ones. And then the insight is that
           | nobody ever has time to process the backlog anyway so just
           | drop the new ones instead and delete the backlog. If the work
           | is important, it will come back up as a new request during
           | cycle planning.
        
       | AtlasBarfed wrote:
       | backpressure is the mechanism that solves this
       | 
       | or less elegantly, circuit breaking / fail-retry
        
       | __dt__ wrote:
       | Interesting article, what bothers me a bit is
       | 
       | > I won't go into the M/M/1 part, as it doesn't really end up
       | affecting the Important Thing I'm telling you about. What does
        | matter, however, is that '∞'.
       | 
       | I'm really no expert but if I understand it correctly the `M/M`
       | part defines the distributions of the arrival and service
       | processes, so this definitely is important as others have already
       | mentioned in the comments. E.g. a D/D/1 queue where D stands for
       | deterministic arrival shouldn't suffer from this problem.
       | 
       | This doesn't change the interesting fact the article presents but
       | throwing in technical terms without proper explanation is imo bad
       | style. Either don't mention it (would be better in this case) or
       | explain it and why it is relevant.
       | 
        | This is also a common "mistake" inexperienced people make when
       | writing papers. They mention a fact that is somehow related but
       | is completely irrelevant to the topic and the rest of the paper.
       | I don't want to assume anything but to me this often smells like
       | showing-off.
        
       | bob1029 wrote:
       | For an alternative take on this, read up on the LMAX Disruptor.
       | Of particular note is section 2.5 "The problem of queues."
       | 
       | https://lmax-exchange.github.io/disruptor/disruptor.html
       | 
       | It has a completely opposite pattern: the more you load it the
       | faster it goes.
        
         | gpderetta wrote:
         | > It has a completely opposite pattern: the more you load it
         | the faster it goes.
         | 
          | Not sure what you mean by that. The article is about the
         | general theoretical properties of (unbounded) queues.
         | 
         | LMAX is a bounded queue, with the quirk that it will drop older
         | messages in favour of newer ones and it assumes that either
         | there is a side channel for recovery or consumer(s) can
         | tolerate message drops.
        
           | bob1029 wrote:
           | You're right, that was a rough take. My presentation of LMAX
            | was simply to provide a different perspective on this
           | space.
           | 
           | What I was trying to get at is the specific memory access
           | patterns, batching effects and other nuances related to the
           | physical construction of the CPU which modulate the actual
           | performance of these things. I think this quote better
           | summarizes what I was trying to convey:
           | 
           | > When consumers are waiting on an advancing cursor sequence
           | in the ring buffer an interesting opportunity arises that is
           | not possible with queues. If the consumer finds the ring
           | buffer cursor has advanced a number of steps since it last
           | checked it can process up to that sequence without getting
           | involved in the concurrency mechanisms. This results in the
           | lagging consumer quickly regaining pace with the producers
           | when the producers burst ahead thus balancing the system.
           | This type of batching increases throughput while reducing and
           | smoothing latency at the same time. Based on our
           | observations, this effect results in a close to constant time
           | for latency regardless of load, up until the memory sub-
           | system is saturated, and then the profile is linear following
           | Little's Law. This is very different to the "J" curve effect
           | on latency we have observed with queues as load increases.
        
       | mopierotti wrote:
       | I'm no expert, but I don't think this is true. Couldn't tasks
       | arrive into the queue at the same rate that they are processed,
       | resulting in a fixed queue size at 100% utilization?
       | 
       | Put another way, in the second "No good" diagram showing one task
       | being worked on with none in the queue, another task could arrive
       | before the current one is finished.
       | 
       | I suspect the counterargument might be that the task arrival rate
       | is not usually that predictable, but even so, I expect that the
       | amount of variance predicts how true this theory is in practice.
        
         | drivebycomment wrote:
          | Classical misunderstanding of queueing. If you have perfect
          | control over arrivals, you can consider them as already queued.
          | I.e. bank customers, grocery store checkout customers, or
          | online HTTP requests for a service all arrive at random
          | intervals. You don't have control over their arrival timing.
        
           | cormacrelf wrote:
           | This is not helped by the language used in the post. This was
            | supposed to be an introduction to queuing; people shouldn't
           | come away with their classic misunderstandings completely
           | intact.
           | 
           | > When a system has steady-state 100% utilization, its queue
           | _will grow to infinity_. (Emph mine.)
           | 
            | This is clearly false; there are counterexamples
           | where the queue does not grow that way. You don't have
           | control over the arrival rate etc but it could just be your
           | lucky day. It would be more accurate to say you can't
           | guarantee that either the queue length or wait time are
           | finite. Author didn't include the words "assume the worst
           | case" or anything like that. I feel like that's a pretty
           | crucial piece of framing that's missing.
        
             | marcosdumay wrote:
              | The author didn't assume the worst case; he just didn't
              | set the timeframe. Every queue with random arrival times
              | whose processors operate at 100% of their capacity is
              | guaranteed to eventually grow indefinitely (as in, you can
              | name any size, and the queue will eventually become larger
              | than it).
              | 
              | But in practice you also cannot maintain 100% workload
              | indefinitely either.
        
             | setr wrote:
             | The steady state is like Big-O notation; it's already
             | talking about the edge (as time grows to infinity). A
             | single day deviation doesn't really matter -- your queue
             | might shrink and grow a little day-to-day, but on average
             | it'll grow more than it shrinks, and you're ultimately
             | fucked.
             | 
             | If your utilization looked like 100% on one day, 0% on
             | another, and flipping back and forth (allowing the queue to
             | clear) then by definition your steady state isn't 100%, its
             | more like 50%.
             | 
             | And of course if your time bound is limited, or your
             | potential customers is limited, then naturally everything
             | is limited. But like big-O, the notion is to describe what
              | happens when your inputs grow to infinity.
        
         | remram wrote:
         | It does not immediately grow towards infinity, but it is
         | unbounded. Over a sufficiently long period of time, any given
         | queue size will be observed.
         | 
          | This assumes the input and output are just a little bit random,
          | and only equal _on average_. You will have periods during which
         | the input and output balances out. You will have periods during
         | which the output is faster, and the queue will shrink to 0 (but
         | no lower). And you will have periods where the output is
         | slower, and the queue grows, with no bound.
        
         | crdrost wrote:
          | Yes, that is a very basic point: one basic reason to use
          | queues is that queues are buffers for uncertainty. If things
          | are perfectly predictable, as you see on some robots, you
          | allocate exactly what you need and no extra.
          | 
          | The uncertainty exists in two places, though. It's not just
          | the events coming in; the time to process each queued event
          | is another source of uncertainty in the same system, and it
          | can have the same effect.
         | 
          | The author mentions "M/M/1." This specifies that there is only
          | one queue-handling process, that events come in with the sort
         | of randomness that raindrops have (no periodicity), and that
         | handling an event does not have any "timeouts" nor any "long
         | tails" where once a handler has been occupied for 3 or 4 _T_
         | (where _T_ is the average time to resolve) then the prediction
         | for how much time is remaining for them either drops to 0
         | (timeout) or escalates dramatically (long tail)... The
         | exponential distribution is used, kind of to be nicely in
         | between these, it is a model where over a time dt the handler
          | is asked to pick a random number with success rate p = dt/T; if
         | they pick the right random number then the event is handled,
         | otherwise a new random number is generated, repeat ad nauseam.
         | 
         | The basic lessons are surprisingly transferable though. So for
         | example traffic on the highway does not have this raindrop sort
         | of randomness because cars kind of repel each other, they space
         | out much more uniformly than you would have predicted on short
         | term scales... The handling time is also nothing like this
         | model of random guessing. Nevertheless, traffic jams still form
         | at high utilization! And they form for a really interesting
         | reason which is that the _n_ handlers--lanes, we call them--
         | have correlations between them. All these queuing models
         | optimistically assume that a handler never stops another
         | handler, "hey have you ever seen anything like this? where can
         | I find that password?" etc.
         | 
         | But in traffic, a car can only lane change from one lane to
         | another if both lanes synchronize speed. With a fixed ability
          | to accelerate, this depends on the amount of space between you
          | and the next car, and the amount of space in the lane you want
          | to switch to: as utilization increases, both of these spaces
          | drop and synchronization becomes a massive source of
          | inefficiency.
         | 
          | The same happens at a company too: it is possible that an
          | employee might spend more time waiting for meetings, and finding
         | convenient times to schedule them, in order to synchronize two
         | streams of work, than the work itself takes. Can be very
         | frustrating!
        
         | seandavidfisher wrote:
         | The author addressed this in the comments.
         | 
         | > What's to prevent the system from bouncing between "1 task in
         | processor & 1 task in queue" and "1 task in processor & 2 tasks
         | in queue" while maintaining 100% utilization?
         | 
         | > Nothing! That could totally happen in a queueing system.
         | However, the arrival process would need to be tuned quite
         | precisely to the processing rate. You would need to watch the
         | status of the processor and, when a task finishes, only then
         | insert a new task into the queue. But this implies a shell
         | game: if you always have a task ready to put into the queue
         | when it needs to be padded back out, then isn't that task in a
         | sense already "queued"?
         | 
         | > Instead, in queueing theory we usually assume a random
         | arrival process: sometimes a minute may pass between arrivals
         | into the queue; sometimes only a second. So the system can't
         | bounce for arbitrarily long between 1-in-queue and 2-in-queue
         | states. Eventually, one of two things will randomly occur:
         | 
         | > 1. From the 1-in-queue state, the active task finishes
         | processing before a new task arrives, bringing queue size to 0.
         | 
         | > 2. From the 2-in-queue state, a new task arrives in the queue
         | before the active task finishes, causing the queue size to grow
         | to 3.
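          | 
          | A quick way to see this numerically is to simulate such a queue
          | (Poisson arrivals, exponential service, one processor) and watch
          | the average wait as utilization approaches 1. A rough sketch
          | with arbitrary rates:
          | 
          |   import random
          | 
          |   def avg_wait(utilization, n=200_000, service_rate=1.0):
          |       # Lindley recursion for successive waiting times
          |       arrival_rate = utilization * service_rate
          |       wait, total = 0.0, 0.0
          |       for _ in range(n):
          |           service = random.expovariate(service_rate)
          |           gap = random.expovariate(arrival_rate)
          |           wait = max(0.0, wait + service - gap)
          |           total += wait
          |       return total / n
          | 
          |   for rho in (0.5, 0.8, 0.9, 0.99):
          |       print(rho, avg_wait(rho))   # blows up as rho approaches 1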
        
       | kazinator wrote:
       | What the naysayers are missing is the definition of load; read it
        | again: _"[C]apacity is the number of tasks per unit time that
       | can be processed."_
       | 
       | But tasks are variable. One task requires 5 minutes of
       | processing, another one of 35. They arrive randomly. This is also
        | explicitly given, in a caption under one of the diagrams. _"A
       | queueing system with a single queue and a single processor. Tasks
       | (yellow circles) arrive at random intervals and take different
       | amounts of time to process."_
       | 
       | People calling the article wrong may be thinking of capacity as
       | "unit time amount of work"; like when the processor is at full
       | capacity, it's doing one second's worth of work every second.
       | 
       | If we define capacity this way, the problem goes away: the
       | processor just becomes a leaky bucket. So that is to say, if we
       | know exactly how long each task will take, then we can measure
       | the queue size in terms of total number of seconds of work in the
       | queue. And so then, as long as no more than one second's worth of
       | work is being added to the queue per second, it will not grow
       | without bound, just like a leaky bucket that is not being
       | refilled faster than its leak rate.
       | 
       | When capacity is given as a maximum _number_ of tasks per second,
       | there has to be some underlying justification for that, like
       | there is some fixed part to servicing a job such as set up time
        | and clean up time that doesn't go away even if the job takes
       | next to zero seconds, such that jobs effectively have a built-in
       | minimum duration. If it takes one second to set up a job, and one
       | second to clean up, then the maximum capacity is half a job per
       | second: 1800 jobs per hour and so on. Of course the queue starts
       | to backlog when you approach capacity, because the jobs also
       | require nonzero real work in relation to the administrative time.
       | 
       | If jobs have no minimum fixed cost attached, then the maximum job
       | rate is unbounded: the shorter the jobs being queued, the more of
       | them can be done per unit time: one million one-microsecond jobs
       | can be done in a second, or a billion one-nanosecond jobs, and so
       | on.
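        | 
        | The "seconds of work in the queue" framing is just a leaky bucket;
        | a sketch (the one-unit drain per tick is the single-processor
        | assumption made explicit):
        | 
        |   # backlog measured in seconds of work, drained at one second of
        |   # work per second of wall-clock time (one processor)
        |   backlog = 0.0
        | 
        |   def tick(arriving_work_seconds):
        |       global backlog
        |       backlog += arriving_work_seconds   # work added this second
        |       backlog = max(0.0, backlog - 1.0)  # work done this second
        |       return backlog
        | 
        | If no call ever passes more than 1.0, the backlog never grows; if
        | that 1.0 only holds on average, with random variation, the
        | article's unbounded-growth problem comes back.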
        
       | FourthProtocol wrote:
       | Messaging middleware such as queues is mostly redundant.
       | Simplistically -
       | 
       | Client --
       | 
        | If you cache a new submission (message) locally before
        | submitting, you just keep re-submitting until the server returns
        | an ACK for that submission, at which point you clear it from
        | your local cache.
       | 
       | Scale clients out as needed.
       | 
       | Server --
       | 
       | If a message is received in good order, return an ACK.
       | 
       | If a message is a duplicate, discard the message and return an
       | ACK.
       | 
       | If a message can only be received once, discard if it already
       | exists on the server, and return an ACK.
       | 
       | If a message is invalid, return a FAIL.
       | 
        | Scale hardware out or up, as needed (ref. "capacity", item 2 in
        | the linked blog post above). Scale queues out by adding service
       | endpoints on the server. Async/await makes the client experience
       | painless. You save your employer $$ because no additional
       | procurement contracts, no licensing fees, no additional server-
       | side infrastructure to run queues, and no consultancy fees to set
       | up and operate Rabbit MQ/Tibco/IBM MQ/Amazon SQS/Azure Queue
       | Storage or whatever other queueing solution the org uses.
       | 
        | Message passing includes concepts such as durability, security
        | policies, message filtering, delivery policies, routing policies,
       | batching, and so on. The above can support all of that and, if
       | your scenario calls for it, none of it.
       | 
       | The financial argument reduces to dev time vs. procurement,
       | deployment and operational costs of whatever 3rd party product is
       | used, as well as integration, of course.
       | 
       | * I note and welcome the downvotes. However I'd love it more if
       | you'd present a coherent counter argument with your vote.
        
         | remram wrote:
         | > If you cache a new submission (message) locally before
         | submitting, you just keep re-submitting until the server
         | returns an ACK for that submission; and your local cache is
         | empty.
         | 
         | This has terrible resource use and offers no visibility into
         | how many clients are waiting. And yet it's still a queue. Why
         | would anyone do that?
         | 
         | The rest of your post I can't parse at all.
        
       | detritus wrote:
       | The only thing to understand - and accept - is that you will
       | always pick the wrong queue. So will everyone else. Always.
       | 
       | .
       | 
       | I appreciate that this truism has nothing to do with the article.
        
         | prometheus76 wrote:
          | To quantify your point: if there are five checkout lines
         | at the grocery store, your odds of picking the fastest line are
         | 1/5 (20%). The more grocery store lines there are, the lower
         | the odds are of you picking the fastest line.
         | 
         | They added "express lanes" back in the 80s to somewhat address
         | this, but then you have the person who ignores the sign and
         | gets in the line with a full cart.
        
       | Scene_Cast2 wrote:
       | What are the underlying assumptions we can break here?
       | 
       | For example, what if tasks were not monolithic? As the queue size
       | grows, increase the caching timeout and don't hit that DB, or
       | decrease timeout, or something like that.
       | 
       | Or, what if the input task variance was bounded and we just
        | initialized the queue with 10 tasks? This way, the queue depth
        | would never dip below 1 and would never exceed 20 (for
        | example).
        
         | jusssi wrote:
         | Some more:
         | 
         | Processing capacity might not be constant. If you're in a
         | cloud, maybe you can launch more processing power as queue
         | length increases.
         | 
         | Multiple items in the queue might be satisfiable by the same
         | processing, e.g. multiple requests for the same item. In that
         | case, having more requests in queue can increase processing
         | efficiency.
         | 
         | Edit: another one. All requests may not need the same quality
         | of service. For some, best effort might be acceptable.
        
         | GordonS wrote:
         | I wonder if it might also be possible to introduce a kind of
         | tiered storage, where, for example, the queue starts persisting
         | to cheap, massive block storage (such as cloud block storage)
          | instead of its usual mechanism. That does imply tier 2 events
         | would _read_ slower when the busy period ceased though.
        
       | kingdomcome50 wrote:
       | And what if items are processed at exactly the same rate they are
       | added? Something is missing from this thesis. There must be an
       | additional assumption at play beyond what is stated in the
       | article.
        
         | estro0182 wrote:
         | Theoretically I think that's possible. In a practical setting
          | it is not, since task arrival times follow a probability
          | distribution; this means that utilization is not 100% and
         | therefore queues do not grow to infinity. Given the author's
         | point about ~80% and above being bad, I imagine the immediate
         | processing scenario to still be pathological.
         | 
         | Edit: https://news.ycombinator.com/item?id=30601286
        
         | prometheus76 wrote:
          | It's not just the arrival rate that is a Poisson distribution;
          | it's also the processing time being a Poisson distribution
          | that really sets the table for the scenario the article is
          | addressing. For me, it's easier to understand in a grocery
          | store situation, where customers arrive in a Poisson
          | distribution at the checkout line, and each cart has a
          | Poisson distribution on how long it takes to process the items
          | in the cart. That second level of variance is key to the queue
          | buildup, especially as the grocery checker reaches 100%
          | utilization.
        
           | btilly wrote:
           | The processing time is exponential, not Poisson.
           | 
           | What all of this actually boils down to, though, is that the
            | odds of your next event being a job completion or a new
            | arrival are exactly even.
        
         | seandavidfisher wrote:
         | The assumption you're looking for is there but relatively
          | inconspicuous. He mentioned an M/M/1/∞ queue but then didn't
         | go into the details about the M/M/1 part. From the Wikipedia
         | page[0]:
         | 
         | > an M/M/1 queue represents the queue length in a system having
         | a single server, where arrivals are determined by a Poisson
         | process...
         | 
         | So the arrival times are not arriving at any fixed rate, but
         | according to a Poisson distribution. In the article he did
         | reference this fact right before the first plot:
         | 
         | > (For the wonks: I used a Poisson arrival process and
         | exponentially distributed processing times)
         | 
         | And then finally, he answered this question more directly in
         | the comments to the article:
         | 
         | > What's to prevent the system from bouncing between "1 task in
         | processor & 1 task in queue" and "1 task in processor & 2 tasks
         | in queue" while maintaining 100% utilization?
         | 
         | > Nothing! That could totally happen in a queueing system.
         | However, the arrival process would need to be tuned quite
         | precisely to the processing rate. You would need to watch the
         | status of the processor and, when a task finishes, only then
         | insert a new task into the queue. But this implies a shell
         | game: if you always have a task ready to put into the queue
         | when it needs to be padded back out, then isn't that task in a
         | sense already "queued"?
         | 
         | > Instead, in queueing theory we usually assume a random
         | arrival process: sometimes a minute may pass between arrivals
         | into the queue; sometimes only a second. So the system can't
         | bounce for arbitrarily long between 1-in-queue and 2-in-queue
         | states. Eventually, one of two things will randomly occur:
         | 
         | > 1. From the 1-in-queue state, the active task finishes
         | processing before a new task arrives, bringing queue size to 0.
         | 
         | > 2. From the 2-in-queue state, a new task arrives in the queue
         | before the active task finishes, causing the queue size to grow
         | to 3.
         | 
         | [0]: https://en.wikipedia.org/wiki/M/M/1_queue
        
           | kingdomcome50 wrote:
           | Of course I know what was missing. My comment was more about
           | making the implicit more explicit. I am not the only one who
           | immediately thought of the above as a counter-example to the
           | bold text (repeated twice) in the article.
        
       | civilized wrote:
       | > If the processor is always busy, that means there's never even
       | an instant when a newly arriving task can be assigned immediately
       | to the processor. That means that, whenever a task finishes
       | processing, there must always be a task waiting in the queue;
       | otherwise the processor would have an idle moment. And by
       | induction, we can show that a system that's forever at 100%
       | utilization will exceed any finite queue size you care to pick:
       | 
       | This argument is incorrect because it proves too much. If tasks
       | arrive predictably, say at the rate of one per minute, and the
       | processor completes tasks at the same rate, you can have maximum
       | throughput and a finite queue -- indeed, the queue never has to
       | exceed two waiting tasks. The unbounded queue growth problem only
       | arises when tasks arrive unpredictably. Since this argument
       | proves the same conclusion without regard to predictability, it's
       | incorrect.
       | 
       | Here's how I think it actually works: if tasks randomly come in
       | and out at the same rate, the size of the queue over time can be
       | modeled as an unbiased random walk with a floor at zero.
       | Elementary techniques prove that such a random walk will hit zero
       | in finite time, and will therefore hit zero infinitely many
       | times. These correspond to the idle times of the processor. The
       | only way to avoid hitting zero over and over is if tasks come in
       | faster than they come out, which leads to tasks piling up over
       | time.
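        | 
        | The random-walk picture is easy to check by simulation (a sketch;
        | the 50/50 step models arrivals and completions at the same average
        | rate):
        | 
        |   import random
        | 
        |   def queue_walk(steps=1_000_000):
        |       size, max_seen, idle = 0, 0, 0
        |       for _ in range(steps):
        |           if random.random() < 0.5:
        |               size += 1                # an arrival
        |           else:
        |               size = max(0, size - 1)  # a completion, floored at 0
        |           max_seen = max(max_seen, size)
        |           idle += (size == 0)
        |       return max_seen, idle
        | 
        |   print(queue_walk())
        | 
        | The queue keeps returning to zero (the processor's idle moments),
        | yet the maximum size seen keeps creeping up the longer you run it.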
        
         | assbuttbuttass wrote:
         | Thanks, I was confused by the article's induction proof which
         | seems to completely ignore new tasks coming in. The random walk
         | argument is very clear.
        
         | [deleted]
        
         | KaiserPro wrote:
         | I think you are putting too many fancy words into the problem.
         | 
         | Queues are balloons. If you put more air in than you are taking
          | out, they grow until they pop. As utilisation grows (i.e. as
          | the input volume of air starts to match the output) it takes
          | much longer to get rid of the backlog.
         | 
         | This means that any traffic anomalies that would otherwise be
         | hidden by the queue are instantly amplified.
        
           | civilized wrote:
           | If all you care about is engineering intuition, you can use
           | any number of metaphors. I'm talking about what it takes to
           | understand queues mathematically and prove correct results
           | about them, rather than incorrect results.
        
       | KaiserPro wrote:
       | This is a good article.
       | 
        | Ted Dziuba's Monitoring Theory covers unbounded queues pretty
        | well:
        | http://widgetsandshit.com/teddziuba/2011/03/monitoring-theor...
        
       | maayank wrote:
       | Any good book/exploratory paper for queuing theory?
        
         | Jtsummers wrote:
         | https://web2.uwindsor.ca/math/hlynka/qonline.html
         | 
         | A large list of books (free and online), I cannot speak to the
         | quality of any particular book but you can take some of the
         | titles and authors and perhaps find reviews.
         | 
         | http://web2.uwindsor.ca/math/hlynka/qbook.html
         | 
         | Linked at the top of the first page I shared, but not available
         | online (though a few are available through some services like
         | O'Reilly's digital library).
        
       | fpoling wrote:
       | A lot of frameworks have queues bounded only by the available
        | memory. As the article nicely demonstrates, that only keeps
        | latency reasonably bounded if the CPU is idle at least 20% of
        | the time. With various GUI event queues that
       | is OK, but it puzzled me why, for example, Erlang opted to have
       | unbounded message queues. Did not Erlang designers care about
       | latency?
        
         | yetihehe wrote:
         | In erlang it is standard to have 5 seconds timeout on queries.
         | So, when many processes start to send to one process and it
         | gets overwhelmed, other processes start having timeouts and
         | typically they have some backoff strategy. Essentially queues
         | are not bounded by size, but by time. Most processes don't send
         | messages in a fire-and-forget manner, but wait for confirmation
         | of execution.
        
         | toast0 wrote:
         | Erlang designers certainly cared about latency, but a bounded
         | message queue changes local messaging from reliable to
          | unreliable, and remote messaging from mostly reliable (as long
          | as the nodes don't disconnect) to unreliable. That's a big
          | change and they were unwilling to do it for a long time,
          | although I think there is an option to put bounds on the
          | message queue now. It
         | was possible to have bounds before --- either with brute force
         | looking at the queue length for all processes and killing them,
         | or looking at the queue length within the process and
         | discarding requests if there are too many.
         | 
         | Also, for a long time, sending messages to a process with a
         | long queue might be deprioritized (suspended), as a form of
         | backpressure, although that got removed for local processes
         | because it wasn't effective with SMP, if I'm remembering
         | properly.
        
         | actionfromafar wrote:
          | Maybe the Erlang way is more about letting the entire thread
          | crash and "bound" it with a restart?
        
       | motohagiography wrote:
        | If you are interested in modelling microservices, service buses,
        | and pubsub topologies, you can do some rough models of queues in
        | Python with this: queueing-tool.readthedocs.io/
        
       | cjg wrote:
       | I just finished implementing a simple backpressure algorithm.
       | Once the queue is more than a certain percentage full, it rejects
       | new items. These rejections get turned into HTTP 503s.
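        | 
        | One possible shape of such a check (a sketch only; the threshold
        | and the probabilistic ramp are guesses, not cjg's actual
        | implementation):
        | 
        |   import random
        | 
        |   CAPACITY = 1000
        |   REJECT_ABOVE = 0.8   # start shedding at 80% full
        | 
        |   def should_reject(queue_length):
        |       fill = queue_length / CAPACITY
        |       if fill < REJECT_ABOVE:
        |           return False
        |       # reject a growing fraction of new items as the queue fills,
        |       # reaching 100% rejection when completely full
        |       p = (fill - REJECT_ABOVE) / (1.0 - REJECT_ABOVE)
        |       return random.random() < p
        | 
        | Rejected submissions are what get turned into the HTTP 503s.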
        
         | SketchySeaBeast wrote:
         | Haven't you just moved the pressure back a step? Now you have a
         | bunch of items retrying to get their stuff on the queue and a
         | bunch of places to try and locate the problem, instead of an
         | overly large queue. Is the idea that the messages aren't
         | important enough to be placed on the queue in the first place
         | if demand is high?
        
           | cjg wrote:
           | The idea is that it's not a sudden hard failure. Instead only
           | a few things fail which helps in two ways: reducing the load
           | by discarding some work and alerting that the system is
           | beginning to be overloaded.
        
         | usrbin wrote:
          | What is the purpose of cutting off the queue at n% full?
          | Isn't that effectively saying that the queue size is actually
          | n% of the stated size, or am I missing a detail?
        
           | morelisp wrote:
           | Wait until you hear about hash table capacities.
        
           | bombcar wrote:
            | Perhaps "rejects _new_ items" is the key - a web server that
            | hits a certain percentage utilization only allows _existing_
            | clients/connections/logged-in users/whatever to connect and
            | refuses _new_ connections.
        
           | cjg wrote:
           | It's probabilistic.
        
       | encoderer wrote:
        | This is unworkable if you are actively scaling your system. Am I
       | supposed to calculate ideal queue size with each scale out of my
       | data platform?
       | 
       | Instead, the right way to think about limiting queue size is load
       | shedding when you feel back pressure.
       | 
       | Here's an example at Cronitor: if our main sqs ingestion queue
       | backs up, our pipeline will automatically move from streaming to
       | micro-batching, drastically reducing the number of messages on
       | the queue at the expense of slightly increased latency per
       | message. At the same time, a less critical piece of our
       | infrastructure pauses itself until the queue is healthy again,
       | shedding one of the largest sources of db load and giving that
       | capacity to ingestion.
       | 
       | To me the goal is to feel and respond to back pressure before
       | blowing up and rejecting messages.
        
       | jph wrote:
       | Queuing theory in more detail:
       | https://github.com/joelparkerhenderson/queueing-theory
       | 
       | The article by Dan is referenced there too.
        
       | natmaka wrote:
        | If some logic re-orders the job queue to group similar things,
        | improving cache hit ratios, then a quite large queue can be
        | quite effective (especially if there is no priority or ordering
        | barrier; the cost of that high throughput, however, is a high
        | maximum latency).
        
         | taneq wrote:
         | Exactly. In fact I'd go so far as to say this is an equally
         | Important Thing to the Important Thing in the article. If your
         | queue is just a simple FIFO / LIFO / whatever to buffer surges
         | in demand until you get to them, then you absolutely get the
         | "queue suddenly goes to infinity as you hit a critical point
         | near maximum throughput" thing happening.
         | 
         | If, however, you use the queue as a way to process queries more
         | efficiently then that's a different story.
        
       ___________________________________________________________________
       (page generated 2022-03-08 23:01 UTC)