[HN Gopher] The most important thing to understand about queues ...
___________________________________________________________________
The most important thing to understand about queues (2016)
Author : dhotson
Score : 161 points
Date : 2022-03-08 13:08 UTC (9 hours ago)
(HTM) web link (blog.danslimmon.com)
(TXT) w3m dump (blog.danslimmon.com)
| ncmncm wrote:
| Adding just a little capacity makes the need for queue capacity
| collapse.
|
| But if the rates are not well specified, then neither is what "a
| little" means, and rejecting additions to the queue is the only
| answer.
| mwcampbell wrote:
| > but once you understand it, you'll have deeper insight into the
| behavior not just of CPUs and database thread pools, but also
| grocery store checkout lines, ticket queues, highways - really
| just a mind-blowing collection of systems.
|
| ...and karaoke singer rotations. In 6 years of frequently singing
| karaoke at bars, I've never known a host to turn away new
| singers, unless it's almost time to end the show. So you could
| say the queue is unbounded, with predictable results for the few
| singers who show up early and then get frustrated with the ever-
| increasing time until their next turn as more singers arrive. I
| don't know what the best solution is.
| bombcar wrote:
| That's the same load shedding as grocery stores use - if
| everything gets too crowded people start leaving (not
| queueing).
|
| Now that may actually be suboptimal for the business (if they
| can only let 10 people in they'd rather let the 10 who will
| spend the most, say) which is why things like reservations, etc
| come into play. I wonder if restaurants that do both
| reservations and a queue see one group pay more on average ...
| claytonjy wrote:
| I've never managed a restaurant, but thinking about my own
| habits I'd bet that a restaurant that takes reservations but
| also some walk-ins almost surely makes more on the
| reservation guests. The meal is the main attraction for them,
| often a special occasion, while walk-ins are more likely to
| spread their money over other businesses (shopping before,
| drinks elsewhere after, etc.). I bet group size is also
| higher for reservations; most people know a reservation is
| all-but-required for larger groups.
| Animats wrote:
| _That's the same load shedding as grocery stores use - if
| everything gets too crowded people start leaving (not
| queueing)._
|
| Yes. That's called the "rejection rate".
|
| Unless the queue length is zero some of the time, you will have
| a nonzero rejection rate. This is worth bringing up with
| store managers who want to run understaffed checkouts. One of
| the things retail consultants do is point out how sales are
| being lost that way, both in customers who leave and
| customers who never come back.
|
| Much of my early work on network congestion was based on
| that. In the early days of networking, everyone was thinking
| Poisson arrivals, where arrivals are unaffected by queue
| lengths. This is partly because the original analysis for the
| ARPANET, by Leonard Kleinrock, was done that way. It was done
| that way because his PhD thesis was based on analyzing
| Western Union Plan 55-A, which handled telegrams. (Think of
| Plan 55-A as a network of Sendmail servers, but with queues
| made from paper tape punches feeding paper tape readers. The
| queue was a bin between punch and reader.)[1], at 7:00. Queue
| length was invisible to people sending telegrams, so senders
| did behave like Poisson arrivals. That's still true of email
| today.
|
| The IP layer is open loop with rejection. Transport protocols
| such as TCP are closed loop systems. The two have to be
| considered together. Everybody gets this now, but it was a
| radical idea in 1985.[2]
|
| [1] https://archive.org/details/Telegram1956
|
| [2] https://datatracker.ietf.org/doc/rfc970/
| bombcar wrote:
| This is a big part of self checkout - it reduces rejection
| rate.
|
| And that can be done with queues in general: if alternate
| paths are available, at some point intelligent operators
| will change path (this can be problematic also, when the
| option to change isn't available - I'd normally like to
| connect to a server near me but if it is overloaded give me
| the option to take a further away one).
| testudovictoria wrote:
| Focus on the bottleneck. There is a finite amount of time.
| Increase it to allow more participants.
|
| There's also putting restrictions on the input. Have a maximum
| song length.
|
| Otherwise, require additional constraints before queueing like
| making it computationally expensive or having a priority system
| based on criteria. Charge a cover (with increasing costs) or
| have a membership club that gives more/better access.
| galaxyLogic wrote:
| Here's another article about the same issue I think
| https://ferd.ca/queues-don-t-fix-overload.html .
|
| Solution: "Step 1. Identify the bottleneck. Step 2: ask the
| bottleneck for permission to pile more data in"
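|
| In code form, that permission step can be as small as a semaphore
| sized to the bottleneck (a minimal sketch, not Ferd's design; the
| names and numbers here are made up):
|
|     import queue
|     import threading
|
|     class Overloaded(Exception):
|         pass
|
|     PERMITS = threading.BoundedSemaphore(value=100)  # bottleneck capacity
|     work = queue.Queue()
|
|     def submit(task, timeout=0.05):
|         # Step 2: ask the bottleneck for permission before piling more in.
|         if not PERMITS.acquire(timeout=timeout):
|             raise Overloaded(task)        # shed the load at the edge
|         work.put(task)
|
|     def bottleneck_worker(process):
|         while True:
|             task = work.get()
|             try:
|                 process(task)             # the slow thing being protected
|             finally:
|                 PERMITS.release()         # permission for the next task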
| hammock wrote:
| Love that solution. It's plainly unfair but in my experience so
| critical to getting things done in business, even setting aside
| engineering.
| bombcar wrote:
| This is wonderful and shows the problem very clearly.
|
| _If your system doesn't have a way to shed load, it will
| eventually overload_.
|
| The problem for many is that when it does finally shed load,
| the load shedder gets blamed (and turned off). See how many
| people look for ways to turn off the OOMKiller and how few look
| to figure out how to get more RAM.
| dcow wrote:
| This is also wonderful (emphasis mine):
|
| > All of a sudden, the buffers, queues, whatever, can't deal
| with it anymore. You're in a critical state where you can see
| smoke rising from your servers, _or if in the cloud, things
| are as bad as usual, but more!_
| cecilpl2 wrote:
| This is a fantastic article, thank you for sharing!
| prometheus76 wrote:
| I work in a custom fabrication environment, and this solution
| doesn't really apply, because what happens in a dynamic system
| is that the bottleneck shifts all around the shop dynamically.
| It's never just one operation that is the constant bottleneck.
| owenmarshall wrote:
| How doesn't it? I've lived in a world of coordinated
| microservices before, where a nice linear progression of
| queues only happens at the atomic level; when you zoom out
| you've got a digraph full of cycles, nodes that come & go in
| response to various events & schedules, and the transitory
| nature of those services could cause a smooth-
| sailing pipeline to suddenly seize up because some downstream
| dependency is servicing higher-priority requests.
|
| The _only_ way we could stay sane was aggressively following
| Ferd's rule - nobody got to surface an API unless it
| surfaced HTTP 429s on overload[1], and if you called a
| downstream API without checking for and responding to a 429,
| _tough noogies_, your requests got dropped and you had to
| figure out what to do to fix it.
|
| Funny enough, at the time I did software for a manufacturing
| facility that did high-throughput manufacturing for many
| varying products. I was lucky enough to do a "gemba walk"
| where they made us engineers put on hard-hats and eye
| protection and reminded us who actually made the company
| money, and by god those guys were doing exactly what we did:
| they had JIT delivery of raw materials / sub-components to
| assembly lines, and the delivery of resources to the line was
| always constrained by a fixed buffer, quite literally "how
| many units you could put on a shelf". You couldn't just toss
| more product at a line that wasn't ready for it, it would
| literally _not fit_.
|
| Sure, they used different terms than we did - muda and muri
| and value-stream mapping - but the outcome was very similar.
| Their version of the NOC was a team of managers who watched
| the cycle time of each line, how long it took material to be
| transported, which areas were bottlenecking and which were at
| risk of running into resource starvation, and they had the
| ability to balance the hot spots by bursting product in,
| caching upstream components into hold docks, releasing held
| work product to downstream lines, or even turning on a flex
| line to work around longer-term constraints. They rarely had
| fixed, static bottlenecks, because they optimized the life
| out of them. Their bottlenecks arose when a shipment didn't
| arrive (and you better believe those were tracked just as
| aggressively), a machine broke down, an operator got sick,
| things like that.
|
| But at the end of the day? Same principle: downstream
| components would only accept as much as they could process
| from upstream components; you asked before you pushed.
|
| [1]: and if you said "I swear, you can't overload this
| service, we don't need to send 429s" a swarm of senior
| engineers descended upon you and only left once you
| convinced them they were wrong, or - way more likely - they
| convinced you.
| prometheus76 wrote:
| Manufacturing and fabrication are two different worlds.
| Manufacturing is all about reducing/eliminating variance,
| whereas fabrication (high-mix, low volume) is all about
| absorbing variance. Most of the things we build have never
| been built before, and most of them will never be built
| again. So the estimate of how long it will take to build
| one is just that: an estimate. And the process of
| fabricating it is affected by MANY variables, but I'll just
| list a few. Materials with long lead times. Materials not
| arriving on time. Wrong materials sent. A pareto
| distribution of time it will take to finish an operation
| depending on which employee I give it to. What else is in
| the queue for an operation. Whether customers put a hold on
| the part. And so on, and so on.
|
| Every manufacturing person I've met thinks we're just
| idiots. Then they come to our shop and it takes them about
| 1-2 years of trying to "we just need to implement Lean,
| guys!" before they realize that fabrication has very
| little, if anything, in common with manufacturing and the
| only lean principle that really applies is keeping your
| work area tidy and your tools organized.
|
| Our business model is built around our ability to absorb
| and manage variance, rather than eliminate it.
| owenmarshall wrote:
| Very interesting - thanks!
| galaxyLogic wrote:
| Would this apply to "Software Fabrication" as well?
| prometheus76 wrote:
| Yes, we have much more in common with software
| development than we do with manufacturing. We have more
| stages of production than software, and we have a
| challenging situation of shared resources between
| projects (we are generally working anywhere from 25-50
| projects at once in our shop using shared resources), but
| the variance factor of 1) not knowing exactly how long it
| will take to make something because it's never been made
| before, and 2) different employees taking vastly
| different amounts of time to accomplish the same task are
| both shared challenges that we face along with software
| development.
| dcow wrote:
| I like the author's solution (always use bounded queues) because
| it usually forces you to confront back-pressure up front. It
| doesn't matter how big your queue is, your system must be
| designed to work at peak throughput essentially without a queue,
| and thus your system must handle the possibility that an event
| fails to be processed and must be retried/dropped. Queues _only_
| serve to mitigate bursts.
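|
| A rough sketch of the shape this forces on the producer (stdlib
| queue; the bound of 1000 is an arbitrary example):
|
|     import queue
|
|     jobs = queue.Queue(maxsize=1000)   # bound chosen up front
|
|     def enqueue(job):
|         try:
|             jobs.put_nowait(job)
|         except queue.Full:
|             # Back-pressure: the caller must now retry later, drop
|             # the job, or push the problem back toward its source.
|             return False
|         return True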
|
| It's annoying but also understandable how often people just dump
| stuff into an unbounded queue and punt on making sure things work
| until the system is falling down. Often queues are more of a tool
| developers and operators can use to navigate unknown performance
| characteristics and scale later than they are a requirement for
| the actual system itself.
| mwcampbell wrote:
| I wonder which combination of HTTP reverse proxy (e.g. nginx,
| cloud load balancer) and application server (e.g. Python WSGI
| server) is best at using bounded queues and signaling
| backpressure (e.g. HTTP 503) rather than just letting requests
| pile up.
| slrz wrote:
| I guess that basically everything you'd consider for
| production use allows you to configure queue sizes and
| related behaviour however you fancy.
| sdevonoes wrote:
| > It's annoying but also understandable how often people just
| dump stuff into an unbounded queue and punt on making sure
| things work until the system is falling down.
|
| It's annoying if it is done by the infrastructure team (I mean,
| they should know the details of the queue they are managing).
| It's understandable if it is done by product developers (they
| are more into "inheritance vs composition" kind of things).
| capn_duck wrote:
| I've never met an application developer who was unaware of
| what a "queue" was or the problems they purport to solve.
| Pretty sure stacks/queues are among the first "data
| structures" that students create, which inevitably leads to
| the question of, "what to do when full?" i.e. circular
| queues, double ended queues, priority queues. I know that the
| enterprise grade queuing systems we're talking about are a
| lot more involved than that, but to suggest that developers
| don't grok queues is pretty disingenuous. And the
| implications of rate in > rate out are pretty obvious for
| anyone that's ever had a clogged sink.
| dcow wrote:
| I've seen plenty of product engineers not understand this
| fundamental aspect of queues and just add them because it
| "felt right" or "for scale" or something silly...
| sparsely wrote:
| There's a lot of "Let's add a queue to deal with scaling
| issues" type thoughts which don't really work in practice.
| Like 95% of the time the system works better without one.
| treis wrote:
| The joke I like is:
|
| You have a problem so you implement a queue. Now you have
| two problems.
|
| It succinctly illustrates the problem because you should
| build your application to account for the queue being
| down. So you still have your original problem: what do I
| do if I can't process everything I need to?
| jorangreef wrote:
| > I like the author's solution (always use bounded queues)
| because it usually forces you to confront back-pressure up
| front.
|
| I work on a system that can process a million financial
| transactions a second, and this principle of "always use
| bounded queues" has definitely made an incredible impact on the
| design.
|
| We use bounded queues everywhere -- and everything is also
| static allocation: memory, network resources, disk too, even
| all the messages that might possibly be needed to execute
| different variants of the distributed consensus protocol. It's
| really nice coding on it, because all the limits are clearly
| defined.
|
| It forces a little more upfront thinking. This thinking can be
| a little difficult for a day or two, but the net result is a
| better design.
|
| I love Little's law.
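|
| (For illustration, with made-up numbers rather than our real ones:
| Little's law says items in system = arrival rate x time in system,
| so at 1,000,000 transactions per second and an average of 1 ms
| spent in the system, about 1,000 transactions are in flight at any
| instant -- exactly the kind of bound that lets you size every
| buffer statically up front.)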
| konha wrote:
| > and thus your system must handle the possibility that an
| event fails to be processed and must be retried/dropped
|
| Isn't a retry just extending the queue to the caller?
| ggrrhh_ta wrote:
| That's the best scenario; regardless of whether the event
| reaches the destination or not, in most cases (except
| continuous monitoring of some parameter which admits losses)
| the source still will keep the information or the way to
| generate that information, so the amount of state that the
| queue at the source requires is still being used. But
| moreover, the source can throttle the generation of events,
| or it can just decide that that event is stale and can be
| dropped, or can pack that event with the next events, or can
| signal other processes about what is happening, or it can
| choose other path, or maybe it has some out-of-band signaling
| capability, or....
| dcow wrote:
| That's why I said "retry/drop". Handling back-pressure means
| you must propagate it backwards to a point in your program
| where there is enough context to enable decision to be made
| on how to handle the failure. Also, a retry can wait some
| amount of time until the system is less busy (and this can be
| enforced by some other part if needed). Further, you may have
| many callers and one central thing processing requests so
| propagating back-pressure to callers also allows you to push
| the problem back to a point where there are potentially more
| resources. In a silly example, if someone spamming your queue
| starts to experience a sluggish UI or to run out of RAM because
| back-pressure has forced the queue to grow on their machine
| and not your infrastructure, perhaps they'll get the message
| and cool it (ideally the program handles this gracefully too
| and tells the user a maximum has been hit, but even without
| that back-pressure is useful).
|
| So, yes. But that's the point.
| pornel wrote:
| Even worse, it multiplies the amount of requests hitting the
| overloaded processor. You can create a "metastable failure"
| situation that will prevent an overloaded service from ever
| recovering.
| toast0 wrote:
| It helps if your client retries have backoff (but that's
| not always possible), but you do need to have a way to drop
| requests much faster than you typically process them and
| it's nice to have a sensible way to drop requests that are
| unlikely to be successful. Sometimes you can drop based on
| time in queue, sometimes you can drop based on percentage,
| sometimes you can track backend success and drop requests
| to a failing backend, sometimes you have to just flush the
| whole queue periodically if overloaded. Facebook has
| published something about switching from a queue to a stack
| during heavy load, which can work, although that stack
| needs bounds, too.
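|
| Dropping on time-in-queue can be as small as this (a rough
| sketch, not any particular system's code; the 2-second budget
| is invented):
|
|     import collections
|     import time
|
|     MAX_WAIT = 2.0            # seconds; after this a reply is useless
|     q = collections.deque()   # holds (enqueue_time, request) pairs
|
|     def enqueue(request):
|         q.append((time.monotonic(), request))
|
|     def next_request():
|         while q:
|             enqueued_at, request = q.popleft()
|             if time.monotonic() - enqueued_at <= MAX_WAIT:
|                 return request
|             # else: stale; dropping it is far cheaper than serving it
|         return None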
| dahfizz wrote:
| > Even worse, it multiplies the amount of requests hitting
| the overloaded processor.
|
| How is that? The queue goes between the "client" and the
| "server". The client thread does the work of trying to
| enqueue something, while the server continues to churn
| through existing requests completely independently.
| sly010 wrote:
| "backpressure" is an unfortunate term because it's not
| something the queue does. "back-off" would probably be a
| better term, since it's the behavior of the producer. The
| producer has to be aware of it.
| dcow wrote:
| "backpressure" quite literally refers to a queue blocking
| when full. Exerting backpressure on a system is
| absolutely something a queue does. Backing off or
| "loadshedding" is something the producer does in response
| to backpressure from a queue.
| im3w1l wrote:
| Yes but that queue is bounded too. And if you follow the
| systems backward eventually you end up with a real person,
| who is then forced to prioritize away less important tasks.
| marcosdumay wrote:
| > Queues only serve to mitigate bursts.
|
| Well, they also serve to hold your state while you add
| processing capacity.
|
| If you look at supermarket queues, when they start to grow,
| management diverts people from non-time-sensitive tasks to
| the cashier lines. The queues keep things working during the
| change.
|
| Equivalent situations happen in software too.
| dcow wrote:
| It's two sides of the same coin. In your example management
| only adds more cashiers if the queues have become
| unacceptably long (they've tripped a soft limit). The queues
| are serving to accommodate irregular shopper purchasing
| patterns because shoppers don't arrive consistently
| throughout the day. So really they are serving to smooth over
| bursts. Adding more cashiers is simply a response to queues
| hitting an unacceptably long length. If management had
| infinite cashiers and you could add them instantly then you
| wouldn't need queues. But, and this is especially obvious if
| you consider the scenario where all registers are staffed,
| you still need queues to handle bursts. Costco is a good
| place to experience this.
| lanstin wrote:
| This philosophy, that all queues have explicit tuned limits,
| with a drop log and a metric incremented on queue full, was used
| thoroughly at AOL in the 90s. It worked very well, and let us
| use hardware at much higher loadings than is popular currently.
| Our goal was ~90% CPU at peak (systems I have worked with more
| recently start to die around fifty percent). Also of course
| there was a meeting each Friday to look at all queue depths and
| queue latencies and to see where we needed more capacity coming
| up. We did have so many subscribers that traffic was fairly
| smooth overall.
| lliamander wrote:
| This explains why every JIRA backlog I've seen always grows until
| it becomes unwieldy.
|
| I wonder if anyone here has experience with a fixed-size backlog
| for work tickets.
| dcow wrote:
| Shape Up [1] (from basecamp) rejects the notion of backlogs
| essentially for this reason. We've leaned into adopting it
| without using basecamp software specifically and it's pretty
| refreshing.
|
| [1]: https://basecamp.com/shapeup
| jrockway wrote:
| What do you do when the queue overflows? Block work on backlog
| capacity? Drop the new item on the floor?
|
| The problem is that people add things to the backlog during
| their ordinary work, so as you scale up the number of workers,
| the pressure on the queue increases.
|
| That said, my experience in doing bug triage is that some large
| percentage of tickets are never going to be done. Those should
| just be dropped on the floor.
| dcow wrote:
| Yeah you drop the old ones. And then the insight is that
| nobody ever has time to process the backlog anyway so just
| drop the new ones instead and delete the backlog. If the work
| is important, it will come back up as a new request during
| cycle planning.
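|
| A fixed-size backlog that drops the oldest items is nearly free
| to build; e.g. in Python (the size is arbitrary):
|
|     from collections import deque
|
|     backlog = deque(maxlen=100)     # bounded backlog
|     backlog.append("ticket-4711")   # appending to a full deque
|                                     # silently evicts the oldest entry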
| AtlasBarfed wrote:
| backpressure is the mechanism that solves this
|
| or less elegantly, circuit breaking / fail-retry
| __dt__ wrote:
| Interesting article, what bothers me a bit is
|
| > I won't go into the M/M/1 part, as it doesn't really end up
| affecting the Important Thing I'm telling you about. What does
| matter, however, is that '∞'.
|
| I'm really no expert but if I understand it correctly the `M/M`
| part defines the distributions of the arrival and service
| processes, so this definitely is important as others have already
| mentioned in the comments. E.g. a D/D/1 queue where D stands for
| deterministic arrival shouldn't suffer from this problem.
|
| This doesn't change the interesting fact the article presents but
| throwing in technical terms without proper explanation is imo bad
| style. Either don't mention it (would be better in this case) or
| explain it and why it is relevant.
|
| This is also a common "mistake" inexperienced people make when
| writing papers. They mention a fact that is somehow related but
| is completely irrelevant to the topic and the rest of the paper.
| I don't want to assume anything but to me this often smells like
| showing-off.
| bob1029 wrote:
| For an alternative take on this, read up on the LMAX Disruptor.
| Of particular note is section 2.5 "The problem of queues."
|
| https://lmax-exchange.github.io/disruptor/disruptor.html
|
| It has a completely opposite pattern: the more you load it the
| faster it goes.
| gpderetta wrote:
| > It has a completely opposite pattern: the more you load it
| the faster it goes.
|
| Not sure what you mean by that. The article is about the
| general theoretical properties of (unbounded) queues.
|
| LMAX is a bounded queue, with the quirk that it will drop older
| messages in favour of newer ones and it assumes that either
| there is a side channel for recovery or consumer(s) can
| tolerate message drops.
| bob1029 wrote:
| You're right, that was a rough take. My presentation of LMAX
| was simply to provide a different perspective on this
| space.
|
| What I was trying to get at is the specific memory access
| patterns, batching effects and other nuances related to the
| physical construction of the CPU which modulate the actual
| performance of these things. I think this quote better
| summarizes what I was trying to convey:
|
| > When consumers are waiting on an advancing cursor sequence
| in the ring buffer an interesting opportunity arises that is
| not possible with queues. If the consumer finds the ring
| buffer cursor has advanced a number of steps since it last
| checked it can process up to that sequence without getting
| involved in the concurrency mechanisms. This results in the
| lagging consumer quickly regaining pace with the producers
| when the producers burst ahead thus balancing the system.
| This type of batching increases throughput while reducing and
| smoothing latency at the same time. Based on our
| observations, this effect results in a close to constant time
| for latency regardless of load, up until the memory sub-
| system is saturated, and then the profile is linear following
| Little's Law. This is very different to the "J" curve effect
| on latency we have observed with queues as load increases.
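|
| The batching effect on its own is easy to imitate with an
| ordinary queue (a sketch of just the batching idea, not the
| Disruptor's lock-free ring buffer):
|
|     import queue
|
|     def drain(q, limit=1024):
|         # Take everything already waiting in one pass, so a lagging
|         # consumer catches up without paying the synchronization
|         # cost once per item.
|         batch = []
|         try:
|             while len(batch) < limit:
|                 batch.append(q.get_nowait())
|         except queue.Empty:
|             pass
|         return batch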
| mopierotti wrote:
| I'm no expert, but I don't think this is true. Couldn't tasks
| arrive into the queue at the same rate that they are processed,
| resulting in a fixed queue size at 100% utilization?
|
| Put another way, in the second "No good" diagram showing one task
| being worked on with none in the queue, another task could arrive
| before the current one is finished.
|
| I suspect the counterargument might be that the task arrival rate
| is not usually that predictable, but even so, I expect that the
| amount of variance predicts how true this theory is in practice.
| drivebycomment wrote:
| Classical misunderstanding of queueing. If you have perfect
| control over an arrival, you can consider it as already queued.
| I.e. bank customers or grocery store checkout customers or
| online http requests for a service all arrive at random
| intervals. You don't have control over their arrival timing.
| cormacrelf wrote:
| This is not helped by the language used in the post. This was
| supposed to be an introduction to queuing; people shouldn't
| come away with their classic misunderstandings completely
| intact.
|
| > When a system has steady-state 100% utilization, its queue
| _will grow to infinity_. (Emph mine.)
|
| This is clearly false; there are counterexamples
| where the queue does not grow that way. You don't have
| control over the arrival rate etc but it could just be your
| lucky day. It would be more accurate to say you can't
| guarantee that either the queue length or the wait time is
| finite. The author didn't include the words "assume the worst
| case" or anything like that. I feel like that's a pretty
| crucial piece of framing that's missing.
| marcosdumay wrote:
| The author didn't assume the worst case, he just didn't set
| the timeframe. Every queue with random arrival times that
| has processors operating at 100% of their capacity is
| guaranteed to eventually grow indefinitely (as in, you can
| name any size, it will become larger than it).
|
| But in practice you also cannot maintain a 100% workload
| indefinitely either.
| setr wrote:
| The steady state is like Big-O notation; it's already
| talking about the limit (as time grows to infinity). A
| single day's deviation doesn't really matter -- your queue
| might shrink and grow a little day-to-day, but on average
| it'll grow more than it shrinks, and you're ultimately
| fucked.
|
| If your utilization looked like 100% on one day, 0% on
| another, and flipping back and forth (allowing the queue to
| clear) then by definition your steady state isn't 100%, it's
| more like 50%.
|
| And of course if your time bound is limited, or your pool of
| potential customers is limited, then naturally everything
| is limited. But like big-O, the notion is to describe what
| happens as your input grows to infinity.
| remram wrote:
| It does not immediately grow towards infinity, but it is
| unbounded. Over a sufficiently long period of time, any given
| queue size will be observed.
|
| Assuming the input and output rates are just a little bit random
| and only equal _on average_, you will have periods during which
| the input and output balance out. You will have periods during
| which the output is faster, and the queue will shrink to 0 (but
| no lower). And you will have periods where the output is
| slower, and the queue grows, with no bound.
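|
| You can watch this numerically with a few lines of simulation
| (a rough sketch in the spirit of the article's Poisson-arrival,
| exponential-service setup; the utilization values are arbitrary):
|
|     import random
|
|     def avg_time_in_queue(utilization, jobs=200_000):
|         t = finish = total_wait = 0.0
|         for _ in range(jobs):
|             t += random.expovariate(utilization)      # Poisson arrivals
|             start = max(t, finish)                    # wait if busy
|             total_wait += start - t
|             finish = start + random.expovariate(1.0)  # exponential service
|         return total_wait / jobs
|
|     for rho in (0.5, 0.8, 0.95, 0.99):
|         print(rho, round(avg_time_in_queue(rho), 1))
|     # Waits blow up as rho approaches 1; at rho = 1.0 they never settle.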
| crdrost wrote:
| Yes, that is a very basic point, one basic reason to do queuing
| is that queues are buffers for uncertainty. If things are
| perfectly predictable, as you see on some robots, you allocate
| exactly what you need and no extra.
|
| The uncertainty exists in two places though... It's not just
| the events coming in but also the time to process each queued
| event, is another source of uncertainty in the same system that
| can have the same effect.
|
| The author mentions "M/M/1"; this says that there is only
| one queue-handling process, that events come in with the sort
| of randomness that raindrops have (no periodicity), and that
| handling an event does not have any "timeouts" nor any "long
| tails" where, once a handler has been occupied for 3 or 4 _T_
| (where _T_ is the average time to resolve), the prediction
| for how much time is remaining either drops to 0
| (timeout) or escalates dramatically (long tail)... The
| exponential distribution is used, kind of to be nicely in
| between these: it is a model where over a time dt the handler
| is asked to pick a random number with success rate p = dt/T; if
| they pick the right random number then the event is handled,
| otherwise a new random number is generated, repeat ad nauseam.
|
| The basic lessons are surprisingly transferable though. So for
| example traffic on the highway does not have this raindrop sort
| of randomness because cars kind of repel each other, they space
| out much more uniformly than you would have predicted on short
| term scales... The handling time is also nothing like this
| model of random guessing. Nevertheless, traffic jams still form
| at high utilization! And they form for a really interesting
| reason which is that the _n_ handlers--lanes, we call them--
| have correlations between them. All these queuing models
| optimistically assume that a handler never stops another
| handler, "hey have you ever seen anything like this? where can
| I find that password?" etc.
|
| But in traffic, a car can only lane change from one lane to
| another if both lanes synchronize speed. With a fixed ability
| to accelerate, this depends on the amount of space between you and
| the next car, and the amount of space in the lane you want to
| switch to: as utilization increases both of these spaces drop
| and synchronization becomes a massive source of inefficiency.
|
| Same happens at a company too, it is possible that an employee
| might spend more time waiting for meetings, and finding
| convenient times to schedule them, in order to synchronize two
| streams of work, than the work itself takes. Can be very
| frustrating!
| seandavidfisher wrote:
| The author addressed this in the comments.
|
| > What's to prevent the system from bouncing between "1 task in
| processor & 1 task in queue" and "1 task in processor & 2 tasks
| in queue" while maintaining 100% utilization?
|
| > Nothing! That could totally happen in a queueing system.
| However, the arrival process would need to be tuned quite
| precisely to the processing rate. You would need to watch the
| status of the processor and, when a task finishes, only then
| insert a new task into the queue. But this implies a shell
| game: if you always have a task ready to put into the queue
| when it needs to be padded back out, then isn't that task in a
| sense already "queued"?
|
| > Instead, in queueing theory we usually assume a random
| arrival process: sometimes a minute may pass between arrivals
| into the queue; sometimes only a second. So the system can't
| bounce for arbitrarily long between 1-in-queue and 2-in-queue
| states. Eventually, one of two things will randomly occur:
|
| > 1. From the 1-in-queue state, the active task finishes
| processing before a new task arrives, bringing queue size to 0.
|
| > 2. From the 2-in-queue state, a new task arrives in the queue
| before the active task finishes, causing the queue size to grow
| to 3.
| kazinator wrote:
| What the naysayers are missing is the definition of load; read it
| again: _" [C]apacity is the number of tasks per unit time that
| can be processed."_
|
| But tasks are variable. One task requires 5 minutes of
| processing, another one of 35. They arrive randomly. This is also
| explicitly given, in a caption under one of the diagrams. _"A
| queueing system with a single queue and a single processor. Tasks
| (yellow circles) arrive at random intervals and take different
| amounts of time to process."_
|
| People calling the article wrong may be thinking of capacity as
| "unit time amount of work"; like when the processor is at full
| capacity, it's doing one second's worth of work every second.
|
| If we define capacity this way, the problem goes away: the
| processor just becomes a leaky bucket. So that is to say, if we
| know exactly how long each task will take, then we can measure
| the queue size in terms of total number of seconds of work in the
| queue. And so then, as long as no more than one second's worth of
| work is being added to the queue per second, it will not grow
| without bound, just like a leaky bucket that is not being
| refilled faster than its leak rate.
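|
| In code, that "seconds of work" bookkeeping is just (a toy
| illustration, not from the article):
|
|     class LeakyBucket:
|         def __init__(self):
|             self.backlog_seconds = 0.0   # queued work, measured in time
|
|         def tick(self, arriving_work_seconds, elapsed=1.0):
|             self.backlog_seconds += arriving_work_seconds
|             # The processor drains exactly `elapsed` seconds of work.
|             self.backlog_seconds = max(0.0,
|                                        self.backlog_seconds - elapsed)
|             return self.backlog_seconds
|
| As long as each tick adds no more than one second of work per
| second drained, backlog_seconds never grows.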
|
| When capacity is given as a maximum _number_ of tasks per second,
| there has to be some underlying justification for that, like
| there is some fixed part to servicing a job such as set up time
| and clean up time that doesn't go away even if the job takes
| next to zero seconds, such that jobs effectively have a built-in
| minimum duration. If it takes one second to set up a job, and one
| second to clean up, then the maximum capacity is half a job per
| second: 1800 jobs per hour and so on. Of course the queue starts
| to backlog when you approach capacity, because the jobs also
| require nonzero real work in relation to the administrative time.
|
| If jobs have no minimum fixed cost attached, then the maximum job
| rate is unbounded: the shorter the jobs being queued, the more of
| them can be done per unit time: one million one-microsecond jobs
| can be done in a second, or a billion one-nanosecond jobs, and so
| on.
| FourthProtocol wrote:
| Messaging middleware such as queues is mostly redundant.
| Simplistically -
|
| Client --
|
| If you cache a new submission (message) locally before
| submitting, you just keep re-submitting until the server returns
| an ACK for that submission, at which point your local cache is
| cleared.
|
| Scale clients out as needed.
|
| Server --
|
| If a message is received in good order, return an ACK.
|
| If a message is a duplicate, discard the message and return an
| ACK.
|
| If a message can only be received once, discard if it already
| exists on the server, and return an ACK.
|
| If a message is invalid, return a FAIL.
|
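| As a sketch of the client side (hypothetical names; backoff and
| durable storage of the local cache are left out):
|
|     import time
|     import uuid
|
|     def submit(send, payload, retry_delay=1.0):
|         message = {"id": str(uuid.uuid4()), "payload": payload}
|         while True:                   # the local cache is this message
|             reply = send(message)     # send() is whatever transport
|             if reply == "ACK":
|                 return                # server has it (or already had it)
|             if reply == "FAIL":
|                 raise ValueError("server rejected message as invalid")
|             time.sleep(retry_delay)   # no reply or timeout: resubmit
|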
| Scale hardware out or up, as needed (ref. "capacity", item 2 in
| the linked blog post above). Scale queues out by adding service
| endpoints on the server. Async/await makes the client experience
| painless. You save your employer $$ because no additional
| procurement contracts, no licensing fees, no additional server-
| side infrastructure to run queues, and no consultancy fees to set
| up and operate Rabbit MQ/Tibco/IBM MQ/Amazon SQS/Azure Queue
| Storage or whatever other queueing solution the org uses.
|
| Message passing includes concepts such as durability, security
| policies, message filtering, delivery policies, routing policies,
| batching, and so on. The above can support all of that and, if
| your scenario calls for it, none of it.
|
| The financial argument reduces to dev time vs. procurement,
| deployment and operational costs of whatever 3rd party product is
| used, as well as integration, of course.
|
| * I note and welcome the downvotes. However I'd love it more if
| you'd present a coherent counter argument with your vote.
| remram wrote:
| > If you cache a new submission (message) locally before
| submitting, you just keep re-submitting until the server
| returns an ACK for that submission; and your local cache is
| empty.
|
| This has terrible resource use and offers no visibility into
| how many clients are waiting. And yet it's still a queue. Why
| would anyone do that?
|
| The rest of your post I can't parse at all.
| detritus wrote:
| The only thing to understand - and accept - is that you will
| always pick the wrong queue. So will everyone else. Always.
|
| .
|
| I appreciate that this truism has nothing to do with the article.
| prometheus76 wrote:
| To quantify your viewpoint: if there are five checkout lines
| at the grocery store, your odds of picking the fastest line are
| 1/5 (20%). The more grocery store lines there are, the lower
| the odds are of you picking the fastest line.
|
| They added "express lanes" back in the 80s to somewhat address
| this, but then you have the person who ignores the sign and
| gets in the line with a full cart.
| Scene_Cast2 wrote:
| What are the underlying assumptions we can break here?
|
| For example, what if tasks were not monolithic? As the queue size
| grows, increase the caching timeout and don't hit that DB, or
| decrease timeout, or something like that.
|
| Or, what if the input task variance was bounded and we just
| initialized the queue with 10 tasks? This way, the addition of
| tasks would never dip below 1 and would never exceed 20 (for
| example).
| jusssi wrote:
| Some more:
|
| Processing capacity might not be constant. If you're in a
| cloud, maybe you can launch more processing power as queue
| length increases.
|
| Multiple items in the queue might be satisfiable by the same
| processing, e.g. multiple requests for the same item. In that
| case, having more requests in queue can increase processing
| efficiency.
|
| Edit: another one. All requests may not need the same quality
| of service. For some, best effort might be acceptable.
| GordonS wrote:
| I wonder if it might also be possible to introduce a kind of
| tiered storage, where, for example, the queue starts persisting
| to cheap, massive block storage (such as cloud block storage)
| instead of its usual mechanism. That does imply tier 2 events
| would _read_ more slowly once the busy period ends, though.
| kingdomcome50 wrote:
| And what if items are processed at exactly the same rate they are
| added? Something is missing from this thesis. There must be an
| additional assumption at play beyond what is stated in the
| article.
| estro0182 wrote:
| Theoretically I think that's possible. In a practical setting
| it is not, since task arrival times follow a probability
| distribution; this means that utilization is not 100% and
| therefore queues do not grow to infinity. Given the author's
| point about ~80% and above being bad, I imagine the immediate
| processing scenario to still be pathological.
|
| Edit: https://news.ycombinator.com/item?id=30601286
| prometheus76 wrote:
| It's not just the arrival rate that is a Poisson distribution,
| but it's also the processing time being a Poisson distribution
| that really sets the table for the scenario the article is
| addressing. For me, it's easier to understand in a grocery
| store situation, where customers arrive in a Poisson
| distribution at the checkout line, and each cart has a
| Poisson distribution on how long it takes to process the items
| in the cart. That second level of variance is key to the queue
| buildup, especially as the grocery checker reaches 100%
| utilization.
| btilly wrote:
| The processing time is exponential, not Poisson.
|
| What all of this actually boils down to, though, is that the
| odds of your next event being "a job completes" versus "a new
| job arrives" are exactly even.
| seandavidfisher wrote:
| The assumption you're looking for is there but relatively
| inconspicuous. He mentioned an M/M/1/∞ queue but then didn't
| go into the details about the M/M/1 part. From the Wikipedia
| page[0]:
|
| > an M/M/1 queue represents the queue length in a system having
| a single server, where arrivals are determined by a Poisson
| process...
|
| So the tasks are not arriving at any fixed rate, but
| according to a Poisson process. In the article he did
| reference this fact right before the first plot:
|
| > (For the wonks: I used a Poisson arrival process and
| exponentially distributed processing times)
|
| And then finally, he answered this question more directly in
| the comments to the article:
|
| > What's to prevent the system from bouncing between "1 task in
| processor & 1 task in queue" and "1 task in processor & 2 tasks
| in queue" while maintaining 100% utilization?
|
| > Nothing! That could totally happen in a queueing system.
| However, the arrival process would need to be tuned quite
| precisely to the processing rate. You would need to watch the
| status of the processor and, when a task finishes, only then
| insert a new task into the queue. But this implies a shell
| game: if you always have a task ready to put into the queue
| when it needs to be padded back out, then isn't that task in a
| sense already "queued"?
|
| > Instead, in queueing theory we usually assume a random
| arrival process: sometimes a minute may pass between arrivals
| into the queue; sometimes only a second. So the system can't
| bounce for arbitrarily long between 1-in-queue and 2-in-queue
| states. Eventually, one of two things will randomly occur:
|
| > 1. From the 1-in-queue state, the active task finishes
| processing before a new task arrives, bringing queue size to 0.
|
| > 2. From the 2-in-queue state, a new task arrives in the queue
| before the active task finishes, causing the queue size to grow
| to 3.
|
| [0]: https://en.wikipedia.org/wiki/M/M/1_queue
| kingdomcome50 wrote:
| Of course I know what was missing. My comment was more about
| making the implicit more explicit. I am not the only one who
| immediately thought of the above as a counter-example to the
| bold text (repeated twice) in the article.
| civilized wrote:
| > If the processor is always busy, that means there's never even
| an instant when a newly arriving task can be assigned immediately
| to the processor. That means that, whenever a task finishes
| processing, there must always be a task waiting in the queue;
| otherwise the processor would have an idle moment. And by
| induction, we can show that a system that's forever at 100%
| utilization will exceed any finite queue size you care to pick:
|
| This argument is incorrect because it proves too much. If tasks
| arrive predictably, say at the rate of one per minute, and the
| processor completes tasks at the same rate, you can have maximum
| throughput and a finite queue -- indeed, the queue never has to
| exceed two waiting tasks. The unbounded queue growth problem only
| arises when tasks arrive unpredictably. Since this argument
| proves the same conclusion without regard to predictability, it's
| incorrect.
|
| Here's how I think it actually works: if tasks randomly come in
| and out at the same rate, the size of the queue over time can be
| modeled as an unbiased random walk with a floor at zero.
| Elementary techniques prove that such a random walk will hit zero
| in finite time, and will therefore hit zero infinitely many
| times. These correspond to the idle times of the processor. The
| only way to avoid hitting zero over and over is if tasks come in
| faster than they come out, which leads to tasks piling up over
| time.
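|
| The random-walk picture is also easy to check numerically (a
| quick sketch, not a proof):
|
|     import random
|
|     def idle_visits(steps=1_000_000):
|         queue_len, visits = 0, 0
|         for _ in range(steps):
|             # Arrival and completion equally likely: an unbiased
|             # walk with a floor at zero.
|             queue_len = max(0, queue_len + random.choice((1, -1)))
|             if queue_len == 0:
|                 visits += 1          # the processor is idle here
|         return visits
|
|     print(idle_visits())   # the walk keeps coming back to zero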
| assbuttbuttass wrote:
| Thanks, I was confused by the article's induction proof which
| seems to completely ignore new tasks coming in. The random walk
| argument is very clear.
| [deleted]
| KaiserPro wrote:
| I think you are putting too many fancy words into the problem.
|
| Queues are balloons. If you put more air in than you are taking
| out, they grow until they pop. As utilisation grows (ie as the
| input volume of air starts to match the output) it takes much
| longer to get rid of the backlog.
|
| This means that any traffic anomalies that would otherwise be
| hidden by the queue are instantly amplified.
| civilized wrote:
| If all you care about is engineering intuition, you can use
| any number of metaphors. I'm talking about what it takes to
| understand queues mathematically and prove correct results
| about them, rather than incorrect results.
| KaiserPro wrote:
| This is a good article.
|
| Ted Dziuba's Monitoring Theory covers unbounded queues pretty well:
| http://widgetsandshit.com/teddziuba/2011/03/monitoring-theor...
| maayank wrote:
| Any good book/exploratory paper for queuing theory?
| Jtsummers wrote:
| https://web2.uwindsor.ca/math/hlynka/qonline.html
|
| A large list of books (free and online), I cannot speak to the
| quality of any particular book but you can take some of the
| titles and authors and perhaps find reviews.
|
| http://web2.uwindsor.ca/math/hlynka/qbook.html
|
| Linked at the top of the first page I shared, but not available
| online (though a few are available through some services like
| O'Reilly's digital library).
| fpoling wrote:
| A lot of frameworks have queues bounded only by the available
| memory. As the article nicely demonstrates, that only works with
| reasonable bounds on latency if the CPU is idle at least 20% of
| the time. With various GUI event queues that is OK, but it
| puzzled me why, for example, Erlang opted to have unbounded
| message queues. Did the Erlang designers not care about
| latency?
| yetihehe wrote:
| In erlang it is standard to have 5 seconds timeout on queries.
| So, when many processes start to send to one process and it
| gets overwhelmed, other processes start having timeouts and
| typically they have some backoff strategy. Essentially queues
| are not bounded by size, but by time. Most processes don't send
| messages in a fire-and-forget manner, but wait for confirmation
| of execution.
| toast0 wrote:
| Erlang designers certainly cared about latency, but a bounded
| message queue changes local messaging from reliable to
| unreliable and remote messaging mostly reliable (as long as the
| nodes don't disconnect) to unreliable. That's a big change and
| they were unwilling to do it for a long time, although I think
| there is an option to put bounds on the message queue now. It
| was possible to have bounds before --- either with brute force
| looking at the queue length for all processes and killing them,
| or looking at the queue length within the process and
| discarding requests if there are too many.
|
| Also, for a long time, sending messages to a process with a
| long queue might be deprioritized (suspended), as a form of
| backpressure, although that got removed for local processes
| because it wasn't effective with SMP, if I'm remembering
| properly.
| actionfromafar wrote:
| Maybe the Erlang way is more letting the entire thread crash
| and "bound" it with a restart?
| motohagiography wrote:
| If you are interested in modelling microservices, service busses,
| and pubsub topologies, you can do some rough models of queues in
| python with this: queueing-tool.readthedocs.io/
| cjg wrote:
| I just finished implementing a simple backpressure algorithm.
| Once the queue is more than a certain percentage full, it rejects
| new items. These rejections get turned into HTTP 503s.
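|
| Roughly this shape, for anyone curious (a sketch, not the actual
| code; the numbers are placeholders):
|
|     import queue
|
|     REQUESTS = queue.Queue(maxsize=1000)
|     REJECT_ABOVE = 0.9             # start shedding at 90% full
|
|     def accept(request):
|         if REQUESTS.qsize() >= REJECT_ABOVE * REQUESTS.maxsize:
|             return 503, "try again later"   # shed before we're wedged
|         try:
|             REQUESTS.put_nowait(request)
|         except queue.Full:
|             return 503, "try again later"
|         return 202, "queued"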
| SketchySeaBeast wrote:
| Haven't you just moved the pressure back a step? Now you have a
| bunch of items retrying to get their stuff on the queue and a
| bunch of places to try and locate the problem, instead of an
| overly large queue. Is the idea that the messages aren't
| important enough to be placed on the queue in the first place
| if demand is high?
| cjg wrote:
| The idea is that it's not a sudden hard failure. Instead only
| a few things fail which helps in two ways: reducing the load
| by discarding some work and alerting that the system is
| beginning to be overloaded.
| usrbin wrote:
| What is the purpose of cutting off the queue at a n% full?
| Isn't that effectively saying that the queue size is actually
| n% of the stated size, or am I missing detail?
| morelisp wrote:
| Wait until you hear about hash table capacities.
| bombcar wrote:
| Perhaps "rejects _new_ items " is the key - a web server that
| hits a certain percentage utilization only allow _existing_
| clients /connections/logged in users/whatever to connect;
| refuse _new_ connections.
| cjg wrote:
| It's probabilistic.
| encoderer wrote:
| This is unworkable if you are actively scaling your system. Am I
| supposed to calculate the ideal queue size with each scale-out of my
| data platform?
|
| Instead, the right way to think about limiting queue size is load
| shedding when you feel back pressure.
|
| Here's an example at Cronitor: if our main sqs ingestion queue
| backs up, our pipeline will automatically move from streaming to
| micro-batching, drastically reducing the number of messages on
| the queue at the expense of slightly increased latency per
| message. At the same time, a less critical piece of our
| infrastructure pauses itself until the queue is healthy again,
| shedding one of the largest sources of db load and giving that
| capacity to ingestion.
|
| To me the goal is to feel and respond to back pressure before
| blowing up and rejecting messages.
| jph wrote:
| Queuing theory in more detail:
| https://github.com/joelparkerhenderson/queueing-theory
|
| The article by Dan is referenced there too.
| natmaka wrote:
| If some logic re-orders a job-related queue in order to group
| similar things, improving cache hit ratios, then a fairly large
| queue can be quite effective (especially if there is no priority
| or ordering barrier; the cost of that high throughput, however,
| is a high maximum latency).
| taneq wrote:
| Exactly. In fact I'd go so far as to say this is an equally
| Important Thing to the Important Thing in the article. If your
| queue is just a simple FIFO / LIFO / whatever to buffer surges
| in demand until you get to them, then you absolutely get the
| "queue suddenly goes to infinity as you hit a critical point
| near maximum throughput" thing happening.
|
| If, however, you use the queue as a way to process queries more
| efficiently then that's a different story.
___________________________________________________________________
(page generated 2022-03-08 23:01 UTC)