[HN Gopher] Queues should be empty
___________________________________________________________________
Queues should be empty
Author : redman25
Score : 103 points
Date : 2023-02-18 04:42 UTC (1 day ago)
(HTM) web link (joshvoigts.com)
(TXT) w3m dump (joshvoigts.com)
| 8note wrote:
| An always empty queue means you could run into problems when
| traffic increases.
|
| I've seen this come up recently, where our queue reader was (well,
| still is) overscaled, to where it could handle 100x the normal
| queue throughput. When the input rate gets higher, though, instead
| of the queue filling a bit and controlling the traffic, we hammer
| our dependencies.
|
| I now see a continuously empty queue as a smell. It should be
| filling and emptying over and over again. Not all the time, but a
| queue that never fills is a ticking time bomb.
| advisedwang wrote:
| What you're describing (empty queue = unprepared for load) is a
| code smell, but the solution doesn't have to be "fill the queue
| sometimes". It could also be:
|
| * (Mandatory) Have a back-pressure system (sketch below).
|
| * (Optional, depends on system requirements) have auto-scaling
| of queue readers.
|
| * (Mandatory) Make sure the system is regularly load tested.
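|
| For the back-pressure point, a minimal sketch with a bounded in-
| process queue (illustrative only; a real system would push this
| back to the broker or client):
|
|     import queue
|
|     inbox = queue.Queue(maxsize=1000)     # the bound is the back-pressure point
|
|     def submit(task) -> bool:
|         try:
|             inbox.put(task, timeout=0.1)  # block briefly under load...
|             return True
|         except queue.Full:
|             return False                  # ...then reject so callers slow down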
| philliphaydon wrote:
| I think the title is meant to be:
|
| "Error queues should be empty"
| nkozyra wrote:
| The _goal_ of a queue is to be empty, the _purpose_ of a queue is
| to occasionally not be.
| robotresearcher wrote:
| Unless the goal of your queue is to avoid starving consumers.
| Then it should always be non-empty.
| ysavir wrote:
| The author has a good sentiment, but the wrong expression. It's
| not that "queues should be empty", it's that "you should be
| emptying your queues". Whether they reach an empty state is
| irrelevant; it's a Sisyphean task that should always be working
| _towards_ empty and, ideally, never actually reaching it.
|
| A good comparison is a restaurant's order queue.
|
| If the order queue is always empty, that means no one is ordering
| food at the restaurant. That is an undesirable state. We _want_
| people to be ordering food, an empty queue shows that we have a
| problem in getting people to eat at our restaurant.
|
| We don't want our queue to be always growing. If it's always
| growing, that means we're understaffed, and people waiting for
| their meals will get frustrated. Some will wait and leave a bad
| review, others might leave before they ever get their meal--and
| won't pay for it, either. Lost time and effort on our part.
|
| But an order queue that almost always has two meals in the making
| is golden. It means we're getting consistent clientele to dine at
| our restaurant, that meals are being cooked and served in good
| time, diners are happy with the experience, and our system
| operations are healthy.
|
| I think the author of TFA understands this, but stopped a little
| short of the optimal expression of their ideas. It's not about
| what you have, it's about what you should always be working
| towards.
| colechristensen wrote:
| You've got it all wrong. If you're running a restaurant and
| orders never back up then you have too much staff, too big a
| kitchen, and are losing money by having too much capacity. You
| want to have a lunch rush, for some people to leave because
| you're too busy. When the queue is backed up is when you're the
| most profitable both per hour and per unit sold. Then you want
| to find efficiencies to improve throughput at the times you're
| backed up.
|
| You need to intentionally design your bottlenecks not pretend
| that you can avoid them completely.
| jojobas wrote:
| Is this an attempt at reinventing queue theory?
|
| _In the US it was called queue theory, in the USSR it was known
| as mass service theory.
|
| The thing is, there were no queues in the US and no mass service
| in the USSR._
| tweaktastic wrote:
| I agree with the author on the point that queues are best for
| handling spikes. But when you are talking about scale, dealing
| with millions of messages in a short span of time for an extended
| period, queues can never remain empty. Empty queues in this
| situation would mean that your concurrency is high. When your
| concurrency is high, this results in too many concurrent
| connections to your database, which would end up creating
| bottlenecks for your system.
|
| The system should be designed to handle as many messages as
| possible while ensuring that your resources don't get
| overburdened. Empty queues where your messages keep getting
| rejected because your resources are being hammered are not
| necessarily the sign of an efficient system.
| roflyear wrote:
| > When your concurrency is high, this results in too many
| concurrent connections to your database, which would end up
| creating bottlenecks for your system.
|
| Can you expand on this?
| dmoy wrote:
| If you have a polling queue, as the number of workers grows
| too large, the overhead of maintaining connections from
| (worker) -> (stuff in the middle) -> (db or whatever keeps
| queue state) becomes too high. In the extreme case, you reach a
| point of diminishing returns, where adding more workers hurts
| both throughput and latency.
|
| That is - initially you can decrease latency by sacrificing
| utilization and just adding more workers that usually sit
| idle. This increases throughput during heavy load, and
| decreases latency. Until you can't, because the overhead from
| all the extra workers causes a slowdown up the stack (either
| the controller that's handling messages, or the db itself
| that's got the queue state, or whatever is up there). Then, as
| you add more overhead, you increase latency and decrease
| throughput because of the extra time wasted on that overhead.
|
| It depends on a lot of different variables.
|
| If you have work items that vary significantly in cost, it
| can add even more problems (i.e. if your queue items can vary
| 3+ orders of magnitude in processing cost).
| roflyear wrote:
| Well yes but you can solve that in a lot of ways.
| highspeedbus wrote:
| >Decoupling components
|
| >This way, the producer doesn't need to know anything about the
| consumer and vice versa.
|
| Both components need to agree on shared data, down to what each
| field actually means. The producer is just not saying what to do
| with the message.
|
| Take an example of processing a purchase order in multiple,
| asynchronous steps. While the webpage button spins, there's a
| whole dance of agreed messages and expected outcomes that needs
| to happen in order for it to work. And there is no decoupling in
| sight.
|
| My take is that "decoupling" is a vague word. One should design
| distributed systems assuming that coupling is manageable, not
| nonexistent.
|
| >you could submit the task and ACK immediately allowing the job
| to finish in the background.
|
| That's bad practice.
| ruduhudi wrote:
| Throttling is another valid use case. I've used a pub/sub for
| sending newsletter e-mails because we couldn't send too many of
| them at once, so the consumer would run at a fixed interval and
| always pick up a fixed number of messages.
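|
| A rough sketch of that pattern (hypothetical names, assuming a
| simple in-process queue rather than a real pub/sub client):
|
|     import queue, time
|
|     def drain(q, send_email, batch_size=50, interval=60.0):
|         """Every `interval` seconds, send at most `batch_size` messages."""
|         while True:
|             batch = []
|             while len(batch) < batch_size:
|                 try:
|                     batch.append(q.get_nowait())
|                 except queue.Empty:
|                     break
|             for msg in batch:
|                 send_email(msg)
|             time.sleep(interval)   # throttle regardless of backlog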
| smadge wrote:
| The most important reason for queues is the variability of job
| inter-arrival times and service times. If there were no
| variability, there would be no need for queueing.
| c3534l wrote:
| This is a somewhat uninsightful observation that only has people
| responding to it because of pedantic arguing. I don't think
| anyone even disagrees with the author, they're just arguing about
| words.
| ww520 wrote:
| Queues are great as a buffer to even out compute utilization.
| With an empty queue, most workers can be shut down after a while,
| saving on compute cost. When the queue grows and stays non-empty,
| more workers can be started up. The length of the queue can be
| used to estimate the number of workers to spin up, as a metric
| for real time capacity planning.
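|
| A toy version of that estimate (numbers and names are made up):
|
|     import math
|
|     def desired_workers(queue_len, per_worker_rate,
|                         target_drain_secs=60, min_w=1, max_w=50):
|         """Workers needed to drain the backlog within the target window."""
|         need = math.ceil(queue_len / (per_worker_rate * target_drain_secs))
|         return max(min_w, min(max_w, need))
|
|     # e.g. 1200 queued items, each worker handles 2 items/sec:
|     print(desired_workers(1200, 2))   # -> 10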
| Akronymus wrote:
| Queues also enable "easy" batching of jobs. (For a very limited
| set of problems)
| edejong wrote:
| Queues should have an ideal size based on your latency
| requirements.
|
| Say you have 100 requests coming in per second, and each consumer
| of your queue requires 0.3 seconds per request. Now you can
| optimise the number of consumers. If you chose 40 consumers, the
| probability of serving a request with an empty queue (within 0.3
| seconds) would be 17% (and > 0.3 seconds would be 83%). However,
| the total waiting time per request would be 0.1 second.
|
| With 80 consumers, the probability of an empty queue would be
| higher at 58%. However, most of those consumers are doing
| nothing, while your waiting time is only reduced by 0.1 second!
| Only a 30% improvement.
|
| (edit: probability of empty queue in second example is higher,
| not lower)
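|
| For anyone who wants to poke at numbers like these, a rough M/M/c
| simulation (my own sketch: Poisson arrivals, exponential service
| times, FIFO dispatch; not a verified reproduction of the figures
| above):
|
|     import heapq, random
|
|     def simulate(arrival_rate, mean_service, n_workers,
|                  n_jobs=200_000, seed=1):
|         rng = random.Random(seed)
|         free_at = [0.0] * n_workers       # when each consumer frees up
|         heapq.heapify(free_at)
|         t, total_wait, immediate = 0.0, 0.0, 0
|         for _ in range(n_jobs):
|             t += rng.expovariate(arrival_rate)   # Poisson arrivals
|             soonest = heapq.heappop(free_at)     # FIFO service
|             start = max(t, soonest)
|             total_wait += start - t
|             immediate += start == t              # served with no wait
|             heapq.heappush(free_at,
|                            start + rng.expovariate(1.0 / mean_service))
|         return total_wait / n_jobs, immediate / n_jobs
|
|     for c in (40, 80):
|         wait, no_wait = simulate(100, 0.3, c)
|         print(f"{c} consumers: avg wait {wait:.3f}s, "
|               f"no-wait fraction {no_wait:.0%}")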
| anonymoushn wrote:
| Can you state the rest of your assumptions? What is the
| distribution of the request arrival times?
| edejong wrote:
| Poisson. And 1.0 variance in request processing time.
| ThrustVectoring wrote:
| I'm not awake/invested enough to make up numbers and do the
| math here, but shouldn't the cost for provisioning workers and
| value provided at various latencies matter? Like, each worker
| you provision costs a certain amount and shifts the latency
| distribution towards zero by a certain amount, and in principle
| there should be _some_ function that converts each distribution
| to a dollar value.
|
| Like, suppose you had a specific service level agreement that
| requires X% of requests to be served within Y seconds over some
| time interval, and breaking the SLA costs Z dollars. Each
| provisioning level would generate some distribution of
| latencies, which can get turned into likelihood of meeting the
| SLA, and from there you can put a dollar value on it. And
| crucially, this allows the "ideal" amount of provisioning to
| vary based off the relative cost of over and under-
| provisioning; if workers are cheap and breaking the SLA is
| costly, you would have more workers than if the SLA is
| relatively unimportant and workers are expensive.
| edejong wrote:
| Yes, that's an interesting topic, especially given the
| prevalence of distributed compute nowadays and the rising
| awareness of cloud costs.
|
| In the end, the distribution is not really Poisson, of
| course. So, you might be interested in low pass filtering to
| elastically scale your provisioned workers. There is quite
| some theory about this, including sophisticated machine
| learned models to predict future load. But I digress.
| [deleted]
| unxdfa wrote:
| I think this could be better described as "queues should be
| heading in the direction of empty"
| remram wrote:
| If you look at a queue at any random time, it will necessarily
| have been added to at least as much as it has been drained over
| its lifetime; otherwise it would hold a negative number of items.
| unxdfa wrote:
| That's why you look at the average age of the items in the
| queue.
| remram wrote:
| That's a good measure (average latency) but that's very
| different from your earlier statement of "heading in the
| direction of empty".
| unxdfa wrote:
| Well not really. If the average age is increasing it's
| not heading in the direction of empty.
| remram wrote:
| You still haven't defined what "heading in the direction
| of empty" means. Like I said, your queue can't be
| decreasing in size over its history, and it cannot be
| decreasing in size most of the time either, if it started
| empty.
| kbrannigan wrote:
| I just realized how useful queues can be. Instead of letting the
| system fail under heavy load, you put the tasks in a holding
| place, and they will be processed later.
|
| Computers have spoiled us with instant and now.... But even a 1
| hour processing time is amazing if that task would take 3 days
| manually.
| AstixAndBelix wrote:
| I had a course about queue theory in my SWE degree. I left the
| course with a deep sense of dread because I knew most people
| studying CS and SWE in other universities would be oblivious to
| such a fundamental piece of knowledge.
|
| Basically everything that processes data has a queue attached
| to it, from your router to your scheduler. Not knowing anything
| about queue theory is such an insult to the profession of SWE.
| It's like being a Mech engineer and not knowing about control
| of dynamic systems.
| OJFord wrote:
| Why did you think you knew that? I'm also one of your supposed
| few to have studied it; I never thought that.
|
| The dread should have been that most companies/teams won't
| care, won't engineer on that level, will just set it up
| however and make it bigger/faster/more-consumed if it's too
| slow.
| setr wrote:
| In my uni, the class was more or less unadvertised and the
| professor largely ignored -- you had to go out of your way
| to take it (I was actively looking for poorly rated
| professors, because I figured they were usually rated such
| because they graded hard... which probably meant they cared
| more). When I tried to google the subject/code examples
| afterwards, I largely only found textbooks and one-off blog
| posts.
|
| Also a lot of GitHub repos with names like CS550...
|
| Only place I've seen it referenced since that class was:
|
| 1. HN periodically
|
| 2. A YouTube documentary about Disneyland fast pass queues
|
| 3. A business optimization consultancy
|
| Ultimately, a pretty niche subject area AFAICT. But
| implementing a fairly efficient queuing simulation seems to
| be pretty easy, and the subject seems to be stupid powerful
| and broadly applicable... so I've never understood why it's
| not more discussed
| OJFord wrote:
| Compared to what, as if say information theory comes up
| (explicitly) all the time? Seems like you could say that
| about any more theoretically oriented course.
| roflyear wrote:
| Except most times people don't implement the surrounding infra
| correctly.
|
| Let's say you consume a job and do something, but that something
| fails. You have to set it to be rescheduled somewhere. Let's say
| the thing writing jobs to the queue fails. That needs to be
| handled as well.
|
| "They will be processed later [if I use a queue]" is not true
| by default (without work and thought) and assuming so will get
| you in trouble, and will also make certain people on your team
| who do understand that isn't the case upset.
| yurishimo wrote:
| There are entire languages built upon this model! Anything
| using the BEAM is a great example. Erlang, Elixir, Gleam, etc.
|
| The runtime acts as a scheduler, and if a specific process is
| hogging too much time on the CPU, it will get paused and sent to
| the back of the run queue. This is all transparent (relatively)
| for the developer.
|
| Here's a video if you want to see an example:
| https://youtu.be/JvBT4XBdoUE
| denton-scratch wrote:
| The ideal queue length is one. That will minimize latency, and no
| poll will result in a nothingburger.
| roflyear wrote:
| But this is not true. "Always having something to process" is
| not the ideal length. It will cause major problems, though it
| is somewhat counterintuitive. See:
| https://blog.danslimmon.com/2016/08/26/the-most-important-th...
| denton-scratch wrote:
| I didn't say the ideal was "always having something to
| process"; I said the ideal was always having exactly one
| thing to process.
|
| Come on, it's an ideal; it doesn't have to be realistic.
|
| [Edit] Just read Slimmon's article. His argument is
| specifically about a queue served by a server that is under
| stress. His assertion is:
|
| "As you approach maximum throughput, average queue size - and
| therefore average wait time - approaches infinity."
|
| He's quite right (and yes, it's a bit counter-intuitive). But
| it's orthogonal to the matter of the ideal queue length.
|
| Thanks for the link.
| roflyear wrote:
| Right, but in reality you can't hit "always having exactly one
| thing to process" without maxing out your queue. It's impossible,
| because they are the same thing. "Always having something to
| process" means no idle time - which means your queue is forever
| long :) Aiming for that ideal is misguided (for lack of a better
| term).
| MaxGabriel wrote:
| Another benefit: queues can enforce ordering and limit
| concurrency.
|
| This can be helpful for avoiding resource contention, or hitting
| an API limit.
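|
| A minimal sketch of both properties with a worker pool (an
| assumed limit of 4 concurrent calls; items are dispatched in
| FIFO order):
|
|     import queue, threading
|
|     def worker(q, handle):
|         while True:
|             item = q.get()        # blocks; items leave in FIFO order
|             try:
|                 handle(item)      # at most N of these run at once
|             finally:
|                 q.task_done()
|
|     q = queue.Queue()
|     N = 4                         # concurrency cap, e.g. an API limit
|     for _ in range(N):
|         threading.Thread(target=worker, args=(q, print),
|                          daemon=True).start()
|     for i in range(10):
|         q.put(i)
|     q.join()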
| remram wrote:
| Queue can also allow batching. It might be that you need to
| consume or pack or ship N items at a time, otherwise you're
| losing money. Trying to aim for minimal latency would literally
| hurt your business.
|
| This article obviously has a very specific kind of
| workload/service/target in mind that it fails to define, and it
| overgeneralizes. This is bad advice in many situations that the
| author failed to imagine.
| edejong wrote:
| Queues should absolutely _not_ be empty. If they are often empty,
| you're overprovisioning your consumers. There is an ideal size
| to a queue, based on the balance equation
| (https://en.wikipedia.org/wiki/Balance_equation). Also study
| queueing theory if you want to know more:
| https://en.wikipedia.org/wiki/Queueing_theory.
| highspeedbus wrote:
| This only works for _very_ predictable workloads. Once your
| average input rate becomes even slightly larger than your
| consumption rate, your system is screwed. Overprovisioning
| ensures that spikes can be processed on top of the average rate,
| and that a generally increased load can be absorbed for some
| time.
| edejong wrote:
| True, my example is an oversimplification. In reality you
| want to account for rapidly and slowly changing demands. This
| can be solved with elasticity of systems or by
| overprovisioning. You can get deep, with time series
| analysis, low pass filtering and various metrics, such as
| quantile distributions... however, saying zero queue length
| is desirable is a gross oversimplification.
| colanderman wrote:
| Or use load shedding, in which case you want your queue
| nonempty to take advantage of negative spikes.
| clnq wrote:
| They should be empty if you wish to minimize latency. For
| example, when emergency room queues are full in hospitals, that
| is a very undesirable scenario. The same can be applied in
| software - there are parts of systems that should only use
| queues to handle spike workloads. And the longer these queues
| are on average, the worse the overall performance of the system
| is. Just like patient health outcomes drop with longer ER
| queues.
|
| And really, we need more context to say whether a queue should
| be empty, trend towards being empty, or towards always having
| tasks. I don't think it's possible to make a general
| prescription for all queues.
| remram wrote:
| They should be empty to reach the absolute minimal latency (i.e.
| the only latency is the time it takes to process the task itself,
| which you could call zero queueing latency). Different services
| have different latency targets.
|
| The right statement is not "they should be empty" but "they
| should be under the target latency", which might be 0 (then
| "the queues should be empty"), or might be some other target
| (then empty means overprovisioned resources).
|
| Everyone in this thread seems to be talking past each other
| because they imagine vastly different services (e.g. a shipping
| company vs. hospital emergency rooms).
| edejong wrote:
| What are spike workloads? Is this poisson? What's the
| distribution?
| clnq wrote:
| Distributions are useful for optimizing. But the decision
| whether a queue should generally be empty or have elements
| comes out of system requirements.
|
| Then you can employ statistics to plan what resources you
| will allocate to processing the queue.
|
| Though having capacity for Poisson event spikes isn't
| nearly the only way to ensure queues trend to zero
| elements. For example, you can also scale the workload for
| each queue element depending on the length of the queue.
|
| It's very popular in video game graphics to allocate a
| given amount of frame time for ray tracing or anti
| aliasing, and temporally accumulate the results. So we may
| process a frame in a real time rendering pipeline for more
| or less time, depending on the state of the pipeline. Say,
| if the GPU is still drawing the previous frame, we can
| accumulate some ray traced lighting info on the CPU for the
| next frame for longer. If the CPU can't spit out frames
| fast enough, we might accumulate ray traced lighting
| kernels/points for less time each frame and scale their
| size/impact.
|
| Statistical distributions of frame timings don't figure
| here much as workloads are unpredictable and may spike at
| any time - you have to adjust/scale rendering quality on
| the fly to maintain stable frame rates. This example
| assumes that we're preparing one frame on the CPU while
| drawing the previous one on the GPU. But there may also be
| other threads that handle frames in parallel.
|
| Other workload spikes could be predicted by things like
| Poisson distributions. But once again, the key takeaway is
| that you can't generalise this stuff in any practical
| application. It all depends on system requirements and
| context.
| colanderman wrote:
| Additionally, they _must not_ be empty to maximize throughput of
| consumers which benefit from batching. If there's only ever one
| item to grab, consumers cannot amortize fetches across a batch,
| which impacts throughput _without any inherent benefit to
| latency_.
|
| (Imagine sending a TCP packet for every individual byte that
| was enqueued on a socket!)
|
| Same applies if you have any sort of load shedding on the
| consumer side. If the queue is empty, it means you've shed load
| you didn't need to; when the next brief drop in load comes, you
| can't take advantage of it.
|
| (Fun analogy: this is why Honda hybrids typically don't charge
| their battery all the way while driving, to allow the chance to
| do so for free while braking.)
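|
| A sketch of the amortizing fetch (illustrative, using a plain
| in-process queue as a stand-in for the real consumer API):
|
|     import queue
|
|     def consume_batch(q, max_batch=100):
|         """Block for one item, then greedily drain up to max_batch."""
|         batch = [q.get()]                  # wait for at least one item
|         while len(batch) < max_batch:
|             try:
|                 batch.append(q.get_nowait())
|             except queue.Empty:
|                 break
|         return batch                       # process these in one go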
| plandis wrote:
| > If the queue is not empty, it means that whatever job is
| consuming from the queue is not keeping up with messages coming
| into it.
|
| You could conceivably have a non-zero queue size that
| asymptotically reaches some stable size over time. In such a case
| you are keeping up with messages but with some approximately
| static amount of added latency.
|
| I don't think I've seen this happen in practice but I think some
| systems can be engineered to be approximately stable (grows and
| shrinks but within predictable bounds)
| remram wrote:
| This is very common when you can adjust the consume rate. This
| happens in practice: serverless platforms scale, Kubernetes has
| HorizontalPodAutoscaler, you can manually grab more VMs, or IRL
| this might mean calling in more staff or at larger time scales
| hiring more people in your organization.
| fwlr wrote:
| Spiky workloads, safely persisting data, decoupling components,
| and enabling asynchronous processing are all things that can be
| provided by a stack or a heap as well. The call stack in any
| given programming language is a good example of a stack for
| handling spiky workloads, for instance. A min-heap is more
| flexible than a queue for handling async/decoupled/pubsub stuff
| (and arguably more correct for many use cases).
|
| The thing that queues are good for is enforcing order: when I get
| to the front of the queue and it's time to do my thing, I have a
| guarantee that anyone who entered the queue before me has already
| done their thing, and anyone who entered the queue after me has
| not yet had a chance to do their thing. This is a useful
| guarantee for tasks that have an order dependence, for example
| database operations looking to guarantee consistency.
| dylan604 wrote:
| Should this title be given some sort of limiter/qualifier as in
| "Messaging queues should be empty"? I come from the video
| production world where you have things like render queues. These
| are designed to utilize a limited number (due to sheer expense)
| of hardware instances; if they finished jobs so fast that the
| queue was always empty, that would indicate you have too many
| very expensive machines. A queue that is never empty, but is just
| finishing as new big pushes to the queue happen, would be the
| ideal balance.
| wielebny wrote:
| Also, water should be wet and SQL tables need to have primary
| keys.
| kilgnad wrote:
| He's right. You guys need to see the system in terms of its
| derivative: the rate at which data is produced and consumed.
|
| This article is saying you basically want an average negative
| velocity, where consumers eat data faster than producers make it
| on average.
|
| Any queue that is non-empty means the velocity is positive.
| Producers are making data faster than consumers can eat it, and
| you have a limited time window before the queue becomes full and
| your system starts dropping data.
|
| If you observe that your system never has a full queue but is
| constantly loaded with partially filled queues it means you built
| and tuned your system such that it operates on the border of
| failure.
|
| It means your system is continuously hitting unsustainable
| velocities all the time, but stops short of total failure (a full
| queue).
|
| Remember, think about the system in terms of its derivative. The
| average derivative must be negative for your system to even
| function. Any time a queue is partially or completely full, it
| means there was a period of positive velocity. These periods
| cannot exceed the amount of time the system is in negative
| velocity.
| HarHarVeryFunny wrote:
| It depends on what the queue is being used for.
|
| If a queue is just being used to decouple a producer/consumer
| pair, then "it should be [almost] empty" is a reasonable notion,
| assuming consumer/producer are compute bound and there's no
| fundamental reason why one should be faster than the other...
|
| Which brings us to the other type of use where things are being
| enqueued specifically as a means of _waiting_ for resources that
| are in limited supply: maybe just compute if these are
| computationally heavy jobs, or maybe I/O bandwidth, etc. In that
| case we expect the queue to often be non-empty, and there's
| nothing wrong with that, even if there's potential to throw
| money/hardware at it to get the queue size down to zero!
| orf wrote:
| No they shouldn't.
|
| The way I see it is that there are two opposing things you can
| optimise for that depend on queue depth: utilisation or latency.
|
| If you care about processing each message as quickly as possible
| then queues should be empty. This often requires a higher cost
| and lower utilisation as you inevitably have idle workers
| _waiting_ for new messages.
|
| If you care about utilisation, then you never want your queue to
| be empty while something is polling it. This might be some
| background task that runs on a GPU - every second that sits idle
| is wasted cash, and those tasks usually benefit heavily from
| batching inputs together.
|
| In this case you _want_ there to _always_ be something to read
| from the queue, and you shut it down the moment this isn't the
| case.
| mabbo wrote:
| If a queue isn't approaching empty, then its input is exceeding
| its output. Therefore its size is trending to infinity.
|
| Since we don't have infinite space, we can expect eventually to
| lose some messages in this scenario.
| [deleted]
| beardedetim wrote:
| I think this is a valid concern for Future Us to worry about.
| If you have alerts for when this is going to happen, you can
| punt the issue until it starts to become a concern and who
| knows if your product will be alive when that issue happens.
| londons_explore wrote:
| M/M/1 queueing theory doesn't apply to real queues... because
| the work input into the queue usually _does_ depend on the
| latency of previous work completed by the worker.
|
| If there is a 1 hour queue for the checkout at a grocery
| store, you probably won't join the queue - you'll go to a
| different store...
| lokar wrote:
| In many systems you get the reverse. A client sends a retry
| without the ability to cancel the first request, now it has
| two requests in the queue.
| User23 wrote:
| Little's law[1] offers considerable guidance here. In the
| shortish term it's safe to model the system as being
| stationary and then you need to provision for seasonality[2].
| Upward or downward trending demand after correcting for
| seasonality indicates that the system isn't stationary and
| your choices are either to scale up or reject requests.
| Nevertheless the optimal queue size is not necessarily zero
| except in cases where extremely low latency is necessary. The
| optimal queue capacity for most web service type applications
| is one that trends toward almost but not quite full when
| seasonal requests are higher.
|
| [1] https://en.m.wikipedia.org/wiki/Little%27s_law
|
| [2] broadly construed. For example a service that gets a mass
| of requests at the top of every hour displays seasonality.
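|
| For concreteness, Little's law is just L = lambda * W; a quick
| back-of-envelope with illustrative numbers:
|
|     arrival_rate = 100     # requests/second (lambda)
|     time_in_system = 0.4   # seconds per request (W)
|     L = arrival_rate * time_in_system
|     print(L)               # -> 40.0 requests in the system on average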
| klyrs wrote:
| > If a queue isn't approaching empty, then its input is
| exceeding its output. Therefore it's size is trending to
| infinity.
|
| This is literally the slippery slope fallacy. You aren't
| accounting for minor fluctuations or finite timescales. If
| you were driving a car, you might say "if you aren't steering
| into oncoming traffic then you're steering towards the ditch"
| and conclude that no cars will ever safely reach their
| destination.
|
| If the only valid state for a queue is empty, then why waste
| time implementing a queue?
| heipei wrote:
| We use queues for various jobs, some of which are time-
| sensitive (e.g. emails), so the queue should be empty most
| of the time, some of which are background or re-processing
| (e.g. re-indexing), so queues can balloon and then take
| days to drain again.
|
| The unifying aspects of using queues for us is that it
| allows us to load-balance jobs across workers, and allows
| us to monitor throughput in a centralised fashion. We can
| set alerts on the time-sensitive queues and then use
| different thresholds for the background queues, but we're
| using the same metric data and same alerting system.
| teolandon wrote:
| GP said "approaching empty", meaning it should be empty in
| the long term. It's not a slippery slope.
| camdenreslink wrote:
| It assumes the velocity of items being added to the queue
| never changes. In real life it speeds up (approaching
| infinite items), and it slows down (approaching empty). A
| more meaningful measurement would be something like
| average number of items in the queue and average time in
| the queue for any given item.
|
| If you feel like you have enough spare capacity and any
| given item isn't taking too long to process then it
| doesn't matter if you are ever empty.
| Espressosaurus wrote:
| You're on the right track, but I personally find averages
| to be not a useful measure...on average.
|
| Corner conditions are where I start to care:
|
| * What is the worst case latency? How long until the work
| can be done? How long until the work can finish from the
| point at which it enters the queue?
|
| * What is the worst case number of items in the queue?
| How large does the queue need to be?
|
| * What is the maximum utilization over some time period
| with some granularity? How spiky is the workload? A queue
| is a rate-matching tool. If your input rate exactly
| equals your output rate at all times, you don't need a
| queue.
|
| * What is the minimum utilization over some time period?
| Another view on how spiky the workload is.
|
| Minimums and maximums I find are much more illustrative
| than averages or even mean-squared-errors. Minimums and
| maximums bound performance; an average doesn't tell you
| where your boundary conditions are.
|
| In general you don't want your queue to fully fill up,
| but like the other poster said, it's some tradeoff
| between utilization and latency, and the two are
| diametrically opposed.
| camdenreslink wrote:
| Sure, I guess by average I just meant shorthand for some
| sort of measurement for making decisions about resources
| allocated to your queues. Median, mean, P99, whatever is
| useful.
| orf wrote:
| > Since we don't have infinite space, we can expect
| eventually to lose some messages in this scenario.
|
| Sure, that's technically correct but applies to basically
| everything. It is very likely your users/orders/whatever
| table in your database is also "trending to infinity" over
| time, except this doesn't mean databases should always be
| empty or that we should expect to eventually lose some
| users/orders/whatever.
|
| Or, more succinctly, if your queue capacity is 10 million
| messages and your queue messages represent "houses purchased
| through our website", then in reality your queue capacity is
| infinite because nobody will ever purchase 10 million homes
| through your application per small unit of time.
| scubbo wrote:
| Your response is correct, but there's a more fundamental
| flaw in the preceding comment: although both paragraphs are
| true in isolation, the implication between them is false.
| "The queue is _currently_ trending to infinity" does not
| imply "The queue will _continue_ to monotonically trend to
| infinity". Periodicity is commonplace in real-life
| situations; and while parts of queueing theory produce some
| helpful insights by ignoring it, care should be taken when
| applying them.
| mecsred wrote:
| This is so academic. Take any two samples; slope > 0 => we're
| going to be dropping messages!!! Slope < 0 our threads are
| idle!!!
| vhcr wrote:
| The growth rate of a queue is not constant, it could get
| bigger during the day, but smaller at night, since customers
| are not using the service.
| [deleted]
| derefr wrote:
| What about a queue that is the ingress point for a fan-in
| between O(N) active continuous producers to a single active
| continuous consumer?
|
| In such a case, you'd expect at least N messages in the queue
| at all times -- and often a small multiple of N. Not because
| the consumer can't consume those messages in a timely manner,
| but because, given a reliable queue, messages can't be
| consumed until they've been 2PCed into the queue, and that
| means that there are always going to be some messages sitting
| around in the "acknowledged to the producer, but not yet
| available to the consumer" state. Which still takes up
| resources of the queueing system, and so should still be
| modelled as the messages being _in the queue_.
| remram wrote:
| > If a queue isn't approaching empty, then its input is
| exceeding its output.
|
| It depends what you mean by "approaching empty". If you mean
| "empty most of the time", then no, that's wasted resources.
| If you mean "its size is decreasing most of the time", then
| no, not possible. Even if you mean "it is empty some of the
| time", then maybe, but that is not a strong requirement
| either.
|
| Of course more average input than average output is bad, but
| stating it as "should be empty" or "should be approaching
| empty" seems to suggest the wrong thing entirely. It should be
| _smaller than the target latency of the process_.
| Smaug123 wrote:
| Eh? It is possible for size to be decreasing most of the
| time. Just make traffic extremely bursty - add 60 on the
| hour every hour, and consume at one per minute.
| remram wrote:
| What does average mean there? Increasing and decreasing are
| point-in-time events. If you mean the 1-minute average can be
| negative most of the time, that is true, but it would not work
| for other time horizons, whether longer or shorter.
| kqr wrote:
| This is true only when you have relatively steady flows, or
| infinite multiprocessing. In that case, yes, the queue is in
| practice either "almost empty" or "infinite".
|
| I think the reason you get the responses you do is people are
| used to systems with high variation in flow rates that can
| only process a limited number of items at the same time, and
| for these, queues can soak up a lot of variability by growing
| long and then being worked off quickly again.
| [deleted]
| ghusbands wrote:
| > should always be full
|
| > _always_ have something to read from the queue
|
| In the first case, you've gone a bit far. To maximise
| utilisation, you want the queue to never be empty when polled.
| orf wrote:
| Indeed, and this is what I meant but worded poorly.
|
| Obviously you don't want it to be 100% full, but you do want it
| to be the kind of full you get after a nice meal, where you still
| have room for dessert and a coffee.
| roflyear wrote:
| Good reading here:
| https://blog.danslimmon.com/2016/08/26/the-most-important-th...
|
| You really don't want your queue more than 70-80% utilized.
| Once it gets to 75% you should consider that "red zone" and
| you need to fix it somehow. Once the queue is 80% you're
| really hitting into some issues. Above 80% and you will
| have significant problems.
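|
| The intuition behind those thresholds, using the simplest M/M/1
| model (a sketch; real queues vary): the average number of items
| in the system grows like rho/(1-rho) as utilization rho
| approaches 1.
|
|     for rho in (0.5, 0.7, 0.8, 0.9, 0.95, 0.99):
|         print(f"utilization {rho:.0%}: "
|               f"avg items in system ~ {rho / (1 - rho):.1f}")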
| orf wrote:
| Thanks for this, that has helped me see something I
| missed.
|
| If you're dealing with something like SQS and are
| receiving variable sized batches of messages, the
| consumers can be classed as "100% utilised" (always
| processing a batch of messages with no empty receives)
| whilst the queue would not be considered 100% utilised
| (the average delivered batch size is always less than the
| max).
| roflyear wrote:
| What issue are you trying to solve? The expense of
| polling?
|
| Obviously there is a line where you do not want a bunch
| of expensive consumers hanging around - but this isn't
| the place IMO to micro-optimize your costs. You shouldn't
| have like 20x the consumers you need but you shouldn't
| worry about having EXACTLY the correct amount.
|
| In your example, are you implying that if more jobs come
| in more workers get spawned? Is that specific to the
| Amazon service?
| orf wrote:
| If a process is busy processing a task, it is busy.
|
| A process could receive N tasks in a batch and process
| them at the same time.
|
| In this scenario it is possible for all workers to be
| utilised whilst the queue is not "100% utilised" - which
| is a failure mode the link you shared explained:
|
| > And by induction, we can show that a system that's
| forever at 100% utilization will exceed any finite queue
| size you care to pick
|
| The inflows to the queue are less than the outflows,
| despite every worker spending 100% of its time processing
| tasks.
|
| It is specific on the utilisation type though, I'm mainly
| thinking about ML inference that has a fairly fixed
| initial cost per inference but scales quite nicely with
| >1 input.
| roflyear wrote:
| If your queue is always full (100% utilization) your jobs will
| on average never be processed.
|
| Love the downvotes! People should read this:
| https://blog.danslimmon.com/2016/08/26/the-most-important-th...
|
| I understand this is a little counterintuitive but it is true.
| You should not have a queue that always has a job ready to be
| processed.
| adrr wrote:
| The queue isn't full. It isn't empty, and things are being
| processed, just not in "real" time. I've always run my queues where
| they are never empty. If queue depth hits certain levels,
| workers are scaled up. When it drops, they are scaled down.
| Last thing I want is under utilized instances.
| roflyear wrote:
| Unsure what you're saying. If your queue is NEVER empty
| wait times will grow to infinity. You shouldn't aim for
| this. You should aim for something reasonable like, "empty
| 50% of the time" etc..
| adrr wrote:
| They haven't been empty for two years, and my waiting times
| average 5 minutes (30 minutes during high-load times). I don't
| understand how they will grow to infinity.
| [deleted]
| kilgnad wrote:
| Wrong. They should.
|
| Any queue that fills up, say halfway, is an indication the
| system is in a precarious state.
|
| It's simple: queues have limited space. If the velocity of data
| going in is faster than data going out, the queue will fill up
| until it's full.
|
| A half-full queue indicates the velocity is in such a state, and
| such a state is unsustainable. You need to build your system so
| that it stays out of unsustainable states as much as possible.
|
| If you're finding that your queue fills up halfway a lot, that
| means you're building your system in such a way that it operates
| at the border of sustainability. In other words, the system often
| ingests data at rates that cannot be sustained indefinitely.
|
| If a system with a queue half full operates in that same state
| twice as long, you now have a full queue and you're losing
| data.
|
| In essence your queues should mostly be empty most of the time.
| At best you can account for occasional spikes as the article
| stated but a queue even half full should never be the
| operational norm.
| zokier wrote:
| > You need to build your system such that it stays out of
| unsustainable states as much as possible.
|
| You assert that without justification. Typical counter-
| example would be system where you have predictable daily
| load-pattern, it can be perfectly reasonable to build the
| system that accumulates items during peak hours and clears
| queues during quiet hours.
| orf wrote:
| This response lacks nuance and focuses too much on the
| operational side of queues.
|
| Not all queues are "realistically bounded". Let's take a
| hypothetical silly "order queue" where each entry represents
| a EUR10 item that's been purchased. The queue feeds into a
| boxing machine that puts the item in a box and posts it to
| the user.
|
| The point at which a queue hits an upper bound would
| represent billions of euros of orders. That's not really a
| realistic failure mode to hit without noticing long
| beforehand.
|
| Meanwhile, if you always want your queue empty then your
| boxing machine is going to be shipping thousands of small
| boxes because the moment an order arrives, it's processed
| into a box.
|
| It's much more efficient to have a small, managed and time-
| bounded backlog of orders that the boxing machine uses to
| batch orders together in a smaller number of larger boxes.
| kilgnad wrote:
| Wrong. Your thinking is where the nuance is lacking. As I said,
| you need to think of the system in terms of its derivative. The
| rate of data consumed must on average be faster than the rate of
| production. The net difference of these rates must equal a
| negative velocity, with the rate of data going in as a positive
| number and data consumed as negative.
|
| There is no system on the face of the earth that can
| operate at an average positive velocity. That is
| categorical failure.
|
| This is mathematically and logically true no matter how big
| your queue is.
|
| Even for the situation you're describing... a really big-ass
| queue, the average velocity of that system must still be
| negative. Unless your queue is so big that it can withstand an
| average positive velocity over the lifetime of the system, which
| is kind of absurd.
|
| Most of the time your queues should be empty. If your
| queues are observationally often filled with data but not
| full it means you're still operating at a negative velocity
| but you've tuned your system to operate at the border of
| failure.
|
| It means your system hits unsustainable velocities often. You may
| want this for cost reasons, but such a system is not robust
| enough for my comfort. The queue should be empty most of the
| time, with occasional spikes at most.
| orf wrote:
| This is a very long-winded way of saying "you should have
| enough capacity to process all items in the queue before
| the queue fills up and everything explodes".
|
| I agree. But that's not the argument the post is making,
| nor does it support the position that "queues _should_ be
| empty". No, they shouldn't. It's fine to let a queue
| build up (and thus not be empty), then process all the
| items in the queue in a short time frame. There are
| efficiency savings if you are able to do this.
| kilgnad wrote:
| No. That is not what I'm saying. I'm saying on average
| queues must be empty most of the time. And if you are
| seeing queues that are filled most of the time, your
| entire system is operating on the border of failure.
|
| This is true even when you have a big-ass queue. The system must
| on average be more empty than it is filled.
|
| This is not to say that your system can never have spikes in
| data. Your system can often have spikes, but this cannot be the
| norm: the spikes in the data production rate cannot push the
| average production rate above the average consumption rate.
| orf wrote:
| I don't see how this holds true. Let's say I have a
| process that takes 1 hour to process a _batch_ of items
| up to 1000 (I.e doesn't scale linearly with input size).
| So, 1 hour to process 1 item and the same 1 hour to
| process 1000 items.
|
| If I have 1 message per minute being added to the queue,
| I can safely run 2 of these batch processes once a day to
| clear the queue. Or more generally can launch (queue_size
| / 1000) tasks once a day.
|
| Because surely it doesn't matter about the queue being
| empty, what matters is the _rate at which messages can be
| processed_.
|
| With 2 processes running once a day, the rate would be
| 2000/(24*60) ≈ 1.4 messages per minute in aggregate.
|
| The queue can be filled 23 hours per day without issue
| and without being on the brink of a failure state.
|
| Edit: you edited your comment whilst I was replying to
| specifically mention consumption rate. The aggregate
| consumption rate is all that matters, but it doesn't
| follow that the queue "has to be more empty than not
| empty". Those are different issues.
| kilgnad wrote:
| You're talking about a different scenario. I'm talking
| about continuous inflow and outflow. Where outflow is
| processing (fixed rate) and inflow is traffic (variable).
| This is the classic assumption. In this state the space
| used in the queue is an indicator of mismatched
| consumption and production velocities.
|
| You're talking about a single batch job that runs once a day and
| empties the queue at the same rate no matter how much data is in
| the queue. The more data you grab in a batch, the more efficient
| the processing, so you grab data once per day instead of
| continuously. This is a clear exception from the assumed norm
| that needs to be mentioned.
|
| That is not to say that your case is invalid. Many analytics
| databases have exactly your ingestion parameters. But the case is
| exceptional enough that, if not explicitly mentioned, it can be
| assumed that such a batch job is not a primitive in the model
| being discussed.
| orf wrote:
| I disagree it's exceptional even if my examples have been
| caricatures.
|
| Very few technical problems don't have any elasticity in
| outflows, but a lot of problems do benefit from a form of
| batching. Heck, even something as simple as inserting
| records into a non-analytic database benefits greatly
| from batching (inserting N rows rather than 1) and thus
| reducing TPS, or calling an external service, or writing
| files to some storage, or running inferences, or writing
| to a ledger, or anything.
|
| Batching, in all forms, exists absolutely everywhere from
| super-scalar CPU uops to storage to networking to the
| physical world. The _exception_ is to do something
| without batching, even if the whole stack you're using
| from the ground up is built on it.
|
| Given the common use case of batching, which inherently
| requires items to be queued for a period of time, how can
| you say that queues should always be empty? They would
| statistically rarely be empty. Which is a good thing.
| Because it enables batching.
|
| If you want to queue and insert 1000 records into a
| database with 1000 transactions across 1000 workers, then
| say your workload is highly optimised because there are
| never any records in your queue, cool. Not very true
| though.
| kilgnad wrote:
| >Very few technical problems don't have any elasticity in
| outflows,
|
| They have elasticity, but you don't care for it: you run it at
| the maximum. As you mentioned, the only case where you wouldn't
| is if you have some transaction that doesn't scale linearly with
| frames/events.
|
| >Given the common use case of batching, which inherently
| requires items to be queued for a period of time, how can
| you say that queues should always be empty? They would
| statistically rarely be empty. Which is a good thing.
| Because it enables batching.
|
| In my experience, while batching is not uncommon,
| streaming is the much more common use case. So we can
| agree to disagree here.
| ndriscoll wrote:
| It's not really that exceptional. You can do similar
| things with e.g. batch flushes every 5 ms for synchronous
| web requests and the same ideas will apply. This can give
| a massive increase in throughput while possibly
| _decreasing_ overall response time for OLTP.
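|
| A toy version of the 5 ms flush with asyncio (my sketch;
| insert_many stands in for whatever multi-row write you have):
|
|     import asyncio
|
|     pending = []   # (row, future) pairs queued for the next flush
|
|     async def flusher(insert_many):
|         while True:
|             await asyncio.sleep(0.005)          # 5 ms window
|             if not pending:
|                 continue
|             batch, pending[:] = pending[:], []  # take the whole backlog
|             await insert_many([row for row, _ in batch])  # one query
|             for _, fut in batch:
|                 fut.set_result(None)            # unblock each request
|
|     async def save(row):
|         fut = asyncio.get_running_loop().create_future()
|         pending.append((row, fut))
|         await fut                               # waits at most ~5 ms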
| kilgnad wrote:
| It's common, but exceptional enough that it should be
| mentioned because it is a specialized database that does
| batch ingestion.
|
| Let's say you flew from one city to another. People
| assume you took a plane. If you took a helicopter, though
| helicopters are common, it should be mentioned otherwise
| it won't be assumed.
| ndriscoll wrote:
| It doesn't take a special database though. You can do
| this sort of thing in your application in like 10-20
| lines of code (assuming you have a framework with queues
| and promises). It works great with mysql/postgresql which
| both benefit dramatically from batching queries.
| kilgnad wrote:
| The context here is external queues like Kafka. If you're
| writing stuff to a queue in your web app, then it must be
| a relatively small load. This is different.
|
| For analytics databases the batch ingestion requirements
| are usually so big that you have to save to the file
| system, hence the need for an external queue.
|
| Heap-allocated queues are different; at that stage I
| would say the data is in "processing" already, popped
| out of the external queue.
| ndriscoll wrote:
| I guess "relatively small load" is... relative, but I've
| written that kind of thing to be able to handle ~70k
| sustained OLTP requests per second (including persistence
| to postgres) when load testing locally on my laptop. In
| any case the same thing applies to external queues. Your
| workers will often be more efficient if you pull off
| chunks of work to process together.
| kilgnad wrote:
| By small load I mean size. The frames in your batch jobs
| are tiny and few in number if you're getting 70k batch
| jobs per second.
|
| It's more similar to streaming... what you're doing here.
| In that case my velocity measurements are more
| applicable. You want your queues to be empty in general.
| A batch job is run every hour or something like that
| which is not what you're doing here.
|
| If you run your load test for 5 minutes and see that your
| queues are 50 percent full, that means 10 minutes in
| you'll hit OOM, assuming your load tests run at a
| constant rate.
| Xylakant wrote:
| Queues can be full most of the time without the system
| being in a precarious state. Take for example an
| extremely spiky pattern of processing: once a day, a big
| batch of jobs is delivered. You know ahead of time how
| many items there are and have a pretty good handle on how
| long it takes to process one item. The items should be
| processed before the next batch arrives. You can then
| tune your system to empty the queue just before the next
| batch arrives. The queue is non-empty the majority of
| time, yet the system is perfectly stable.
|
| Now, that is an extreme example, but similar behavior
| exists. It's fine to have queues partially filled most of
| the time, as long as your average rate of processing
| exceeds the rate of incoming items by some margin.
| kilgnad wrote:
| As I said in the other post, I am talking about the
| average and most likely use case: a flat and continuous
| rate of data processing and varied rates of data
| production.
|
| Spiking the processing is not what people think about
| with queues and is most likely NOT what the author of the
| article is talking about.
|
| Typically there's no reason why you would spike your
| processing unless such processing doesn't scale linearly
| with the amount of items processed (as the author in a
| sibling post to yours mentioned). Such a case must be
| deliberately introduced into the discussion as an
| exception to the model because it's not what people think
| about when discussing this. Or maybe it is given that 3
| people brought the same exact exception up. Welp if
| that's the case then I can only say the author of this
| article and I certainly aren't referring to this use
| case.
|
| edit: I'm rate limited. So I'm referencing another post
| in my response that you won't see until the rate limit
| dies and I post it.
| zokier wrote:
| > You may want this for cost reasons but but such a
| system is not robust enough for my comfort
|
| Your comfort is secondary to the business needs that the
| system is designed to fulfill. In practice some risks are
| better to accept than mitigate
| kilgnad wrote:
| My comfort is a measure of business aversion to risk. It
| depends on the business of our respective industries.
|
| Either way you have admitted that you tuned the system to
| operate with an acceptance of risk. Likely for cost
| reasons. This is fine, if you deliberately chose to do
| this.
| zokier wrote:
| > This is fine, if you deliberately chose to do this.
|
| Yet you bravely asserted that
|
| > At best you can account for occasional spikes as the
| article stated but a queue even half full should never be
| the operational norm.
|
| Suddenly this thing that you said "should never be"
| changed to "is fine"
| kilgnad wrote:
| When humans communicate, we don't talk about exceptions
| where you deliberately choose to operate outside of the
| norm. If you deliberately choose to drive a car without a
| seat belt, knowing the risk, that is fine.
|
| You want to sky dive while deliberately choosing not to
| use a parachute? That's fine. But don't expect me to
| adjust my communication to account for your exception.
| You chose to do this.
|
| But let's not pretend that you don't know about this fact
| in human communication. It's an obvious thing you know.
| Yet you chose to characterize your post in a way that
| negatively makes my statement look unreasonable and your
| actions look normal. Please don't do this. If you choose
| to sky dive without a parachute, that's fine, but admit
| you are operating with risk for the sake of cost. Do not
| try to mischaracterize my intentions; that is just rude.
| dooglius wrote:
| You are making a somewhat subtle error here, which is
| assuming that the (output_rate-input_rate) is statistically
| independent of the size of the queue. Note that this does not
| necessarily imply that the output process changes based on
| the size of the queue. For example, consider a microphone
| that emits 100 raw samples to a buffer per ms and a real-time
| thread scheduled every 2ms that processes very quickly. a
| 300-sample buffer is more than enough here, even though the
| buffer will be over half full about a quarter of the time,
| and will never be empty. In this case, the velocity
| conditional on size being less than half is -100 samples/ms
| and the velocity conditional on size being greater than half
| is 400 samples/ms.
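|
| A quick numeric check of that example (0.1 ms steps, idealized
| constant rates):
|
|     buf, over_half, STEPS = 0, 0, 10_000
|     for step in range(STEPS):
|         buf += 10                  # producer: 100 samples/ms
|         over_half += buf > 150     # more than half of a 300 buffer
|         if step % 20 == 19:        # consumer wakes every 2 ms
|             buf = 0                # and drains "very quickly"
|     print(over_half / STEPS)       # -> 0.25; buffer peaks at 200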
| kilgnad wrote:
| This example suffers from the same problem as the other counter-
| examples. Why are you deliberately slowing down the processing to
| only happen at 2ms intervals? If you have continuous processing,
| the queue will always be empty, assuming processing is faster
| than data production.
|
| You're maybe referring to what the other person mentioned...
| processing that does not scale with the amount of data processed,
| so you want the processing to operate on the biggest batch of
| data possible. This is, while a common use case, not common
| enough that it should be assumed without being mentioned.
|
| However, your example seems to be happening on the same computer
| (you mentioned a mic), and thus not on the web, which is the
| assumed context here, so the data production rate doesn't seem to
| spike. If your processing is faster than data production, you
| don't even need a queue: just stream each frame straight to
| processing in a single thread. But this also assumes the nature
| of your processing. The details of the processing can change
| things to require a queue, e.g. for averages of frames at
| discrete intervals... but that's different from the use case of
| queues like Kafka or ZeroMQ, which I think is the main context
| here.
|
| Edit: I reference other responses here, but you won't see
| them immediately. I'm currently rate limited and I have
| those replies in a queue so those will be posted later
| today.
| marcosdumay wrote:
| Your comment only shows that if you value resource utilization,
| you shouldn't use queues.
|
| Because, no, nothing you said changes the fact that queues only
| do anything when kept mostly empty. And the only thing they do
| is some short term trade of latency for reliability. They
| actually can't change utilization.
| orf wrote:
| I'm currently waiting for a flight. Isn't an airport a
| kind of queue that buffers people over time, then delivers
| them to the sky at steady, fixed intervals?
|
| This airport seems busy but the planes seem full, how can we
| improve utilisation?
| Jtsummers wrote:
| What does it mean to improve utilization if the aircraft
| are at 100% utilization?
| orf wrote:
| Bigger aircraft I guess. But the point is that we are at
| full utilisation despite the "queue" (airport) never
| being empty, and thousands of people awaiting "delivery".
|
| Which refutes the original point about queues not doing
| much unless they are empty and being unable to affect
| utilisation. If you have a somewhat inelastic output that
| benefits from batching, but has variable inputs, they are
| absolutely crucial to utilisation.
| Jtsummers wrote:
| I'm being nitpicky but my intended prompt was to get you
| to define what needs to be improved, which you did. You
| need to improve (or increase) capacity, not utilization.
| The two are related, but they aren't the same.
| Utilization being at 100% means it only has two ways to
| go: Stay at 100% or drop below 100%. Which one
| constitutes an improvement? It depends, but for the
| airline utilization at 100% is already what they want (by
| the numbers, should be their maximum profit) so it can't
| be improved. Only capacity can be improved.
| smetj wrote:
| I came here to say the same and saw your comment. Totally
| agree.
| matchagaucho wrote:
| _as you inevitably have idle workers waiting for new messages._
|
| This is no longer the case using the latest in serverless IaaS
| architecture, where the pool of queue workers dynamically
| expands to meet demand.
| orf wrote:
| This has always been a thing, far far before the "latest in
| IaaS architecture". The idle polling still exists except they
| manage it for you. Their margin is then improved if they
| intelligently reduce wasted polls through a variety of
| techniques.
|
| However, yes, you don't pay for this or care about it much.
| It's good.
| atdixon wrote:
| You're assuming the consumer is the one producing batches.
| anonymoushn wrote:
| Seems like there's some implicit assumption in this comment
| that everything is rented by the minute and latency never
| matters. Renting things by the minute isn't really the cheapest
| way to buy lots of things though. Also, it's sometimes nice to
| respond to the content of the article instead of its title.
| orf wrote:
| I think the implicit assumption in the comment is that it
| would be possible for readers to understand the underlying
| idea of the comment rather than only see the specific
| example. "resource cost per minute" is not the only thing
| helped by a non-empty queue.
|
| Also please don't accuse someone of not "responding to the
| content of the article" when the content is a bog standard
| list. If you want my response to that: "yes, that is indeed a
| list of 4 things queues help with". Not sure it adds much to
| the discussion.
| anonymoushn wrote:
| When will my users be happier because I took longer to do
| things on their behalf? I apologize for not understanding
| the unwritten portion of the comment.
|
| I guess I should really ask, why do people care about
| utilization instead of throughput, latency, and cost? We
| could increase utilization by rewriting everything to be
| more wasteful, for example, but it doesn't seem like we
| should do it.
| zmgsabst wrote:
| Utilization is choosing throughput and cost over latency.
|
| People care about utilization when, e.g., processing
| petabytes of data -- something that takes hours to days,
| where you're optimizing batch throughput and cost by
| making sure you efficiently utilize your hardware.
| anonymoushn wrote:
| > Utilization is choosing throughput and cost over
| latency.
|
| For a given rate of incoming work, increasing throughput
| decreases utilization. So I guess it's not choosing
| throughput too much.
|
| > People care about utilization when, e.g., processing
| petabytes of data -- something that takes hours to days,
| where you're optimizing batch throughput and cost by
| making sure you efficiently utilize your hardware.
|
| If you have fixed resources and lots of work, it would be
| good to have nonempty queues to prevent them from idling.
| That is maybe the big unwritten part of the comment
| upthread. It doesn't seem worth saying "no, the article
| is wrong in all cases because of this particular case!"
| though, so I'm not sure.
| remram wrote:
| If you define throughput as "processed items over time
| per processor", the tradeoff becomes clear. Given the
| rate of incoming requests, you can get more processors,
| which will increase cost and reduce latency and
| throughput-per-processor, or you can get fewer
| processors, which will decrease cost and increase latency
| and throughput-per-processor.
|
| If you have fixed resources and fixed work, wanting your
| queues to be empty is a wish, not a strategy.
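|
| A back-of-the-envelope sketch of that tradeoff (all numbers
| invented for illustration):
|
|     arrivals = 90.0                 # requests/s coming in
|     per_proc_rate = 10.0            # requests/s one processor handles
|     for procs in (10, 20):
|         utilization = arrivals / (procs * per_proc_rate)
|         print(procs, utilization, arrivals / procs)
|     # 10 procs: 90% utilization, 9.0 req/s each, longer queues
|     # 20 procs: 45% utilization, 4.5 req/s each, lower latency, 2x cost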
| Kinrany wrote:
| Isn't a full queue basically equivalent to no queue? The same
| number of requests is being dropped.
| WJW wrote:
| Not all queues serve (soft) real time requests and not all
| queues are bounded (1). If the rate of incoming queue items
| is predictable but not constant (for example, average load in
| the weekend or at night tends to be lower for many systems)
| you can often get away with having enough capacity to serve
| the average load while using the queue as a buffer. This is
| fine if the work done is not time critical, like sending most
| types of marketing emails.
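|
| A toy model of that pattern (made-up numbers): capacity is
| sized for the average load, and the queue absorbs the busy
| hours.
|
|     day = [150] * 12 + [50] * 12   # arrivals/hour: busy, then quiet
|     capacity = 100                 # jobs/hour we can process
|     backlog = 0
|     for hour, arrivals in enumerate(day):
|         backlog = max(0, backlog + arrivals - capacity)
|         print(hour, backlog)       # climbs to 600, drains to 0 overnight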
|
| You might argue that a queue with many items in it which can
| still accept more items is not really "full", but it is
| clearly not empty either.
|
| (1) Obviously all queues are bounded by the capacity of the
| hardware, but since most queueing software can persist to
| disk and/or lives in the cloud with elastic storage this is
| not usually a problem.
| remram wrote:
| The difference is that the requests that are being served are
| getting extreme latency. You are serving the same number of
| requests, but every single one is X days old.
| yxhuvud wrote:
| No, there are use cases where new entries are not ingested
| by requests -- sometimes you know all the work that needs
| to be done upfront. And sometimes you have a few requests
| coming in over time, but at the same time occasionally
| need to go through everything you have seen earlier once
| more and flush it through the queue.
| gillh wrote:
| HN: our team has been working on an advanced API queue/load
| management system called Aperture[0]. It has the following
| capabilities:
|
| 1. Limits concurrency based on latency gradient or queue depths
| etc.
|
| 2. Weighted fair queueing scheduler - prioritized load shedding
| based on workload priorities and latency estimation.
|
| Anyone interested in Little's Law, token buckets, network
| schedulers, control theory (PID controllers) should definitely
| check it out! [1]
|
| [0]: https://github.com/fluxninja/aperture
|
| [1]: https://docs.fluxninja.com/
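|
| For anyone who hasn't met it, the token-bucket idea
| name-checked above looks roughly like this (the textbook
| algorithm, not Aperture's actual API):
|
|     import time
|
|     class TokenBucket:
|         def __init__(self, rate, burst):
|             self.rate = rate              # tokens refilled per second
|             self.burst = burst            # bucket capacity
|             self.tokens = burst           # start full
|             self.last = time.monotonic()
|
|         def allow(self):
|             now = time.monotonic()
|             elapsed = now - self.last
|             self.last = now
|             self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
|             if self.tokens >= 1:
|                 self.tokens -= 1          # spend a token, admit request
|                 return True
|             return False                  # out of tokens: shed or queue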
| bob1029 wrote:
| I have no strong opinion on this as long as you are not using
| queues for purposes of inter-thread communication.
|
| If you _are_ using queues for this purpose, then I recommend
| reading the following:
|
| https://lmax-exchange.github.io/disruptor/disruptor.html#_th...
|
| > When in use, queues are typically always close to full or close
| to empty due to the differences in pace between consumers and
| producers. They very rarely operate in a balanced middle ground
| where the rate of production and consumption is evenly matched.
|
| This has always been my experience. Has anyone ever had a perfect
| goldilocks scenario between 2 threads?
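|
| The near-full-or-near-empty behaviour falls out of a toy
| random walk whenever one side is persistently a bit faster
| (all numbers invented):
|
|     import random
|     random.seed(1)
|     size, at_edges, steps = 50, 0, 100_000
|     p = 0.55
|     for t in range(steps):
|         if t % 1000 == 0:              # the pace mismatch drifts over time
|             p = random.choice((0.45, 0.55))
|         size += 1 if random.random() < p else -1
|         size = max(0, min(100, size))  # bounded queue of 100 slots
|         at_edges += size < 10 or size > 90
|     print(at_edges / steps)            # mostly near empty or near full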
| wongarsu wrote:
| One good use-case for still using queues is if your producer or
| consumer has spiky characteristics on small timescales, but
| predictable performance if you zoom out enough. In that case
| your queue size spikes to some high value, but always returns
| to 0.
|
| The reasons for this can be numerous: maybe you produce a lot
| of jobs at once because of a user interaction, or your consumer
| is running slower for five minutes because some cron job is
| stealing your resources.
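|
| A quick sketch of that spike-then-drain shape (numbers are
| invented):
|
|     import random
|     random.seed(0)
|     backlog, peak = 0, 0
|     for second in range(3600):
|         # quiet baseline; ~1% of seconds a user action enqueues a burst
|         arrivals = 500 if random.random() < 0.01 else 0
|         backlog = max(0, backlog + arrivals - 10)  # consumer does 10/s
|         peak = max(peak, backlog)
|     print(peak, backlog)   # spikes to a high value, keeps returning to 0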
| taeric wrote:
| Alternatively stated, queues are safest in a steady state. If
| they are growing in size without bound, that is clearly bad. If
| they are shrinking in size, that means you are going to be idling
| soon. I think it is fair that if you are picking between those
| states, you'd prefer the idling.
|
| That said, consider a physical machine. The "hopper" on many is
| essentially a queue to feed in items to work on. If you let that
| go empty, you messed up. Hard.
| mtlmtlmtlmtl wrote:
| If a datastructure is usually empty, you probably don't need that
| datastructure.
| [deleted]
___________________________________________________________________
(page generated 2023-02-19 23:01 UTC)