[HN Gopher] Queues should be empty
___________________________________________________________________
Queues should be empty
Author : redman25
Score : 103 points
Date : 2023-02-18 04:42 UTC (1 day ago)
(HTM) web link (joshvoigts.com)
(TXT) w3m dump (joshvoigts.com)
| 8note wrote:
| An always empty queue means you could run into problems when
| traffic increases.
|
| I've seen this come up recently, where our queue reader was (well,
| still is) overscaled, to where it could handle 100x the normal
| queue throughput. When the input rate gets higher, though, instead
| of the queue filling a bit and controlling the traffic, we hammer
| our dependencies.
|
| I now see a continuously empty queue as a smell. It should be
| filling and emptying over and over again. Not all the time, but a
| queue that never fills is a ticking time bomb.
| advisedwang wrote:
| What you're describing (empty queue = unprepared for load) is a
| code smell, but the solution doesn't have to be "fill the queue
| sometimes". It could also be:
|
| * (Mandatory) Have a back-pressure system (sketch below).
|
| * (Optional, depends on system requirements) have auto-scaling
| of queue readers.
|
| * (Mandatory) Make sure the system is regularly load tested.
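|
| For the back-pressure point, a minimal sketch with a bounded in-
| process queue (illustrative only; a real system would push this
| back to the broker or client):
|
|     import queue
|
|     inbox = queue.Queue(maxsize=1000)     # the bound is the back-pressure point
|
|     def submit(task) -> bool:
|         try:
|             inbox.put(task, timeout=0.1)  # block briefly under load...
|             return True
|         except queue.Full:
|             return False                  # ...then reject so callers slow down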
| philliphaydon wrote:
| I think the title is meant to be:
|
| "Error queues should be empty"
| nkozyra wrote:
| The _goal_ of a queue is to be empty, the _purpose_ of a queue is
| to occasionally not be.
| robotresearcher wrote:
| Unless the goal of your queue is to avoid starving consumers.
| Then it should always be non-empty.
| ysavir wrote:
| The author has a good sentiment, but the wrong expression. It's
| not that "queues should be empty", it's that "you should be
| emptying your queues". Whether they reach an empty state is
| irrelevant; it's a Sisyphean task that should always be working
| _towards_ empty and, ideally, never actually reaching it.
|
| A good comparison is a restaurant's order queue.
|
| If the order queue is always empty, that means no one is ordering
| food at the restaurant. That is an undesirable state. We _want_
| people to be ordering food, an empty queue shows that we have a
| problem in getting people to eat at our restaurant.
|
| We don't want our queue to be always growing. If it's always
| growing, that means we're understaffed, and people waiting for
| their meals will get frustrated. Some will wait and leave a bad
| review, others might leave before they ever get their meal--and
| won't pay for it, either. Lost time and effort on our part.
|
| But an order queue that almost always has two meals in the making
| is golden. It means we're getting consistent clientele to dine at
| our restaurant, that meals are being cooked and served in good
| time, diners are happy with the experience, and our system
| operations are healthy.
|
| I think the author of TFA understands this, but stopped a little
| short of the optimal expression of their ideas. It's not about
| what you have, it's about what you should always be working
| towards.
| colechristensen wrote:
| You've got it all wrong. If you're running a restaurant and
| orders never back up then you have too much staff, too big a
| kitchen, and are losing money by having too much capacity. You
| want to have a lunch rush, for some people to leave because
| you're too busy. When the queue is backed up is when you're the
| most profitable both per hour and per unit sold. Then you want
| to find efficiencies to improve throughput at the times you're
| backed up.
|
| You need to intentionally design your bottlenecks not pretend
| that you can avoid them completely.
| jojobas wrote:
| Is this an attempt at reinventing queue theory?
|
| _In the US it was called queue theory, in the USSR it was known
| as mass service theory.
|
| The thing is, there were no queues in the US and no mass service
| in the USSR._
| tweaktastic wrote:
| I agree with the author on the point that queues are best for
| handling spikes. But when you are talking about scale, dealing
| with millions of messages in a short span of time for an extended
| period, queues can never remain empty. Empty queues in this
| situation would mean that your concurrency is high. When your
| concurrency is high, this results in too many concurrent
| connections to your database, which would end up creating
| bottlenecks for your system.
|
| The system should be designed to handle as many messages as
| possible while ensuring that your resources don't get
| overburdened. Empty queues where your messages keep getting
| rejected because your resources are being hammered are not
| necessarily the sign of an efficient system.
| roflyear wrote:
| > When your concurrency is high, this results in too many
| concurrent connections to your database, which would end up
| creating bottlenecks for your system.
|
| Can you expand on this?
| dmoy wrote:
| If you have a polling queue, as the number of workers grows
| too large, the overhead of maintaining connections from
| (worker) -> (stuff in the middle) -> (db or whatever keeps
| queue state) becomes too high. In the extreme case, you reach a
| point of diminishing returns, where adding more workers hurts
| both throughput and latency.
|
| That is - initially you can decrease latency by sacrificing
| utilization and just adding more workers that usually sit
| idle. This increases throughput during heavy load, and
| decreases latency. Until you can't, because the overhead from
| all the extra workers causes a slowdown up the stack (either
| the controller that's handling messages, or the db itself
| that's got the queue state, or whatever is up there). Then, as
| you add more overhead, you increase latency and decrease
| throughput because of the extra time wasted on that overhead.
|
| It depends on a lot of different variables.
|
| If you have work items that vary significantly in cost, it
| can add even more problems (i.e. if your queue items can vary
| 3+ orders of magnitude in processing cost).
| roflyear wrote:
| Well yes but you can solve that in a lot of ways.
| highspeedbus wrote:
| >Decoupling components
|
| >This way, the producer doesn't need to know anything about the
| consumer and vice versa.
|
| Both components need to agree on shared data, down to what each
| field actually means. The producer is just not saying what to do
| with the message.
|
| Take an example of processing a purchase order in multiple,
| asynchronous steps. While the webpage button spins, there's a
| whole dance of agreed messages and expected outcomes that needs
| to happen in order for it to work. And there is no decoupling in
| sight.
|
| My take is that "decoupling" is a vague word. One should design
| distributed systems assuming that coupling is manageable, not
| nonexistent.
|
| >you could submit the task and ACK immediately allowing the job
| to finish in the background.
|
| That's bad practice.
| ruduhudi wrote:
| Throttling is another valid use case. I've used a pub/sub for
| sending newsletter e-mails because we couldn't send too many of
| them at once, so the consumer would run at a fixed interval and
| always pick up a fixed number of messages.
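|
| A rough sketch of that pattern (hypothetical names, assuming a
| simple in-process queue rather than a real pub/sub client):
|
|     import queue, time
|
|     def drain(q, send_email, batch_size=50, interval=60.0):
|         """Every `interval` seconds, send at most `batch_size` messages."""
|         while True:
|             batch = []
|             while len(batch) < batch_size:
|                 try:
|                     batch.append(q.get_nowait())
|                 except queue.Empty:
|                     break
|             for msg in batch:
|                 send_email(msg)
|             time.sleep(interval)   # throttle regardless of backlog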
| smadge wrote:
| The most important reason for queues is the variability of job
| inter-arrival times and service times. If there were no
| variability, there would be no need for queueing.
| c3534l wrote:
| This is a somewhat uninsightful observation that only has people
| responding to it because of pedantic arguing. I don't think
| anyone even disagrees with the author, they're just arguing about
| words.
| ww520 wrote:
| Queues are great as a buffer to even out compute utilization.
| With an empty queue, most workers can be shut down after a while,
| saving on compute cost. When the queue grows and stays non-empty,
| more workers can be started up. The length of the queue can be
| used to estimate the number of workers to spin up, as a metric
| for real time capacity planning.
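|
| A toy version of that estimate (numbers and names are made up):
|
|     import math
|
|     def desired_workers(queue_len, per_worker_rate,
|                         target_drain_secs=60, min_w=1, max_w=50):
|         """Workers needed to drain the backlog within the target window."""
|         need = math.ceil(queue_len / (per_worker_rate * target_drain_secs))
|         return max(min_w, min(max_w, need))
|
|     # e.g. 1200 queued items, each worker handles 2 items/sec:
|     print(desired_workers(1200, 2))   # -> 10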
| Akronymus wrote:
| Queues also enable "easy" batching of jobs. (For a very limited
| set of problems)
| edejong wrote:
| Queues should have an ideal size based on your latency
| requirements.
|
| Say you have 100 requests coming in per second, and each consumer
| of your queue requires 0.3 seconds per request. Now you can
| optimise the number of consumers. If you chose 40 consumers, the
| probability of serving a request with an empty queue (within 0.3
| seconds) would be 17% (and > 0.3 seconds would be 83%). However,
| the total waiting time per request would be 0.1 second.
|
| With 80 consumers, the probability of an empty queue would be
| higher at 58%. However, most of those consumers are doing
| nothing, while your waiting time is only reduced by 0.1 second!
| Only a 30% improvement.
|
| (edit: probability of empty queue in second example is higher,
| not lower)
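|
| For anyone who wants to poke at numbers like these, a rough M/M/c
| simulation (my own sketch: Poisson arrivals, exponential service
| times, FIFO dispatch; not a verified reproduction of the figures
| above):
|
|     import heapq, random
|
|     def simulate(arrival_rate, mean_service, n_workers,
|                  n_jobs=200_000, seed=1):
|         rng = random.Random(seed)
|         free_at = [0.0] * n_workers       # when each consumer frees up
|         heapq.heapify(free_at)
|         t, total_wait, immediate = 0.0, 0.0, 0
|         for _ in range(n_jobs):
|             t += rng.expovariate(arrival_rate)   # Poisson arrivals
|             soonest = heapq.heappop(free_at)     # FIFO service
|             start = max(t, soonest)
|             total_wait += start - t
|             immediate += start == t              # served with no wait
|             heapq.heappush(free_at,
|                            start + rng.expovariate(1.0 / mean_service))
|         return total_wait / n_jobs, immediate / n_jobs
|
|     for c in (40, 80):
|         wait, no_wait = simulate(100, 0.3, c)
|         print(f"{c} consumers: avg wait {wait:.3f}s, "
|               f"no-wait fraction {no_wait:.0%}")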
| anonymoushn wrote:
| Can you state the rest of your assumptions? What is the
| distribution of the request arrival times?
| edejong wrote:
| Poisson. And 1.0 variance in request processing time.
| ThrustVectoring wrote:
| I'm not awake/invested enough to make up numbers and do the
| math here, but shouldn't the cost for provisioning workers and
| value provided at various latencies matter? Like, each worker
| you provision costs a certain amount and shifts the latency
| distribution towards zero by a certain amount, and in principle
| there should be _some_ function that converts each distribution
| to a dollar value.
|
| Like, suppose you had a specific service level agreement that
| requires X% of requests to be served within Y seconds over some
| time interval, and breaking the SLA costs Z dollars. Each
| provisioning level would generate some distribution of
| latencies, which can get turned into likelihood of meeting the
| SLA, and from there you can put a dollar value on it. And
| crucially, this allows the "ideal" amount of provisioning to
| vary based off the relative cost of over and under-
| provisioning; if workers are cheap and breaking the SLA is
| costly, you would have more workers than if the SLA is
| relatively unimportant and workers are expensive.
| edejong wrote:
| Yes, that's an interesting topic, especially given the
| prevalence of distributed compute nowadays and the rising
| awareness of cloud costs.
|
| In the end, the distribution is not really Poisson, of
| course. So, you might be interested in low pass filtering to
| elastically scale your provisioned workers. There is quite
| some theory about this, including sophisticated machine
| learned models to predict future load. But I digress.
| [deleted]
| unxdfa wrote:
| I think this could be better described as "queues should be
| heading in the direction of empty"
| remram wrote:
| If you look at a queue at any random time, it will necessarily
| have been added to at least as much as it has been drained over
| its lifetime; otherwise it would hold a negative number of items.
| unxdfa wrote:
| That's why you look at the average age of the items in the
| queue.
| remram wrote:
| That's a good measure (average latency) but that's very
| different from your earlier statement of "heading in the
| direction of empty".
| unxdfa wrote:
| Well not really. If the average age is increasing it's
| not heading in the direction of empty.
| remram wrote:
| You still haven't defined what "heading in the direction
| of empty" means. Like I said, your queue can't be
| decreasing in size over its history, and it cannot be
| decreasing in size most of the time either, if it started
| empty.
| kbrannigan wrote:
| I just realized how useful queues can be. Instead of letting the
| system fail under heavy load, you put the tasks in a holding
| place, and they will be processed later.
|
| Computers have spoiled us with instant and now.... But even a 1
| hour processing time is amazing if that task would take 3 days
| manually.
| AstixAndBelix wrote:
| I had a course about queue theory in my SWE degree. I left the
| course with a deep sense of dread because I knew most people
| studying CS and SWE in other universities would be oblivious to
| such a fundamental piece of knowledge.
|
| Basically everything that processes data has a queue attached
| to it, from your router to your scheduler. Not knowing anything
| about queue theory is such an insult to the profession of SWE.
| It's like being a Mech engineer and not knowing about control
| of dynamic systems.
| OJFord wrote:
| Why did you think you knew that? I'm also one of your supposed
| few to have studied it; I never thought that.
|
| The dread should have been that most companies/teams won't
| care, won't engineer on that level, will just set it up
| however and make it bigger/faster/more-consumed if it's too
| slow.
| setr wrote:
| In my uni, the class was more or less unadvertised and the
| professor largely ignored -- you had to go out of your way
| to take it (I was actively looking for poorly rated
| professors, because I figured they were usually rated such
| because they graded hard... which probably meant they cared
| more). When I tried to google the subject/code examples
| afterwards, I largely only found textbooks and one-off blog
| posts.
|
| Also a lot of GitHub repos with names like CS550...
|
| Only place I've seen it referenced since that class was:
|
| 1. HN periodically
|
| 2. A YouTube documentary about Disneyland fast pass queues
|
| 3. A business optimization consultancy
|
| Ultimately, a pretty niche subject area AFAICT. But
| implementing a fairly efficient queuing simulation seems to
| be pretty easy, and the subject seems to be stupid powerful
| and broadly applicable... so I've never understood why it's
| not more discussed
| OJFord wrote:
| Compared to what, as if say information theory comes up
| (explicitly) all the time? Seems like you could say that
| about any more theoretically oriented course.
| roflyear wrote:
| Except most times people don't implement the surrounding infra
| correctly.
|
| Let's say you consume a job and do something, but that something
| fails. You have to set it to be rescheduled somewhere. Let's say
| the thing writing jobs to the queue fails. That needs to be
| handled as well.
|
| "They will be processed later [if I use a queue]" is not true
| by default (without work and thought) and assuming so will get
| you in trouble, and will also make certain people on your team
| who do understand that isn't the case upset.
| yurishimo wrote:
| There are entire languages built upon this model! Anything
| using the BEAM is a great example. Erlang, Elixir, Gleam, etc.
|
| The runtime acts as a scheduler, and if a specific process is
| hogging too much time on the CPU, it will get paused and sent to
| the back of the run queue. This is all transparent (relatively)
| for the developer.
|
| Here's a video if you want to see an example:
| https://youtu.be/JvBT4XBdoUE
| denton-scratch wrote:
| The ideal queue length is one. That will minimize latency, and no
| poll will result in a nothingburger.
| roflyear wrote:
| But this is not true. "Always having something to process" is
| not the ideal length. It will cause major problems, though it
| is somewhat counterintuitive. See:
| https://blog.danslimmon.com/2016/08/26/the-most-important-th...
| denton-scratch wrote:
| I didn't say the ideal was "always having something to
| process"; I said the ideal was always having exactly one
| thing to process.
|
| Come on, it's an ideal; it doesn't have to be realistic.
|
| [Edit] Just read Slimmon's article. His argument is
| specifically about a queue served by a server that is under
| stress. His assertion is:
|
| "As you approach maximum throughput, average queue size - and
| therefore average wait time - approaches infinity."
|
| He's quite right (and yes, it's a bit counter-intuitive). But
| it's orthogonal to the matter of the ideal queue length.
|
| Thanks for the link.
| roflyear wrote:
| Right, but in reality you can't hit "always having exactly one
| thing to process" without maxing out your queue. It's impossible,
| because they are the same thing. "Always having something to
| process" means no idle time - which means your queue is forever
| long :) Aiming for that ideal is misguided (for lack of a better
| term).
| MaxGabriel wrote:
| Another benefit: queues can enforce ordering and limit
| concurrency.
|
| This can be helpful for avoiding resource contention, or hitting
| an API limit.
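|
| A minimal sketch of both properties with a worker pool (an
| assumed limit of 4 concurrent calls; items are dispatched in
| FIFO order):
|
|     import queue, threading
|
|     def worker(q, handle):
|         while True:
|             item = q.get()        # blocks; items leave in FIFO order
|             try:
|                 handle(item)      # at most N of these run at once
|             finally:
|                 q.task_done()
|
|     q = queue.Queue()
|     N = 4                         # concurrency cap, e.g. an API limit
|     for _ in range(N):
|         threading.Thread(target=worker, args=(q, print),
|                          daemon=True).start()
|     for i in range(10):
|         q.put(i)
|     q.join()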
| remram wrote:
| Queue can also allow batching. It might be that you need to
| consume or pack or ship N items at a time, otherwise you're
| losing money. Trying to aim for minimal latency would literally
| hurt your business.
|
| This article obviously has a very specific kind of
| workload/service/target in mind that it fails to define, and it
| overgeneralizes. This is bad advice in many situations that the
| author failed to imagine.
| edejong wrote:
| Queues should absolutely _not_ be empty. If they are often empty,
| you're overprovisioning your consumers. There is an ideal size
| to a queue, based on the balance equation
| (https://en.wikipedia.org/wiki/Balance_equation). Also study
| queueing theory if you want to know more:
| https://en.wikipedia.org/wiki/Queueing_theory.
| highspeedbus wrote:
| This only works for _very_ predictable workloads. Once your
| average input rate becomes even slightly larger than your
| consumption rate, your system is screwed. Overprovisioning
| ensures that spikes can be processed on top of the average rate,
| and that a generally increased load can be absorbed for some
| time.
| edejong wrote:
| True, my example is an oversimplification. In reality you
| want to account for rapidly and slowly changing demands. This
| can be solved with elasticity of systems or by
| overprovisioning. You can get deep, with time series
| analysis, low pass filtering and various metrics, such as
| quantile distributions... however, saying zero queue length
| is desirable is a gross oversimplification.
| colanderman wrote:
| Or use load shedding, in which case you want your queue
| nonempty to take advantage of negative spikes.
| clnq wrote:
| They should be empty if you wish to minimize latency. For
| example, when emergency room queues are full in hospitals, that
| is a very undesirable scenario. The same can be applied in
| software - there are parts of systems that should only use
| queues to handle spike workloads. And the longer these queues
| are on average, the worse the overall performance of the system
| is. Just like patient health outcomes drop with longer ER
| queues.
|
| And really, we need more context to say whether a queue should
| be empty, trend towards being empty, or towards always having
| tasks. I don't think it's possible to make a general
| prescription for all queues.
| remram wrote:
| They should be empty to reach the absolute minimal latency (i.e.
| the only latency is the time it takes to process the task itself,
| which you could call zero queueing latency). Different services
| have different latency targets.
|
| The right statement is not "they should be empty" but "they
| should be under the target latency", which might be 0 (then
| "the queues should be empty"), or might be some other target
| (then empty means overprovisioned resources).
|
| Everyone in this thread seems to be talking past each other
| because they imagine vastly different services (e.g. a shipping
| company vs. hospital emergency rooms).
| edejong wrote:
| What are spike workloads? Is this poisson? What's the
| distribution?
| clnq wrote:
| Distributions are useful for optimizing. But the decision
| whether a queue should generally be empty or have elements
| comes out of system requirements.
|
| Then you can employ statistics to plan what resources you
| will allocate to processing the queue.
|
| Though having capacity for Poisson event spikes isn't
| nearly the only way to ensure queues trend to zero
| elements. For example, you can also scale the workload for
| each queue element depending on the length of the queue.
|
| It's very popular in video game graphics to allocate a
| given amount of frame time for ray tracing or anti
| aliasing, and temporally accumulate the results. So we may
| process a frame in a real time rendering pipeline for more
| or less time, depending on the state of the pipeline. Say,
| if the GPU is still drawing the previous frame, we can
| accumulate some ray traced lighting info on the CPU for the
| next frame for longer. If the CPU can't spit out frames
| fast enough, we might accumulate ray traced lighting
| kernels/points for less time each frame and scale their
| size/impact.
|
| Statistical distributions of frame timings don't figure
| here much as workloads are unpredictable and may spike at
| any time - you have to adjust/scale rendering quality on
| the fly to maintain stable frame rates. This example
| assumes that we're preparing one frame on the CPU while
| drawing the previous one on the GPU. But there may also be
| other threads that handle frames in parallel.
|
| Other workload spikes could be predicted by things like
| Poisson distributions. But once again, the key takeaway is
| that you can't generalise this stuff in any practical
| application. It all depends on system requirements and
| context.
| colanderman wrote:
| Additionally, they _must not_ be empty to maximize throughput of
| consumers which benefit from batching. If there's only ever one
| item to grab, consumers cannot amortize fetches across a batch,
| which impacts throughput _without any inherent benefit to
| latency_.
|
| (Imagine sending a TCP packet for every individual byte that
| was enqueued on a socket!)
|
| Same applies if you have any sort of load shedding on the
| consumer side. If the queue is empty, it means you've shed load
| you didn't need to; when the next brief drop in load comes, you
| can't take advantage of it.
|
| (Fun analogy: this is why Honda hybrids typically don't charge
| their battery all the way while driving, to allow the chance to
| do so for free while braking.)
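|
| A sketch of the amortizing fetch (illustrative, using a plain
| in-process queue as a stand-in for the real consumer API):
|
|     import queue
|
|     def consume_batch(q, max_batch=100):
|         """Block for one item, then greedily drain up to max_batch."""
|         batch = [q.get()]                  # wait for at least one item
|         while len(batch) < max_batch:
|             try:
|                 batch.append(q.get_nowait())
|             except queue.Empty:
|                 break
|         return batch                       # process these in one go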
| plandis wrote:
| > If the queue is not empty, it means that whatever job is
| consuming from the queue is not keeping up with messages coming
| into it.
|
| You could conceivably have a non-zero queue size that
| asymptotically reaches some stable size over time. In such a case
| you are keeping up with messages but with some approximately
| static amount of added latency.
|
| I don't think I've seen this happen in practice but I think some
| systems can be engineered to be approximately stable (grows and
| shrinks but within predictable bounds)
| remram wrote:
| This is very common when you can adjust the consume rate. This
| happens in practice: serverless platforms scale, Kubernetes has
| HorizontalPodAutoscaler, you can manually grab more VMs, or IRL
| this might mean calling in more staff or at larger time scales
| hiring more people in your organization.
| fwlr wrote:
| Spiky workloads, safely persisting data, decoupling components,
| and enabling asynchronous processing are all things that can be
| provided by a stack or a heap as well. The call stack in any
| given programming language is a good example of a stack for
| handling spiky workloads, for instance. A min-heap is more
| flexible than a queue for handling async/decoupled/pubsub stuff
| (and arguably more correct for many use cases).
|
| The thing that queues are good for is enforcing order: when I get
| to the front of the queue and it's time to do my thing, I have a
| guarantee that anyone who entered the queue before me has already
| done their thing, and anyone who entered the queue after me has
| not yet had a chance to do their thing. This is a useful
| guarantee for tasks that have an order dependence, for example
| database operations looking to guarantee consistency.
| dylan604 wrote:
| Should this title be given some sort of limiter/qualifier as in
| "Messaging queues should be empty"? I come from the video
| production world where you have things like render queues. These
| are designed to utilize a limited number (due to sheer expense)
| of hardware instances; if they finished jobs so fast that the
| queue was always empty, that would indicate you have too many
| very expensive machines. A queue that is never empty, but is just
| finishing as new big pushes to the queue happen, would be the
| ideal balance.
| wielebny wrote:
| Also, water should be wet and SQL tables need to have primary
| keys.
| kilgnad wrote:
| He's right. You guys need to see the system in terms of its
| derivative: the rate at which data is produced and consumed.
|
| This article is saying you basically want an average negative
| velocity, where consumers eat data faster than producers make it
| on average.
|
| Any queue that is non-empty means the velocity is positive.
| Producers are making data faster than consumers can eat it, and
| you have a limited time window before the queue becomes full and
| your system starts dropping data.
|
| If you observe that your system never has a full queue but is
| constantly loaded with partially filled queues it means you built
| and tuned your system such that it operates on the border of
| failure.
|
| It means your system is continuously hitting unsustainable
| velocities all the time, but stops short of total failure (a full
| queue).
|
| Remember, think about the system in terms of its derivative. The
| average derivative must be negative for your system to even
| function. Any time a queue is partially or completely full, it
| means there was a period of positive velocity. These periods
| cannot exceed the amount of time the system is in negative
| velocity.
| HarHarVeryFunny wrote:
| It depends on what the queue is being used for.
|
| If a queue is just being used to decouple a producer/consumer
| pair, then "it should be [almost] empty" is a reasonable notion,
| assuming consumer/producer are compute bound and there's no
| fundamental reason why one should be faster than the other...
|
| Which brings us to the other type of use where things are being
| enqueued specifically as a means of _waiting_ for resources that
| are in limited supply: maybe just compute if these are
| computationally heavy jobs, or maybe I/O bandwidth, etc. In that
| case we expect the queue to often be non-empty, and there's
| nothing wrong with that, even if there's potential to throw
| money/hardware at it to get the queue size down to zero!
| orf wrote:
| No they shouldn't.
|
| The way I see it is that there are two opposing things you can
| optimise for that depend on queue depth: utilisation or latency.
|
| If you care about processing each message as quickly as possible
| then queues should be empty. This often requires a higher cost
| and lower utilisation as you inevitably have idle workers
| _waiting_ for new messages.
|
| If you care about utilisation, then you never want your queue to
| be empty while something is polling it. This might be some
| background task that runs on a GPU - every second that sits idle
| is wasted cash, and those tasks usually benefit heavily from
| batching inputs together.
|
| In this case you _want_ there to _always_ be something to read
| from the queue, and you shut it down the moment this isn't the
| case.
| mabbo wrote:
| If a queue isn't approaching empty, then its input is exceeding
| its output. Therefore its size is trending to infinity.
|
| Since we don't have infinite space, we can expect eventually to
| lose some messages in this scenario.
| [deleted]
| beardedetim wrote:
| I think this is a valid concern for Future Us to worry about.
| If you have alerts for when this is going to happen, you can
| punt the issue until it starts to become a concern and who
| knows if your product will be alive when that issue happens.
| londons_explore wrote:
| M/M/1 queueing theory doesn't apply to real queues... because
| the work input into the queue usually _does_ depend on the
| latency of previous work completed by the worker.
|
| If there is a 1 hour queue for the checkout at a grocery
| store, you probably won't join the queue - you'll go to a
| different store...
| lokar wrote:
| In many systems you get the reverse. A client sends a retry
| without the ability to cancel the first request, now it has
| two requests in the queue.
| User23 wrote:
| Little's law[1] offers considerable guidance here. In the
| shortish term it's safe to model the system as being
| stationary and then you need to provision for seasonality[2].
| Upward or downward trending demand after correcting for
| seasonality indicates that the system isn't stationary and
| your choices are either to scale up or reject requests.
| Nevertheless the optimal queue size is not necessarily zero
| except in cases where extremely low latency is necessary. The
| optimal queue capacity for most web service type applications
| is one that trends toward almost but not quite full when
| seasonal requests are higher.
|
| [1] https://en.m.wikipedia.org/wiki/Little%27s_law
|
| [2] broadly construed. For example a service that gets a mass
| of requests at the top of every hour displays seasonality.
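|
| For concreteness, Little's law is just L = lambda * W; a quick
| back-of-envelope with illustrative numbers:
|
|     arrival_rate = 100     # requests/second (lambda)
|     time_in_system = 0.4   # seconds per request (W)
|     L = arrival_rate * time_in_system
|     print(L)               # -> 40.0 requests in the system on average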
| klyrs wrote:
| > If a queue isn't approaching empty, then its input is
| exceeding its output. Therefore it's size is trending to
| infinity.
|
| This is literally the slippery slope fallacy. You aren't
| accounting for minor fluctuations or finite timescales. If
| you were driving a car, you might say "if you aren't steering
| into oncoming traffic then you're steering towards the ditch"
| and conclude that no cars will ever safely reach their
| destination.
|
| If the only valid state for a queue is empty, then why waste
| time implementing a queue?
| heipei wrote:
| We use queues for various jobs, some of which are time-
| sensitive (e.g. emails), so the queue should be empty most
| of the time, some of which are background or re-processing
| (e.g. re-indexing), so queues can balloon and then take
| days to drain again.
|
| The unifying aspects of using queues for us is that it
| allows us to load-balance jobs across workers, and allows
| us to monitor throughput in a centralised fashion. We can
| set alerts on the time-sensitive queues and then use
| different thresholds for the background queues, but we're
| using the same metric data and same alerting system.
| teolandon wrote:
| GP said "approaching empty", meaning it should be empty in
| the long term. It's not a slippery slope.
| camdenreslink wrote:
| It assumes the velocity of items being added to the queue
| never changes. In real life it speeds up (approaching
| infinite items), and it slows down (approaching empty). A
| more meaningful measurement would be something like
| average number of items in the queue and average time in
| the queue for any given item.
|
| If you feel like you have enough spare capacity and any
| given item isn't taking too long to process then it
| doesn't matter if you are ever empty.
| Espressosaurus wrote:
| You're on the right track, but I personally find averages
| to be not a useful measure...on average.
|
| Corner conditions are where I start to care:
|
| * What is the worst case latency? How long until the work
| can be done? How long until the work can finish from the
| point at which it enters the queue?
|
| * What is the worst case number of items in the queue?
| How large does the queue need to be?
|
| * What is the maximum utilization over some time period
| with some granularity? How spiky is the workload? A queue
| is a rate-matching tool. If your input rate exactly
| equals your output rate at all times, you don't need a
| queue.
|
| * What is the minimum utilization over some time period?
| Another view on how spiky the workload is.
|
| Minimums and maximums I find are much more illustrative
| than averages or even mean-squared-errors. Minimums and
| maximums bound performance; an average doesn't tell you
| where your boundary conditions are.
|
| In general you don't want your queue to fully fill up,
| but like the other poster said, it's some tradeoff
| between utilization and latency, and the two are
| diametrically opposed.
| camdenreslink wrote:
| Sure, I guess by average I just meant shorthand for some
| sort of measurement for making decisions about resources
| allocated to your queues. Median, mean, P99, whatever is
| useful.
| orf wrote:
| > Since we don't have infinite space, we can expect
| eventually to lose some messages in this scenario.
|
| Sure, that's technically correct but applies to basically
| everything. It is very likely your users/orders/whatever
| table in your database is also "trending to infinity" over
| time, except this doesn't mean databases should always be
| empty or that we should expect to eventually lose some
| users/orders/whatever.
|
| Or, more succinctly, if your queue capacity is 10 million
| messages and your queue messages represent "houses purchased
| through our website", then in reality your queue capacity is
| infinite because nobody will ever purchase 10 million homes
| through your application per small unit of time.
| scubbo wrote:
| Your response is correct, but there's a more fundamental
| flaw in the preceding comment: although both paragraphs are
| true in isolation, the implication between them is false.
| "The queue is _currently_ trending to infinity" does not
| imply "The queue will _continue_ to monotonically trend to
| infinity". Periodicity is commonplace in real-life
| situations; and while parts of queueing theory produce some
| helpful insights by ignoring it, care should be taken when
| applying them.
| mecsred wrote:
| This is so academic. Take any two samples; slope > 0 => we're
| going to be dropping messages!!! Slope < 0 our threads are
| idle!!!
| vhcr wrote:
| The growth rate of a queue is not constant, it could get
| bigger during the day, but smaller at night, since customers
| are not using the service.
| [deleted]
| derefr wrote:
| What about a queue that is the ingress point for a fan-in
| between O(N) active continuous producers to a single active
| continuous consumer?
|
| In such a case, you'd expect at least N messages in the queue
| at all times -- and often a small multiple of N. Not because
| the consumer can't consume those messages in a timely manner,
| but because, given a reliable queue, messages can't be
| consumed until they've been 2PCed into the queue, and that
| means that there are always going to be some messages sitting
| around in the "acknowledged to the producer, but not yet
| available to the consumer" state. Which still takes up
| resources of the queueing system, and so should still be
| modelled as the messages being _in the queue_.
| remram wrote:
| > If a queue isn't approaching empty, then its input is
| exceeding its output.
|
| It depends what you mean by "approaching empty". If you mean
| "empty most of the time", then no, that's wasted resources.
| If you mean "its size is decreasing most of the time", then
| no, not possible. Even if you mean "it is empty some of the
| time", then maybe, but that is not a strong requirement
| either.
|
| Of course more average input than average output is bad, but
| stating it as "should be empty" or "should be approaching
| empty" seems to suggest the wrong thing entirely. It should be
| _smaller than the target latency of the process_.
| Smaug123 wrote:
| Eh? It is possible for size to be decreasing most of the
| time. Just make traffic extremely bursty - add 60 on the
| hour every hour, and consume at one per minute.
| remram wrote:
| What does average mean there? Increasing and decreasing are
| point-in-time events. If you mean the 1-minute average can be
| negative most of the time, that is true, but it would not work
| for other time horizons, whether longer or shorter.
| kqr wrote:
| This is true only when you have relatively steady flows, or
| infinite multiprocessing. In that case, yes, the queue is in
| practice either "almost empty" or "infinite".
|
| I think the reason you get the responses you do is people are
| used to systems with high variation in flow rates that can
| only process a limited number of items at the same time, and
| for these, queues can soak up a lot of variability by growing
| long and then being worked off quickly again.
| [deleted]
| ghusbands wrote:
| > should always be full
|
| > _always_ have something to read from the queue
|
| In the first case, you've gone a bit far. To maximise
| utilisation, you want the queue to never be empty when polled.
| orf wrote:
| Indeed, and this is what I meant but worded poorly.
|
| Obviously you don't want it to be 100% full, but you do want it
| to be the kind of full you get after a nice meal, where you still
| have room for dessert and a coffee.
| roflyear wrote:
| Good reading here:
| https://blog.danslimmon.com/2016/08/26/the-most-important-th...
|
| You really don't want your queue more than 70-80% utilized.
| Once it gets to 75% you should consider that "red zone" and
| you need to fix it somehow. Once the queue is 80% you're
| really hitting into some issues. Above 80% and you will
| have significant problems.
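|
| The intuition behind those thresholds, using the simplest M/M/1
| model (a sketch; real queues vary): the average number of items
| in the system grows like rho/(1-rho) as utilization rho
| approaches 1.
|
|     for rho in (0.5, 0.7, 0.8, 0.9, 0.95, 0.99):
|         print(f"utilization {rho:.0%}: "
|               f"avg items in system ~ {rho / (1 - rho):.1f}")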
| orf wrote:
| Thanks for this, that has helped me see something I
| missed.
|
| If you're dealing with something like SQS and are
| receiving variable sized batches of messages, the
| consumers can be classed as "100% utilised" (always
| processing a batch of messages with no empty receives)
| whilst the queue would not be considered 100% utilised
| (the average delivered batch size is always less than the
| max).
| roflyear wrote:
| What issue are you trying to solve? The expense of
| polling?
|
| Obviously there is a line where you do not want a bunch
| of expensive consumers hanging around - but this isn't
| the place IMO to micro-optimize your costs. You shouldn't
| have like 20x the consumers you need but you shouldn't
| worry about having EXACTLY the correct amount.
|
| In your example, are you implying that if more jobs come
| in more workers get spawned? Is that specific to the
| Amazon service?
| orf wrote:
| If a process is busy processing a task, it is busy.
|
| A process could receive N tasks in a batch and process
| them at the same time.
|
| In this scenario it is possible for all workers to be
| utilised whilst the queue is not "100% utilised" - which
| is a failure mode the link you shared explained:
|
| > And by induction, we can show that a system that's
| forever at 100% utilization will exceed any finite queue
| size you care to pick
|
| The inflows to the queue are less than the outflows,
| despite every worker spending 100% of its time processing
| tasks.
|
| It is specific on the utilisation type though, I'm mainly
| thinking about ML inference that has a fairly fixed
| initial cost per inference but scales quite nicely with
| >1 input.
| roflyear wrote:
| If your queue is always full (100% utilization) your jobs will
| on average never be processed.
|
| Love the downvotes! People should read this:
| https://blog.danslimmon.com/2016/08/26/the-most-important-th...
|
| I understand this is a little counterintuitive but it is true.
| You should not have a queue that always has a job ready to be
| processed.
| adrr wrote:
| The queue isn't full. It isn't empty, and things are being
| processed, just not in "real" time. I've always run my queues where
| they are never empty. If queue depth hits certain levels,
| workers are scaled up. When it drops, they are scaled down.
| Last thing I want is under utilized instances.
| roflyear wrote:
| Unsure what you're saying. If your queue is NEVER empty
| wait times will grow to infinity. You shouldn't aim for
| this. You should aim for something reasonable like, "empty
| 50% of the time" etc..
| adrr wrote:
| They haven't been empty for two years, and my waiting times
| average 5 minutes (30 minutes during high-load times). I don't
| understand how they will grow to infinity.
| [deleted]
| kilgnad wrote:
| Wrong. They should.
|
| Any queue that fills up, say halfway, is an indication the
| system is in a precarious state.
|
| It's simple: queues have limited space. If the velocity of data
| going in is faster than data going out, the queue will fill up
| until it's full.
|
| A half-full queue indicates the velocity is in such a state, and
| such a state is unsustainable. You need to build your system so
| that it stays out of unsustainable states as much as possible.
|
| If you're finding that your queue fills up halfway a lot, that
| means you're building your system in such a way that it operates
| at the border of sustainability. In other words, the system often
| ingests data at rates that cannot be sustained indefinitely.
|
| If a system with a queue half full operates in that same state
| twice as long, you now have a full queue and you're losing
| data.
|
| In essence your queues should mostly be empty most of the time.
| At best you can account for occasional spikes as the article
| stated but a queue even half full should never be the
| operational norm.
| zokier wrote:
| > You need to build your system such that it stays out of
| unsustainable states as much as possible.
|
| You assert that without justification. Typical counter-
| example would be system where you have predictable daily
| load-pattern, it can be perfectly reasonable to build the
| system that accumulates items during peak hours and clears
| queues during quiet hours.
| orf wrote:
| This response lacks nuance and focuses too much on the
| operational side of queues.
|
| Not all queues are "realistically bounded". Let's take a
| hypothetical silly "order queue" where each entry represents
| a EUR10 item that's been purchased. The queue feeds into a
| boxing machine that puts the item in a box and posts it to
| the user.
|
| The point at which a queue hits an upper bound would
| represent billions of euros of orders. That's not really a
| realistic failure mode to hit without noticing long
| beforehand.
|
| Meanwhile, if you always want your queue empty then your
| boxing machine is going to be shipping thousands of small
| boxes because the moment an order arrives, it's processed
| into a box.
|
| It's much more efficient to have a small, managed and time-
| bounded backlog of orders that the boxing machine uses to
| batch orders together in a smaller number of larger boxes.
| kilgnad wrote:
| Wrong. Your thinking is where the nuance is lacking. As I said,
| you need to think of the system in terms of its derivative. The
| rate of data consumed must on average be faster than the rate of
| production. The net difference of these rates must equal a
| negative velocity, with the rate of data going in as a positive
| number and data consumed as negative.
|
| There is no system on the face of the earth that can
| operate at an average positive velocity. That is
| categorical failure.
|
| This is mathematically and logically true no matter how big
| your queue is.
|
| Even for the situation you're describing... a really big-ass
| queue, the average velocity of that system must still be
| negative. Unless your queue is so big that it can withstand an
| average positive velocity over the lifetime of the system, which
| is kind of absurd.
|
| Most of the time your queues should be empty. If your
| queues are observationally often filled with data but not
| full it means you're still operating at a negative velocity
| but you've tuned your system to operate at the border of
| failure.
|
| It means your system hits unsustainable velocities often. You may
| want this for cost reasons, but such a system is not robust
| enough for my comfort. The queue should be empty most of the
| time, with occasional spikes at most.
| orf wrote:
| This is a very long-winded way of saying "you should have
| enough capacity to process all items in the queue before
| the queue fills up and everything explodes".
|
| I agree. But that's not the argument the post is making,
| nor does it support the position that "queues _should_ be
| empty". No, they shouldn't. It's fine to let a queue
| build up (and thus not be empty), then process all the
| items in the queue in a short time frame. There are
| efficiency savings if you are able to do this.
| kilgnad wrote:
| No. That is not what I'm saying. I'm saying on average
| queues must be empty most of the time. And if you are
| seeing queues that are filled most of the time, your
| entire system is operating on the border of failure.
|
| This is true even when you have a big-ass queue. The system must
| on average be more empty than it is filled.
|
| This is not to say that your system can never have spikes in
| data. Your system can often have spikes, but this cannot be the
| norm: the spikes in the data production rate cannot push the
| average production rate above the average consumption rate.
| orf wrote:
| I don't see how this holds true. Let's say I have a
| process that takes 1 hour to process a _batch_ of items
| up to 1000 (I.e doesn't scale linearly with input size).
| So, 1 hour to process 1 item and the same 1 hour to
| process 1000 items.
|
| If I have 1 message per minute being added to the queue,
| I can safely run 2 of these batch processes once a day to
| clear the queue. Or more generally can launch (queue_size
| / 1000) tasks once a day.
|
| Because surely it doesn't matter about the queue being
| empty, what matters is the _rate at which messages can be
| processed_.
|
| With 2 processes running once a day, the rate would be
| 2000/(24*60) ≈ 1.4 messages per minute in aggregate.
|
| The queue can be filled 23 hours per day without issue
| and without being on the brink of a failure state.
|
| Edit: you edited your comment whilst I was replying to
| specifically mention consumption rate. The aggregate
| consumption rate is all that matters, but it doesn't
| follow that the queue "has to be more empty than not
| empty". Those are different issues.
| kilgnad wrote:
| You're talking about a different scenario. I'm talking
| about continuous inflow and outflow. Where outflow is
| processing (fixed rate) and inflow is traffic (variable).
| This is the classic assumption. In this state the space
| used in the queue is an indicator of mismatched
| consumption and production velocities.
|
| You're talking about a single batch job that runs once a day and
| empties the queue at the same rate no matter how much data is in
| the queue. The more data you grab in a batch, the more efficient
| the processing, so you grab data once per day instead of
| continuously. This is a clear exception from the assumed norm
| that needs to be mentioned.
|
| That is not to say that your case is invalid. Many analytics
| databases have exactly your ingestion parameters. But the case is
| exceptional enough that, if not explicitly mentioned, it can be
| assumed that such a batch job is not a primitive in the model
| being discussed.
| orf wrote:
| I disagree it's exceptional even if my examples have been
| caricatures.
|
| Very few technical problems don't have any elasticity in
| outflows, but a lot of problems do benefit from a form of
| batching. Heck, even something as simple as inserting
| records into a non-analytic database benefits greatly
| from batching (inserting N rows rather than 1) and thus
| reducing TPS, or calling an external service, or writing
| files to some storage, or running inferences, or writing
| to a ledger, or anything.
|
| Batching, in all forms, exists absolutely everywhere from
| super-scalar CPU uops to storage to networking to the
| physical world. The _exception_ is to do something
| without batching, even if the whole stack you're using
| from the ground up is built on it.
|
| Given the common use case of batching, which inherently
| requires items to be queued for a period of time, how can
| you say that queues should always be empty? They would
| statistically rarely be empty. Which is a good thing.
| Because it enables batching.
|
| If you want to queue and insert 1000 records into a
| database with 1000 transactions across 1000 workers, then
| say your workload is highly optimised because there are
| never any records in your queue, cool. Not very true
| though.
| kilgnad wrote:
| >Very few technical problems don't have any elasticity in
| outflows,
|
| They have elasticity, but you don't care for it: you run it at
| the maximum. As you mentioned, the only case where you wouldn't
| is if you have some transaction that doesn't scale linearly with
| frames/events.
|
| >Given the common use case of batching, which inherently
| requires items to be queued for a period of time, how can
| you say that queues should always be empty? They would
| statistically rarely be empty. Which is a good thing.
| Because it enables batching.
|
| In my experience, while batching is not uncommon,
| streaming is the much more common use case. So we can
| agree to disagree here.
| ndriscoll wrote:
| It's not really that exceptional. You can do similar
| things with e.g. batch flushes every 5 ms for synchronous
| web requests and the same ideas will apply. This can give
| a massive increase in throughput while possibly
| _decreasing_ overall response time for OLTP.
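|
| A toy version of the 5 ms flush with asyncio (my sketch;
| insert_many stands in for whatever multi-row write you have):
|
|     import asyncio
|
|     pending = []   # (row, future) pairs queued for the next flush
|
|     async def flusher(insert_many):
|         while True:
|             await asyncio.sleep(0.005)          # 5 ms window
|             if not pending:
|                 continue
|             batch, pending[:] = pending[:], []  # take the whole backlog
|             await insert_many([row for row, _ in batch])  # one query
|             for _, fut in batch:
|                 fut.set_result(None)            # unblock each request
|
|     async def save(row):
|         fut = asyncio.get_running_loop().create_future()
|         pending.append((row, fut))
|         await fut                               # waits at most ~5 ms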
| kilgnad wrote:
| It's common, but exceptional enough that it should be
| mentioned because it is a specialized database that does
| batch ingestion.
|
| Let's say you flew from one city to another. People
| assume you took a plane. If you took a helicopter, though
| helicopters are common, it should be mentioned otherwise
| it won't be assumed.
| ndriscoll wrote:
| It doesn't take a special database though. You can do
| this sort of thing in your application in like 10-20
| lines of code (assuming you have a framework with queues
| and promises). It works great with mysql/postgresql which
| both benefit dramatically from batching queries.
| kilgnad wrote:
| The context here is external queues like Kafka. If you're
| writing stuff to a queue in your web app, then it must be
| a relatively small load. This is different.
|
| For analytics databases the batch ingestion requirements
| are usually so big that you have to save to the file
| system, hence the need for an external queue.
|
| Heap-allocated queues are different; at that stage I
| would say the data is in "processing" already, popped
| out of the external queue.
| ndriscoll wrote:
| I guess "relatively small load" is... relative, but I've
| written that kind of thing to be able to handle ~70k
| sustained OLTP requests per second (including persistence
| to postgres) when load testing locally on my laptop. In
| any case the same thing applies to external queues. Your
| workers will often be more efficient if you pull off
| chunks of work to process together.
| kilgnad wrote:
| By small load I mean size. The frames in your batch jobs
| are tiny and few in number if you're getting 70k batch
| jobs per second.
|
| It's more similar to streaming... what you're doing here.
| In that case my velocity measurements are more
| applicable. You want your queues to be empty in general.
| A batch job is run every hour or something like that
| which is not what you're doing here.
|
| If you run your load test for 5 minutes and see that your
| queues are 50 percent full, that means 10 minutes in
| you'll hit OOM, assuming your load tests run at a
| constant rate.
| Xylakant wrote:
| Queues can be full most of the time without the system
| being in a precarious state. Take for example an
| extremely spiky pattern of processing: once a day, a big
| batch of jobs is delivered. You know ahead of time how
| many items there are and have a pretty good handle on how
| long it takes to process one item. The items should be
| processed before the next batch arrives. You can then
| tune your system to empty the queue just before the next
| batch arrives. The queue is non-empty the majority of
| time, yet the system is perfectly stable.
|
| Now, that is an extreme example, but similar behavior
| exists. It's fine to have queues partially filled most of
| the time, as long as your average rate of processing
| exceeds the rate of incoming items by some margin.
| kilgnad wrote:
| As I said in the other post, I am talking about the
| average and most likely use case: a flat and continuous
| rate of data processing and varied rates of data
| production.
|
| Spiking the processing is not what people think about
| with queues and is most likely NOT what the author of the
| article is talking about.
|
| Typically there's no reason why you would spike your
| processing unless such processing doesn't scale linearly
| with the amount of items processed (as the author in a
| sibling post to yours mentioned). Such a case must be
| deliberately introduced into the discussion as an
| exception to the model because it's not what people think
| about when discussing this. Or maybe it is given that 3
| people brought the same exact exception up. Welp if
| that's the case then I can only say the author of this
| article and I certainly aren't referring to this use
| case.
|
| edit: I'm rate limited. So I'm referencing another post
| in my response that you won't see until the rate limit
| dies and I post it.
| zokier wrote:
| > You may want this for cost reasons but but such a
| system is not robust enough for my comfort
|
| Your comfort is secondary to the business needs that the
| system is designed to fulfill. In practice some risks are
| better to accept than mitigate
| kilgnad wrote:
| My comfort is a measure of business aversion to risk. It
| depends on the business of our respective industries.
|
| Either way you have admitted that you tuned the system to
| operate with an acceptance of risk. Likely for cost
| reasons. This is fine, if you deliberately chose to do
| this.
| zokier wrote:
| > This is fine, if you deliberately chose to do this.
|
| Yet you bravely asserted that
|
| > At best you can account for occasional spikes as the
| article stated but a queue even half full should never be
| the operational norm.
|
| Suddenly this thing that you said "should never be"
| changed to "is fine"
| kilgnad wrote:
| When humans communicate, we don't talk about exceptions
| where you deliberately choose to operate outside of the
| norm. If you deliberately choose to drive a car without a
| seat belt, knowing the risk, that is fine.
|
| You want to sky dive while deliberately choosing not to
| use a parachute? That's fine. But don't expect me to
| adjust my communication to account for your exception.
| You chose to do this.
|
| But let's not pretend that you don't know about this fact
| in human communication. It's an obvious thing you know.
| Yet you chose to characterize your post in a way that
| negatively makes my statement look unreasonable and your
| actions look normal. Please don't do this. If you choose
| to sky dive without a parachute, that's fine, but admit
| you are operating with risk for the sake of cost. Do not
| try to mischaracterize my intentions; that is just rude.
| dooglius wrote:
| You are making a somewhat subtle error here, which is
| assuming that the (output_rate-input_rate) is statistically
| independent of the size of the queue. Note that this does not
| necessarily imply that the output process changes based on
| the size of the queue. For example, consider a microphone
| that emits 100 raw samples to a buffer per ms and a real-time
| thread scheduled every 2ms that processes very quickly. a
| 300-sample buffer is more than enough here, even though the
| buffer will be over half full about a quarter of the time,
| and will never be empty. In this case, the velocity
| conditional on size being less than half is -100 samples/ms
| and the velocity conditional on size being greater than half
| is 400 samples/ms.
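|
| A quick numeric check of that example (0.1 ms steps, idealized
| constant rates):
|
|     buf, over_half, STEPS = 0, 0, 10_000
|     for step in range(STEPS):
|         buf += 10                  # producer: 100 samples/ms
|         over_half += buf > 150     # more than half of a 300 buffer
|         if step % 20 == 19:        # consumer wakes every 2 ms
|             buf = 0                # and drains "very quickly"
|     print(over_half / STEPS)       # -> 0.25; buffer peaks at 200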
| kilgnad wrote:
| This example suffers from the same problem as the other counter-
| examples. Why are you deliberately slowing down the processing to
| only happen at 2ms intervals? If you have continuous processing,
| the queue will always be empty, assuming processing is faster
| than data production.
|
| You're maybe referring to what the other person mentioned...
| processing that does not scale with the amount of data processed,
| so you want the processing to operate on the biggest batch of
| data possible. This is, while a common use case, not common
| enough that it should be assumed without being mentioned.
|
| However, your example seems to be happening on the same computer
| (you mentioned a mic), and thus not on the web, which is the
| assumed context here, so the data production rate doesn't seem to
| spike. If your processing is faster than data production, you
| don't even need a queue: just stream each frame straight to
| processing in a single thread. But this also assumes the nature
| of your processing. The details of the processing can change
| things to require a queue, e.g. for averages of frames at
| discrete intervals... but that's different from the use case of
| queues like Kafka or ZeroMQ, which I think is the main context
| here.
|
| Edit: I reference other responses here, but you won't see
| them immediately. I'm currently rate limited and I have
| those replies in a queue so those will be posted later
| today.
| marcosdumay wrote:
| Your comment only shows that if you value resource utilization,
| you shouldn't use queues.
|
| Because, no, nothing you said changes the fact that queues only
| do anything when kept mostly empty. And the only thing they do
| is some short term trade of latency for reliability. They
| actually can't change utilization.
| orf wrote:
| I'm currently waiting for a flight. Isn't an airport a
| kind of queue that buffers people over time, then delivers
| them to the sky at steady, fixed intervals?
|
| This airport seems busy but the planes seem full, how can we
| improve utilisation?
| Jtsummers wrote:
| What does it mean to improve utilization if the aircraft
| are at 100% utilization?
| orf wrote:
| Bigger aircraft I guess. But the point is that we are at
| full utilisation despite the "queue" (airport) never
| being empty, and thousands of people awaiting "delivery".
|
| Which refutes the original point about queues not doing
| much unless they are empty and being unable to affect
| utilisation. If you have a somewhat inelastic output that
| benefits from batching, but has variable inputs, they are
| absolutely crucial to utilisation.
| Jtsummers wrote:
| I'm being nitpicky but my intended prompt was to get you
| to define what needs to be improved, which you did. You
| need to improve (or increase) capacity, not utilization.
| The two are related, but they aren't the same.
| Utilization being at 100% means it only has two ways to
| go: Stay at 100% or drop below 100%. Which one
| constitutes an improvement? It depends, but for the
| airline utilization at 100% is already what they want (by
| the numbers, should be their maximum profit) so it can't
| be improved. Only capacity can be improved.
| smetj wrote:
| I came here to say the same and saw your comment. Totally
| agree.
| matchagaucho wrote:
| _as you inevitably have idle workers waiting for new messages._
|
| This is no longer the case using the latest in serverless IaaS
| architecture, where the pool of queue workers dynamically
| expands to meet demand.
| orf wrote:
| This has always been a thing, far far before the "latest in
| IaaS architecture". The idle polling still exists except they
| manage it for you. Their margin is then improved if they
| intelligently reduce wasted polls through a variety of
| techniques.
|
| However, yes, you don't pay for this or care about it much.
| It's good.
| atdixon wrote:
| You're assuming the consumer is the one producing batches.
| anonymoushn wrote:
| Seems like there's some implicit assumption in this comment
| that everything is rented by the minute and latency never
| matters. Renting things by the minute isn't really the cheapest
| way to buy lots of things though. Also, it's sometimes nice to
| respond to the content of the article instead of its title.
| orf wrote:
| I think the implicit assumption in the comment is that it
| would be possible for readers to understand the underlying
| idea of the comment rather than only see the specific
| example. "resource cost per minute" is not the only thing
| helped by a non-empty queue.
|
| Also please don't accuse someone of not "responding to the
| content of the article" when the content is a bog standard
| list. If you want my response to that: "yes, that is indeed a
| list of 4 things queues help with". Not sure it adds much to
| the discussion.
| anonymoushn wrote:
| When will my users be happier because I took longer to do
| things on their behalf? I apologize for not understanding
| the unwritten portion of the comment.
|
| I guess I should really ask, why do people care about
| utilization instead of throughput, latency, and cost? We
| could increase utilization by rewriting everything to be
| more wasteful, for example, but it doesn't seem like we
| should do it.
| zmgsabst wrote:
| Utilization is choosing throughput and cost over latency.
|
| People care about utilization when, e.g., processing
| petabytes of data -- something that takes hours to days,
| where you're optimizing batch throughput and cost by
| making sure you efficiently utilize your hardware.
| anonymoushn wrote:
| > Utilization is choosing throughput and cost over
| latency.
|
| For a given rate of incoming work, increasing throughput
| decreases utilization. So I guess it's not choosing
| throughput too much.
|
| > People care about utilization when, e.g., processing
| petabytes of data -- something that takes hours to days,
| where you're optimizing batch throughput and cost by
| making sure you efficiently utilize your hardware.
|
| If you have fixed resources and lots of work, it would be
| good to have nonempty queues to prevent them from idling.
| That is maybe the big unwritten part of the comment
| upthread. It doesn't seem worth saying "no, the article
| is wrong in all cases because of this particular case!"
| though, so I'm not sure.
| remram wrote:
| If you define throughput as "processed items over time
| per processor", the tradeoff becomes clear. Given the
| rate of incoming requests, you can get more processors,
| which will increase cost and reduce latency and
| throughput-per-processor, or you can get fewer
| processors, which will decrease cost and increase latency
| and throughput-per-processor.
|
| If you have fixed resources and fixed work, wanting your
| queues to be empty is a wish, not a strategy.
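|
| A back-of-the-envelope sketch of that tradeoff (all numbers
| invented for illustration):
|
|     arrivals = 90.0                 # requests/s coming in
|     per_proc_rate = 10.0            # requests/s one processor handles
|     for procs in (10, 20):
|         utilization = arrivals / (procs * per_proc_rate)
|         print(procs, utilization, arrivals / procs)
|     # 10 procs: 90% utilization, 9.0 req/s each, longer queues
|     # 20 procs: 45% utilization, 4.5 req/s each, lower latency, 2x cost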
| Kinrany wrote:
| Isn't a full queue basically equivalent to no queue? The same
| number of requests is being dropped.
| WJW wrote:
| Not all queues serve (soft) real time requests and not all
| queues are bounded (1). If the rate of incoming queue items
| is predictable but not constant (for example, average load in
| the weekend or at night tends to be lower for many systems)
| you can often get away with having enough capacity to serve
| the average load while using the queue as a buffer. This is
| fine if the work done is not time critical, like sending most
| types of marketing emails.
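|
| A toy model of that pattern (made-up numbers): capacity is
| sized for the average load, and the queue absorbs the busy
| hours.
|
|     day = [150] * 12 + [50] * 12   # arrivals/hour: busy, then quiet
|     capacity = 100                 # jobs/hour we can process
|     backlog = 0
|     for hour, arrivals in enumerate(day):
|         backlog = max(0, backlog + arrivals - capacity)
|         print(hour, backlog)       # climbs to 600, drains to 0 overnight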
|
| You might argue that a queue with many items in it which can
| still accept more items is not really "full", but it is
| clearly not empty either.
|
| (1) Obviously all queues are bounded by the capacity of the
| hardware, but since most queueing software can persist to
| disk and/or lives in the cloud with elastic storage this is
| not usually a problem.
| remram wrote:
| The difference is that the requests that are being served are
| getting extreme latency. You are serving the same number of
| requests, but every single one is X days old.
| yxhuvud wrote:
| No, there are use cases where new entries are not ingested
| by requests -- sometimes you know all the work that needs
| to be done upfront. And sometimes you have a few requests
| coming in over time, but at the same time occasionally
| need to go through everything you have seen earlier once
| more and flush it through the queue.
| gillh wrote:
| HN: our team has been working on an advanced API queue/load
| management system called Aperture[0]. It has the following
| capabilities:
|
| 1. Limits concurrency based on latency gradient or queue depths
| etc.
|
| 2. Weighted fair queueing scheduler - prioritized load shedding
| based on workload priorities and latency estimation.
|
| Anyone interested in Little's Law, token buckets, network
| schedulers, control theory (PID controllers) should definitely
| check it out! [1]
|
| [0]: https://github.com/fluxninja/aperture
|
| [1]: https://docs.fluxninja.com/
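|
| For anyone who hasn't met it, the token-bucket idea
| name-checked above looks roughly like this (the textbook
| algorithm, not Aperture's actual API):
|
|     import time
|
|     class TokenBucket:
|         def __init__(self, rate, burst):
|             self.rate = rate              # tokens refilled per second
|             self.burst = burst            # bucket capacity
|             self.tokens = burst           # start full
|             self.last = time.monotonic()
|
|         def allow(self):
|             now = time.monotonic()
|             elapsed = now - self.last
|             self.last = now
|             self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
|             if self.tokens >= 1:
|                 self.tokens -= 1          # spend a token, admit request
|                 return True
|             return False                  # out of tokens: shed or queue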
| bob1029 wrote:
| I have no strong opinion on this as long as you are not using
| queues for purposes of inter-thread communication.
|
| If you _are_ using queues for this purpose, then I recommend
| reading the following:
|
| https://lmax-exchange.github.io/disruptor/disruptor.html#_th...
|
| > When in use, queues are typically always close to full or close
| to empty due to the differences in pace between consumers and
| producers. They very rarely operate in a balanced middle ground
| where the rate of production and consumption is evenly matched.
|
| This has always been my experience. Has anyone ever had a perfect
| goldilocks scenario between 2 threads?
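|
| The near-full-or-near-empty behaviour falls out of a toy
| random walk whenever one side is persistently a bit faster
| (all numbers invented):
|
|     import random
|     random.seed(1)
|     size, at_edges, steps = 50, 0, 100_000
|     p = 0.55
|     for t in range(steps):
|         if t % 1000 == 0:              # the pace mismatch drifts over time
|             p = random.choice((0.45, 0.55))
|         size += 1 if random.random() < p else -1
|         size = max(0, min(100, size))  # bounded queue of 100 slots
|         at_edges += size < 10 or size > 90
|     print(at_edges / steps)            # mostly near empty or near full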
| wongarsu wrote:
| One good use-case for still using queues is if your producer or
| consumer has spiky characteristics on small timescales, but
| predictable performance if you zoom out enough. In that case
| your queue size spikes to some high value, but always returns
| to 0.
|
| The reasons for this can be numerous: maybe you produce a lot
| of jobs at once because of a user interaction, or your consumer
| is running slower for five minutes because some cron job is
| stealing your resources.
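|
| A quick sketch of that spike-then-drain shape (numbers are
| invented):
|
|     import random
|     random.seed(0)
|     backlog, peak = 0, 0
|     for second in range(3600):
|         # quiet baseline; ~1% of seconds a user action enqueues a burst
|         arrivals = 500 if random.random() < 0.01 else 0
|         backlog = max(0, backlog + arrivals - 10)  # consumer does 10/s
|         peak = max(peak, backlog)
|     print(peak, backlog)   # spikes to a high value, keeps returning to 0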
| taeric wrote:
| Alternatively stated, queues are safest in a steady state. If
| they are growing in size without bound, that is clearly bad. If
| they are shrinking in size, that means you are going to be idling
| soon. I think it is fair that if you are picking between those
| states, you'd prefer the idling.
|
| That said, consider a physical machine. The "hopper" on many is
| essentially a queue to feed in items to work on. If you let that
| go empty, you messed up. Hard.
| mtlmtlmtlmtl wrote:
| If a datastructure is usually empty, you probably don't need that
| datastructure.
| [deleted]
___________________________________________________________________
(page generated 2023-02-19 23:01 UTC)