[HN Gopher] Queues don't fix overload (2014)
___________________________________________________________________
Queues don't fix overload (2014)
Author : akbarnama
Score : 166 points
Date : 2024-01-18 13:33 UTC (9 hours ago)
(HTM) web link (ferd.ca)
(TXT) w3m dump (ferd.ca)
| throwawaaarrgh wrote:
| (2014)
|
| Fwiw, sinks have overflows: https://www.thespruce.com/the-
| purpose-of-a-sink-overflow-271...
|
| The computer equivalent is probably load shedding
| cnity wrote:
| The sink overflow goes to the same drain, so it doesn't really
| work in this analogy! (Or it works very well, depending on your
| perspective).
| weego wrote:
| The overflow goes to the same drain, but further down, past a
| bottleneck that's designed to be there for a logical reason.
| It's a good example of real world flow control considerations
| vs paper theory.
| tonyarkles wrote:
| Continuing the analogy from the article though, the sink
| overflow _doesn't_ go past the red arrow in their
| diagrams. It's there to solve the specific problem of "the
| sink drain is plugged, either because you left the water
| running with the plug in or because there's a ball of hair
| immediately below the drain".
| jusssi wrote:
| Depends.
|
| An overflowing queue that drops jobs from the front while backend
| chugs along as fast as it can, is for many cases a better outcome
| than the backend overloading and everything grinding to a halt.
|
| Compare it to a physical service desk. The one that has a queue,
| serves one person at a time, and people arriving will try again
| another day if the queue is too long. The one without queue has
| people fistfighting over who gets to go first, and no one ends up
| getting service.
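|
| (A minimal sketch of that drop-from-front queue in Python; the
| names and the bound of 1000 are illustrative, not from the
| article:)
|     from collections import deque
|
|     # Bounded buffer: appending to a full deque silently evicts
|     # the oldest entry, so the backend always sees fresh jobs.
|     jobs = deque(maxlen=1000)
|
|     def submit(job):
|         jobs.append(job)  # drops the front once maxlen is hit
|
|     def next_job():
|         return jobs.popleft() if jobs else None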
| sparsely wrote:
| That's load shedding.
| teeray wrote:
| So it allows the dev team to blame the queue. Instead of "we
| decided to drop requests," it's "oh look, the queue is dropping
| requests."
| User23 wrote:
| So far as I know there is no theoretical alternative to load
| shedding or increasing handling capacity if your average request
| arrival rate is greater than your average request handling rate.
| At least, not if you want to handle every accepted request using
| a finite queue[1]. It would appear that with an unbounded queue
| every request will eventually be handled, but with an unbounded
| latency guarantee. Which appears equivalent to "grinds to a halt"
| for sufficient n.
|
| However, that may very well change with fair queueing. If
| unbounded queue storage is available and you can prioritize
| requests in a timely fashion, then instead of shedding excess
| requests they can go into unbounded queue limbo instead while
| still meeting the SLA for priority requests. I imagine there is
| developed queueing theory for that.
|
| [1] I have been taught here though that classical queuing theory
| isn't adequate for all cases. I think it is here, but I will
| gratefully accept correction if I'm wrong.
| gpderetta wrote:
| > It would appear that with an unbounded queue every request
| will eventually be handled
|
| Of course there is no such a thing as an unbounded queue.
| thaumasiotes wrote:
| > It would appear that with an unbounded queue every request
| will eventually be handled, but with an unbounded latency
| guarantee.
|
| This might or might not be true. Depending on your strategy for
| retrieving tasks from the queue, it is not necessarily the case
| that every request will eventually be handled. FIFO will give
| you that guarantee; a priority system won't.
| Joker_vD wrote:
| > If... you can prioritize requests in a timely fashion
|
| Just want to point out that this still is processing -- and so
| your sorting-proxy that would either route high-priority
| requests forward for actual processing or put low-priority
| requests into a queue with unbounded storage can still be
| overloaded with incoming requests. And, of course, that queue
| with unbounded storage needs to be able to put all of those
| incoming requests into storage in a timely manner.
| User23 wrote:
| Sure, and at some level everything is processing. Even
| putting an object at the end of a FIFO queue has overhead.
|
| I would be very hesitant to introduce a sorting proxy unless
| the need for it had been thoroughly demonstrated though. If
| your prioritization isn't lightweight enough to do in the
| server it's probably too heavy. Also you might accomplish it
| at the network level. Regardless your point stands that the
| work has to be done somewhere, but that doesn't change the
| fact that fair queuing is a good idea for the majority of
| production services. It's absolutely great when a misbehaving
| client only DoSes themselves and doesn't affect anyone else.
| Joker_vD wrote:
| > I would be very hesitant to introduce a sorting proxy
|
| > fair queuing is a good idea for the majority of
| production services
|
| Huh? As you've said, fair queuing requires one to quickly
| look at incoming messages, decide whether they are
| high/low-priority and route accordingly (to the processing
| server/to the queue). This is a sorting proxy, and so it
| seems that having it in some form is required to implement
| fair queuing. Right?
|
| > It's absolutely great when a misbehaving client only
| DoSes themselves and doesn't affect anyone else.
|
| Yes, of course. But that's precisely the problem with
| DoS/DDoS attacks: they're, inherently, the cases of a
| client/clients refusing to obey the back pressure demands
| from the upstream.
|
| Over the network, a misbehaving client also affects their
| ISP at the very least, by the way.
| LanceH wrote:
| Their alternative:
|
| "To make stuff usable, a proper idempotent API with end-to-end
| principles in mind will make it so these instances of back-
| pressure and load shedding should rarely be a problem for your
| callers, because they can safely retry requests and know if
| they worked."
|
| Is to ignore requests and make the caller implement their own
| queue, apparently.
| xyzzy_plugh wrote:
| It's turtles all the way down. At some point, it's _not
| really_ a queue.
|
| For example imagine a computer program to fetch the contents
| of a list of URLs. Perhaps it makes some number of parallel
| requests.
|
| If the URLs are hard coded, e.g.:
| fetch("https://news.ycombinator.com/")
| fetch("https://bbc.co.uk/")
| fetch("https://archive.org/") ...
|
| When the maximum concurrency is reached (which could simply
| be 1) and all requests are blocking, then back-pressure is
| applied. The callers wait before submitting more work.
|
| In this example the queue is the code, and the program
| counter acts as the head of the queue. I wouldn't
| regularly refer to that as a queue.
|
| Perhaps instead it reads a list of entries from a file. The
| queue's state can be represented by file offset. And so on.
|
| Computers by definition carry out sequences of
| operations. A sequence can be viewed as a queue. Everything
| is a queue.
|
| So yes, ignore requests and make the caller implement their
| own queue _is a completely correct take_ in a sense, but I
| don't find it productive.
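|
| (A runnable sketch of that bounded-concurrency back-pressure,
| assuming Python/asyncio; the sleep stands in for a real HTTP
| request and the limit of 2 is arbitrary:)
|     import asyncio
|
|     URLS = ["https://news.ycombinator.com/",
|             "https://bbc.co.uk/",
|             "https://archive.org/"]
|
|     sem = asyncio.Semaphore(2)  # max concurrency, could be 1
|
|     async def fetch(url):
|         async with sem:  # callers wait here: back-pressure
|             await asyncio.sleep(0.1)  # stand-in for the request
|             return url
|
|     async def main():
|         print(await asyncio.gather(*(fetch(u) for u in URLS)))
|
|     asyncio.run(main())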
| Sohcahtoa82 wrote:
| > However, that may very well change with fair queueing.
|
| It doesn't matter what kind of queueing system you put in
| place. No matter how fancy it is, and even if the queue adds
| negligible overhead, if your average request arrival rate is
| higher than the average handling rate, your queue _will_ grow
| unbounded. Then, for some requests, the system will appear to
| grind to a halt.
|
| There's no way around it. A priority queue just means that
| high-priority requests get processed faster. Low-priority
| requests might end up never getting handled.
|
| Here's a concrete example. I've got a web service that requires
| a login. I'm using bcrypt to hash passwords with a sufficiently
| large work factor that I can only hash 5 passwords per second
| and have a queue for the password hashing. If 6 people are
| trying to login every second, then after 1 minute, I will have
| received 360 login requests. 300 of them will have completed,
| but the queue has grown to 60. Anybody trying to log in at that
| point will have to wait a minimum of 12 seconds to get their
| login processed.
|
| But what if I also have users trying to sign up? I have to hash
| their password, so those get added to the queue as well. If I
| prioritize logins over signups, then under my scenario, _a
| signup will never complete_.
|
| There's really nothing that can be done. An unbounded queue
| means unbounded latency. Eventually, a login request would
| _have_ to get rejected just because the queue is full. Of
| course, in the real world, this creates frustration for the
| user. They'll either try to log in again, or give up.
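|
| (The arithmetic above checks out; a few lines of Python with
| the same rates make the growth visible:)
|     ARRIVALS, HASHES = 6, 5  # logins/s in, bcrypt hashes/s out
|
|     backlog = 0
|     for second in range(60):
|         backlog += ARRIVALS - HASHES  # grows by 1 every second
|     print(backlog)           # 60 waiting after one minute
|     print(backlog / HASHES)  # 12.0 s minimum wait at that point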
| tanveergill wrote:
| Aperture (https://github.com/fluxninja/aperture) has a
| slightly opinionated take on queue size. Instead of defining
| queue sizes, you put a timeout on each request which is the
| amount of time the request is willing to wait in the queue.
| Then we run a weighted fair queuing algorithm that ensures
| relative allocation across workloads, e.g. 90% capacity for
| critical requests. But the capacity is allowed to burst when
| there is low demand.
| tanveergill wrote:
| I am in the same school of thought as you: fair queuing is the
| optimal solution while capacity is constrained. This is
| the approach we took in Aperture, an open-source load
| management system. Read more about our Scheduler which
| implements a variant of WFQ (Weighted Fair Queuing) to ensure
| desired capacity allocation across request workloads and SWFQ
| (Stochastic Weighted Fair Queuing) to ensure fairness across
| users within each request workload:
| https://docs.fluxninja.com/concepts/scheduler
| roenxi wrote:
| The other thing to bear in mind about queues is that once they
| start showing symptoms of something being wrong, collapse
| might be just around the corner or it might not be, depending on
| the nature of the load.
|
| When congestion spikes start showing it is helpful to know some
| queuing theory to estimate how close the situation is to eating
| someone's weekend. Congestion collapses are an interesting time:
| because most people don't know queueing theory or how to reason
| using balance equations, it is possible to misdiagnose the
| problem or waste a stressful few days trying to work out a
| congestion situation by experiment.
| domano wrote:
| Hey, can you recommend something one might read to get up to
| speed on queuing theory? I certainly am not aware of it, but
| work with queues.
| PaulHoule wrote:
| Whenever I talk to operations research people about "how do I
| learn X?" or "how do I calculate Y?" I usually get told to
| write a Monte Carlo simulation despite there being a lot of
| beautiful math involving stochastic processes, generating
| functions and stuff like that. (Even if you are calculating
| results in closed form it is still a slam dunk to have a
| simulation to check the work except when you are dealing with
| "exceptional event" distributions... That is, a Monte Carlo
| simulation of a craps game will give you an accurate idea of
| the odds with N=10,000 samples, but simulating Powerball
| takes more like N=1,000,000,000 samples.)
|
| The single "uncommon sense" result you need to know about
| queuing is
|
| https://erikbern.com/2018/03/27/waiting-time-load-factor-
| and...
|
| that is, with random arrivals, a queue that has slightly less
| than 100% utilization will grow stupendously long. People
| look at a queue with less than 100% utilization and often have
| a feeling of moral disgust at the "waste", but if you care about the
| experience of the customer and the reliability of the system
| you don't let utilization get above about 80% or so.
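|
| (For an M/M/1 queue the mean number in the system is
| rho/(1-rho), where rho is utilization; a quick sketch of how
| fast that blows up:)
|     # Mean jobs in an M/M/1 system: L = rho / (1 - rho)
|     for rho in (0.5, 0.8, 0.9, 0.95, 0.99):
|         print(f"{rho:.0%} busy -> {rho / (1 - rho):.0f} jobs")
|     # 50% -> 1, 80% -> 4, 90% -> 9, 95% -> 19, 99% -> 99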
| tonyarkles wrote:
| I learned about this stuff in grad school. The course
| wasn't mandatory for everyone but my supervisor made it
| mandatory for me due to the nature of the research I was
| doing: "Computer Systems and Performance Evaluation". It
| was basically focused on queuing theory and state space
| modelling.
|
| Reading through this whole discussion thread really makes
| me want to dig up my old notes and whip up a blog post with
| a Jupyter notebook or something that people can use to
| really dig into this and start to grok what's happening
| because a lot of it really isn't that intuitive until
| you've been steeped in it for a while.
| PaulHoule wrote:
| If I were you I'd consider using
|
| https://simpy.readthedocs.io/en/latest/
|
| inside a Jupyter notebook.
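|
| (For instance, a minimal M/M/1 simulation in SimPy; the rates
| are arbitrary, and at 90% utilization the mean queueing delay
| should come out near 9 service times:)
|     import random
|     import simpy
|
|     def customer(env, server, waits):
|         arrived = env.now
|         with server.request() as req:
|             yield req  # queue for the single server
|             waits.append(env.now - arrived)
|             yield env.timeout(random.expovariate(1.0))  # mu=1
|
|     def arrivals(env, server, waits):
|         while True:
|             yield env.timeout(random.expovariate(0.9))  # lam=.9
|             env.process(customer(env, server, waits))
|
|     waits = []
|     env = simpy.Environment()
|     server = simpy.Resource(env, capacity=1)
|     env.process(arrivals(env, server, waits))
|     env.run(until=100_000)
|     print(sum(waits) / len(waits))  # ~9 at 90% utilization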
| tonyarkles wrote:
| Hey that's awesome! Unfortunate that I'm going to get
| even more confused now about whether I'm talking about
| SimPy or SymPy but that's life :)
| PaulHoule wrote:
| Maybe use one for closed form, and the other for the
| simulation that confirms it!
| jph wrote:
| Queueing theory introduction:
| https://github.com/joelparkerhenderson/queueing-theory
| lanstin wrote:
| There are a billion books on abstract or basic queueing
| theory (worth reading) but a really good modern paper on
| distributed software implications is
| https://pure.psu.edu/en/publications/metastable-failures-
| in-...
| ibejoeb wrote:
| Also, in this context, not a bad idea to revisit the broader
| topic of scheduling.
|
| https://csc-knu.github.io/sys-
| prog/books/Andrew%20S.%20Tanen...
| roenxi wrote:
| I refer to _Fundamentals of Queueing Theory_ by Gross,
| Shortle, Thompson & Harris.
|
| Although Wikipedia is enough. As far as insights go the topic
| is relatively simple; it is just bad practice to be re-
| deriving the first 100 pages of an intro-to-queueing textbook
| in an emergency.
|
| 80% of the time it is enough to assume the process is an
| M/M/1 queue or consider how the queue would perform relative
| to an M/M/1 queue. M/M/1 queues are the analog of fitting a
| straight line: simple and technically incorrect. It is good to
| move through that part of the day without thinking.
| kasey_junk wrote:
| This is a weird article because it points out that queues don't
| solve overload but neither do load shedding or back pressure.
|
| All 3 techniques are just different trade offs on what to do in
| the face of overload. All 3 have negative ramifications for the
| users of the system. Load shedding reduces availability, back
| pressure increases complexity and queues increase latency.
|
| In "critical" systems you need all 3. And all 3 can be
| overloaded. Frankly, your load shedding or back pressure system
| is probably implemented on a queue one layer down the
| abstraction.
| robertlagrant wrote:
| > Frankly, your load shedding or back pressure system is
| probably implemented on a queue one layer down the abstraction.
|
| If you're building an API, you don't care how your clients cope
| with your backpressure. You just want to avoid overloading your
| services, and queues will indeed not help you with that past a
| certain point, whereas backpressure probably will. Or at least
| you can scale backpressure much more cheaply than you can scale
| queues.
| lanstin wrote:
| Teach your clients to handle 429 with gradual back
| off/retries and it works well. Even teach your web UI to
| degrade under partial failure of widget loads.
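|
| (A sketch of such a client in Python, assuming the requests
| library; the cap and base are arbitrary, and a real client
| would also honor any Retry-After header:)
|     import random
|     import time
|
|     import requests
|
|     def get_with_backoff(url, attempts=5):
|         for attempt in range(attempts):
|             resp = requests.get(url)
|             if resp.status_code != 429:
|                 return resp
|             # exponential backoff, with jitter to spread retries
|             time.sleep(min(30, 2 ** attempt) * random.random())
|         raise RuntimeError("still overloaded, giving up")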
| dijit wrote:
| I think the real hurdle is getting people to truly understand
| that there's no solution to certain problems, only tradeoffs.
|
| Queues are great, but can lead to catastrophic failure if you
| don't have a good way of handling the queue, so making an
| active choice about how you handle overload is part of
| designing a robust and resilient system.
|
| Trading off new requests for current requests is, in my
| experience, a valid strategy for eCommerce for example. We
| called it "quenching".
| kqr wrote:
| > Trading off new requests for current requests is, in my
| experience, a valid strategy for eCommerce for example. We
| called it "quenching".
|
| I'm not sure in which direction the trade happens but it
| sounds like you're dropping older requests in favour of
| newer. I agree, this has worked well for me also.
| Surprisingly often the oldest item in the queue is the one
| for which service will be least valuable.
| hinkley wrote:
| I think the theory is that if half of your excess requests
| come from two actors, then half of the consequences of
| random dropping are felt by those two people. That's not
| fair, but it's better than dropping all requests after you
| get into trouble. Fairness is very difficult to achieve in
| a distributed system, due to coordination delays.
| smaudet wrote:
| I think part of the article is just the author venting that
| people make design decisions without understanding the
| ramifications.
|
| Operating a queue as a buffer would be absolutely _fine_ if
| there were also service level agreements implicit in every
| queue in operation - i.e. one queue reaches half capacity or
| is experiencing explosive growth, leading to reactive
| techniques such as spinning up larger queues/servers,
| alerting stakeholders to the performance issue(s), and
| possibly even dynamically rate-limiting end-users.
|
| But this is a whole-system centric view of the design. What
| they're specifically bemoaning is the asinine focus on
| component-centric design - your widget/service performs
| slowly? Throw a queue at it (and wonder why you broke
| everything 100 features and 10x customer base later)!
|
| For legitimately small problems, this is OK. But then, you
| accept that you aren't scaling, and you bake it into your
| design assumptions. And you sell that, explicitly, to the
| customer "hey this will only work for 1 thousand/1 million
| customers, if you want more we can talk".
| 0xbadcafebee wrote:
| The only real solution to overload (that is, the eventuality of
| the system not having enough capacity), in modern systems, is
| autoscaling. Nobody seems to talk about this, I guess because
| it's taken for granted? But you can literally just keep adding
| capacity now. We didn't really have that before the cloud; you
| had the servers you bought, and maybe you'd rush to repurpose
| some servers to add capacity. Now an algorithm just adds
| servers magically when your load gets high.
|
| Of course the system still has limits. But if you go from zonal
| to regional to global autoscaling, and you architect
| specifically to scale up each point in the system using
| autoscaling, it kind of doesn't have a limit? (that any one
| product would hit, at a global level)
|
| In the past I have spun up a duplicate distributed system in
| another region and just split the traffic between the two. IaC
| is crazy.
| manuelabeledo wrote:
| > Nobody seems to talk about this, I guess because it's taken
| for granted?
|
| No, because it has a theoretical limit, same as queues, back
| pressure, etc.
|
| One cannot simply scale up indefinitely because it is not
| profitable.
| 0xbadcafebee wrote:
| There's no product in the world that would hit a limit if
| autoscaled globally on AWS. Sure, you could write an app
| whose literal sole purpose is "take up all memory, CPU and
| bandwidth, recursively, infinitely", but nobody is making
| that product. Real products built today have a finite
| amount of demand, and global cloud capacity is larger than
| that.
|
| You can't say what architecture is or isn't profitable in
| general, business is more complicated than that. But
| besides the realities of commerce, you can easily architect
| a system such that the autoscaling cost is a fraction of
| opex.
| smaudet wrote:
| > There's no product in the world that would hit a limit
| if autoscaled globally on AWS
|
| > but nobody is making that product
|
| Nobody is making that product because nobody has money
| for it...(or the demand to supply that money).
|
| You can auto-scale without AWS, actually: it's called
| measuring server load and auto-ordering new machines when
| your load increases (and keeping machines turned off on
| standby for spikes).
|
| Or you can go hybrid, and load-shed to a cloud...which
| would be the smart thing to do.
| vault_ wrote:
| > Real products built today have a finite amount of
| demand, and global cloud capacity is larger than that.
|
| This isn't really true, and it's especially not true when
| specialized hardware comes into play. If you have a "web-
| scale" GPU workload, it's not unlikely that you'll hit
| resource availability constraints from time to time. The
| question isn't whether cloud capacity is larger than your
| demand for a particular resource, it's whether cloud
| capacity is larger than the aggregate peak demand for
| that resource. Cloud providers aren't magic. They engage
| in capacity planning, sometimes underestimate, and are
| sometimes unable to actually procure as much hardware as
| they want to.
| manuelabeledo wrote:
| > There's no product in the world that would hit a limit
| if autoscaled globally on AWS.
|
| While this may be true, autoscaling indefinitely is
| _still_ not profitable.
|
| > You can't say what architecture is or isn't profitable
| in general, business is more complicated than that. But
| besides the realities of commerce, you can easily
| architect a system such that the autoscaling cost is a
| fraction of opex.
|
| Customer acquisition and retention cost more money than
| they should.
|
| I cannot count how many times I got an email from the CTO
| asking to lower our cloud costs, while product and
| marketing refuse to raise prices, making the whole
| endeavor borderline unprofitable. You may then argue that
| the product suffers from either bad architecture, or bad
| pricing, or both, but the situation is far from uncommon.
| lanstin wrote:
| I have made and deployed pieces of that infinite
| recursion, most spectacularly by triggering infinite call
| loops. Had we had autoscaling rather than sharp
| queue limits leading to load shedding it would have been
| worse.
| gilbetron wrote:
| Not really true (I work for one that can, and does, hit
| limits in AWS, let alone GCP/Azure), but the general
| point is mostly true. You just might not actually get
| what you want fast enough, however.
| ahoka wrote:
| Increasing capacity would be very stupid if doing so does not
| also increase revenue.
| secondcoming wrote:
| That's fine until the issue lies with something that your
| auto-scaled instances talk to, e.g. Redis, Scylla, SQL DB.
| There are situations where auto-scaling to infinity makes
| things far worse.
| tonyarkles wrote:
| 100% that was my first thought when reading the GP comment.
| Autoscaling isn't a magic fix. Just like the article says,
| you need to find the red arrow first to figure out where
| the bottleneck is and whether or not that bottleneck is
| actually something within the auto-scaling context or not.
| You've pointed out a couple of good examples of potential
| bottlenecks. Another possibility is a downstream 3rd-party
| service. If you start auto-scaling because a downstream
| service is struggling, you're just going to end up hitting
| it even harder and essentially contributing to a DoS attack
| against them.
|
| No silver bullets and all that. Still need to do
| engineering analysis to figure out what's actually
| happening before you start shooting.
| lanstin wrote:
| Priority queues for your downstream calls to rate limit
| them. A large enough cluster can take down most any data
| store, except maybe spanner. Even then you could increase
| latency a lot while it scales up.
| jeremyjh wrote:
| Autoscaling is not going to help if you are IO-bound in your
| database. One point of the article is you have to identify
| your bottleneck before you can make sensible design choices.
| 0xbadcafebee wrote:
| ....you can autoscale database read replicas, and write
| nodes (master-master)....
| jeremyjh wrote:
| If you are bound in disk IO, adding master-master write
| nodes will not help you. The same number of bytes have to
| be written to the same volume whether they come from a
| replica or an application server. The only solution is
| partitioning/sharding, and there is no "easy button" to
| press and make that happen because it comes with its own
| limitations and trade-offs, and is something the
| application code will be intimately aware of.
| lanstin wrote:
| Auto sharding. A nice idea for a DAO layer.
| vault_ wrote:
| Autoscaling seems like a downstream concern from the
| techniques being discussed here. Autoscaling tends to have a
| pretty high latency, so you still need a strategy for being
| overloaded while that extra capacity comes online. There's
| also a question of how the autoscaler knows what "load" is
| and when it's "too high." Just going off of CPU/memory usage
| probably means you're over-provisioning. Instead, if you have
| back-pressure or load-shedding built into your system you can
| use those as signals to the autoscaler.
| EdwardDiego wrote:
| Autoscaling is great, if you solve the problems you rightly
| mention.
|
| But IMO it's best viewed not as a technique to increase
| capacity that risks overprovisioning, but rather as a
| technique to significantly reduce the overprovisioning you
| were already likely doing to provide capacity that could
| handle peaks in demand without blowing through delivery
| expectations (e.g., timeliness, data loss minimisation,
| etc.)
|
| At an old employer, our load was seasonal over the day. If
| one instance of an app could handle N req/s, and the daily
| peak maxed out at 100N req/s, then we had to run 100
| instances as a minimum (we usually chucked some extra
| capacity in there for surprises) even if the mean daily
| peak was 75N req/s.
|
| And of course, at the times of the day when incoming reqs/s
| was 0.5N reqs/s, well, we still had 99 instances twiddling
| their thumbs.
|
| And then there were the days when suddenly we're hitting
| 200N req/s because Germany made the World Cup quarter-
| finals, and things are catching fire and services are
| degraded in a way that customers notice, and it becomes an
| official Bad Thing That Must Be Explained To The CEO.
|
| So when we reached a point in our system architecture
| (which took a fair bit of refactoring) where we could use
| autoscaling, we saved soooo much money, and had far fewer
| Bad Thing Explanations to do.
|
| We had always been massively overprovisioned for 20 hours
| of the day, and often still overprovisioned for the other
| 4, but we weren't overprovisioned enough for black swans,
| it was the worst of both worlds.
|
| (Although we kept a very close eye on Germany's progress in
| the football after that first World Cup experience)
|
| You're spot on that
|
| a) to autoscale up effectively we had to minimise the time
| an instance took to go from cold to hot, so we focused a lot
| on shared caches being available to quickly populate in-
| memory caches
|
| b) adding new hardware instances was always going to take
| longer than adding new app instances, so we had to find
| some balance in how we overprovisioned hardware capacity to
| give us breathing room for scaling without wasting too much
| money and
|
| c) we found significant efficiencies in costs and time to
| scale by changing the signals used to scale after starting
| out using CPU/mem.
|
| Also a significant learning curve for our org was realising
| that we needed to ensure we didn't scale down too
| aggressively, especially the hardware stuff that scaled
| down far faster than it scaled up.
|
| We hit situations where we'd scale down after a peak had
| ended, then shortly after along came another peak, so all
| the capacity we'd just dynamically removed had to be added
| back, with the inherent speed issues you mentioned, causing
| our service to be slow and annoying for customers, with
| minimal savings while capacity was trampolining.
|
| (This incidentally can be really problematic in systems
| where horizontal scaling can introduce a stop the world
| pause across multiple instances of an app.
|
| Anything that uses Kafka and consumer groups is
| particularly prone to this, as membership change in the
| group pauses all members of the CG while partitions are
| reallocated, although later versions of Kafka with sticky
| assignors have improved this somewhat. But yeah, very
| critical to stop these kinda apps from trampolining
| capacity if you want to keep data timeliness within
| acceptable bounds.)
|
| It took a lot of tuning to get all of it right, but when we
| did, the savings were spectacular.
|
| I think the CTO worked out that it only took six months of
| the reduced AWS costs to equal the cost of the two years of
| system refactoring needed to get to that point, and after
| that, it was all ongoing cream for the shareholders.
|
| And while I get the hate people have for unnecessary usage
| of K8s (like Kafka, it's a complex solution for complicated
| problems and using it unnecessarily is taking on a whole
| lot of complexity for no gain), it was perfect for our use
| case, the ability to tune how HPAs scale down, being able
| to scale on custom metrics, it was just brilliant.
|
| (I wish I could end with "And the company reinvested a
| significant proportion of the savings into growth and gave
| us all big fat bonuses for saving so much money", but haha,
| no. The CFO did try to tell us we'd been unnecessarily
| wasteful prior and should have just built a system that was
| created in 2007 like the 2019 version from the start,
| because apparently a lot of MBA schools have an entrance
| requirement of psychopathy and then to graduate you have to
| swear a bloodpact with the cruel and vicious God of
| Shareholder Value)
| hinkley wrote:
| If I let my customers railroad me into running more servers
| to fulfill their "needs" then I may transition into losing
| money on my business. That's not a solution.
|
| Needs is in scare quotes because a lot of traffic comes from
| misunderstanding or laziness from customers or from other
| divisions. Try as we might, nearly all of the improvements in
| capacity per customer on my project in the last six months
| have come from that sort of work. Other sorts of changes have
| unfortunately been too close to the noise floor. They add up
| but are difficult to demonstrate, and are very easy to undo
| with new feature work.
| o11c wrote:
| Additionally, most users actually _are_ happy with "slow
| me down if I'm making too many requests". This is _much_
| simpler than recovering from errors correctly.
| lanstin wrote:
| Which is why your test integration environment has to
| throw 429 and 5xx errors consistently from day one. The
| error handling paths are hard to retrofit but easy to do
| before deployment.
| o11c wrote:
| (What test environment?)
|
| That will make you introduce error _handling_, but not
| necessarily "correct" handling. "Retry after errors" is a
| great way to overload a system.
| paulddraper wrote:
| While that is a "real" solution, it is not necessarily a
| possible solution.
|
| 1. It may be cost prohibitive.
|
| 2. Some systems don't scale horizontally. (In particular,
| databases.)
|
| 3. Many systems do not scale linearly.
| jiggawatts wrote:
| In my experience, auto scaling has not yet been applicable in
| the typical sense of adding more web servers.
|
| If you use an efficient language like C# or Go, the
| performance bottleneck moves to the database layer almost
| immediately.
|
| Caching can be added and can auto scale, but that has a
| tradeoff: stale data or eventual consistency.
|
| Databases are hard to scale because typically there has to be
| a single master instance responsible for write ordering and
| transaction consistency.
|
| Auto scaling the database layer is fraught with its own
| issues, and simply doesn't work in practice for many common
| scenarios.
|
| For typical business apps, or typical web pages that might
| suddenly get a huge surge in usage (i.e. the hug of death), a
| CDN can help... _maybe_.
|
| My customers keep asking for auto scale to solve their
| problems, when the root cause is invariably a missing index
| in their database.
| marcosdumay wrote:
| No, load shedding and back pressure present you with trade-offs
| for dealing with an overloaded system.
|
| Queues don't. Queues just present you with problems. If you take an
| overloaded system and add a queue, every single feature either
| gets worse or doesn't get any better.
|
| And people like to deny this, and pretend that queues will
| help. They absolutely help with a lot of things but they do
| nothing but harm in front of an overloaded system.
| naasking wrote:
| Your mistake is characterising all overloads as the same.
| Adding a queue to a system that is consistently overloaded
| won't solve the overload, but many overloads are
| temporary/bursty, like the Slashdot/HN effect. Queues
| absolutely do solve these kinds of overloads simply by
| increasing latency, assuming increased latency is an
| acceptable choice in your context, of course.
| 10000truths wrote:
| Maximum acceptable response times for websites is a few
| seconds. The HN/Slashdot effect lasts for minutes to hours.
| That time scale is way too long for a queue to be
| effective.
| naasking wrote:
| You're missing the point. We're talking about general
| systems here, not websites specifically, and the Slashdot
| effect is a perfect example that everyone is familiar
| with where queues do maintain availability if
| longer latency is acceptable.
| emn13 wrote:
| Furthermore, even as a website aiming to survive some
| momentary spike in load - slowing everything down
| for everyone _can_ be part of a solution to serve more
| people but fewer things each. People might not normally
| be willing to wait for more than a second or two for a
| load; but when they expect or have a sign that things are
| slower than normal, they might have a little more
| patience (or just come back to that tab later).
|
| I think people are a little too negative on queues. Sure,
| really naively implemented with entirely unbounded
| capacity they're an accident waiting to happen... but you
| don't have to do that.
| lanstin wrote:
| The distinction is really not queues or no queues, but
| bounded or unbounded in theory. There are queues everywhere:
| connection pools, unparsed messages, packets in flight,
| memory for calls sent out and not yet answered or timed
| out, etc. But one unbounded (in theory; they are never
| unbounded in practice) queue can make your system go from
| robust to fragile.
| marcosdumay wrote:
| After the overload fixes itself somehow, a queue absolutely
| solves the availability problem. Yep. If the overload
| doesn't last for too long.
|
| What isn't happening in your example is a queue improving
| anything on an overloaded system.
|
| Queues are immensely helpful for all kinds of problems.
| Just not for overload.
| naasking wrote:
| Overload is only an issue because availability is
| critical. When it isn't critical, then avoiding overload
| is obvious and practically trivial.
| lanstin wrote:
| An unbounded queue means you are allocating a lot of
| stuff at exactly the time you want to cap the resources
| handling traffic. It means you slow down when busier.
| Depending on the framework, it often means the system
| can't recover automatically: it will crash if the queue
| is in memory, or need DBAs to flush the queues, or
| whatever.
| multimoon wrote:
| As another user pointed out, this is entirely based on the
| duration of the overload. If the overload is short, then
| queues absolutely do help smooth this over for the user.
|
| If demand is consistently exceeding capacity, the system will
| fail regardless which method is used to compensate (queues,
| shedding, back pressure, etc). Saying that a queues as a
| whole serve no purpose in an overload is only considering the
| catastrophic scenario, there is a lot more types of overloads
| that can happen depending on the system.
| smaudet wrote:
| Queues provide you breathing room to increase your
| throughput. They don't fix the clog, but they keep the
| kitchen sink from over-flowing while you call the
| plumber/install a new industrial line.
| lanstin wrote:
| A bounded queue that quickly errors back to the caller when
| full is a fine load shedding system. Unbounded queues are
| death.
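|
| (That pattern is nearly free with the standard library; a
| sketch, where the bound and the error type are illustrative:)
|     import queue
|
|     class Overloaded(Exception):
|         """Tell the caller to back off, e.g. map to HTTP 429."""
|
|     jobs = queue.Queue(maxsize=100)  # the bound is the point
|
|     def submit(job):
|         try:
|             jobs.put_nowait(job)  # never blocks the caller
|         except queue.Full:
|             raise Overloaded()  # fail fast: shed the load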
| epr wrote:
| Pretty ironic, right? The entire article is the process of the
| author making the same mistake he's complaining about.
|
| For anyone reading the comments looking for more helpful
| guidance, the most generally helpful advice you will get is to
| start by measuring. Your measurements should help you identify
| bottlenecks, but the answer of how to fix the issue is
| dependent on your problem's constraints. In fact, there's such
| a massive number of dimensions to the solution space of the
| general problem of overloaded systems/services that it is
| actually ridiculous for the author to try and write a single
| ~2k-word article explaining how to solve them all.
|
| If you're hosting some non-critical service (which is clearly
| the case for the author), then maybe load-shedding is
| appropriate. If you're making a critical control system on the
| other hand, load-shedding (allowing system failure by design)
| is absurd. If you are already set up well for scaling (can be a
| big up-front investment) and have more money than time, maybe
| horizontal scaling is a no-brainer.
|
| All those examples ignore the elephant in the room, which is
| the single root cause performance problem you're likely to
| discover after measuring your system and analyzing the results
| (good analysis is equally important to measurement, especially
| in large and/or distributed systems). This can range from
| something that is very expensive and possibly impractical to
| fix (ie: entire backend written in python), to a simple tweak
| (ie: fixing a bad database query) that drastically improves
| system performance with no downsides. The degree to which a
| system and its components were designed by engineers not
| carelessly throwing away performance at the altar of "premature
| optimization is the root of all evil" can also have a profound
| impact.
| lanstin wrote:
| Your system has a capacity and limits. They can either be
| explicitly configured in, with deliberate load shedding
| behavior, or implicit, with devil-may-care behavior.
| Somewhere between not getting a dial tone when you pick up
| the phone and weird vocal distortions when the limits are
| hit: choose your poison. Or let randomness choose.
| AtNightWeCode wrote:
| Another thing. Queues often lack prioritization of messages; for
| instance, the importance of a new user signup may be overlooked
| compared to an address update.
| thaumasiotes wrote:
| There is a structure for just this purpose which, funnily
| enough, is known as a "priority queue".
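|
| (In Python, for instance, heapq gives you one directly; the
| priorities here are made up:)
|     import heapq
|
|     pq = []  # (priority, request); smallest pops first
|     heapq.heappush(pq, (0, "new user signup"))  # important
|     heapq.heappush(pq, (5, "address update"))   # less so
|     print(heapq.heappop(pq))  # -> (0, 'new user signup')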
| flemhans wrote:
| Funny, I always think handling my existing customers and making
| sure things work for them is more important than taking in new
| ones.
| mvf4z7 wrote:
| Basically Little's law. It is queues all the way down.
|
| https://en.wikipedia.org/wiki/Little's_law
|
| Additionally, here is a great talk on queuing theory and load
| shedding. One argument this talk makes is that autoscaling is not
| the silver bullet you think it is (similar to queues).
|
| https://www.youtube.com/watch?v=-oQl1xv0hDk
| HPsquared wrote:
| Little's law reminds me of the Continuous Stirred-Tank Reactor
| model in chemical engineering. ChemEng has a lot of methods of
| modelling complex systems, that can be carried over to other
| domains.
|
| https://en.wikipedia.org/wiki/Continuous_stirred-tank_reacto...
| EspressoGPT wrote:
| > Queues Don't Fix Overload
|
| If queues can withstand more simultaneous load than the
| processing service, they basically do.
| kevincox wrote:
| Queues mitigate short-term overload. They don't fix overload.
| Even in the short term there are likely detrimental effects (as
| latency rises) and if your long term average input is higher
| than your average output then you are still overloaded.
| mhandley wrote:
| What queues do is smooth out the mismatch between supply and
| demand. If the mismatch lasts long enough, the queue will
| overflow, and then you need to load shed (and you need to plan
| for what the least bad way of load shedding is). But queues do
| increase overall throughput, up to a point. If the demand is
| bursty on short timescales and you only allow a small queue to
| build before load-shedding, you may be wasting capacity.
| Increasing the allowed queue size lets you smooth out the demand,
| and so reduce the amount you need to load shed. But the crucial
| thing here is you're trading off latency against capacity. If
| that latency increase itself has no significant downside, then
| you should probably provision the queues so that they can grow
| until the latency _does_ start to have a downside, because below
| that point the increase in capacity is worth it. Above that
| point, the latency itself is a problem, and you should be load-
| shedding rather than queuing. The trick with any queuing system
| is to understand _when_ this transition should take place.
|
| Edit: I should add that standing queues serve no purpose. If the
| demand is such that you're persistently queuing for a long time
| without the queue ever going idle, your transition between
| queuing and load-shedding is in the wrong place, and you should
| increase load-shedding, because you're adding latency for no
| reason.
| sokoloff wrote:
| Re: your edit: I think in latency insensitive applications,
| having standing queues that never empty can be totally fine. We
| custom manufacture ordered items and have priority queues that
| process input orders into machine-ready output files. We don't
| need those queues to ever empty as the fastest delivery times
| are 3 days and the slowest typical is 14 days. Some of the
| file-prep software is licensed (so we have economic reasons to
| keep that part of the farm size constant and no larger than
| needed); in other cases, we use cheaper compute when available.
| mhandley wrote:
| Agreed, if there's no latency cost and no cost to maintaining
| the queue. But if your queue is _never_ emptying, you've hit
| an equilibrium where you're persistently overloaded and
| constantly load shedding. In that case you can reduce the
| queue size so that it is only just not emptying - you'll
| still be running at 100% capacity and shedding the same
| amount of demand. And if there was a latency cost or a cost
| per item in the queue, you reduce those.
|
| Edit: if you have an external control loop that is matching
| supply and demand, you could be persistently queuing without
| load shedding, in which case you might want to maintain the
| standing queue. In the absence of an external control loop,
| supply and demand will pretty much never be in precise
| equilibrium, in which case a persistent queue will pretty
| much always also mean you're load-shedding.
| sethammons wrote:
| > But if your queue is never emptying, you've hit an
| equilibrium where you're persistently overloaded and
| constantly load shedding
|
| I believe this is an incorrect assertion. You have hit
| equilibrium, yes. But you are not load shedding unless you
| are preventing new queue entrants. And if you are allowing
| new entrants and the latency is acceptable to customers,
| you are not overloaded. Like you said, you are at an
| equilibrium.
| datadrivenangel wrote:
| It sounds like the above poster is in a situation with
| multiple priority queues, where as long as latency and
| throughput for high priority items is good, having higher
| latency for lower priority items is fine, and could be a
| form of low-cost back pressure on new orders or signalling
| to improve the throughput a little bit.
| sokoloff wrote:
| As long as we are consistently emptying the queue of all
| orders _due to be produced on the next shift_ , I think we
| are correctly sized while losing nothing (no load-shedding
| or turning orders away), even though we carry an overall
| non-zero backlog essentially permanently.
|
| Having 48 hours of weekend where order volume is way down,
| but processing volume is constant is a key part of
| maintaining this equilibrium.
| PaulHoule wrote:
| See https://en.wikipedia.org/wiki/Bufferbloat
|
| I lived in Germany in 1999 and then the internet connection
| from Germany to the US would get overloaded during the day. At
| maybe 9am the latency would be < 1 s, but it would grow to
| 90 seconds, and only _then_ would it start dropping
| packets.
|
| I don't know if it was the intention but it was about as good
| as a ban on VoIP at preventing people from making international
| VoIP calls.
|
| Today there is more consciousness about the problem, but people
| frequently create an overly long or nominally unbounded queue:
| in that case you start with one problem, insufficient
| throughput, and add a second, unacceptable latency. Limit the
| length of the queue and you put a cap on latency.
|
| Backpressure is hard to implement for social and political
| reasons as much as technical, people don't really like the idea
| that the system can refuse service and of course it often has
| to be pushed further back than people would expect.
|
| I'd point to the example of a medical specialist office where
| once you are being seen as a patient you might have a series of
| appointments. The initial appointment controls admission to the
| system as a whole so you might wait a few months to get the
| first appointment, enough that you might "balk" and look for
| another doctor who can see you sooner. Once you are being
| treated you often have no problem rescheduling an appointment
| or scheduling follow-up appointments in a timely manner.
| pjdesno wrote:
| See RFC 970, "On Packet Switches With Infinite Storage" by
| John Nagle: https://datatracker.ietf.org/doc/html/rfc970
|
| Back then (I was studying networking as an undergrad at the
| time, and interned with the Arpanet team) people really did
| think of network congestion as a buffer allocation problem,
| so the obvious solution was more buffering - i.e. adding
| queues. Nagle was one of the first people to point out the
| problem with this.
| kccqzy wrote:
| That's because it is obvious and intuitive even though it's
| wrong. In the first year of my undergrad in an assignment I
| added unlimited buffering to every single place that needed
| buffering. I remember a sense of superiority from all the
| extra code I wrote to do it. Of course buffering felt
| correct! That was long before I heard of the word
| backpressure.
| citrin_ru wrote:
| Sometimes it is - TCP incast [1] in a fast LAN can be
| mostly alleviated by using switches with large buffers.
| Generally the higher throughput the bigger buffers you need
| unless having low latency is more important than low packet
| loss. The queue size is a tradeoff (as almost everything).
|
| [1] https://www.usenix.org/system/files/login/articles/chen
| 12-06...
| jiggawatts wrote:
| One of our customers bought a special switch with huge
| buffers, on the order of hundreds of megabytes per port.
|
| It was a specialised SKU used only for dedicated backup
| networks. As in, networks that process only the traffic
| for disaster recovery backups. Latency on that type of
| network traffic is totally irrelevant, and the only thing
| that matters is achieving the maximum possible throughput
| -- whatever the wire protocol allows.
|
| That's the one time I've seen exactly 10 Gbps sustained.
| foofie wrote:
| > (...) people really did think of network congestion as a
| buffer allocation problem, so the obvious solution was more
| buffering - i.e. adding queues.
|
| I don't think people believed network congestion was a
| buffer allocation problem. I think people believed
| buffering was a way to mitigate congestion caused by a
| spike in traffic before allowing connection problems to
| surface (dropped packets, dropped connections, etc).
|
| Buffers are bounded which, by definition, means they can be
| filled up. Once they are filled up then the same problems
| that were caused by a congested network without buffering
| would also be experienced in connections with buffering.
| It's hard to believe that back then no one noticed that
| buffers would only take so much data.
|
| > Nagle was one of the first people to point out the
| problem with this.
|
| Correct me if I'm wrong, but the problem that was pointed
| out was instead modes of failure that take place when
| networks are congested but buffers are still able to take
| more data. We're talking about failure modes that happen
| when the network is congested and buffers are not
| completely filled but they neither can flush their data nor
| drop packets/connections, thus it's a steady state
| characterized by a network that in theory is working, but
| induces high latency.
|
| Also, limiting buffers is also not a fix for network
| congestion problems. It's a way to allow a failure mode
| (dropped packets) to happen much earlier than another
| failure mode (long queues with high latency).
| Animats wrote:
| > I don't think people believed network congestion was a
| buffer allocation problem.
|
| Actually, they did, because everything in the early days
| had very small memory sizes. This was the 16-bit era.
| Animats wrote:
| That was a long time ago. Yet we still have bufferbloat
| problems. Sigh.
| aidenn0 wrote:
| Ugh. WiFi signals aren't the best in my A/V cabinet, so I
| ran G.hn powerline to it. This works great 98% of the time,
| but occasionally there will be something that blocks
| traffic for a short period of time, and those things must
| have _huge_ buffers.
|
| If I'm e.g. watching Netflix when there is a hiccup, I see
| ping times of over a minute! I wrote a program that
| monitors ping times and reboots the G.hn adapters when they
| get too high and the problem mostly went away. I tried a
| couple different brands of adapters, but they are all
| obviously the same firmware.
| vidarh wrote:
| Another approach than capping the queue explicitly is to
| record time and cap latency by treating the queue as full if
| the front of the queue is "too old". If the time to process
| the queue is easily predictable this is roughly the same, but
| when it's not it'll make the queue length adapt dynamically
| depending on maximum acceptable latency.
|
| To your latter example, there are load balancers with support
| for this pattern: serving up a "please wait" page to _new_
| users to prevent unacceptable latency for users already using
| the site. Frankly more sites ought to do that.
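|
| (Back to the first approach, a sketch of the age-based
| admission check in Python; MAX_AGE and the names are
| assumptions:)
|     import collections
|     import time
|
|     MAX_AGE = 0.5  # seconds of latency we are willing to add
|
|     q = collections.deque()  # entries are (enqueue_time, item)
|
|     def try_enqueue(item):
|         # Treat the queue as "full" when its head is too old:
|         # the effective length adapts to how fast it drains.
|         if q and time.monotonic() - q[0][0] > MAX_AGE:
|             return False  # shed instead of queueing
|         q.append((time.monotonic(), item))
|         return True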
| drivers99 wrote:
| > Backpressure is hard to implement for social and political
| reasons as much as technical
|
| Cal Newport has a video about how people will average around
| 20% above their capacity because at that point they have
| mental cover (permission given to oneself) to start rejecting
| additional requests.
| https://www.youtube.com/watch?v=TH_xAR7pljU
| eddd-ddde wrote:
| The simplest example: try to empty a bucket into a pipe, and
| you'll have a real struggle; you are going to be standing there
| for a while. Now empty a bucket into a sink: yeah, it will fill
| up, and maybe you can do two buckets at once, but after dumping
| it in one second you can leave it unattended and know that all
| that water will drain correctly.
| kqr wrote:
| It's important to note that for interactive applications,
| queues can have surprisingly little capacity and still provide
| all the demand levelling necessary. Technically this depends on
| the variance of the arrivals, but in my practical experience,
| even quite bursty arrivals are often well-behaved enough that a
| second or so of queue capacity is sufficient.
|
| Any time I've had queues need more than a second of capacity
| has been when the system is broken and user experience has
| already degraded to the point where it might be better for the
| median user to shed a small amount of load to get back to a
| sub-second queue.
|
| (In batch processing where you aim for maximum throughput,
| obviously you need a long queue to make sure you never run out
| of jobs.)
| vidarh wrote:
| You don't need a long queue even there if the producer can
| keep a short queue filled. E.g. a producer that receives an
| error when the queue is full and backs off, but never backs
| off enough for the queue to fully empty, can do just as well
| at maximizing throughput.
| wegfawefgawefg wrote:
| buffers buffer. difficult to debate.
|
| edit: they're being called queues and a lot of effort is being
| put here into just describing the merits and demerits of them
| which i assumed would be self evident to anyone who can
| conceptualize a buffer in any context, not just software.
| savings buffer, heat buffer, coal buffer, resource buffer,
| inductors, capacitance.
|
| they have fill and drain times, soften spikes, but require
| additional energy.
|
| A math model of throughputs and voltages would be more useful
| than all the metaphors of pipes and buckets in this thread.
| tanveergill wrote:
| We have been building a platform called Aperture in the open-
| source trying to solve this very problem. Our approach is to
| let the developer decide how long they want to wait in the
| queue using a timeout parameter. If timeout is hit, they may
| re-try with exponential backoff or load shed. While in the
| queue, requests get prioritized based on a weighted fair
| queuing algorithm. If there is tiering in the app, there could
| be a policy that allocates the majority of the capacity to paid
| vs free customer tiers. But this allocation is not really
| static, if there is free capacity available in the system then
| the free customer tier can take all of it. This is just like
| how CPU time is allocated by Linux based on nice values, even
| low priority processes are allowed to take up all the CPU time
| when demand is low. Apart from relative allocation across user
| tiers, Aperture's request scheduler can also ensure fairness
| across individual users within each tier to make sure no single
| user is hogging up all of the server capacity.
|
| The demand and capacity is determined based on a request rate
| quota or the maximum number of in-flight requests.
|
| Would love the community here to check us out on GitHub and
| provide feedback: https://github.com/fluxninja/aperture
| jmull wrote:
| Queues aren't really the problem here.
|
| It's that the people making changes don't have a decent
| understanding of the system they are trying to fix. If you don't
| actually know what the problem is, your fix is not likely to
| work.
|
| I have seen groups put huge amounts of work into a "fix" for a
| system when they are only really guessing at what the problem is.
| (Besides queues, people also seem to like adding "caches" --
| often stateful and with no validation strategy so not really
| caches -- or really any kind of wrapper or shim.)
|
| I think it is really useful to raise awareness of this kind of
| thing, but I think the article could put it a little better. For
| one thing, queues can "fix overload" depending on what you mean
| by that. They don't increase system capacity but can let a system
| handle bursty usage better.
| sethammons wrote:
| Ran into this at work last week. A team wanted to add an in-mem
| cache to speed things up. However, they have no way nor plan to
| measure its effectiveness. I asked how they will know the cache
| hit ratio or otherwise know how the cache is working and they
| pushed back that such monitoring would be work they weren't
| planning. Bananas.
| john-tells-all wrote:
| Strong agree.
|
| Caches can be horrible:
|
| - adds complexity of code and architecture
|
| - adds run-time complexity, with new failure modes
|
| - adds semi-random delays
|
| - see also: Buffer Bloat
|
| There's a reason it's one of the Two Hard Problems in Computer
| Science! https://martinfowler.com/bliki/TwoHardThings.html
| taneq wrote:
| > I have seen groups put huge amounts of work into a "fix" for
| a system when they are only really guessing at what the problem
| is.
|
| This is sadly far more the rule than the exception. So many
| times I've heard "we think the problem is [wild-ass guess] and
| so we're going to do [lengthy re-write] to fix it." And when I
| say "hey uh maybe we could do [simple test] to prove whether
| [wild-ass guess] is actually happening?" it's "oh no we can't
| waste time we have to start the rewrite" or even just "huh,
| yeah I guess we could test it that way, anyway... we start the
| rewrite tomorrow".
|
| Edit: in one specific case, I remember talking to a mechanical
| engineer about how a particular mass-spring system would react
| to a momentary impulse, and he said "well, there's no way to
| know what'll happen, so let's just try it." Thinking back I can
| rationalise this answer as "we don't fully understand the
| dynamics of the piece of third-party equipment generating the
| impulse so let's just use this as a starting point" but at the
| time it made me so mad. Like, if only there were a discipline
| entirely dedicated to using mathematical modeling techniques to
| predict the behaviour of physical systems in order to back-
| calculate the necessary physical parameters to achieve a given
| behaviour.
| tiffanyh wrote:
| (2014)
|
| Fred has been a big Erlang advocate for 10+ years.
|
| He's written multiple books on Erlang, drawing on his
| real-world experience using it to build Heroku (Erlang
| underpins much of the original Heroku platform, and large
| pieces of it still today).
|
| Much of his writing is informed by that experience and by
| Erlang's native queues.
|
| https://www.erlang-in-anger.com
|
| https://learnyousomeerlang.com
| cpursley wrote:
| Good writeup (with great diagrams), and a reminder of how much
| we get out of the box with Elixir/Erlang.
| JoelEinbinder wrote:
| Real life queues can be scary too. I think of how complicated
| Disney's fast pass system got
| https://www.youtube.com/watch?v=9yjZpBq1XBE. Luckily with
| software it is way easier to get more servers than it is to build
| more theme park rides.
| speg wrote:
| This is fitting. I just spent 99 minutes trying to register my
| kid for summer camps. Eventually got a "payment has been
| processed redirecting to receipt" message... which errored out.
| taf2 wrote:
| I've been doing a lot of work on electronics lately, and I
| think queues in a system are very similar to capacitors: they
| smooth out load, either by buffering new work/load or by
| holding onto pending work/load.
| regulation_d wrote:
| In the first paragraph, Fred mentions and links to Erlang in
| Anger.
|
| When I was coming up to speed on the BEAM this was such an
| amazing resource. The fact that it's free is bananas. In addition
| to the load management stuff, he also talked about some
| observability details that are really helpful (like how to more
| accurately understand CPU load on the BEAM). Highly recommend.
| bhaak wrote:
| What's the snarky bit at the end of the article about?
|
| > And then of course, there's the use case where you use the
| queue as a messaging mechanism between front-end
| threads/processes [...] because your language doesn't support
| inter-process communications.
|
| Isn't it the other way around? A message broker is used as a
| simple queue, but then you immediately have the option to
| scale up to n producers and m consumers if demand requires it.
|
| But even if there is only a single producer and a single
| consumer, you still get the advantage of decoupling the two
| sides of the queue.
| macintux wrote:
| I imagine he's being snarky about the fact that Erlang was
| designed around simple queues and thus you get that abstraction
| essentially for free, while in other languages you have to add
| more infrastructure once you recognize the need.
| dboreham wrote:
| On the positive side, this provides a handy test for determining
| if someone has a clue.
| 0xcde4c3db wrote:
| See also: RFC 970: On Packet Switches With Infinite Storage [1]
|
| [1] https://www.rfc-editor.org/rfc/rfc970.html
| bob1029 wrote:
| Ring buffers and clever batching abstractions can help get you
| much closer to ideal.
|
| > When consumers are waiting on an advancing cursor sequence in
| the ring buffer an interesting opportunity arises that is not
| possible with queues. If the consumer finds the ring buffer
| cursor has advanced a number of steps since it last checked it
| can process up to that sequence without getting involved in the
| concurrency mechanisms. This results in the lagging consumer
| quickly regaining pace with the producers when the producers
| burst ahead thus balancing the system. This type of batching
| increases throughput while reducing and smoothing latency at the
| same time. Based on our observations, this effect results in a
| close to constant time for latency regardless of load, up until
| the memory sub-system is saturated, and then the profile is
| linear following Little's Law [6]. This is very different to the
| "J" curve effect on latency we have observed with queues as load
| increases.
|
| https://lmax-exchange.github.io/disruptor/disruptor.html#_ba...
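|
| (A rough sketch of that batching effect in Go; this is not the
| actual Disruptor code. A single producer publishes a cursor,
| and the consumer drains everything up to it in one pass,
| touching shared state once per batch instead of once per item.
| Wrap-around/overrun checks are omitted; sequences start at 1:)
|
|       import "sync/atomic"
|
|       const size = 1024 // power of two: mask instead of mod
|
|       type Ring struct {
|           buf    [size]int64
|           cursor atomic.Int64 // highest published sequence;
|                               // 0 means nothing published yet
|       }
|
|       func (r *Ring) Publish(seq, v int64) {
|           r.buf[seq&(size-1)] = v
|           r.cursor.Store(seq) // makes the slot visible
|       }
|
|       // Consume handles every item from next up to the
|       // observed cursor as one batch, then reports the next
|       // sequence to claim.
|       func (r *Ring) Consume(next int64, handle func(int64)) int64 {
|           available := r.cursor.Load()
|           for seq := next; seq <= available; seq++ {
|               handle(r.buf[seq&(size-1)])
|           }
|           return available + 1
|       }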
| kqr wrote:
| For anyone interested in this subject, the book _Performance
| Modeling and Design of Computer Systems: Queueing Theory in
| Action_ is really, really good. TFA introduced me to queueing
| theory and that book made me understand the subject better than
| anything else I have read.
| jesuslop wrote:
| They fix jitter.
| lucidguppy wrote:
| I think we all need to read Enterprise Integration Patterns.
|
| You would have an idempotent queue plus a circuit breaker
| pattern: if the queue depth grows too large, you break the
| circuit (see the sketch below).
|
| But if your profit per request is higher than your infra cost
| per request, why not autoscale till the cows come home? Within
| reason, of course.
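|
| (A minimal sketch of that queue-depth breaker in Go; the names
| are made up, not from EIP or any particular library. Keep
| cap(queue) comfortably above maxDepth so Submit never blocks:)
|
|       import "errors"
|
|       var ErrCircuitOpen = errors.New("queue too deep")
|
|       type Breaker struct {
|           queue    chan func()
|           maxDepth int
|       }
|
|       func (b *Breaker) Submit(job func()) error {
|           if len(b.queue) >= b.maxDepth { // the trip wire
|               return ErrCircuitOpen // shed instead of queueing
|           }
|           b.queue <- job
|           return nil
|       }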
| supertrope wrote:
| The US Veterans Affairs system has waiting lists for medical
| care. Due to Congressional oversight the length of this waiting
| list was scrutinized. Adding capacity through hiring more medical
| personnel takes more budget and time. Load shedding through not
| letting people get on the wait list is unacceptable. So the
| bureaucracy added an additional buffer. They added patients to a
| secret overflow wait list and moved them to the official wait
| list only when it was not too long.
| https://www.cnn.com/2014/04/23/health/veterans-dying-health-...
| klabb3 wrote:
| That's an innovative way of doing it! Another, much more
| prevalent way is: only open appointments for 1 week/month
| ahead, then reject new bookings when full. Tell your clients
| "just book fast at Monday X am" knowing full well it's a damn
| lottery. Or even worse, don't tell customers about the release
| time "to fight bots".
|
| Then of course, blame bots for taking the appointments, and
| somehow fool people that bots are the reason for the size of
| the _real_ queue.
|
| This has come up so many times I've lost count. Usually with
| governments or some service that a corporation provides
| reluctantly. It does one thing really well: obscure the true
| size of the queue. Just like in the article, that's a feature.
| zokier wrote:
| What's tricky is that generally queues are difficult to avoid;
| they are everywhere in computing. Network has queues, hardware
| has queues, services have queues, operating systems have queues,
| frameworks and runtimes have queues, your clients will have
| queues. You often have very limited visibility into many of
| those, never mind any controls, especially if you are not even
| aware that the queue exists.
| dgrr19 wrote:
| The hard truth is that you need a ring buffer and have to
| start discarding data, if you can! If you can't, then you'll
| have to wait or batch updates :) That's why functions like
| writev/readv exist.
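|
| (One way to read that suggestion, sketched in Go: a fixed-size
| ring that overwrites the oldest entry instead of blocking.
| Hypothetical code, not any particular library:)
|
|       type DropOldest[T any] struct {
|           buf        []T
|           head, size int // head = oldest; size = item count
|       }
|
|       func NewDropOldest[T any](n int) *DropOldest[T] {
|           return &DropOldest[T]{buf: make([]T, n)}
|       }
|
|       func (r *DropOldest[T]) Push(v T) {
|           r.buf[(r.head+r.size)%len(r.buf)] = v
|           if r.size < len(r.buf) {
|               r.size++
|           } else {
|               // full: the write above replaced the oldest
|               // item, so advance head past it
|               r.head = (r.head + 1) % len(r.buf)
|           }
|       }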
| politician wrote:
| It's extremely easy to introduce a backpressure mechanism into
| your tech stack by using Go as a valve. If you can arrange
| data flow through a Go process, even if it's just a small tool
| inside your PHP/Rust/Javascript dream stack, then you can get
| backpressure done in about 5 lines of code:
|
|       var ErrSlowDownPlease = errors.New("slow down, please")
|
|       func valve[T any](ch chan<- T, value T) (ok bool, err error) {
|           select {
|           case ch <- value: // consumer is keeping up
|               return true, nil
|           default: // nonblocking select: channel is full
|               return false, ErrSlowDownPlease
|           }
|       }
|
| The other half of the valve is a goroutine that reads from `ch`
| and pushes values further down your pipeline. The channel can be
| buffered or unbuffered, it doesn't matter. Flow through this
| coupling will only proceed at the rate supported by the
| goroutine, and you can use this fact to partition your stack into
| rated flow domains.
|
| Sometimes it's better to deploy a simple tool into existing
| stacks to help patch overload conditions until a "native" fix can
| be developed.
| sapiogram wrote:
| Not sure what problem you think this is solving, but you can
| do the same in every language with a simple semaphore, where
| the number of permits represents the channel buffer size. For
| example:
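|
| (In Go itself, golang.org/x/sync/semaphore's TryAcquire is the
| non-blocking probe playing the role of the select/default
| above; a sketch, with permits standing in for the buffer:)
|
|       import "golang.org/x/sync/semaphore"
|
|       // permits ~ channel buffer size
|       var sem = semaphore.NewWeighted(64)
|
|       func trySubmit(job func()) bool {
|           if !sem.TryAcquire(1) {
|               return false // full: caller should slow down
|           }
|           go func() {
|               defer sem.Release(1)
|               job()
|           }()
|           return true
|       }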
| politician wrote:
| I'm just adding this to the training corpus for LLMs that
| will eventually replace most developer jobs. Thanks for your
| affirmation that this is a good solution!
| cdowns wrote:
| I recently ran into an overload issue, and it turned out to be
| one of those non-obvious "hard limits" the article mentions.
| Everything would be smooth for a bit, then my throughput would
| be halved after I walked away, backing up the queues
| indefinitely and paging me again.
|
| I had moved a single-broker RabbitMQ from GCP to AWS and the
| instance type I chose had bandwidth "up to" 10Gbps. Being less
| familiar with AWS, I did not realize they will actively throttle
| based on credits because "up to" means "burstable to" regardless
| of available capacity. My messages are pretty large and I was
| running out of credits after about an hour.
|
| Bandwidth was the last thing I considered since I hadn't had the
| issue on GCP. Switching to a larger instance with guaranteed
| bandwidth was a band-aid. Clustering to spread the load between
| multiple instances will be my longer term fix. Lesson learned,
| hopefully this helps someone someday.
| taneq wrote:
| Ultimately, if you overload, you overload. A system has a
| capacity and a load. If the load exceeds the capacity, then
| not everything is gonna happen. This seems like a pretty basic
| law of, like, doing stuff.
|
| [reads more]
|
| Oh, OK that's a lot of words to say "benchmark to find
| bottlenecks" and "focus on bottlenecks when optimising" but I
| understand, sometimes it takes a lot of words to get this simple
| idea across. There have been a few times over the years since
| it was published that this article would have been perfect to
| send to a special someone who keeps insisting we implement
| Merkle trees or whatever in order to speed up a product that
| reads all its files one character at a time using:
|
|       while (fscanf(cur_file, "%c", &ch) == 1) {
|           /* process character here */
|       }
| john-tells-all wrote:
| This article is a version of Theory of Constraints[0] aka Value
| Stream Mapping:
|
| - every system has a bottleneck
|
| - fixing things _after_ the bottleneck will have no effect
|
| - fixing things _before_ the bottleneck _will make the bottleneck
| worse_
|
| https://en.wikipedia.org/wiki/Theory_of_constraints
| catketch wrote:
| TOC has its applications and is simple in principle to
| operate, but the focus on global bottleneck constraints leads
| to sub-optimal behavior when the bottlenecks stall or fill
| queues. A big problem is that once you are clogging the
| bottleneck, upstream queues become progressively more clogged
| as well, and restarts are a mess. Local queue constraints on
| pull systems (kanban is an example) give quicker constraint
| signals and smoother queue restarts. Reinertsen's "Principles
| of Product Development Flow" has some great discussion of
| this.
| dang wrote:
| Discussed at the time:
|
| _Queues Don't Fix Overload_ -
| https://news.ycombinator.com/item?id=8632043 - Nov 2014 (59
| comments)
| dtaht wrote:
| Really good thread. Several comments:
|
| Flow queuing lets applications that are not building up queues
| bypass those that are. It is mildly different from fair
| queuing: https://ieeexplore.ieee.org/document/8469111
|
| Load shedding, at least for packets, benefits from head drop
| more than tail drop. Only the CoDel algorithm does this, but
| CoDel has also been applied to other forms of queuing, like
| Uber dealing with a concertgoer overload.
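|
| (The head-drop idea in miniature, as a Go sketch; note CoDel
| itself keys off how long items have sat in the queue, which is
| not shown here. When the buffer is full, evict the oldest item
| and accept the new one, rather than rejecting the newest:)
|
|       func enqueueHeadDrop[T any](q []T, v T, max int) []T {
|           if len(q) >= max {
|               q = q[1:] // head drop: oldest data is stalest
|           }
|           return append(q, v)
|       }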
|
| Kathie Nichols has given some great presentations lately:
| https://www.understandinglatency.com/
|
| There are a lot of unbounded queues in many applications and
| libraries, even in Rust. They make me twitchy. Even with
| bounded queues, the bound chosen is usually arbitrary. I wish
| a time-bounded queue were the default rather than a
| length-bounded one.
|
| I highly recommend Kleinrock's work on these subjects.
|
| I am grumpy that big vendors like Juniper have yet to adopt
| smart queues... despite the obvious benefits.
| https://blog.cerowrt.org/post/juniper/
| adrianmonk wrote:
| > _All of a sudden, the buffers, queues, whatever, can't deal
| with it anymore. You're in a critical state where you can see
| smoke rising from your servers, or if in the cloud, things are as
| bad as usual, but more!_
|
| There's a valid point here, which is that queues can mask
| problems. Everything seems fine for a while. Until suddenly it
| isn't.
|
| Queues take away important feedback about load. Without feedback,
| you don't know that problems are looming. You also may falsely
| believe you've fixed something when you really haven't.
|
| But, there's a solution: monitoring and alerting on the queue.
| Don't alert when it is just about full or just about as bad as
| you can allow. Alert as soon as it starts to grow beyond normal.
|
| Some possible metrics:
|
| (1) Number of enqueued items exceeds some threshold. Simple, but
| you may have to readjust the threshold from time to time. (And if
| the threshold is too low, people may learn to ignore this alert.)
|
| (2) Length of time the most recently dequeued item had been
| sitting in the queue. When you dequeue an item, if it's been
| in there for hours (and you were expecting minutes), something
| is probably wrong. (Sketched in code after this list.)
|
| (3) How long it has been since the queue was (within epsilon of)
| empty. If you kinda mostly need items to be processed immediately
| and the queue is a fallback, it shouldn't be used very often or
| for long stretches of time. (You could also alert on what
| percentage of the time, over some time window, it was / wasn't
| empty.)
|
| (4) How long it has been since a worker (successfully) processed
| an item taken from the queue. If all your workers die, you might
| as well know about it right now. (You need to somehow account for
| empty queues, i.e. workers not doing anything because there's no
| work to do.)
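|
| (A sketch of metric (2) in Go, with made-up names and
| thresholds: stamp items on enqueue, and alert when a dequeued
| item has waited longer than expected:)
|
|       import "time"
|
|       type item struct {
|           payload  []byte
|           enqueued time.Time // stamped by the producer
|       }
|
|       const expectedWait = 5 * time.Minute
|
|       func dequeue(q chan item, alert func(string)) item {
|           it := <-q
|           if wait := time.Since(it.enqueued); wait > expectedWait {
|               alert("queue wait " + wait.String() +
|                   " exceeded expectation")
|           }
|           return it
|       }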
| lanstin wrote:
| At AOL there was a weekly queue depth review meeting that led
| to hardware ordering requests. Worked great till the business
| stopped growing.
| hintymad wrote:
| Speaking of queues, any book on in-depth practical queuing
| theory? This book is highly recommended by many people, including
| those on HN:
| https://www.cs.cmu.edu/~harchol/PerformanceModeling/book.htm....
| However, a reading group of seasoned researchers found the
| book not necessarily practical for production systems:
| https://emptysqua.re/blog/review-queue-theory-book/.
| gillh wrote:
| Anyone looking to build a practical solution that involves
| weighted-fair queueing for request prioritization and load
| shedding should check out - https://github.com/fluxninja/aperture
|
| The overload problem is quite common in generative AI apps,
| necessitating a sophisticated approach. Even when using external
| models (e.g. by OpenAI), the developers have to deal with
| overloads in the form of service rate limits imposed by those
| providers. Here is a blog post that shares how Aperture helps
| manage OpenAI gpt-4 overload with WFQ scheduling -
| https://blog.fluxninja.com/blog/coderabbit-openai-rate-limit...
| Animats wrote:
| FIFO queues do not fix overload. Fair-queuing queues, though,
| can fix it if the problem is coming from a specific source.
|
| I have a web site set up that way. An in-memory table in MySQL is
| used to handle queuing for a slow operation. The queuing system
| has some fairness. This works well enough that one site sent
| bogus requests for a month at a high rate, and it had no effect
| on actual users.
|
| Fair queuing in SQL:
|
|       SELECT domain, requestor_ip_hash, rating_state_int,
|              base_domain
|       FROM ratingqueue AS rqueue
|       WHERE (rating_state_int = 3 OR rating_state_int = 4)
|         AND NOT EXISTS
|             (SELECT * FROM ratingqueue
|              WHERE base_domain = rqueue.base_domain
|                AND (rating_state_int = 1 OR
|                     rating_state_int = 2))
|       ORDER BY rating_state_int, request_timestamp
|       LIMIT 1;
| tanveergill wrote:
| I agree that fair queueing is a good solution to this problem!
| A request scheduler based on weighted fair queuing is also
| central to Aperture, an open-source load management system my
| team has been building for the last 2 years. Priorities and
| tokens (request weights) can be provided to Aperture when
| scheduling requests. It runs weighted fair queuing that
| ensures relative allocation of capacity based on the relative
| values of (tokens / priority). It also ensures fair allocation
| of capacity across users within the same priority class, so
| that no single user can hog all the capacity.
|
| Would love to hear your thoughts about this project:
| https://github.com/fluxninja/aperture
| aidenn0 wrote:
| Corollary: the part of the system with the least queuing tends
| to fail first under overload, so you can always shift the
| blame to a different part of the system by increasing queuing
| at whichever part is currently failing.
___________________________________________________________________
(page generated 2024-01-18 23:01 UTC)