[HN Gopher] Graceful behavior at capacity
___________________________________________________________________
Graceful behavior at capacity
Author : ingve
Score : 57 points
Date   : 2023-08-08 06:41 UTC (1 day ago)
(HTM) web link (blog.nelhage.com)
(TXT) w3m dump (blog.nelhage.com)
| ChrisMarshallNY wrote:
| I'm writing an app that is likely to have, at most, a thousand
| users, for the next couple of years.
|
| I'm testing it with 12,000 fake users.
|
| It works great.
| rdoherty wrote:
| I learned most of this the hard way as an SRE. How systems behave
| at and over their limits is far more important than how they
| behave under them. A system that is 'forgiving' (aka resilient)
| is worth its weight in gold. Otherwise you get into downward
| spirals with systems that can't recover unless they are rebooted.
| Great read!
| thewakalix wrote:
| From my armchair, I'm not sure that "random drop" actually does
| decrease latency. Most clients will just repeat the request,
| resulting in an "effective latency" of however many times it gets
| randomly dropped. The queue is now implicit, and I'd guess that
| it's less efficient to carry out several request/drop cycles than
| to just leave the client in a straightforward queue.
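The comment's intuition can be checked with a toy model: if each attempt is dropped with probability p and the client retries immediately, the number of attempts is geometric, so the mean "effective latency" is RTT/(1-p). A minimal simulation (the model and numbers are illustrative, not from the thread):

```python
import random

def effective_latency(drop_prob, rtt, rng):
    """Time until a retrying client's request is finally accepted,
    charging one round trip per attempt (toy model: immediate retry,
    independent drops)."""
    attempts = 1
    while rng.random() < drop_prob:
        attempts += 1
    return attempts * rtt

rng = random.Random(42)
# With a 50% drop rate the expected attempt count is 1/(1-0.5) = 2,
# so the mean effective latency is about 2 RTTs, not 1.
samples = [effective_latency(0.5, 1.0, rng) for _ in range(100_000)]
print(sum(samples) / len(samples))  # ≈ 2.0
```

Under this model the drops do form an implicit retry queue, as the comment suggests; whether that beats an explicit queue depends on the per-attempt cost the server pays before dropping.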
| notacoward wrote:
| FWIW, the single paragraph about "fair allocation" could be its
| own thesis. This gets into quality of service, active queue
| management, leaky buckets, deficit round robin, and so on _ad
| infinitum_. I did quite a bit of work on this on multiple
| projects at multiple companies, and it's one of the very few
| algorithmic areas I still think about in retirement. I
| highly recommend following up on some of the terms above for some
| interesting explorations.
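Of the terms above, deficit round robin is perhaps the easiest to sketch: each queue banks a fixed credit ("quantum") per round and may send items whose cost fits its banked deficit, so clients with many small requests cannot be starved by a client with large ones. A minimal sketch of the classic algorithm (names and costs here are made up for illustration):

```python
from collections import deque

def deficit_round_robin(queues, quantum, serve):
    """Drain per-client queues fairly by weighted cost.

    queues: dict name -> deque of (cost, item); quantum: credit each
    non-empty queue earns per round. Serves an item only when the
    queue's banked deficit covers its cost."""
    deficits = {name: 0 for name in queues}
    served = []
    while any(queues.values()):
        for name, q in queues.items():
            if not q:
                deficits[name] = 0  # idle queues don't bank credit
                continue
            deficits[name] += quantum
            while q and q[0][0] <= deficits[name]:
                cost, item = q.popleft()
                deficits[name] -= cost
                served.append(serve(name, item))
    return served

served = deficit_round_robin(
    {"big": deque([(2, "big-1"), (2, "big-2")]),
     "small": deque([(1, "small-1"), (1, "small-2")])},
    quantum=1,
    serve=lambda name, item: item,
)
print(served)  # ['small-1', 'big-1', 'small-2', 'big-2']
```

Note the interleaving: the client with cheap requests is served every round, while the expensive client waits an extra round per item to bank enough credit.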
| tra3 wrote:
| Most of the microservice code I see is
|
|     response = fetch(url, payload)
|     if (response.error) ...
|
| but when I ask 99% of folks what is going to happen when the
| fetch does NOT error out but instead takes 10 seconds, they look
| at me like I'm speaking gibberish.
|
| This is the single biggest reason for cascading failures I see.
|
| Netflix has dealt with it via their Hystrix library (open
| source). These days it seems like a proxy like Consul is the way
| to go. It encapsulates all of the fancy logic (like circuit
| breakers and flow control) so your service doesn't have to.
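The circuit-breaker pattern the comment refers to can be sketched in a few lines. This is an illustrative minimal version of the pattern Hystrix popularized, not Hystrix's actual API; class name and thresholds are invented, and in practice you would pair it with a hard timeout on the fetch so a 10-second hang counts as a failure rather than a success:

```python
import time

class CircuitBreaker:
    """Fail fast after repeated errors (minimal sketch of the pattern).

    Opens after `max_failures` consecutive failures; while open, calls
    raise immediately instead of hitting the slow backend. After
    `reset_after` seconds it goes half-open and allows one trial call."""

    def __init__(self, max_failures=3, reset_after=30.0,
                 clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open: failing fast")
            self.opened_at = None  # half-open: allow one trial call
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()  # (re)open the circuit
            raise
        self.failures = 0  # success closes the circuit
        return result
```

The point of failing fast is exactly the cascading-failure scenario above: callers get an immediate error they can handle instead of tying up threads waiting on a sick dependency.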
| jiggawatts wrote:
| As an ardent fan of monoliths and how they generally avoid such
| tar pits, I have to acknowledge that service-oriented
| architectures have their uses.
|
| So do we all have to keep reinventing these wheels, but only
| after a production outage?
|
| Or is it time someone started work on a distributed operating
| system? Vaguely like Kubernetes but full-featured?
|
| I keep seeing the same patterns being re-engineered over and
| over. Maybe it's time to refactor these out...
| klooney wrote:
| It's more work, is the simple answer.
___________________________________________________________________
(page generated 2023-08-09 23:00 UTC)