[HN Gopher] Fair: A Go library for serving resources fairly
___________________________________________________________________
Fair: A Go library for serving resources fairly
Author : ngaut
Score : 168 points
Date : 2024-09-14 20:00 UTC (2 days ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| nstateful wrote:
| Very interesting and looking forward to trying this. I am a big
| fan of SFB for this type of stuff but haven't seen anything in
| the distributed space that's beyond a science project. Would be
| great if I could use it on my 10k+ user web site.
| AnnaMere wrote:
| Extremely interesting and valuable
| tazu wrote:
| Does anyone have some real-world use cases for something like
| this? The algorithm is cool but I'm struggling to see where this
| is applicable.
| codaphiliac wrote:
| Thinking this could be useful in a multi-tenant service where
| you need to fairly allocate job processing capacity across
| tenants to a number of background workers (like data export API
| requests, encoding requests, etc.).
| jawns wrote:
| That was my first thought as well. However, in a lot of real
| world cases, what matters is not the frequency of requests,
| but the duration of the jobs. For instance, one client might
| request a job that takes minutes or hours to complete, while
| another may only have requests that take a couple of seconds
| to complete. I don't think this library handles such cases.
| codaphiliac wrote:
| Defining a unit of processing (like duration or quantity) and
| then feeding the algorithm the equivalent number of units
| consumed (pre- or post-processing a request) might help.
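|
| A rough sketch of that in Go (the Limiter interface here is
| hypothetical, not this library's actual API):
|
|     package fairunits
|
|     import "time"
|
|     // Limiter is a hypothetical fairness interface; the real
|     // library's API may differ.
|     type Limiter interface {
|         // Report that a tenant consumed n abstract work units.
|         Report(tenant string, n uint64)
|         ShouldThrottle(tenant string) bool
|     }
|
|     // Process charges a tenant for the duration a job actually
|     // took (post-processing), one unit per 100ms of wall time.
|     func Process(l Limiter, tenant string, job func()) {
|         start := time.Now()
|         job()
|         units := uint64(time.Since(start)/(100*time.Millisecond)) + 1
|         l.Report(tenant, units)
|     }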
| dtjohnnymonkey wrote:
| To mitigate this case you could limit capacity in terms of
| concurrency instead of request rate. Basically it would be
| like a fairly-acquired semaphore.
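|
| A minimal sketch of that idea using golang.org/x/sync/semaphore
| (the per-tenant map and cap are my own framing, not from this
| library):
|
|     package fairsem
|
|     import (
|         "context"
|         "sync"
|
|         "golang.org/x/sync/semaphore"
|     )
|
|     // PerTenant caps in-flight jobs per tenant instead of
|     // request rate, so one long-running job stream can't
|     // monopolize the workers.
|     type PerTenant struct {
|         mu   sync.Mutex
|         sems map[string]*semaphore.Weighted
|         cap  int64 // max concurrent jobs per tenant
|     }
|
|     func (p *PerTenant) sem(tenant string) *semaphore.Weighted {
|         p.mu.Lock()
|         defer p.mu.Unlock()
|         if p.sems == nil {
|             p.sems = make(map[string]*semaphore.Weighted)
|         }
|         s, ok := p.sems[tenant]
|         if !ok {
|             s = semaphore.NewWeighted(p.cap)
|             p.sems[tenant] = s
|         }
|         return s
|     }
|
|     // Run blocks until the tenant has a free slot, then runs job.
|     func (p *PerTenant) Run(ctx context.Context, tenant string, job func()) error {
|         s := p.sem(tenant)
|         if err := s.Acquire(ctx, 1); err != nil {
|             return err
|         }
|         defer s.Release(1)
|         job()
|         return nil
|     }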
| hinkley wrote:
| I believe nginx+ has a feature that does max-conns by IP
| address. It's a similar solution to what you describe. Of
| course that falls down wrt fairness when fanout causes
| the cost of a request to not be proportional to the
| response time.
| hinkley wrote:
| Lots of heuristics continue to work pretty well as long as
| the least and greatest are within an order of magnitude of
| each other. It's one of the reasons why we break stories
| down to 1-10 business days. Anything bigger and the
| statistical characteristics begin to break down.
|
| That said, it's quite easy for a big job to exceed 50x the
| cost of the smallest job.
| mnadkvlb wrote:
| I responded above, but it could maybe be used for network
| libraries, e.g. libvirt. I did my thesis on this topic a
| couple of years ago.
|
| I am very intrigued to find out how this would fit in, if at
| all.
| otterley wrote:
| Rate limiters are used to protect servers from overload and to
| prevent attackers--or even legitimate but unintentionally
| greedy tenants--from starving other tenants of resources. They
| are a key component of a resilient distributed system.
|
| See, e.g.,
| https://docs.aws.amazon.com/wellarchitected/latest/framework...
|
| This project, however, looks like a concurrency limiter, not a
| rate limiter. I'm also not sure how it works across a load-
| balanced cluster.
| salomonk_mur wrote:
| Why would you learn and use this over typical load-balancing
| solutions like K8S? Honest question.
| joshuanapoli wrote:
| In a multi-tenant system, you might have one customer who drops
| a big job that creates a huge number of tasks. We'd like to
| process this as fast as possible, but not block the tasks of
| small jobs from other customers, which should normally be
| completed very quickly.
| dpatterbee wrote:
| Isn't this exactly the problem that preemptive multitasking is
| built for? For example, any program built on the BEAM[1]
| presumably wouldn't have this problem. Do most languages not
| have a standard solution for this?
|
| [1]: https://en.m.wikipedia.org/wiki/BEAM_(Erlang_virtual_machine...
| maxmcd wrote:
| In preemptive multitasking you are trying to get all the
| work done as quickly as possible, but with FAIR and similar
| systems you want to make sure that a single
| client/user/resource cannot steal an inordinate amount of
| capacity.
|
| I do not think languages/runtimes typically implement that
| kind of prioritization/limiting mechanism.
| jerf wrote:
| Pre-emptive multitasking is a tool that a solution might
| use, but it is not a solution on its own. If you have three
| users spawning one thread/process/task and one user
| spawning a million (literally, not figuratively), that one
| user can easily completely starve the three little users.
| Some of their million tasks may also be starved. But
| they'll get more done overall.
|
| This whole problem gets way more complicated than our
| intuition generally can work with. The pathological
| distribution of the size of various workloads and the
| pathological distribution of the variety of resources that
| tasks can consume is not modeled well by our human brains,
| which really want to work with tasks that are essentially
| uniform. But they never are. A lot of systems end up
| punting, either to the OS, which has to deal with this
| anyhow, or to letting programs do their own cooperative
| internal scheduling, which is what this library implements.
| In general, "but what if I 'just'" solutions to this problem
| have undesirable pathological edge cases that seem like
| they "ought" to work, especially at the full generality of
| an operating system. See also the surprisingly difficult
| task of OOM-killing the "correct" process; the "well,
| obviously you 'just'" algorithms don't work in the real
| world, for a very similar reason.
|
| As computers have gotten larger, the pathological
| distributions have gotten worse. To be honest, if you're
| thinking of using "fair" it is likely you're better off
| working on the ability to scale resources instead. There's
| a niche for this sort of library, but it is constantly
| shrinking relative to the totality of computing tasks we
| want to perform (even though it is growing in absolute
| terms).
| udkl wrote:
| Appreciate this very insightful comment
| kgeist wrote:
| We had to make something similar. We have both huge tenants
| (200k users in one tenant) and small tenants with 10 users.
| Sometimes there are spikes when a large tenant generates
| thousands of jobs. In a naive implementation, 1 tenant would
| be able to completely block job processing for all other
| tenants. We have to make sure the next job is picked from a
| different tenant each time, so that all tenants are served
| fairly. However, a large tenant may end up waiting for its
| jobs to complete for too long. In that case, we move such a
| tenant to a different infrastructure (sometimes, fully
| dedicated).
| remram wrote:
| Those are completely unrelated. K8s does not provide client
| rate-control and FAIR does not do load-balancing. It appears
| you misunderstood what this is.
| otterley wrote:
| They are complementary solutions, not substitutes. Load
| balancers distribute traffic among servers and are a key
| component to enable horizontal scalability. Throttling is a
| prophylactic feature that prevents attackers or greedy
| consumers from overutilizing capacity and depriving other
| legitimate users of it. This library relates to the latter.
|
| Unfortunately the title of the GitHub repo ("A Go library for
| serving resources fairly") is misleading. This is not a server;
| it's a library that a server can utilize to determine whether a
| request has exceeded fairness bounds and should be rejected
| with an HTTP 429 (too many requests) response.
| ignoramous wrote:
| > _Since the state is stored in a multi-level Bloom Filter style
| data structure, the memory needed is constant and does not scale
| with the number of clients._
|
| Constant memory, but those hashes will take up CPU cycles. If
| you're running a workload that completes in under 20
| milliseconds, these cycles spent hashing may not be worth it
| over, say, a _constant-time_ admission control like a token
| bucket.
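|
| For comparison, a constant-time token bucket is just a couple
| of arithmetic operations per admission decision (an
| illustrative sketch, not this library's code):
|
|     package bucket
|
|     import (
|         "sync"
|         "time"
|     )
|
|     type Bucket struct {
|         mu     sync.Mutex
|         tokens float64
|         last   time.Time
|         Rate   float64 // tokens refilled per second
|         Burst  float64 // bucket capacity
|     }
|
|     // Allow refills based on elapsed time, then spends one
|     // token if available: O(1), no hashing.
|     func (b *Bucket) Allow() bool {
|         b.mu.Lock()
|         defer b.mu.Unlock()
|         now := time.Now()
|         b.tokens += now.Sub(b.last).Seconds() * b.Rate
|         if b.tokens > b.Burst {
|             b.tokens = b.Burst
|         }
|         b.last = now
|         if b.tokens < 1 {
|             return false
|         }
|         b.tokens--
|         return true
|     }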
| mnadkvlb wrote:
| Absolutely my thought as well. During my thesis I found token
| buckets to be better and fairer compared to other algos[1].
| This was in libvirt. My finding was that it scaled well up to
| around 1k buckets, if memory serves right. After that the
| results were weird, to say the least. (Of course, the servers
| I tested on were quite old, without much memory.) Would be
| nice to run my thesis tests again to find out what happened in
| the last decade.
|
| I tried writing another algorithm for network splitting but
| didn't get any better results. [1]:
| https://www.csg.uzh.ch/publications/details.php?id=1007
| remram wrote:
| I'm not exactly sure what you propose, a token bucket per
| client? In a hashmap client=>bucket?
| otterley wrote:
| That's the typical design for a token-bucket rate limiter. A
| token-bucket limiter uses more memory but is a simpler
| design. I believe this bloom-filter-based implementation is
| designed to be more memory efficient at the cost of being
| less CPU efficient.
|
| As usual, tradeoffs are everywhere.
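|
| The hashmap version is a few lines with golang.org/x/time/rate
| (a sketch; the rates and any eviction policy are up to you):
|
|     package perclient
|
|     import (
|         "sync"
|
|         "golang.org/x/time/rate"
|     )
|
|     // Limiters holds one token bucket per client, so memory
|     // grows with the number of clients -- the tradeoff the
|     // bloom-filter design avoids.
|     type Limiters struct {
|         mu sync.Mutex
|         m  map[string]*rate.Limiter
|     }
|
|     func (l *Limiters) Allow(client string) bool {
|         l.mu.Lock()
|         if l.m == nil {
|             l.m = make(map[string]*rate.Limiter)
|         }
|         lim, ok := l.m[client]
|         if !ok {
|             lim = rate.NewLimiter(10, 20) // 10 req/s, burst 20
|             l.m[client] = lim
|         }
|         l.mu.Unlock()
|         return lim.Allow()
|     }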
| hinkley wrote:
| Bloom filters really shine when they avoid a network round
| trip or pulling data from disk. Those are so far away you
| can easily afford to spend 5% of the cost of the request to
| avoid 99% of the requests.
| otterley wrote:
| Agreed! In a typical token bucket implementation, bucket
| state must be shared across multiple servers, which means
| you need a shared state repository such as Redis or a
| database (SQL or NoSQL). This can add milliseconds to
| each request being handled.
| foobazgt wrote:
| Rate limiters seem like the poster child for something
| you'd want to host in an in-memory KVS. Typical access
| times look more like ~100us, not milliseconds. Even if
| you have a complicated algorithm, you can still limit it
| to a single round trip by using a Redis Function or some
| moral equivalent.
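|
| For example, a fixed-window limiter as a server-side script
| with go-redis is one round trip (a sketch of the idea, not a
| full token bucket):
|
|     package redislimit
|
|     import (
|         "context"
|
|         "github.com/redis/go-redis/v9"
|     )
|
|     // The whole check runs inside Redis, so the caller pays a
|     // single network round trip.
|     var limit = redis.NewScript(`
|     local n = redis.call('INCR', KEYS[1])
|     if n == 1 then redis.call('EXPIRE', KEYS[1], ARGV[1]) end
|     if n > tonumber(ARGV[2]) then return 0 end
|     return 1`)
|
|     // Allow permits up to 100 requests per client per 1s window.
|     func Allow(ctx context.Context, rdb *redis.Client, client string) (bool, error) {
|         n, err := limit.Run(ctx, rdb, []string{"rl:" + client}, 1, 100).Int()
|         return n == 1, err
|     }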
| otterley wrote:
| That would work if you can assign requests from a given
| tenant to a single instance, but there are many
| situations in which that's either impossible or unwise.
| What if a single server doesn't have enough capacity to
| handle all the traffic for a tenant? How do you preserve
| the state if that instance fails?
| foobazgt wrote:
| I'm sorry, I don't think I understand your question. Are
| you talking about the KVS? You shard the servers for
| extra capacity. Several KVS's have built-in clustering if
| you want to go that route. They're usually incredibly
| stable, but if one goes down for whatever reason (say the
| physical machine fails), you just spin another one up to
| take its place.
|
| In terms of preserving state, the answer for rate
| limiting is that it is almost always far, far less
| dangerous to fail open than it is to deny requests during
| a failure. If you really, really, wanted to preserve
| state (something I'd suggest avoiding for a rate
| limiter), several KVS's have optional persistence you can
| turn on, for example, Redis' AOF.
|
| The end services themselves should be designed with some
| sort of pushback mechanism, so they shouldn't be in any
| danger of overloading, regardless of what's going on with
| the rate limiter.
| otterley wrote:
| I think I misunderstood what you were saying. By "in-
| memory KVS" with "access times in the microseconds" I
| thought you were implying a KVS hosted on the server that
| handles the requests. Otherwise, even if the KVS can
| respond in 100us to a local query, network latency is
| going to add much more than that.
| foobazgt wrote:
| Ah ok, that makes sense. Over loopback, you can roundtrip
| Redis at least as fast as double-digit micros. Intra-DC,
| your network latency should be somewhere in the triple-
| digit micros. I'd say if you're not seeing that,
| something is probably wrong.
| hinkley wrote:
| I don't recall all the details but we did end up with lua
| to talk to redis or memcached to do some traffic shaping
| at one point. One for bouncing to error pages before we
| switched to CDN (long story), and another one for doing
| something too clever by half about TTFB. It's still
| really cheap especially on a box that is just doing load
| balancing and nothing else.
|
| If you wanted to throw another layer of load balancer in,
| there are consistent hashing-adjacent strategies in
| nginx+ that would allow you to go from 2 ingress routers
| to 3 shards with rate limiters to your services, using
| one KV store per box. But I highly suspect that the
| latency profile there will look remarkably similar to
| ingress routers doing rate limiting talking to a KV store
| cluster.
| jrockway wrote:
| You can also rate limit on the server side. Service A
| talks to Service B. Service B is 3 replicas. Each replica
| keeps track of how many times Service A talked to it. If
| that goes over the limit / 3, deny the request. Now there
| is no IPC required.
|
| This isn't as accurate, but it's often adequate. I have
| never liked making a network request to see if I can make
| a network request; too slow. (I do use this architecture
| to rate-limit my own website, mostly because I wanted to
| play with writing a service to do that. But I'd hesitate
| to make someone else use that.)
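|
| The division trick is one line with golang.org/x/time/rate (a
| sketch; it assumes the load balancer spreads each caller's
| traffic roughly evenly across replicas):
|
|     package replicalimit
|
|     import "golang.org/x/time/rate"
|
|     // NewReplicaLimiter gives one replica its equal share of a
|     // global per-caller limit, so admission needs no IPC.
|     func NewReplicaLimiter(globalPerSec float64, replicas, burst int) *rate.Limiter {
|         return rate.NewLimiter(rate.Limit(globalPerSec/float64(replicas)), burst)
|     }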
| hinkley wrote:
| > This can add milliseconds
|
| My previous (very long) project was in such a state when
| I got there that it was only in the last year I was there
| that measuring things in microseconds was something I
| could get on other people's radars. I wish I had started
| sooner because I found a _lot_ of microseconds in about a
| four-month period. That was the most intensive period of
| user-visible latency reduction I saw the entire time I was
| there, and second through fourth place took years of
| manpower to accomplish. Past me and future me are both
| still mad about that.
| roboben wrote:
| Shouldn't this be built into a queue somehow? I'd love to see a
| queuing solution like SQS but with built-in fairness, where you
| can fully utilize capacity, but as soon as customers, say,
| compete for resources, some fairness kicks in. Is there
| anything like that?
___________________________________________________________________
(page generated 2024-09-16 23:00 UTC)