[HN Gopher] Fair: A Go library for serving resources fairly
       ___________________________________________________________________
        
       Fair: A Go library for serving resources fairly
        
       Author : ngaut
       Score  : 168 points
       Date   : 2024-09-14 20:00 UTC (2 days ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | nstateful wrote:
        | Very interesting, and I'm looking forward to trying this. I'm a
        | big fan of SFB (Stochastic Fair Blue) for this type of thing
        | but haven't seen anything in the distributed space that's
        | beyond a science project. It would be great if I could use it
        | on my 10k+ user website.
        
       | AnnaMere wrote:
       | Extremely interesting and valuable
        
       | tazu wrote:
       | Does anyone have some real-world use cases for something like
       | this? The algorithm is cool but I'm struggling to see where this
       | is applicable.
        
         | codaphiliac wrote:
          | Thinking this could be useful in a multi-tenant service where
          | you need to fairly allocate job processing capacity across
          | tenants to a number of background workers (like data export
          | API requests, encoding requests, etc.)
        
           | jawns wrote:
            | That was my first thought as well. However, in a lot of
            | real-world cases, what matters is not the frequency of
            | requests but the duration of the jobs. For instance, one
            | client might request a job that takes minutes or hours to
            | complete, while another may only have requests that take a
            | couple of seconds. I don't think this library handles such
            | cases.
        
             | codaphiliac wrote:
              | Defining a unit of processing, like duration or quantity,
              | and then feeding the algorithm the equivalent units
              | consumed (pre- or post-processing a request) might help,
              | as in the sketch below.
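              | 
              | A rough sketch of that idea in Go (the types and numbers
              | are hypothetical, not this library's API): meter each
              | tenant by the units a job actually consumed, e.g. its
              | duration, instead of by request count.
              | 
              |     // Hypothetical unit-based ledger; not concurrency-
              |     // safe, and window resets are omitted for brevity.
              |     type unitsLedger struct {
              |         used  map[string]float64
              |         limit float64 // units per tenant per window
              |     }
              | 
              |     func newUnitsLedger(limit float64) *unitsLedger {
              |         return &unitsLedger{
              |             used:  make(map[string]float64),
              |             limit: limit,
              |         }
              |     }
              | 
              |     // allow reports whether a tenant has budget left.
              |     func (l *unitsLedger) allow(tenant string) bool {
              |         return l.used[tenant] < l.limit
              |     }
              | 
              |     // charge records what a finished job actually cost,
              |     // e.g. its wall-clock duration in seconds.
              |     func (l *unitsLedger) charge(tenant string, u float64) {
              |         l.used[tenant] += u
              |     }
              | 
              | A long job then burns through a tenant's budget just as
              | fast as many short ones would.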
        
             | dtjohnnymonkey wrote:
              | To mitigate this case, you could limit capacity in terms
              | of concurrency instead of request rate. Basically, it
              | would be like a fairly-acquired semaphore.
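              | 
              | A minimal sketch with golang.org/x/sync/semaphore (the
              | per-tenant wiring is an assumption, not this library's
              | API):
              | 
              |     import (
              |         "context"
              |         "golang.org/x/sync/semaphore"
              |     )
              | 
              |     // At most 4 in-flight jobs for this tenant; long and
              |     // short jobs alike hold exactly one slot.
              |     var sem = semaphore.NewWeighted(4)
              | 
              |     func runJob(ctx context.Context, job func()) error {
              |         if err := sem.Acquire(ctx, 1); err != nil {
              |             return err // ctx cancelled while waiting
              |         }
              |         defer sem.Release(1)
              |         job()
              |         return nil
              |     }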
        
               | hinkley wrote:
               | I believe nginx+ has a feature that does max-conns by IP
               | address. It's a similar solution to what you describe. Of
               | course that falls down wrt fairness when fanout causes
               | the cost of a request to not be proportional to the
               | response time.
        
             | hinkley wrote:
             | Lots of heuristics continue to work pretty well as long as
             | the least and greatest are within an order of magnitude of
             | each other. It's one of the reasons why we break stories
             | down to 1-10 business days. Anything bigger and the
             | statistical characteristics begin to break down.
             | 
             | That said, it's quite easy for a big job to exceed 50x the
             | cost of the smallest job.
        
         | mnadkvlb wrote:
          | I responded above, but it could perhaps be used for network
          | libraries, e.g. libvirt. I did my thesis on this topic a
          | couple of years ago.
         | 
         | I am very intrigued to find out how this would fit in, if at
         | all.
        
         | otterley wrote:
         | Rate limiters are used to protect servers from overload and to
         | prevent attackers--or even legitimate but unintentionally
         | greedy tenants--from starving other tenants of resources. They
         | are a key component of a resilient distributed system.
         | 
         | See, e.g.,
         | https://docs.aws.amazon.com/wellarchitected/latest/framework...
         | 
         | This project, however, looks like a concurrency limiter, not a
         | rate limiter. I'm also not sure how it works across a load-
         | balanced cluster.
        
       | salomonk_mur wrote:
       | Why would you learn and use this over typical load-balancing
       | solutions like K8S? Honest question.
        
         | joshuanapoli wrote:
         | In a multi-tenant system, you might have one customer who drops
         | a big job that creates a huge number of tasks. We'd like to
         | process this as fast as possible, but not block the tasks of
         | small jobs from other customers, which should normally be
         | completed very quickly.
        
           | dpatterbee wrote:
            | Isn't this exactly the problem that preemptive multitasking
            | is built for? For example, any program built on the BEAM[1]
            | presumably wouldn't have this problem. Do most languages not
            | have a standard solution for this?
           | 
           | [1]: https://en.m.wikipedia.org/wiki/BEAM_(Erlang_virtual_mac
           | hine...
        
             | maxmcd wrote:
             | In preemptive multitasking you are trying to get all the
             | work done as quickly as possible, but with FAIR and similar
             | systems you want to make sure that a single
             | client/user/resource cannot steal an inordinate amount of
             | capacity.
             | 
             | I do not think languages/runtimes typically implement that
             | kind of prioritization/limiting mechanism.
        
             | jerf wrote:
             | Pre-emptive multitasking is a tool that a solution might
             | use, but it is not a solution on its own. If you have three
             | users spawning one thread/process/task and one user
             | spawning a million (literally, not figuratively), that one
             | user can easily completely starve the three little users.
             | Some of their million tasks may also be starved. But
             | they'll get more done overall.
             | 
             | This whole problem gets way more complicated than our
             | intuition generally can work with. The pathological
             | distribution of the size of various workloads and the
             | pathological distribution of the variety of resources that
             | tasks can consume is not modeled well by our human brains,
              | which really want to work with tasks that are essentially
             | uniform. But they never are. A lot of systems end up
             | punting, either to the OS which has to deal with this
             | anyhow, or to letting programs do their own cooperative
             | internal scheduling, which is what this library implements.
             | In general "but what if I 'just'" solutions to this problem
             | have undesirable pathological edge cases that seem like
             | they "ought" to work, especially at the full generality of
              | an operating system. See also the surprisingly difficult
             | task of OOM-killing the "correct" process; the "well
             | obviously you 'just'" algorithms don't work in the real
             | world, for a very similar reason.
             | 
             | As computers have gotten larger, the pathological
             | distributions have gotten worse. To be honest, if you're
             | thinking of using "fair" it is likely you're better off
             | working on the ability to scale resources instead. There's
             | a niche for this sort of library, but it is constantly
             | shrinking relative to the totality of computing tasks we
             | want to perform (even though it is growing in absolute
             | terms).
        
               | udkl wrote:
               | Appreciate this very insightful comment
        
           | kgeist wrote:
           | We had to make something similar. We have both huge tenants
           | (200k users in one tenant) and small tenants with 10 users.
           | Sometimes there are spikes when a large tenant generates
           | thousands of jobs. In a naive implementation, 1 tenant would
           | be able to completely block job processing for all other
           | tenants. We have to make sure the next job is picked from a
            | different tenant each time, so that all tenants are served
           | fairly. However, a large tenant may end up waiting for its
           | jobs to complete for too long. In that case, we move such a
           | tenant to a different infrastructure (sometimes, fully
           | dedicated).
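            | 
            | The pick-from-a-different-tenant-each-time part can be a
            | plain round-robin over per-tenant queues; a rough sketch
            | (names are illustrative):
            | 
            |     type Job struct{ ID string }
            | 
            |     type dispatcher struct {
            |         tenants []string         // rotation order
            |         queues  map[string][]Job // per-tenant FIFO
            |         next    int              // next tenant to serve
            |     }
            | 
            |     // pick cycles through tenants so one tenant's backlog
            |     // cannot starve everyone else's jobs.
            |     func (d *dispatcher) pick() (Job, bool) {
            |         for i := 0; i < len(d.tenants); i++ {
            |             t := d.tenants[(d.next+i)%len(d.tenants)]
            |             if q := d.queues[t]; len(q) > 0 {
            |                 d.queues[t] = q[1:]
            |                 d.next = (d.next + i + 1) % len(d.tenants)
            |                 return q[0], true
            |             }
            |         }
            |         return Job{}, false // all queues empty
            |     }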
        
         | remram wrote:
         | Those are completely unrelated. K8s does not provide client
         | rate-control and FAIR does not do load-balancing. It appears
         | you misunderstood what this is.
        
         | otterley wrote:
         | They are complementary solutions, not substitutes. Load
         | balancers distribute traffic among servers and are a key
         | component to enable horizontal scalability. Throttling is a
         | prophylactic feature that prevents attackers or greedy
          | consumers from overutilizing capacity and depriving other
          | legitimate users of it. This library relates to the latter.
         | 
         | Unfortunately the title of the GitHub repo ("A Go library for
         | serving resources fairly") is misleading. This is not a server;
         | it's a library that a server can utilize to determine whether a
         | request has exceeded fairness bounds and should be rejected
         | with an HTTP 429 (too many requests) response.
        
       | ignoramous wrote:
       | > _Since the state is stored in a multi-level Bloom Filter style
       | data structure, the memory needed is constant and does not scale
       | with the number of clients._
       | 
        | Constant memory, but those hashes will take up CPU cycles. If
        | you're running a workload that completes in under 20
        | milliseconds, the cycles spent hashing may not be worth it
        | over, say, a _constant-time_ admission control like a token
        | bucket.
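        | 
        | For comparison, the constant-time check with the standard
        | golang.org/x/time/rate token bucket (not the library under
        | discussion) is just:
        | 
        |     import "golang.org/x/time/rate"
        | 
        |     // 100 req/s steady state, bursts of up to 200.
        |     var limiter = rate.NewLimiter(rate.Limit(100), 200)
        | 
        |     func admit() bool {
        |         return limiter.Allow() // O(1): a time check, no hashing
        |     }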
        
         | mnadkvlb wrote:
          | Absolutely my thought as well. During my thesis I found token
          | buckets to be better and fairer compared to other
          | algorithms[1]. This was in libvirt. My finding was that it
          | scaled well up to around 1k buckets, if memory serves right;
          | after that, the results were weird, to say the least. (Of
          | course, the servers I tested on were quite old, without much
          | memory.) It would be nice to run my thesis tests again to
          | find out what happened in the last decade.
          | 
          | I tried writing another algorithm for network splitting but
          | didn't get any better results.
          | 
          | [1]: https://www.csg.uzh.ch/publications/details.php?id=1007
        
         | remram wrote:
         | I'm not exactly sure what you propose, a token bucket per
         | client? In a hashmap client=>bucket?
        
           | otterley wrote:
            | That's the typical design for a token-bucket rate limiter. A
            | token-bucket limiter uses more memory but is a simpler
            | design. I believe this Bloom-filter-based implementation is
            | designed to be more memory-efficient at the cost of being
            | less CPU-efficient.
           | 
           | As usual, tradeoffs are everywhere.
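            | 
            | That map-of-buckets design, sketched with
            | golang.org/x/time/rate (idle-client eviction omitted); the
            | per-client map entries are exactly the memory the Bloom-
            | filter approach avoids:
            | 
            |     import (
            |         "sync"
            |         "golang.org/x/time/rate"
            |     )
            | 
            |     type perClient struct {
            |         mu      sync.Mutex
            |         buckets map[string]*rate.Limiter
            |     }
            | 
            |     func (p *perClient) allow(client string) bool {
            |         p.mu.Lock()
            |         if p.buckets == nil {
            |             p.buckets = make(map[string]*rate.Limiter)
            |         }
            |         lim, ok := p.buckets[client]
            |         if !ok {
            |             // 10 req/s, burst 20, per client
            |             lim = rate.NewLimiter(rate.Limit(10), 20)
            |             p.buckets[client] = lim
            |         }
            |         p.mu.Unlock()
            |         return lim.Allow()
            |     }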
        
             | hinkley wrote:
             | Bloom filters really shine when they avoid a network round
             | trip or pulling data from disk. Those are so far away you
             | can easily afford to spend 5% of the cost of the request to
             | avoid 99% of the requests.
        
               | otterley wrote:
               | Agreed! In a typical token bucket implementation, bucket
               | state must be shared across multiple servers, which means
               | you need a shared state repository such as Redis or a
                | database (SQL or NoSQL). This can add milliseconds to
               | each request being handled.
        
               | foobazgt wrote:
               | Rate limiters seem like the poster child for something
               | you'd want to host in an in-memory KVS. Typical access
               | times look more like ~100us, not milliseconds. Even if
               | you have a complicated algorithm, you can still limit it
               | to a single round trip by using a Redis Function or some
               | moral equivalent.
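                | 
                | For example, an atomic fixed-window variant as a
                | server-side Lua script via the go-redis client (one
                | round trip; a Redis Function would be the moral
                | equivalent):
                | 
                |     import (
                |         "context"
                |         "github.com/redis/go-redis/v9"
                |     )
                | 
                |     // INCR plus first-hit PEXPIRE, atomic server-side.
                |     const script = `
                |     local n = redis.call('INCR', KEYS[1])
                |     if n == 1 then
                |         redis.call('PEXPIRE', KEYS[1], ARGV[1])
                |     end
                |     return n <= tonumber(ARGV[2]) and 1 or 0`
                | 
                |     func allowed(ctx context.Context,
                |         rdb *redis.Client, id string) (bool, error) {
                |         // 1000 ms window, 100 requests per window
                |         res, err := rdb.Eval(ctx, script,
                |             []string{"rl:" + id}, 1000, 100).Int()
                |         return res == 1, err
                |     }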
        
               | otterley wrote:
               | That would work if you can assign requests from a given
               | tenant to a single instance, but there are many
               | situations in which that's either impossible or unwise.
               | What if a single server doesn't have enough capacity to
               | handle all the traffic for a tenant? How do you preserve
               | the state if that instance fails?
        
               | foobazgt wrote:
               | I'm sorry, I don't think I understand your question. Are
               | you talking about the KVS? You shard the servers for
               | extra capacity. Several KVS's have built-in clustering if
               | you want to go that route. They're usually incredibly
               | stable, but if one goes down for whatever reason (say the
               | physical machine fails), you just spin another one up to
               | take its place.
               | 
               | In terms of preserving state, the answer for rate
               | limiting is that it is almost always far, far less
               | dangerous to fail open than it is to deny requests during
               | a failure. If you really, really, wanted to preserve
               | state (something I'd suggest avoiding for a rate
               | limiter), several KVS's have optional persistence you can
               | turn on, for example, Redis' AOF.
               | 
               | The end services themselves should be designed with some
               | sort of pushback mechanism, so they shouldn't be in any
               | danger of overloading, regardless of what's going on with
               | the rate limiter.
        
               | otterley wrote:
               | I think I misunderstood what you were saying. By "in-
               | memory KVS" with "access times in the microseconds" I
               | thought you were implying a KVS hosted on the server that
               | handles the requests. Otherwise, even if the KVS can
               | respond in 100us to a local query, network latency is
               | going to add much more than that.
        
               | foobazgt wrote:
               | Ah ok, that makes sense. Over loopback, you can roundtrip
               | Redis at least as fast as double-digit micros. Intra-DC,
               | your network latency should be somewhere in the triple-
               | digit micros. I'd say if you're not seeing that,
               | something is probably wrong.
        
               | hinkley wrote:
                | I don't recall all the details but we did end up with Lua
                | to talk to Redis or memcached to do some traffic shaping
               | at one point. One for bouncing to error pages before we
               | switched to CDN (long story), and another one for doing
               | something too clever by half about TTFB. It's still
               | really cheap especially on a box that is just doing load
               | balancing and nothing else.
               | 
               | If you wanted to throw another layer of load balancer in,
               | there are consistent hashing-adjacent strategies in
               | nginx+ that would allow you to go from 2 ingress routers
               | to 3 shards with rate limiters to your services, using
               | one KV store per box. But I highly suspect that the
               | latency profile there will look remarkably similar to
               | ingress routers doing rate limiting talking to a KV store
               | cluster.
        
               | jrockway wrote:
               | You can also rate limit on the server side. Service A
               | talks to Service B. Service B is 3 replicas. Each replica
               | keeps track of how many times Service A talked to it. If
               | that goes over the limit / 3, deny the request. Now there
               | is no IPC required.
               | 
               | This isn't as accurate, but it's often adequate. I have
               | never liked making a network request to see if I can make
               | a network request; too slow. (I do use this architecture
               | to rate-limit my own website, mostly because I wanted to
               | play with writing a service to do that. But I'd hesitate
               | to make someone else use that.)
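                | 
                | That scheme is small enough to sketch with the standard
                | golang.org/x/time/rate package (numbers illustrative):
                | 
                |     import "golang.org/x/time/rate"
                | 
                |     const (
                |         globalLimit = 300.0 // req/s, whole service
                |         replicas    = 3     // known from deployment
                |     )
                | 
                |     // Each replica admits only its share; no network
                |     // call on the hot path, at the cost of accuracy
                |     // when traffic is unevenly balanced.
                |     var local = rate.NewLimiter(
                |         rate.Limit(globalLimit/replicas), 10)
                | 
                |     func admit() bool { return local.Allow() }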
        
               | hinkley wrote:
               | > This can add milliseconds
               | 
               | My previous (very long) project was in such a state when
               | I got there that it was only in the last year I was there
               | that measuring things in microseconds was something I
               | could get on other people's radars. I wish I had started
               | sooner because I found a _lot_ of microseconds in about a
               | four month period. That was the most intensive period of
                | user-visible latency reduction I saw the entire time I
                | was there, and second through fourth place took years of
                | manpower to accomplish. Past me and future me are both
               | still mad about that.
        
       | roboben wrote:
        | Shouldn't this be built into a queue somehow? I'd love to see a
        | queuing solution like SQS but with built-in fairness, where you
        | can fully utilize capacity, but as soon as, let's say, customers
        | compete for resources, some fairness kicks in. Is there anything
        | like that?
        
       ___________________________________________________________________
       (page generated 2024-09-16 23:00 UTC)