[HN Gopher] Scheduling threads like Thomas Jefferson
       ___________________________________________________________________
        
       Scheduling threads like Thomas Jefferson
        
       Author : todsacerdoti
       Score  : 58 points
       Date   : 2024-09-25 11:22 UTC (11 hours ago)
        
 (HTM) web link (stevana.github.io)
 (TXT) w3m dump (stevana.github.io)
        
       | bdcravens wrote:
       | Jefferson said we should rewrite the constitution every 19 years,
       | so I assumed that means if threads haven't completed in some
       | amount of time, to just blow them all away and start over lol
        
       | js8 wrote:
        | I am not sure why you would want to do this. If I take an example
       | of the bottling pipeline, it seems to me that the best way to
       | process the bottles on an SMP system is to have multiple threads
       | where each thread is doing all the work for a single bottle,
       | sequentially.
       | 
       | Maybe, if the processes at each stage are I/O-bound, then it
       | might make sense. But if they are CPU-bound, then I am not sure
       | this way of pipelining helps - you're moving data between
       | different CPUs, destroying cache locality.
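The per-bottle-thread idea can be sketched in a few lines; the stage names here are illustrative placeholders, not anything from the article:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stage functions; real stages would do actual work.
def fill(bottle):  return bottle + ["filled"]
def cap(bottle):   return bottle + ["capped"]
def label(bottle): return bottle + ["labeled"]

def process_bottle(bottle_id):
    # One thread carries a single bottle through every stage in order,
    # keeping that bottle's data hot in one core's cache.
    bottle = [f"bottle-{bottle_id}"]
    for stage in (fill, cap, label):
        bottle = stage(bottle)
    return bottle

with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(process_bottle, range(8)))

print(results[0])  # ['bottle-0', 'filled', 'capped', 'labeled']
```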
        
         | adrianmonk wrote:
         | I was about to suggest the same thing, and I still think it
         | makes a lot of sense.
         | 
         | However, the cache locality thing is complicated. Each bottle
         | has data associated with it, but each processing stage might
         | also have data. For example, maybe a particular stage uses a
         | lookup table. Or maybe stages keep statistics as they process
         | bottles.
         | 
         | If you have one CPU doing all the work for a particular stage,
         | then per-stage data stays in its cache. But if you have one CPU
         | doing all the work for a particular bottle, then all the per-
         | bottle data stays in its cache. So there's a trade-off that
         | depends on the specifics.
        
           | kitd wrote:
           | More importantly, each processing stage may have external
           | resources it needs to manage to do its function. In many
           | cases, it makes more sense to let the stage own the resource
           | for all bottles, than to have a per-bottle process having to
           | contend for the resource at the point it's needed.
        
             | hinkley wrote:
              | You get very different Little's Law shapes from the two, for
              | sure.
             | 
              | Who here hasn't bought themselves some time before vertically
              | or horizontally scaling a database by hoisting a bunch of
              | calculations up and out of the transaction? Do them before
              | or after the constraint.
             | 
              | The farther you have to reach to coordinate use of the
              | constraint, the better it might be to hand the inputs to the
              | constraint and let an expert handle things.
        
         | anonymoushn wrote:
         | You probably gain some ILP or avoid some pipeline stalls by
         | handling a lot of bottles at a time rather than one.
        
           | hinkley wrote:
            | Particularly if you're doing collision detection against
            | static sets of data. Filter this data for entries that look
            | this way, or that don't look that way.
        
       | shermantanktop wrote:
       | I've done something like this in large service ecosystems. It has
       | two very unfortunate properties:
       | 
       | - if the actual performance deviates from the predicted (scored)
       | performance, the system easily enters a degenerate bottlenecked
       | state.
       | 
       | - and if that happens, the many internal queues make diagnosis,
       | root causing, and confidence in a fix all exponentially worse.
       | 
       | Now you might assert that this will be applied in situations
       | where scores are accurate and brown failures do not occur. Those
       | aren't the situations I deal with.
        
         | Joker_vD wrote:
         | The algorithm in TFA, as I understand it, requires the
         | scheduler to constantly monitor the queue lengths and
         | reschedule the workers appropriately, as opposed to manually
         | estimating the required number of workers for each stage and
         | then never changing it.
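A toy version of that feedback loop, assuming a simple proportional policy (not necessarily the algorithm TFA actually uses):

```python
def reallocate(queue_lengths, total_workers):
    """Assign workers to stages in proportion to current queue length.

    Longer queues get more workers on the next scheduling round.
    An illustration of the idea only, not the article's algorithm.
    """
    total = sum(queue_lengths) or 1
    # Give each stage at least one worker, split the rest by queue share.
    alloc = [1] * len(queue_lengths)
    spare = total_workers - len(queue_lengths)
    for i, qlen in enumerate(queue_lengths):
        alloc[i] += (spare * qlen) // total
    # Hand any rounding remainder to the longest queue.
    alloc[queue_lengths.index(max(queue_lengths))] += total_workers - sum(alloc)
    return alloc

print(reallocate([10, 40, 10], 12))  # [2, 8, 2]
```

The scheduler would rerun this each monitoring tick, shifting workers toward whichever stage is currently backing up.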
        
           | shermantanktop wrote:
           | This is what happens with PID controllers. It is by
           | definition a trailing view, and a single catastrophically bad
           | request can cause catastrophic deviation of the whole system
           | before the queue monitoring can "catch up."
        
             | hinkley wrote:
             | It's possible to write a scheduler that looks like work
             | stealing and can route _unordered_ tasks around a poison
             | pill.
             | 
             | But we've generally found it works better for the worker to
             | pull when its queue is empty, and of course if you have
             | partially ordered events nobody can process (start or
             | reject) those ordered tasks until the borked one finishes
             | or times out.
             | 
             | So the situation you describe can absolutely happen, but is
             | not a given. It'll depend on other decisions.
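A minimal sketch of the pull model described: idle workers take the next task from a shared queue, so a slow "poison pill" task delays only the one worker that picked it up. Task payloads here are placeholders; this assumes unordered tasks.

```python
import queue
import threading

tasks = queue.Queue()
results = []
lock = threading.Lock()

def worker():
    # Pull-model worker: take work only when free; a stuck task
    # occupies one worker instead of backing up a per-worker queue.
    while True:
        task = tasks.get()
        if task is None:          # sentinel: shut down
            return
        out = task * 2            # stand-in for real work
        with lock:
            results.append(out)

threads = [threading.Thread(target=worker) for _ in range(3)]
for t in threads:
    t.start()
for n in range(6):
    tasks.put(n)
for _ in threads:
    tasks.put(None)
for t in threads:
    t.join()

print(sorted(results))  # [0, 2, 4, 6, 8, 10]
```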
        
       | jerf wrote:
        | This also contains a hidden assumption that reallocation from one
        | task to another is very, very expensive, so it's necessary to
        | plan the distribution of resources well in advance. This is
        | clearly true for a physical production line. But if you are in
        | the very common circumstance in the computer world where the
        | switching time is significantly less than the amount of progress
        | that can be made on a given task in some time, then the optimum
       | scenario is just to do work as it becomes available, and in many
       | cases, almost any scheduling algorithm that isn't deliberately
       | constructed for maximum pathology will ensure that significant
       | progress is made.
       | 
       | That's why in the vast majority of circumstances you'll be
       | running many things on many CPUs, you just throw all the work at
       | the CPUs and let the chips fall where they may. Deliberate
       | scheduling is a tool, but an unusual one, especially as many
       | times the correct solution to tight scheduling situations is to
       | throw more resources at it anyhow. (Trying to eke out wins by
       | changing your scheduling implies that you're also in a situation
       | where slight increases in workloads will back the entire system
       | up no matter what you do.)
        
         | hinkley wrote:
         | Even with lambdas you want a bit of server affinity because
         | setup is nontrivial. If the job absolutely needs to get done we
         | can get it done. But the opportunity cost is high and we can't
         | build a business around it. You want to prefer the warm path
         | over the cold one.
        
       | limit499karma wrote:
        | I do not believe 'pipelining' and parallelism are interchangeable
        | models, and conflating them is a mistake. For example, consider a
        | parallel processing system that in fact works strictly using a
        | 'pipeline' of length 0, that is, there is a hand-off from input
        | to the processing stage, followed by processing of that input.
        | And you can have n such parallel processing stages and voila,
        | 'parallelism'.
       | 
        | Pipelines are strictly processing stages where the 'production of
        | the input' and processing of the inputs are not synchronized. For
        | example, one sends n requests via a pipelined protocol to a
        | remote server without waiting for an ack for each input from the
        | server. There may be only one such processing pipeline (and thus
        | no parallelism) while there is pipelining.
        
         | bee_rider wrote:
         | I don't 100% follow your comment, so sorry if this is not quite
         | right.
         | 
          | But, I would consider pipelining to be a form of parallelism.
         | It breaks up a task so that parts of it can be run
         | simultaneously (in different stages of the pipeline,
         | simultaneously). There are other ways to do parallelism of
         | course, but it is a way.
         | 
         | In your example, if there are multiple pipeline stages in this
         | server, then the tasks should be worked on simultaneously, and
         | so parallelism is occurring.
         | 
         | Multi-core, SIMD, and pipelining. Parallelism has multiple
         | dimensions.
        
           | spc476 wrote:
           | Not quite. Going back to the bottling example, a bottle has
           | to be filled, capped, then labeled. At time 1, 1 bottle is
           | filling up, pipeline is advanced. At time 2, bottle 2 is
           | filling up, bottle 1 is being capped, pipeline is advanced.
           | At time 3, bottle 3 is filling up, bottle 2 is being capped,
           | and bottle 1 is being labeled. At time 4, bottle 1 is done.
           | Each bottle has to go through the pipeline in sequence, and
           | it takes three units of time for a bottle to go through the
           | pipeline. Yes, once the pipeline is filled up, you have the
           | three operations going on at the same time, but it's still a
           | sequence of steps that need to be performed in order for any
           | given bottle.
           | 
           | To make it faster, you either have to decrease the time for a
           | step (but you will always be capped with the slowest step),
           | or go for parallelism---a separate pipeline (or pipelines).
           | For example, with one pipeline, each step taking 1 unit of
           | time, once the pipeline is filled, will take six units of
           | time to make a six-pack (the first will take longer due to
           | the latency in filling the pipeline). You can make five other
           | pipelines, and then get a six-pack per unit of time (again,
           | after filling all the pipelines).
           | 
           | A single pipeline just makes the output have a predictable
           | latency and time; multiple pipelines give you parallelism.
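The timing walkthrough above follows the standard pipeline identity: the first item pays the full latency (one step per stage), then one item completes per step afterwards. A quick check:

```python
def pipeline_time(n_items, n_stages, unit=1):
    # First item pays the full pipeline latency (n_stages steps),
    # then one item completes per step.
    return (n_stages + (n_items - 1)) * unit

# One 3-stage pipeline (fill, cap, label), 1 time unit per stage:
print(pipeline_time(1, 3))  # 3 -> latency of a single bottle
print(pipeline_time(6, 3))  # 8 -> a six-pack: 3 to fill the pipe + 5 more
```

Once the pipe is full, each additional bottle costs exactly one unit, matching the six-units-per-six-pack steady state described above.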
        
             | Joker_vD wrote:
             | > For example, with one pipeline, each step taking 1 unit
             | of time, once the pipeline is filled, will take six units
             | of time to make a six-pack
             | 
             | Compared to 18 units of time needed to make a six-pack
             | without pipelining. Gee, what a wondrous invention this
             | "pipeline" is: having three workers means the work is
             | accomplished thrice as fast, yet there is (according to
             | you) no parallelism at all! So naturally, if we could
             | introduce parallelism _inside_ this single pipeline, we
              | would be able to make another triple reduction in time, and
              | get the production of a six-pack down to only 2 units of
              | time.
        
             | Jtsummers wrote:
             | > To make it faster, you either have to decrease the time
             | for a step (but you will always be capped with the slowest
             | step), or go for parallelism---a separate pipeline (or
             | pipelines).
             | 
             | > A single pipeline just makes the output have a
             | predictable latency and time; multiple pipelines give you
             | parallelism.
             | 
              | No, the pipeline does give you parallelism because you're
              | doing three (in the bottling example) pieces of work
              | _simultaneously_, that is: in parallel. Filling, capping,
             | labeling are each being done on different bottles at the
             | same time. How is that not parallelism?
             | 
             | Let's use some numbers:
             | 
             | Filling, capping, labeling take 30s, 15s, 15s each
             | (arbitrary, chosen for easy math). Without a pipeline you
             | will process 60 bottles per hour (say one station that does
             | all 3 tasks and then kicks out the bottle, let's ignore
             | transition times). With a pipeline you can get 120 per
             | hour. You've improved throughput. The latency per bottle is
             | still 60s.
             | 
             | BTW, you can double throughput again without even needing a
             | full second pipeline just by adding a second filling
             | station (with my example numbers) and feeding filled
             | bottles to the same capping and labeling stages, getting
             | you to 240 per hour.
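Those numbers check out; a few lines make the arithmetic explicit (stage times are the arbitrary ones chosen above):

```python
# Stage times in seconds: fill 30, cap 15, label 15.
fill, cap, label = 30, 15, 15

# Single all-in-one station: one bottle every 60 s.
sequential_per_hour = 3600 // (fill + cap + label)

# Pipelined: throughput is set by the slowest stage (fill, 30 s).
pipelined_per_hour = 3600 // max(fill, cap, label)

# Two filling stations feeding one capper and one labeler: filling's
# effective rate halves to 15 s/bottle, so the bottleneck becomes 15 s.
two_fillers_per_hour = 3600 // max(fill // 2, cap, label)

print(sequential_per_hour, pipelined_per_hour, two_fillers_per_hour)  # 60 120 240
```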
        
             | bee_rider wrote:
             | I think you are using an unconventional definition of the
             | word parallelism, if you want to exclude
             | 
              | > Yes, once the pipeline is filled up, you have the _three
              | operations going on at the same time,_ but it's still a
              | sequence of steps that need to be performed in order for
              | any given bottle.
             | 
             | as not parallel.
             | 
             | It seems to me that what you are calling parallelism, most
             | people would instead call homogeneous parallelism. Which is
             | a subset of parallelism.
        
             | hinkley wrote:
             | In modern bottling they fill a bunch of bottles at the same
             | time, and in some cases (How It's Made) the bottling or
             | pastry pouring or cookie cutting equipment moves with the
             | conveyor so it never has to stop.
             | 
             | But that's new equipment, so filling the bottle is still
             | the slow step in some past version of the bottling plant.
             | You're still going to fill three, four, six bottles at a
             | time, then cap them, but the labeling machine might just be
             | serial.
             | 
             | The way those machines work is also pipelined - add some
             | space between the bottles, slap a label on, use rollers to
             | make the label stick.
             | 
             | Then you load the cases n bottles at a time, not via pick
             | and pull.
             | 
             | Several of those steps could feed in from one machine to
             | four and back to two and then to one.
             | 
              | More equipment means more maintenance, but it also lets you
              | take part of the system offline while keeping output from
              | cratering.
        
         | Joker_vD wrote:
          | This is your own personal definition of "pipelining", which
          | most other people wouldn't subscribe to.
         | 
         | And to pick a nit, there is _always_ some synchronization
         | between the submission of an input and the processing of that
         | input: the submission must happen before the processing,
         | unfortunately.
        
       | pwdisswordfishz wrote:
       | "Those who sacrifice freedom for memory safety deserve neither"
       | or something?
        
       ___________________________________________________________________
       (page generated 2024-09-25 23:01 UTC)