[HN Gopher] A C++17 thread pool for high-performance scientific ...
       ___________________________________________________________________
        
       A C++17 thread pool for high-performance scientific computing
        
       Author : pramodbiligiri
       Score  : 69 points
       Date   : 2022-06-14 18:51 UTC (4 hours ago)
        
 (HTM) web link (arxiv.org)
 (TXT) w3m dump (arxiv.org)
        
       | Const-me wrote:
        | Writing a thread pool which is performant and scalable is
        | surprisingly hard; I've done it a couple of times. Now, whenever
        | I can, I tend to avoid doing that. Instead, I prefer to use the
        | stuff already implemented by someone else, in OS userlands or
        | runtime libraries.
       | 
        | * C and C++ runtime libraries have OpenMP. The level of support
        | varies across compilers, but all mainstream ones support at least
        | OpenMP 2.0.
       | 
        | * The .NET runtime comes with its own thread pool. The standard
        | library provides both low-level APIs similar to Windows' built-in
        | one, and higher-level things like Parallel.For that are remotely
        | comparable to OpenMP.
       | 
        | * Windows userland has another one; see the CreateThreadpoolWork
        | and SubmitThreadpoolWork APIs introduced in Vista. The API is
        | totally different, though: a programmer needs to manually submit
        | each task to the pool.
       | 
       | * iOS and OSX userlands also have a thread pool called Grand
       | Central Dispatch. It's very comparable to the Windows version.
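        | 
        | To illustrate the contrast (rough sketch of my own, untested):
        | OpenMP parallelizes a loop with a single directive, while the
        | Vista-era pool makes you create and submit work objects by hand.
        | 
        |       #include <vector>
        |       #include <windows.h>
        |       
        |       void square_openmp(std::vector<double>& v) {
        |           // OpenMP: one directive, the runtime owns the pool
        |           #pragma omp parallel for
        |           for (int i = 0; i < (int)v.size(); ++i) v[i] *= v[i];
        |       }
        |       
        |       // Windows pool: create, submit, and wait on a work object.
        |       // A real program would split the range into one work item
        |       // per chunk; a single item is shown here just for the API.
        |       VOID CALLBACK SquareAll(PTP_CALLBACK_INSTANCE, PVOID ctx, PTP_WORK) {
        |           auto& v = *static_cast<std::vector<double>*>(ctx);
        |           for (double& x : v) x *= x;
        |       }
        |       
        |       void square_winpool(std::vector<double>& v) {
        |           PTP_WORK work = CreateThreadpoolWork(SquareAll, &v, nullptr);
        |           SubmitThreadpoolWork(work);
        |           WaitForThreadpoolWorkCallbacks(work, FALSE);
        |           CloseThreadpoolWork(work);
        |       }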
        
       | [deleted]
        
       | formerly_proven wrote:
       | Code: https://github.com/bshoshany/thread-
       | pool/blob/master/BS_thre...
       | 
       | https://github.com/bshoshany/thread-pool/blob/master/BS_thre...
       | 
       | I don't see how this is built for particularly good performance,
       | or with many-core machines in mind.
        
         | WhitneyLand wrote:
         | A good paper will describe how the work is interesting or
         | useful in the field and how it opens the door for future
         | research.
         | 
          | Here the author acknowledges that other libraries may be more
          | performant, but describes the motivation for the work as
          | essentially that it's a small piece of code to maintain and
          | easy to use.
         | 
         | I didn't run the code but I read through the paper and it
         | doesn't seem like it's made a convincing case for this.
        
         | 908B64B197 wrote:
         | Because it's not.
         | 
          | It's basically an undergrad assignment. The writeup is nice,
          | and it's great that it's on arXiv, but it's neither novel nor
          | publishable research. The implementation that ships with a
          | given OS is sure to run circles around it.
        
         | riskneutral wrote:
          | Depending on which C++17 standard library is being used,
          | efficiency may be an issue, and it may be platform-dependent.
        
       | talolard wrote:
        | It's been a long time since I touched C++, so pardon my naivete.
        | I'd have assumed that optimized thread pools were a solved
        | problem. What's new here, and why was there a gap?
        
         | Randor wrote:
         | I looked through the code and I don't see anything new at all.
         | Looks similar to the dozens of other thread pools I've reviewed
         | over the years.
         | 
         | The author is a physicist. Looks like he just decided to
         | publish about his C++ code.
        
         | rat9988 wrote:
          | Read section 1.1 of the PDF:
          | https://arxiv.org/pdf/2105.00613.pdf
         | 
         | It's too long to quote. I'm too lazy to summarise.
        
           | 0xffff2 wrote:
           | Nothing in that section explains why this is in any way new
           | or innovative. The only interesting claim is "performance",
           | but looking at the code, I don't see anything other than a
           | simple naive thread-pool implementation.
        
         | beached_whale wrote:
          | There are lots of them, and many are built into the OS (e.g.
          | GCD on Macs, Windows has a thread pool API), plus TBB on all of
          | them.
         | 
          | It would be neat if the GitHub site
          | https://github.com/bshoshany/thread-pool or the paper did some
          | comparisons against the existing body of work out there.
        
       | bee_rider wrote:
       | > In particular, our implementation does not utilize OpenMP or
       | any other high-level multithreading APIs, and thus gives the
       | programmer precise low-level control over the details of the
       | parallelization, which permits more robust optimizations.
       | 
       | I totally disagree with this line of thinking. The person who
       | knows most about the hardware on which the program will be run is
        | the person running it, not the programmer. The OpenMP API is
        | somewhat complicated in an attempt to allow the programmer to, at
        | a high level, express ideas about data locality to the runtime.
       | 
       | Unless we're imagining a universe in which the programmer is
       | around to tweak their source code for every hardware platform,
       | the idea of "giving the programmer more _control_ " is a dead
       | end. The programmer must be given expressiveness.
       | 
       | Threading libraries are complicated because hardware is
        | complicated. I mean, first-generation Threadrippers are a little
       | old, but they still exist: do we really want to have everybody
       | re-write the code to handle "I have multiple packages on this
       | node, NUMA between the dies in a given package, NUMA between the
       | packages in the node!"
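        | 
        | To make "expressiveness" concrete, here's the kind of thing I
        | mean (my own sketch, not from the paper): the code only states
        | intent, and the person running it picks the actual mapping at
        | launch time through environment variables.
        | 
        |       void axpy(int n, double a, const double* x, double* y) {
        |           // schedule(runtime) defers the chunking to OMP_SCHEDULE;
        |           // thread placement is left to OMP_PLACES / OMP_PROC_BIND
        |           #pragma omp parallel for schedule(runtime)
        |           for (int i = 0; i < n; ++i) y[i] += a * x[i];
        |       }
        |       
        |       // e.g. on a two-socket NUMA box, chosen at run time:
        |       //   OMP_PLACES=cores OMP_PROC_BIND=spread OMP_SCHEDULE=static ./app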
        
         | pvg wrote:
          | _Unless we're imagining a universe in which the programmer is
         | around to tweak their source code for every hardware platform_
         | 
          | It doesn't have to be the whole universe, just a region in
          | which a particular kind of specialized scientific computing is
          | done. I have no idea what this specific thing is good for, but
          | looking at it through the lens of general-purpose computing
          | truisms probably misses more than it illuminates.
        
           | bee_rider wrote:
           | There's some truth there, but I think it is actually flipped.
           | 
            | Scientific computing codes generally stick around a while (or
            | at least they might, or at least we hope they will), and they
            | are run on complicated platforms by motivated people. There's
            | an incentive to do run-time tuning, and there's value in
            | making them general.
           | 
           | On the other hand, you might have things like a videogame
           | console, where the program is tuned for a particular
           | platform. There's plenty of room there for programmer control
           | I think (not sure though, that isn't my field -- in the
           | Unity/Unreal era maybe they try and make things general? But
           | I bet the programmer can exert great control if they want).
        
             | pvg wrote:
             | _Scientific computing codes generally stick around a while_
             | 
             | I don't think this is generally true and wasn't at all true
             | in my limited exposure to scientific computing. There's
             | lots of one-off, ad-hoc code that isn't building the next
             | BLAS or whatever. The whole trope of 'scientists are
             | terrible programmers' doesn't come from some universally
             | prevalent incentive to write general, long-living code.
        
               | phkahler wrote:
               | Scientific computing is not scientists doing computing.
                | It's weather simulation, CFD, and other engineering FEA
               | software. Lots of it has been around for decades and may
               | even be written in Fortran.
               | 
               | Come to think of it, this may be targeted at scientists
               | after all.
        
               | bee_rider wrote:
                | It says it is for "high-performance scientific computing"
                | though. It could actually be intended for scientists who
                | happen to be writing programs, and just be mislabeled. But
                | do they typically care about manual hardware tuning, the
                | sort that this library claims to enable?
        
         | photon-torpedo wrote:
         | > The person who knows most about the hardware on which the
         | program will be run is the person running it, not the
         | programmer.
         | 
         | Then again, in scientific computing, often these are the same
         | person.
        
         | physicsguy wrote:
         | Totally agree. Plus, OpenMP is a standard that's grown to take
         | into account new hardware. It works in C, C++ and Fortran,
         | which is a huge bonus. It's stuck around for a long time for a
         | reason - before it and MPI were standardised, every set of
          | processor hardware had its own parallelism libraries, and the
          | programmer had to learn a new set.
        
       | RcouF1uZ4gsC wrote:
       | Looking at the code, it seems like it is more of a class project
       | rather than a highly optimized library.
       | 
       | It is using mutexes, condition variables, and futures to write a
       | pretty much textbook implementation of a thread pool.
       | 
       | However, there will be significant contention, as all the workers
       | are reading from the same queue, and the submissions are going to
       | the same queue.
       | 
        | There are not multiple queues with work-stealing, which I think
        | is a minimum for a production-level version of something like
        | this.
       | 
       | EDIT:
       | 
       | IIRC C++ Concurrency in Action by Anthony Williams has a nice
       | discussion of the issues of building a C++ thread pool using the
       | C++ standard library. It does address things like contention and
       | work-stealing.
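          | 
          | For reference, the textbook shape being described is roughly
          | the following (my own paraphrase, not the library's actual
          | code). Every submit() and every worker pop goes through the one
          | mutex, which is where the contention comes from.
          | 
          |       #include <condition_variable>
          |       #include <functional>
          |       #include <mutex>
          |       #include <queue>
          |       #include <thread>
          |       #include <vector>
          |       
          |       class NaivePool {
          |           std::mutex m;                            // the one lock everyone fights over
          |           std::condition_variable cv;
          |           std::queue<std::function<void()>> tasks; // one shared queue
          |           std::vector<std::thread> workers;
          |           bool done = false;
          |       public:
          |           explicit NaivePool(unsigned n) {
          |               for (unsigned i = 0; i < n; ++i)
          |                   workers.emplace_back([this] {
          |                       for (;;) {
          |                           std::function<void()> task;
          |                           {
          |                               std::unique_lock<std::mutex> lock(m);
          |                               cv.wait(lock, [this] { return done || !tasks.empty(); });
          |                               if (done && tasks.empty()) return;
          |                               task = std::move(tasks.front());
          |                               tasks.pop();
          |                           }
          |                           task();
          |                       }
          |                   });
          |           }
          |           void submit(std::function<void()> f) {
          |               { std::lock_guard<std::mutex> lock(m); tasks.push(std::move(f)); }
          |               cv.notify_one();
          |           }
          |           ~NaivePool() {
          |               { std::lock_guard<std::mutex> lock(m); done = true; }
          |               cv.notify_all();
          |               for (auto& t : workers) t.join();
          |           }
          |       };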
        
         | facontidavide wrote:
         | "There are not multiple queues with work-stealing which is I
         | think a minimum for a production level version of something
         | like this."
         | 
         | 100% agree. I also agree that would be a minimum requirement to
         | call it "high-performance".
        
         | beached_whale wrote:
          | Sean Parent's Better Concurrency video(s) go over some of it
          | too. Also, it's a good series of videos.
        
           | facontidavide wrote:
           | Video link: https://www.youtube.com/watch?v=zULU6Hhp42w
        
         | TillE wrote:
         | Multiple queues with work stealing is an optimization for
         | really extreme cases (lots of tiny tasks). In most use cases, a
          | multi-consumer lockless queue (e.g., moodycamel::ConcurrentQueue)
          | is more than adequate.
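          | 
          | For what it's worth, swapping the locked queue for it looks
          | roughly like this (my own untested sketch; a real pool would
          | use the library's BlockingConcurrentQueue or back off instead
          | of spinning when the queue is empty):
          | 
          |       #include "concurrentqueue.h"  // github.com/cameron314/concurrentqueue
          |       #include <functional>
          |       
          |       moodycamel::ConcurrentQueue<std::function<void()>> tasks;
          |       
          |       void submit(std::function<void()> f) {
          |           tasks.enqueue(std::move(f));   // lock-free multi-producer enqueue
          |       }
          |       
          |       bool run_one() {                   // called by each worker thread
          |           std::function<void()> task;
          |           if (!tasks.try_dequeue(task))  // lock-free multi-consumer dequeue
          |               return false;              // queue looked empty right now
          |           task();
          |           return true;
          |       }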
        
           | thinkharderdev wrote:
            | Is it? I thought that work stealing was a basic requirement
           | for maintaining some semblance of cache locality.
        
           | kazinator wrote:
           | What I'm hearing is that in most cases, any one of the reams
           | of existing code for this is adequate.
        
       ___________________________________________________________________
       (page generated 2022-06-14 23:00 UTC)