[HN Gopher] A C++17 thread pool for high-performance scientific ...
___________________________________________________________________
A C++17 thread pool for high-performance scientific computing
Author : pramodbiligiri
Score : 69 points
Date : 2022-06-14 18:51 UTC (4 hours ago)
(HTM) web link (arxiv.org)
(TXT) w3m dump (arxiv.org)
| Const-me wrote:
| Writing a thread pool which is performant and scalable is
| surprisingly hard; I've done it a couple of times. Now, whenever
| I can, I tend to avoid doing that. Instead, I prefer using the
| implementations already written by someone else, in OS userlands
| or runtime libraries.
|
| * C and C++ runtime libraries have OpenMP. The level of support
| varies across compilers, but all mainstream ones support at
| least OpenMP 2.0.
|
| * The .NET runtime comes with its own thread pool. The standard
| library provides both low-level APIs similar to Windows' built-in
| one, and higher-level things like Parallel.For that are remotely
| comparable to OpenMP.
|
| * Windows userland has another one, see CreateThreadpoolWork and
| SubmitThreadpoolWork APIs introduced in Vista. The API is totally
| different, though: a programmer needs to manually submit each
| task to the pool.
|
| * iOS and OSX userlands also have a thread pool called Grand
| Central Dispatch. It's very comparable to the Windows version.
| [deleted]
| formerly_proven wrote:
| Code: https://github.com/bshoshany/thread-
| pool/blob/master/BS_thre...
|
| https://github.com/bshoshany/thread-pool/blob/master/BS_thre...
|
| I don't see how this is built for particularly good performance,
| or with many-core machines in mind.
| WhitneyLand wrote:
| A good paper will describe how the work is interesting or
| useful in the field and how it opens the door for future
| research.
|
| Here the author acknowledges that other libraries may be more
| performant, but describes the motivation for the work as
| essentially that it's a small piece of code to maintain, and
| easy to use.
|
| I didn't run the code but I read through the paper and it
| doesn't seem like it's made a convincing case for this.
| 908B64B197 wrote:
| Because it's not.
|
| It's an undergrad assignment, basically. The writeup is nice,
| and it's great that it's on arXiv, but it's neither novel nor
| publishable research. The implementation that ships with a
| given OS is sure to run circles around it.
| riskneutral wrote:
| Depending on which C++17 standard library is being used,
| performance efficiency may be an issue, and platform-dependent.
| talolard wrote:
| It's been a long time since I touched C++, so pardon my naivete.
| I'd have assumed that optimized thread pools were a done thing.
| What's new here, and why was there a gap?
| Randor wrote:
| I looked through the code and I don't see anything new at all.
| Looks similar to the dozens of other thread pools I've reviewed
| over the years.
|
| The author is a physicist. Looks like he just decided to
| publish about his C++ code.
| rat9988 wrote:
| Read 1.1 of the pdf: https://arxiv.org/pdf/2105.00613.pdf
|
| It's too long to quote. I'm too lazy to summarise.
| 0xffff2 wrote:
| Nothing in that section explains why this is in any way new
| or innovative. The only interesting claim is "performance",
| but looking at the code, I don't see anything other than a
| simple naive thread-pool implementation.
| beached_whale wrote:
| There are lots of them, and many are built into the OS (e.g. GCD
| on Macs; Windows has a thread pool API), plus TBB on all of
| them.
|
| It would be neat if the GitHub site
| https://github.com/bshoshany/thread-pool or the paper did some
| comparisons against the existing body of work out there.
| bee_rider wrote:
| > In particular, our implementation does not utilize OpenMP or
| any other high-level multithreading APIs, and thus gives the
| programmer precise low-level control over the details of the
| parallelization, which permits more robust optimizations.
|
| I totally disagree with this line of thinking. The person who
| knows most about the hardware on which the program will be run is
| the person running it, not the programmer. The OpenMP API is
| somewhat complicated in an attempt to allow the programmer to,
| at a high level, express ideas about data locality to the
| runtime.
|
| Unless we're imagining a universe in which the programmer is
| around to tweak their source code for every hardware platform,
| the idea of "giving the programmer more _control_ " is a dead
| end. The programmer must be given expressiveness.
|
| Threading libraries are complicated because hardware is
| complicated. I mean first generation threadrippers are a little
| old, but they still exist: do we really want to have everybody
| re-write the code to handle "I have multiple packages on this
| node, NUMA between the dies in a given package, NUMA between the
| packages in the node!"
| pvg wrote:
| _Unless we 're imagining a universe in which the programmer is
| around to tweak their source code for every hardware platform_
|
| It doesn't have to be the whole universe, just a region in
| which a particular kind of specialized scientific computing is
| done. I have no idea what this specific thing is good for, but
| looking at it through the lens of general-purpose computing
| truisms probably misses more than it illuminates.
| bee_rider wrote:
| There's some truth there, but I think it is actually flipped.
|
| Scientific computing codes generally stick around a while (or
| at least they might, or at least we hope they will), they are
| run on complicated platforms by motivated people. There's
| incentive to do run-time tuning, and there's value to making
| them general.
|
| On the other hand, you might have things like a videogame
| console, where the program is tuned for a particular
| platform. There's plenty of room there for programmer control
| I think (not sure though, that isn't my field -- in the
| Unity/Unreal era maybe they try and make things general? But
| I bet the programmer can exert great control if they want).
| pvg wrote:
| _Scientific computing codes generally stick around a while_
|
| I don't think this is generally true and wasn't at all true
| in my limited exposure to scientific computing. There's
| lots of one-off, ad-hoc code that isn't building the next
| BLAS or whatever. The whole trope of 'scientists are
| terrible programmers' doesn't come from some universally
| prevalent incentive to write general, long-living code.
| phkahler wrote:
| Scientific computing is not scientists doing computing.
| It's weather simulation, CFD, and other engineering FEA
| software. Lots of it has been around for decades and may
| even be written in Fortran.
|
| Come to think of it, this may be targeted at scientists
| after all.
| bee_rider wrote:
| It says it is for "high-performance scientific computing"
| though. It could actually be intended for scientists who
| happen to be writing programs, and mislabeled. But
| do they typically care about manual hardware tuning, the
| sort that this library claims to enable?
| photon-torpedo wrote:
| > The person who knows most about the hardware on which the
| program will be run is the person running it, not the
| programmer.
|
| Then again, in scientific computing, often these are the same
| person.
| physicsguy wrote:
| Totally agree. Plus, OpenMP is a standard that's grown to take
| into account new hardware. It works in C, C++ and Fortran,
| which is a huge bonus. It's stuck around for a long time for a
| reason - before it and MPI were standardised, every set of
| processor hardware had its own parallelism libraries and the
| programmer had to learn a new set each time.
| RcouF1uZ4gsC wrote:
| Looking at the code, it seems like it is more of a class project
| rather than a highly optimized library.
|
| It is using mutexes, condition variables, and futures to write a
| pretty much textbook implementation of a thread pool.
|
| However, there will be significant contention, as all the workers
| are reading from the same queue, and the submissions are going to
| the same queue.
|
| There are no multiple queues with work-stealing, which I think
| is a minimum for a production-level version of something like
| this.
|
| EDIT:
|
| IIRC C++ Concurrency in Action by Anthony Williams has a nice
| discussion of the issues of building a C++ thread pool using the
| C++ standard library. It does address things like contention and
| work-stealing.
| facontidavide wrote:
| "There are no multiple queues with work-stealing, which I think
| is a minimum for a production-level version of something like
| this."
|
| 100% agree. I also agree that would be a minimum requirement to
| call it "high-performance".
| beached_whale wrote:
| Sean Parent's Better Concurrency videos go over some of it
| too. It's a good series of videos overall.
| facontidavide wrote:
| Video link: https://www.youtube.com/watch?v=zULU6Hhp42w
| TillE wrote:
| Multiple queues with work stealing is an optimization for
| really extreme cases (lots of tiny tasks). In most use cases, a
| multi-consumer lockless queue (e.g. moodycamel::ConcurrentQueue)
| is more than adequate.
| thinkharderdev wrote:
| Is it? I thought that work stealing was a basic requirement
| for maintaining some semblance of cache locality.
| kazinator wrote:
| What I'm hearing is that in most cases, any one of the reams
| of existing code for this is adequate.
___________________________________________________________________
(page generated 2022-06-14 23:00 UTC)