[HN Gopher] Is parallel programming hard, and, if so, what can y...
___________________________________________________________________
Is parallel programming hard, and, if so, what can you do about it?
[pdf]
Author : eric_khun
Score : 39 points
Date : 2023-02-19 16:39 UTC (6 hours ago)
(HTM) web link (mirrors.edge.kernel.org)
(TXT) w3m dump (mirrors.edge.kernel.org)
| binary0x01 wrote:
| Try using libthread on Plan 9: no locks.
| dang wrote:
| Related:
|
| _"Is Parallel Programming Hard, and, If So, What Can You Do
| About It?" v2 Is Out_ -
| https://news.ycombinator.com/item?id=26537298 - March 2021 (75
| comments)
|
| _Is parallel programming hard, and, if so, what can you do about
| it?_ - https://news.ycombinator.com/item?id=22030928 - Jan 2020
| (85 comments)
|
| _Is Parallel Programming Hard, and, If So, What Can You Do About
| It? [pdf]_ - https://news.ycombinator.com/item?id=9315152 - April
| 2015 (31 comments)
|
| _Is Parallel Programming Hard, And, If So, What Can You Do About
| It?_ - https://news.ycombinator.com/item?id=7381877 - March 2014
| (26 comments)
|
| _Is Parallel Programming Hard, And, If So, What Can You Do About
| It?_ - https://news.ycombinator.com/item?id=2784515 - July 2011
| (39 comments)
| dragontamer wrote:
| 1. Try process-level I/O, such as pipes, sockets, and the like.
| Have Linux deal with the concurrency problem, not you. (Note:
| the Bash & background job works in so many cases it ain't
| funny.) Also try fork/join parallelism models like OpenMP.
| These are all far easier than dipping down to a lower level.
|
| 2. Try a mutex.
|
| 3. If that doesn't work, try adding a condition variable.
|
| 4. If that still doesn't work, try an atomic in default
| sequentially consistent mode or equivalent (ex: Java volatile,
| InterlockedAdd, and the like). Warning: atomics are very subtle.
| Definitely have a review with an expert if you are here.
|
| 5. If that still doesn't work, consider lock free paradigms. That
| is, combinations of atomics and memory barriers.
|
| 6. If that still doesn't work, publish a paper on your problem
| lol.
|
| ---------
|
| #1 is my most important piece of advice. There was a Blender
| render I was doing a few years ago, on version 2.6 or something
| similarly old. Blender's parallelism wasn't very good and only
| utilized 25% of my computer.
|
| So I ran 4 instances of headless Blender. Bam, 100% utilization.
| Done.
|
| Don't overthink parallelism. It's stupid easy sometimes, as easy
| as a & on the end of your shell command.
| chasil wrote:
| The Oracle database has adopted process-level parallelism,
| utilizing System V IPC. Threading is used on Windows for
| performance reasons, but each client gets its own server pid by
| default on UNIX.
|
| This architecture expresses the original design intentions of
| "Columbus UNIX."
|
| "CB UNIX was developed to address deficiencies inherent in
| Research Unix, notably the lack of interprocess communication
| (IPC) and file locking, considered essential for a database
| management system... The interprocess communication features
| developed for CB UNIX were message queues, semaphores and
| shared memory support. These eventually appeared in mainstream
| Unix systems starting with System V in 1983, and are now
| collectively known as System V IPC."
|
| This approach has realized some degree of success.
|
| https://en.m.wikipedia.org/wiki/CB_UNIX
| dragontamer wrote:
| > utilizing System V IPC
|
| Hmm, that's a bit more complex than what I'd put at #1. I'd
| probably put System V IPC closer to #2 ("use a mutex") levels
| of complication.
|
| System V shared memory + semaphores are definitely "as
| complicated" as pthread mutexes and semaphores.
|
| But messages, signals, pipes, and other process-level IPC are
| much simpler. I guess System V IPC exists for that gray region
| "between" the high-level stuff and the complex low-level
| mutexes / semaphores.
|
| Maybe "1.75", if I were to put it in my list above somewhere.
| Closer to Mutexes in complexity, but still simpler in some
| respects. Depends on what bits of System V IPC you use, some
| bits are easier than others.
|
| ---------
|
| The main benefit of processes is that startup and shutdown
| behavior is very well defined. So something like a pipe, an
| mmap, or other I/O has a defined beginning and end. All
| sockets are close()d properly, and so forth.
|
| System V throws a monkey wrench into that, because the
| semaphore or shared memory is "owned by Linux," so to speak.
| So a sem_wait() is not necessarily going to be matched by a
| sem_post(), especially if a process dies in a critical region.
| anarazel wrote:
| Postgres also uses a multi process architecture. But I think
| that turned out to be a mistake for something like a
| database, on modern systems.
|
| There are other reasons, but the biggest problem is that
| inter process context switches are considerably more
| expensive than intra process ones. Far less efficient use of
| the TLB being a big part of that. It used to be worse before
| things like process context identifiers, but even with them
| you're wasting a large portion of the TLB by storing
| redundant information.
| chasil wrote:
| Oracle still managed to be the TPC-C performance leader
| from 12/2010 until Oceanbase took the crown in 2019 (I
| don't think that Oracle cares anymore).
|
| https://www.tpc.org/tpcc/results/tpcc_results5.asp?print=fa
| l...
|
| https://www.alibabacloud.com/blog/oceanbase-breaks-tpc-c-
| rec...
|
| They did this with an SGA (System Global Area) that (I'm
| assuming) pollutes the Translation Lookaside Buffer (TLB)
| with different addresses for this shared memory in every
| process.
| zelphirkalt wrote:
| There is also: try factoring out pure functions and running
| those in parallel.
| college_physics wrote:
| It's interesting that, while Moore's law saturated many years
| ago, there is still no parallel programming style that hits a
| sweet spot between productivity and performance for multicore
| CPUs (and thus gets adopted more widely for mainstream
| development).
|
| It's not clear whether this means there is no such "optimum"
| or simply that it is not something anybody cares about.
|
| People focused a lot on GPUs, but that's not easy either.
| gavinhoward wrote:
| Funny. I've been reading this for the past six months and just
| finished today.
|
| Life is weird.
| credit_guy wrote:
| And, did you like it? I guess you did, otherwise you wouldn't
| waste six months on it. But can you share in a few words what
| you think of this book?
| gavinhoward wrote:
| Silly me for thinking that people wouldn't care about my
| opinion.
|
| It's a great book. Some things could be better, of course,
| but it's a free book, so the quality to price ratio is off
| the charts, and not just because it's free. I would have
| happily paid $75 for it, maybe more.
|
| That said, it's a book that requires you to care about the
| Linux kernel, where the author's experience is. If that's a
| problem, then you won't get as much out of it.
|
| The book is best used as a reference after reading through it
| once. I would do a medium-deep read the first time. This will
| tell you the concepts you need to look up later, what
| techniques exist, etc.
|
| This is because the book delves into detail about various
| techniques to get concurrency. The philosophy is to do the
| easiest thing that works, which is great, but it does mean it
| talks about details. Thus, it's best as a reference later
| after absorbing the surface level.
|
| However, that medium-deep read is still necessary for you to
| know what you need to look for later when you need details.
|
| I hope that helps.
| bullen wrote:
| I use OS threads + non-blocking I/O, with the
| java.util.concurrent package for shared data, in Java. The
| performance is incredible.
|
| If I wanted to get a little more performance per watt I would
| probably rewrite it in C with arrays of atomic variables.
___________________________________________________________________
(page generated 2023-02-19 23:01 UTC)