[HN Gopher] Is parallel programming hard, and, if so, what can y...
       ___________________________________________________________________
        
       Is parallel programming hard, and, if so, what can you do about it?
       [pdf]
        
       Author : eric_khun
       Score  : 39 points
       Date   : 2023-02-19 16:39 UTC (6 hours ago)
        
 (HTM) web link (mirrors.edge.kernel.org)
 (TXT) w3m dump (mirrors.edge.kernel.org)
        
       | binary0x01 wrote:
        | Try using libthread on Plan 9: no locks.
        
       | dang wrote:
       | Related:
       | 
       |  _"Is Parallel Programming Hard, and, If So, What Can You Do
       | About It?" v2 Is Out_ -
       | https://news.ycombinator.com/item?id=26537298 - March 2021 (75
       | comments)
       | 
       |  _Is parallel programming hard, and, if so, what can you do about
       | it?_ - https://news.ycombinator.com/item?id=22030928 - Jan 2020
       | (85 comments)
       | 
       |  _Is Parallel Programming Hard, and, If So, What Can You Do About
       | It? [pdf]_ - https://news.ycombinator.com/item?id=9315152 - April
       | 2015 (31 comments)
       | 
       |  _Is Parallel Programming Hard, And, If So, What Can You Do About
       | It?_ - https://news.ycombinator.com/item?id=7381877 - March 2014
       | (26 comments)
       | 
       |  _Is Parallel Programming Hard, And, If So, What Can You Do About
       | It?_ - https://news.ycombinator.com/item?id=2784515 - July 2011
       | (39 comments)
        
       | dragontamer wrote:
        | 1. Try process-level I/O, such as pipes, sockets, and the like.
        | Have Linux deal with the concurrency problem, not you. (Note:
        | the bash & background-job trick works in so many cases it ain't
        | funny). Also try fork/join parallelism models like OpenMP.
        | These are all far easier than dipping down to a lower level.
       | 
       | 2. Try a mutex
       | 
       | 3. If that doesn't work, try adding a condition variable.
       | 
       | 4. If that still doesn't work, try an atomic in default
       | sequentially consistent mode or equivalent (ex: Java volatile,
       | InterlockedAdd, and the like). Warning: atomics are very subtle.
       | Definitely have a review with an expert if you are here.
       | 
       | 5. If that still doesn't work, consider lock free paradigms. That
       | is, combinations of atomics and memory barriers.
       | 
       | 6. If that still doesn't work, publish a paper on your problem
       | lol.
       | 
       | ---------
       | 
        | #1 is my most important piece of advice. A few years ago I was
        | doing a Blender render, on version 2.6 or something similarly
        | old. Blender's parallelism wasn't very good and only utilized
        | 25% of my computer.
       | 
       | So I ran 4 instances of headless Blender. Bam, 100% utilization.
       | Done.
       | 
       | Don't overthink parallelism. It's stupid easy sometimes, as easy
       | as a & on the end of your shell command.
        
         | chasil wrote:
         | The Oracle database has adopted process-level parallelism,
         | utilizing System V IPC. Threading is used on Windows for
         | performance reasons, but each client gets its own server pid by
         | default on UNIX.
         | 
         | This architecture expresses the original design intentions of
         | "Columbus UNIX."
         | 
         | "CB UNIX was developed to address deficiencies inherent in
         | Research Unix, notably the lack of interprocess communication
         | (IPC) and file locking, considered essential for a database
         | management system... The interprocess communication features
         | developed for CB UNIX were message queues, semaphores and
         | shared memory support. These eventually appeared in mainstream
         | Unix systems starting with System V in 1983, and are now
         | collectively known as System V IPC."
         | 
         | This approach has realized some degree of success.
         | 
         | https://en.m.wikipedia.org/wiki/CB_UNIX
        
           | dragontamer wrote:
           | > utilizing System V IPC
           | 
           | Hmm, that's a bit more complex than what I'd put at #1. I'd
           | probably put System V IPC closer to #2 ("use a mutex") levels
           | of complications.
           | 
           | System V Shared memory + Semaphores is definitely "as
           | complicated" as pthread mutexes and semaphores.
           | 
            | But messages, signals, pipes, and other process-level IPC
            | are much simpler. I guess System V IPC exists for that gray
            | region "between" the high-level stuff and the complex low-
            | level mutexes / semaphores.
           | 
           | Maybe "1.75", if I were to put it in my list above somewhere.
           | Closer to Mutexes in complexity, but still simpler in some
           | respects. Depends on what bits of System V IPC you use, some
           | bits are easier than others.
           | 
           | ---------
           | 
            | The main benefit of processes is that startup and shutdown
            | behavior is very well defined. So pipes, mmap regions, and
            | other I/O have a defined beginning and end. All sockets get
            | close()d properly, and so forth.
           | 
            | System V throws a monkey wrench into that, because the
            | semaphore or shared memory is "owned by Linux", so to
            | speak. So a sem_wait() is not necessarily going to be
            | matched by a sem_post(), especially if a process dies in a
            | critical region.
        
           | anarazel wrote:
           | Postgres also uses a multi process architecture. But I think
           | that turned out to be a mistake for something like a
           | database, on modern systems.
           | 
           | There are other reasons, but the biggest problem is that
           | inter process context switches are considerably more
           | expensive than intra process ones. Far less efficient use of
           | the TLB being a big part of that. It used to be worse before
           | things like process context identifiers, but even with them
           | you're wasting a large portion of the TLB by storing
           | redundant information.
        
             | chasil wrote:
             | Oracle still managed to be the TPC-C performance leader
             | from 12/2010 until Oceanbase took the crown in 2019 (I
             | don't think that Oracle cares anymore).
             | 
             | https://www.tpc.org/tpcc/results/tpcc_results5.asp?print=fa
             | l...
             | 
             | https://www.alibabacloud.com/blog/oceanbase-breaks-tpc-c-
             | rec...
             | 
             | They did this with an SGA (Shared Global Area) that (I'm
             | assuming) pollutes the Translation Lookaside Buffer (TLB)
             | with different addresses for this shared memory in every
             | process.
        
         | zelphirkalt wrote:
         | There is also: Try factoring out pure functions and run those
         | in parallel.
        
       | college_physics wrote:
        | It's interesting that while Moore's law saturated many years
        | ago, there is still no parallel programming style that hits a
        | sweet spot between productivity and performance for multicore
        | CPUs (and thus gets adopted for mainstream development).
        | 
        | It's not clear whether this means there is no such "optimum",
        | or simply that it is not something anybody cares about.
        | 
        | People focused a lot on GPUs, but that's not easy either.
        
       | gavinhoward wrote:
       | Funny. I've been reading this for the past six months and just
       | finished today.
       | 
       | Life is weird.
        
         | credit_guy wrote:
         | And, did you like it? I guess you did, otherwise you wouldn't
         | waste six months on it. But can you share in a few words what
         | you think of this book?
        
           | gavinhoward wrote:
           | Silly me for thinking that people wouldn't care about my
           | opinion.
           | 
           | It's a great book. Some things could be better, of course,
           | but it's a free book, so the quality to price ratio is off
           | the charts, and not just because it's free. I would have
           | happily paid $75 for it, maybe more.
           | 
           | That said, it's a book that requires you to care about the
           | Linux kernel, where the author's experience is. If that's a
           | problem, then you won't get as much out of it.
           | 
           | The book is best used as a reference after reading through it
           | once. I would do a medium-deep read the first time. This will
           | tell you the concepts you need to look up later, what
           | techniques exist, etc.
           | 
            | This is because the book goes into detail about the many
            | techniques for achieving concurrency. The philosophy is to
            | do the easiest thing that works, which is great, but it
            | does mean a lot of detail. That first medium-deep read
            | tells you what to look for later, once you actually need
            | those details.
           | I hope that helps.
        
       | bullen wrote:
        | I use OS threads + non-blocking I/O, with the
        | java.util.concurrent package for shared data, in Java. The
        | performance is incredible.
       | 
       | If I wanted to get a little more performance per watt I would
       | probably rewrite it in C with arrays of atomic variables.
        
       ___________________________________________________________________
       (page generated 2023-02-19 23:01 UTC)