[HN Gopher] What is parallelism, anyhow? (2009)
       ___________________________________________________________________
        
       What is parallelism, anyhow? (2009)
        
       Author : moonchild
       Score  : 29 points
       Date   : 2022-05-08 10:39 UTC (12 hours ago)
        
 (HTM) web link (web.archive.org)
 (TXT) w3m dump (web.archive.org)
        
       | whistl034 wrote:
       | Is there a font missing or something? When I try to read the
       | paper, every instruction dependency is displayed as x ? y, with a
       | question mark between each object, even when the author implies
       | there is some difference between x ? y and x ? y. I'm sure there
       | is supposed to be some other characters there, because without
       | them the paper makes no sense.
       | 
       | Perhaps it's something that only shows up properly on Windows
       | computers?
        
         | rzzzt wrote:
          | I thought it was going to be mojibake, but it's not of that
          | variety, just a regular question mark. One candidate is the
         | "precedes" symbol: https://codepoints.net/U+227A
        
         | Jtsummers wrote:
         | The precedes symbol is < and the parallel symbol is |.
         | 
         | https://www.cprogramming.com/parallelism.html - another copy
         | without this problem.
        
         | bruce343434 wrote:
         | You skimmed too fast, from that same article:
         | 
         | > We say that an instruction x precedes an instruction y,
         | sometimes denoted x ? y, if x must complete before y can begin.
         | In a diagram for the dag, x ? y means that there is a positive-
         | length path from x to y. If neither x ? y nor y ? x, we say the
         | instructions are in parallel, denoted x ? y. The figure below
         | illustrates a multithreaded dag:
        
           | Jtsummers wrote:
           | That quote presents exactly the confusion for GP. x ? y is
           | described to mean _both_ that x precedes y and that x and y
           | are in parallel. It suggests that _something_ got lost in
           | translation here.
           | 
           | https://www.cprogramming.com/parallelism.html
           | 
           | Non-archive link: x < y (x precedes y) and x | y (x is
           | parallel with y).
        
       | erwincoumans wrote:
       | A great comment about conflicts between parts of a system
       | fighting for resources (parallelism, memory, caches) is hidden at
       | the end of the article:
       | 
       | "@Ilya: You assume that the desired result is to have a 9 month
       | process. What if the desired result is to have a newborn
       | available for filming a commercial for baby food?
       | 
       | A good real-life example, based on a true story: You want data to
       | be sorted. You may start out with a standard sort algorithm.
       | Suddenly you find, that it is slow. You find out, that you can
       | increase the speed 5 times by replacing quicksort with mergesort.
       | Then, you find out, that you can increase the speed 10 times more
       | by using an implementation, that knows how to use the CPU cache
       | well. Then, you realize, that Mergesort can use more CPUs, so you
       | use more CPUs. Suddenly, your application is 100 times slower,
       | because the writes to memory from one CPU is flushing the cache
       | from the other CPU. This didn't happen on your development
       | system, because the dual-CPU system was made in a different
       | way... so you realize that you need to make the parallel
       | programming differently.
       | 
       | Finally, you find that you need to separate the lists, that you
       | merge in your mergesort algorithm, better. After a while, your
       | algorithm works well on your development system and on the target
       | system.
       | 
       | Finally satisfied, you put the parallel mergesort algorithm into
       | all places where it makes sense. After a short while, the test
       | department comes back, and reports that the application runs
       | slower when your multithreaded sort algorithm is used. The reason
       | is, that it now uses 2 CPU caches, leaving less cache memory for
       | other parts of the application, slowing down these parts more,
       | than the speed benefit of using 2 CPUs for sorting.
       | 
       | You are finally asked to remove the parallel processing from your
       | mergesort in order to increase the speed of the system.
       | 
       | posted @ Monday, August 11, 2008 7:21 AM by Lars D"
        
         | hinkley wrote:
         | Or somewhere in there you discover that 99.9% of the time you
         | only need the top ten elements in the list so sorting the whole
         | thing is foolish anyway. Or that you're better off sorting the
         | list as the data is collected in the first place because then
         | the cost per entry is log(n), and reads swamp writes by a wide
         | margin.
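          | 
          | A minimal sketch of that "top ten" idea in C++ (my own
          | illustration, assuming a std::vector<int> named scores - not
          | something from the parent comment):
          | 
          |     #include <algorithm>
          |     #include <functional>
          |     #include <vector>
          | 
          |     // Order only the 10 largest elements; the rest stays in
          |     // unspecified order. Roughly O(n log 10) instead of
          |     // O(n log n) for a full sort.
          |     void topTen(std::vector<int>& scores) {
          |         auto middle = scores.begin() +
          |             std::min<std::size_t>(10, scores.size());
          |         std::partial_sort(scores.begin(), middle,
          |                           scores.end(), std::greater<int>());
          |     }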
        
       | eatonphil wrote:
       | Title is "What the $#@! is Parallelism, Anyhow?", which was
       | probably easy to guess but not completely clear from "$". I
       | assume this was over-eager HN autoformatting. :)
        
       | phkahler wrote:
       | I think anyone doing C++ and wanting performance should start
       | using OpenMP immediately. It's really easy to use without
       | changing much code. In some cases it's just adding some pragmas.
       | 
       | Once you wring out the easy parallel parts you'll probably be
       | facing the slow algorithms that have poor time complexity. That's
       | when the real work begins. Reducing time complexity often
       | requires some good data structures and/or fancy algorithms.
        
         | maxwell86 wrote:
         | lol please no
         | 
         | If you are using C++, and want to parallelize something, just
         | add "std::execution::par" to your algorithms.
         | 
         | Instead of writing "std::for_each(...)" just write
         | "std::for_each(std::execution::par, ...)".
         | 
          | That's it. It really is that simple. And with the right
          | compilers you can just compile the code to run on FPGAs, GPUs,
          | or whatever.
         | 
          | For someone who knows C++, doing that is the lowest barrier to
          | entry, and it gets you most of the way there without having to
          | learn "some other programming language" like OpenMP (or
          | anything else).
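          | 
          | A complete minimal example of that pattern (my sketch, not from
          | the comment above; it assumes a C++17 standard library that
          | ships the parallel algorithms):
          | 
          |     #include <algorithm>
          |     #include <execution>
          |     #include <vector>
          | 
          |     int main() {
          |         std::vector<double> v(10'000'000, 1.0);
          |         // Same std::for_each as before; the execution policy is
          |         // the only change needed to request parallel execution.
          |         std::for_each(std::execution::par, v.begin(), v.end(),
          |                       [](double& x) { x = x * 2.0 + 1.0; });
          |     }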
        
           | gumby wrote:
           | > If you are using C++, and want to parallelize something,
           | just add "std::execution::par" to your algorithms.
           | 
           | Do any of the shipping standard libraries actually implement
           | execution policies? I only use gcc and clang so have to
           | resort to TBB to get this capability.
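            | 
            | (For reference, the following tends to build with libstdc++ 9
            | or newer when the TBB development package is installed and
            | linked - an assumption about the toolchain, not a statement
            | about libc++:)
            | 
            |     // par_check.cpp
            |     #include <algorithm>
            |     #include <execution>
            |     #include <vector>
            | 
            |     int main() {
            |         std::vector<int> v(1'000'000, 1);
            |         // libstdc++ dispatches this to TBB under the hood.
            |         std::sort(std::execution::par, v.begin(), v.end());
            |     }
            | 
            |     // build: g++ -O2 -std=c++17 par_check.cpp -ltbb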
        
           | dragontamer wrote:
            | OpenMP can parallelize a for loop like:
            | 
            |     #pragma omp parallel for
            |     for (int i = 0; i < 1000000; i++)
            |         C[i] = A[i] + B[i];
           | 
           | It's normal C++ and the pragma auto-parallelizes the loop.
           | It's actually really easy and convenient.
           | 
            | OpenMP is probably the easiest fork/join model of parallelism
            | on any C++ system I've used. It doesn't always get the best
            | utilization of your CPU cores, but it's really, really
            | simple.
           | 
           | It's the best way to start IMO, far easier than std::thread,
           | or switching to functional style for other libraries. Just
           | write the same code as before but with a few #pragma omp
           | statements here and there.
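            | 
            | (Building that typically just means adding -fopenmp to the
            | compile and link flags, e.g. "g++ -O2 -fopenmp main.cpp";
            | with Clang the libomp runtime has to be installed as well -
            | an assumption about your toolchain.)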
        
             | maxwell86 wrote:
             | > It's normal C++
             | 
             | C++ is an ISO standard, you can use it _everywhere_, in
             | space, in automotive, in aviation, in trains, in medical
             | devices, _everywhere_.
             | 
             | OpenMP is not C++, it is a different programming language
             | than C++.
             | 
              | OpenMP is not an ISO standard, so you can't use it in _most_
              | of the domains where you can use C++.
             | 
              | Your example:
              | 
              |     #pragma omp parallel for
              |     for (int i = 0; i < 1000000; i++)
              |         C[i] = A[i] + B[i];
             | 
             | shows how bad OpenMP is.
             | 
              | It does not run in parallel on GPUs or FPGAs (the example
              | lacks target offload directives), and you can't use it in
              | most domains in which you can use C++.
             | 
              | The following is ISO standard C++:
              | 
              |     std::for_each_n(std::execution::par,
              |                     std::views::iota(0).begin(), 1000000,
              |                     [&](int i) { C[i] = A[i] + B[i]; });
             | 
              | It runs in _parallel_ EVERYWHERE: GPUs, CPUs, FPGAs, and it
              | is certified for all domains for which C++ is (that is: all
              | domains).
              | 
              | Show me how to sort an array in parallel on _ANY_ hardware
              | (CPUs, GPUs, FPGAs) with OpenMP. With C++ it is as simple
              | as:
              | 
              |     std::sort(std::execution::par,
              |               array.begin(), array.end());
             | 
             | If you have a GPU, this offloads to the GPU. If you have an
             | FPGA, this offloads to the FPGA. If you have a CPU with 200
             | cores, this uses those 200 cores.
             | 
              | There is no need to turn your ISO-C++-compliant program or
              | libraries into OpenMP. That prevents them from being used in
              | many of the domains in which C++ runs. It also adds an
              | external dependency for parallelism, for no good reason.
             | 
             | For any problem that OpenMP can solve, OpenMP is _always_ a
             | worse solution than just using strictly ISO standard and
             | compliant C++.
             | 
              | OpenMP has completely lost its reason to exist. It's not
              | 1990 anymore.
        
       | oxff wrote:
        | People often get parallelism, concurrency, multi-threading and
        | async concepts mixed up in weird ways.
        
         | amelius wrote:
         | I think that's because parallelism and concurrency are words
         | that were originally very close in meaning and both could have
         | been slapped on the same concept.
        
           | marcosdumay wrote:
            | Do they come from geometry, or do the geometry terms come
            | from them?
            | 
            | In manual work, much as in geometry, they are almost
            | antonyms. If people can do a few tasks in parallel, that
            | means different people can do them at the same time without
            | getting in each other's way. But if two people's tasks are
            | concurrent, that means only one can happen at a time.
            | 
            | It's the mapping of these words onto computer terms that
            | confused everybody, because concurrency on computers is the
            | opposite of what it is for people. At least the modern
            | meaning of "concurrency", because it got changed when
            | computers became multi-tasking.
        
           | jjtheblunt wrote:
            | There was certainly no difference in their use in the late
            | 1980s, working on early shared-memory multiprocessors full
            | time at a big academic/industrial collaboration.
            | 
            | 20 years later the younger kids with PhDs would distinguish
            | between them, and I realized the terms had specialized in a
            | way I'd not noticed.
        
           | CogitoCogito wrote:
           | I think it's best to be aware of the differences, but not
           | overly pedantic about it. If you legitimately don't know what
           | people mean, then ask, but otherwise just accept that people
           | are sometimes a bit unclear about it. I know the difference
           | between all these things, but I'll often mix up terminology
           | as well.
        
         | paskozdilar wrote:
         | Concepts such as "multi-threading" and "async/await" are
         | particular implementations of concurrency and, potentially,
         | parallelism. They are often used by people who are used to a
         | particular implementation and don't see a reason to use any
         | more abstract word.
         | 
         | But mathematically speaking, "concurrency" and "parallelism"
         | are very much related - by definition, in any concurrent
         | algorithm, there are branches of computation that can be safely
         | parallelized; and every parallel algorithm is, by definition,
         | concurrent.
         | 
          | The difference between the two, I think, is that parallelism
          | colloquially implies parallel execution (e.g. OpenMP, NumPy),
          | while concurrency does not (e.g. NodeJS) - concurrency is just
          | independence of computation.
         | 
         | That's the reason functional languages generally excel at
         | concurrency - by having explicit dependencies between
         | computational objects, the compiler can just topologically sort
         | the dependency graph, and trivially find which branches of
         | computation can be parallelized.
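          | 
          | A tiny sketch of that last point (my own illustration, not from
          | the comment): given a dependency DAG already listed in
          | topological order, a longest-path "level" per node groups the
          | computations that may safely run in parallel with each other.
          | 
          |     #include <algorithm>
          |     #include <cstdio>
          |     #include <vector>
          | 
          |     int main() {
          |         // deps[v] = nodes that must finish before v (example data).
          |         std::vector<std::vector<int>> deps = {
          |             {},      // 0: no dependencies
          |             {0},     // 1 depends on 0
          |             {0},     // 2 depends on 0
          |             {1, 2},  // 3 depends on 1 and 2
          |         };
          |         const int n = static_cast<int>(deps.size());
          |         std::vector<int> level(n, 0);
          |         // Nodes are indexed in topological order, so one forward
          |         // pass computes the longest dependency chain ending at v.
          |         for (int v = 0; v < n; ++v)
          |             for (int u : deps[v])
          |                 level[v] = std::max(level[v], level[u] + 1);
          |         // Nodes that share a level never depend on each other.
          |         for (int v = 0; v < n; ++v)
          |             std::printf("node %d: step %d\n", v, level[v]);
          |     }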
        
           | User23 wrote:
           | As I understand it the unifying concept is nondeterminism.
           | It's a logical framework that can make sense of parallelism,
           | concurrency, and interrupts.
           | 
           | Nondeterministic doesn't mean unpredictable. For example a
           | pure functional language without side effects is free to
           | evaluate function parameters in any order and still always
           | return the same result.
           | 
           | Other times a nondeterministic algorithm can produce
           | different results each execution but still produce a result
           | that satisfies some predicate. For example iterating over a
            | Go map makes no promises about the order elements will be
           | visited in, but it does promise they will all be visited. So
           | if your reduce function is commutative and associative you'll
           | get the same result regardless of ordering.
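            | 
            | A C++ analogue of that reduce example (my sketch; the parent
            | used Go maps): iteration order over an unordered container is
            | unspecified, yet a commutative, associative reduction gives
            | the same answer every run.
            | 
            |     #include <cstdio>
            |     #include <unordered_map>
            | 
            |     int main() {
            |         std::unordered_map<int, int> counts{
            |             {1, 10}, {2, 20}, {3, 30}};
            |         long total = 0;
            |         // Visit order is unspecified, but + is commutative
            |         // and associative, so the sum is deterministic.
            |         for (const auto& kv : counts) total += kv.second;
            |         std::printf("%ld\n", total);
            |     }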
        
         | bob1029 wrote:
         | Whenever I find myself wrapped around a post on these topics, I
         | force myself back to 1 fundamental question:
         | 
          | > Will this action/design increase or decrease the amount of
          | latency incurred on the hot path, relative to the physical
          | layout of my processor?
         | 
          | So far, keeping an eye on this has guided me well. There are no
          | silver bullets either. async/await is elegant, but it incurs
          | substantial latency overhead compared to busy-waiting over
          | batches of work (which itself has obvious caveats).
        
         | agumonkey wrote:
         | IMO it's a matter of logical/functional dependency. Parallelism
         | implies no deps, therefore full overlap is possible.
         | Concurrency less so, so you need joins/rendez-vous.
        
         | [deleted]
        
         | dragontamer wrote:
         | I think I finally figured it out recently.
         | 
         | Concurrency is an I/O parallelism technique. One core can make
         | 10,000 TCP connections at the same time thanks to async, golang
         | lightweight threads and whatnot.
         | 
          | This is useful because many I/O operations are high latency. A
          | TCP connection, like HTTPS, has high latency, but the I/O
          | device supports many operations at once.
         | 
         | -------
         | 
         | If you are I/O latency bound, you use concurrency to increase
         | performance. Even with one CPU, concurrency or I/O parallelism
         | helps a lot.
         | 
          | But if you are CPU-bound, you need to use totally different
          | techniques (parallel processing).
        
           | pengaru wrote:
            | Coroutines are a form of concurrency, and they _may_ be
            | executed in parallel if scheduled on separate cores/threads,
            | but usually they are just cooperatively scheduled serially.
            | Concurrency is not an IO-specific concept.
        
             | dragontamer wrote:
             | The issue is that splitting up work into different blocks
             | wrecks your L1 cache and locality of data.
             | 
              | So if your code is:
              | 
              |     A(data)
              |     B(data)
              |     C(data)
             | 
             | Doing A(data1), B(data1), C(data1) then A(data2), B(data2),
             | C(data2) is much much faster than A(data1), A(data2),
             | B(data1), B(data2), C(data1), C(data2).
             | 
             | In most cases anyway (assuming data1 and data2 have a
             | portion that fits somewhere in the cache hierarchy).
             | 
             | So even if you are doing parallel execution, it ends up
             | being a lot slower than sequential.
             | 
             | --------
             | 
             | The goal of parallel processing is to actually get faster
             | in terms of CPU time. Just grabbing concurrency techniques
             | can be counterproductive.
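              | 
              | A tiny sketch of that locality effect, with A/B/C as stand-
              | in element-wise steps (my own illustration, not code from
              | the thread):
              | 
              |     #include <vector>
              | 
              |     // Fused: each element stays hot in cache across all
              |     // three steps before the next element is touched.
              |     void fused(std::vector<float>& data) {
              |         for (auto& x : data) {
              |             x = x * 2.0f;  // A
              |             x = x + 1.0f;  // B
              |             x = x * x;     // C
              |         }
              |     }
              | 
              |     // Split: every pass re-reads the whole array, so once
              |     // the array outgrows the cache each pass pays main-
              |     // memory latency again.
              |     void split(std::vector<float>& data) {
              |         for (auto& x : data) x = x * 2.0f;  // all of A
              |         for (auto& x : data) x = x + 1.0f;  // all of B
              |         for (auto& x : data) x = x * x;     // all of C
              |     }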
        
           | paskozdilar wrote:
            | You're on the right track - the most useful application of
            | concurrency nowadays is I/O, because of its orders-of-
            | magnitude larger latency compared to the CPU.
            | 
            | However, concurrency is much more than I/O. Concurrency can
            | be best defined as a lack of ordering of computation. For
            | example, if you want to compute A, B, and C, and their values
            | are independent of each other, the order you compute them in
            | does not matter - that is concurrency. If your language has a
            | means of expressing "compute these things, I don't care about
            | the order", then your language supports concurrency.
           | 
           | I suggest reading Communicating Sequential Processes by Tony
           | Hoare. It is a very influential work on concurrency - and the
           | proof of its influence is the design of the Go programming
           | language.
        
           | Jtsummers wrote:
           | The current mainstream distinction seems to be:
           | 
           | Parallelism - _actually_ running (or intended to run) in
           | parallel. A lot of computational work is like this.
           | 
           | Concurrency - _may_ run in parallel, but is about design as
           | much, if not more, than execution.
           | 
           | Concurrency is a way (in the current mainstream sense) to
           | decompose a program into multiple "processes" (generic sense)
           | much like we decompose programs into functions, procedures,
           | and objects in other languages. I think it was Rob Pike's
           | talk "Concurrency is not parallelism" which pushed this
           | distinction further into the mainstream.
           | 
            | https://go.dev/blog/waza-talk - Concurrency is not
            | parallelism, Rob Pike
           | 
           | Go, Erlang (later Elixir), and Clojure in particular seem to
           | have pushed the idea of using concurrency (above sense) in
           | the design of programs into the mainstream. Specifically
           | bringing concurrency into "the small". Concurrency is an
           | innate quality of networked programs, but _within_ a program
           | people often write sequential (non-concurrent) code and only
           | add concurrency elements to access parallelism for
           | performance (vice adding them because they clarify the code
           | or its design). But many problems are neatly decomposed into
           | well-performing concurrent designs that _may_ get a
           | performance boost if you have multiple cores available, but
            | it isn't (strictly) the point of using concurrency.
        
       ___________________________________________________________________
       (page generated 2022-05-08 23:02 UTC)