[HN Gopher] How concurrecy works: A visual guide
       ___________________________________________________________________
        
       How concurrecy works: A visual guide
        
       Author : simplegeek
       Score  : 147 points
       Date   : 2024-12-18 13:22 UTC (2 days ago)
        
 (HTM) web link (wyounas.github.io)
 (TXT) w3m dump (wyounas.github.io)
        
       | kolektiv wrote:
       | Is there a deadlocked thread that should have rendered the 'n'?
        
       | tapland wrote:
        | Is text written in an image field more visual than other text?
        
         | jazzyjackson wrote:
         | I don't know, are arrows textual?
        
       | neonsunset wrote:
       | space state The really visual of explosion is demonstration nice!
        
       | 7bit wrote:
       | This is a text book, not a visual guide. Visual does not mean
       | text in images, which this is.
        
       | tansan wrote:
       | Better title:
       | 
       | > How concurrecy works: A text guide
        
         | suprfsat wrote:
         | > concurrecy
        
       | cess11 wrote:
       | OK, great, formal verification gives some guarantees, that sounds
       | nice and all. It's just that all the academic math nerds that are
        | fluent in the checking tools are busy getting paid much more
       | than average software developers, and I have a suspicion that
       | they'll get less done per year in terms of features and
       | refactorings of the code that immediately makes the monies flow.
       | 
       | I find the BEAM/OTP process model relatively simple to use, and
       | it moves some of the friction from logic bugs to a throughput
       | problem.
        
         | neonsunset wrote:
          | Such a model has usability shortcomings for tightly interleaving
         | concurrent code often encountered in high-performance
         | applications or even in business code which does not want to
         | maintain its worker pool or spool up a new BEAM process every
         | time it needs to send two HTTP requests at the same time.
        
           | spinningslate wrote:
           | > even in business code which does not want to maintain its
           | worker pool or spool up a new BEAM process every time it
           | needs to send two HTTP requests at the same time.
           | 
           | Just... no. First off, I'll bet a vanishingly small number of
           | application developers on BEAM languages do anything to
           | manage their own worker pools. The BEAM does it for them:
           | it's one of its core strengths. You also imply that "spinning
           | up a new BEAM process" has significant overhead, either
           | computationally or cognitively. That, again, is false.
           | Spinning up processes is intrinsic to Erlang/Elixir/Gleam and
           | other BEAM languages. It's encouraged, and the BEAM has been
           | refined over the years to make it fast and reliable. There
           | are mature, robust facilities for doing so - as well as
           | communicating among concurrent processes during their
           | execution and/or retrieving results at termination.
           | 
           | You've made clear before that you believe async/await is a
           | superior concurrency model to the BEAM process-based
           | approach. Or, more specifically, async/await as implemented
           | in .Net [1]. Concurrent programming is still in its infancy,
           | and it's not yet clear which model(s) will win out. Your
           | posts describing .Net's async approach are helpful
           | contributions to the discussion. Statements such as that
           | quoted above are not.
           | 
           | [1]: https://news.ycombinator.com/item?id=40427935
           | 
           | --
           | 
           | EDIT: fixed typo.
        
             | neonsunset wrote:
             | > You also imply that "spinning up a new BEAM process" has
             | significant overhead, either computationally or
             | cognitively. That, again, is false.
             | 
              | Spawning 1M processes with Elixir takes anywhere between
              | ~2.7 and 4 GiB of memory, with significant CPU overhead.
              | Alternatives are much more efficient on both counts:
              | https://hez2010.github.io/async-runtimes-benchmarks-2024/tak...
              | (a revised version).
             | 
              | Go, for example, probably does not want you to spawn
              | goroutines that are _too_ short-lived, as each one carries
              | at least 2 KiB of allocations by default, which is not very
              | kind to Go's GC (or any GC or allocator, really). So it
              | expects you to be at least somewhat modest with such
              | concurrency patterns.
             | 
             | In Erlang and Elixir the process model implementation
             | follows similar design choices. However, Elixir lets you
             | idiomatically await task completion - its approach to
             | concurrency is much more pleasant to use and less prone to
             | user error. Unfortunately, the per-process cost remains
             | very Goroutine-like and is the highest among concurrency
             | abstractions as implemented by other languages. Unlike Go,
              | it is also very CPU-heavy: when a large number of processes
              | are spawned and exit quickly, the runtime incurs quite a bit
              | of overhead keeping up.
             | 
             | Sure, BEAM languages are a poor fit for compute-intensive
             | tasks without bridging to NIFs but, at least in my personal
             | experience, the use cases for "task interleaving" are
             | everywhere. They are just naturally missed because usually
             | the languages do not make it terse and/or cheap - you need
             | to go out of your way to dispatch operations concurrently
             | or in parallel, so the vast majority doesn't bother, even
             | after switching to languages where it is easy and
             | encouraged.
             | 
              | (I'm not stating that the async/await model in .NET is the
              | superior option; I just think it has good tradeoffs for the
              | convenience and efficiency it brings. Perhaps a more modern
              | platform will come up with an inverted approach where async
              | tasks/calls are awaited by default without syntax noise, and
              | where the cost of doing a "fork" (or "go", if you will) and
              | then a "join" is the same as letting the task continue in
              | parallel and then awaiting it, without the downsides of the
              | virtual threads we have today.)
        
               | cess11 wrote:
               | Looking forward to you doing a Show HN with a TLA+
               | powered C# Whatsapp killer or somesuch. Maybe out-compete
               | Cisco after that?
        
           | cess11 wrote:
           | By "at the same time" do you mean that the execution starts
           | at exactly the same time on two adjacent cores, or do you
           | mean that they might as well be in the same queue and done
           | sequentially?
           | 
           | What do you mean by "tightly interleaving"? It sounds like a
            | technical term of some importance, but when I search the web
            | it seems people are saying it means that the order of execution
           | or data access is arbitrary, which you can let the built-in
           | preemptive multitasking or a queue with a worker pool figure
           | out for you.
           | 
           | In the alternatives you have in mind, how good is the
           | monitoring, observability and IPC tooling? What's their
           | hardware clustering story? I've done IPC in POSIX and that
           | was absolutely dreadful so I hope you're working with
           | something better than that at least.
        
       | cjfd wrote:
       | These kinds of analyses are all nice and funny and so on but the
       | issue here is that on real computers it does not quite work like
       | this. One thing is that an optimizer may change the order in
       | which statements are executed and then all guarantees go out of
        | the window. Another is that when writing to main memory the
        | hardware may also reorder writes. So the whole process is filled
        | with gotchas on every level. What one should remember is that if
        | multiple threads access the same memory location and at least one
        | of them writes to it, it is always wrong without the appropriate
        | protection, e.g. a mutex or an atomic variable.
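        | 
        | A minimal Go sketch of that rule (names are illustrative; `go run
        | -race` flags only the unprotected counter):
        | 
        |     package main
        | 
        |     import (
        |         "fmt"
        |         "sync"
        |         "sync/atomic"
        |     )
        | 
        |     func main() {
        |         var wg sync.WaitGroup
        | 
        |         var racy int             // two writers, no protection: a data race
        |         var mu sync.Mutex
        |         var guarded int          // protected by the mutex
        |         var counted atomic.Int64 // protected by atomic read-modify-write
        | 
        |         for i := 0; i < 2; i++ {
        |             wg.Add(1)
        |             go func() {
        |                 defer wg.Done()
        |                 for j := 0; j < 1000; j++ {
        |                     racy++ // wrong: unsynchronized write
        | 
        |                     mu.Lock()
        |                     guarded++ // correct: mutual exclusion
        |                     mu.Unlock()
        | 
        |                     counted.Add(1) // correct: atomic increment
        |                 }
        |             }()
        |         }
        |         wg.Wait()
        |         // guarded and counted are always 2000; racy may fall short.
        |         fmt.Println(racy, guarded, counted.Load())
        |     }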
        
         | vishnugupta wrote:
         | Heh this reminds me of a really good book, The Art of
         | Multiprocessor Programming [1]. It has two parts, the first one
          | goes quite a bit into the theory. And the second part begins
         | with a real example where the theory doesn't hold and it's
         | aptly titled "Welcome to The Real World".
         | 
         | [1]
         | https://github.com/amilajack/reading/blob/master/Computer_Sc...
        
           | ignoramous wrote:
           | I find Andrew McFadden's _Symmetric Multi-Processor Primer
           | for Android_
           | (https://developer.android.com/training/articles/smp) a good
           | warm-up to the book.
        
       | dragontamer wrote:
        | Therein lies SIMD's advantage.
       | 
        | The instruction pointer is synchronized across all lanes, giving
        | you fewer states to reason about.
       | 
       | Then GPUs mess that up by letting us run blocks/thread groups
       | independently, but now GPUs have highly efficient barrier
       | instructions that line everyone back up.
       | 
        | It turns out that SIMD's innate assurance of instruction
        | synchronization at the lane level is why warp-based coding /
        | wavefront coding is so efficient, though, as none of those
        | barriers are necessary anymore.
        
         | titzer wrote:
         | SIMD and thread-level parallelism solve completely different
         | problems.
        
           | dragontamer wrote:
           | Ish.
           | 
           | We use threads to solve all kinds of things, including 'More
           | Compute'.
           | 
           | SIMD is limited to 'More Compute' (unable to process I/O like
           | sockets concurrently, or other such thread patterns). But as
           | it turns out, more compute is a problem that many programmers
           | are still interested in.
           | 
           | Similarly, you can use Async patterns for the I/O problem
           | (which seems to be more efficient anyway than threads).
           | 
           | --------
           | 
           | So when we think about a 2024 style program, you'd have SIMD
            | for compute-limited problems (Neural Nets, Matrices,
           | Raytracing). Then Async for Sockets, I/O, etc. etc.
           | 
            | Which puts traditional threads in this weird jack-of-all-trades
           | position: not as good as SIMD methods for raw compute. Not as
           | good as Async for I/O. But threads do both.
           | 
            | Fortunately, there seem to be plenty of problems that involve
            | both a lot of I/O and a lot of compute simultaneously.
        
             | titzer wrote:
             | It's not just I/O, it's data pipelining. Threads can be
             | used to do a lot of different kinds of compute in parallel.
             | For example, one could _pipeline_ a multi-step computation,
             | like a compiler: make one thread for parsing, one for
             | typechecking, one for optimizing, and one for codegening,
              | and then have functions move as work packages between
             | threads. Or, one could have many threads doing each stage
             | in serial for different functions in parallel. Threads give
             | programmers the flexibility to do a wide variety of
             | parallel processing (and sometimes even get it right).
             | 
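              | A rough Go sketch of that pipelining shape, with channels
              | standing in for the hand-off queues and made-up stage
              | names:
              | 
              |     package main
              | 
              |     import "fmt"
              | 
              |     // Each stage runs in its own goroutine and passes
              |     // "functions" along as work packages over a channel.
              |     func parse(srcs <-chan string) <-chan string {
              |         parsed := make(chan string)
              |         go func() {
              |             defer close(parsed)
              |             for s := range srcs {
              |                 parsed <- s + " [parsed]"
              |             }
              |         }()
              |         return parsed
              |     }
              | 
              |     func typecheck(in <-chan string) <-chan string {
              |         checked := make(chan string)
              |         go func() {
              |             defer close(checked)
              |             for s := range in {
              |                 checked <- s + " [typechecked]"
              |             }
              |         }()
              |         return checked
              |     }
              | 
              |     func main() {
              |         srcs := make(chan string)
              |         go func() {
              |             defer close(srcs)
              |             for _, fn := range []string{"funcA", "funcB", "funcC"} {
              |                 srcs <- fn
              |             }
              |         }()
              |         // Stages overlap: funcB is parsed while funcA is typechecked.
              |         for out := range typecheck(parse(srcs)) {
              |             fmt.Println(out)
              |         }
              |     }
              | 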
             | IMHO the jury is still out on whether async I/O is worth
             | it, either in terms of performance or the potential
             | complexity that applications might incur in trying to do it
             | via callback hell. Many programmers find synchronous I/O to
             | be a really, really intuitive programming model, and the
             | lowest levels of the software stack (i.e. syscalls) are
             | almost always synchronous.
        
               | dragontamer wrote:
               | Fair. It's not like GPUs are entirely SIMD (and as I said
               | in a sibling post, I agree that GPUs have substantial
               | traditional threads involved).
               | 
               | -------
               | 
                | But let's zoom into raytracing for a minute. Intel's
                | raytracing (and indeed, the DirectX raytracing model)
                | consolidates ray dispatches in rather intricate ways.
               | 
               | Intel will literally move the stack between SIMD lanes,
               | consolidating rays into shared misses and shared hits (to
               | minimize branch divergence).
               | 
                | There are some new techniques being presented here in
               | today's SIMD models that cannot easily be described by
               | the traditional threading models.
        
               | vacuity wrote:
               | The ability to directly program for asynchronous
               | phenomena is definitely worth it[0]. Something like
               | scheduler activations, which imbues this into the
               | threading interface, is just better than either construct
               | without the other. The main downside is complexity; I
               | think we will continuously improve on this but it will
               | always be more complex than the inevitably-less-nimble
               | synchronous version. Still, we got io_uring for a reason.
               | 
               | [0] https://news.ycombinator.com/item?id=42221316
        
         | CyberDildonics wrote:
          | Optimizing isn't a choice between SIMD and large-scale
          | multi-threading. You need both, which is why CPUs and GPUs
          | both use both techniques.
        
           | dragontamer wrote:
           | Fair enough.
           | 
           | GPUs in particular have a very hyperthread/SMT like model
           | where multiple true threads (aka instruction pointers) are
           | juggled while waiting for RAM to respond.
           | 
           | Still, the intermediate organizational step where SIMD gives
           | you a simpler form of parallelism is underrated and
           | understudied IMO.
        
       | rubee64 wrote:
       | *concurrency
        
       | titzer wrote:
       | The problem with this article, though it presents model checking
       | and state space exploration well, is that it doesn't present weak
       | memory models[1], which are the reality on all
        | multiprocessor/multicore systems these days. Thus it perpetuates
       | misconceptions about concurrent programming. This is a poor
       | example of model checking because it assumes sequential
       | consistency and actually confuses people by reinforcing incorrect
       | assumptions.
       | 
       | [1] With weak memory hardware, it is effectively impossible to
       | write correct programs without using at least one of atomic read-
       | modify-write or barriers, because both compilers and hardware can
       | reorder both reads and writes, to a maddening degree.
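        | 
        | As one illustration (a sketch, not from the article): the usual
        | flag-publication idiom is only correct because an atomic
        | store/load pair supplies the ordering. In Go:
        | 
        |     package main
        | 
        |     import (
        |         "fmt"
        |         "sync/atomic"
        |     )
        | 
        |     var (
        |         data  int         // published value
        |         ready atomic.Bool // publication flag
        |     )
        | 
        |     func writer() {
        |         data = 42         // plain write
        |         ready.Store(true) // atomic write: publishes the data
        |     }
        | 
        |     func reader() {
        |         for !ready.Load() { // atomic read
        |         }
        |         // The atomic pair orders things: observing ready == true
        |         // guarantees we also observe data == 42. With plain
        |         // variables, compiler and hardware reordering make no
        |         // such guarantee on weak-memory machines.
        |         fmt.Println(data)
        |     }
        | 
        |     func main() {
        |         go writer()
        |         reader()
        |     }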
       | 
       | (I previously posted this comment as a reply to a comment that
       | got flagged.)
        
         | lala_lala wrote:
          | As I read it, it felt more like an exploration from a state-space
          | standpoint (and not an example of model checking), which to me
          | sounded quite reasonable. Unusual and intuitive, I'd say.
        
           | fn-mote wrote:
           | The author does start talking about model checking in the
            | third paragraph and goes on to use "SPIN", so there's a
           | significant part that is interested in model checking,
           | anyway.
           | 
           | I can see where the parent is coming from.
           | 
           | I think you can both be right - it can be valuable in any
           | case.
        
         | rramadass wrote:
          | > Thus it perpetuates misconceptions about concurrent
         | programming ... it assumes sequential consistency and actually
         | confuses people by reinforcing incorrect assumptions.
         | 
         | That's not quite true. The article explicitly mentions problems
         | with _interleaving_ of instructions between two processes (aka
         | threads) which is the fundamental problem of concurrency. If
         | you consider only a single thread in isolation its _program
         | order_ is always maintained even though compiler optimizations
         | and hardware OOO execution actually execute the instructions in
         | _data order_.
         | 
         |  _Consistency Model_ -
         | https://en.wikipedia.org/wiki/Consistency_model
        
           | threatofrain wrote:
           | IMO some of the confusion is in conflating execution with the
           | natural properties of a program. Concurrency is the property
           | of a program or algorithm, and this property allows you to
           | execute the program with partial or complete re-ordering
           | (which is the peak of what some people call embarrassingly
           | parallel). How you wish to execute on this property is up to
           | you.
        
             | rramadass wrote:
             | Concurrency should always be studied from higher-level
             | logical program/algorithm execution properties with a focus
             | on both shared-memory and message-passing architectures.
             | Only after the above should one even look at lower-level
             | hardware memory consistency model, cache coherence model,
             | atomic instructions, memory barriers etc. For example,
             | learning to use a logical shared-memory "Mutex" is far
              | easier than understanding how it is internally implemented
             | using atomic instructions, memory barriers etc.
             | 
             | Some relevant details in a previous comment of mine -
             | https://news.ycombinator.com/item?id=42422867
             | 
             | See also this book -
             | https://news.ycombinator.com/item?id=42472334
        
               | titzer wrote:
               | > Concurrency should always be studied
               | 
               | I disagree. If we want people to _understand_ , then we
               | should teach computer systems bottom-up and thus
               | understand the bottom-level behaviors before being
               | surprised that our bad (i.e. wrong) abstractions are
               | leaking later. If you want to train programmers to
               | accomplish parallel programming tasks via plugging
               | together nice abstractions, that's all well and good, but
               | that's a completely different kind of instruction than
               | trying to teach them how things work. If we only do the
               | latter then we'll never have experts again.
        
               | rramadass wrote:
                | In my previous comment I linked to an earlier comment of
               | mine which already pointed it out, so this is not
               | revelatory. W.r.t. concurrency however, the logical
               | approach should precede lower-level implementation
               | approach since there is lots more complexity involved in
               | the latter. Mutexes/Semaphores/Condition Variables etc.
               | are not bad/wrong abstractions but necessary correct
               | ones. They are fundamental to understanding/managing
               | concurrency itself.
        
               | vacuity wrote:
               | The issue is that logically correct algorithms can indeed
               | be made, but actually making them work in real conditions
               | is its own problem. It's not necessarily wrong to explore
               | more exotic algorithms, but a focus on algorithms that
               | people should actually use in their given situations
               | seems higher priority. We should be working on bringing
               | useful concurrency to our programs right now.
               | 
               | And while I will agree that mutexes and the like are
               | useful in their own right, they are not the most
               | fundamental for understanding concurrency.
        
           | titzer wrote:
           | > The article explicitly mentions problems with interleaving
           | of instructions between two processes (aka threads) which is
           | the fundamental problem of concurrency.
           | 
           | Again, with weak memory models, writes from another thread
           | can be observed _out of order_ due to hardware reordering,
           | even without compiler optimizations. With weak memory models,
           | there can be behaviors that correspond to _no_ valid
           | interleaving of two threads.
           | 
           | I am explicitly pointing out that what you are assuming,
           | along with the author, what is often called _sequential
           | consistency_
           | (https://en.wikipedia.org/wiki/Sequential_consistency) hasn't
           | been true of hardware for decades. It's a common
           | _misconception_ that most systems obey sequential
           | consistency. Your comment just seems to repeat that again,
            | and it isn't true. See here
           | (https://preshing.com/20120930/weak-vs-strong-memory-models/)
           | for a deeper explanation.
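            | 
            | The standard store-buffer litmus test makes this concrete. A
            | small Go sketch (using sync/atomic, whose operations are
            | sequentially consistent) never prints; drop the atomics and
            | it becomes a racy program whose 0,0 outcome weak hardware is
            | allowed to produce:
            | 
            |     package main
            | 
            |     import (
            |         "fmt"
            |         "sync"
            |         "sync/atomic"
            |     )
            | 
            |     // Under sequential consistency at least one of r1, r2
            |     // must end up 1, since some store runs first. With plain
            |     // variables, store buffering may yield r1 == 0 && r2 == 0,
            |     // an outcome that matches no interleaving at all.
            |     func main() {
            |         for i := 0; i < 100000; i++ {
            |             var x, y atomic.Int32
            |             var r1, r2 int32
            |             var wg sync.WaitGroup
            |             wg.Add(2)
            |             go func() { defer wg.Done(); x.Store(1); r1 = y.Load() }()
            |             go func() { defer wg.Done(); y.Store(1); r2 = x.Load() }()
            |             wg.Wait()
            |             if r1 == 0 && r2 == 0 {
            |                 fmt.Println("impossible under sequential consistency")
            |             }
            |         }
            |     }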
        
             | rramadass wrote:
             | I used the phrase "single thread in isolation" to simply
             | point out that parallelization is going on underneath (via
              | optimization/OOO/superscalar) for that thread even in a
              | single-threaded scenario, not that it always implies
              | sequential consistency in a multi-threaded scenario. That
              | was the thing I disagreed with in your comment, where you
              | said the author assumed sequential consistency, which is
              | not what I got from the article. In my previously linked
              | comment I explicitly said this:
             | 
             |  _Once you start thinking about a program as a sequence of
              | loads/stores (i.e. reads/writes to shared memory) and note
             | that Pipelining/OOO/Superscalar are techniques to
             | parallelize these (and other) instructions for a single
             | thread of control you start getting an idea of how
             | sequential order can be preserved though the actual
             | execution is not quite so._
        
           | cogman10 wrote:
           | > If you consider only a single thread in isolation its
           | program order is always maintained even though compiler
           | optimizations and hardware OOO execution actually execute the
           | instructions in data order.
           | 
            | It's not in most programming languages. That's actually a big
            | part of the linked Wikipedia article: when and how program
            | order is allowed to be modified.
           | 
           | A great portion of the JVM memory model definition is
           | dedicated to laying out exactly when and how read/write
           | barriers are established and when/how ordering is enforced.
           | Without any sort of barrier (in most cases) the JVM is fairly
           | free to completely reorder program execution.
        
             | rramadass wrote:
              | Not what I meant -
             | https://news.ycombinator.com/item?id=42474751
        
       | rramadass wrote:
        | Coincidentally, I am currently going through _Distributed
       | Algorithms: An Intuitive Approach_ (second edition) by Wan
       | Fokkink. It seems really good and is a catalog of classic
       | algorithms for both Shared-Memory and Message-Passing
       | architectures. It is a rather slim book with explanations and
       | precise mathematical notation. Pseudo-code for a subset of these
        | algorithms is given in the appendix.
       | 
       | The 1st edition divides the algorithms under two broad sections
       | viz; "Message Passing" (eg. chapter Mutual Exclusion) and "Shared
       | Memory" (eg. chapter Mutual Exclusion II) while the 2nd edition
       | removes these section headings and simply groups under functional
       | categories (eg. both the above under one chapter Mutual
       | Exclusion).
       | 
       | Book website -
       | https://mitpress.mit.edu/9780262037662/distributed-algorithm...
        
         | selimthegrim wrote:
         | Thank you for the book recommendations.
        
       | pphysch wrote:
       | > If you're implementing without formally verifying your solution
       | through model checking, you only think you're implementing it
       | correctly.
       | 
       | This is a comforting idea, but enumerating every possible state
       | of a real-world (integrated/distributed) software system is a
       | fool's task.
       | 
       | Spend less time on formal verification (fancy word for
       | perfectionism) and more time on risk analysis and cybernetics.
       | Instead of praying that nothing ever goes wrong, plan for faults
       | and design systems that integrate human and machine to respond
       | rapidly and effectively.
       | 
       | "Correctable" is a much more practical target than "always
       | correct".
        
       | divan wrote:
       | A bit more visual guide:
       | 
       | Visualizing Concurrency in Go (2016)
       | 
       | https://divan.dev/posts/go_concurrency_visualize/
        
       ___________________________________________________________________
       (page generated 2024-12-20 23:01 UTC)