[HN Gopher] How concurrecy works: A visual guide
___________________________________________________________________
How concurrecy works: A visual guide
Author : simplegeek
Score : 147 points
Date : 2024-12-18 13:22 UTC (2 days ago)
(HTM) web link (wyounas.github.io)
(TXT) w3m dump (wyounas.github.io)
| kolektiv wrote:
| Is there a deadlocked thread that should have rendered the 'n'?
| tapland wrote:
| Is text written in an image field more visual than other text?
| jazzyjackson wrote:
| I don't know, are arrows textual?
| neonsunset wrote:
| space state The really visual of explosion is demonstration nice!
| 7bit wrote:
| This is a text book, not a visual guide. Visual does not mean
| text in images, which this is.
| tansan wrote:
| Better title:
|
| > How concurrecy works: A text guide
| suprfsat wrote:
| > concurrecy
| cess11 wrote:
| OK, great, formal verification gives some guarantees, that sounds
| nice and all. It's just that all the academic math nerds that are
| fluent in the checking tools are busy getting paid much more
| than average software developers, and I have a suspicion that
| they'll get less done per year in terms of features and
| refactorings of the code that immediately makes the monies flow.
|
| I find the BEAM/OTP process model relatively simple to use, and
| it moves some of the friction from logic bugs to a throughput
| problem.
| neonsunset wrote:
| Such a model has usability shortcomings for tightly interleaving
| concurrent code often encountered in high-performance
| applications or even in business code which does not want to
| maintain its worker pool or spool up a new BEAM process every
| time it needs to send two HTTP requests at the same time.
| spinningslate wrote:
| > even in business code which does not want to maintain its
| worker pool or spool up a new BEAM process every time it
| needs to send two HTTP requests at the same time.
|
| Just... no. First off, I'll bet a vanishingly small number of
| application developers on BEAM languages do anything to
| manage their own worker pools. The BEAM does it for them:
| it's one of its core strengths. You also imply that "spinning
| up a new BEAM process" has significant overhead, either
| computationally or cognitively. That, again, is false.
| Spinning up processes is intrinsic to Erlang/Elixir/Gleam and
| other BEAM languages. It's encouraged, and the BEAM has been
| refined over the years to make it fast and reliable. There
| are mature, robust facilities for doing so - as well as
| communicating among concurrent processes during their
| execution and/or retrieving results at termination.
|
| You've made clear before that you believe async/await is a
| superior concurrency model to the BEAM process-based
| approach. Or, more specifically, async/await as implemented
| in .Net [1]. Concurrent programming is still in its infancy,
| and it's not yet clear which model(s) will win out. Your
| posts describing .Net's async approach are helpful
| contributions to the discussion. Statements such as that
| quoted above are not.
|
| [1]: https://news.ycombinator.com/item?id=40427935
|
| --
|
| EDIT: fixed typo.
| neonsunset wrote:
| > You also imply that "spinning up a new BEAM process" has
| significant overhead, either computationally or
| cognitively. That, again, is false.
|
| Spawning 1M processes in Elixir takes anywhere between ~2.7
| and 4 GiB of memory, with significant CPU overhead.
| Alternatives are much more efficient at both:
| https://hez2010.github.io/async-runtimes-benchmarks-2024/tak...
| (a revised version).
|
| Go, for example, probably does not want you to spawn
| goroutines that are _too_ short-lived, as each one carries
| at least 2KiB of allocations by default, which is not very
| kind to Go's GC (or any GC or allocator, really). So it
| expects you to be at least somewhat modest with such
| concurrency patterns.
|
| In Erlang and Elixir the process model implementation
| follows similar design choices. However, Elixir lets you
| idiomatically await task completion - its approach to
| concurrency is much more pleasant to use and less prone to
| user error. Unfortunately, the per-process cost remains
| very Goroutine-like and is the highest among concurrency
| abstractions as implemented by other languages. Unlike Go,
| it is also very CPU-heavy: when a large number of processes
| are spawned and exit quickly, it takes quite a bit of
| overhead for the runtime to keep up.
|
| Sure, BEAM languages are a poor fit for compute-intensive
| tasks without bridging to NIFs but, at least in my personal
| experience, the use cases for "task interleaving" are
| everywhere. They are just naturally missed because usually
| the languages do not make it terse and/or cheap - you need
| to go out of your way to dispatch operations concurrently
| or in parallel, so the vast majority doesn't bother, even
| after switching to languages where it is easy and
| encouraged.
|
| (I'm not stating that async/await model in .NET is the
| superior option, I just think it has good tradeoffs for the
| convenience and efficiency it brings. Perhaps a more modern
| platform will come up with inverted approach where async
| tasks/calls are awaited by default without syntax noise and
| the cost of doing "fork" (or "go" if you will) and then
| "join" is the same as with letting the task continue in
| parallel and then awaiting it, without the downsides of
| virtual threads in what we have today)
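|
| To make that "fork"/"join" pattern concrete, here is a minimal
| Go sketch of dispatching two HTTP requests concurrently and
| then joining on both (the URLs are placeholders, not anything
| from the article):
|
|   package main
|
|   import (
|       "fmt"
|       "net/http"
|       "sync"
|   )
|
|   // fetch issues one GET and prints the status; errors are
|   // just printed to keep the sketch short.
|   func fetch(url string, wg *sync.WaitGroup) {
|       defer wg.Done()
|       resp, err := http.Get(url)
|       if err != nil {
|           fmt.Println(url, "error:", err)
|           return
|       }
|       defer resp.Body.Close()
|       fmt.Println(url, resp.Status)
|   }
|
|   func main() {
|       var wg sync.WaitGroup
|       urls := []string{
|           "https://example.com/a", // placeholder URLs
|           "https://example.com/b",
|       }
|       for _, u := range urls {
|           wg.Add(1)
|           go fetch(u, &wg) // "fork": one goroutine per request
|       }
|       wg.Wait() // "join": block until both requests finish
|   }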
| cess11 wrote:
| Looking forward to you doing a Show HN with a TLA+
| powered C# Whatsapp killer or somesuch. Maybe out-compete
| Cisco after that?
| cess11 wrote:
| By "at the same time" do you mean that the execution starts
| at exactly the same time on two adjacent cores, or do you
| mean that they might as well be in the same queue and done
| sequentially?
|
| What do you mean by "tightly interleaving"? It sounds like a
| technical term of some importance but when I web search it
| seems people are saying it means that the order of execution
| or data access is arbitrary, which you can let the built-in
| preemptive multitasking or a queue with a worker pool figure
| out for you.
|
| In the alternatives you have in mind, how good is the
| monitoring, observability and IPC tooling? What's their
| hardware clustering story? I've done IPC in POSIX and that
| was absolutely dreadful so I hope you're working with
| something better than that at least.
| cjfd wrote:
| These kinds of analyses are all nice and funny and so on, but
| the issue is that real computers do not quite work like this.
| One thing is that an optimizer may change the order in which
| statements are executed, and then all guarantees go out of the
| window. Another is that when writing to main memory the
| hardware may also reorder writes. So the whole process is
| filled with gotchas on every level. What one should remember
| is that if multiple threads access the same memory location
| and at least one of them writes to it, it is always wrong.
| The way to fix it is to use the appropriate protection, e.g.
| use a mutex or an atomic variable.
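|
| A minimal Go sketch of the two protections mentioned, guarding
| a shared counter first with a mutex and then with an atomic
| (illustrative only):
|
|   package main
|
|   import (
|       "fmt"
|       "sync"
|       "sync/atomic"
|   )
|
|   func main() {
|       var wg sync.WaitGroup
|
|       // Protection 1: a mutex around the shared location.
|       var mu sync.Mutex
|       counter := 0
|
|       // Protection 2: an atomic variable (Go 1.19+).
|       var acounter atomic.Int64
|
|       for i := 0; i < 4; i++ {
|           wg.Add(1)
|           go func() {
|               defer wg.Done()
|               for j := 0; j < 1000; j++ {
|                   mu.Lock()
|                   counter++ // write is now exclusive
|                   mu.Unlock()
|
|                   acounter.Add(1) // atomic read-modify-write
|               }
|           }()
|       }
|       wg.Wait()
|       fmt.Println(counter, acounter.Load()) // 4000 4000
|   }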
| vishnugupta wrote:
| Heh this reminds me of a really good book, The Art of
| Multiprocessor Programming [1]. It has two parts, the first one
| goes quite a bit into the theory. And the second part begins
| with a real example where the theory doesn't hold and it's
| aptly titled "Welcome to The Real World".
|
| [1]
| https://github.com/amilajack/reading/blob/master/Computer_Sc...
| ignoramous wrote:
| I find Andrew McFadden's _Symmetric Multi-Processor Primer
| for Android_
| (https://developer.android.com/training/articles/smp) a good
| warm-up to the book.
| dragontamer wrote:
| Therein lies SIMD's advantage.
|
| The instruction pointer is synchronized across all lanes,
| giving you fewer states to reason about.
|
| Then GPUs mess that up by letting us run blocks/thread groups
| independently, but now GPUs have highly efficient barrier
| instructions that line everyone back up.
|
| It turns out that SIMD's innate assurance of instruction
| synchronization at the SIMD-lane level is why warp-based /
| wavefront coding is so efficient, though, as none of those
| barriers are necessary anymore.
| titzer wrote:
| SIMD and thread-level parallelism solve completely different
| problems.
| dragontamer wrote:
| Ish.
|
| We use threads to solve all kinds of things, including 'More
| Compute'.
|
| SIMD is limited to 'More Compute' (unable to process I/O like
| sockets concurrently, or other such thread patterns). But as
| it turns out, more compute is a problem that many programmers
| are still interested in.
|
| Similarly, you can use Async patterns for the I/O problem
| (which seems to be more efficient than threads anyway).
|
| --------
|
| So when we think about a 2024-style program, you'd have SIMD
| for compute-limited problems (Neural Nets, Matrices,
| Raytracing), then Async for Sockets, I/O, etc.
|
| Which puts traditional threads in this weird jack-of-all-trades
| position: not as good as SIMD methods for raw compute, not as
| good as Async for I/O. But threads do both.
|
| Fortunately, there seem to be plenty of problems that involve
| both a lot of I/O and a lot of compute simultaneously.
| titzer wrote:
| It's not just I/O, it's data pipelining. Threads can be
| used to do a lot of different kinds of compute in parallel.
| For example, one could _pipeline_ a multi-step computation,
| like a compiler: make one thread for parsing, one for
| typechecking, one for optimizing, and one for codegening,
| and then have functions move as work packages between
| threads. Or, one could have many threads doing each stage
| in serial for different functions in parallel. Threads give
| programmers the flexibility to do a wide variety of
| parallel processing (and sometimes even get it right).
|
| IMHO the jury is still out on whether async I/O is worth
| it, either in terms of performance or the potential
| complexity that applications might incur in trying to do it
| via callback hell. Many programmers find synchronous I/O to
| be a really, really intuitive programming model, and the
| lowest levels of the software stack (i.e. syscalls) are
| almost always synchronous.
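|
| A minimal Go sketch of the pipelined arrangement described
| above - one goroutine per stage, with work packages flowing
| through channels (the stage bodies are stand-ins, not anything
| from the article):
|
|   package main
|
|   import "fmt"
|
|   // stage reads work items from in, "processes" them (a
|   // stand-in for real parsing/typechecking/...), and forwards
|   // them on out.
|   func stage(name string, in <-chan string, out chan<- string) {
|       for fn := range in {
|           out <- fn + " [" + name + "]"
|       }
|       close(out)
|   }
|
|   func main() {
|       src := make(chan string)
|       parsed := make(chan string)
|       typed := make(chan string)
|       optimized := make(chan string)
|       done := make(chan string)
|
|       go stage("parse", src, parsed)
|       go stage("typecheck", parsed, typed)
|       go stage("optimize", typed, optimized)
|       go stage("codegen", optimized, done)
|
|       go func() {
|           for _, fn := range []string{"main", "foo", "bar"} {
|               src <- fn // functions move between stage threads
|           }
|           close(src)
|       }()
|
|       for result := range done {
|           fmt.Println(result)
|       }
|   }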
| dragontamer wrote:
| Fair. It's not like GPUs are entirely SIMD (and as I said
| in a sibling post, I agree that GPUs have substantial
| traditional threads involved).
|
| -------
|
| But let's zoom into Raytracing for a minute. Intel's
| Raytrace (and indeed, the DirectX model of Raytracing)
| consolidates Ray Dispatches in rather intricate ways.
|
| Intel will literally move the stack between SIMD lanes,
| consolidating rays into shared misses and shared hits (to
| minimize branch divergence).
|
| There are some new techniques being presented here in
| today's SIMD models that cannot easily be described by
| the traditional threading models.
| vacuity wrote:
| The ability to directly program for asynchronous
| phenomena is definitely worth it[0]. Something like
| scheduler activations, which imbues this into the
| threading interface, is just better than either construct
| without the other. The main downside is complexity; I
| think we will continuously improve on this but it will
| always be more complex than the inevitably-less-nimble
| synchronous version. Still, we got io_uring for a reason.
|
| [0] https://news.ycombinator.com/item?id=42221316
| CyberDildonics wrote:
| Optimizing isn't either SIMD or large scale multi-threading.
| You need both, which is why CPUs and GPUs both use both
| techniques.
| dragontamer wrote:
| Fair enough.
|
| GPUs in particular have a very hyperthread/SMT like model
| where multiple true threads (aka instruction pointers) are
| juggled while waiting for RAM to respond.
|
| Still, the intermediate organizational step where SIMD gives
| you a simpler form of parallelism is underrated and
| understudied IMO.
| rubee64 wrote:
| *concurrency
| titzer wrote:
| The problem with this article, though it presents model checking
| and state space exploration well, is that it doesn't present weak
| memory models[1], which are the reality on all
| multiprocessor/multicore systems these days. Thus it perpetuates
| misconceptions about concurrent programming. This is a poor
| example of model checking because it assumes sequential
| consistency and actually confuses people by reinforcing incorrect
| assumptions.
|
| [1] With weak memory hardware, it is effectively impossible to
| write correct programs without using at least one of atomic read-
| modify-write or barriers, because both compilers and hardware can
| reorder both reads and writes, to a maddening degree.
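|
| As a rough illustration of why at least one atomic or barrier
| is needed, a minimal Go sketch of the classic message-passing
| shape: the flag is published with an atomic store, so a reader
| that observes it set is also guaranteed to see the earlier
| plain write. With two plain variables this would be a data
| race, and the writes could be observed out of order:
|
|   package main
|
|   import (
|       "fmt"
|       "sync/atomic"
|   )
|
|   var (
|       data  int
|       ready atomic.Bool
|   )
|
|   func producer() {
|       data = 42         // plain write
|       ready.Store(true) // atomic store publishes it
|   }
|
|   func consumer() {
|       for !ready.Load() {
|           // spin until the flag becomes visible
|       }
|       fmt.Println(data) // prints 42
|   }
|
|   func main() {
|       go producer()
|       consumer()
|   }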
|
| (I previously posted this comment as a reply to a comment that
| got flagged.)
| lala_lala wrote:
| As I read it, it felt more like an exploration from a state-
| space standpoint (and not an example of model checking), which
| to me sounded quite reasonable. Unusual and intuitive, I'd say.
| fn-mote wrote:
| The author does start talking about model checking in the
| third paragraph and goes on to use "SPIN", so a significant
| part of it is about model checking, anyway.
|
| I can see where the parent is coming from.
|
| I think you can both be right - it can be valuable in any
| case.
| rramadass wrote:
| > Thus it perpetuates misconceptions about concurrent
| programming ... it assumes sequential consistency and actually
| confuses people by reinforcing incorrect assumptions.
|
| That's not quite true. The article explicitly mentions problems
| with _interleaving_ of instructions between two processes (aka
| threads) which is the fundamental problem of concurrency. If
| you consider only a single thread in isolation its _program
| order_ is always maintained even though compiler optimizations
| and hardware OOO execution actually execute the instructions in
| _data order_.
|
| _Consistency Model_ -
| https://en.wikipedia.org/wiki/Consistency_model
| threatofrain wrote:
| IMO some of the confusion is in conflating execution with the
| natural properties of a program. Concurrency is the property
| of a program or algorithm, and this property allows you to
| execute the program with partial or complete re-ordering
| (which is the peak of what some people call embarrassingly
| parallel). How you wish to execute on this property is up to
| you.
| rramadass wrote:
| Concurrency should always be studied from higher-level
| logical program/algorithm execution properties with a focus
| on both shared-memory and message-passing architectures.
| Only after the above should one even look at lower-level
| hardware memory consistency model, cache coherence model,
| atomic instructions, memory barriers etc. For example,
| learning to use a logical shared-memory "Mutex" is far
| easier than the fact that it is internally implemented
| using atomic instructions, memory barriers etc.
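|
| For a rough sense of what hides under that abstraction, here is
| a toy test-and-set spinlock in Go built directly from an atomic
| (a sketch only; a real mutex also parks waiting threads,
| handles fairness, and so on):
|
|   package main
|
|   import (
|       "fmt"
|       "runtime"
|       "sync"
|       "sync/atomic"
|   )
|
|   // spinLock is a toy lock built from a single atomic flag.
|   type spinLock struct{ held atomic.Bool }
|
|   func (l *spinLock) Lock() {
|       // Atomically flip false -> true; yield and retry while
|       // another goroutine holds the lock.
|       for !l.held.CompareAndSwap(false, true) {
|           runtime.Gosched()
|       }
|   }
|
|   func (l *spinLock) Unlock() { l.held.Store(false) }
|
|   func main() {
|       var (
|           l  spinLock
|           wg sync.WaitGroup
|       )
|       counter := 0
|       for i := 0; i < 4; i++ {
|           wg.Add(1)
|           go func() {
|               defer wg.Done()
|               for j := 0; j < 1000; j++ {
|                   l.Lock()
|                   counter++
|                   l.Unlock()
|               }
|           }()
|       }
|       wg.Wait()
|       fmt.Println(counter) // 4000
|   }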
|
| Some relevant details in a previous comment of mine -
| https://news.ycombinator.com/item?id=42422867
|
| See also this book -
| https://news.ycombinator.com/item?id=42472334
| titzer wrote:
| > Concurrency should always be studied
|
| I disagree. If we want people to _understand_, then we
| should teach computer systems bottom-up and thus
| understand the bottom-level behaviors before being
| surprised that our bad (i.e. wrong) abstractions are
| leaking later. If you want to train programmers to
| accomplish parallel programming tasks via plugging
| together nice abstractions, that's all well and good, but
| that's a completely different kind of instruction than
| trying to teach them how things work. If we only do the
| latter then we'll never have experts again.
| rramadass wrote:
| In my previous comment I linked to an earlier comment of
| mine which already pointed it out, so this is not
| revelatory. W.r.t. concurrency, however, the logical
| approach should precede lower-level implementation
| approach since there is lots more complexity involved in
| the latter. Mutexes/Semaphores/Condition Variables etc.
| are not bad/wrong abstractions but necessary correct
| ones. They are fundamental to understanding/managing
| concurrency itself.
| vacuity wrote:
| The issue is that logically correct algorithms can indeed
| be made, but actually making them work in real conditions
| is its own problem. It's not necessarily wrong to explore
| more exotic algorithms, but a focus on algorithms that
| people should actually use in their given situations
| seems higher priority. We should be working on bringing
| useful concurrency to our programs right now.
|
| And while I will agree that mutexes and the like are
| useful in their own right, they are not the most
| fundamental for understanding concurrency.
| titzer wrote:
| > The article explicitly mentions problems with interleaving
| of instructions between two processes (aka threads) which is
| the fundamental problem of concurrency.
|
| Again, with weak memory models, writes from another thread
| can be observed _out of order_ due to hardware reordering,
| even without compiler optimizations. With weak memory models,
| there can be behaviors that correspond to _no_ valid
| interleaving of two threads.
|
| I am explicitly pointing out that what you, along with the
| author, are assuming - often called _sequential consistency_
| (https://en.wikipedia.org/wiki/Sequential_consistency) -
| hasn't been true of hardware for decades. It's a common
| _misconception_ that most systems obey sequential
| consistency. Your comment just seems to repeat that again,
| and it isn't true. See here
| (https://preshing.com/20120930/weak-vs-strong-memory-models/)
| for a deeper explanation.
| rramadass wrote:
| I used the phrase "single thread in isolation" to simply
| point out that parallelization is going on underneath (via
| optimization/OOO/superscalar) for that thread even in a
| single-threaded scenario, not that it always implies
| sequential consistency in a multi-threaded scenario. That
| was the thing I disagreed with in your comment, where you
| said the author assumed sequential consistency only, which
| is not what I got. In the previous comment I linked to, I
| explicitly said this:
|
| _Once you start thinking about a program as a sequence of
| loads/stores (i.e. reads/writes to shared memory) and note
| that Pipelining/OOO/Superscalar are techniques to
| parallelize these (and other) instructions for a single
| thread of control you start getting an idea of how
| sequential order can be preserved though the actual
| execution is not quite so._
| cogman10 wrote:
| > If you consider only a single thread in isolation its
| program order is always maintained even though compiler
| optimizations and hardware OOO execution actually execute the
| instructions in data order.
|
| It's not in most programming languages. That's actually a
| big part of the linked Wikipedia article: when and how
| program order is allowed to be modified.
|
| A great portion of the JVM memory model definition is
| dedicated to laying out exactly when and how read/write
| barriers are established and when/how ordering is enforced.
| Without any sort of barrier (in most cases) the JVM is fairly
| free to completely reorder program execution.
| rramadass wrote:
| Not what i meant -
| https://news.ycombinator.com/item?id=42474751
| rramadass wrote:
| Coincidentally, I am currently going through _Distributed
| Algorithms: An Intuitive Approach_ (second edition) by Wan
| Fokkink. It seems really good and is a catalog of classic
| algorithms for both Shared-Memory and Message-Passing
| architectures. It is a rather slim book with explanations and
| precise mathematical notation. Pseudo-code for a subset of
| these algorithms is given in the appendix.
|
| The 1st edition divides the algorithms under two broad sections
| viz; "Message Passing" (eg. chapter Mutual Exclusion) and "Shared
| Memory" (eg. chapter Mutual Exclusion II) while the 2nd edition
| removes these section headings and simply groups under functional
| categories (eg. both the above under one chapter Mutual
| Exclusion).
|
| Book website -
| https://mitpress.mit.edu/9780262037662/distributed-algorithm...
| selimthegrim wrote:
| Thank you for the book recommendations.
| pphysch wrote:
| > If you're implementing without formally verifying your solution
| through model checking, you only think you're implementing it
| correctly.
|
| This is a comforting idea, but enumerating every possible state
| of a real-world (integrated/distributed) software system is a
| fool's task.
|
| Spend less time on formal verification (fancy word for
| perfectionism) and more time on risk analysis and cybernetics.
| Instead of praying that nothing ever goes wrong, plan for faults
| and design systems that integrate human and machine to respond
| rapidly and effectively.
|
| "Correctable" is a much more practical target than "always
| correct".
| divan wrote:
| A bit more visual guide:
|
| Visualizing Concurrency in Go (2016)
|
| https://divan.dev/posts/go_concurrency_visualize/
___________________________________________________________________
(page generated 2024-12-20 23:01 UTC)