[HN Gopher] Rust Atomics and Locks: Low-Level Concurrency in Pra...
       ___________________________________________________________________
        
       Rust Atomics and Locks: Low-Level Concurrency in Practice
        
       Author : signa11
       Score  : 219 points
       Date   : 2023-01-06 06:46 UTC (16 hours ago)
        
 (HTM) web link (marabos.nl)
 (TXT) w3m dump (marabos.nl)
        
       | [deleted]
        
       | estebank wrote:
       | The foreword by Paul E. McKenney makes a great case to read this
       | book even if you don't care about Rust at all:
       | 
       | > Which brings us to another group of potential readers, the Rust
       | skeptics. While I do believe that most Rust skeptics are doing
       | the community a valuable service by pointing out opportunities
       | for improvement, all but the most Rust-savvy of skeptics would
       | benefit from reading this book. If nothing else, doing so would
       | enable them to provide sharper and better-targeted criticisms.
       | 
       | > Then there are those dyed-in-the-wool non-Rust developers who
       | would prefer to implement Rust's concurrency-related safety
       | mechanisms in their own favorite language. This book will give
       | them a deeper understanding of the Rust mechanisms that they
       | would like to replicate, or, better yet, improve upon.
       | 
       | https://marabos.nl/atomics/foreword.html
        
         | seanhunter wrote:
         | In case anyone here doesn't know, Paul McKenney was one of the
         | main contributors to the RCU[1] implementation for the Linux
         | kernel. So the guy knows a thing or two about concurrency.
         | 
         | [1] https://en.wikipedia.org/wiki/Read-copy-update if you just
         | want to know what it is and
         | http://www.rdrop.com/users/paulmck/RCU/rclock_OLS.2001.05.01...
         | if you want the good stuff
        
           | chandlore wrote:
           | He also wrote a good introduction to parallel programming
           | that's freely available as well:
           | 
           | https://mirrors.edge.kernel.org/pub/linux/kernel/people/paul.
           | ..
        
           | cwzwarich wrote:
           | Although, somewhat amusingly, neither C++ nor Rust (which
           | basically just copies the C++ atomics and memory model) can
           | be used to correctly express RCU written within the language.
           | The leading proposal for C++ (at least AFAICT) is to just add
           | RCU primitives to the library and make their correctness the
           | implementation's concern:
           | 
           | https://www.open-
           | std.org/jtc1/sc22/wg21/docs/papers/2022/p25...
           | 
           | I hope we get a next generation of systems languages with
           | improved memory models that can actually express these things
           | directly.
        
             | throwaway17_17 wrote:
             | Can you provide any extra information about what is unable
             | to be expressed? I'd like to look into what sort of
             | constructs could be useful in this context.
        
               | cwzwarich wrote:
               | The keyword to search for is "memory_order_consume",
               | which was C++11's proposed solution to the problem that
               | turned out to not work in practice. Here are some of the
               | C++ WG documents describing the issues:
               | 
               | https://www.open-
               | std.org/jtc1/sc22/wg21/docs/papers/2016/p00...
               | https://www.open-
               | std.org/jtc1/sc22/wg21/docs/papers/2017/p01...
               | https://www.open-
               | std.org/jtc1/sc22/wg21/docs/papers/2017/p04...
               | https://www.open-
               | std.org/jtc1/sc22/wg21/docs/papers/2019/p07...
               | 
               | A lot of the problems center around the reliance on data
               | dependencies for ordering, and the difficulty of
               | expressing these requirements in a manner that is
               | compatible with separate compilation and compiler
               | optimizations.
        
             | gpderetta wrote:
             | > Although, somewhat amusingly neither C++ nor Rust [...]
             | can be used to correctly express RCU written within the
             | language.
             | 
             | What do you mean exactly? Of course the required remote
             | memory barrier need to be a primitive as it is provided by
             | the OS, and ultimately the hardware, but AFAIK the
             | asymmetric synchronization needed by RCU can be modelled
             | within the C++ as a signal and a atomic signal fence.
        
               | cwzwarich wrote:
               | You can't express the equivalent of Linux's
               | rcu_dereference primitive in a way that avoids data races
               | without using memory_order_acquire (or
               | memory_order_consume, which is treated like
               | memory_order_acquire in all implementations, and is
               | fundamentally broken as I link to in my sibling reply).
               | With a memory consistency model weaker than TSO (e.g.
               | ARM, RISC-V, POWER), this imposes unnecessary overhead
               | and eliminates some of the benefits of RCU.
               | 
               | To work around this flaw, you can use inline assembly,
               | implementation-specific "compiler-only" barriers, etc.
               | but that's outside of the language. It also forces the
               | compiler to be much more pessimistic about optimizations
               | than is actually necessary to preserve dependency
               | ordering.
        
         | mrkeen wrote:
         | > Then there are those dyed-in-the-wool non-Rust developers who
         | would prefer to implement Rust's concurrency-related safety
         | mechanisms in their own favorite language.
         | 
         | The book's called "atomics and locks". What language doesn't
         | already have those? If I'm touching those, doesn't that mean
         | I'm already trying to reimplement my own concurrency-safety
         | mechanisms?
         | 
         | Howabout Rust implements my favourite concurrency-related
         | safety mechanisms, so I don't need to ever know about atomics
         | or locks.
        
           | infamouscow wrote:
           | There are many practical instances where an atomic is the
           | simplest solution, e.g., shared counters.
        
           | echelon wrote:
           | > Howabout Rust implements my favourite concurrency-related
           | safety mechanisms, so I don't need to ever know about atomics
           | or locks.
           | 
           | How on earth are you doing concurrency without touching these
           | two key primitives at some point? Lots of lockless?
        
             | peoplefromibiza wrote:
             | > How on earth are you doing concurrency without touching
             | these two key primitives at some point?
             | 
             | using one of the many higher level concepts/frameworks
             | available.
             | 
             | - actor model
             | 
             | - fork join
             | 
             | - CSP
             | 
             | etc.
        
               | estebank wrote:
               | Someone has to implement those frameworks, and those
               | people need to understand pretty much everything covered
               | by this book :)
        
               | peoplefromibiza wrote:
               | not to disagree or antagonize, but the question was
               | 
               | > How on earth are you doing concurrency without touching
               | these two key primitives
               | 
               | and not how to write high level concurrency frameworks.
               | 
               | I wouldn't trust very much a concurrency framework
               | written by someone who just learned about locks and
               | atomics on a book.
               | 
               | I wouldn't even trust myself to write low level
               | concurrent code using locks and atomics.
               | 
               | I have 26 years of experience as a programmer and if
               | there's something I've learned is that low level
               | concurrency is really hard to get right, even when you
               | perfectly understand the underlying concepts, and it's
               | best left to real experts that dedicated their career to
               | it.
               | 
               | edit: I'm sure the book is great and the authors very
               | knowledgeable and I'm really curious to read it.
        
               | ecnahc515 wrote:
               | Sometimes all you need is an atomic counter though (or
               | lock). Like, for the commonly used Prometheus metrics
               | format, you usually just have global, atomic counters,
               | because it's arguably the most efficient/easiest way to
               | do it. Higher level primitives are nice, but even lower
               | level APIs might be "the right choice" for even high-
               | level problems.
        
           | tialaramex wrote:
           | What sort of things are "your favourite" and why ought Rust
           | to specifically do things you favour, rather than what
           | anybody else might like ?
           | 
           | Rust got scoped threads (again), that's a rather nice safe
           | concurrency feature, a way to say "This thread shall only
           | live during this object's lifetime". For example you can give
           | a new HttpConnection to a scoped thread, knowing that Rust
           | ensures the thread ends before the HttpConnection ceases to
           | exist. Before scoped threads Rust wouldn't let you (safely)
           | make threads which use outside objects with a finite
           | lifespan, as it can't be sure the thread won't touch an
           | object which no longer exists.
           | 
           | But this sort of book is more aimed at people who are
           | interested in the details you apparently don't care about (or
           | don't want to care about). Somebody has to know how it works
           | to build the higher level mechanisms you desire.
        
           | estebank wrote:
           | The "About this book" tells you what the book covers. The
           | whole book is available for you to peek at if you don't trust
           | it to correctly summarize itself.
           | 
           | As you would expect, Rust _does_ have concurrency primitives,
           | which is why the book shows you how they are implemented and
           | why they are implemented they way they are. In doing so it
           | also teaches you how to do low level things for when you _can
           | 't_ use the language provided primitives, like if you're
           | interacting with FFI boundaries or implementing your own
           | lockless datastructure.
           | 
           | Your comment is like seeing a link to "Learning Rust With
           | Entirely Too Many Linked Lists" and reacting with "what
           | language doesn't have linked lists?"
           | 
           | Edit: I see that you might have meant "why would I
           | reimplement Rust's atomic primitives in another language".
           | Well, some documentation is needed for anyone that will have
           | to implement those language primitives, in existing languages
           | (because there are undressed needs) or in new ones. More
           | documentation and explanation of tricky concepts is good.
        
         | penguin_booze wrote:
         | Also from Paul: The Perf Book: https://mirrors.edge.kernel.org/
         | pub/linux/kernel/people/paul....
        
         | bonzini wrote:
         | The combination of Mara as the author and Paul McKenney writing
         | a foreword is already a good reason to read the book. :)
        
       | elfprince13 wrote:
       | FWIW, this isn't mentioned in the book, and only _extremely_
       | cursorily in the official Rust docs, but looks like 128bit
       | atomics have landed (on supported architectures):
       | https://github.com/rust-lang/rust/blob/19bc8fb05ab083a315ee4...
        
       | zengid wrote:
       | Its very kind of Mara to share the book online, though I still
       | preordered a paper copy for bedtime reading. :)
        
       | sylware wrote:
       | BTW, I am looking for a statically linked rust compiler (the one
       | written in rust ofc) toolchain for ELF64.
       | 
       | This is for bootstrap.
        
       | gavinray wrote:
       | I'm curious how folks in Rust implement striped-locking?
       | 
       | Striped locking is where you chunk sections of data behind the
       | same lock to improve scalability. So, 1 lock might protect 1,000
       | items in an array, for example.
        
         | lilyball wrote:
         | I've never done this but I imagine you could use something like
         | [Mutex<[T; 1000]>; 10]
        
         | remexre wrote:
         | First thing that'd come to mind for me would be replacing the
         | unstriped Vec<Mutex<T>> with some wrapper like:
         | struct Striped<T, const StripeSize: usize>(Vec<Mutex<[T;
         | StripeSize]>>, Mutex<tinyvec::ArrayVec<[T; StripeSize]>>);
         | 
         | I think I'd probably want a wrapper either way to ensure I'm
         | always taking locks in a particular order to avoid deadlocks,
         | anyway.
        
       | oslac wrote:
       | One of the things most concurrency material leaves out is the
       | actual definition of concurrency. I read the first chapter, but
       | didn't see it defined, but I see this touches upon Happens-Before
       | so it might include it later (?)
        
       | capableweb wrote:
       | Interesting to check what the various stores prices the book as.
       | Cheapest I found was 34 EUR and most expensive 68 EUR, pretty
       | much double the price.
        
       | samsquire wrote:
       | Thanks for putting the time and extensive effort into this book.
       | I am deeply interested in concurrency, asynchrony and parallelism
       | I even journal about it everyday on my GitHub journal ideas
       | repository (see my profile)
       | 
       | I've been talking on the r/programminglanguages discord about
       | syntax for async/await. I am writing a programming language that
       | has parallelism as a first class citizen. I use this actor2.java
       | lock free algorithm below for sending messages between threads.
       | 
       | I want my async/await tasks to be eager and run in their own
       | threads. So if I do this                 handle1 = async task1();
       | handle2 = async task2();       handle3 = async task3();       //
       | at this point in time, the token ring thread pool shall be
       | running 3 tasks in parallel       return1 = await handle1;
       | return2 = await handle2;       return3 = await handle3;
       | 
       | I believe other languages do not run CPU bound async/await
       | computations on a thread except for IO.
       | 
       | I wrote an unrolled state machine for my async/await in Java.
       | This models a simple async/await program and runs tasks on other
       | threads - without locks. I use a design I call token ring
       | parallelism, where threads take turns and are linked together in
       | a ring structure. A semaphore communicates write status along the
       | ring. The plan is to write a transpiler to generate the switch
       | statements automatically. Then any language with threading can
       | have async await support even if it is not native.
       | 
       | https://github.com/samsquire/multiversion-concurrency-contro...
       | 
       | I wrote a own lock free algorithm here that I use to do message
       | passing between actor threads. My goal is high throughput
       | performance and low latency.
       | 
       | https://github.com/samsquire/multiversion-concurrency-contro...
       | 
       | With 11 threads (on a 12 core processor, deliberately left one
       | core for Windows) I get
       | 
       | This code with 100 threads, gets 114657500 total requests
       | 22079241.286347 requests per second Time taken: 5.193000
       | 
       | 166512180 total requests 33249237.220447 requests per second Time
       | taken: 5.008000
       | 
       | For a lock based benchmark with 100 threads, I get:
       | 
       | 16307090 total requests 3246484.172805 requests per second
       | 
       | With 11 threads, the lock benchmark gets
       | 
       | 178770160 total requests 35732592.444533 requests per second Time
       | taken: 5.003000
       | 
       | Each request is a synchronization event that sends 10 messages in
       | a loop or adds 10 to a counter to represent the synchronization
       | event.
       | 
       | There's probably problems with my measuring methodology, I am
       | doing the simplest possible thing that shall work. I am trying to
       | show that my lock free algorithm scales even with heavy
       | contention.
        
       | infamouscow wrote:
       | This book is absolutely wonderful and a pleasure to read.*
       | 
       | * I say this as a staunch critic of Rust.
        
       ___________________________________________________________________
       (page generated 2023-01-06 23:01 UTC)