[HN Gopher] Rust Atomics and Locks: Low-Level Concurrency in Pra...
___________________________________________________________________
Rust Atomics and Locks: Low-Level Concurrency in Practice
Author : signa11
Score : 219 points
Date : 2023-01-06 06:46 UTC (16 hours ago)
(HTM) web link (marabos.nl)
(TXT) w3m dump (marabos.nl)
| [deleted]
| estebank wrote:
| The foreword by Paul E. McKenney makes a great case to read this
| book even if you don't care about Rust at all:
|
| > Which brings us to another group of potential readers, the Rust
| skeptics. While I do believe that most Rust skeptics are doing
| the community a valuable service by pointing out opportunities
| for improvement, all but the most Rust-savvy of skeptics would
| benefit from reading this book. If nothing else, doing so would
| enable them to provide sharper and better-targeted criticisms.
|
| > Then there are those dyed-in-the-wool non-Rust developers who
| would prefer to implement Rust's concurrency-related safety
| mechanisms in their own favorite language. This book will give
| them a deeper understanding of the Rust mechanisms that they
| would like to replicate, or, better yet, improve upon.
|
| https://marabos.nl/atomics/foreword.html
| seanhunter wrote:
| In case anyone here doesn't know, Paul McKenney was one of the
| main contributors to the RCU[1] implementation for the Linux
| kernel. So the guy knows a thing or two about concurrency.
|
| [1] https://en.wikipedia.org/wiki/Read-copy-update if you just
| want to know what it is and
| http://www.rdrop.com/users/paulmck/RCU/rclock_OLS.2001.05.01...
| if you want the good stuff
| chandlore wrote:
| He also wrote a good introduction to parallel programming
| that's freely available as well:
|
| https://mirrors.edge.kernel.org/pub/linux/kernel/people/paul.
| ..
| cwzwarich wrote:
| Although, somewhat amusingly, neither C++ nor Rust (which
| basically just copies the C++ atomics and memory model) can
| be used to correctly express RCU written within the language.
| The leading proposal for C++ (at least AFAICT) is to just add
| RCU primitives to the library and make their correctness the
| implementation's concern:
|
| https://www.open-
| std.org/jtc1/sc22/wg21/docs/papers/2022/p25...
|
| I hope we get a next generation of systems languages with
| improved memory models that can actually express these things
| directly.
| throwaway17_17 wrote:
| Can you provide any extra information about what is unable
| to be expressed? I'd like to look into what sort of
| constructs could be useful in this context.
| cwzwarich wrote:
| The keyword to search for is "memory_order_consume",
| which was C++11's proposed solution to the problem that
| turned out to not work in practice. Here are some of the
| C++ WG documents describing the issues:
|
| https://www.open-
| std.org/jtc1/sc22/wg21/docs/papers/2016/p00...
| https://www.open-
| std.org/jtc1/sc22/wg21/docs/papers/2017/p01...
| https://www.open-
| std.org/jtc1/sc22/wg21/docs/papers/2017/p04...
| https://www.open-
| std.org/jtc1/sc22/wg21/docs/papers/2019/p07...
|
| A lot of the problems center around the reliance on data
| dependencies for ordering, and the difficulty of
| expressing these requirements in a manner that is
| compatible with separate compilation and compiler
| optimizations.
| gpderetta wrote:
| > Although, somewhat amusingly neither C++ nor Rust [...]
| can be used to correctly express RCU written within the
| language.
|
| What do you mean exactly? Of course the required remote
| memory barrier need to be a primitive as it is provided by
| the OS, and ultimately the hardware, but AFAIK the
| asymmetric synchronization needed by RCU can be modelled
| within the C++ as a signal and a atomic signal fence.
| cwzwarich wrote:
| You can't express the equivalent of Linux's
| rcu_dereference primitive in a way that avoids data races
| without using memory_order_acquire (or
| memory_order_consume, which is treated like
| memory_order_acquire in all implementations, and is
| fundamentally broken as I link to in my sibling reply).
| With a memory consistency model weaker than TSO (e.g.
| ARM, RISC-V, POWER), this imposes unnecessary overhead
| and eliminates some of the benefits of RCU.
|
| To work around this flaw, you can use inline assembly,
| implementation-specific "compiler-only" barriers, etc.
| but that's outside of the language. It also forces the
| compiler to be much more pessimistic about optimizations
| than is actually necessary to preserve dependency
| ordering.
| mrkeen wrote:
| > Then there are those dyed-in-the-wool non-Rust developers who
| would prefer to implement Rust's concurrency-related safety
| mechanisms in their own favorite language.
|
| The book's called "atomics and locks". What language doesn't
| already have those? If I'm touching those, doesn't that mean
| I'm already trying to reimplement my own concurrency-safety
| mechanisms?
|
| Howabout Rust implements my favourite concurrency-related
| safety mechanisms, so I don't need to ever know about atomics
| or locks.
| infamouscow wrote:
| There are many practical instances where an atomic is the
| simplest solution, e.g., shared counters.
| echelon wrote:
| > Howabout Rust implements my favourite concurrency-related
| safety mechanisms, so I don't need to ever know about atomics
| or locks.
|
| How on earth are you doing concurrency without touching these
| two key primitives at some point? Lots of lockless?
| peoplefromibiza wrote:
| > How on earth are you doing concurrency without touching
| these two key primitives at some point?
|
| using one of the many higher level concepts/frameworks
| available.
|
| - actor model
|
| - fork join
|
| - CSP
|
| etc.
| estebank wrote:
| Someone has to implement those frameworks, and those
| people need to understand pretty much everything covered
| by this book :)
| peoplefromibiza wrote:
| not to disagree or antagonize, but the question was
|
| > How on earth are you doing concurrency without touching
| these two key primitives
|
| and not how to write high level concurrency frameworks.
|
| I wouldn't trust very much a concurrency framework
| written by someone who just learned about locks and
| atomics on a book.
|
| I wouldn't even trust myself to write low level
| concurrent code using locks and atomics.
|
| I have 26 years of experience as a programmer and if
| there's something I've learned is that low level
| concurrency is really hard to get right, even when you
| perfectly understand the underlying concepts, and it's
| best left to real experts that dedicated their career to
| it.
|
| edit: I'm sure the book is great and the authors very
| knowledgeable and I'm really curious to read it.
| ecnahc515 wrote:
| Sometimes all you need is an atomic counter though (or
| lock). Like, for the commonly used Prometheus metrics
| format, you usually just have global, atomic counters,
| because it's arguably the most efficient/easiest way to
| do it. Higher level primitives are nice, but even lower
| level APIs might be "the right choice" for even high-
| level problems.
| tialaramex wrote:
| What sort of things are "your favourite" and why ought Rust
| to specifically do things you favour, rather than what
| anybody else might like ?
|
| Rust got scoped threads (again), that's a rather nice safe
| concurrency feature, a way to say "This thread shall only
| live during this object's lifetime". For example you can give
| a new HttpConnection to a scoped thread, knowing that Rust
| ensures the thread ends before the HttpConnection ceases to
| exist. Before scoped threads Rust wouldn't let you (safely)
| make threads which use outside objects with a finite
| lifespan, as it can't be sure the thread won't touch an
| object which no longer exists.
|
| But this sort of book is more aimed at people who are
| interested in the details you apparently don't care about (or
| don't want to care about). Somebody has to know how it works
| to build the higher level mechanisms you desire.
| estebank wrote:
| The "About this book" tells you what the book covers. The
| whole book is available for you to peek at if you don't trust
| it to correctly summarize itself.
|
| As you would expect, Rust _does_ have concurrency primitives,
| which is why the book shows you how they are implemented and
| why they are implemented they way they are. In doing so it
| also teaches you how to do low level things for when you _can
| 't_ use the language provided primitives, like if you're
| interacting with FFI boundaries or implementing your own
| lockless datastructure.
|
| Your comment is like seeing a link to "Learning Rust With
| Entirely Too Many Linked Lists" and reacting with "what
| language doesn't have linked lists?"
|
| Edit: I see that you might have meant "why would I
| reimplement Rust's atomic primitives in another language".
| Well, some documentation is needed for anyone that will have
| to implement those language primitives, in existing languages
| (because there are undressed needs) or in new ones. More
| documentation and explanation of tricky concepts is good.
| penguin_booze wrote:
| Also from Paul: The Perf Book: https://mirrors.edge.kernel.org/
| pub/linux/kernel/people/paul....
| bonzini wrote:
| The combination of Mara as the author and Paul McKenney writing
| a foreword is already a good reason to read the book. :)
| elfprince13 wrote:
| FWIW, this isn't mentioned in the book, and only _extremely_
| cursorily in the official Rust docs, but looks like 128bit
| atomics have landed (on supported architectures):
| https://github.com/rust-lang/rust/blob/19bc8fb05ab083a315ee4...
| zengid wrote:
| Its very kind of Mara to share the book online, though I still
| preordered a paper copy for bedtime reading. :)
| sylware wrote:
| BTW, I am looking for a statically linked rust compiler (the one
| written in rust ofc) toolchain for ELF64.
|
| This is for bootstrap.
| gavinray wrote:
| I'm curious how folks in Rust implement striped-locking?
|
| Striped locking is where you chunk sections of data behind the
| same lock to improve scalability. So, 1 lock might protect 1,000
| items in an array, for example.
| lilyball wrote:
| I've never done this but I imagine you could use something like
| [Mutex<[T; 1000]>; 10]
| remexre wrote:
| First thing that'd come to mind for me would be replacing the
| unstriped Vec<Mutex<T>> with some wrapper like:
| struct Striped<T, const StripeSize: usize>(Vec<Mutex<[T;
| StripeSize]>>, Mutex<tinyvec::ArrayVec<[T; StripeSize]>>);
|
| I think I'd probably want a wrapper either way to ensure I'm
| always taking locks in a particular order to avoid deadlocks,
| anyway.
| oslac wrote:
| One of the things most concurrency material leaves out is the
| actual definition of concurrency. I read the first chapter, but
| didn't see it defined, but I see this touches upon Happens-Before
| so it might include it later (?)
| capableweb wrote:
| Interesting to check what the various stores prices the book as.
| Cheapest I found was 34 EUR and most expensive 68 EUR, pretty
| much double the price.
| samsquire wrote:
| Thanks for putting the time and extensive effort into this book.
| I am deeply interested in concurrency, asynchrony and parallelism
| I even journal about it everyday on my GitHub journal ideas
| repository (see my profile)
|
| I've been talking on the r/programminglanguages discord about
| syntax for async/await. I am writing a programming language that
| has parallelism as a first class citizen. I use this actor2.java
| lock free algorithm below for sending messages between threads.
|
| I want my async/await tasks to be eager and run in their own
| threads. So if I do this handle1 = async task1();
| handle2 = async task2(); handle3 = async task3(); //
| at this point in time, the token ring thread pool shall be
| running 3 tasks in parallel return1 = await handle1;
| return2 = await handle2; return3 = await handle3;
|
| I believe other languages do not run CPU bound async/await
| computations on a thread except for IO.
|
| I wrote an unrolled state machine for my async/await in Java.
| This models a simple async/await program and runs tasks on other
| threads - without locks. I use a design I call token ring
| parallelism, where threads take turns and are linked together in
| a ring structure. A semaphore communicates write status along the
| ring. The plan is to write a transpiler to generate the switch
| statements automatically. Then any language with threading can
| have async await support even if it is not native.
|
| https://github.com/samsquire/multiversion-concurrency-contro...
|
| I wrote a own lock free algorithm here that I use to do message
| passing between actor threads. My goal is high throughput
| performance and low latency.
|
| https://github.com/samsquire/multiversion-concurrency-contro...
|
| With 11 threads (on a 12 core processor, deliberately left one
| core for Windows) I get
|
| This code with 100 threads, gets 114657500 total requests
| 22079241.286347 requests per second Time taken: 5.193000
|
| 166512180 total requests 33249237.220447 requests per second Time
| taken: 5.008000
|
| For a lock based benchmark with 100 threads, I get:
|
| 16307090 total requests 3246484.172805 requests per second
|
| With 11 threads, the lock benchmark gets
|
| 178770160 total requests 35732592.444533 requests per second Time
| taken: 5.003000
|
| Each request is a synchronization event that sends 10 messages in
| a loop or adds 10 to a counter to represent the synchronization
| event.
|
| There's probably problems with my measuring methodology, I am
| doing the simplest possible thing that shall work. I am trying to
| show that my lock free algorithm scales even with heavy
| contention.
| infamouscow wrote:
| This book is absolutely wonderful and a pleasure to read.*
|
| * I say this as a staunch critic of Rust.
___________________________________________________________________
(page generated 2023-01-06 23:01 UTC)