[HN Gopher] Atomics and Concurrency
___________________________________________________________________
Atomics and Concurrency
Author : LAC-Tech
Score : 124 points
Date : 2025-05-28 08:00 UTC (2 days ago)
(HTM) web link (redixhumayun.github.io)
(TXT) w3m dump (redixhumayun.github.io)
| ozgrakkurt wrote:
| This is great. I'm also looking for information about the
| performance impact of using atomics, like atomic reference
| counting etc., if anyone can recommend something to learn from.
| cmrdporcupine wrote:
| https://travisdowns.github.io/blog/2020/07/06/concurrency-co...
| j_seigh wrote:
| Performance from using atomics in reference counting is going
| to degrade quite a bit under heavy usage due to cache
| contention. If you are in that situation, you should be looking
| at something else like RCU or hazard pointers.
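|
| To make the contention concrete, here is a minimal sketch (a
| hypothetical micro-benchmark, not from the article): eight
| threads bouncing a single reference count serialize on the
| cache line that holds it. The memory orders mirror a typical
| shared_ptr-style refcount.
|       #include <atomic>
|       #include <thread>
|       #include <vector>
|
|       std::atomic<long> refs{1};  // shared reference count
|
|       int main() {
|           std::vector<std::thread> threads;
|           for (int i = 0; i < 8; ++i) {
|               threads.emplace_back([] {
|                   for (int j = 0; j < 1000000; ++j) {
|                       // every thread hammers the same cache line
|                       refs.fetch_add(1, std::memory_order_relaxed);
|                       refs.fetch_sub(1, std::memory_order_acq_rel);
|                   }
|               });
|           }
|           for (auto& t : threads) t.join();
|       }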
| tialaramex wrote:
| Note that these strategies (collectively, "delayed
| reclamation") lose one benefit of the RAII concept. With
| Rust's Drop or C++ destructors we are guaranteed that when we
| don't need this Goat the clean-up-unused-goat code we wrote
| will run now - unlike with a Garbage Collector where that
| code either runs at some unspecified future time or not at
| all.
|
| Delayed reclamation means that "when we don't need this Goat"
| will _sometimes_ be arbitrarily delayed after we apparently
| didn't need it, and it's no longer possible to say for sure
| by examining the program. This is almost always a trade
| you're willing to make, but you need to know it exists; it is
| not a fun thing to discover while debugging.
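|
| A minimal sketch of the RAII guarantee being contrasted here
| (names are illustrative):
|       #include <cstdio>
|       #include <memory>
|
|       struct Goat {
|           ~Goat() { std::puts("goat cleaned up"); }  // clean-up code
|       };
|
|       int main() {
|           {
|               auto g = std::make_unique<Goat>();
|           }   // destructor runs exactly here, deterministically
|           std::puts("after scope");
|           // Under delayed reclamation (RCU, hazard pointers), the
|           // destructor could instead run at some later point that
|           // can't be pinned down by reading the program.
|       }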
| zozbot234 wrote:
| With RCU the "we don't need this Goat" occurs as soon as
| the last readers are done with the earlier version of Goat,
| modulo a limited grace period in some RCU implementations.
| New readers always get the latest version, so there is no
| risk of waiting for an "unspecified" or "arbitrary" amount
| of time.
|
| Hazard pointers are a bit fiddlier AIUI, since the
| reclamation step must be triggered explicitly and verify
| that no hazard pointers exist to the object being
| reclaimed, so it is quite possible that you won't promptly
| reclaim the object.
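|
| Not real RCU, but a C++20 sketch with the same read/update
| shape (std::atomic<std::shared_ptr> is assumed available;
| compiler support varies): readers grab the current version,
| writers publish a new one, and the old Goat is destroyed once
| the last in-flight reader releases it.
|       #include <atomic>
|       #include <memory>
|       #include <string>
|
|       struct Goat { std::string name; };
|
|       std::atomic<std::shared_ptr<Goat>> current{
|           std::make_shared<Goat>(Goat{"v1"})};
|
|       void reader() {
|           // new readers always see the latest version
|           std::shared_ptr<Goat> g = current.load();
|           // ... use *g; the old version stays alive only while
|           // some reader still holds it
|       }
|
|       void updater() {
|           current.store(std::make_shared<Goat>(Goat{"v2"}));
|           // the previous version is freed when its last reader
|           // drops its shared_ptr
|       }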
| tialaramex wrote:
| From the point of view of any particular updater, the
| death of every current reader (who would keep the Goat
| alive) is arbitrary. Obviously in a real system it won't
| literally be unbounded, but I don't think it will usually
| make sense to work out how long that could take - in
| terms of clock cycles, or wall clock time, or any
| external metric - it won't necessarily be immediate but
| it will happen "eventually" and so just write software
| accordingly.
|
| Maybe I'm wrong for some reason and there is a practical
| limit in play for RCU but not Hazard Pointers, I don't
| think so though.
| PaulDavisThe1st wrote:
| One of the central purposes of RCU is to decouple the
| updater/writer(s) from the reader(s). It doesn't matter
| to the writer if there is still a reader "out there"
| using the old version. And it likewise doesn't matter to
| (most) readers that the version they have is now old.
|
| What is delayed is the actual destruction/deallocation of
| the RCU-managed objects. Nobody cares about that delay
| unless the objects control limited resources (in which
| case, RCU is likely a bad fit) or there are so many
| objects and/or they are so large that the delay in
| deallocating them could cause memory pressure of some
| type.
| pythops wrote:
| For Rust, I highly recommend this book
| https://marabos.nl/atomics/
| reitzensteinm wrote:
| This is such a great book, especially the section on operating
| system primitives, which made the book wider in scope and more
| practical. After all, you're probably not building exotic data
| structures by hand in memory without also needing high
| performance IO.
|
| It's been a hobby of mine to collect concurrency examples from
| books and blog posts and simulate them in Temper, my Rust
| memory model simulator. As far as I know, it's the largest
| Rust/C++11 memory model test suite on the internet (but I'm
| happy to be corrected).
|
| This is the file for Rust Atomics and Locks:
|
| https://github.com/reitzensteinm/temper/blob/main/memlog/tes...
|
| I didn't find any bugs in the examples, but with how good the
| book was, I didn't expect to :)
|
| The Williams book for C++ contains many of the same ideas
| (Rust's memory model is a copy/paste of C++11's, minus the
| now-deprecated consume ordering) and I can highly recommend
| that too.
| pythops wrote:
| Absolutely! This book is useful even for non-Rust developers,
| I think.
| tialaramex wrote:
| > TSan is a reliable way of detecting data races in your code,
| instead of trying to repeatedly run the code in a loop hoping for
| a data race to occur.
|
| I'm sure this doesn't intend to be misleading but I think it can
| mislead people anyway.
|
| TSan should detect races, but just like "repeatedly running the
| code in a loop" it's not actually checking that what you wrote
| doesn't have races; it's just reporting any which happened
| during testing.
| This is valuable - if you eliminate a race you've almost
| certainly fixed a real bug, maybe a very subtle bug you'd have
| cursed for years otherwise, but you can't use TSan to prove there
| are no further bugs of this kind.
|
| Tools to prove this stuff are often much harder to bring into a
| project, but you should put that work in if the difference
| between "it's probably fine" and "I proved it's correct"
| matters for you or your work. I don't want my pacemaker to be
| "probably fine", but it's OK if the YouTube music player on my
| phone wasn't proved correct.
| dataflow wrote:
| My (limited) understanding is that the way TSAN works is better
| than what you're comparing it to. You don't have to rerun TSAN
| in a loop to finally catch a race. As long as your test has
| coverage of that scenario (i.e. it executes the lines of
| interest once within a reasonable time window) it should get
| caught. Of course, being dynamic, it can't catch code that you
| never execute during testing, but that's not such a huge
| problem because you can verify code coverage through other
| means. It's quite different from having to make the stars
| align by looping until some rare timing issue finally comes
| up.
|
| The real issue here isn't that TSAN is low-confidence about
| absence of data races in C++. The issue is that even if it
| statically proved the absence of data races in the C++ sense,
| that _still_ wouldn't imply that your algorithm is race-free.
| It's trivial to patch every single instance of a data race
| by using an atomic to get TSAN to shut up, but that in no
| way implies the higher-level concurrent algorithm works
| correctly. (e.g., if two threads perform an atomic load of some
| variable, then add one, then perform an atomic store back in,
| they would obviously stomp on each other, but TSAN wouldn't
| care.) It's more like a UB checker than a correctness verifier.
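|
| A minimal sketch of that lost-update example (hypothetical,
| but this is exactly the shape TSAN stays quiet about):
|       #include <atomic>
|       #include <cstdio>
|       #include <thread>
|
|       std::atomic<int> counter{0};
|
|       // Each step is individually atomic, so there is no data
|       // race, but the read-modify-write as a whole is not
|       // atomic: both threads can load 0 and both store 1.
|       void bump() {
|           int v = counter.load();  // atomic load
|           v = v + 1;               // modify
|           counter.store(v);        // atomic store (may lose an update)
|       }
|
|       int main() {
|           std::thread a(bump), b(bump);
|           a.join(); b.join();
|           // usually prints 2, but 1 is a legal outcome
|           std::printf("%d\n", counter.load());
|       }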
| withoutboats3 wrote:
| The occurrence of data races depends on the specific non-
| deterministic sequence of execution of concurrent codepaths.
| Just because you have 100% code coverage does not mean you've
| covered every potential execution sequence, and it's almost
| never practical to actually execute every possibility to
| ensure the absence of data races. Depending on the
| probability that your data race will occur, it could indeed
| be something where you have to make the stars align for TSAN
| to catch it.
|
| Not to talk my own book, but there is a well-known
| alternative to C++ that can actually guarantee the absence of
| data races.
| dataflow wrote:
| My point was that it "could" for some algorithms, yes, but
| for a lot of algorithms that kind of star alignment simply
| isn't necessary to find all the data races. And yes, TLA+
| etc. can be helpful, but then you have the problem of
| matching them up with the code.
| Maxatar wrote:
| I feel like in a subtle way you're mixing up data races
| with race conditions, especially given the example you
| cite about incrementing an atomic variable.
|
| TSAN does not check for race conditions in general, and
| doesn't claim to do so at all as the documentation
| doesn't include the term race condition anywhere. TSAN is
| strictly for checking data races and deadlocks.
|
| Consequently this claim is false:
|
| >The issue is that even if it statically proved the
| absence of data races in the C++ sense, that still
| wouldn't imply that your algorithm is race-free.
|
| Race-free code means absence of data races, it does not
| mean absence of the more general race condition. If you
| search Google Scholar for race free programming you'll
| find no one uses the term race-free to refer to complete
| absence of race conditions but rather to the absence of
| data races.
| dataflow wrote:
| There's "data race" in "C++ ISO standard" sense, and then
| there's "data race" in the general CS literature (as well
| as all the other terms). Two threads writing a value to
| the same memory location (even atomically) is usually a
| data race in the CS/algorithm sense (due to the lack of
| synchronization), but not the C++ sense. I'm not really
| interested in pedantic terminology here, just trying to get
| a higher level point across about what you can & can't
| assume with a clean TSAN (and how _not_ to clean your
| TSAN errors). Feel free to mentally rewrite my comment
| with whatever preferred terminology you feel would get my
| points across.
| kiitos wrote:
| > Two threads writing a value to the same memory location
| (even atomically) is usually a data race in the
| CS/algorithm sense (due to the lack of synchronization),
| but not the C++ sense
|
| You seem to conflate the concepts of "data race" and
| "race condition", which are not the same thing.
|
| Two threads writing to the same memory location without
| synchronization (without using atomic operations, without
| going thru a synchronization point like a mutex, etc.) is
| a data race, and almost certainly also a race condition.
| If access to that memory location is synchronized,
| whether thru atomics or otherwise, then there's no data
| race, but there can still be a race condition.
|
| This isn't a pedantic distinction, it's actually pretty
| important.
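|
| A minimal sketch of the second case (illustrative names): no
| data race, because every access is under the mutex, yet still
| a race condition, because the check and the act are separate
| critical sections.
|       #include <cstdio>
|       #include <mutex>
|       #include <thread>
|
|       std::mutex m;
|       int balance = 100;
|
|       void withdraw(int amount) {
|           bool ok;
|           {
|               std::lock_guard<std::mutex> lock(m);
|               ok = (balance >= amount);  // check
|           }
|           if (ok) {
|               std::lock_guard<std::mutex> lock(m);
|               balance -= amount;         // act: can go negative
|           }
|       }
|
|       int main() {
|           std::thread a(withdraw, 80), b(withdraw, 80);
|           a.join(); b.join();
|           std::printf("balance = %d\n", balance);  // can be -60
|       }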
| Maxatar wrote:
| This isn't pedantry, if you're going to talk about how
| specific tools work then you need to use the actual
| terminology that the tools themselves use or else you
| will confuse yourself and anyone you talk to about them.
| If we were discussing general concepts about thread
| safety then sure we can be loose about our words, but if
| we're talking about a specific tool used for a specific
| programming language then we should make sure we are
| using the correct terminology, if only to signal that we
| have the proper domain knowledge to speak about the
| subject.
|
| >Feel free to mentally rewrite my comment with whatever
| preferred terminology you feel would get my points
| across.
|
| If I rewrite your comment to use data race, then your
| comment is plainly incorrect since the supporting example
| you give is not a data race but a race condition.
|
| If I rewrite your comment to use race condition, then
| your comment is also incorrect since TSAN doesn't detect
| race conditions in general and doesn't claim to, it
| detects data races.
|
| So what exactly am I supposed to do in order to make
| sense of your post?
|
| The idea that you'd talk about the pros and cons of a
| tool like TSAN without knowing the difference between a
| race condition and a data race is kind of absurd. That
| you'd claim my clarification of these terms for the sake
| of better understanding your point is a form of pedantry
| is sheer hubris.
| dataflow wrote:
| Hold on, before attacking me. Say we have this Java
| program, and assume the semantics of the common JRE/JVM
| everyone uses. Do you believe it has a data race or not?
| Because the variable _is_ accessed atomically, whether or
| not you mark it as volatile:
|       class Main {
|           private static String s = "uninitialized";
|           public static void main(String[] args) {
|               Thread t = new Thread() {
|                   public void run() { s = args[0]; }
|               };
|               t.start();
|               System.out.println(s);
|           }
|       }
|
| And I sure as heck have not heard anyone claim such data
| races are impossible in Java. (Have you?)
| vlovich123 wrote:
| > Two threads writing a value to the same memory location
| (even atomically) is usually a data race in the
| CS/algorithm sense (due to the lack of synchronization),
| but not the C++ sense
|
| Not only are you incorrect, it's even worse than you
| might think. Unsynchronized access to data in C++ is not
| only a data race, it's explicitly undefined behavior, and
| the compiler can do whatever it likes in response to an
| observed data race (which, by using the language, you are
| promising cannot happen).
|
| You are also misinformed about the efficacy of TSAN. Even
| with TSAN you have to run the code in a loop - if TSAN never
| observes the specific execution order of a race, it'll
| remain silent. This isn't a theoretical problem but a
| very real one you must deeply understand if you rely on
| these tools. I recall a bug in libc++ with
| condition_variable and reproducing it required running
| the repro case in a tight loop like a hundred times to
| get even one report. And when you fixed it, how long
| would you run to have confidence it was actually fixed?
|
| And yes, race conditions are an even broader class of
| problems that no tool other than formal verification or
| DST can help with. Hypothesis testing can help mildly but
| really you want at least DST to probabilistically search
| the space to find the race conditions (and DST's main
| weakness aside from the challenge of searching a
| combinatorial explosion of states is that it still relies
| on you to provide test coverage and expectations in the
| first place that the race condition might violate)
| dataflow wrote:
| Geez. I'm well aware it's UB. That was just sloppy
| wording. Should've said "not necessarily", okay. I only
| wrote "not in the C++ sense" because I had said "even
| atomically", not because I'm clueless.
| withoutboats3 wrote:
| The alternative to C++ that I meant was Rust, which
| statically prevents data races.
| sebtron wrote:
| TSan also catches data races that end up not having any visible
| effect, unlike rerunning and hoping to see the program
| misbehave.
| vlovich123 wrote:
| The false positive rate of TSAN is so low that it's worth
| fixing anything it reports, although concluding that a report
| really is a false positive is so hard that I always have a
| nagging feeling as to whether I actually did fix it.
| gpderetta wrote:
| Probably I'm missing something, but enqueue dereferences the
| current tail node. What prevents it from dereferencing a node
| that has just been deleted by dequeue?
|
| Actually, nothing ever sets head, so I'm not sure how anything
| is ever dequeued. Probably the implementation is incomplete
| and the queue needs a sentinel node somewhere.
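|
| For reference, a sketch of the usual sentinel-node fix in the
| Michael-Scott style (not the article's code; safe memory
| reclamation is deliberately omitted, so dequeued nodes leak
| rather than risk the use-after-free described above):
|       #include <atomic>
|       #include <utility>
|
|       template <typename T>
|       struct Queue {
|           struct Node {
|               T value{};
|               std::atomic<Node*> next{nullptr};
|           };
|           std::atomic<Node*> head;  // always points at a sentinel
|           std::atomic<Node*> tail;
|
|           Queue() {
|               Node* dummy = new Node{};  // the sentinel
|               head.store(dummy);
|               tail.store(dummy);
|           }
|
|           void enqueue(T v) {
|               Node* n = new Node{};
|               n->value = std::move(v);
|               for (;;) {
|                   Node* t = tail.load();
|                   Node* next = t->next.load();
|                   if (next == nullptr) {
|                       Node* expected = nullptr;
|                       if (t->next.compare_exchange_weak(expected, n)) {
|                           tail.compare_exchange_strong(t, n);
|                           return;
|                       }
|                   } else {
|                       // help a lagging tail forward
|                       tail.compare_exchange_strong(t, next);
|                   }
|               }
|           }
|
|           bool dequeue(T& out) {
|               for (;;) {
|                   Node* h = head.load();
|                   Node* next = h->next.load();
|                   if (next == nullptr) return false;  // empty
|                   if (head.compare_exchange_weak(h, next)) {
|                       out = std::move(next->value);
|                       return true;  // old sentinel h leaks here
|                   }
|               }
|           }
|       };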
| john-h-k wrote:
| > The important thing to remember is that each of these cannot be
| split into separate instructions.
|
| Nitpick, but they absolutely can be split into several
| instructions - that is the most common way it's implemented
| on RISC-like processors - and single instructions aren't
| necessarily atomic.
|
| The actual guarantee is that the entire operation (load, store,
| RMW, whatever) occurs in one "go" and no other thread can perform
| an operation on that variable during this atomic operation (it
| can't write into the low byte of your variable as you read it).
|
| It's probably best understood by imagining that every atomic
| operation is just the normal operation wrapped in a mutex, but
| implemented in a much more efficient manner. Of course, with
| large enough types, atomic variables may well be implemented
| via a mutex.
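|
| A quick sketch of both halves of that claim: the mutex mental
| model, and the lock-based fallback for oversized types, which
| std::atomic lets you query (a hypothetical Big type stands in
| for "large enough"):
|       #include <atomic>
|       #include <cstdio>
|       #include <mutex>
|
|       std::atomic<int> a{0};
|       int plain = 0;
|       std::mutex m;
|
|       // Semantically these behave alike: no thread can observe
|       // or interleave with a half-done update.
|       void atomic_inc() { a.fetch_add(1); }
|       void mutex_inc()  { std::lock_guard<std::mutex> lock(m); plain += 1; }
|
|       struct Big { char bytes[64]; };
|
|       int main() {
|           // usually 1 on mainstream platforms
|           std::printf("%d\n", std::atomic<int>::is_always_lock_free);
|           std::atomic<Big> big{};
|           // often 0: the implementation falls back to a lock
|           std::printf("%d\n", big.is_lock_free());
|       }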
| khrbtxyz wrote:
| https://en.wikipedia.org/wiki/Load-link/store-conditional
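|
| A sketch of how that looks from C++: compare_exchange_weak is
| allowed to fail spuriously precisely so it can map onto a
| load-linked/store-conditional retry loop on those processors.
|       #include <atomic>
|
|       int fetch_add_via_cas(std::atomic<int>& x, int delta) {
|           int expected = x.load(std::memory_order_relaxed);
|           // On LL/SC machines this loop is the hardware reality:
|           // the store-conditional fails if anyone touched the
|           // location in between, and we retry.
|           while (!x.compare_exchange_weak(
|               expected, expected + delta,
|               std::memory_order_acq_rel, std::memory_order_relaxed)) {
|               // `expected` now holds the freshly observed value
|           }
|           return expected;  // value before our add
|       }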
| OskarS wrote:
| > It's probably best understood by imagining that every atomic
| operation is just the normal operation wrapped in a mutex
|
| Sort-of, but that's not quite right: if you imagine it as
| "taking a mutex on the memory", there's a possibility of
| starvation. Imagine two threads repeatedly "locking" the memory
| location to update it. With a mutex, it's possible that one of
| them gets starved, never getting to update the location and
| stalling indefinitely.
|
| At least x86 (and I'm sure ARM and RISC-V as well) makes a much
| stronger progress guarantee than a mutex would: the operation
| is effectively wait-free. All threads are guaranteed to make
| progress in some finite amount of time; no one will be starved.
| At least, that's my understanding from reading much smarter
| people talking about the cache synchronization protocols of
| modern ISAs.
|
| Given that, I think a better mental model is roughly something
| like what the article describes: the operation might be slower
| under high contention, but not "blocking" slow; it is
| guaranteed to finish in a finite amount of time and atomically
| ("in one combined operation").
| john-h-k wrote:
| x86 and modern ARM64 have instructions for this, but the
| original ARM and RISC approaches are literally a
| hardware-assisted polling loop. Unsure what guarantees they
| make; I shall look.
|
| Definitely a good clarification though yeah, important
| wbl wrote:
| Really finicky ones, and the initial ones made none. I
| think for RISC-V it's something like a maximum of 16
| instructions covered, with no other memory accesses, to be
| assured progress on LL/SC sequences.
|
| That's to enable very minimal hardware implementations that
| can only track one line at a time.
| OskarS wrote:
| I got curious about this, because really the most important
| guarantee is in the C++ memory model, not any of the ISAs,
| and conforming compilers are required to fulfill it (and it's
| generally more restrictive than any of the platform
| guarantees). It's a little bit hard to parse, but if I'm
| reading section 6.9.2.3 of the C++23 standard (from here
| [1]) correctly, operations on atomics are only lock-free,
| not wait-free, and even that might be a high bar on certain
| platforms:
|       Executions of atomic functions that are either defined
|       to be lock-free (33.5.10) or indicated as lock-free
|       (33.5.5) are lock-free executions. When one or more
|       lock-free executions run concurrently, at least one
|       should complete. [Note 3: It is difficult for some
|       implementations to provide absolute guarantees to this
|       effect, since repeated and particularly inopportune
|       interference from other threads could prevent forward
|       progress, e.g., by repeatedly stealing a cache line for
|       unrelated purposes between load-locked and
|       store-conditional instructions. For implementations that
|       follow this recommendation and ensure that such effects
|       cannot indefinitely delay progress under expected
|       operating conditions, such anomalies can therefore
|       safely be ignored by programmers. Outside this document,
|       this property is sometimes termed lock-free. -- end note]
|
| I'm guessing that note is for platforms like the ones you
| mention, where the underlying ISA makes this (more or less)
| impossible. I would assume that in the modern versions of
| these ISAs, though, essentially everything in std::atomic is
| wait-free in practice.
|
| [1] https://open-std.org/jtc1/sc22/wg21/docs/papers/2023/n4950.p...
| p_l wrote:
| My understanding is that the C++ memory model essentially
| pulled the concurrency bits from its own imagination, and as
| a result the only architecture that actually maps to it is
| RISC-V, which explicitly decided to support it.
| p_l wrote:
| Early Alpha CPUs (no idea about later ones) essentially had a
| single special register that asserted a mutex lock on a
| word-sized (64-bit) location for any read/write operation on
| the Sysbus.
| latchkey wrote:
| Previously:
|
| _Atomics and Concurrency_
| https://news.ycombinator.com/item?id=38964675 (January 11, 2024
| -- 106 points, 48 comments)
| coolThingsFirst wrote:
| Amazing. I have decided to learn C++ for fun, and wow, the
| designers of the language had serious gray matter.
|
| Binary search provided. Pair abstraction provided. Lower bound
| for binary search, yup.
|
| Auto for type inference, and fast as hell on top of it. A
| crazy powerful language, with multiplatform threads too.
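|
| For anyone following along, a tiny sketch of the pieces named
| above (illustrative values):
|       #include <algorithm>
|       #include <cstdio>
|       #include <utility>
|       #include <vector>
|
|       int main() {
|           std::vector<int> v{1, 3, 5, 7, 9};
|           // binary search: first element not less than 5
|           auto it = std::lower_bound(v.begin(), v.end(), 5);
|           // pair abstraction + auto type inference
|           auto p = std::make_pair(*it, it - v.begin());
|           std::printf("value %d at index %td\n", p.first, p.second);
|       }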
___________________________________________________________________
(page generated 2025-05-30 23:01 UTC)