[HN Gopher] Atomics and Concurrency
       ___________________________________________________________________
        
       Atomics and Concurrency
        
       Author : LAC-Tech
       Score  : 124 points
       Date   : 2025-05-28 08:00 UTC (2 days ago)
        
 (HTM) web link (redixhumayun.github.io)
 (TXT) w3m dump (redixhumayun.github.io)
        
       | ozgrakkurt wrote:
       | This is great, also looking for information about performance
       | impact of using atomics like atomically reference counting etc.
       | if anyone can recommend something to learn from
        
         | cmrdporcupine wrote:
         | https://travisdowns.github.io/blog/2020/07/06/concurrency-co...
        
         | j_seigh wrote:
          | Performance from using atomics in reference counting is going
          | to degrade quite a bit under heavy usage due to cache
          | contention. If you are in that situation, you should be looking
          | at something else like RCU or hazard pointers.
        
           | tialaramex wrote:
           | Note that these strategies (collectively, "delayed
           | reclamation") lose one benefit of the RAII concept. With
            | Rust's Drop or C++ destructors, we are guaranteed that when
            | we don't need this Goat, the clean-up-unused-goat code we
            | wrote will run now - unlike with a Garbage Collector, where
            | that code either runs at some unspecified future time or not
            | at all.
           | 
            | Delayed reclamation means that "when we don't need this Goat"
            | will _sometimes_ be arbitrarily delayed after we apparently
            | didn't need it, and it's no longer possible to say for sure
           | by examining the program. This will almost always be a trade
           | you're willing to make, but you need to know it exists, this
           | is not a fun thing to discover while debugging.
        
             | zozbot234 wrote:
             | With RCU the "we don't need this Goat" occurs as soon as
             | the last readers are done with the earlier version of Goat,
             | modulo a limited grace period in some RCU implementations.
             | New readers always get the latest version, so there is no
             | risk of waiting for an "unspecified" or "arbitrary" amount
             | of time.
             | 
             | Hazard pointers are a bit fiddlier AIUI, since the
             | reclamation step must be triggered explicitly and verify
             | that no hazard pointers exist to the object being
             | reclaimed, so it is quite possible that you won't promptly
             | reclaim the object.
        
               | tialaramex wrote:
               | From the point of view of any particular updater, the
               | death of every current reader (who would keep the Goat
               | alive) is arbitrary. Obviously in a real system it won't
               | literally be unbounded, but I don't think it will usually
               | make sense to work out how long that could take - in
               | terms of clock cycles, or wall clock time, or any
               | external metric - it won't necessarily be immediate but
               | it will happen "eventually" and so just write software
               | accordingly.
               | 
               | Maybe I'm wrong for some reason and there is a practical
               | limit in play for RCU but not Hazard Pointers, I don't
               | think so though.
        
               | PaulDavisThe1st wrote:
               | One of the central purposes of RCU is to decouple the
               | updater/writer(s) from the reader(s). It doesn't matter
               | to the writer if there is still a reader "out there"
               | using the old version. And it likewise doesn't matter to
               | (most) readers that the version they have is now old.
               | 
               | What is delayed is the actual destruction/deallocation of
               | the RCU-managed objects. Nobody cares about that delay
               | unless the objects control limited resources (in which
               | case, RCU is likely a bad fit) or there are so many
               | objects and/or they are so large that the delay in
               | deallocating them could cause memory pressure of some
               | type.
        
       | pythops wrote:
       | For Rust, I highly recommend this book
       | https://marabos.nl/atomics/
        
         | reitzensteinm wrote:
         | This is such a great book, especially the section on operating
         | system primitives, which made the book wider in scope and more
         | practical. After all, you're probably not building exotic data
         | structures by hand in memory without also needing high
         | performance IO.
         | 
          | It's been a hobby of mine to collect concurrency examples from
          | books and blog posts and simulate them in Temper, my Rust
          | memory model simulator. As far as I know, it's the largest
         | Rust/C++11 memory model test suite on the internet (but I'm
         | happy to be corrected).
         | 
         | This is the file for Rust Atomics and Locks:
         | 
         | https://github.com/reitzensteinm/temper/blob/main/memlog/tes...
         | 
         | I didn't find any bugs in the examples, but with how good the
         | book was, I didn't expect to :)
         | 
         | The Williams book for C++ contains many of the same ideas
         | (Rust's memory model is a copy/paste from C++11 without the now
         | deprecated Consume) and I can highly recommend that too.
        
           | pythops wrote:
            | Absolutely! This book is useful even for non-Rust
            | developers, I think.
        
       | tialaramex wrote:
       | > TSan is a reliable way of detecting data races in your code,
       | instead of trying to repeatedly run the code in a loop hoping for
       | a data race to occur.
       | 
       | I'm sure this doesn't intend to be misleading but I think it can
       | mislead people anyway.
       | 
        | TSan should detect races but, just like "repeatedly run the code
        | in a loop", it's not actually verifying that what you wrote has
        | no races; it's just reporting any which happened during testing.
       | This is valuable - if you eliminate a race you've almost
       | certainly fixed a real bug, maybe a very subtle bug you'd have
       | cursed for years otherwise, but you can't use TSan to prove there
       | are no further bugs of this kind.
       | 
       | Tools to prove this stuff are often much harder to bring into a
       | project, but you should put that work in if the difference
       | between "It's probably fine" and "I proved it's correct" sounds
       | necessary to you or for your work. I don't want my pacemaker to
       | be "probably fine" but it's OK if the Youtube music player on my
       | phone wasn't proved correct.
        
         | dataflow wrote:
         | My (limited) understanding is that the way TSAN works is better
         | than what you're comparing it to. You don't have to rerun TSAN
         | in a loop to finally catch a race. As long as your test has
         | coverage of that scenario (i.e. it executes the lines of
         | interest once within a reasonable time window) it should get
         | caught. Of course, being dynamic, it can't catch code that you
         | never execute during testing, but that's not such a huge
         | problem because you can verify code coverage through other
          | means. It's quite different from having to make the stars
          | align by looping until some rare timing issue comes up.
         | 
         | The real issue here isn't that TSAN is low-confidence about
         | absence of data races in C++. The issue is that even if it
         | statically proved the absence of data races in the C++ sense,
          | that _still_ wouldn't imply that your algorithm is race-free.
          | Because it's trivial to patch every single instance of a data
          | race by using an atomic to get TSAN to shut up, but that in no
         | way implies the higher-level concurrent algorithm works
         | correctly. (e.g., if two threads perform an atomic load of some
         | variable, then add one, then perform an atomic store back in,
         | they would obviously stomp on each other, but TSAN wouldn't
         | care.) It's more like a UB checker than a correctness verifier.
        
           | withoutboats3 wrote:
           | The occurrence of data races depends on the specific non-
           | deterministic sequence of execution of concurrent codepaths.
            | Just because you have 100% code coverage does not mean you've
            | covered every potential execution sequence, and it's almost
           | never practical to actually execute every possibility to
           | ensure the absence of data races. Depending on the
           | probability that your data race will occur, it could indeed
           | be something you have to make stars align for TSAN to catch.
           | 
           | Not to talk my own book, but there is a well-known
           | alternative to C++ that can actually guarantee the absence of
           | data races.
        
             | dataflow wrote:
             | It "could" for some algorithms, yes, but for a lot of
             | algorithms, that kind of star alignment simply isn't
             | necessary to find all the data races, was my point. And
             | yes, TLA+ etc. can be helpful, but then you have the
             | problem of matching them up with the code.
        
               | Maxatar wrote:
                | I feel like in a subtle way you're mixing up data races
                | with race conditions, especially given the example you
                | cite about incrementing an atomic variable.
               | 
               | TSAN does not check for race conditions in general, and
               | doesn't claim to do so at all as the documentation
               | doesn't include the term race condition anywhere. TSAN is
               | strictly for checking data races and deadlocks.
               | 
               | Consequently this claim is false:
               | 
               | >The issue is that even if it statically proved the
               | absence of data races in the C++ sense, that still
               | wouldn't imply that your algorithm is race-free.
               | 
               | Race-free code means absence of data races, it does not
               | mean absence of the more general race condition. If you
               | search Google Scholar for race free programming you'll
               | find no one uses the term race-free to refer to complete
               | absence of race conditions but rather to the absence of
               | data races.
        
               | dataflow wrote:
               | There's "data race" in "C++ ISO standard" sense, and then
               | there's "data race" in the general CS literature (as well
               | as all the other terms). Two threads writing a value to
               | the same memory location (even atomically) is usually a
               | data race in the CS/algorithm sense (due to the lack of
               | synchronization), but not the C++ sense. I'm not really
                | interested in pedantic terminology here, just trying to
                | get
               | a higher level point across about what you can & can't
               | assume with a clean TSAN (and how _not_ to clean your
               | TSAN errors). Feel free to mentally rewrite my comment
               | with whatever preferred terminology you feel would get my
               | points across.
        
               | kiitos wrote:
               | > Two threads writing a value to the same memory location
               | (even atomically) is usually a data race in the
               | CS/algorithm sense (due to the lack of synchronization),
               | but not the C++ sense
               | 
               | You seem to conflate the concepts of "data race" and
               | "race condition", which are not the same thing.
               | 
               | Two threads writing to the same memory location without
               | synchronization (without using atomic operations, without
               | going thru a synchronization point like a mutex, etc.) is
               | a data race, and almost certainly also a race condition.
               | If access to that memory location is synchronized,
               | whether thru atomics or otherwise, then there's no data
               | race, but there can still be a race condition.
               | 
               | This isn't a pedantic distinction, it's actually pretty
               | important.
        
               | Maxatar wrote:
               | This isn't pedantry, if you're going to talk about how
               | specific tools work then you need to use the actual
               | terminology that the tools themselves use or else you
               | will confuse yourself and anyone you talk to about them.
               | If we were discussing general concepts about thread
               | safety then sure we can be loose about our words, but if
               | we're talking about a specific tool used for a specific
               | programming language then we should make sure we are
               | using the correct terminology, if only to signal that we
               | have the proper domain knowledge to speak about the
               | subject.
               | 
               | >Feel free to mentally rewrite my comment with whatever
               | preferred terminology you feel would get my points
               | across.
               | 
               | If I rewrite your comment to use data race, then your
               | comment is plainly incorrect since the supporting example
               | you give is not a data race but a race condition.
               | 
               | If I rewrite your comment to use race condition, then
               | your comment is also incorrect since TSAN doesn't detect
               | race conditions in general and doesn't claim to, it
               | detects data races.
               | 
               | So what exactly am I supposed to do in order to make
               | sense of your post?
               | 
               | The idea that you'd talk about the pros and cons of a
               | tool like TSAN without knowing the difference between a
               | race condition and a data race is kind of absurd. That
               | you'd claim my clarification of these terms for the sake
               | of better understanding your point is a form of pedantry
               | is sheer hubris.
        
               | dataflow wrote:
                | Hold on, before attacking me. Say we have this Java
                | program, and assume the semantics of the common JRE/JVM
                | everyone uses. Do you believe it has a data race or not?
                | Because the variable _is_ accessed atomically, whether
                | you mark it as volatile or not:
                | 
                |     class Main {
                |         private static String s = "uninitialized";
                |         public static void main(String[] args) {
                |             Thread t = new Thread() {
                |                 public void run() { s = args[0]; }
                |             };
                |             t.start();
                |             System.out.println(s);
                |         }
                |     }
               | 
               | And I sure as heck have not heard anyone claim such data
               | races are impossible in Java. (Have you?)
        
               | vlovich123 wrote:
               | > Two threads writing a value to the same memory location
               | (even atomically) is usually a data race in the
               | CS/algorithm sense (due to the lack of synchronization),
               | but not the C++ sense
               | 
                | Not only are you incorrect, it's even worse than you
                | might think. Unsynchronized access to data in C++ is not
                | only a data race, it's explicitly undefined behavior, and
                | the compiler can do whatever it likes in response to an
                | observed data race (which, by using the language, you are
                | promising cannot happen).
               | 
               | You are also misinformed about the efficacy of TSAN. Even
               | in TSAN you have to run it in a loop - if TSAN never
               | observes the specific execution order in a race it'll
               | remain silent. This isn't a theoretical problem but a
               | very real one you must deeply understand if you rely on
               | these tools. I recall a bug in libc++ with
               | condition_variable and reproducing it required running
               | the repro case in a tight loop like a hundred times to
               | get even one report. And when you fixed it, how long
               | would you run to have confidence it was actually fixed?
               | 
               | And yes, race conditions are an even broader class of
               | problems that no tool other than formal verification or
                | DST (deterministic simulation testing) can help with.
                | Hypothesis testing can help mildly, but really you want
                | at least DST to probabilistically search the space for
                | race conditions. (DST's main weakness, aside from the
                | challenge of searching a combinatorial explosion of
                | states, is that it still relies on you to provide the
                | test coverage and expectations that a race condition
                | might violate in the first place.)
        
               | dataflow wrote:
               | Geez. I'm well aware it's UB. That was just sloppy
               | wording. Should've said "not necessarily", okay. I only
               | wrote "not in the C++ sense" because I had said "even
               | atomically", not because I'm clueless.
        
               | withoutboats3 wrote:
               | The alternative to C++ that I meant was Rust, which
               | statically prevents data races.
        
         | sebtron wrote:
         | TSan also catches data races that end up not having any visible
         | effect, unlike rerunning and hoping to see the program
         | misbehave.
        
           | vlovich123 wrote:
            | The false-positive rate of TSAN is so low that it's worth
            | fixing anything it reports, although concluding that a report
            | is a false positive seems so hard that I always have a
            | nagging feeling as to whether I actually did fix it
        
       | gpderetta wrote:
        | Probably I'm missing something, but enqueue dereferences the
        | current tail node. What prevents it from dereferencing a node
        | that has just been deleted by dequeue?
       | 
       | Actually, nothing ever sets head so I'm not sure how anything is
       | ever dequeued. Probably the implementation is incomplete and the
       | queue needs a sentinel node somewhere.
        
       | john-h-k wrote:
       | > The important thing to remember is that each of these cannot be
       | split into separate instructions.
       | 
        | Nitpick, but they absolutely can be split into several
        | instructions, and this is the most common way it's implemented on
        | RISC-like processors; also, single instructions aren't
        | necessarily atomic.
       | 
       | The actual guarantee is that the entire operation (load, store,
       | RMW, whatever) occurs in one "go" and no other thread can perform
       | an operation on that variable during this atomic operation (it
       | can't write into the low byte of your variable as you read it).
       | 
       | It's probably best euphemismed by imagining that every atomic
        | operation is just the normal operation wrapped in a mutex, but
        | implemented in a much more efficient manner. Of course, with
        | large enough types, atomic variables may well be implemented
        | via a mutex.
        
         | khrbtxyz wrote:
         | https://en.wikipedia.org/wiki/Load-link/store-conditional
        
         | OskarS wrote:
         | > It's probably best euphemismed by imagining that every atomic
         | operation is just the normal operation wrapped in a mutex
         | 
         | Sort-of, but that's not quite right: if you imagine it as
         | "taking a mutex on the memory", there's a possibility of
         | starvation. Imagine two threads repeatedly "locking" the memory
          | location to update it. With a mutex, it's possible that one of
          | them gets starved, never getting to update the location,
          | stalling indefinitely.
         | 
         | At least x86 (and I'm sure ARM and RISC-V as well) make a much
         | stronger progress guarantee than a mutex would: the operation
         | is effectively wait-free. All threads are guaranteed to make
         | progress in some finite amount of time, no one will be starved.
         | At least, that's my understanding from reading much smarter
         | people talking about the cache synchronization protocols of
         | modern ISAs.
         | 
         | Given that, I think a better mental model is roughly something
         | like the article describes: the operation might be slower under
         | high contention, but not "blocking" slow, it is guaranteed to
         | finish in a finite amount of time and atomically ("in one
         | combined operation").
        
           | john-h-k wrote:
            | x86 and modern ARM64 have instructions for this, but the
            | original ARM and RISC approaches are literally a hardware-
            | assisted polling loop. Unsure what guarantees they make; I
            | shall look.
           | 
           | Definitely a good clarification though yeah, important
        
             | wbl wrote:
              | Really finicky ones, and the initial ones made none. I
              | think for RISC-V it's something like a max of 16
              | instructions covered, with no other memory accesses, to be
              | assured progress on LL/SC sequences.
             | 
             | That's to enable very minimal hardware implementations that
             | can only track one line at a time.
        
             | OskarS wrote:
             | I got curious about this, because really the most important
             | guarantee is in the C++ memory model, not any of the ISAs,
             | and conforming compilers are required to fulfill them (and
             | it's generally more restrictive than any of the platform
             | guarantees). It's a little bit hard to parse, but if I'm
             | reading section 6.9.2.3 of the C++23 standard (from here
             | [1]), operations on atomics are only lock-free, not wait-
             | free, and even that might be a high bar on certain
              | platforms:
              | 
              |     Executions of atomic functions that are either defined
              |     to be lock-free (33.5.10) or indicated as lock-free
              |     (33.5.5) are lock-free executions.
              | 
              |     When one or more lock-free executions run
              |     concurrently, at least one should complete.
              | 
              |     [Note 3: It is difficult for some implementations to
              |     provide absolute guarantees to this effect, since
              |     repeated and particularly inopportune interference
              |     from other threads could prevent forward progress,
              |     e.g., by repeatedly stealing a cache line for
              |     unrelated purposes between load-locked and
              |     store-conditional instructions. For implementations
              |     that follow this recommendation and ensure that such
              |     effects cannot indefinitely delay progress under
              |     expected operating conditions, such anomalies can
              |     therefore safely be ignored by programmers. Outside
              |     this document, this property is sometimes termed
              |     lock-free. -- end note]
             | 
             | I'm guessing that note is for platforms like you mention,
             | where the underlying ISA makes this (more or less)
             | impossible. I would assume in the modern versions of these
             | ISAs though, essentially everything in std::atomic is wait-
             | free, in practice.
             | 
              | [1] https://open-std.org/jtc1/sc22/wg21/docs/papers/2023/n4950.p...
        
               | p_l wrote:
               | My understanding is that C++ memory model essentially
               | pulled the concurrency bit from their own imagination,
               | and as a result the only architecture that actually maps
                | to it is RISC-V, which explicitly decided to support it.
        
         | p_l wrote:
          | Early Alpha CPUs (no idea about later ones) essentially had a
          | single special register that asserted a mutex lock on a word-
          | sized (64-bit) location for any read/write operation on the
          | Sysbus.
        
       | latchkey wrote:
       | Previously:
       | 
       |  _Atomics and Concurrency_
       | https://news.ycombinator.com/item?id=38964675 (January 11, 2024
       | -- 106 points, 48 comments)
        
       | coolThingsFirst wrote:
        | Amazing, I have decided to learn cpp for fun and wow, the
        | designers of the lang had serious gray matter.
        | 
        | Binary search provided. Pair abstraction provided. Lower bound
        | for binary search, yup.
        | 
        | Auto for type inference and fast as hell on top of it. Crazy
        | powerful lang and also multiplatform threads.
        
       ___________________________________________________________________
       (page generated 2025-05-30 23:01 UTC)