[HN Gopher] What we learned from C++ atomics and memory model st...
       ___________________________________________________________________
        
       What we learned from C++ atomics and memory model standardization
       [video]
        
       Author : matt_d
       Score  : 34 points
       Date   : 2024-03-04 17:00 UTC (6 hours ago)
        
 (HTM) web link (www.youtube.com)
 (TXT) w3m dump (www.youtube.com)
        
       | tialaramex wrote:
       | There's clearly an opportunity for a much longer talk. This
       | teases that Paul (McKenney, of Linux fame) will have a very
       | different take, but we don't hear what it is; maybe that was
       | presented at another session of this conference.
       | 
       | It's definitely true that translating into "standardese" is a bad
       | idea. Humans don't read standardese and neither do machines. A
       | machine-readable model would be superior (more likely to be
       | correct, more easily tested) for this and for other tricky
       | technical problems. Given that even the for-profit C++ vendors
       | don't use the actual ISO document (it's out of date and
       | pointlessly expensive, so why bother), having an appendix to the
       | "draft" with the machine proofs would be much better.
       | 
       | I think Hans makes an understandable but (IMO) wrong assumption
       | about a benefit from choosing Sequentially Consistent ordering
       | (the C++ default) over caring which order is correct. Much of the
       | tricky concurrent code non-experts are going to write against
       | these APIs will be wrong anyway, _even if_ you provide Sequential
       | Consistency.
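
A minimal sketch of that point, with made-up helper names (`run_workers`, `racy_total`, `atomic_total` are illustrative, not from the talk): every access in the first variant is sequentially consistent, and it can still lose updates, because the bug is in the algorithm rather than the ordering.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical helper: starts `threads` workers that each call `bump`
// on a shared counter `iters` times, then returns the final count.
template <typename Bump>
long run_workers(int threads, int iters, Bump bump) {
    std::atomic<long> counter{0};
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            for (int i = 0; i < iters; ++i) bump(counter);
        });
    }
    for (auto& th : pool) th.join();
    return counter.load();
}

// Both the load and the store default to seq_cst, but they are two
// separate operations, so another thread can increment in between
// and that increment is overwritten.
long racy_total(int threads, int iters) {
    return run_workers(threads, iters, [](std::atomic<long>& c) {
        c.store(c.load() + 1);
    });
}

// A single indivisible read-modify-write. Relaxed ordering is enough
// here because only the final count is read, after the joins.
long atomic_total(int threads, int iters) {
    return run_workers(threads, iters, [](std::atomic<long>& c) {
        c.fetch_add(1, std::memory_order_relaxed);
    });
}
```

With several threads, `atomic_total(4, 10000)` always yields 40000, while `racy_total(4, 10000)` may come up short even though it uses the strongest ordering everywhere.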
       | 
       | As a result the committee's original assumption wouldn't have
       | helped much. For example, I doubt that when (if? I don't like
       | some of the noises I've been hearing) Microsoft fixes SRWLock,
       | the fix will be an ordering tweak.
        
         | davidtgoldblatt wrote:
         | > There's clearly an opportunity for a much longer talk. This
         | teases that Paul (McKenney, of Linux fame) will have a very
         | different take, but we don't hear what it is; maybe that was
         | presented at another session of this conference.
         | 
         | Paul's talk is here:
         | https://www.youtube.com/watch?v=iJP6DWVrLjM
        
           | tialaramex wrote:
           | Thanks, I think I'll end up watching a good number of these
        
         | jcranmer wrote:
         | There is a bit of a trend in languages to shift towards more
         | formal semantics. In the C/C++ world, the relaxed memory model
         | and the pointer provenance work are being heavily driven by
         | actual formal semantic models. And many newer languages seem to
         | rely on a more overtly operational semantics description (Java
         | and JavaScript come to mind), although the C and C++ standards
         | themselves are largely free of this. I definitely would like to
         | see formal semantics be defined by the standards, although this
         | is difficult because they still don't exist in complete form,
         | and I'm not sure many committee members have an ability to read
         | or reason about formal semantics very well.
         | 
         | I think the commentary about the atomics memory model kind of
         | not working as intended is useful to note. Consume is useless
         | in practice, and there are probably better railings that could
         | have been put around acquire/release semantics.
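
For context on the consume remark, here is a sketch of the pointer-publication pattern that `memory_order_consume` was aimed at. As far as I know, mainstream compilers simply treat consume as acquire, so the sketch uses `memory_order_acquire` directly. `Config` and `publish_and_read` are hypothetical names for illustration.

```cpp
#include <atomic>
#include <thread>

struct Config { int value; };  // hypothetical payload type

// One thread fills in the payload and publishes a pointer to it;
// the other waits for the pointer and then reads through it.
int publish_and_read() {
    Config cfg{0};
    std::atomic<Config*> slot{nullptr};

    std::thread writer([&] {
        cfg.value = 42;                               // plain write
        slot.store(&cfg, std::memory_order_release);  // publish
    });

    Config* p = nullptr;
    // The acquire load pairs with the release store, so the write to
    // cfg.value is visible once the non-null pointer is observed.
    while ((p = slot.load(std::memory_order_acquire)) == nullptr) {
        std::this_thread::yield();
    }
    int seen = p->value;

    writer.join();
    return seen;
}
```

The read of `p->value` depends on the loaded pointer, which is the dependency consume was meant to exploit. Acquire gives the same guarantee and more, at a cost that differs by architecture.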
         | 
         | I'll have to listen to several of the other talks.
        
       | matt_d wrote:
       | The Future of Weak Memory (FOWM) 2024 talks:
       | https://www.youtube.com/playlist?list=PLyrlk8Xaylp6u1S3R6gH0...
       | 
       | Abstract:
       | https://popl24.sigplan.org/details/fowm-2024-papers/16/What-...
       | 
       | The C++ memory model was first included with thread support in
       | C++11, and then incrementally updated with later revisions. I
       | plan to summarize what I learned, both as a C++ standards
       | committee member, and more recently as a frequent user of this
       | model, mentioning as many of these as I have time for:
       | 
       | The C++ committee began with a view that higher level
       | synchronization facilities like mutexes and barriers should
       | constitute perhaps 90% of thread synchronization, sequentially
       | consistent atomics, maybe another 9%, and weakly ordered atomics
       | the other 1%. What I've observed in C++ code is often very far
       | from that. I see roughly as much atomics use as mutex use, in
       | spite of some official encouragement to the contrary. Much of
       | that uses weakly ordered atomics. I see essentially no clever
       | lock-free data structures, along the lines of lock-free linked
       | lists, in the code I work with. I do see a lot of atomic flags,
       | counters, fixed-size caches implemented with atomics, and the
       | like. Code bases vary, but I think this is not atypical.
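
A rough sketch of the "atomic flags and counters" style of code described above. It is an assumption about what such code typically looks like, not an example from the talk, and `Tally` and `race_to_claim` are invented names.

```cpp
#include <atomic>
#include <thread>
#include <vector>

struct Tally { int winners; int attempts; };  // hypothetical result type

// Several threads race to claim a one-shot flag. A relaxed counter
// records how many tried; the flag decides who actually won.
Tally race_to_claim(int threads) {
    std::atomic<bool> claimed{false};
    std::atomic<int> winners{0};
    std::atomic<int> attempts{0};

    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            // Statistics counter: only the total matters, so relaxed.
            attempts.fetch_add(1, std::memory_order_relaxed);
            // exchange() returns the previous value, so exactly one
            // thread observes false and becomes the winner.
            if (!claimed.exchange(true, std::memory_order_acq_rel)) {
                winners.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (auto& th : pool) th.join();
    return {winners.load(), attempts.load()};
}
```

Calling `race_to_claim(8)` reports one winner and eight attempts. Nothing here is a clever lock-free data structure, which matches the observation that most real-world atomics use is this simple.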
       | 
       | In spite of their frequent use, the pay-off from weakly ordered
       | atomics is decreasing, and is much less than it was in Pentium 4
       | times. The perceived benefit on most modern mainstream CPUs seems
       | to significantly exceed the actual benefit, though probably not
       | so on GPUs. In my mind this casts a bit of doubt on the need to
       | expose dependency-based ordering, as in the unsuccessful
       | memory_order_consume, to the programmer, in spite of an abundance
       | of use cases. Even memory_order_seq_cst is often not
       | significantly slower. I'll illustrate with a microbenchmark.
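
This is not the microbenchmark from the talk, just a sketch of the kind of measurement one could run to compare orderings on a given machine. `Timing` and `time_stores` are hypothetical names, and an uncontended single-threaded loop like this is only a first approximation of real cost.

```cpp
#include <atomic>
#include <chrono>

struct Timing { long long nanoseconds; long last_value; };  // hypothetical

// Times `iters` atomic stores using the memory order given as a
// template argument (relaxed, release, or seq_cst are valid for stores).
template <std::memory_order Order>
Timing time_stores(long iters) {
    std::atomic<long> cell{-1};
    auto start = std::chrono::steady_clock::now();
    for (long i = 0; i < iters; ++i) {
        cell.store(i, Order);
    }
    auto stop = std::chrono::steady_clock::now();
    auto ns = std::chrono::duration_cast<std::chrono::nanoseconds>(
                  stop - start).count();
    // Returning the last stored value gives the caller a sanity check.
    return {ns, cell.load()};
}
```

Usage would be along the lines of comparing `time_stores<std::memory_order_relaxed>(10000000).nanoseconds` against `time_stores<std::memory_order_seq_cst>(10000000).nanoseconds`. The ratio depends heavily on the CPU and compiler, so no numbers are claimed here.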
       | 
       | We initially knew way too little about implementability on
       | various architectures. This came back to bite us recently [Lahav
       | et al.]. This remains scary in places. Hardware constraints forced
       | us into a change that makes the interaction between
       | acquire/release and seq_cst hard to explain, and far less
       | intuitive than I would like. It seems to be generally believed
       | that this is hard or impossible to avoid with very high levels of
       | concurrency, as with GPUs.
       | 
       | We knew at the start that the out-of-thin-air problem would be an
       | issue. We initially tried to side-step it, which was a worse
       | disaster than the current hand-waving. This has not stopped
       | memory_order_relaxed from being widely used. Practical code seems
       | to work, but it is not provably correct given the C++ spec, and I
       | will argue that the line between this and non-working code will
       | inherently remain too fuzzy for working programmers. [P1217]
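
For readers who have not seen it, the usual illustration of the out-of-thin-air problem looks like the sketch below (`Seen` and `copy_each_way` are names invented here). Each thread copies one variable into the other with relaxed operations. The worry is that a formal model strong enough to allow real hardware reorderings has trouble excluding a self-justifying result such as both loads returning 42.

```cpp
#include <atomic>
#include <thread>

struct Seen { int r1; int r2; };  // hypothetical result type

// Thread a copies x into y, thread b copies y into x. Nothing in the
// program ever writes a non-zero value on its own.
Seen copy_each_way() {
    std::atomic<int> x{0};
    std::atomic<int> y{0};
    int r1 = 0;
    int r2 = 0;

    std::thread a([&] {
        r1 = x.load(std::memory_order_relaxed);
        y.store(r1, std::memory_order_relaxed);
    });
    std::thread b([&] {
        r2 = y.load(std::memory_order_relaxed);
        x.store(r2, std::memory_order_relaxed);
    });

    a.join();
    b.join();
    return {r1, r2};
}
```

Real implementations return zeros here, which is why practical code "seems to work". The difficulty is stating precisely, in the specification, why any other answer is forbidden.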
       | 
       | Unsurprisingly, programmers very rarely read the memory model in
       | the standard. We learned that compiler writers commonly do not
       | either. The real audience for language memory models mostly
       | consists of researchers who generate instruction mapping tables
       | for particular architectures. The translation from a mathematical
       | model to standardese is both error-prone and largely pointless.
       | We need to find a way to avoid the standardese.
       | 
       | Atomics mappings are part of the platform application binary
       | interface, and need to be standardized. They often include
       | arbitrary conventions that need to be consistently followed by
       | all compilers on a system for all programming languages. Later
       | evolution of these conventions is not always practical. I'll give
       | a recent RISC-V example of such a problem.
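
To see why the mapping is an ABI matter, consider the classic store-buffering test, sketched below with invented names (`Observed`, `store_buffering`). This is an x86-64 illustration and an assumption on my part, not the RISC-V example from the talk. With `seq_cst` the language forbids both threads reading 0, and on x86-64 mainstream compilers get that by making the store expensive (an XCHG or a MOV plus MFENCE) while leaving the load a plain MOV. A compiler that fenced loads instead would be fine on its own, but mixing object code from both conventions could leave one thread with no fence at all.

```cpp
#include <atomic>
#include <thread>

struct Observed { int r1; int r2; };  // hypothetical result type

// Each thread stores to its own variable, then loads the other one.
// Under seq_cst, at least one thread must see the other's store.
Observed store_buffering() {
    std::atomic<int> x{0};
    std::atomic<int> y{0};
    int r1 = -1;
    int r2 = -1;

    std::thread a([&] {
        x.store(1, std::memory_order_seq_cst);
        r1 = y.load(std::memory_order_seq_cst);
    });
    std::thread b([&] {
        y.store(1, std::memory_order_seq_cst);
        r2 = x.load(std::memory_order_seq_cst);
    });

    a.join();
    b.join();
    return {r1, r2};
}
```

The outcome `r1 == 0 && r2 == 0` is ruled out only if every compiler and language runtime on the platform agrees on where the ordering instructions go, which is the sense in which the conventions need to be standardized.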
        
       ___________________________________________________________________
       (page generated 2024-03-04 23:01 UTC)