[HN Gopher] What we learned from C++ atomics and memory model st...
___________________________________________________________________
What we learned from C++ atomics and memory model standardization
[video]
Author : matt_d
Score : 34 points
Date : 2024-03-04 17:00 UTC (6 hours ago)
(HTM) web link (www.youtube.com)
(TXT) w3m dump (www.youtube.com)
| tialaramex wrote:
| There's clearly an opportunity for a much longer talk, this
| teases that Paul (McKenney, of Linux fame) will have a very
| different take and we don't hear what it is, maybe that was
| presented at another session of this conference.
|
| It's definitely true that translating into "standardese" is a bad
| idea. Humans don't read standardese and neither do machines; a
| machine-readable model would be superior (more likely to be
| correct, more easily tested) for this and for other tricky
| technical problems. Given that even the for-profit C++ vendors
| don't use the actual ISO document (it's out of date and
| pointlessly expensive, so why bother), having an appendix to the
| "draft" with the machine proofs would be much better.
|
| I think Hans makes an understandable but (IMO) wrong assumption
| about a benefit from choosing Sequentially Consistent ordering
| (the C++ default) over caring which order is correct. Much of the
| tricky concurrent code non-experts are going to write against
| these APIs will be wrong anyway, _even if_ you provide Sequential
| Consistency.
|
| As a result the committee's original assumption wouldn't have
| helped much. For example I doubt that when (if? I don't like some
| of the noises I've been hearing) Microsoft fixes SRWLock the fix
| will be an ordering tweak.
| davidtgoldblatt wrote:
| > There's clearly an opportunity for a much longer talk, this
| teases that Paul (McKenney, of Linux fame) will have a very
| different take and we don't hear what it is, maybe that was
| presented at another session of this conference.
|
| Paul's talk is here:
| https://www.youtube.com/watch?v=iJP6DWVrLjM
| tialaramex wrote:
| Thanks, I think I'll end up watching a good number of these.
| jcranmer wrote:
| There is a bit of a trend in languages to shift towards more
| formal semantics. In the C/C++ world, the relaxed memory model
| and the pointer provenance work are being heavily driven by
| actual formal semantic models. And many newer languages seem to
| rely on a more overtly operational semantics description (Java
| and JavaScript come to mind), although the C and C++ standards
| themselves are largely free of this. I definitely would like to
| see formal semantics be defined by the standards, although this
| is difficult because they still don't exist in complete form,
| and I'm not sure many committee members have the ability to read
| or reason about formal semantics very well.
|
| I think the commentary about the atomics memory model kind of
| not working as intended is useful to note. Consume is useless
| in practice, and there are probably better railings that could
| have been put around acquire/release semantics.
|
| I'll have to listen to several of the other talks.
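|
| For readers who want the acquire/release pattern spelled out,
| here is a minimal sketch of pointer publication, the use case
| consume was aimed at. The names Config and publish_and_read are
| made up for illustration. As far as I know, mainstream compilers
| simply treat memory_order_consume as acquire, so the sketch uses
| acquire directly.

```cpp
#include <atomic>
#include <thread>

// Hypothetical payload type, for illustration only.
struct Config {
    int value;
};

// Publishes a pointer from one thread and reads through it in another.
int publish_and_read() {
    Config cfg{0};
    std::atomic<Config*> published{nullptr};

    std::thread producer([&] {
        cfg.value = 42;  // plain, non-atomic write
        // The release store makes the write above visible to any thread
        // that observes this pointer with an acquire load.
        published.store(&cfg, std::memory_order_release);
    });

    Config* p = nullptr;
    // Spin until the pointer is visible. The acquire load pairs with the
    // release store, so reading p->value afterwards is not a data race.
    while ((p = published.load(std::memory_order_acquire)) == nullptr) {
        std::this_thread::yield();
    }
    const int seen = p->value;

    producer.join();
    return seen;
}
```

| The idea behind consume was that the consumer could rely on the
| data dependency from p to p->value alone and skip the acquire
| fence on weakly ordered hardware.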
| matt_d wrote:
| The Future of Weak Memory (FOWM) 2024 talks:
| https://www.youtube.com/playlist?list=PLyrlk8Xaylp6u1S3R6gH0...
|
| Abstract:
| https://popl24.sigplan.org/details/fowm-2024-papers/16/What-...
|
| The C++ memory model was first included with thread support in
| C++11, and then incrementally updated with later revisions. I
| plan to summarize what I learned, both as a C++ standards
| committee member, and more recently as a frequent user of this
| model, mentioning as many of these as I have time for:
|
| The C++ committee began with a view that higher level
| synchronization facilities like mutexes and barriers should
| constitute perhaps 90% of thread synchronization, sequentially
| consistent atomics, maybe another 9%, and weakly ordered atomics
| the other 1%. What I've observed in C++ code is often very far
| from that. I see roughly as much atomics as mutex use, in spite
| of some official encouragement to the contrary. Much of that uses
| weakly ordered atomics. I see essentially no clever lock-free
| data structures, along the lines of lock-free linked lists, in the
| code I work with. I do see a lot of atomic flags, counters,
| fixed-size caches implemented with atomics, and the like. Code
| bases vary, but I think this is not atypical.
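|
| A minimal sketch of the kind of "atomic flags and counters" code
| the paragraph above describes. The names hits, stop_requested,
| worker and run_workers are hypothetical and not taken from the
| talk.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical globals, for illustration only.
std::atomic<std::uint64_t> hits{0};        // statistics counter
std::atomic<bool> stop_requested{false};   // one-way stop flag

// Each worker bumps the counter. Nothing else is published through the
// counter or the flag, so relaxed ordering is enough for atomicity.
void worker(int iterations) {
    for (int i = 0; i < iterations; ++i) {
        if (stop_requested.load(std::memory_order_relaxed)) {
            return;
        }
        hits.fetch_add(1, std::memory_order_relaxed);
    }
}

std::uint64_t run_workers(int threads, int iterations) {
    assert(threads > 0 && iterations >= 0);
    hits.store(0, std::memory_order_relaxed);
    stop_requested.store(false, std::memory_order_relaxed);

    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back(worker, iterations);
    }
    for (auto& t : pool) {
        t.join();
    }
    // join() synchronizes with the end of each worker, so this load
    // observes every increment even though it is relaxed.
    return hits.load(std::memory_order_relaxed);
}
```

| With the flag left unset, run_workers(4, 10000) counts 40000
| increments. A mutex-protected counter would give the same result
| with more code and more contention.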
|
| In spite of their frequent use, the pay-off from weakly ordered
| atomics is decreasing, and is much less than it was in Pentium 4
| times. The perceived benefit on most modern mainstream CPUs seems
| to significantly exceed the actual benefit, though probably not
| so on GPUs. In my mind this casts a bit of doubt on the need to
| expose dependency-based ordering, as in the unsuccessful
| memory_order_consume, to the programmer, in spite of an abundance
| of use cases. Even memory_order_seq_cst is often not
| significantly slower. I'll illustrate with a microbenchmark.
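|
| The microbenchmark from the talk is not reproduced here. As a
| rough stand-in, the sketch below times uncontended stores with a
| chosen memory order; time_stores is a hypothetical helper, and a
| single-threaded loop like this only hints at the per-operation
| cost.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical helper: time `n` uncontended stores using `Order`.
// Stores are used because on x86 an atomic read-modify-write costs the
// same for every memory order, which would hide any difference.
template <std::memory_order Order>
std::chrono::nanoseconds time_stores(std::atomic<std::uint64_t>& cell,
                                     std::uint64_t n) {
    assert(n > 0);
    const auto start = std::chrono::steady_clock::now();
    for (std::uint64_t i = 1; i <= n; ++i) {
        cell.store(i, Order);
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start);
}
```

| Calling time_stores<std::memory_order_relaxed>(cell, n) and
| time_stores<std::memory_order_seq_cst>(cell, n) on the same
| std::atomic<std::uint64_t> gives two durations to compare. The
| numbers depend heavily on the CPU, the compiler and contention,
| so treat them as a starting point rather than a result.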
|
| We initially knew way too little about implementability on
| various architectures. This came back to bite us recently [Lahav
| et al.]. This remains scary in places. Hardware constraints forced
| us into a change that makes the interaction between
| acquire/release and seq_cst hard to explain, and far less
| intuitive than I would like. It seems to be generally believed
| that this is hard or impossible to avoid with very high levels of
| concurrency, as with GPUs.
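|
| As background only: the usual way to show what seq_cst adds over
| acquire/release is the store-buffering litmus test sketched
| below. It does not reproduce the mixed-ordering problem
| attributed to Lahav et al. above; saw_both_zero is a hypothetical
| helper.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Runs the store-buffering test `runs` times and reports whether the
// outcome r1 == 0 && r2 == 0 was ever observed.
template <std::memory_order StoreOrder, std::memory_order LoadOrder>
bool saw_both_zero(int runs) {
    assert(runs > 0);
    for (int i = 0; i < runs; ++i) {
        std::atomic<int> x{0}, y{0};
        int r1 = -1, r2 = -1;

        std::thread a([&] {
            x.store(1, StoreOrder);
            r1 = y.load(LoadOrder);
        });
        std::thread b([&] {
            y.store(1, StoreOrder);
            r2 = x.load(LoadOrder);
        });
        a.join();
        b.join();

        if (r1 == 0 && r2 == 0) {
            return true;
        }
    }
    return false;
}
```

| With seq_cst for both the stores and the loads, all four
| operations sit in one total order, so both threads reading 0 is
| forbidden. With release stores and acquire loads that outcome is
| allowed, although a given machine may or may not show it in a
| short run.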
|
| We knew at the start that the out-of-thin-air problem would be an
| issue. We initially tried to side-step it, which was a worse
| disaster than the current hand-waving. This has not stopped
| memory_order_relaxed from being widely used. Practical code seems
| to work, but it is not provably correct given the C++ spec, and I
| will argue that the line between this and non-working code will
| inherently remain too fuzzy for working programmers. [P1217]
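|
| The textbook out-of-thin-air example looks like the sketch
| below. The function name is hypothetical. Each thread copies one
| variable into the other with relaxed operations, so the only
| values ever written are values that were read.

```cpp
#include <atomic>
#include <thread>
#include <utility>

// Load buffering with data dependencies: the classic out-of-thin-air shape.
std::pair<int, int> load_buffering_with_dependencies() {
    std::atomic<int> x{0}, y{0};
    int r1 = -1, r2 = -1;

    std::thread a([&] {
        r1 = x.load(std::memory_order_relaxed);
        y.store(r1, std::memory_order_relaxed);  // stores whatever was read
    });
    std::thread b([&] {
        r2 = y.load(std::memory_order_relaxed);
        x.store(r2, std::memory_order_relaxed);  // stores whatever was read
    });
    a.join();
    b.join();

    return {r1, r2};
}
```

| On real implementations this returns {0, 0}. The difficulty is
| that the formal model does not rule out a self-justifying result
| such as r1 == r2 == 42; the standard only says implementations
| should not produce it.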
|
| Unsurprisingly, programmers very rarely read the memory model in
| the standard. We learned that compiler writers commonly do not
| either. The real audience for language memory models mostly
| consists of researchers who generate instruction mapping tables
| for particular architectures. The translation from a mathematical
| model to standardese is both error prone, and largely pointless.
| We need to find a way to avoid the standardese.
|
| Atomics mappings are part of the platform application binary
| interface, and need to be standardized. They often include
| arbitrary conventions that need to be consistently followed by
| all compilers on a system for all programming languages. Later
| evolution of these conventions is not always practical. I'll give
| a recent RISC-V example of such a problem.
___________________________________________________________________
(page generated 2024-03-04 23:01 UTC)