[HN Gopher] The Tower of Weakenings: Memory Models for Everyone
___________________________________________________________________
The Tower of Weakenings: Memory Models for Everyone
Author : mooreds
Score : 97 points
Date : 2022-04-06 13:39 UTC (9 hours ago)
(HTM) web link (gankra.github.io)
(TXT) w3m dump (gankra.github.io)
| tialaramex wrote:
| I hope this new upper story of the tower is successful. I don't
| reckon I've actually needed anything weaker than the tower's
| ptr.addr() and ptr.with_addr(addr) in code I've actually shipped
| anywhere. Maybe people have examples of stuff (in Rust or
| otherwise) they've written where they're confident that would
| _not_ be enough?
| seo-speedwagon wrote:
| This is my question as well. I don't write a lot of C or Rust
| code, so I'm curious when int -> ptr and ptr -> int conversions
| are actually needed in practice. The only scenarios I can think
| of with my limited domain experience in are low-level things
| where you actually know the numeric addresses of certain
| important locations in memory. In which case, saying "yeah,
| that's a lower level in the weakenings tower" seems sensible.
| mh7 wrote:
| A classic example is when you're writing a custom allocator
| and need to align the address.
|
| Another is storing pointers in a hash table - you need to do
| arithmetic on the pointer bits to compute the hash.
| Gankra wrote:
| I already designed several new convenience methods for
| making this kind of thing simple and the PR to rust-lang is
| already open :)
| jcranmer wrote:
| There are definitely cases where you are converting integers
| into pointers ex-nihilo. Memory-mapped I/O is one (write
| something special to memory location 0x04ff0030 and it turns on
| the light!).
|
| My current Rust project would probably need to hit up the
| exposed_addr interface, but it's definitely in the twilight of
| "how to even memory model" since it involves trying to reason
| about the provenance of pointer values in the register array
| passed to you by the third argument of the signal handler.
| kibwen wrote:
| Surely the fact that you're using explicitly volatile
| accesses should be a dead giveaway to the compiler that funny
| business is going on and that the optimizer shouldn't touch
| it with a ten-foot pole, right? The whole point of volatile
| is to inhibit optimizations, after all. Maybe that's not good
| enough for CHERI (but surely they have _some_ way of doing
| MMIO), but it sounds good enough for Miri.
| jcranmer wrote:
| From a semantics perspective, volatile isn't really a
| solution so much as it is a brush-it-under-the-rug
| scenario. As you say, the point of volatile is to inhibit
| optimizations, but one of the issues of language semantics
| is that "optimization" isn't a meaningfully definable term,
| especially the further your target architecture gets from a
| Computer Architecture 101-model computer.
| Gankra wrote:
| My understanding is that CHERI just tells you to eat your
| vegetables and properly maintains a chain of custody from
| like bootloader to kernel to subsystems to drivers, so that
| anyone messing with MMIO has to do it the Right Way and get
| permission from whatever's up the chain of command from
| them to access that range of memory, even if they're also
| part of the kernel.
|
| This is super vague half-remembered conversations though.
| CJefferson wrote:
| The only example I can think of I've seen released code is the
| "xor linked list" trick, where each node of a link list stores
| "neighbours = prev xor next" (rather than pointers to the
| 'next' and 'prev' thing in the list).
|
| The idea is this lets you iterate up and down the linked list,
| as if you have prev, you can do "neighbours xor prev" to get
| next, and "neighbours xor next" to get prev.
|
| However, while in that particular program that did create a
| notable speed improvement and memory saving, I think it's
| probably worth sacrificing to the greater good.
| ATsch wrote:
| In general the primary reason for using doubly linked lists,
| despite their terrible cache behaviour, is that they are the
| only data structure that is very easy to write in C. So I'm
| not sure how much that is going to be an issue in Rust where
| it is much easier to use other data structures that perform
| significantly better in practice anyway.
| tialaramex wrote:
| That "terrible" cache behaviour is what you actually wanted
| for some concurrency problems, because if code on several
| physical CPUs or cores all tries to access the _same_
| region of memory that 's going to be very slow, while
| unrelated memory will get cached locally by each CPU/ core.
|
| That's not why Joe Junior's first C program has a linked
| list in it. But it _might well be_ why Joe Senior 's
| masterpiece Rust program has a linked list in it. On the
| other hand, depending on the algorithm being implemented,
| it may only be singly linked, and the XOR trick doesn't
| apply.
|
| Rust's alloc (the library you get if you have an allocator,
| but not necessarily the entire OS environment) does provide
| a linked list if you want one, and this would be a
| reasonable choice in this case whereas it warns you that
| you probably wanted Vec if you're not sure which data
| structure you need.
| ATsch wrote:
| This is true, but the concurrent work stealing queues and
| such that are commonly used for such purposes aren't
| usually textbook C-style linked lists. They (at least in
| the case of Crossbeam, don't have experience with others)
| allocate smaller contiguous blocks of items which are
| then linked together with pointers. This reduces overhead
| a lot more than the xor trick does while also improving
| cache behaviour. Also as you mentioned they're already
| only singly linked in the first place.
| zozbot234 wrote:
| I think this would work well with the model as described in
| the article, since for every list operation you're assumed to
| be starting from an actual pointer to some list element,
| which you can use to endow the recomputed addresses with the
| right provenance component.
| CJefferson wrote:
| If each element of the linked list has a different
| provenance, then I think you are out of luck. I could
| allocate the whole list in a single giant block, but that
| feels like it defeats the purpose -- now I'm just
| allocating every object in my program in one giant
| allocation.
| Gankra wrote:
| Yeah "it's actually xoring array indices" is the standard
| solution
|
| Or you can go Deep on memory models and try to apply
| Ralf's Xor Provenance Hack and claim that, well,
| provenance is stored in bytes, but it doesn't _have_ to
| be a pointer-sized range of bytes, so let me have some
| horrible way to express "the high half has provenance 1,
| the low half has provenance 2" and _handwave_ magic
| problem solved.
|
| This is of course horrible and also not at all a portable
| notion to CHERI which tracks provenance at the
| granularity of "aligned pointer-sized region of memory".
| But hey, if it helps you sleep at night.
| CJefferson wrote:
| I'm personally excited by CHERI, and feel it is part of
| (sorry if these words annoy some people) computing
| "growing up". It's interesting (to me at least) that that
| is the only trick I can think of (along with general
| pointer compression tricks, like interpreters storing
| 63-bit integers when a pointers last bit is set to 1),
| where CHERI shouldn't be (fairly) trivial to support.
|
| I'm happy to sacrific xoring pointers, the same way I
| wouldn't use some "cunning trick" to build my house with
| half as many nails, at the risk that any minor mistake
| installing any of those nails would lead to my house
| failing over.
| markjdb wrote:
| CHERI does permit tricks like storing flags in the low
| bits of a pointer, at least to some extent. Quite a lot
| of low level C code (including some in the CheriBSD
| kernel) needs that to work.
| GolDDranks wrote:
| A nice think about this work is that adopting this strict model
| in real life codebases is going to bring to light cases where
| it isn't enough, and hopefully more scrutiny can be directed
| towards those cases.
| IainIreland wrote:
| A possible non-Rust example: SpiderMonkey uses a trick we call
| NaN-boxing to represent JS values, which takes advantage of the
| fact that doubles have 2^52 different bit patterns for NaN to
| store 48-bit pointers with 4 bits leftover for a type tag.
| Provenance for the ptr->int->double->int->ptr loop might still
| be manageable using a "base of the GC heap" pointer and
| with_addr(), but there are also cases where built-in objects
| store arbitrary non-GC pointers in private slots (for example,
| a Map object has a raw pointer to its heap-allocated backing
| array). In that case, there isn't an obvious way to reestablish
| provenance (although we could hack it for e.g. Cheri by adding
| a layer of indirection through a GC allocation that stores the
| real pointer).
| Gankra wrote:
| Iain and I previously discussed this a bit, so just to add
| context: WebKit is already ported to CHERI BSD, so there are
| definitely solutions to the kinds of hax that JIT VMs like to
| do. The details just become very platform specific (specific
| instructions/operations that are supported like pointer
| shifting or high-bit tagging), so it's hard to
| discuss/describe abstractly without a CHERI expert or ISA
| manual next to you.
| tooltower wrote:
| I found this article hard to read. I was trying to understand
| what the current state of affairs is, and how these new
| abstractions improve the matter. E.g. the consistency properties,
| the weakenings, and when they help.
|
| Instead it read more like an exasperated rant, that links to rust
| toolchain improvements. It does have an account of how to use the
| new features, which was good. But I couldn't glean when and how
| that would help me as a user
| kmeisthax wrote:
| There's a prior post that this one is building off of, but
| basically:
|
| 1. Pointers are not addresses; they are pairs of (allocation,
| address). This is known as pointers having _provenance_.
|
| 2. Treating a pointer as an int-sized address is merely a
| hardware optimization; and there is already hardware that
| deliberately does not do this (e.g. CHERI and _maybe_ ARM PAC).
|
| 3. The compiler needs to know the allocation part of the
| pointer (the one that gets lost at runtime) in order to
| determine if a pointer write will change a local variable. If
| this association is lost then you get miscompiles.
|
| 4. Converting an integer back into a pointer (e.g. usize as
| ptr) does not establish pointer provenance, will break CHERI,
| and will miscompile on other architectures.
|
| 5. Rust made the mistake of allowing #4 in unsafe code. This is
| entirely unsound.
|
| 6. The proposed strict-provenance APIs allows doing something
| like #4, but sound, by letting the user stitch an address onto
| a pointer with a compatible allocation. This re-establishes the
| chain of provenance and avoids the miscompile.
| zozbot234 wrote:
| Looks like the root mistake here is conflating `usize` (the
| integer type giving the maximum size of a data object, and
| the type of offsets into an allocation) and `uptr` (the size
| of a pointer's integer representation). In a segmented memory
| architecture, or one using 'object-oriented memory', the two
| need _not_ be the same. You need pretty much the same sort of
| tweaks to be able to effectively target systems like the old
| 8086 /80286, the 65816 (of Apple II GS and SNES fame) or the
| Intel iAPX 432.
| Gankra wrote:
| Rust is no more unsound than C/C++/Swift here. It _basically_
| works but that 's a really frustrating answer for anyone who
| cares about validation/sanitizers/optimization/docs.
|
| It's just better if most code doesn't poke the dragon (llvm)
| and has _trivially correct_ semantics instead of "yeah this
| has to work but um, don't ask me how or why".
| jcranmer wrote:
| Programming language semantics turn out to be far, far thornier
| than most people (even people who work on language committees!)
| give them credit for. And the memory model turns out to be the
| absolute worst part of it all: the hardware plays lots of fun
| games on you, the OS plays lots of fun games on you, and the
| compiler itself is playing lots of fun games on you.
|
| Multithreaded--really, multiprocessor--memory models are really,
| really confusing. Hardware memory models are generally described
| in terms of what reorderings are possible (e.g., loads may be
| ordered before subsequent stores), which is easy to understand
| from a hardware perspective but less easy to deal with from a
| language perspective. There turns out to be a wonderful theorem,
| though, that lets you hide all of the hardware ugliness: if you
| properly synchronize your code, then you can't _see_ any of the
| reordering, and the compiler merely has to guarantee that the
| synchronization is properly handled. This is the data-race-free
| programming model, and it is the primary basis for all modern
| memory models. Stray away from this model (e.g., relaxed atomics)
| and you reenter the world of pain, for which no satisfactory
| solution has yet been found.
|
| Discovering the pains of pointer provenance is even more recent,
| arguably only about a decade old. For a long time, pointer
| provenance has been handwaved on a vague data-dependency basis
| (you see this in LLVM's LangRef, for example). Unfortunately, it
| turns out that data dependencies are not generally preserved by
| compilers (this also doomed the release/consume aspect of the
| C/C++ memory model). An escaping/lifetime-based pointer model is
| closer to the actual one used by compilers, but there remains
| tricky things about things like memcpy [1].
|
| It turns out there are two things that make pointer provenance
| painful: integer-to-pointer (where does the provenance come
| from?), and memory (since this usually ends up creating implicit
| integer-to-pointer casts in most putative models that you really
| want to avoid). The lesson we know from pointer provenance is
| that _pointers are not integers, and cannot be losslessly
| roundtripped via integers_ , and Rust being newer and with a
| heavier emphasis on safety gives it a better chance of adopting a
| model here that more forcefully breaks this mistaken link.
|
| [1] One consequence is that most formal pointer provenance models
| actually don't let you legally write memcpy in C code.
| fbkr wrote:
| > Stray away from this model (e.g., relaxed atomics)
|
| Relaxed atomics are still data-race free, did you mean non-
| atomic accesses?
| jcranmer wrote:
| I should clarify slightly. There are multiple definitions of
| "data race" in play. In the sense of the C/C++ standard,
| where a data race is essentially used to describe behaviors
| that are undefined, relaxed atomics are free of data races.
| However, from the perspective of a data-race-free model where
| data races are things that observably violate sequential
| consistency, relaxed atomics are absolutely data races.
|
| In fact, the raison d'etre for relaxed atomics is to permit
| applications to create "benign data races" [in the second
| sense] that aren't undefined behavior. As it turns out,
| though, actually specifying what semantics such a "benign
| data race" has is complicated, especially when you get into
| the realm of avoiding things like out-of-thin-air behavior
| (or, to reference a paper that crossed my desk a month ago,
| undefined behavior executed only if out-of-thin-air happens).
| bigcat123 wrote:
| Hmm, I am getting a sense that modern engineers shows a sign of
| losing touch to computer science theoretical foundation.
|
| It's not funny game. It's part of abstraction hierarchy and
| physical computing model vs. programming language computing
| model.
|
| It's only funny because one would presume that there ever was a
| time the hardware was not playing "funny" game.
|
| And no, it always has a giant abstraction gap between
| hierarchy, and that's a fundamental part of modern electronic
| computing. One shall allow oneself to learn those and get used
| to how things are actually working.
___________________________________________________________________
(page generated 2022-04-06 23:01 UTC)