[HN Gopher] The Tower of Weakenings: Memory Models for Everyone
       ___________________________________________________________________
        
       The Tower of Weakenings: Memory Models for Everyone
        
       Author : mooreds
       Score  : 97 points
       Date   : 2022-04-06 13:39 UTC (9 hours ago)
        
 (HTM) web link (gankra.github.io)
 (TXT) w3m dump (gankra.github.io)
        
       | tialaramex wrote:
       | I hope this new upper story of the tower is successful. I don't
       | reckon I've actually needed anything weaker than the tower's
       | ptr.addr() and ptr.with_addr(addr) in code I've actually shipped
       | anywhere. Maybe people have examples of stuff (in Rust or
       | otherwise) they've written where they're confident that would
       | _not_ be enough?
        
         | seo-speedwagon wrote:
         | This is my question as well. I don't write a lot of C or Rust
         | code, so I'm curious when int -> ptr and ptr -> int conversions
         | are actually needed in practice. The only scenarios I can think
         | of with my limited domain experience in are low-level things
         | where you actually know the numeric addresses of certain
         | important locations in memory. In which case, saying "yeah,
         | that's a lower level in the weakenings tower" seems sensible.
        
           | mh7 wrote:
           | A classic example is when you're writing a custom allocator
           | and need to align the address.
           | 
           | Another is storing pointers in a hash table - you need to do
           | arithmetic on the pointer bits to compute the hash.
        
             | Gankra wrote:
             | I already designed several new convenience methods for
             | making this kind of thing simple and the PR to rust-lang is
             | already open :)
        
         | jcranmer wrote:
         | There are definitely cases where you are converting integers
         | into pointers ex-nihilo. Memory-mapped I/O is one (write
         | something special to memory location 0x04ff0030 and it turns on
         | the light!).
         | 
         | My current Rust project would probably need to hit up the
         | exposed_addr interface, but it's definitely in the twilight of
         | "how to even memory model" since it involves trying to reason
         | about the provenance of pointer values in the register array
         | passed to you by the third argument of the signal handler.
        
           | kibwen wrote:
           | Surely the fact that you're using explicitly volatile
           | accesses should be a dead giveaway to the compiler that funny
           | business is going on and that the optimizer shouldn't touch
           | it with a ten-foot pole, right? The whole point of volatile
           | is to inhibit optimizations, after all. Maybe that's not good
           | enough for CHERI (but surely they have _some_ way of doing
           | MMIO), but it sounds good enough for Miri.
        
             | jcranmer wrote:
             | From a semantics perspective, volatile isn't really a
             | solution so much as it is a brush-it-under-the-rug
             | scenario. As you say, the point of volatile is to inhibit
             | optimizations, but one of the issues of language semantics
             | is that "optimization" isn't a meaningfully definable term,
             | especially the further your target architecture gets from a
             | Computer Architecture 101-model computer.
        
             | Gankra wrote:
             | My understanding is that CHERI just tells you to eat your
             | vegetables and properly maintains a chain of custody from
             | like bootloader to kernel to subsystems to drivers, so that
             | anyone messing with MMIO has to do it the Right Way and get
             | permission from whatever's up the chain of command from
             | them to access that range of memory, even if they're also
             | part of the kernel.
             | 
             | This is super vague half-remembered conversations though.
        
         | CJefferson wrote:
         | The only example I can think of I've seen released code is the
         | "xor linked list" trick, where each node of a link list stores
         | "neighbours = prev xor next" (rather than pointers to the
         | 'next' and 'prev' thing in the list).
         | 
         | The idea is this lets you iterate up and down the linked list,
         | as if you have prev, you can do "neighbours xor prev" to get
         | next, and "neighbours xor next" to get prev.
         | 
         | However, while in that particular program that did create a
         | notable speed improvement and memory saving, I think it's
         | probably worth sacrificing to the greater good.
        
           | ATsch wrote:
           | In general the primary reason for using doubly linked lists,
           | despite their terrible cache behaviour, is that they are the
           | only data structure that is very easy to write in C. So I'm
           | not sure how much that is going to be an issue in Rust where
           | it is much easier to use other data structures that perform
           | significantly better in practice anyway.
        
             | tialaramex wrote:
             | That "terrible" cache behaviour is what you actually wanted
             | for some concurrency problems, because if code on several
             | physical CPUs or cores all tries to access the _same_
             | region of memory that 's going to be very slow, while
             | unrelated memory will get cached locally by each CPU/ core.
             | 
             | That's not why Joe Junior's first C program has a linked
             | list in it. But it _might well be_ why Joe Senior 's
             | masterpiece Rust program has a linked list in it. On the
             | other hand, depending on the algorithm being implemented,
             | it may only be singly linked, and the XOR trick doesn't
             | apply.
             | 
             | Rust's alloc (the library you get if you have an allocator,
             | but not necessarily the entire OS environment) does provide
             | a linked list if you want one, and this would be a
             | reasonable choice in this case whereas it warns you that
             | you probably wanted Vec if you're not sure which data
             | structure you need.
        
               | ATsch wrote:
               | This is true, but the concurrent work stealing queues and
               | such that are commonly used for such purposes aren't
               | usually textbook C-style linked lists. They (at least in
               | the case of Crossbeam, don't have experience with others)
               | allocate smaller contiguous blocks of items which are
               | then linked together with pointers. This reduces overhead
               | a lot more than the xor trick does while also improving
               | cache behaviour. Also as you mentioned they're already
               | only singly linked in the first place.
        
           | zozbot234 wrote:
           | I think this would work well with the model as described in
           | the article, since for every list operation you're assumed to
           | be starting from an actual pointer to some list element,
           | which you can use to endow the recomputed addresses with the
           | right provenance component.
        
             | CJefferson wrote:
             | If each element of the linked list has a different
             | provenance, then I think you are out of luck. I could
             | allocate the whole list in a single giant block, but that
             | feels like it defeats the purpose -- now I'm just
             | allocating every object in my program in one giant
             | allocation.
        
               | Gankra wrote:
               | Yeah "it's actually xoring array indices" is the standard
               | solution
               | 
               | Or you can go Deep on memory models and try to apply
               | Ralf's Xor Provenance Hack and claim that, well,
               | provenance is stored in bytes, but it doesn't _have_ to
               | be a pointer-sized range of bytes, so let me have some
               | horrible way to express  "the high half has provenance 1,
               | the low half has provenance 2" and _handwave_ magic
               | problem solved.
               | 
               | This is of course horrible and also not at all a portable
               | notion to CHERI which tracks provenance at the
               | granularity of "aligned pointer-sized region of memory".
               | But hey, if it helps you sleep at night.
        
               | CJefferson wrote:
               | I'm personally excited by CHERI, and feel it is part of
               | (sorry if these words annoy some people) computing
               | "growing up". It's interesting (to me at least) that that
               | is the only trick I can think of (along with general
               | pointer compression tricks, like interpreters storing
               | 63-bit integers when a pointers last bit is set to 1),
               | where CHERI shouldn't be (fairly) trivial to support.
               | 
               | I'm happy to sacrific xoring pointers, the same way I
               | wouldn't use some "cunning trick" to build my house with
               | half as many nails, at the risk that any minor mistake
               | installing any of those nails would lead to my house
               | failing over.
        
               | markjdb wrote:
               | CHERI does permit tricks like storing flags in the low
               | bits of a pointer, at least to some extent. Quite a lot
               | of low level C code (including some in the CheriBSD
               | kernel) needs that to work.
        
         | GolDDranks wrote:
         | A nice think about this work is that adopting this strict model
         | in real life codebases is going to bring to light cases where
         | it isn't enough, and hopefully more scrutiny can be directed
         | towards those cases.
        
         | IainIreland wrote:
         | A possible non-Rust example: SpiderMonkey uses a trick we call
         | NaN-boxing to represent JS values, which takes advantage of the
         | fact that doubles have 2^52 different bit patterns for NaN to
         | store 48-bit pointers with 4 bits leftover for a type tag.
         | Provenance for the ptr->int->double->int->ptr loop might still
         | be manageable using a "base of the GC heap" pointer and
         | with_addr(), but there are also cases where built-in objects
         | store arbitrary non-GC pointers in private slots (for example,
         | a Map object has a raw pointer to its heap-allocated backing
         | array). In that case, there isn't an obvious way to reestablish
         | provenance (although we could hack it for e.g. Cheri by adding
         | a layer of indirection through a GC allocation that stores the
         | real pointer).
        
           | Gankra wrote:
           | Iain and I previously discussed this a bit, so just to add
           | context: WebKit is already ported to CHERI BSD, so there are
           | definitely solutions to the kinds of hax that JIT VMs like to
           | do. The details just become very platform specific (specific
           | instructions/operations that are supported like pointer
           | shifting or high-bit tagging), so it's hard to
           | discuss/describe abstractly without a CHERI expert or ISA
           | manual next to you.
        
       | tooltower wrote:
       | I found this article hard to read. I was trying to understand
       | what the current state of affairs is, and how these new
       | abstractions improve the matter. E.g. the consistency properties,
       | the weakenings, and when they help.
       | 
       | Instead it read more like an exasperated rant, that links to rust
       | toolchain improvements. It does have an account of how to use the
       | new features, which was good. But I couldn't glean when and how
       | that would help me as a user
        
         | kmeisthax wrote:
         | There's a prior post that this one is building off of, but
         | basically:
         | 
         | 1. Pointers are not addresses; they are pairs of (allocation,
         | address). This is known as pointers having _provenance_.
         | 
         | 2. Treating a pointer as an int-sized address is merely a
         | hardware optimization; and there is already hardware that
         | deliberately does not do this (e.g. CHERI and _maybe_ ARM PAC).
         | 
         | 3. The compiler needs to know the allocation part of the
         | pointer (the one that gets lost at runtime) in order to
         | determine if a pointer write will change a local variable. If
         | this association is lost then you get miscompiles.
         | 
         | 4. Converting an integer back into a pointer (e.g. usize as
         | ptr) does not establish pointer provenance, will break CHERI,
         | and will miscompile on other architectures.
         | 
         | 5. Rust made the mistake of allowing #4 in unsafe code. This is
         | entirely unsound.
         | 
         | 6. The proposed strict-provenance APIs allows doing something
         | like #4, but sound, by letting the user stitch an address onto
         | a pointer with a compatible allocation. This re-establishes the
         | chain of provenance and avoids the miscompile.
        
           | zozbot234 wrote:
           | Looks like the root mistake here is conflating `usize` (the
           | integer type giving the maximum size of a data object, and
           | the type of offsets into an allocation) and `uptr` (the size
           | of a pointer's integer representation). In a segmented memory
           | architecture, or one using 'object-oriented memory', the two
           | need _not_ be the same. You need pretty much the same sort of
           | tweaks to be able to effectively target systems like the old
           | 8086 /80286, the 65816 (of Apple II GS and SNES fame) or the
           | Intel iAPX 432.
        
           | Gankra wrote:
           | Rust is no more unsound than C/C++/Swift here. It _basically_
           | works but that 's a really frustrating answer for anyone who
           | cares about validation/sanitizers/optimization/docs.
           | 
           | It's just better if most code doesn't poke the dragon (llvm)
           | and has _trivially correct_ semantics instead of  "yeah this
           | has to work but um, don't ask me how or why".
        
       | jcranmer wrote:
       | Programming language semantics turn out to be far, far thornier
       | than most people (even people who work on language committees!)
       | give them credit for. And the memory model turns out to be the
       | absolute worst part of it all: the hardware plays lots of fun
       | games on you, the OS plays lots of fun games on you, and the
       | compiler itself is playing lots of fun games on you.
       | 
       | Multithreaded--really, multiprocessor--memory models are really,
       | really confusing. Hardware memory models are generally described
       | in terms of what reorderings are possible (e.g., loads may be
       | ordered before subsequent stores), which is easy to understand
       | from a hardware perspective but less easy to deal with from a
       | language perspective. There turns out to be a wonderful theorem,
       | though, that lets you hide all of the hardware ugliness: if you
       | properly synchronize your code, then you can't _see_ any of the
       | reordering, and the compiler merely has to guarantee that the
       | synchronization is properly handled. This is the data-race-free
       | programming model, and it is the primary basis for all modern
       | memory models. Stray away from this model (e.g., relaxed atomics)
       | and you reenter the world of pain, for which no satisfactory
       | solution has yet been found.
       | 
       | Discovering the pains of pointer provenance is even more recent,
       | arguably only about a decade old. For a long time, pointer
       | provenance has been handwaved on a vague data-dependency basis
       | (you see this in LLVM's LangRef, for example). Unfortunately, it
       | turns out that data dependencies are not generally preserved by
       | compilers (this also doomed the release/consume aspect of the
       | C/C++ memory model). An escaping/lifetime-based pointer model is
       | closer to the actual one used by compilers, but there remains
       | tricky things about things like memcpy [1].
       | 
       | It turns out there are two things that make pointer provenance
       | painful: integer-to-pointer (where does the provenance come
       | from?), and memory (since this usually ends up creating implicit
       | integer-to-pointer casts in most putative models that you really
       | want to avoid). The lesson we know from pointer provenance is
       | that _pointers are not integers, and cannot be losslessly
       | roundtripped via integers_ , and Rust being newer and with a
       | heavier emphasis on safety gives it a better chance of adopting a
       | model here that more forcefully breaks this mistaken link.
       | 
       | [1] One consequence is that most formal pointer provenance models
       | actually don't let you legally write memcpy in C code.
        
         | fbkr wrote:
         | > Stray away from this model (e.g., relaxed atomics)
         | 
         | Relaxed atomics are still data-race free, did you mean non-
         | atomic accesses?
        
           | jcranmer wrote:
           | I should clarify slightly. There are multiple definitions of
           | "data race" in play. In the sense of the C/C++ standard,
           | where a data race is essentially used to describe behaviors
           | that are undefined, relaxed atomics are free of data races.
           | However, from the perspective of a data-race-free model where
           | data races are things that observably violate sequential
           | consistency, relaxed atomics are absolutely data races.
           | 
           | In fact, the raison d'etre for relaxed atomics is to permit
           | applications to create "benign data races" [in the second
           | sense] that aren't undefined behavior. As it turns out,
           | though, actually specifying what semantics such a "benign
           | data race" has is complicated, especially when you get into
           | the realm of avoiding things like out-of-thin-air behavior
           | (or, to reference a paper that crossed my desk a month ago,
           | undefined behavior executed only if out-of-thin-air happens).
        
         | bigcat123 wrote:
         | Hmm, I am getting a sense that modern engineers shows a sign of
         | losing touch to computer science theoretical foundation.
         | 
         | It's not funny game. It's part of abstraction hierarchy and
         | physical computing model vs. programming language computing
         | model.
         | 
         | It's only funny because one would presume that there ever was a
         | time the hardware was not playing "funny" game.
         | 
         | And no, it always has a giant abstraction gap between
         | hierarchy, and that's a fundamental part of modern electronic
         | computing. One shall allow oneself to learn those and get used
         | to how things are actually working.
        
       ___________________________________________________________________
       (page generated 2022-04-06 23:01 UTC)