[HN Gopher] Using uninitialized memory for fun and profit (2008)
       ___________________________________________________________________
        
       Using uninitialized memory for fun and profit (2008)
        
       Author : AminZamani
       Score  : 21 points
       Date   : 2025-07-20 08:11 UTC (3 days ago)
        
 (HTM) web link (research.swtch.com)
 (TXT) w3m dump (research.swtch.com)
        
       | jojomodding wrote:
       | Interestingly enough, Rust does not allow you to read uninitialized
       | memory, not even if you do not care about the value stored there.
       | People have been proposing a `freeze` operation that replaces
       | uninitialized memory with garbage but initialized data (i.e. a
       | no-op in assembly).
       | 
       | But there is tension about this: Not allowing access to
       | uninitialized memory, ever, means that you get more guarantees
       | about what foreign (safe) Rust can do, for instance.
        
         | thrance wrote:
         | True of _safe_ Rust only. You can always fall back to unsafe
         | Rust, allocate a chunk of bytes and write/read to it as you
         | wish.
        
           | stouset wrote:
           | Even in unsafe Rust, this is undefined behavior.
        
             | LoganDark wrote:
             | You're _allowed_ to trigger as much undefined behavior as
             | you wish. It makes the program meaningless of course, but
             | it's not like it stops you.
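For reference, unsafe Rust does permit "write before you read." Memory may start out uninitialized, but every element must be written before it is treated as a value. The sketch below uses only std APIs: `Vec::spare_capacity_mut`, `MaybeUninit::write` and `Vec::set_len`. The function name `squares` is hypothetical and exists only for illustration.

```rust
use std::mem::MaybeUninit;

/// Builds a Vec by writing into its allocated-but-uninitialized capacity.
fn squares(n: usize) -> Vec<u64> {
    let mut v: Vec<u64> = Vec::with_capacity(n);
    // `spare_capacity_mut` exposes the uninitialized tail as
    // `&mut [MaybeUninit<u64>]`: writing through it is fine, reading is not.
    let spare: &mut [MaybeUninit<u64>] = &mut v.spare_capacity_mut()[..n];
    for (i, slot) in spare.iter_mut().enumerate() {
        slot.write((i as u64) * (i as u64));
    }
    // SAFETY: the first n elements were all initialized in the loop above.
    unsafe { v.set_len(n) };
    v
}

// By contrast, the line below compiles but is undefined behavior even
// inside `unsafe`, whether or not the value is ever "used":
// let x: u64 = unsafe { MaybeUninit::uninit().assume_init() };
```

The commented-out line is the kind of read being discussed in this subthread: it compiles, but the program has undefined behavior.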
        
         | kmeisthax wrote:
         | My impression was that there was some kind of optimization in
         | LLVM that relied on being able to assume values were never
         | undef[0], which is why uninitialized memory access was always
         | illegal in Rust[1].
         | 
         | Putting that aside, a deliberate "read uninitialized memory
         | with bounded UB" primitive like freeze would only work for
         | types where all possible bit patterns are valid. So no freezing
         | chars[2], references, or sum types. And any transparent wrapper
         | type that has invariants - like, say, slices, vecs, strs,
         | and/or range-restricted integer types - would see them utterly
         | broken when frozen. I suppose you could define some operation
         | to "validate" the underlying bit pattern, but I'm not sure if
         | that would defeat the point of reading uninitialized memory.
         | 
         | [0] LLVM concept that represents uninitialized memory, among
         | other things.
         | 
         | [1] I believe a few other unsafe Rust concepts are actually
         | leaky abstractions around LLVM things
         | 
         | [2] Rust's char must hold a valid Unicode scalar value, and it
         | _is_ UB to stick surrogates in there
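For individual types, std already has safe versions of the "validate the underlying bit pattern" step. For example, `char::from_u32` rejects surrogates and values above 0x10FFFF. This is a minimal sketch that assumes the raw bits are already-initialized integers, since stable Rust has no freeze primitive to produce them from uninitialized memory. The helper names `validate_chars` and `validate_bool` are hypothetical.

```rust
/// Validates arbitrary 32-bit patterns as `char`s. `char::from_u32`
/// returns None for surrogates (0xD800..=0xDFFF) and for values above
/// 0x10FFFF, i.e. exactly the bit patterns a `char` may not hold.
fn validate_chars(raw: &[u32]) -> Vec<Option<char>> {
    raw.iter().map(|&bits| char::from_u32(bits)).collect()
}

/// Same idea for `bool`: only 0 and 1 are valid bit patterns.
fn validate_bool(bits: u8) -> Option<bool> {
    match bits {
        0 => Some(false),
        1 => Some(true),
        _ => None,
    }
}
```

A check like this is cheap for scalars. Types with structural invariants (references, slices, `Vec`, `str`) would need much more than a per-value check, which is the concern raised above.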
        
           | NobodyNada wrote:
           | > there was some kind of optimization in LLVM that relied on
           | being able to assume values were never undef
           | 
           | It's true that LLVM has restrictions on what you can do with
           | undef/poison memory, but LLVM also supports the "freeze"
           | operation that comes up in the Rust discussions (which
           | transforms an undefined value into an arbitrary, well-defined
           | value). It would certainly need to be unsafe to avoid
           | violating invariants like you mentioned, but "LLVM" isn't the
           | blocker to supporting this.
           | 
           | Rather, there are more subtle problems with reading from
           | uninitialized memory -- for example on Linux, a heap allocator
           | might use MADV_FREE on free memory, which hints to the kernel
           | that a page contains freed memory and the operating system is
           | not required to preserve its contents until the application
           | writes to it again. This means the following sequence of
           | events is possible:
           | 
           | - An application frees some memory, and the heap allocator
           | invokes madvise(MADV_FREE) on the address range.
           | 
           | - The application makes a heap allocation, obtaining a
           | pointer to the freed memory.
           | 
           | - The application freezes the uninitialized memory and reads
           | from it.
           | 
           | - Due to memory pressure, the kernel decides to reclaim the
           | freed pages and reuses the physical memory elsewhere. The
           | range stays mapped, but its old contents are discarded.
           | 
           | - The application reads that same allocation again, and
           | sees that its value has now changed to all zeroes.
           | 
           | Thus, we can see that "freezing" arbitrary memory can't
           | actually be implemented on real-world systems -- the contents
           | of uninitialized memory really can change out from under you
           | until you write to that memory.
           | 
           | It would be possible to implement a "by-reference freeze"
           | that _copies_ a MaybeUninit<T> to a new location, but
           | introducing this functionality still has the downside that
           | you can write a Heartbleed bug without invoking undefined
           | behavior, which is what makes it controversial.
        
             | dooglius wrote:
             | Seems like the heap allocator has a bug if it doesn't
             | handle invalidating the free hint before it returns it to
             | the application. This does raise the question of why
             | MADV_FREE works on the basis of writes rather than accesses
             | -- there are PTE bits for both cases (accessed and dirty),
             | right? It would have been just as easy to have any access
             | cancel the free hint. (I am assuming x86 here.)
        
       | Sesse__ wrote:
       | An elegant optimization, but how would you intersect two of these
       | efficiently? It sounds like you'd need to iterate over the entire
       | dense vector and do a sparse-vector check for each and every
       | value (O(m) with a very high constant factor). Either that, or
       | sort both sparse vectors (O(n log n)).
        
         | dooglius wrote:
         | Why would iterating over the dense vector be O(m) rather than
         | O(n)?
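The complexity question can be made concrete with a minimal Rust sketch of the sparse set from the linked article: Briggs and Torczon's `dense`/`sparse` pair over a fixed universe `0..m`. Safe Rust cannot leave `sparse` uninitialized, so this sketch zero-fills it once in `new`. That is an O(m) cost the article's trick avoids. `insert`, `contains` and `clear` stay O(1). The intersection walks only the `dense` prefix of the smaller set and probes the other set in O(1), so it does O(min(n1, n2)) work rather than O(m). The type and method names are illustrative, not taken from the article.

```rust
/// Sparse set over the universe 0..m, in the Briggs-Torczon style.
/// `dense[..n]` lists the members; `sparse[x]` is x's index in `dense`.
struct SparseSet {
    dense: Vec<usize>,
    sparse: Vec<usize>,
    n: usize,
}

impl SparseSet {
    /// O(m) once: safe Rust has to zero-fill `sparse` instead of leaving
    /// it uninitialized the way the C version does.
    fn new(m: usize) -> Self {
        SparseSet { dense: vec![0; m], sparse: vec![0; m], n: 0 }
    }

    /// O(1): `sparse[x]` may be stale, so confirm it against `dense`.
    fn contains(&self, x: usize) -> bool {
        if x >= self.sparse.len() {
            return false;
        }
        let i = self.sparse[x];
        i < self.n && self.dense[i] == x
    }

    /// O(1) insert. Panics if x is outside the universe 0..m.
    fn insert(&mut self, x: usize) {
        if !self.contains(x) {
            self.dense[self.n] = x;
            self.sparse[x] = self.n;
            self.n += 1;
        }
    }

    /// O(1) clear: old entries in `sparse` are left in place and ignored.
    fn clear(&mut self) {
        self.n = 0;
    }

    fn len(&self) -> usize {
        self.n
    }

    /// Members in insertion order; O(n), independent of m.
    fn iter(&self) -> impl Iterator<Item = usize> + '_ {
        self.dense[..self.n].iter().copied()
    }

    /// O(min(n1, n2)): walk the smaller set, probe the larger one.
    fn intersection(&self, other: &SparseSet) -> Vec<usize> {
        let (small, large) = if self.n <= other.n { (self, other) } else { (other, self) };
        small.iter().filter(|&x| large.contains(x)).collect()
    }
}
```

`contains` checks `sparse[x]` against `dense`, so stale values left behind by `clear` can never produce a false positive. The same holds for the genuinely uninitialized values in the article's C version.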
        
       | dooglius wrote:
       | One thing worth pointing out is that Linux makes it pretty
       | difficult for user space to access uninitialized memory: the
       | MAP_UNINITIALIZED flag for mmap is only honored if the kernel is
       | built with CONFIG_MMAP_ALLOW_UNINITIALIZED, which it generally
       | isn't, so fresh memory does get zeroed at some point. The best
       | you can hope for is that your heap allocator reuses some memory
       | it has not yet munmapped. The kernel zeroes pages on demand,
       | which helps, but you're still paying a cost for that zeroing.
        
       ___________________________________________________________________
       (page generated 2025-07-23 23:01 UTC)