[HN Gopher] Using uninitialized memory for fun and profit (2008)
___________________________________________________________________
Using uninitialized memory for fun and profit (2008)
Author : AminZamani
Score : 21 points
Date : 2025-07-20 08:11 UTC (3 days ago)
(HTM) web link (research.swtch.com)
(TXT) w3m dump (research.swtch.com)
| jojomodding wrote:
| Interestingly enough, Rust does not allow you to access uninitialized
| memory, not even if you do not care about the value stored there.
| People have been proposing a `freeze` operation that replaces
| uninitialized memory with garbage but initialized data (i.e. a
| no-op in assembly).
|
| But there is tension about this: Not allowing access to
| uninitialized memory, ever, means that you get more guarantees
| about what foreign (safe) Rust can do, for instance.
| thrance wrote:
| True of _safe_ Rust only. You can always fall back to unsafe
| Rust, allocate a chunk of bytes, and write to or read from it as
| you wish.
| stouset wrote:
| Even in unsafe Rust, this is undefined behavior.
| LoganDark wrote:
| You're _allowed_ to trigger as much undefined behavior as
| you wish. It makes the program meaningless of course, but
| it's not like it stops you.
| kmeisthax wrote:
| My impression was that there was some kind of optimization in
| LLVM that relied on being able to assume values were never
| undef[0], which is why undefined memory access was always
| illegal in Rust[1].
|
| Putting that aside, a deliberate "read uninitialized memory
| with bounded UB" primitive like freeze would only work for
| types where all possible bit patterns are valid. So no freezing
| chars[2], references, or sum types. And any transparent wrapper
| type that has invariants - like, say, slices, vecs, strs,
| and/or range-restricted integer types - would see them utterly
| broken when frozen. I suppose you could define some operation
| to "validate" the underlying bit pattern, but I'm not sure if
| that would defeat the point of reading uninitialized memory.
|
| [0] LLVM concept that represents uninitialized memory, among
| other things.
|
| [1] I believe a few other unsafe Rust concepts are actually
| leaky abstractions around LLVM things
|
| [2] Rust's char must hold a valid Unicode scalar value and _will_
| UB if you stick surrogates in there
| NobodyNada wrote:
| > there was some kind of optimization in LLVM that relied on
| being able to assume values were never undef
|
| It's true that LLVM has restrictions on what you can do with
| undef/poison memory, but LLVM also supports the "freeze"
| operation that comes up in the Rust discussions (which
| transforms an undefined value into an arbitrary, well-defined
| value). It would certainly need to be unsafe to avoid
| violating invariants like you mentioned, but "LLVM" isn't the
| blocker to supporting this.
|
| Rather, there are more subtle problems with reading from
| uninitialized memory -- for example on Linux, a heap allocator
| might use MADV_FREE on free memory, which hints to the kernel
| that a page contains freed memory and the operating system is
| not required to preserve its contents until the application
| writes to it again. This means the following sequence of
| events is possible:
|
| - An application frees some memory, and the heap allocator
| invokes madvise(MADV_FREE) on the address range.
|
| - The application makes a heap allocation, obtaining a
| pointer to the free'd memory.
|
| - The application freezes the uninitialized memory and reads
| from it.
|
| - Due to memory pressure, the kernel decides to reclaim the
| free'd memory. It unmaps it from the process and uses it
| somewhere else.
|
| - The application accesses the first allocation again, and
| sees that its value has now changed to all-zeroes.
|
| Thus, we can see that "freezing" arbitrary memory can't
| actually be implemented on real-world systems -- the contents
| of uninitialized memory really can change out from under you
| until you write to that memory.
|
| It would be possible to implement a "by-reference freeze"
| that _copies_ a MaybeUninit<T> to a new location, but
| introducing this functionality still has the downside that
| you can write a Heartbleed bug without invoking undefined
| behavior, which is what makes it controversial.
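|
| For contrast, here is a minimal sketch of the pattern that is sound
| in Rust today: hold the storage as MaybeUninit<T>, write every slot,
| and only then treat it as initialized. The function name `squares`
| and the buffer size are made up for illustration; no freeze
| operation is involved.

```rust
use std::mem::MaybeUninit;

/// Fills a stack buffer without zeroing it first, then reads it back.
/// Hypothetical example function, not taken from the thread.
fn squares() -> [u32; 8] {
    // Nothing is initialized yet; reading a slot as u32 here would be UB.
    let mut buf: [MaybeUninit<u32>; 8] = [MaybeUninit::uninit(); 8];

    for (i, slot) in buf.iter_mut().enumerate() {
        // `write` initializes the slot without reading its old contents.
        slot.write((i * i) as u32);
    }

    // SAFETY: every element was written in the loop above.
    buf.map(|slot| unsafe { slot.assume_init() })
}
```

| Skipping the loop and calling assume_init directly would compile,
| but it is exactly the undefined behavior discussed above.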
| dooglius wrote:
| Seems like the heap allocator has a bug if it doesn't
| handle invalidating the free hint before it returns it to
| the application. This does raise the question of why
| MADV_FREE works on the basis of writes rather than accesses
| -- there are PTE bits for both cases right, and it would
| have been just as easy to have any access cancel the free
| hint? (I am assuming x86 here.)
| Sesse__ wrote:
| An elegant optimization, but how would you intersect two of these
| efficiently? It sounds like you'd need to iterate over the entire
| dense vector and do a sparse-vector check for each and every
| value (O(m) with a very high constant factor). Either that, or
| sort both sparse vectors (O(n log n)).
| dooglius wrote:
| Why would iterating over the dense vector be O(m) rather than
| O(n)?
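|
| One way to sketch the intersection is to walk the dense vector of
| the smaller set and probe the other set, which costs O(min(n, m))
| membership checks plus building the result. The `SparseSet` type
| below is hypothetical and zero-fills its sparse array, because safe
| Rust cannot read uninitialized memory, so construction costs
| O(capacity) here, unlike in the article.

```rust
/// Sparse set over the universe 0..capacity, modeled on the article.
/// Hypothetical type for illustration; `sparse` is zero-initialized,
/// so `new` costs O(capacity) instead of the article's O(1).
struct SparseSet {
    dense: Vec<usize>,  // members, in insertion order
    sparse: Vec<usize>, // sparse[v] = index of v in `dense`, if v is a member
}

impl SparseSet {
    fn new(capacity: usize) -> Self {
        SparseSet { dense: Vec::new(), sparse: vec![0; capacity] }
    }

    fn len(&self) -> usize {
        self.dense.len()
    }

    fn contains(&self, v: usize) -> bool {
        // v is a member only if sparse[v] points at a dense slot holding v.
        v < self.sparse.len()
            && self.sparse[v] < self.dense.len()
            && self.dense[self.sparse[v]] == v
    }

    fn insert(&mut self, v: usize) {
        assert!(v < self.sparse.len(), "value outside the universe");
        if !self.contains(v) {
            self.sparse[v] = self.dense.len();
            self.dense.push(v);
        }
    }

    /// Iterates over the smaller set's members and probes the larger set.
    fn intersect(&self, other: &SparseSet) -> SparseSet {
        let (small, large) = if self.len() <= other.len() {
            (self, other)
        } else {
            (other, self)
        };
        let capacity = self.sparse.len().min(other.sparse.len());
        let mut out = SparseSet::new(capacity);
        for &v in &small.dense {
            if large.contains(v) {
                out.insert(v);
            }
        }
        out
    }
}
```

| Neither the dense vector nor the sparse array needs sorting for
| this; only the members of the smaller set are visited.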
| dooglius wrote:
| One thing worth pointing out is that Linux makes it pretty
| difficult for user space to access uninitialized memory; the
| MAP_UNINITIALIZED flag for mmap has to be specifically configured
| but generally isn't, so the memory does get zeroed at some point.
| Best you can hope for is that your heap allocator re-uses some
| un-munmapped memory. The kernel will zero pages on-demand, which
| helps, but you're still paying a cost for that zeroing.
___________________________________________________________________
(page generated 2025-07-23 23:01 UTC)