[HN Gopher] Observing stale instruction fetching on x86 with sel...
___________________________________________________________________
Observing stale instruction fetching on x86 with self-modifying
code
Author : userbinator
Score : 36 points
Date : 2022-12-10 01:36 UTC (1 day ago)
Source : stackoverflow.com
| monocasa wrote:
| At least on the K6, this was implemented in a really neat way.
| It reused most of the hardware for CPU exceptions: the store
| buffers and the instruction queues communicated and did most of
| the work of rolling back state. Then, instead of jumping through
| the IDT and pushing the first bit of exception state onto the
| stack, it jumped directly to the next instruction after the
| rolled-back state.
| Max-q wrote:
| Didn't self-modifying code just die out when caches came around
| in the late eighties and early nineties? We used it a lot on the
| C64 and Amiga (until the A1200 with its 68020 CPU).
| anonymousDan wrote:
| I think the Linux kernel has some limited forms of
| self-modifying code?
| loeg wrote:
| Tracepoints are often implemented as self-modifying code (nop
| <-> trap instruction).
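The nop <-> trap toggle can be tried in user space. The sketch below is minimal and assumes x86-64 with a POSIX system that delivers SIGTRAP for `int3`, such as Linux. The names `make_traced_fn`, `set_probe`, `install_probe_handler` and `probe_hits` are all hypothetical. Real kernel tracing infrastructure is far more careful than this, because it has to patch code that other CPUs may be executing at the same moment.

```c
#define _DEFAULT_SOURCE   /* MAP_ANONYMOUS and sigaction on glibc */
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#if !defined(__x86_64__)
#error "this sketch emits x86-64 machine code"
#endif

typedef int (*probe_fn)(void);

static volatile sig_atomic_t probe_hits = 0;

/* int3 is a trap: the saved instruction pointer already points past it,
   so returning from the handler resumes right after the probe site. */
static void on_sigtrap(int sig) {
    (void)sig;
    probe_hits++;
}

static int install_probe_handler(void) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigtrap;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGTRAP, &sa, NULL);
}

/* Hypothetical traced function: <probe byte> ; mov eax, 42 ; ret */
static uint8_t *make_traced_fn(void) {
    static const uint8_t body[] = {0x90, 0xB8, 42, 0, 0, 0, 0xC3};
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    uint8_t *p = mmap(NULL, page, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) return NULL;
    memcpy(p, body, sizeof body);
    if (mprotect(p, page, PROT_READ | PROT_EXEC) != 0) {  /* W^X: drop write */
        munmap(p, page);
        return NULL;
    }
    return p;
}

/* Flip the probe site: 0x90 = nop (disabled), 0xCC = int3 (enabled). */
static int set_probe(uint8_t *code, int enabled) {
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    if (mprotect(code, page, PROT_READ | PROT_WRITE) != 0) return -1;
    code[0] = enabled ? 0xCC : 0x90;
    if (mprotect(code, page, PROT_READ | PROT_EXEC) != 0) return -1;
    /* A no-op on x86, which keeps instruction fetch coherent with stores in
       hardware; needed on architectures whose I-cache does not snoop stores. */
    __builtin___clear_cache((char *)code, (char *)code + 1);
    return 0;
}
```

Calling the code through `(probe_fn)(void *)code` returns 42 whether the probe is on or off. While the probe is enabled, each call also increments `probe_hits` once.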
| cwzwarich wrote:
| All JITs use self-modifying code, and JITs are more popular now
| than even 15 years ago. You might distinguish between modifying
| code that has already executed once and writing fresh code, but
| from the perspective of instruction/data consistency on a CPU it
| doesn't really matter.
| miohtama wrote:
| In the traditional sense, self-modifying code meant changing
| constants and instructions in tight loops that were assembled
| by hand.
|
| I don't think any JITs take this approach: when they emit
| machine code, it always goes into a fresh memory allocation
| from a fresh compilation. JITs don't "poke and peek"
| already-compiled code. But I could be wrong, as JITs are
| getting complex these days.
| ridiculous_fish wrote:
| JITs do indeed modify already-compiled code. For example,
| JavaScriptCore has a notion of a "patchable jump." These
| are jump instructions intended to be modified later. They
| may initially point to slow paths, and are later patched to
| point to faster paths as type information accumulates.
| https://webkit.org/blog/10298/inline-caching-delete/
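Below is a toy version of the "patchable jump" idea. It assumes x86-64 and POSIX `mmap`/`mprotect`, and it is not JavaScriptCore's API. `make_patchable_stub`, `retarget` and the two constant-returning "paths" are hypothetical stand-ins for a slow path and a type-specialized fast path. What the sketch shows is that the bytes of a `jmp` that has already executed get rewritten in place.

```c
#define _DEFAULT_SOURCE   /* MAP_ANONYMOUS on glibc */
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#if !defined(__x86_64__)
#error "this sketch emits x86-64 machine code"
#endif

typedef int (*stub_fn)(void);

/* Offsets inside one page: the patchable jump, then two hypothetical paths. */
enum { SITE = 0, SLOW = 16, FAST = 32 };

/* Encode "jmp rel32" (E9 + 4 bytes); rel32 is relative to the next instruction. */
static void write_jmp(uint8_t *code, size_t site, size_t target) {
    int32_t rel = (int32_t)((ptrdiff_t)target - (ptrdiff_t)(site + 5));
    code[site] = 0xE9;
    memcpy(code + site + 1, &rel, sizeof rel);
}

static uint8_t *make_patchable_stub(void) {
    static const uint8_t slow[] = {0xB8, 1, 0, 0, 0, 0xC3};  /* mov eax, 1 ; ret */
    static const uint8_t fast[] = {0xB8, 2, 0, 0, 0, 0xC3};  /* mov eax, 2 ; ret */
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    uint8_t *p = mmap(NULL, page, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) return NULL;
    write_jmp(p, SITE, SLOW);                /* start out on the slow path */
    memcpy(p + SLOW, slow, sizeof slow);
    memcpy(p + FAST, fast, sizeof fast);
    if (mprotect(p, page, PROT_READ | PROT_EXEC) != 0) {
        munmap(p, page);
        return NULL;
    }
    return p;
}

/* Repoint the jump. Only safe here because no other thread runs the stub. */
static int retarget(uint8_t *code, size_t target) {
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    if (mprotect(code, page, PROT_READ | PROT_WRITE) != 0) return -1;
    write_jmp(code, SITE, target);
    if (mprotect(code, page, PROT_READ | PROT_EXEC) != 0) return -1;
    __builtin___clear_cache((char *)code + SITE, (char *)code + SITE + 5);
    return 0;
}
```

The stub returns 1 until `retarget(code, FAST)` runs, and 2 after that. Real JITs patch code that other threads may be executing concurrently. On x86 that case falls under Intel's cross-modifying-code rules and needs more care than this single-threaded sketch.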
| miohtama wrote:
| This looks super interesting, thank you.
| Jensson wrote:
| A JIT doesn't modify the behavior, though; it typically just
| changes a function pointer, and the new function does the
| same thing, only more optimized. So instruction caches only
| affect performance rather than changing program behavior.
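The pointer-swap style of tiering up can be sketched in portable C11. Here only data changes (the pointer) and no instruction bytes are rewritten, so no instruction-cache maintenance is involved. The two tiers, `sum_impl` and `tier_up` are hypothetical. A swap like this is only safe without further steps because both targets are ordinary compiled functions. If the new target were freshly emitted machine code, those bytes would first have to be made visible to instruction fetch.

```c
#include <stdatomic.h>
#include <stddef.h>

typedef long (*sum_fn)(const int *xs, size_t n);

/* Baseline tier: a simple loop. */
static long sum_generic(const int *xs, size_t n) {
    long total = 0;
    for (size_t i = 0; i < n; i++) total += xs[i];
    return total;
}

/* "Optimized" tier: same result, four independent accumulators. */
static long sum_unrolled(const int *xs, size_t n) {
    long a = 0, b = 0, c = 0, d = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        a += xs[i];
        b += xs[i + 1];
        c += xs[i + 2];
        d += xs[i + 3];
    }
    for (; i < n; i++) a += xs[i];  /* leftover elements */
    return a + b + c + d;
}

/* Call sites always go through this pointer; the code itself never changes. */
static _Atomic(sum_fn) sum_impl = sum_generic;

static long sum(const int *xs, size_t n) {
    sum_fn f = atomic_load_explicit(&sum_impl, memory_order_acquire);
    return f(xs, n);
}

/* Tier up: publish the faster version with release ordering. */
static void tier_up(void) {
    atomic_store_explicit(&sum_impl, sum_unrolled, memory_order_release);
}
```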
| cwzwarich wrote:
| JITs also modify the code at the destination of the
| function pointer, and that memory likely held other code at
| an earlier time (since it usually has to be allocated from a
| separate heap that is marked executable). If the CPU had
| prefetched or cached the instructions at the function pointer
| target before the modification, the program would observe
| incorrect behavior, even though it's not self-modifying code
| in the usual sense.
| layer8 wrote:
| I still used it on a 486 in the mid-nineties, in drawing
| routines, to switch the drawing mode without needing a
| dynamic function call in the inner loop (or duplicating the
| drawing routine for each mode).
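Here is a sketch of the drawing-mode trick, moved to x86-64 and a modern OS so that it actually runs. It assumes x86-64, the System V calling convention, and POSIX `mmap`/`mprotect`. A single hand-assembled blit loop handles every mode. Switching modes rewrites one opcode byte inside the loop, so there is no call through a pointer for each pixel. `make_blit`, `set_mode` and the mode names are hypothetical.

```c
#define _DEFAULT_SOURCE   /* MAP_ANONYMOUS on glibc */
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#if !defined(__x86_64__)
#error "this sketch emits x86-64 machine code"
#endif

/* blit(dst, src, n): combine n 32-bit pixels of src into dst. */
typedef void (*blit_fn)(uint32_t *dst, const uint32_t *src, size_t n);

/* Opcode byte of "<op> [rdi], eax"; all four share the ModRM byte 0x07. */
enum blit_mode { MODE_COPY = 0x89, MODE_OR = 0x09, MODE_AND = 0x21, MODE_XOR = 0x31 };

enum { MODE_SITE = 7 };  /* offset of the patched opcode in blit_body */

static const uint8_t blit_body[] = {
    0x48, 0x85, 0xD2,        /*  0: test rdx, rdx         */
    0x74, 0x11,              /*  3: jz   done (+17)       */
    0x8B, 0x06,              /*  5: loop: mov eax, [rsi]  */
    0x89, 0x07,              /*  7: <op> [rdi], eax       */
    0x48, 0x83, 0xC7, 0x04,  /*  9: add  rdi, 4           */
    0x48, 0x83, 0xC6, 0x04,  /* 13: add  rsi, 4           */
    0x48, 0xFF, 0xCA,        /* 17: dec  rdx              */
    0x75, 0xEF,              /* 20: jnz  loop (-17)       */
    0xC3                     /* 22: done: ret             */
};

static uint8_t *make_blit(void) {
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    uint8_t *p = mmap(NULL, page, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) return NULL;
    memcpy(p, blit_body, sizeof blit_body);  /* starts in MODE_COPY */
    if (mprotect(p, page, PROT_READ | PROT_EXEC) != 0) {
        munmap(p, page);
        return NULL;
    }
    return p;
}

/* Switch the drawing mode by rewriting one opcode byte inside the loop. */
static int set_mode(uint8_t *code, enum blit_mode mode) {
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    if (mprotect(code, page, PROT_READ | PROT_WRITE) != 0) return -1;
    code[MODE_SITE] = (uint8_t)mode;
    if (mprotect(code, page, PROT_READ | PROT_EXEC) != 0) return -1;
    __builtin___clear_cache((char *)code + MODE_SITE, (char *)code + MODE_SITE + 1);
    return 0;
}
```

All four opcodes share the same ModRM byte, so the patch never changes the instruction's length and the loop's branch offsets stay valid.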
|
| I think it was the Pentium that introduced split
| instruction/data caches on x86.
| retrac wrote:
| 1) Yes, caches, though specifically the introduction of split
| instruction/data caches. Modern processors are basically Harvard
| architectures internally, handling instructions and data
| separately so they can be processed in parallel. So either
| explicit cache-maintenance instructions (the RISC approach) or
| extensive hardware tracking of cache entries (the CISC approach)
| must be used to ensure that data cache changes are propagated to
| the instruction cache.
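The two approaches can be seen side by side in a toy model, written in plain C with no real caches involved. It models a split instruction cache that misses stores unless something invalidates it. `toy_cpu`, the 8-byte line size and the `snoop` flag are all invented for illustration. The model also folds the data cache into `mem`, so it leaves out the data-cache clean that ARM-style sequences perform before the invalidate.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TOY_MEM 64
#define TOY_LINE 8  /* invented line size */

struct toy_cpu {
    uint8_t mem[TOY_MEM];             /* memory as seen by loads and stores   */
    uint8_t icache[TOY_MEM];          /* bytes as the fetch unit last saw them */
    bool ivalid[TOY_MEM / TOY_LINE];  /* per-line valid bits of the I-cache    */
    bool snoop;                       /* true: stores invalidate I-lines (CISC) */
};

static void toy_init(struct toy_cpu *c, bool snoop) {
    memset(c, 0, sizeof *c);
    c->snoop = snoop;
}

/* Instruction fetch: served from the I-cache, filling the line on a miss. */
static uint8_t toy_fetch(struct toy_cpu *c, unsigned addr) {
    unsigned line = addr / TOY_LINE;
    if (!c->ivalid[line]) {
        memcpy(&c->icache[line * TOY_LINE], &c->mem[line * TOY_LINE], TOY_LINE);
        c->ivalid[line] = true;
    }
    return c->icache[addr];
}

/* Data store: updates memory; only a snooping design also drops the I-line. */
static void toy_store(struct toy_cpu *c, unsigned addr, uint8_t value) {
    c->mem[addr] = value;
    if (c->snoop) c->ivalid[addr / TOY_LINE] = false;
}

/* Explicit maintenance (RISC approach): invalidate lines covering [start, end). */
static void toy_invalidate_icache(struct toy_cpu *c, unsigned start, unsigned end) {
    if (end <= start) return;
    for (unsigned line = start / TOY_LINE; line <= (end - 1) / TOY_LINE; line++)
        c->ivalid[line] = false;
}
```

If `snoop` is false, a fetch after a store returns the stale byte until `toy_invalidate_icache` runs. If `snoop` is true, the next fetch already sees the new byte.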
|
| 2) It greatly complicates software engineering. It's bad
| practice unless strictly necessary. Code that rewrites itself
| is very hard to debug, trace, and generally work with, compared
| to immutable blocks of read-only code. Most high-level
| languages expose nothing closer to self-modification than
| function pointers, so compiled output rarely needs SMC.
| (Though in some edge cases it may be faster to emit SMC to
| implement high-level constructs.)
|
| The hardware doesn't like it and maintainers, managers, and
| most professors don't like it, either.
| hyperman1 wrote:
| I saw an interesting trick on the i386: some code intentionally
| updated the very next instruction. If you just ran it, the
| original next instruction was already in the prefetch queue, so
| the unmodified instruction ran and the program worked. If you
| single-stepped it, the next instruction executed was in the
| INT 1 handler, so after the debugger returned, the CPU fetched
| and ran the modified instruction. This way, your program knew
| it was being debugged and could take countermeasures.
| userbinator wrote:
| For better or worse, those tricks disappeared with the P6,
| where they decided the prefetch queue length shouldn't affect
| SMC semantics.
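The anti-debugging trick and the later P6 behavior can both be reproduced in a toy model, written in plain C. The three-opcode instruction set, the 8-byte queue depth and `pq_run` are invented. The `single_step` flag stands in for control passing through the INT 1 handler after every instruction, which discards the program's queued bytes. The `snoop` flag stands in for the P6-style rule that a store into already-queued bytes flushes the queue.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { OP_PATCH = 1, OP_RET_A = 2, OP_RET_B = 3 };  /* invented toy ISA */
#define PQ_MEM 32
#define QUEUE_LEN 8                                  /* hypothetical depth */

struct pq_cpu {
    uint8_t mem[PQ_MEM];
    uint8_t queue[QUEUE_LEN];
    unsigned qbase, qlen;  /* queue holds mem[qbase, qbase + qlen) as fetched */
    bool snoop;            /* P6-style: stores into queued bytes flush it */
};

static void pq_flush(struct pq_cpu *c) { c->qlen = 0; }

static uint8_t pq_fetch(struct pq_cpu *c, unsigned pc) {
    if (pc < c->qbase || pc >= c->qbase + c->qlen) {  /* miss: refill */
        c->qbase = pc;
        c->qlen = (PQ_MEM - pc < QUEUE_LEN) ? PQ_MEM - pc : QUEUE_LEN;
        memcpy(c->queue, &c->mem[pc], c->qlen);
    }
    return c->queue[pc - c->qbase];
}

static void pq_store(struct pq_cpu *c, unsigned addr, uint8_t v) {
    if (addr >= PQ_MEM) return;
    c->mem[addr] = v;
    if (c->snoop && addr >= c->qbase && addr < c->qbase + c->qlen) pq_flush(c);
}

/* Runs prog until a RET opcode; returns 'A' or 'B', or '?' on error. */
static char pq_run(const uint8_t *prog, size_t len, bool single_step, bool snoop) {
    struct pq_cpu c;
    if (len > PQ_MEM) return '?';
    memset(&c, 0, sizeof c);
    memcpy(c.mem, prog, len);
    c.snoop = snoop;
    for (unsigned pc = 0; pc < PQ_MEM;) {
        uint8_t op = pq_fetch(&c, pc);
        if (op == OP_RET_A) return 'A';
        if (op == OP_RET_B) return 'B';
        if (op != OP_PATCH || pc + 2 >= PQ_MEM) return '?';
        uint8_t addr = pq_fetch(&c, pc + 1);
        uint8_t val = pq_fetch(&c, pc + 2);
        pq_store(&c, addr, val);        /* the self-modifying store */
        pc += 3;
        if (single_step) pq_flush(&c);  /* trap handler ran in between */
    }
    return '?';
}
```

Take the program `{OP_PATCH, 3, OP_RET_B, OP_RET_A}`. A plain run returns 'A', because the stale prefetched byte executes. A single-stepped run returns 'B'. With `snoop` enabled, both runs return 'B', which is why the trick stopped working.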
___________________________________________________________________
(page generated 2022-12-11 23:01 UTC)