[HN Gopher] Observing stale instruction fetching on x86 with sel...
       ___________________________________________________________________
        
       Observing stale instruction fetching on x86 with self-modifying
       code
        
       Author : userbinator
       Score  : 36 points
        Date   : 2022-12-10 01:36 UTC (1 day ago)
        
 (HTM) web link (stackoverflow.com)
 (TXT) w3m dump (stackoverflow.com)
        
       | monocasa wrote:
       | At least on the K6 this was implemented in a really neat way.
        | Reusing most of the hardware for CPU exceptions, the store
        | buffers and the instruction queues communicated and did most
        | of the work of rolling back state; then, instead of jumping
        | through the IDT and pushing the first bit of exception state
        | onto the stack, the CPU jumped directly to the next
        | instruction after the rolled-back state.
        
       | Max-q wrote:
        | Didn't self-modifying code just die out when caches came
        | around in the late eighties and early nineties? We used it a
        | lot on the C64 and Amiga (until the A1200 with its 68020
        | CPU).
        
         | anonymousDan wrote:
         | I think the Linux kernel has some limited forms of self
         | modifying code?
        
           | loeg wrote:
           | Tracepoints are often implemented as self-modifying code (nop
           | <-> trap instruction).
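            | 
            | A minimal user-space sketch of the pattern (assuming
            | Linux on x86-64; an illustration, not how any particular
            | kernel actually implements its tracepoints):
            | 
            |   #include <stdint.h>
            |   #include <stdio.h>
            |   #include <sys/mman.h>
            |   #include <unistd.h>
            | 
            |   /* A single-byte NOP acting as the patch site,
            |      exported as a symbol so it can be found below. */
            |   extern unsigned char probe_nop[];
            |   __attribute__((noinline)) static void probe_site(void)
            |   {
            |       __asm__ volatile (".globl probe_nop\n"
            |                         "probe_nop: nop");
            |   }
            | 
            |   int main(void)
            |   {
            |       long pg = sysconf(_SC_PAGESIZE);
            |       void *base = (void *)((uintptr_t)probe_nop
            |                             & ~(uintptr_t)(pg - 1));
            | 
            |       /* Code pages are r-x; temporarily allow writes. */
            |       mprotect(base, (size_t)pg,
            |                PROT_READ | PROT_WRITE | PROT_EXEC);
            | 
            |       probe_nop[0] = 0xCC; /* arm: NOP -> INT3 (trap) */
            |       /* ...a tracer would catch SIGTRAP here... */
            |       probe_nop[0] = 0x90; /* disarm: INT3 -> NOP */
            | 
            |       probe_site();  /* executes the plain NOP again */
            |       puts("patched and restored");
            |       return 0;
            |   }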
        
         | cwzwarich wrote:
          | All JITs use self-modifying code, and JITs are more popular
          | now than even 15 years ago. You might make a distinction
          | between generating fresh code and modifying code that has
          | already executed, but from the perspective of
          | instruction/data consistency on a CPU it doesn't really
          | matter.
        
           | miohtama wrote:
            | Self-modifying code in the traditional sense meant
            | changing constants and instructions in tight,
            | hand-assembled loops.
            | 
            | I don't think any JITs take this approach: when they emit
            | machine code, it is always a fresh memory allocation and
            | a fresh compilation. JITs don't "poke and peek"
            | already-compiled code. But I could be wrong, as JITs are
            | getting complex these days.
        
             | ridiculous_fish wrote:
             | JITs do indeed modify already-compiled code. For example,
             | JavaScriptCore has a notion of a "patchable jump." These
             | are jump instructions intended to be modified later. They
             | may initially point to slow paths, and are later patched to
             | point to faster paths as type information accumulates.
             | https://webkit.org/blog/10298/inline-caching-delete/
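              | 
              | Not JSC's actual machinery, but a toy sketch of the
              | same idea: a jmp rel32 whose 4-byte displacement is
              | rewritten so that the same entry point starts landing
              | on a different path (assumes Linux/x86-64 and a
              | writable+executable mapping):
              | 
              |   #include <stdint.h>
              |   #include <stdio.h>
              |   #include <string.h>
              |   #include <sys/mman.h>
              | 
              |   int main(void)
              |   {
              |       /* One RWX page: patchable jump at offset 0,
              |          "slow path" at 16, "fast path" at 32.   */
              |       unsigned char *buf =
              |           mmap(NULL, 4096,
              |                PROT_READ|PROT_WRITE|PROT_EXEC,
              |                MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
              | 
              |       /* mov eax, imm32 ; ret */
              |       unsigned char slow[] =
              |           {0xB8, 1, 0, 0, 0, 0xC3};
              |       unsigned char fast[] =
              |           {0xB8, 2, 0, 0, 0, 0xC3};
              |       memcpy(buf + 16, slow, sizeof slow);
              |       memcpy(buf + 32, fast, sizeof fast);
              | 
              |       /* jmp rel32 (E9 xx xx xx xx); displacement
              |          is relative to the end of the jump.    */
              |       int32_t disp = 16 - 5;
              |       buf[0] = 0xE9;
              |       memcpy(buf + 1, &disp, 4);
              | 
              |       int (*entry)(void) = (int (*)(void))buf;
              |       printf("%d\n", entry());  /* 1: slow path */
              | 
              |       /* Patch only the displacement; callers keep
              |          using the same entry point.             */
              |       disp = 32 - 5;
              |       memcpy(buf + 1, &disp, 4);
              |       printf("%d\n", entry());  /* 2: fast path */
              | 
              |       munmap(buf, 4096);
              |       return 0;
              |   }
              | 
              | A real engine also has to do that patch safely while
              | other threads may be running the old bytes, which is
              | where the consistency rules the linked question is
              | probing come in.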
        
               | miohtama wrote:
               | This looks super interesting, thank you.
        
           | Jensson wrote:
            | A JIT doesn't modify the behavior, though; it typically
            | just changes a function pointer, and the new function does
            | the same thing but more optimized. So instruction caches
            | just have some effect on performance instead of changing
            | program behavior.
        
             | cwzwarich wrote:
             | JITs also modify the code at the destination of the
             | function pointer, which was likely also code at a different
             | time (since it usually needs to be allocated from a
             | separate heap that is marked executable). If the CPU
             | prefetched/cached the instructions at the function pointer
             | target prior to the modification, then the program would
             | observe incorrect behavior, even though it's not self-
             | modifying code in the usual sense.
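              | 
              | A bare-bones sketch of that publish-by-pointer-swap
              | pattern (hypothetical names, Linux/x86-64,
              | single-threaded, so it sidesteps the cross-core
              | hazards the linked question is about):
              | 
              |   #include <stdio.h>
              |   #include <string.h>
              |   #include <sys/mman.h>
              | 
              |   typedef int (*fn_t)(void);
              | 
              |   /* Emit "mov eax, imm32 ; ret". */
              |   static fn_t emit_const_fn(unsigned char *mem,
              |                             int value)
              |   {
              |       unsigned char code[] =
              |           {0xB8, 0, 0, 0, 0, 0xC3};
              |       memcpy(code + 1, &value, 4);
              |       memcpy(mem, code, sizeof code);
              |       return (fn_t)mem;
              |   }
              | 
              |   int main(void)
              |   {
              |       /* The JIT's executable "code heap". */
              |       unsigned char *heap =
              |           mmap(NULL, 4096,
              |                PROT_READ|PROT_WRITE|PROT_EXEC,
              |                MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
              | 
              |       fn_t current = emit_const_fn(heap, 1);
              |       printf("%d\n", current());  /* tier 1 */
              | 
              |       /* "Recompile" and redirect callers by
              |          swapping the pointer.  The new bytes sit
              |          in memory that may earlier have held
              |          other code, so i-cache consistency still
              |          matters.                                */
              |       current = emit_const_fn(heap + 64, 2);
              |       printf("%d\n", current());  /* tier 2 */
              | 
              |       munmap(heap, 4096);
              |       return 0;
              |   }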
        
         | layer8 wrote:
          | I still used it on a 486 in the mid-nineties in drawing
          | routines to switch the drawing mode without having a dynamic
          | function call in the inner loop (or duplicating the drawing
          | routine for each mode).
         | 
         | I think it was the Pentium that introduced split
         | instruction/data caches on x86.
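          | 
          | Re-created as a modern user-space sketch (not the original
          | 486 code, and the encodings are 64-bit): the routine's
          | operation byte gets patched to select OR/XOR/store mode,
          | so the inner loop needs no indirect call:
          | 
          |   #include <stdio.h>
          |   #include <string.h>
          |   #include <sys/mman.h>
          | 
          |   /* Template "plot one byte" routine (SysV x86-64):
          |        mov eax, esi   ; 89 F0
          |        ??  [rdi], al  ; 08/30/88 07  <- patched byte
          |        ret            ; C3                           */
          |   enum { OP_OR = 0x08, OP_XOR = 0x30, OP_MOV = 0x88 };
          | 
          |   int main(void)
          |   {
          |       unsigned char *code =
          |           mmap(NULL, 4096,
          |                PROT_READ|PROT_WRITE|PROT_EXEC,
          |                MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
          |       unsigned char tmpl[] =
          |           {0x89, 0xF0, OP_OR, 0x07, 0xC3};
          |       memcpy(code, tmpl, sizeof tmpl);
          | 
          |       void (*plot)(unsigned char *, unsigned) =
          |           (void (*)(unsigned char *, unsigned))code;
          | 
          |       unsigned char fb[2] = {0x0F, 0x0F};
          |       plot(&fb[0], 0xF0);  /* OR mode:  0x0F|0xF0 */
          | 
          |       code[2] = OP_XOR;    /* patch one byte       */
          |       plot(&fb[1], 0xFF);  /* XOR mode: 0x0F^0xFF  */
          | 
          |       printf("%02X %02X\n", fb[0], fb[1]); /* FF F0 */
          |       munmap(code, 4096);
          |       return 0;
          |   }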
        
         | retrac wrote:
         | 1) Yes, cache, though it was specifically the introduction of
         | split instruction/data caches. Modern processors are basically
         | Harvard architecture internally, handling instructions and data
         | separately to allow them to be processed in parallel. So either
          | explicit cache-maintenance instructions (the RISC approach)
          | or extensive hardware tracking of cache entries (the CISC
          | approach) must be used to ensure that data-cache changes
          | are propagated to the instruction cache (sketched briefly
          | below).
         | 
         | 2) It greatly complicates things in terms of software
         | engineering. It's bad practice unless strictly necessary. Code
         | that rewrites itself is very hard to debug, trace and just
         | generally work with, compared to immutable blocks of read-only
         | code. Most high-level languages don't expose anything except
         | function pointers in terms of self-modification, so compiled
         | output would rarely need SMC. (Though it may be faster to emit
         | SMC to implement high-level constructs, in some edge cases.)
         | 
         | The hardware doesn't like it and maintainers, managers, and
         | most professors don't like it, either.
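          | 
          | For point 1, this is roughly what the explicit-flush case
          | looks like from C (the builtin is real in GCC/Clang; the
          | surrounding helper is made up for illustration):
          | 
          |   #include <stddef.h>
          |   #include <string.h>
          | 
          |   /* Copy freshly generated machine code into an
          |      executable buffer.  On x86 the hardware tracks the
          |      data->instruction propagation itself; on ARM or
          |      RISC-V the explicit maintenance below is required,
          |      or the core may keep running stale instructions. */
          |   void install_code(void *dst, const void *src,
          |                     size_t len)
          |   {
          |       memcpy(dst, src, len);
          |       /* Expands to the needed cache maintenance and
          |          barriers on ARM; compiles to nothing on x86. */
          |       __builtin___clear_cache((char *)dst,
          |                               (char *)dst + len);
          |   }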
        
       | hyperman1 wrote:
        | I saw an interesting trick on the i386: some code
        | intentionally overwrote the very next instruction. If you
        | just ran it, the next instruction had already been
        | prefetched, so the original bytes executed and the program
        | worked. If you single-stepped it, the INT 1 handler ran in
        | between and the prefetch queue was flushed, so the modified
        | instruction executed instead. This way your program knew it
        | was being debugged and could take countermeasures.
        
         | userbinator wrote:
         | For better or worse, those tricks disappeared with the P6,
         | where they decided the prefetch queue length shouldn't affect
         | SMC semantics.
        
       ___________________________________________________________________
       (page generated 2022-12-11 23:01 UTC)