[HN Gopher] Custom Processing Unit: Hook, patch, and trace micro...
       ___________________________________________________________________
        
       Custom Processing Unit: Hook, patch, and trace microcode at the
       software level
        
       Author : todsacerdoti
       Score  : 70 points
       Date   : 2022-08-12 07:43 UTC (15 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | sebow wrote:
       | I assume this does not support AMD? Or even a broader & better
       | question: does AMD even have such undocumented instructions that
       | allow the 'takeover' of the CPU?
        
         | adrian_b wrote:
         | Like some earlier reverse engineering of CPU microcode that has
         | been discussed on HN, this is supported only on certain Intel
         | Atom CPUs with Goldmont cores, i.e. Apollo Lake, Gemini Lake,
         | maybe also Denverton.
         | 
         | The reason is that a known exploit exists only for these cores,
         | one that can switch the CPU of any computer with this kind of
         | core into a debugging mode that gives access to the microcode.
         | 
         | Similar debugging modes certainly exist on all Intel and AMD
         | CPUs, but there is no public knowledge about how they can be
         | activated and no known bugs that enable that. Moreover, this
         | debugging mode is normally disabled on the PCB in production
         | motherboards. So even for someone who would know how to do it
         | for other kinds of Intel cores or for AMD, it is expected that
         | physical access to the motherboard is needed, to make
         | modifications on the PCB.
         | 
         | While this new information about the internals of the Intel
         | Goldmont cores is very interesting, it does not increase the
         | chances of any similar hacking of the other more modern Intel
         | cores.
         | 
         | It is likely that Intel will be more careful in the future to
         | avoid such a reverse engineering of the microcode and of the
         | debugging mode, even if that does not really matter much,
         | either for security or for competitors.
        
         | cleemens wrote:
         | There has been some older work on reversing AMD K8 and K10 that
         | lets you do something vaguely similar. This work is just way
         | more generalized, with more options available in the Intel way
         | of things.
         | https://www.usenix.org/conference/usenixsecurity17/technical...
         | 
         | Seems like you can't really trace the AMD CPUs like the
         | reversing of that Intel Microcode allows, but you can still do
         | a lot.
        
       | actionfromafar wrote:
       | I sometimes wonder whether these CPUs are just waiting to have
       | their power unleashed. Almost like FPGAs, they could be
       | reprogrammed to have custom instructions by someone skilled.
        
         | Tuna-Fish wrote:
         | This wouldn't actually help in common workloads, because the
         | instructions that already exist are typically better than what
         | you can achieve with microcode.
         | 
         | The reason for the RISC revolution was the understanding that
         | there are fairly strict limits on what you can achieve in a
         | single instruction if you want your cpu to be fast. An
         | instruction that does some complex task that you need done is
         | not really going to be any faster than composing the same
         | action out of RISC instructions because, thanks to pipelining,
         | the limiting factor on the speed of your cpu is going to be the
         | actual work anyway.
         | 
         | Microcode on modern x86 cpus is typically slower (takes longer
         | to output uops, outputs less per clock than the uop cache) than
         | simple instructions, and only exists for legacy and doing some
         | complex operations (like context switches) where being
         | uninterruptible and having a bit of extra scratch space is
         | useful.
        
           | actionfromafar wrote:
           | What you say makes sense, but I still have doubts. Are you
           | saying that not all x86 instructions are implemented in
           | microcode? And wouldn't a shader language be a perfect target
           | for custom microcode?
        
             | mhh__ wrote:
             | The term microcode has become muddled in that you have
             | microcode for implementing highly complex instructions
             | _and_ "microcode" as in an internal target that x86
             | instructions are translated into.
        
               | rep_lodsb wrote:
               | The internal instruction format is commonly called micro-
               | ops / uops, but the terms are easy to confuse.
        
             | cesarb wrote:
             | > Are you saying that not all x86 instructions are
             | implemented in microcode?
             | 
             | If you look at discussions about the instruction decoder of
             | Intel processors, you often see it mentioned that they have
             | something like three "simple" decoders and one "complex"
             | decoder. The main difference is that the "complex" decoder
             | is the only one which can decode instructions implemented
             | in microcode (dispatching micro-operations from the
             | microcode ROM), while the "simple" decoders can only output
             | micro-operations directly (using something similar to hard-
             | wired pattern matching on the instruction). So yes, many
             | x86 instructions are not implemented in microcode.
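
A toy model of the decoder split described in the comment above, as a minimal sketch only. The `Insn` type, the `decode_cycle` function, the choice of which instructions count as microcoded, and the rule that the complex decoder may also take a simple instruction are all illustrative assumptions, not Intel documentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insn:
    """Hypothetical instruction record for this toy model."""
    name: str
    microcoded: bool  # assumption: flagged by hand, not derived from real data


def decode_cycle(window, n_simple=3):
    """Assign instructions to decoders for one cycle, in program order.

    Toy rules (assumptions):
      * there are `n_simple` simple decoders and one complex decoder;
      * a microcoded instruction can only go to the complex decoder;
      * any other instruction takes a simple decoder if one is free,
        otherwise the complex decoder;
      * decoding stops at the first instruction that cannot be placed.
    """
    simple_free = n_simple
    complex_free = True
    placed = []
    for insn in window:
        if insn.microcoded:
            if not complex_free:
                break
            complex_free = False
            placed.append((insn.name, "complex"))
        elif simple_free > 0:
            simple_free -= 1
            placed.append((insn.name, "simple"))
        elif complex_free:
            complex_free = False
            placed.append((insn.name, "complex"))
        else:
            break
    return placed


window = [
    Insn("add", False),
    Insn("rep movsb", True),
    Insn("mov", False),
    Insn("cpuid", True),
]
for name, decoder in decode_cycle(window):
    print(f"{name:10s} -> {decoder}")
```

In this model a second microcoded instruction in the same window has to wait for the next cycle, because only one decoder can dispatch from the microcode ROM.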
        
         | mhh__ wrote:
         | These already exist in niche markets; they've stayed niche for
         | a reason.
        
       | marcodiego wrote:
       | Does this help us get rid of IME?
        
         | pedro2 wrote:
         | Did you mean ME? Maybe; they do refer to the use of HAP mode in
         | the linked article (Goldmont's Red Unlock), but I think the ME
         | platform is required for useful, needed things alongside the
         | unneeded ones, so in the end you always need a subset of Intel
         | ME working.
        
         | mhh__ wrote:
         | You want to inspect the management engine, not replace it.
         | 
         | The management engine does do useful stuff; the issue is that
         | we don't know what else it does.
        
       | SaulJLH wrote:
       | Can someone please explain to me like I'm a 5yo why, if at all,
       | this is significant? It "seems" significant, but I have no real
       | idea, TBH.
        
         | pedro2 wrote:
         | NOTE: I only skimmed the article and the linked article that
         | refers to JTAG.
         | 
         | I believe it provides tools that allow running the Intel CPU
         | in a way equivalent to running a regular program under GDB. It
         | may be possible to use JTAG, but I'm not sure whether it is a
         | requirement.
        
       | Zigurd wrote:
       | One of my first paid gigs was to write a debugger for a RIP
       | implemented in 2900 bitslice. It was a specialized SIMD
       | architecture. The debugger was written in C and ran on the
       | bootloader processor, which I recall being a 68k. Having taken an
       | architecture course as an undergrad I sort of knew what I was
       | doing, and microcode is inherently simple compared to, for
       | example, 64-bit x86. I recall thinking "this is what a really
       | wide PDP-11 would be like." The project was also interesting in
       | that my customer was one person: The guy who designed the
       | processor and who was writing the microcode.
        
       ___________________________________________________________________
       (page generated 2022-08-12 23:01 UTC)