[HN Gopher] Indirect branch tracking in Linux for Intel CPUs
       ___________________________________________________________________
        
       Indirect branch tracking in Linux for Intel CPUs
        
       Author : chmaynard
       Score  : 25 points
       Date   : 2022-03-31 15:07 UTC (7 hours ago)
        
 (HTM) web link (lwn.net)
 (TXT) w3m dump (lwn.net)
        
       | KMag wrote:
       | It's great that Intel is adding a mode where indirect branches
       | must land on specially encoded no-ops. Ideally, they'd also
       | provide special encodings for "push rbp" and "sub rsp, x" (for
       | the -fomit-frame-pointer case) to cover the two most common first
       | instructions in function prologues that could also be indirect
       | branch targets.
       | 
       | Edit: actually, it would be best if indirect branches had to land
       | on "special-nop", "push rbp" or "sub rsp, x" and future compilers
       | needed to emit the new encodings for "special-push rbp" and
       | "special-sub rsp, x" in the very few places they aren't used in
       | function prologues. This would be better from a code density
       | perspective, and might even allow one to turn the feature on
       | without recompiling code and not have many applications break. I
       | think backward compatibility would mostly depend on details of
       | common jump table implementations (mostly C/C++ switch statements
       | in cases where the density is such that compilers don't emit
       | trees of conditional branches). I'm pretty sure the few users of
       | GCC's "computed goto" (most notably CPython) would still need
       | special no-ops at the start of each label.
       | 
       | Edit 2: is anyone aware of any common use of return where the
       | target of the return isn't immediately after a call instruction?
       | Return-oriented-programming for exploits is the main use case I'm
       | aware of for returning to an address that isn't immediately after
       | a call instruction. Unfortunately, checking this criterion in
       | cases where the previous instruction is on another cache line
       | would often cause extra cache misses.
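A rough way to picture the rule described above (an indirect branch must land on a specially encoded no-op) is a toy model in Python. This is only a sketch: the `memory` layout, the function names in the comments, and the `indirect_branch` helper are made up for illustration, and real hardware tracks this with a small state machine rather than a list lookup.

```python
# Toy model of indirect branch tracking (IBT). Not how hardware implements it.
# Assumption: "memory" is a list of mnemonic strings, one instruction per slot.

class ControlProtectionFault(Exception):
    """Raised when an indirect branch lands on something other than endbr64."""


def indirect_branch(memory, target):
    """Return the target address if it is a legal landing pad, else fault."""
    if memory[target] != "endbr64":
        raise ControlProtectionFault(f"address {target}: {memory[target]!r}")
    return target


# Hypothetical code layout: one function with a landing pad, one without.
memory = [
    "endbr64",   # 0: start of exported_fn, a valid indirect branch target
    "push rbp",  # 1
    "ret",       # 2
    "push rbp",  # 3: start of static_fn, only ever called directly
    "pop rdi",   # 4: mid-function "gadget"
    "ret",       # 5
]

# A function pointer to exported_fn is accepted.
assert indirect_branch(memory, 0) == 0

# Addresses without endbr64 fault, whether they start a function or not.
for bad_target in (3, 4):
    try:
        indirect_branch(memory, bad_target)
    except ControlProtectionFault as fault:
        print("fault:", fault)
# prints:
# fault: address 3: 'push rbp'
# fault: address 4: 'pop rdi'
```

In this model, the suggestion to also accept "push rbp" or "sub rsp, x" as landing pads would amount to widening the membership test in `indirect_branch`, which would let address 3 through while still rejecting address 4.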
        
         | saagarjha wrote:
         | Some threaded interpreters do this, as it lets you use the
         | return address branch predictor and continue control flow with
         | just a ret.
        
           | KMag wrote:
           | Interesting! My understanding of efficient direct-threaded
           | interpreters is basically limited to Jones Forth. If I
           | understand you correctly, these interpreters store their
           | lists of subroutines in reverse order, set the stack pointer
           | to the end of the list, and then everywhere jonesforth.s uses
           | "lodsl\njmp *(%eax)" to iterate forward over a list of
           | function pointers pointed to by %esi, these interpreters use
           | "ret" to iterate backward over a list of function pointers
           | pointed to by %esp. Is that correct?
           | 
           | I take it these interpreters can't register any signal
           | handlers, as at a minimum, the kernel would write a return
           | address above the stack's "red zone" before invoking the
           | signal handler, which would corrupt the code. Is that right?
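The two dispatch styles discussed in this subthread can be compared in a small Python model. It is a sketch under loud assumptions: Python callables stand in for machine-code primitives, Python lists stand in for memory, and `run_forward`, `run_ret_style`, and the three primitives are hypothetical names, not taken from Jones Forth or any real interpreter.

```python
# Toy model of two dispatch styles for a threaded interpreter.
# Assumption: Python callables stand in for machine-code primitives.

def run_forward(thread, stack):
    """Jones Forth style: an instruction pointer walks a list of primitives."""
    ip = 0
    while ip < len(thread):
        primitive = thread[ip]  # like lodsl: fetch the next word...
        ip += 1                 # ...and advance the pointer
        primitive(stack)        # like jmp *(%eax)


def run_ret_style(thread, stack):
    """ret style: the thread is laid out as a stack of return addresses."""
    # The reversal is an artifact of list.pop() taking from the END of a
    # Python list. On x86, ret reads the address at the stack pointer and
    # then increments it, so it walks toward higher addresses.
    return_stack = list(reversed(thread))
    while return_stack:
        primitive = return_stack.pop()  # like ret: pop an address, jump to it
        primitive(stack)


def push_two(stack):
    stack.append(2)


def push_three(stack):
    stack.append(3)


def add(stack):
    stack.append(stack.pop() + stack.pop())


program = [push_two, push_three, add]
forward_result, ret_result = [], []
run_forward(program, forward_result)
run_ret_style(program, ret_result)
print(forward_result, ret_result)  # prints: [5] [5]
```

The signal-handler worry maps onto this model as something else writing into `return_stack` while the loop is still popping from it.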
        
       | saagarjha wrote:
       | > As Peter Zijlstra pointed out, there is another, perhaps
       | surprising advantage to removing the unneeded endbr instructions.
       | The kernel limits the functions that are available to loadable
       | modules, and proprietary modules are limited even further. It is
       | a common technique for proprietary modules to look up the non-
       | exported functions they need in the kernel's symbol table, then
       | call them via an indirect branch, thus bypassing the kernel's
       | limitations. But, with IBT enabled, any function lacking an endbr
       | instruction will no longer be callable in this way.
       | 
       | This is an amusing consequence but of course code that is
       | deliberately trying to violate the GPL and running in the kernel
       | address space is usually more than willing to work around such
       | trivialities, e.g. by patching the kernel's text at runtime to
       | remove the instructions. But maybe now we can call them out for
       | reducing kernel security in addition to being jerks in general :)
        
         | nybble41 wrote:
         | > e.g. by patching the kernel's text at runtime to remove the
         | instructions
         | 
         | Or simply by marking the module as GPL even when it isn't.
         | There is plenty of precedent for stronger measures (like
         | embedding a copyrighted & trademarked logo image in a console
         | ROM) being permitted when doing so was made necessary for
         | interoperability.
         | 
         | There are decent security-related reasons to make this change,
         | but the prospect of complicating access to select kernel
         | symbols under some questionable, idiosyncratic interpretation
         | of copyright where calling an external library function somehow
         | makes an independently-developed module _containing no kernel
         | code_ a derivative work of the kernel is not one of them.
        
       | sylware wrote:
       | selling recent hardware with those bugs still there???
        
         | NobodyNada wrote:
         | Indirect branch tracking protects against more than just
         | Spectre. Its primary motivation is protecting control-flow
         | integrity; i.e. mitigating vulnerabilities that involve
         | corrupting a function pointer to get arbitrary code execution.
         | 
         | CFI techniques have been around for a long time [0], though
         | they have usually been implemented in software rather than
         | hardware. This new Intel feature is a hardware implementation
         | of these existing techniques, in order to reduce the
         | performance impact.
         | 
         | [0]: https://en.wikipedia.org/wiki/Control-flow_integrity
        
       | phkahler wrote:
       | This seems like protection that would not be needed if everything
       | were written in Rust - or maybe even some other languages that
       | aren't C.
        
         | saagarjha wrote:
         | Correct, but given that everything is not written in Rust and
         | will not be anytime soon perhaps you can see the benefits of
         | having this kind of CFI around.
        
       | miohtama wrote:
       | I am happy to see new hardware mitigations against exploits
       | coming up. However, all these increasingly complex mitigation
       | techniques make me wonder how much they add to CPU design cost
       | and to run-time cost in the form of slower code execution.
        
         | eklitzke wrote:
         | Some of these security/safety techniques are intended to be
         | primarily used in debug kernels precisely because the overhead
         | is too high. For example, the most recent kernel 5.17 added a
         | feature called page table check which adds extra consistency
         | checking to operations that modify the page table. The overhead
         | is high enough that probably no one is going to run with page
         | table checks on desktop kernels or in production, but having
         | the feature there is useful for developers who want to run
         | additional checks against their changes. Think of it like asan:
         | no one really runs asan binaries in production, but being able
         | to build asan binaries for testing changes is still very
         | useful.
         | 
         | I don't know the exact overhead for IBT, so I'm not sure whether
         | this is the case for this feature. But in general not all of
         | these safety features are intended to be built into production
         | kernels.
        
           | aseipp wrote:
           | Almost every CFI implementation I've seen is definitely
           | intended for production use cases. There is work to combine
           | the coarse-grained CFI aspects of Intel's IBT implementation
           | with the more fine-grained CFI implementation available in
           | systems like PaX/grsecurity, which is based on the assumption
           | that the ABI prototype matches what was detected at compile
           | time.
           | 
           | The overhead of Intel's coarse-grained IBT alone is somewhere
           | in the 1% range for the small number of synthetic benchmarks
           | they have here [1] (ok, let's be honest, I don't expect x264
           | to contain many indirect branches in the fast path of SPEC,
           | but whatever). I believe Intel originally aimed for something
           | in the 1-5% range, but I can't find a reference.
           | 
           | If that's it, then it seems ok enough if you're paranoid, but
           | if you're paranoid the fine-grained approach improves things
           | quite a bit. This all still seems to be a ways off for
           | everyone but I guess the kernel 5.18 support is a start.
           | 
           | [1] https://www.spinics.net/lists/kernel-hardening/msg05347.html
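The coarse-grained versus fine-grained distinction drawn in this comment can be sketched as two checks on an indirect call target. This is an illustrative model only: the `Function` record, the textual prototype strings, and both check functions are assumptions made for the example, and no real CFI scheme compares signature strings at run time like this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Function:
    name: str
    prototype: str         # hypothetical textual signature used as a type tag
    has_landing_pad: bool  # True if the function starts with an endbr


def coarse_check(target):
    """Coarse-grained (IBT-like): any function with a landing pad is allowed."""
    return target.has_landing_pad


def fine_check(target, expected_prototype):
    """Fine-grained: the target must also match the call site's prototype."""
    return target.has_landing_pad and target.prototype == expected_prototype


# Hypothetical functions; the call site expects void(const char *).
log_message = Function("log_message", "void(const char *)", True)
run_command = Function("run_command", "int(const char *, int)", True)
helper = Function("helper", "void(void)", False)

expected = "void(const char *)"
for target in (log_message, run_command, helper):
    print(target.name, coarse_check(target), fine_check(target, expected))
# prints:
# log_message True True
# run_command True False
# helper False False
```

The middle row is the interesting one: a corrupted function pointer that points at `run_command` passes the coarse check because the function has a landing pad, but fails the fine-grained check because its prototype differs from what the call site was compiled to expect.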
        
       | wnoise wrote:
       | Wow, actual use of come from instructions.
        
       ___________________________________________________________________
       (page generated 2022-03-31 23:02 UTC)