[HN Gopher] Indirect branch tracking in Linux for Intel CPUs
___________________________________________________________________
Indirect branch tracking in Linux for Intel CPUs
Author : chmaynard
Score : 25 points
Date : 2022-03-31 15:07 UTC (7 hours ago)
(HTM) web link (lwn.net)
(TXT) w3m dump (lwn.net)
| KMag wrote:
| It's great that Intel is adding a mode where indirect branches
| must land on specially encoded no-ops. Ideally, they'd also
| provide special encodings for "push rbp" and "sub rsp, x" (for
| the -fomit-frame-pointer case) to cover the two most common first
| instructions in function prologues that could also be indirect
| branch targets.
|
| Edit: actually, it would be best if indirect branches had to land
| on "special-nop", "push rbp" or "sub rsp, x" and future compilers
| needed to emit the new encodings for "special-push rbp" and
| "special-sub rsp, x" in the very few places they aren't used in
| function prologues. This would be better from a code density
| perspective, and might even allow one to turn the feature on
| without recompiling code and not have many applications break. I
| think backward compatibility would mostly depend on details of
| common jump table implementations (mostly C/C++ switch statements
| in cases where the density is such that compilers don't emit
| trees of conditional branches). I'm pretty sure the few users of
| GCC's "computed goto" (most notably CPython) would still need
| special no-ops at the start of each label.
|
| Edit 2: is anyone aware of any common use of return where the
| target of the return isn't immediately after a call instruction?
| Return-oriented-programming for exploits is the main use case I'm
| aware of for returning to an address that isn't immediately after
| a call instruction. Unfortunately, checking this criterion in
| cases where the previous instruction is on another cache line
| would often cause extra cache misses.
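|
| A toy model of the checks discussed above, as a sketch only. The
| instruction list, the addresses, and the RELAXED rule set are
| made up for illustration; the relaxed landing pads and the
| return check are the hypothetical ideas from this comment, not
| anything real hardware does. Only the endbr64-only rule reflects
| how IBT is described in the article.

```python
class ControlProtectionFault(Exception):
    """Stands in for the #CP exception raised on a bad branch target."""


# Baseline rule: only the dedicated landing-pad instruction is accepted.
ENDBR_ONLY = ("endbr64",)
# Hypothetical relaxed rule from the comment above (not real hardware).
RELAXED = ("endbr64", "push rbp", "sub rsp,")


def indirect_branch(code, target, landing_pads=ENDBR_ONLY):
    """Return target if the instruction there is an accepted landing pad."""
    insn = code[target]
    if not insn.startswith(landing_pads):
        raise ControlProtectionFault(f"{insn!r} at {target} is not a landing pad")
    return target


def checked_return(code, return_address):
    """Hypothetical 'Edit 2' rule: a ret must land right after a call."""
    if return_address == 0 or not code[return_address - 1].startswith("call"):
        raise ControlProtectionFault(f"no call before address {return_address}")
    return return_address


# Toy program: one string per instruction, list index used as the address.
CODE = [
    "endbr64",     # 0: function entry with a landing pad
    "push rbp",    # 1
    "call 4",      # 2
    "pop rbp",     # 3: legitimate return target (follows the call)
    "sub rsp, 8",  # 4: function entry without a landing pad
    "ret",         # 5
]
```

| With the endbr64-only rule, indirect_branch(CODE, 4) raises the
| fault, while indirect_branch(CODE, 4, RELAXED) is accepted, which
| is the backward-compatibility effect the relaxed rule is after.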
| saagarjha wrote:
| Some threaded interpreters do this, as it lets you use the
| return address branch predictor and continue control flow with
| just a ret.
| KMag wrote:
| Interesting! My understanding of efficient direct-threaded
| interpreters is basically limited to Jones Forth. If I
| understand you correctly, these interpreters store their
| lists of subroutines in reverse order, set the stack pointer
| to the end of the list, and then everywhere jonesforth.s uses
| "lodsl\njmp *(%eax)" to iterate forward over a list of
| function pointers pointed to by %esi, these interpreters use
| "ret" to iterate backward over a list of function pointers
| pointed to by %esp. Is that correct?
|
| I take it these interpreters can't register any signal
| handlers, as at a minimum, the kernel would write a return
| address above the stack's "red zone" before invoking the
| signal handler, which would corrupt the code. Is that right?
| saagarjha wrote:
| > As Peter Zijlstra pointed out, there is another, perhaps
| surprising advantage to removing the unneeded endbr instructions.
| The kernel limits the functions that are available to loadable
| modules, and proprietary modules are limited even further. It is
| a common technique for proprietary modules to look up the non-
| exported functions they need in the kernel's symbol table, then
| call them via an indirect branch, thus bypassing the kernel's
| limitations. But, with IBT enabled, any function lacking an endbr
| instruction will no longer be callable in this way.
|
| This is an amusing consequence but of course code that is
| deliberately trying to violate the GPL and running in the kernel
| address space is usually more than willing to work around such
| trivialities, e.g. by patching the kernel's text at runtime to
| remove the instructions. But maybe now we can call them out for
| reducing kernel security in addition to being jerks in general :)
| nybble41 wrote:
| > e.g. by patching the kernel's text at runtime to remove the
| instructions
|
| Or simply by marking the module as GPL even when it isn't.
| There is plenty of precedent for stronger measures (like
| embedding a copyrighted & trademarked logo image in a console
| ROM) being permitted when doing so was made necessary for
| interoperability.
|
| There are decent security-related reasons to make this change,
| but the prospect of complicating access to select kernel
| symbols under some questionable, idiosyncratic interpretation
| of copyright where calling an external library function somehow
| makes an independently-developed module _containing no kernel
| code_ a derivative work of the kernel is not one of them.
| sylware wrote:
| selling recent hardware with those bugs still there???
| NobodyNada wrote:
| Indirect branch tracking protects against more than just
| Spectre. Its primary motivation is protecting control-flow
| integrity; i.e. mitigating vulnerabilities that involve
| corrupting a function pointer to get arbitrary code execution.
|
| CFI techniques have been around for a long time [0], though
| they have usually been implemented in software rather than
| hardware. This new Intel feature is a hardware implementation
| of these existing techniques, in order to reduce the
| performance impact.
|
| [0]: https://en.wikipedia.org/wiki/Control-flow_integrity
| phkahler wrote:
| This seems like protection that would not be needed if everything
| were written in Rust - or maybe even some other languages that
| aren't C.
| saagarjha wrote:
| Correct, but given that everything is not written in Rust and
| will not be anytime soon perhaps you can see the benefits of
| having this kind of CFI around.
| miohtama wrote:
| I am happy to see new hardware mitigations against exploits
| coming up. However, these increasingly complex mitigation
| techniques make me wonder how much they add to CPU design cost
| and to run-time cost in the form of slower code execution.
| eklitzke wrote:
| Some of these security/safety techniques are intended to be
| primarily used in debug kernels precisely because the overhead
| is too high. For example, the most recent kernel 5.17 added a
| feature called page table check which adds extra consistency
| checking to operations that modify the page table. The overhead
| is high enough that probably no one is going to run with page
| table checks on desktop kernels or in production, but having
| the feature there is useful for developers who want to run
| additional checks against their changes. Think of it like asan:
| no one really runs asan binaries in production, but being able
| to build asan binaries for testing changes is still very
| useful.
|
| I'm not sure of the exact overhead for IBT, so I'm not sure whether
| this is the case for this feature. But in general not all of
| these safety features are intended to be built into production
| kernels.
| aseipp wrote:
| Almost every CFI implementation I've seen is definitely
| intended for production use cases. There is work to combine
| the coarse-grained CFI aspects of Intel's IBT implementation
| with the more fine-grained CFI implementation available in
| systems like PaX/grsecurity, which is based on the assumption
| that the ABI prototype matches what was detected at compile
| time.
|
| The overhead in synthetic benchmarks for Intel's coarse IBT
| alone is somewhere in the 1% range for the small number of
| benchmarks they have here[1] (ok, let's be honest, I don't
| expect x264 to contain many indirect branches in the fast path
| of SPEC, but whatever.) I believe Intel originally aimed for
| something in the 1-5% range but I can't find a reference.
|
| If that's it, then it seems ok enough if you're paranoid, but
| if you're paranoid the fine-grained approach improves things
| quite a bit. This all still seems to be a ways off for
| everyone but I guess the kernel 5.18 support is a start.
|
| [1] https://www.spinics.net/lists/kernel-hardening/msg05347.html
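|
| A rough sketch of the coarse versus fine-grained distinction
| drawn above. The prototype strings, the ENTRY_POINTS table, and
| the function names are all assumptions for illustration; this is
| not how IBT or PaX/grsecurity encode anything. Coarse checking
| accepts any known function entry, while the fine-grained check
| also requires the prototype the call site was compiled against.

```python
class CFIViolation(Exception):
    """Raised when an indirect call target fails the policy check."""


def double(x):
    return 2 * x


def shout(s):
    return str(s).upper()


# Known function entries, each tagged with an assumed prototype string.
ENTRY_POINTS = {double: "int(int)", shout: "char*(char*)"}


def coarse_call(target, *args):
    """Coarse policy: any known function entry is an acceptable target."""
    if target not in ENTRY_POINTS:
        raise CFIViolation("target is not a function entry")
    return target(*args)


def fine_call(target, expected_prototype, *args):
    """Fine-grained policy: the target must also match the call site's prototype."""
    if ENTRY_POINTS.get(target) != expected_prototype:
        raise CFIViolation("target prototype does not match the call site")
    return target(*args)
```

| A call site that expects int(int) but has had its pointer
| swapped to shout passes coarse_call and is rejected by
| fine_call(shout, "int(int)", 21).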
| wnoise wrote:
| Wow, actual use of come from instructions.
___________________________________________________________________
(page generated 2022-03-31 23:02 UTC)