[HN Gopher] BasicBlocker: ISA Redesign to Make Spectre-Immune CP...
       ___________________________________________________________________
        
       BasicBlocker: ISA Redesign to Make Spectre-Immune CPUs Faster
       (2021)
        
       Author : PaulHoule
       Score  : 33 points
       Date   : 2023-07-26 17:55 UTC (5 hours ago)
        
 (HTM) web link (arxiv.org)
 (TXT) w3m dump (arxiv.org)
        
       | bob1029 wrote:
        | Speculative execution, whatever its flaws, brings a style of
        | optimization that you simply cannot substitute with any other.
       | Conceptually, the ability to _continuously time travel into the
       | future and bring information back_ is a pretty insane form of
       | optimization. The fact that this also prefetches memory for us is
       | amazing, except in some unhappy adverse contexts. Perhaps we
       | should just pause there for a moment and reflect...
       | 
       | Imagine being able to simultaneously visit 4 retail stores and
       | dynamically select items depending on availability and pricing,
       | arriving back home having spent the amount of time it takes to
       | shop at 1.25 stores while burning 1.5x the fuel of a one-store
       | trip.
       | 
       | There is no amount of ISA redesign or recompilation that can
       | accommodate the dynamics of real-world trends in the same ways
       | that speculative execution can. Instead of trying to replace
       | speculative execution, I think we should try to put it into a
       | more secure domain where it can run free and be "dangerous"
       | without actually being allowed to cause trouble outside the
       | intended scope. Perhaps I am asking for superpositioned cake
       | here. Is there a fundamental reason we cannot make speculative
       | execution secure?
        
         | insanitybit wrote:
         | > Is there a fundamental reason we cannot make speculative
         | execution secure?
         | 
         | It is secure in many, many contexts. For example, I have no
         | concerns about speculative execution if I'm running a database
         | or service, which is great since those are the areas where
         | performance matters most.
         | 
          | Where it's troublesome is when you need isolation in the
          | presence of arbitrary code execution. My suggestion is that if
          | you ever find yourself in that scenario, you manage your cores
          | manually - ensure that "attacker" code never shares a core with
          | anything sensitive. If you need that next level of security,
          | enable the mitigations.
         | 
          | Pinning your cores is going to help a lot with the mitigations
          | anyways - with PCIDs (process-context identifiers) the TLB
          | doesn't have to be fully flushed when the same address space is
          | switched back in on the same core, so keeping a process on one
          | core keeps those TLB entries useful. The point is that you can
          | improve things if you pin to a core (sketch below).
         | 
         | I think basically you're right. Instead of removing an amazing
         | optimization let's find the areas where we can enable it,
         | understand the threat model where it's relevant, and find ways
         | to either reduce the cost of mitigations or otherwise mitigate
         | in a way that's free.
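          | 
          | For concreteness, a minimal sketch of the manual core
          | management I mean, using Linux's sched_setaffinity (the core
          | number and error handling are only illustrative):
          | 
          |   #define _GNU_SOURCE
          |   #include <sched.h>
          |   #include <stdio.h>
          |   #include <unistd.h>
          | 
          |   int main(void)
          |   {
          |       cpu_set_t set;
          |       CPU_ZERO(&set);
          |       CPU_SET(2, &set);   /* reserve core 2 for this task */
          | 
          |       /* Pin the current process; untrusted code would be
          |        * pinned to a disjoint set of cores so it never shares
          |        * a core with anything sensitive. */
          |       if (sched_setaffinity(0, sizeof(set), &set) != 0) {
          |           perror("sched_setaffinity");
          |           return 1;
          |       }
          |       printf("pinned to core 2 (pid %d)\n", (int)getpid());
          |       return 0;
          |   }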
        
         | matu3ba wrote:
         | > Is there a fundamental reason we cannot make speculative
         | execution secure?
         | 
          | Any memory access leads to a timing channel, which might be
          | observable or not. As an example, hyperthreading is known to
          | create observable side channels even in the L1 and L2 caches,
          | since those are shared between sibling threads.
         | 
          | The L3 cache is also shared between CPU cores on the same
          | socket, so unless you can ensure that data from different
          | security domains never shares L3 lines, you cannot entirely
          | eliminate this timing channel.
         | 
          | Now getting back to speculative execution: every possible
          | execution sequence must satisfy all possible cross-interaction
          | rules so that no timing channel becomes visible, which means
          | 1. fully restoring the previous microarchitectural state and
          | 2. not leaking any timing behavior. Just think of all the cases
          | that would need to be verified (the complete instruction set)
          | and ask whether any sane cache-aware separation logic could
          | cover that kind of time travel without leaking timing.
         | 
          | On top of that, it has already been shown that the hardware
          | guarantees on cache behavior are fundamentally broken (they are
          | merely hints).
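          | 
          | To make the timing channel concrete, here is a tiny sketch of
          | the kind of probe an attacker would time (x86 with GCC/Clang
          | intrinsics; the buffer and the exact numbers are illustrative,
          | not from the paper):
          | 
          |   #include <stdint.h>
          |   #include <stdio.h>
          |   #include <x86intrin.h>
          | 
          |   static uint8_t probe[4096];
          | 
          |   /* Time one load; a cache hit is far cheaper than a miss,
          |    * and that difference is the observable signal. */
          |   static uint64_t time_access(volatile uint8_t *p)
          |   {
          |       unsigned aux;
          |       uint64_t start = __rdtscp(&aux);
          |       (void)*p;
          |       return __rdtscp(&aux) - start;
          |   }
          | 
          |   int main(void)
          |   {
          |       _mm_clflush(probe);                  /* force a miss */
          |       uint64_t miss = time_access(probe);
          |       uint64_t hit  = time_access(probe);  /* now cached   */
          |       printf("miss: %llu cycles, hit: %llu cycles\n",
          |              (unsigned long long)miss, (unsigned long long)hit);
          |       return 0;
          |   }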
        
         | mike_hock wrote:
         | > Is there a fundamental reason we cannot make speculative
         | execution secure
         | 
          | You've said it yourself:
         | 
         | > The fact that this also prefetches memory for us is amazing
         | 
          | To be secure, speculatively executed instructions that don't
          | retire have to have _no_ observable effects, including those
          | observable through timing. They cannot be allowed to modify the
          | cache hierarchy in any way.
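          | 
          | For illustration, the textbook bounds-check-bypass shape (a
          | Spectre v1 style gadget; every name below is made up): the
          | out-of-bounds load never retires, but the cache line it touched
          | stays warm and can be timed afterwards.
          | 
          |   #include <stddef.h>
          |   #include <stdint.h>
          | 
          |   /* Illustrative globals - not from the paper or thread. */
          |   size_t  array1_size = 16;
          |   uint8_t array1[16];
          |   uint8_t probe_table[256 * 64];
          | 
          |   void victim(size_t x)
          |   {
          |       /* If the branch is mispredicted, the loads below still
          |        * run speculatively; they never retire, but the
          |        * probe_table line they pulled in remains measurable. */
          |       if (x < array1_size) {
          |           uint8_t secret = array1[x];        /* OOB read */
          |           volatile uint8_t t = probe_table[secret * 64];
          |           (void)t;
          |       }
          |   }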
        
           | bob1029 wrote:
           | Does speculative execution on my CPU affect your computing
           | environment?
        
         | PaulHoule wrote:
          | Look at the failure of VLIW: a compiler can't know what is
          | going on with the memory system at runtime, but the hardware
          | can make a very good guess.
        
         | causality0 wrote:
          | I'm still of the opinion that the industry response to Spectre
          | was wildly overblown. It's been five years and not one single
          | damn person has been a confirmed victim of it, yet we all tied
          | an albatross around our CPUs' necks the moment the news broke.
        
           | insanitybit wrote:
           | > It's been five years and not one single damn person has
           | been a confirmed victim of it yet
           | 
            | Well, everyone patched things pretty aggressively, so
            | exploitation isn't really practical. At minimum, that's why.
            | Also, vuln research can take a long time to turn into
            | exploits - there are so many existing primitives that people
            | are exploring right now (eBPF, io_uring) that have
            | _practical_ exploitation primitives already designed, so
            | there isn't much pressure to go after something that's
            | already patched and that would require a lot of novel
            | research to find reliable primitives for.
           | 
           | As for the mitigations, just disable them? They're on by
           | default because Linux doesn't know what your use case is, but
           | if your use case isn't relevant to the mitigation's threat
           | model please feel free to disable them. It is very simple to
           | do so.
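            | 
            | A minimal sketch of "disable them" on Linux: mitigations=off
            | is the kernel boot parameter; the GRUB file and command below
            | are the usual Debian/Ubuntu ones and may differ on your
            | distro.
            | 
            |   # /etc/default/grub
            |   GRUB_CMDLINE_LINUX_DEFAULT="quiet splash mitigations=off"
            | 
            |   # regenerate the config and reboot
            |   sudo update-grub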
        
             | causality0 wrote:
             | Oh I did. I never installed the patches in the first place
             | and I've been a regular user of InSpectre.
        
         | JoshTriplett wrote:
          | I do wonder to what degree we could hard-partition caches, such
          | that speculative prefetches go straight into CPU-specific
          | caches and don't get to go into shared caches (e.g. L3) until
          | they stop being speculative.
         | 
         | I also wonder to what degree we could partition between user
         | mode and supervisor mode (and provide similar facilities to
         | partition between user-mode and user-mode-sandbox, such as
         | WebAssembly or other JITs), with the same premise. Let the
         | kernel prefetch things but don't let userspace notice the
         | speculated entries.
        
           | insanitybit wrote:
            | > I do wonder to what degree we could hard-partition caches,
            | such that speculative prefetches go straight into CPU-
            | specific caches and don't get to go into shared caches (e.g.
            | L3) until they stop being speculative.
           | 
            | This is already sort of possible. TLB flushing can take
            | advantage of PCIDs to decide, per process, whether entries
            | actually have to be flushed - this provides _process level_
            | isolation of the TLB.
           | 
            | I believe recent CPUs are increasing the size of some PCID-
            | related structures, since PCIDs are becoming increasingly
            | important post-KPTI.
        
           | matu3ba wrote:
            | This sounds like a huge slowdown for linear memory access
            | patterns (max. throughput) which do not fit into the L1+L2
            | caches. I don't see options to prevent L3 cache timing
            | behavior from being leaked unless one takes performance cuts
            | for such memory access patterns.
           | 
            | The only option I do see is marking specific, limited regions
            | of memory as not allowed into the L3 cache at all.
           | 
            | Since you're active and interested in this stuff: what is the
            | state of the art on cpusets for flexible task pinning on
            | cores? "Note. There is a minor chance that a task forks
            | during move and its child remains in the root cpuset." is
            | mentioned in the SUSE docs at
            | https://documentation.suse.com/sle-rt/12-SP5/single-html/SLE...
            | but without any background explanation, and I do not
            | understand what breaks when moving pinned kernel tasks.
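            | 
            | For concreteness, the kind of cpuset pinning I mean (cgroup
            | v2, assuming it is mounted at /sys/fs/cgroup; the group name
            | and core numbers are only examples):
            | 
            |   # let child groups use the cpuset controller
            |   echo +cpuset > /sys/fs/cgroup/cgroup.subtree_control
            | 
            |   mkdir /sys/fs/cgroup/isolated
            |   echo 2-3 > /sys/fs/cgroup/isolated/cpuset.cpus
            |   echo 0   > /sys/fs/cgroup/isolated/cpuset.mems
            | 
            |   # move the current shell (and its future children) there
            |   echo $$ > /sys/fs/cgroup/isolated/cgroup.procs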
        
         | c-linkage wrote:
         | I am in no way an expert on CPU design, but one way this might
         | be possible is to use "memory tagging" in the sense that CPU
         | execution pipelines are extended through the CPU cache (and
         | possibly into RAM itself) by "tags" that link the state of a
         | cache cell to a branch of speculative execution.
         | 
          | For example, a pre-fetch linked to a speculative execution
          | would be tagged with a CPU-specific speculative-execution
          | identifier such that the pre-fetched data would only be
          | accessible in that pipeline. If that speculative execution is
          | realized, then the tag would be updated (perhaps to zero?) to
          | show it was _actually_ executed and to make it visible to all
          | other CPUs and caches. In all other cases, the speculative
          | execution is abandoned and the tagged cache cells become marked
          | as available and undefined. Circuitry similar to register
          | renaming could be used to handle tagging in the caches, at the
          | cost of effectively _halving_ cache sizes.
         | 
          | In a more macro sense, imagine git branches that get merged
          | back into main. The speculative execution only occurs on the
          | branch. When the CPU realizes the prediction was good and
          | doesn't need to be rolled back, the branch is merged into the
          | trunk and becomes visible to all other systems having access to
          | the trunk.
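          | 
          | A toy model of the tag lifecycle I'm describing, just to make
          | it concrete (all names and fields are invented, not a real
          | design):
          | 
          |   #include <stdbool.h>
          |   #include <stddef.h>
          |   #include <stdint.h>
          | 
          |   #define COMMITTED 0u  /* tag 0 = architecturally visible */
          | 
          |   struct cache_line {
          |       uint64_t addr;
          |       uint32_t spec_tag; /* speculative branch that filled it */
          |       bool     valid;
          |   };
          | 
          |   /* Only committed lines, or lines owned by the branch that
          |    * is currently executing, may produce a hit. */
          |   static bool may_hit(const struct cache_line *l, uint32_t tag)
          |   {
          |       return l->valid &&
          |              (l->spec_tag == COMMITTED || l->spec_tag == tag);
          |   }
          | 
          |   /* Prediction was right: "merge the branch" - its lines
          |    * become part of the architectural cache state. */
          |   static void commit(struct cache_line *c, size_t n,
          |                      uint32_t tag)
          |   {
          |       for (size_t i = 0; i < n; i++)
          |           if (c[i].spec_tag == tag)
          |               c[i].spec_tag = COMMITTED;
          |   }
          | 
          |   /* Misprediction: squash - the lines vanish as if they had
          |    * never been fetched. */
          |   static void squash(struct cache_line *c, size_t n,
          |                      uint32_t tag)
          |   {
          |       for (size_t i = 0; i < n; i++)
          |           if (c[i].spec_tag == tag)
          |               c[i].valid = false;
          |   }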
        
           | greatfilter251 wrote:
           | [dead]
        
         | kirse wrote:
         | _Imagine being able to simultaneously visit 4 retail stores and
         | dynamically select items depending on availability and pricing,
         | arriving back home having spent the amount of time it takes to
         | shop at 1.25 stores while burning 1.5x the fuel of a one-store
         | trip._
         | 
         | What a fantastic analogy.
        
       | kmeisthax wrote:
       | This idea seems so simple that I'm pretty sure at least three
       | people have independently thought of the same idea when reading
       | about branch delay slots. I suspect some of the more strict
       | aspects of basic block enforcement would also at least frustrate
       | some ROP attacks.
       | 
        | All in all, it seems like a good idea - when can we staple this
        | onto existing processors?
        
       | [deleted]
        
       | shiftingleft wrote:
       | Previous discussions:
       | 
       | https://news.ycombinator.com/item?id=24090632
       | 
       | https://news.ycombinator.com/item?id=34202099
        
         | CalChris wrote:
         | This is a new version (v2) of their paper.
        
       | Joel_Mckay wrote:
       | Clock domain crossing is a named problem with a finite set of
       | solutions.
       | 
        | When you get out of abstract logical domains into real-world
        | physics, then the notion of machine state becomes fuzzier at
        | higher clock speeds.
       | 
       | Good luck. =)
        
       | phoe-krk wrote:
       | (2021)
        
         | dang wrote:
         | Added. Thanks!
        
         | ChrisArchitect wrote:
         | (2020) even
        
       ___________________________________________________________________
       (page generated 2023-07-26 23:00 UTC)