[HN Gopher] Lord of the Ring(s): Side Channel Attacks on the CPU...
       ___________________________________________________________________
        
       Lord of the Ring(s): Side Channel Attacks on the CPU On-Chip Ring
       Interconnect
        
       Author : nixgeek
       Score  : 179 points
       Date   : 2021-03-08 03:55 UTC (19 hours ago)
        
 (HTM) web link (arxiv.org)
 (TXT) w3m dump (arxiv.org)
        
       | thu2111 wrote:
       | I've read a lot of side channel papers like this one over the
       | past few years. Here are some thoughts.
       | 
       | Firstly, this new technique is probably not exploitable against
       | 'real' cryptographic software. A very common technique in these
       | papers that I see all the time these days is they attack obsolete
       | versions of libgcrypt, because old versions of this relatively
       | obscure crypto library aren't using constant time code. All
       | modern and patched crypto implementations that people actually
       | use would _not_ leak using this technique, as the paper admits at
       | the end. And even libgcrypt was patched years ago. The version
       | number cited in the papers never makes this clear - you have to
       | look up the release history to realise this.
       | 
       | Secondly, this paper is a bit unusual in the sheer number of
       | steps that are simulated or otherwise simplified by e.g.
       | requiring root. It's got some work to do before it's usable
       | outside of lab condition demonstrations.
       | 
       | OK, so the crypto attack is theoretical and wouldn't work in real
       | life, but what about the ability to extract passwords from typing
       | patterns? Well, if we read that part of the paper carefully we
       | can see a rather massive caveat: all it takes is 2 threads doing
       | stuff in the background and the signal is drowned in noise. 4
       | threads doing stuff cause the signal to be entirely lost. So
       | there seems to be a simple mitigation and it's unclear that this
       | would work at all on a server that's under even moderate load.
       | 
       | Moreover, whilst a casual reader may get the impression they can
       | extract passwords, they don't actually demonstrate that. Rather,
       | they demonstrate what they claim is "a very distinguishable
       | pattern" in ring contention triggered by keystrokes, with "zero
       | false positives and zero false negatives". That sounds impressive
       | but no information is given on what they're comparing against:
       | what were the other events that were being tested here? The
       | obvious question in my mind is how much of the signal they're
       | seeing came from the actual keystroke vs output being printed to
       | a terminal emulator, in which case, I'd expect to see potential
       | FPs caused by any printing to the terminal. Their victim program
       | doesn't merely monitor keystrokes but they are also echoed to the
       | terminal - which is _not_ what happens during password input,
       | where characters aren't visible. This mock victim program is not
       | much like a real password input program as a consequence.
       | 
       | Overall it's a clever paper, but I find myself increasingly
       | fatigued by this genre of research. They seem to have settled
       | into a template:
       | 
       | * Attack Intel, ignore everything else.
       | 
       | * Make grand and scary sounding claims
       | 
       | * Only demonstrate them against deliberately crippled victim
       | programs in very artificial conditions.
       | 
       | It's been a few years now and I don't think any Spectre attack
       | has ever been spotted in the wild, mounted by real attackers.
       | That's true despite a huge state-sponsored attack having just
       | been detected, that Microsoft claim might have had over 1000
       | developers work on it. Uarch side channel attacks sound cool but
       | it seems real attackers either can't make them work, or have
       | easier ways to get what they want. I find myself losing interest
       | as a consequence.
        
         | Flocular wrote:
         | I, for one, am mostly preoccupied with the covert channel. That
         | one sounds like it's pretty real.
        
         | theonlyklas wrote:
         | > _It's been a few years now and I don't think any Spectre
         | attack has ever been spotted in the wild_
         | 
         | https://dustri.org/b/spectre-exploits-in-the-wild.html
         | 
         | However, I agree on your points. I just hope it makes Intel's
         | share price dip so I can make more money.
        
         | ricpacca wrote:
         | I am an author of the paper. Great question on the signal of
         | the actual keystroke vs output being printed to a terminal. We
         | did test that, and the attack works perfectly also without
         | ECHOing characters in the terminal emulator (i.e., printing to
         | the terminal is not a requirement).
        
         | adrian_b wrote:
         | I agree with most of what you said, but about a Spectre attack
         | having never been spotted in the wild, that seems to be no
         | longer true.
         | 
         | Just a few days ago there was some news about the discovery of
         | the first real malware that had exploited a Spectre variant.
         | 
         | Unfortunately I do not remember where I saw this, but it was
         | said that the Spectre-based malware might have spread for a few
         | months before being identified recently.
        
       | angry_octet wrote:
       | This is a great paper, really well explained. The source code:
       | https://github.com/FPSG-UIUC/lotr
       | 
       | The TL;DR is that the L3 cache (LLC: Last Level Cache) is shared
       | between all cores on the chip, but the L3 is composed of a number
       | of _slices_ colocated with each core: the L3 is actually CC-NUMA!
       | There is contention when reading/writing to the L3 via the ring
       | interconnect, which can be used to identify the memory address
       | access patterns of the cache. There's a bit more covering how the
       | L1/L2 caches, private to each core, are revealed via the set
       | inclusivity property of Intel's cache design.
       | 
       | In monitoring LLC it is similar to
       | https://eprint.iacr.org/2015/898.pdf though it doesn't reference
       | them.
       | 
       | It isn't a game over attack but it reinforces how sharing
       | resources (cache, memory, cores) is a bad idea when that crosses
       | a security boundary. VM exclusive cache regions (Intel CAT cache
       | partitioning) and cache locking should help, but cache slice
       | partitioning is reqd too.
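
The contention idea in the comment above can be illustrated with a toy model. This is only a sketch under simplifying assumptions that are not taken from the paper: a ring of 8 stops, shortest-direction routing, and two flows counted as contending whenever they share a directed link. The names `ring_path` and `contends` are hypothetical.

```python
def ring_path(src, dst, n_stops):
    """Directed links a message crosses from src to dst, taking the
    shorter direction around the ring (ties go clockwise)."""
    if src == dst:
        return []
    clockwise = (dst - src) % n_stops
    counter = (src - dst) % n_stops
    step = 1 if clockwise <= counter else -1
    links = []
    node = src
    while node != dst:
        nxt = (node + step) % n_stops
        links.append((node, nxt))
        node = nxt
    return links


def contends(flow_a, flow_b, n_stops=8):
    """Toy rule (an assumption): two flows contend if they share at
    least one link in the same direction."""
    links_a = set(ring_path(flow_a[0], flow_a[1], n_stops))
    links_b = set(ring_path(flow_b[0], flow_b[1], n_stops))
    return bool(links_a & links_b)


if __name__ == "__main__":
    print(contends((0, 3), (1, 2)))  # overlapping segment, same direction
    print(contends((0, 3), (3, 0)))  # same segment, opposite directions
```

In this model a receiver that times its own traffic over a segment would see a delay only when a sender's traffic crosses that segment in the same direction, which is the kind of signal a contention-based channel relies on.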
        
         | db48x wrote:
         | We should just stop running multiple programs on the same
         | hardware.
        
           | elihu wrote:
           | Alternatively: maybe eliminating observable side-effects from
           | cache behavior by restricting accurate timing information to
           | privileged processes is the way to go. Though I imagine
           | that's a lot easier done at the level of, say, a Javascript
           | interpreter than at the level of raw machine code. There's
           | probably a lot of indirect ways of figuring out how long
           | something took when you can run arbitrary instructions and/or
           | you're allowed to run multiple threads that share memory.
           | Even being able to launch two threads and find out which one
           | finished first can be used to construct a crude stopwatch.
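
The "which thread finished first" stopwatch mentioned above can be sketched roughly as follows. This is a minimal illustration, not an attack: `finishes_first` and `crude_duration` are hypothetical names, and `time.sleep` stands in for calibrated reference work that an attacker without a clock would have to use instead.

```python
import threading
import time


def finishes_first(task, ref):
    """Run task and ref in two threads; return 'task' or 'ref',
    whichever records its completion first."""
    order = []
    lock = threading.Lock()

    def run(name, fn):
        fn()
        with lock:
            order.append(name)

    threads = [
        threading.Thread(target=run, args=("task", task)),
        threading.Thread(target=run, args=("ref", ref)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return order[0]


def crude_duration(task, references):
    """Return the smallest reference duration (seconds) that the task
    beats, or None if it loses every race. No clock is read here."""
    for ref_seconds in sorted(references):
        winner = finishes_first(task, lambda: time.sleep(ref_seconds))
        if winner == "task":
            return ref_seconds
    return None
```

Racing the same task against a ladder of reference durations brackets its running time using only ordering information, which is why hiding timers alone is hard to make airtight.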
        
             | the8472 wrote:
             | > or you're allowed to run multiple threads that share
             | memory.
             | 
             | Indeed, you can always spawn a helper thread that does
             | nothing but incrementing a counter on shared memory and use
             | that as clock substitute. That's why browsers disabled
             | shared arrays after spectre was revealed.
             | 
             | So this isn't practical for any programs that need fine-
             | grained parallelism.
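
The helper-thread clock described above looks roughly like this. It is a sketch with hypothetical names (`CounterClock`, `measure`). In CPython the global interpreter lock makes the tick rate coarse and uneven, so this shows the shape of the technique rather than a high-resolution timer.

```python
import threading
import time


class CounterClock:
    """Clock substitute: a helper thread does nothing but increment a
    counter that the measuring thread can read."""

    def __init__(self):
        self.ticks = 0
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._spin, daemon=True)

    def _spin(self):
        # Tight loop: the counter value is the only notion of "time".
        while not self._stop.is_set():
            self.ticks += 1

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()


def measure(clock, fn):
    """Return how many ticks elapsed while fn ran."""
    before = clock.ticks
    fn()
    return clock.ticks - before


if __name__ == "__main__":
    clock = CounterClock()
    clock.start()
    print("short:", measure(clock, lambda: time.sleep(0.02)))
    print("long: ", measure(clock, lambda: time.sleep(0.3)))
    clock.stop()
```

Only shared memory and a second thread are needed, which is the reason removing explicit timer APIs does not remove the timing source.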
        
             | db48x wrote:
             | Yea, that's a good way to go in the medium term. I like the
             | Mill design, where every fetch from memory can return NaR
             | (Not a Result, which is a little like a floating-point
             | NaN), and all operations on a NaR take the same amount of
             | time as normal but pass the NaR through. Also, they made
             | sure to put the memory protection _before_ the cache, so
             | when you don't have access to the memory you get the NaR
             | before it even consults the cache. Looks like there are
             | some nice performance benefits too.
             | 
             | In the long term though? Who knows. Separate
             | cpu+cache+memory hardware for every process seems a little
             | crazy, but the alternatives might be worse.
        
           | twic wrote:
           | And for additional security, reduce the program count by one
           | beyond that.
        
           | izacus wrote:
           | Original iOS and iPads were right - you should only be able
           | to see a single piece of software and everything else should
           | stop. This will make you safe and secure.
        
           | adgjlsfhk1 wrote:
           | it seems like it might be totally feasible to make a CPU
           | where one core runs all the Ring 0 stuff, and the rest don't.
        
             | toast0 wrote:
             | You don't really need a CPU to do that. Just operating
             | system decisions.
             | 
             | a) route all hardware interrupts to CPU 0 only
             | 
             | b) when a user thread does a syscall, immediately task
             | switch to another thread, marking the current thread as
             | entering/in the kernel. Only service threads entering/in
             | the kernel on CPU 0.
             | 
             | Technically, you need a little bit of ring 0 time to task
             | switch. But if you make all of the syscalls amazingly slow,
             | timing attacks are a lot harder.
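
Part of the scheme above can be approximated from user space on Linux. This is a hedged sketch: it covers only step (a) and keeping user processes off CPU 0. Step (b), servicing syscalls only on CPU 0, would need kernel changes and is not shown. The helper names are hypothetical. `os.sched_getaffinity` and `os.sched_setaffinity` are real Linux-only calls, and writing to `/proc/irq/<n>/smp_affinity` requires root.

```python
import os


def cpu_mask_hex(cpus):
    """Hex bitmask with bit i set for each CPU i. This is the format
    /proc/irq/<n>/smp_affinity expects on machines with up to 32 CPUs
    (larger machines use comma-separated 32-bit groups, not handled)."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return format(mask, "x")


def pin_away_from_cpu0(pid=0):
    """Restrict a process (0 = the caller) to every CPU it may
    currently use except CPU 0. Linux only."""
    allowed = os.sched_getaffinity(pid) - {0}
    if allowed:  # never leave the process with an empty CPU set
        os.sched_setaffinity(pid, allowed)
    return allowed


if __name__ == "__main__":
    # Mask an administrator would write for each IRQ to route it to CPU 0.
    print("IRQ mask for CPU 0:", cpu_mask_hex([0]))
```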
        
               | angry_octet wrote:
               | There are some patches out there that allow you to lock
               | kernel threads to particular cores.
               | 
               | With all the cache flushes on context switch, making
               | syscalls slow should be no problem at all.
        
               | samus wrote:
               | I like that idea!
               | 
               | There's already a large slowdown because of communication
               | and NUMA. The syscall arguments have to be transferred to
               | CPU 0 memory as well.
        
             | elihu wrote:
             | I just skimmed a little bit of the paper so I'm probably
             | missing a lot of context, but would restricting to one core
             | even help in this instance? Information is leaking via the
             | L3 cache, which is shared.
        
           | 01100011 wrote:
           | Perhaps we should stop running multiple programs on the same
           | hardware at the same time while making sure information isn't
           | leaked between domains?
           | 
           | If the future provides us with large but thermally limited
           | dies, I could see a scenario where CPU components are
           | duplicated but not all running at once. A thread or set of
           | threads could own a hierarchy of caches, for example. Caches
           | are die-intensive, so I don't see this happening soon, but
           | maybe at some point in the future.
        
         | est31 wrote:
         | > VM exclusive cache regions and cache locking should help, but
         | cache slice partitioning is reqd too.
         | 
         | Note that in this instance the shared region is the _bus_ used
         | to access the cache. Reserving a cache region won't help with
         | that. One would have to reserve time slices on the bus.
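
Reserving time slices on a shared bus can be sketched as simple time-division arbitration. This is a toy model with hypothetical names and parameters, not a description of any real interconnect: each security domain owns a fixed, repeating slot and may inject traffic only during it.

```python
def next_slot(cycle, domain, n_domains, slot_len=1):
    """First cycle >= `cycle` at which `domain` owns the bus, when
    domains take turns in fixed slots of `slot_len` cycles."""
    frame = n_domains * slot_len      # length of one full rotation
    start = domain * slot_len         # offset of this domain's slot
    pos = cycle % frame
    if start <= pos < start + slot_len:
        return cycle                  # already inside our own slot
    return cycle + (start - pos) % frame


if __name__ == "__main__":
    for cycle in range(6):
        print(cycle, next_slot(cycle, domain=1, n_domains=4))
```

In this model the wait depends only on the current cycle and the domain's own slot, never on how much traffic other domains send, which is the property that removes the contention signal. The price is idle slots whenever a domain has nothing to send.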
        
           | angry_octet wrote:
           | But if the cache is in a different set, and each security
           | partition has exclusive access to particular slices, you
           | can't get contention because access is forbidden.
           | 
           | Also, time slicing might not be sufficient due to the queues
           | in each LLC slice. If you can fill the queue the bus access
           | will retry I guess.
        
       | marcodiego wrote:
       | Can it be used to circumvent IME?
        
       | itcrowd wrote:
       | Why do authors use such silly titles as "Lord of the Rings"?
       | There is no mention of it anywhere in the paper besides the
       | title, it just looks childish (to me). Just use the second part of
       | the title as the paper's title, dump the LoTR thing.
        
         | zzzzzzzza wrote:
         | strong disagree, it's more memorable
        
       | maxtaco wrote:
       | Note that the implementation of EdDSA that the authors
       | investigated (libgcrypt) is not a constant-time implementation.
       | Better implementations are more likely to be safe.
       | 
       | See: https://news.ycombinator.com/item?id=21352821
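
What "constant-time" means for control flow can be illustrated with a small sketch. Modular exponentiation stands in here for the elliptic-curve scalar multiplication that EdDSA actually uses, and the function names are hypothetical. Python big-integer arithmetic is not constant time, so this shows only the shape of the code, not a safe implementation.

```python
def ct_select(bit, a, b):
    """Return a if bit == 1 else b, without branching on bit."""
    mask = -bit  # 0 -> 0, 1 -> -1 (all ones)
    return (a & mask) | (b & ~mask)


def modexp_branchy(base, exp, mod, bits):
    """Square-and-multiply with a secret-dependent branch: the multiply
    (and its memory traffic) happens only for 1 bits of exp."""
    result = 1
    for i in reversed(range(bits)):
        result = (result * result) % mod
        if (exp >> i) & 1:
            result = (result * base) % mod
    return result


def modexp_uniform(base, exp, mod, bits):
    """Same result, but every iteration does the same work and the
    secret bit only selects which value to keep."""
    result = 1
    for i in reversed(range(bits)):
        result = (result * result) % mod
        multiplied = (result * base) % mod  # always computed
        result = ct_select((exp >> i) & 1, multiplied, result)
    return result
```

The branchy version performs a different sequence of operations depending on the key bits, which is the kind of pattern a contention or cache side channel can observe. The uniform version performs the same sequence for every key.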
        
       | molticrystal wrote:
       | >Finally, AMD CPUs utilize other proprietary technologies known
       | as Infinity Fabric/Architecture for their on-chip interconnect.
       | Investigating the feasibility of our attack on these platforms
       | requires future work. However, the techniques we use to build our
       | contention model can be applied on these platforms too.
       | 
       | I notice often that when coverage of these side-channels starts
       | spreading Intel takes a hit and AMD thrives. While the majority
       | of these side-channel attacks are tuned to Intel chips, the
       | theory behind them along with a decent amount of work can often
       | make them applicable to AMD as well, and in some cases even other
       | architectures (ARM, MIPS, SPARC, etc).
        
         | londons_explore wrote:
           | Intel CPUs are more widely deployed, especially on the types
         | of systems under more attack. That's why attacks are designed
         | for them.
        
           | orclev wrote:
           | Considering the shift to AMD from Intel that has happened
           | over the last couple years it will be interesting to see if
           | we'll see AMD as more of a priority target in the future. AMD
           | has certainly been leading in sales in most demographics for
           | the last year or two, and there's little sign of Intel
           | closing that gap. AMD has even made some moves recently that
           | indicate some desire to chase Intel out of the lead in the
           | few niches they still control, like the x86 low power/cost
           | device market that has been dominated by atom/celeron.
        
         | josephg wrote:
         | Yep. I wish I could remember the details, but in one of the
         | episodes of the On The Metal podcast they talked to someone who
         | backported meltdown/spectre to a bunch of CPUs which are
         | decades old, or really exotic.
         | 
         | I don't know why, but there's something really surprising to me
         | about how far reaching the issues are, when you pull them back
         | to first principles.
        
           | chr15p wrote:
           | That was Jon Masters who led the technical Spectre/Meltdown
           | response for Red Hat, and was talking about porting it to
           | SPARC, and Itanium, and maybe others as well
           | 
           | https://oxide.computer/podcast/on-the-metal-8-jon-masters/
        
           | p_l wrote:
           | A big difference between impact on Intel and AMD was due to
           | only part of the recent attacks being a new class opening new
           | ground - namely the generic speculative execution part.
           | 
           | A big group that turned it into "Intel fail" was due to
           | various errors on Intel part, like moving verification of
           | memory accesses to instruction retirement and similar things
           | that even to uneducated person like me look like "tricks to
           | make single-core performance shine".
        
             | thu2111 wrote:
             | Those weren't Intel fails - no CPU manufacturer had any
             | kind of generic rule against that sort of
             | microarchitectural optimisation and it's hard to imagine
             | what sort of rule would have blocked such optimisations by
             | policy which wouldn't also rule out all speculative
             | execution. The only reason AMD wasn't hit by Meltdown too
             | is their cores are/were less optimised.
        
               | p_l wrote:
               | Moving memory access verification to instruction
               | retirement is the kind of risky optimization one should
               | be wary of immediately, even if spectre and the like weren't
               | yet known. If only because the risks if you get it wrong
               | are that much higher.
        
               | rrss wrote:
               | Ok, but "Intel fail" indicates meltdown was specific to
               | Intel processors, when it was also present on IBM and
               | some Arm designs.
        
               | p_l wrote:
               | I specifically separated the "new class of
               | microarchitecture timing attacks", which was common
               | across many systems (including AMD, POWER and ARM), from
               | "Intel fail" where too aggressive optimization tricks did
               | turn out to be bad idea.
        
               | tedunangst wrote:
               | What specific optimization are you referring to? The one you
               | mentioned, moving access auth checks to retirement, was
               | something both ARM and POWER did.
        
               | jeffbee wrote:
               | "Too aggressive" is just, like, your opinion, man. There
               | are tons of workloads where only maximum throughput or
               | lowest latency are important and the operators of those
               | systems don't care one jot about side channel attacks.
        
               | wizzwizz4 wrote:
               | And PC / general-purpose CPUs aren't such a workload.
        
               | csharptwdec19 wrote:
               | > The only reason AMD wasn't hit by Meltdown too is their
               | cores are/were less optimised.
               | 
               | That doesn't answer whether they were less optimized
               | because someone on the Red team realized that was a good
               | idea.
        
         | Klwohu wrote:
         | Turns out exploiting these issues seems to be easier on Intel.
         | Perhaps since it's so dominant, Intel is where the majority of
         | the research is focused. But perhaps, the huge architectural
         | differences which made Intel more vulnerable to more Spectre
         | variants are responsible. It's clear Intel has been taking many
         | shortcuts with its CPU design in favor of speed and has been
         | for decades.
        
         | afrcnc wrote:
         | Because AMD has a tiny market share. Trust me, if AMD would be
         | on top, it would be riddled with holes, just like Intel
        
           | neogodless wrote:
           | https://www.techspot.com/news/87436-amd-chipping-away-
           | intel-...
           | 
           | After checking a few sites, it looks like AMD may have
           | between 20 and 37% of PC market share. I do not consider that
           | "tiny."
        
             | thu2111 wrote:
             | It's true. I've read one side channel attack paper that
             | admitted towards the end that the researchers didn't
             | actually even have access to AMD hardware to test with, but
             | thought it should work in principle.
             | 
             | Fact is Intel takes the brunt of these attacks because
             | their HW is more available especially in semi-standardised
             | environments like universities. Also, Intel fund
             | researchers (this paper was part funded by Intel), and
             | their technical documentation is better than AMD's, at least
             | in my experience. That makes it easier to understand the
             | CPU internals, which is a big part of these papers.
        
             | wolrah wrote:
             | In the server market Intel had 92.9% for Q4 2020. AMD is
             | actually up a lot in that market, Intel had 95.5% in 2019,
             | but outside of a few specific use cases Intel still owns
             | the market most likely to contain interesting information
             | an attacker might want.
             | 
             | This attack as it stands does not seem to apply to Intel
             | server hardware from Skylake to present due to a different
             | interconnect architecture. The researchers indicate a belief
             | that the attack may be portable, though it would also be
             | more limited.
             | 
             | Complete rectal estimation here, but I'd imagine executive
             | laptops to probably be the next juiciest target as the end
             | user station most likely to contain "interesting" data.
             | While AMD has been making incredible inroads in the laptop
             | market with Zen 2 and especially Zen 3, as opposed to the
             | Bulldozer era when seeing an AMD sticker on a laptop let
             | you know it was the cheapest one on the shelf, they still
             | haven't made it to the models commonly bought by
             | businesses.
             | 
             | AMD is absolutely killing it in the DIY desktop space,
             | deservedly so (this post typed from a 3900X that I love),
             | but on the OEM side of things you still generally have to
             | go out of your way to find them in anything not marketed at
             | gamers.
        
           | [deleted]
        
       ___________________________________________________________________
       (page generated 2021-03-08 23:03 UTC)