[HN Gopher] Lord of the Ring(s): Side Channel Attacks on the CPU...
___________________________________________________________________
Lord of the Ring(s): Side Channel Attacks on the CPU On-Chip Ring
Interconnect
Author : nixgeek
Score : 179 points
Date : 2021-03-08 03:55 UTC (19 hours ago)
(HTM) web link (arxiv.org)
| thu2111 wrote:
| I've read a lot of side channel papers like this one over the
| past few years. Here are some thoughts.
|
| Firstly, this new technique is probably not exploitable against
| 'real' cryptographic software. A very common technique in these
| papers that I see all the time these days is they attack obsolete
| versions of libgcrypt, because old versions of this relatively
| obscure crypto library aren't using constant time code. All
| modern and patched crypto implementations that people actually
| use would _not_ leak using this technique, as the paper admits at
| the end. And even libgcrypt was patched years ago. The version
| number cited in the papers never makes this clear - you have to
| look up the release history to realise this.
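
To make "constant time" concrete, here is a minimal sketch that contrasts an early-exit comparison with one whose running time is designed not to depend on where the inputs differ. It is a simpler analogue of the secret-dependent control flow discussed above, not the libgcrypt code path from the paper. The names `leaky_equal` and `constant_time_equal` are hypothetical; `hmac.compare_digest` is the standard-library call.

```python
import hmac


def leaky_equal(a: bytes, b: bytes) -> bool:
    """Early-exit comparison: running time depends on the first mismatching byte."""
    if len(a) != len(b):
        return False
    for x, y in zip(a, b):
        if x != y:
            return False  # Returns sooner the earlier the mismatch occurs.
    return True


def constant_time_equal(a: bytes, b: bytes) -> bool:
    """Comparison designed not to reveal where the inputs differ."""
    return hmac.compare_digest(a, b)


secret = b"correct horse battery"
guess = b"correct horse batterz"
print(leaky_equal(secret, guess), constant_time_equal(secret, guess))
```

Both functions return the same answers; only the second avoids branching on secret data, which is the property a side channel like this one would otherwise exploit.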
|
| Secondly, this paper is a bit unusual in the sheer number of
| steps that are simulated or otherwise simplified by e.g.
| requiring root. It's got some work to do before it's usable
| outside of lab condition demonstrations.
|
| OK, so the crypto attack is theoretical and wouldn't work in real
| life, but what about the ability to extract passwords from typing
| patterns? Well, if we read that part of the paper carefully we
| can see a rather massive caveat: all it takes is 2 threads doing
| stuff in the background and the signal is drowned in noise. 4
| threads doing stuff cause the signal to be entirely lost. So
| there seems to be a simple mitigation and it's unclear that this
| would work at all on a server that's under even moderate load.
|
| Moreover, whilst a casual reader may get the impression they can
| extract passwords, they don't actually demonstrate that. Rather,
| they demonstrate what they claim is "a very distinguishable
| pattern" in ring contention triggered by keystrokes, with "zero
| false positives and zero false negatives". That sounds impressive
| but no information is given on what they're comparing against:
| what were the other events that were being tested here? The
| obvious question in my mind is how much of the signal they're
| seeing came from the actual keystroke vs output being printed to
| a terminal emulator, in which case, I'd expect to see potential
| FPs caused by any printing to the terminal. Their victim program
| doesn't merely monitor keystrokes but they are also echoed to the
| terminal - which is _not_ what happens during password input,
| where characters aren't visible. This mock victim program is not
| much like a real password input program as a consequence.
|
| Overall it's a clever paper, but I find myself increasingly
| fatigued by this genre of research. They seem to have settled
| into a template:
|
| * Attack Intel, ignore everything else.
|
| * Make grand and scary sounding claims
|
| * Only demonstrate them against deliberately crippled victim
| programs in very artificial conditions.
|
| It's been a few years now and I don't think any Spectre attack
| has ever been spotted in the wild, mounted by real attackers.
| That's true despite a huge state-sponsored attack having just
| been detected, that Microsoft claim might have had over 1000
| developers work on it. Uarch side channel attacks sound cool but
| it seems real attackers either can't make them work, or have
| easier ways to get what they want. I find myself losing interest
| as a consequence.
| Flocular wrote:
| I, for one, am mostly preoccupied with the covert channel. That
| one sounds like it's pretty real.
| theonlyklas wrote:
| > _It's been a few years now and I don't think any Spectre
| attack has ever been spotted in the wild_
|
| https://dustri.org/b/spectre-exploits-in-the-wild.html
|
| However, I agree on your points. I just hope it makes Intel's
| share price dip so I can make more money.
| ricpacca wrote:
| I am an author of the paper. Great question on the signal of
| the actual keystroke vs output being printed to a terminal. We
| did test that, and the attack works perfectly also without
| ECHOing characters in the terminal emulator (i.e., printing to
| the terminal is not a requirement).
| adrian_b wrote:
| I agree with most of what you said, but about a Spectre attack
| having never been spotted in the wild, that seems to be no
| longer true.
|
| Just a few days ago there was some news about the discovery of
| the first real malware that had exploited a Spectre variant.
|
| Unfortunately I do not remember where I saw this, but it was
| said that the Spectre-based malware might have spread for a few
| months before being identified recently.
| angry_octet wrote:
| This is a great paper, really well explained. The source code:
| https://github.com/FPSG-UIUC/lotr
|
| The TL;DR is that the L3 cache (LLC: Last Level Cache) is shared
| between all cores on the chip, but the L3 is composed of a number
| of _slices_ colocated with each core: the L3 is actually CC-NUMA!
| There is contention when reading/writing to the L3 via the ring
| interconnect, which can be used to identify the memory access
| patterns of the cache. There's a bit more covering how the
| L1/L2 caches, private to each core, are revealed via the set
| inclusivity property of Intel's cache design.
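
The idea of contention on a shared ring can be illustrated with a toy model. This is a sketch under loud assumptions, not the contention model measured in the paper: it assumes each flow takes the shortest way around the ring, that ties go clockwise, and that two flows contend only when they use the same ring segment in the same direction. The names `ring_path` and `contends` are hypothetical.

```python
def ring_path(src, dst, n):
    """Return the set of (stop, direction) segments used going from src to dst.

    Assumption: traffic takes the shortest direction; ties go clockwise (+1).
    """
    clockwise_hops = (dst - src) % n
    counter_hops = (src - dst) % n
    if clockwise_hops <= counter_hops:
        step, hops = 1, clockwise_hops
    else:
        step, hops = -1, counter_hops
    segments = set()
    pos = src
    for _ in range(hops):
        segments.add((pos, step))  # Segment leaving `pos` in direction `step`.
        pos = (pos + step) % n
    return segments


def contends(flow_a, flow_b, n):
    """Two (src, dst) flows contend if they share a segment in the same direction."""
    return bool(ring_path(*flow_a, n) & ring_path(*flow_b, n))


N_STOPS = 8  # Hypothetical ring with 8 stops (one per core/slice pair).
print(contends((0, 3), (1, 2), N_STOPS))  # Overlapping clockwise segments.
print(contends((0, 3), (3, 0), N_STOPS))  # Same stops, opposite directions.
```

In this model a receiver timing its own loads along `(0, 3)` would see delays when a sender drives traffic along `(1, 2)`, which is the kind of signal a covert channel needs.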
|
| In monitoring LLC it is similar to
| https://eprint.iacr.org/2015/898.pdf though it doesn't reference
| them.
|
| It isn't a game over attack but it reinforces how sharing
| resources (cache, memory, cores) is a bad idea when that crosses
| a security boundary. VM exclusive cache regions (Intel CAT cache
| partitioning) and cache locking should help, but cache slice
| partitioning is reqd too.
| db48x wrote:
| We should just stop running multiple programs on the same
| hardware.
| elihu wrote:
| Alternatively: maybe eliminating observable side-effects from
| cache behavior by restricting accurate timing information to
| privileged processes is the way to go. Though I imagine
| that's a lot easier done at the level of, say, a Javascript
| interpreter than at the level of raw machine code. There's
| probably a lot of indirect ways of figuring out how long
| something took when you can run arbitrary instructions and/or
| you're allowed to run multiple threads that share memory.
| Even being able to launch two threads and find out which one
| finished first can be used to construct a crude stopwatch.
| the8472 wrote:
| > or you're allowed to run multiple threads that share
| memory.
|
| Indeed, you can always spawn a helper thread that does
| nothing but incrementing a counter on shared memory and use
| that as clock substitute. That's why browsers disabled
| shared arrays after spectre was revealed.
|
| So this isn't practical for any programs that need fine-
| grained parallelism.
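
The counter-thread clock described above can be sketched as follows. This is a rough illustration only: `CounterClock` and `measure` are hypothetical names, and in CPython the global interpreter lock makes the resolution far coarser than a native thread spinning on shared memory would be.

```python
import threading
import time


class CounterClock:
    """Crude clock: a helper thread increments a shared counter in a loop."""

    def __init__(self):
        self.ticks = 0
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        while not self._stop.is_set():
            self.ticks += 1  # Only this thread writes; readers just sample it.

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()


def measure(clock, fn):
    """Return how many ticks elapsed while fn() ran, without calling a real timer."""
    before = clock.ticks
    fn()
    return clock.ticks - before


clock = CounterClock()
clock.start()
short_ticks = measure(clock, lambda: time.sleep(0.02))
long_ticks = measure(clock, lambda: time.sleep(0.3))
clock.stop()
print(short_ticks, long_ticks)
```

The longer operation accumulates more ticks, so relative durations remain observable even when no timer API is consulted.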
| db48x wrote:
| Yea, that's a good way to go in the medium term. I like the
| Mill design, where every fetch from memory can return NaR
| (Not a Result, which is a little like a floating-point
| NaN), and all operations on a NaR take the same amount of
| time as normal but pass the NaR through. Also, they made
| sure to put the memory protection _before_ the cache, so
| when you don't have access to the memory you get the NaR
| before it even consults the cache. Looks like there are
| some nice performance benefits too.
|
| In the long term though? Who knows. Separate
| cpu+cache+memory hardware for every process seems a little
| crazy, but the alternatives might be worse.
| twic wrote:
| And for additional security, reduce the program count by one
| beyond that.
| izacus wrote:
| Original iOS and iPads were right - you should only be able
| to see a single piece of software and everything else should
| stop. This will make you safe and secure.
| adgjlsfhk1 wrote:
| it seems like it might be totally feasible to make a CPU
| where one core runs all the Ring 0 stuff, and the rest don't.
| toast0 wrote:
| You don't really need a CPU to do that. Just operating
| system decisions.
|
| a) route all hardware interrupts to CPU 0 only
|
| b) when a user thread does a syscall, immediately task
| switch to another thread, marking the current thread as
| entering/in the kernel. Only service threads entering/in
| the kernel on CPU 0.
|
| Technically, you need a little bit of ring 0 time to task
| switch. But if you make all of the syscalls amazingly slow,
| timing attacks are a lot harder.
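
A user-space approximation of keeping ordinary work off CPU 0 is to restrict a process's affinity mask. This is only a sketch: it does not move interrupts or syscall handling, which would need kernel-side changes as described above. `os.sched_getaffinity` and `os.sched_setaffinity` are real but Linux-only; the helper names `user_cpu_set` and `pin_away_from_kernel_cpu` are hypothetical.

```python
import os


def user_cpu_set(all_cpus, kernel_cpu=0):
    """Return the CPUs left for user threads once kernel_cpu is reserved.

    Falls back to the full set if reserving would leave nothing to run on.
    """
    remaining = set(all_cpus) - {kernel_cpu}
    return remaining if remaining else set(all_cpus)


def pin_away_from_kernel_cpu(kernel_cpu=0):
    """Restrict the calling process to every allowed CPU except kernel_cpu."""
    if not hasattr(os, "sched_setaffinity"):
        return None  # Affinity calls are unavailable on this platform.
    allowed = os.sched_getaffinity(0)  # 0 means "the calling process".
    target = user_cpu_set(allowed, kernel_cpu)
    os.sched_setaffinity(0, target)
    return target


if __name__ == "__main__":
    print(pin_away_from_kernel_cpu())
```

Since the L3 and its interconnect are still shared, this kind of pinning alone would not remove the contention discussed in the thread.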
| angry_octet wrote:
| There are some patches out there that allow you to lock
| kernel threads to particular cores.
|
| With all the cache flushes on context switch, making
| syscalls slow should be no problem at all.
| samus wrote:
| I like that idea!
|
| There's already a large slowdown because of communication
| and NUMA. The syscall arguments have to be transferred to
| CPU 0 memory as well.
| elihu wrote:
| I just skimmed a little bit of the paper so I'm probably
| missing a lot of context, but would restricting to one core
| even help in this instance? Information is leaking via the
| L3 cache, which is shared.
| 01100011 wrote:
| Perhaps we should stop running multiple programs on the same
| hardware at the same time while making sure information isn't
| leaked between domains?
|
| If the future provides us with large but thermally limited
| dies, I could see a scenario where CPU components are
| duplicated but not all running at once. A thread or set of
| threads could own a hierarchy of caches, for example. Caches
| are die-intensive, so I don't see this happening soon, but
| maybe at some point in the future.
| est31 wrote:
| > VM exclusive cache regions and cache locking should help, but
| cache slice partitioning is reqd too.
|
| Note that in this instance the shared region is the _bus_ used
| to access the cache. Reserving a cache region won't help with
| that. One would have to reserve time slices on the bus.
| angry_octet wrote:
| But if the cache is in a different set, and each security
| partition has exclusive access to particular slices, you
| can't get contention because access is forbidden.
|
| Also, time slicing might not be sufficient due to the queues
| in each LLC slice. If you can fill the queue the bus access
| will retry I guess.
| marcodiego wrote:
| Can it be used to circumvent IME?
| itcrowd wrote:
| Why do authors use such silly titles as "Lord of the Rings"?
| There is no mention of it anywhere in the paper besides the
| title, it just looks childish (to me). Just use the second part of
| the title as the paper's title, dump the LoTR thing.
| zzzzzzzza wrote:
| strong disagree, it's more memorable
| maxtaco wrote:
| Note that the implementation of EdDSA that the authors
| investigated (libgcrypt) is not a constant-time implementation.
| Better implementations are more likely to be safe.
|
| See: https://news.ycombinator.com/item?id=21352821
| molticrystal wrote:
| >Finally, AMD CPUs utilize other proprietary technologies known
| as Infinity Fabric/Architecture for their on-chip interconnect.
| Investigating the feasibility of our attack on these platforms
| requires future work. However, the techniques we use to build our
| contention model can be applied on these platforms too.
|
| I notice often that when coverage of these side-channels starts
| spreading Intel takes a hit and AMD thrives. While the majority
| of these side-channel attacks are tuned to Intel chips, the
| theory behind them along with a decent amount of work can often
| make them applicable to AMD as well, and in some cases even other
| architectures (ARM, MIPS, SPARC, etc).
| londons_explore wrote:
| Intel CPUs are more widely deployed, especially on the types
| of systems under more attack. That's why attacks are designed
| for them.
| orclev wrote:
| Considering the shift to AMD from Intel that has happened
| over the last couple years it will be interesting to see if
| we'll see AMD as more of a priority target in the future. AMD
| has certainly been leading in sales in most demographics for
| the last year or two, and there's little sign of Intel
| closing that gap. AMD has even made some moves recently that
| indicate some desire to chase Intel out of the lead in the
| few niches they still control, like the x86 low power/cost
| device market that has been dominated by atom/celeron.
| josephg wrote:
| Yep. I wish I could remember the details, but in one of the
| episodes of the On The Metal podcast they talked to someone who
| backported meltdown/spectre to a bunch of CPUs which are
| decades old, or really exotic.
|
| I don't know why, but there's something really surprising to me
| about how far reaching the issues are, when you pull them back
| to first principles.
| chr15p wrote:
| That was Jon Masters who led the technical Spectre/Meltdown
| response for Red Hat, and was talking about porting it to
| SPARC, and Itanium, and maybe others as well
|
| https://oxide.computer/podcast/on-the-metal-8-jon-masters/
| p_l wrote:
| A big difference between impact on Intel and AMD was due to
| only part of the recent attacks being a new class opening new
| ground - namely the generic speculative execution part.
|
| A big group that turned it into "Intel fail" was due to
| various errors on Intel part, like moving verification of
| memory accesses to instruction retirement and similar things
| that even to uneducated person like me look like "tricks to
| make single-core performance shine".
| thu2111 wrote:
| Those weren't Intel fails - no CPU manufacturer had any
| kind of generic rule against that sort of
| microarchitectural optimisation and it's hard to imagine
| what sort of rule would have blocked such optimisations by
| policy which wouldn't also rule out all speculative
| execution. The only reason AMD wasn't hit by Meltdown too
| is their cores are/were less optimised.
| p_l wrote:
| Moving memory access verification to instruction
| retirement is the kind of risky optimization one should
| be wary of immediately, even if spectre and the like weren't
| yet known. If only because the risks if you get it wrong
| are that much higher.
| rrss wrote:
| Ok, but "Intel fail" indicates meltdown was specific to
| Intel processors, when it was also present on IBM and
| some Arm designs.
| p_l wrote:
| I specifically separated the "new class of
| microarchitecture timing attacks", which was common
| across many systems (including AMD, POWER and ARM), from
| "Intel fail" where too aggressive optimization tricks did
| turn out to be bad idea.
| tedunangst wrote:
| What specific optimization are you referring to? The one you
| mentioned, moving access auth checks to retirement, was
| something both ARM and POWER did.
| jeffbee wrote:
| "Too aggressive" is just, like, your opinion, man. There
| are tons of workloads where only maximum throughput or
| lowest latency are important and the operators of those
| systems don't care one jot about side channel attacks.
| wizzwizz4 wrote:
| And PC / general-purpose CPUs aren't such a workload.
| csharptwdec19 wrote:
| > The only reason AMD wasn't hit by Meltdown too is their
| cores are/were less optimised.
|
| That doesn't answer whether they were less optimized
| because someone on the Red team realized that was a good
| idea.
| Klwohu wrote:
| Turns out exploiting these issues seems to be easier on Intel.
| Perhaps since it's so dominant, Intel is where the majority of
| the research is focused. But perhaps, the huge architectural
| differences which made Intel more vulnerable to more Spectre
| variants are responsible. It's clear Intel has been taking many
| shortcuts with its CPU design in favor of speed and has been
| for decades.
| afrcnc wrote:
| Because AMD has a tiny market share. Trust me, if AMD would be
| on top, it would be riddled with holes, just like Intel
| neogodless wrote:
| https://www.techspot.com/news/87436-amd-chipping-away-
| intel-...
|
| After checking a few sites, it looks like AMD may have
| between 20 and 37% of PC market share. I do not consider that
| "tiny."
| thu2111 wrote:
| It's true. I've read one side channel attack paper that
| admitted towards the end that the researchers didn't
| actually even have access to AMD hardware to test with, but
| thought it should work in principle.
|
| Fact is Intel takes the brunt of these attacks because
| their HW is more available especially in semi-standardised
| environments like universities. Also, Intel fund
| researchers (this paper was part funded by Intel), and
| their technical documentation is better than AMD's, at least
| in my experience. That makes it easier to understand the
| CPU internals, which is a big part of these papers.
| wolrah wrote:
| In the server market Intel had 92.9% for Q4 2020. AMD is
| actually up a lot in that market, Intel had 95.5% in 2019,
| but outside of a few specific use cases Intel still owns
| the market most likely to contain interesting information
| an attacker might want.
|
| This attack as it stands does not seem to apply to Intel
| server hardware from Skylake to present due to a different
| interconnect architecture, though the researchers indicate
| a belief that the attack may be portable, albeit more
| limited.
|
| Complete rectal estimation here, but I'd imagine executive
| laptops to probably be the next juiciest target as the end
| user station most likely to contain "interesting" data.
| While AMD has been making incredible inroads in the laptop
| market with Zen 2 and especially Zen 3, as opposed to the
| Bulldozer era when seeing an AMD sticker on a laptop let
| you know it was the cheapest one on the shelf, they still
| haven't made it to the models commonly bought by
| businesses.
|
| AMD is absolutely killing it in the DIY desktop space,
| deservedly so (this post typed from a 3900X that I love),
| but on the OEM side of things you still generally have to
| go out of your way to find them in anything not marketed at
| gamers.
___________________________________________________________________
(page generated 2021-03-08 23:03 UTC)