[HN Gopher] GoFetch: New side-channel attack using data memory-d...
___________________________________________________________________
GoFetch: New side-channel attack using data memory-dependent
prefetchers
Author : kingsleyopara
Score : 139 points
Date : 2024-03-21 14:51 UTC (8 hours ago)
(HTM) web link (gofetch.fail)
(TXT) w3m dump (gofetch.fail)
| woadwarrior01 wrote:
| Reminded me of the Augury attack[1] from 2022, which also
| exploits the DMP prefetcher on Apple Silicon CPUs.
|
| [1]: https://www.prefetchers.info
| loeg wrote:
| Yes, they specifically mention that in the article and FAQ.
| Findecanor wrote:
| BTW. Three of the authors of GoFetch were also behind Augury.
| martinky24 wrote:
| Why does every attack need its own branding, marketing page,
| etc...? Genuine question.
| xena wrote:
| So people talk about it
| sapiogram wrote:
| Well, names are useful for the same reason people's names are
| useful. The rest just kinda happens naturally, I think.
| yborg wrote:
| Yes, it saves time vs. starting a discussion on "that crypto
| cache side-channel attack that one team in China found".
| martinky24 wrote:
| Name makes enough sense. "Branding, marketing page, etc..."
| was my question.
|
| "Happens naturally" isn't really an answer.
| ziddoap wrote:
| Is your position that any write-up about an attack must be
| plain text only, and must not use its own URL?
|
| I truly cannot understand why this is brought up so often.
| You aren't paying for it, it doesn't hurt you in any way,
| it detracts nothing from the findings (in fact, it makes
| the findings easier to discuss), etc. There is no downside
| I can think of.
|
| Can you share what the downsides of a picture of a puppy
| and a $5 domain are? Sorry, "branding" and "marketing
| page"?
|
| Or at least, maybe you can share what you think would be a
| preferable way?
| fruktmix wrote:
| It's science these days. They need funding; one way is to get
| people to recognize the importance of their work.
| modeless wrote:
| Science isn't just about discovering information. Dissemination
| is critical. Communicating ideas is just as important as
| discovering them and promotion is part of effective
| communication. It's natural and healthy for researchers to
| promote their ideas.
| FiloSottile wrote:
| Names are critical to enable discussion.
|
| The "marketing" page is where documentation is. Summaries that
| don't require reading a whole academic paper are a good thing,
| and they are the place where all the different links are
| collected. Same reason software has READMEs.
|
| Logos... are cute and take 10-60 minutes? If you spend months
| on some research might as well take the satisfaction of giving
| it a cute logo, why not.
| saagarjha wrote:
| Why do the comments of every such attack need a question
| about why it has its own branding, marketing page, etc...?
| Genuine question.
|
| (Seriously, this comes up every time, just do a search for it
| if you actually want to figure out why.)
| jerf wrote:
| As long as we're getting efficiency cores and such, maybe we need
| some "crypto cores" added to modern architectures, that make
| promises specifically related to constant-time algorithms like
| this and promise not to prefetch, branch predict, etc. Sort of
| like the Itanium, but confined to a "crypto processor". Given how
| many features these things _wouldn't_ have, the cores themselves
| wouldn't take much silicon, in principle.
|
| This is the sort of thing that would metaphorically drive me to
| drink if I were implementing crypto code. It's an uphill battle
| at the best of times, but even if I finally get it all right,
| there's dozens of processor features both current and future
| ready to blow my code up at any time.
| FiloSottile wrote:
| Speaking as a cryptography implementer, yes, these drive us up
| the wall.
|
| However, crypto coprocessors would be a tremendously disruptive
| solution: we'd need to build mountains of scaffolding to allow
| switching onto and off of these cores, and to share memory with
| them, etc.
|
| Even more critically, you can't just move the RSA
| multiplication to those cores and call it a day. The key is
| probably parsed from somewhere, right? Does the parser need to
| run on a crypto core? What if it comes over the network? And if
| you even manage to protect all the keys, what if a CPU side
| channel leaks the message you encrypted? Are you ok with it
| just because it's not a key? The only reason we don't see these
| attacks against non-crypto code is that finding targets is very
| application specific, while in crypto libraries everyone can
| agree leaking a key is bad.
|
| No, processor designers "just" need to stop violating
| assumptions, or at least talk to us before doing it.
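The constant-time discipline these comments refer to can be sketched as follows. This is an illustrative fragment, not code from any library discussed here; real implementations use vetted routines such as OpenSSL's CRYPTO_memcmp or Go's crypto/subtle package.

```c
#include <stddef.h>
#include <stdint.h>

/* Compare two secret buffers without an early exit, so the running
   time does not depend on where (or whether) they first differ.
   Note the point made in this thread: a data memory-dependent
   prefetcher can still leak by dereferencing values that merely look
   like pointers, even in branch-free code like this. */
static int ct_equal(const uint8_t *a, const uint8_t *b, size_t n)
{
    uint8_t diff = 0;
    for (size_t i = 0; i < n; i++)
        diff |= a[i] ^ b[i];   /* accumulate differences, never branch */
    return diff == 0;          /* 1 if equal, 0 otherwise */
}
```

The OR-accumulate pattern is what makes the timing data-independent: the loop always touches all n bytes, and the comparison against zero happens once at the end rather than per byte.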
| bee_rider wrote:
| I don't think the security community is also going to become
| experts in chip design; these are two full skill sets that
| are already very difficult to obtain.
|
| We must stop running untrustworthy code on modern full-
| performance chips.
|
| The feedback loop that powers everything is: faster chips
| allow better engineering and science, creating faster chips.
| We're not inserting the security community into that loop and
| slowing things down just so people can download random
| programs onto their computers and run them at random. That's
| just a stupid thing to do, there's no way to make it safe,
| and there never will be.
|
| I mean we're talking about prefetching. If there were a way to
| give RAM cache-like latencies, why wouldn't the hardware folks
| already have done it?
| FiloSottile wrote:
| > download random programs onto their computers and run
| them at random
|
| To be clear that includes what we're all doing by
| downloading and running Javascript to read HN.
|
| Maybe _I_ can say "don't run adversarial code on my same
| CPU" and only care about over-the-network CPU side-channels
| (of which there are still some), because I write Go crypto,
| but it doesn't sound like something my colleagues writing
| browser code can do.
| bee_rider wrote:
| Unfortunately somebody has tricked users into leaving
| JavaScript on for every site; it is a really bad
| situation.
| anonymous-panda wrote:
| Security and utility are often in tension. The
| safest possible computer is one buried far underground
| without any cables in a faraday cage. Not very useful.
|
| > We're not inserting the security community into that
| loop and slowing things down just so people can download
| random programs onto their computers and run them at
| random. That's just a stupid thing to do, there's no way
| to make it safe, and there never will be.
|
| Setting aside JavaScript, you can see this today with
| cloud computers which have largely displaced private
| clouds. These run untrusted code on shared computers.
| Fundamentally that's what they're doing because that's
| what you need for economies of scale, durability,
| availability, etc. So figuring out a way to run untrusted
| code on another machine safely is fundamentally a
| desirable goal. That's why people are trying to do
| homomorphic encryption - so that the "safely" part can go
| both ways and both the HW owner and the "untrusted" SW
| don't need to trust each other to execute said code.
| arp242 wrote:
| Is this exploitable through JavaScript?
|
| In general from what I've seen, most of these JS-based
| CPU exploits didn't strike me as all that practical in
| real world conditions. I mean, it _is_ a problem, but not
| really all that worrying.
| csande17 wrote:
| Speak for yourself; I've got JavaScript disabled on
| news.ycombinator.com and it works just fine.
| titzer wrote:
| I almost gave you an upvote until your third paragraph,
| but now I have to give a hard disagree. We're running more
| untrusted code than ever, and we absolutely should trust it
| less than ever and have hardware and software designed with
| security in mind. Security should be priority #1 from here
| on out. We are absolutely awash in performance and memory
| capacity but keep getting surprised by bad security
| outcomes because it's been second fiddle for too long.
|
| Software is now critical infrastructure in modern society,
| akin to the power grid and telephone lines. It's a
| strategic vulnerability to neglect security, and hardening
| must happen at all levels of the software and hardware stack.
| The threat here means an enemy trying to crash an entire
| society by bricking all of its computers and sending it back
| to the dark ages in milliseconds. I fundamentally don't understand
| the mindset of people who want to take that kind of risk
| for a 10% boost in their games' FPS[1].
|
| Part of that is paying back the debt that decades of
| cutting corners has yielded us.
|
| In reality, the vast majority of the 1000x increase in
| performance and memory capacity over the past four decades
| has come from shrinking transistors and increasing
| clockspeeds and memory density--the 1 or 5 or 10% gains
| from turning off bounds checks or prefetching aren't the
| lion's share. And for the record, turning off bounds checks
| is monumentally stupid, and people should be jailed for it.
|
| [1] I'm exaggerating to make a point here. What we trade
| for a little desktop or server performance is an enormous,
| pervasive risk. Not just melting down in a cyberwar, but
| the constant barrage of intrusion and leaks that costs the
| economy billions upon billions of dollars per year. We're
| paying for security, just at the wrong end.
| bee_rider wrote:
| > I fundamentally don't understand the mindset of people
| who want to take that kind of risk for a 10% boost in
| their games' FPS[1]
|
| Me either. But, lots of engineers are out there just
| writing single threaded matlab and python codes with lots
| of data-dependencies and just hoping the system manages
| to do a good job (for those operations that can't be
| offloaded to BLAS). So I'm glad gamer dollars subsidize
| the development of fast single threaded chips that handle
| branchy codes well.
|
| > In reality, the vast majority of the 1000x increase in
| performance and memory capacity over the past four
| decades has come from shrinking transistors and
| increasing clockspeeds and memory density
|
| I disagree, modern designs include deep pipelines, lots
| of speculation, and complex caches _because_ that's the
| only way to spend that higher transistor budget, reach those
| higher clocks, and compensate for the fact that memory
| latencies haven't kept up.
|
| > Part of that is paying back the debt that decades of
| cutting corners has yielded us.
|
| It will be tough, but yeah, server and mainframe users
| need to roll back the decision to repurpose
| consumer-focused chips like the x86 and ARM families. RISC-V is
| looking good though and seems open enough that maybe they
| can pick-and-choose which features they take.
|
| > I almost gave you up an upvote until your third
| paragraph, but I have to now give a hard disagree.
|
| I'm not too worried about votes on this post; this site
| has lots of web devs and cloud users, pointing out that
| the ecosystem they rely on is impossible to secure is
| destined to get lots of downvotes-to-disagree.
| saagarjha wrote:
| How is RISC-V going to solve anything here?
| bee_rider wrote:
| It isn't a sure thing. Just, since it is a more open
| ecosystem, maybe the designers of chips that need to be
| able to safely run untrusted code can still borrow some
| features from the general population.
|
| I think it is basically impossible to run untrusted code
| safely or to build sand-proof sandboxes, but I thought
| the rest of my post was too pessimistic.
| saagarjha wrote:
| Turning off bounds checks is like a 5% performance
| penalty. Turning off prefetching is like using a computer
| from twenty years ago.
| aseipp wrote:
| I agree that hardware/software codesign are critical to
| solving things like this, but features like prefetching,
| speculation, and prediction are absolutely critical to
| modern pipelines and broadly speaking are what enable
| what we think of as "modern computer performance." This
| has been true for over 20 years now. In terms of
| "overhead" it's not in the same ballpark -- or even the
| same sport, frankly -- as something like bounds checking
| or even garbage collection. Hell, if the difference was
| within even one order of magnitude, they'd have done it
| already.
| tadfisher wrote:
| > The feedback loop that powers everything is: faster chips
| allow better engineering and science, creating faster
| chips. We're not inserting the security community into that
| loop and slowing things down just so people can download
| random programs onto their computers and run them at
| random. That's just a stupid thing to do, there's no way to
| make it safe, and there never will be.
|
| Note that in the vast majority of cases, crypto-related
| code isn't what we spend compute cycles on. If there was a
| straightforward, cross-architecture mechanism to say, "run
| this code on a single physical core with no branch
| prediction, no shared caches, and using in-order execution"
| then the real-world performance impact would be minimal,
| but the security benefits would be huge.
| bee_rider wrote:
| I'm in favor of adding to chips some horrible in-order,
| no-speculation, no-prefetching, five-stage-pipeline
| Architecture 101 core which can be completely verified
| and made bulletproof.
|
| But the presence of this bulletproof core would not solve
| the problem of running bad code on modern hardware,
| unless all untrusted code is run on it.
| saagarjha wrote:
| Processor designers are very unlikely to do that for you,
| because everyone not working on constant time crypto gives
| them a whole lot of money to keep doing this. The best you
| might get is a mode where the set of assumptions they violate
| is reduced.
| gabrielhidasy wrote:
| Many modern architectures have crypto extensions, usually to
| accelerate a few common algorithms, maybe it would be good to
| add a few crypto-primitive instructions to allow new
| algorithms?
| sargun wrote:
| I think what's more likely is "mode switching" in which you can
| disable these components of the CPU for a certain section of
| executing code (the abstraction would probably be at the thread
| level).
| bee_rider wrote:
| One option would be for people to stop downloading viruses and
| then running them.
| Joel_Mckay wrote:
| Encrypted-bus MMUs have existed since the 1990s.
|
| However, the trend to consumer-grade hardware for cost-
| optimized cloud architecture ate the CPU market.
|
| Thus, the only real choice now is consumer CPUs even in scaled
| applications.
| theobservor wrote:
| The end result of these side channel attacks would be to have
| CPUs that perform no optimizations at all and all opcodes would
| run in the same number of cycles in all situations. But that will
| never happen. No one wants a slow CPU.
|
| As long as these effects cannot be exploited remotely, it's not a
| concern. Of course multi-tenant cloud-based virtualization would
| be a no go.
| bee_rider wrote:
| We need to drop all the untrusted code onto some horrible
| in-order, no-speculation, no-prefetching, five-stage-pipeline
| core from an Architecture 101 class.
| graemep wrote:
| It might be preferable.
|
| We have ridiculously fast hardware. In many use cases (client
| machines in particular) we do not usually really need that. I
| would gladly drop features for security.
| bee_rider wrote:
| It will also be good because users will become more annoyed
| when people try to sneak full programs into their websites,
| hopefully resulting in a generally less bloated internet.
| _factor wrote:
| This is why high core counts and isolation matter. Isolate the
| code to a specific core. Assuming everything is working as
| intended, an exploit won't compromise other tenants.
| xiconfjs wrote:
| From the paper: "OpenSSL reported that local side-channel attacks
| (...) fall outside of their threat model. The Go Crypto team
| considers this attack to be low severity".
| john_alan wrote:
| On reading it seems a lib like libsodium can simply set the
| disable bit prior to sensitive cryptographic operations on M3
| and above.
|
| Also looks like they need to predetermine aspects of the key.
|
| Very cool but I don't think it looks particularly practical.
| saagarjha wrote:
| > Can the DMP be disabled?
|
| Yes, but only on some processors. We observe that the DIT bit
| set on M3 CPUs effectively disables the DMP. This is not the case
| for the M1 and M2.
|
| Surely there is a chicken bit somewhere to do this?
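For reference, the knob being discussed is the Armv8.4-A PSTATE.DIT ("Data Independent Timing") bit. A hedged sketch of toggling it around a sensitive operation might look like the following; the `msr dit, #N` form is the standard AArch64 immediate encoding, the claim that it also disables the DMP on the M3 is the paper's empirical observation rather than an architectural guarantee, and on non-AArch64 targets the whole thing compiles to a no-op.

```c
/* Sketch: request data-independent timing around a sensitive operation.
   On AArch64 this sets/clears PSTATE.DIT; elsewhere it is a no-op, so
   the wrapper is safe to call unconditionally. */
static inline void set_dit(int on)
{
#if defined(__aarch64__)
    if (on)
        __asm__ volatile("msr dit, #1" ::: "memory");
    else
        __asm__ volatile("msr dit, #0" ::: "memory");
#else
    (void)on;  /* no DIT bit on this architecture; illustrative only */
#endif
}

/* Run op(arg) with DIT requested, then return to the default state. */
static int with_dit(int (*op)(void *), void *arg)
{
    set_dit(1);
    int result = op(arg);
    set_dit(0);
    return result;
}
```

A real implementation would read and restore the caller's prior DIT value rather than unconditionally clearing it, and may need `-march=armv8.4-a` (or equivalent) for the assembler to accept the PSTATE field name.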
| Shtirlic wrote:
| Is it naive to ask whether implementing this mitigation would
| impact performance and memory interaction speed?
___________________________________________________________________
(page generated 2024-03-21 23:00 UTC)