[HN Gopher] A supply chain attack on PyTorch
___________________________________________________________________
A supply chain attack on PyTorch
Author : roblabla
Score : 281 points
Date : 2024-01-12 15:53 UTC (2 days ago)
(HTM) web link (johnstawinski.com)
(TXT) w3m dump (johnstawinski.com)
| CoolCold wrote:
| Among other nice things, I liked
|
| > We used our C2 repository to execute the pwd && ls && /home &&
| ip a command on the runner labeled "jenkins-worker-rocm-amd-34",
| confirming stable C2 and remote code execution. We also ran sudo
| -l to confirm we had root access.
|
| While it's not clear whether that was a curated list of commands
| or just ALL of them, I assume the latter, and that makes me think
| no system administrator was involved in that pipeline's setup -
| those guys are quite allergic to handing out sudo/root access at
| all
| davnn wrote:
| Is 5k an appropriate amount for such a finding? Sounds incredibly
| cheap for such a large organization. How much would something
| like this be worth on the black market?
| vicktorium wrote:
| you have to consider that on the black market the rates would
| have to absorb the illegality of the action, while 5k is 'clean'
| ZoomerCretin wrote:
| Monero is always clean, too.
| wslh wrote:
| No, that is in general the issue with security bounties. They
| attract mainly people who have enough time for trial and error,
| and/or prior domain expertise, and/or who are extremely sharp at
| specific software. Nowadays cybersecurity is a vast field, and
| being a white hat hacker specialized in Google Chrome issues is
| not the same as being one specialized in iOS. Not saying it
| cannot be the same person, but the amount of time required to
| catch issues is long.
|
| I think supply chain attacks are not being taken very
| seriously. Think that people working, for example, in Python or
| JavaScript use pip or npm daily no matter if they work for a
| nuclear agency or your uncle's bar.
| mardifoufs wrote:
| In an earlier article about the exploitation of GitHub actions
| in general (which this specific attack on pytorch is part of)
| they said:
|
| >So far, we've submitted over 20 bug bounty reports, raking in
| hundreds of thousands of dollars in bounties.
|
| So I think this is part of a chain of bounties? Though that can
| still be argued to be a bit too low for how powerful this
| exploit could be :)
| nightpool wrote:
| Those are from different organizations, I think. So 5k from
| pytorch only but more from other orgs
| 1B05H1N wrote:
| Maybe 5k is the max payout for the bug class?
| pvg wrote:
| This question comes up frequently with these and it's
| premised on the hypothetical value of the bug on 'the black
| market'. The vast majority of such reported vulnerabilities
| have a 'black market' value of roughly zero, though, including
| this one. This doesn't say anything about the quality of the
| research, just that it's pretty hard to get monetary or other
| value out of most vulnerabilities.
| withinboredom wrote:
| It's quite a bit more nuanced than that. Businesses only want
| to pay because it costs less than the damage done to the
| brand and/or lawsuits from users/data controllers. They don't
| want to pay more than that. Researchers need money and are
| able to sell the fruits of their research to whomever they
| want. Generally, even good-natured people will weigh whether
| the bounty is worth it. It's clean money, so it has
| additional value vs. selling it on the black market.
|
| So, as you can hopefully see, it is a balancing act between
| all parties.
| pvg wrote:
| No, I don't think that holds much explanatory power - the
| vast majority of vulns have not only zero black market
| value, they also carry effectively zero brand or legal
| liability risk. This is also the case for this vuln.
| withinboredom wrote:
| Generally, getting root on internal infrastructure is
| just a step away from doing whatever you want. Even if it
| is just waiting for someone to ssh in with -A set so they
| can steal your ssh keys.
| pvg wrote:
| Yes that is exactly the sort of thing that has zero non-
| bounty dollar value and next to no legal or brand risk.
| sp332 wrote:
| Bug bounties do not compete with the black market. Also on the
| business side, they are not as efficient as just paying an
| internal QA or security team. Katie Moussouris, who set up
| Microsoft's original bug bounty program, has gone into a lot of
| detail on this. E.g. https://www.zdnet.com/article/relying-on-
| bug-bounties-not-ap...
| vicktorium wrote:
| very interesting
|
| i wonder about the over-dependence on third party packages and
| modules
|
| imagine the author of 'is-odd' injects a trojan there
|
| what are you gonna do?
|
| C has this solved but 'vendoring' is not as fast as this approach
| richbell wrote:
| You can't do much beyond setting up a corporate proxy that
| blocks or inspects outbound connections. Even then, you're
| relying on luck.
|
| These days it's practically a necessity for companies to shell
| out money to some sort of supply-chain protection software
| (Sonatype, Socket.dev etc.)
| lanstin wrote:
| Make the corporate proxy use an allow list only. Even then
| you fall prey to hacked packages on the official PyPI, but
| at least then the cryptominers or discord cred stealers
| can't phone home.
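The allow-list idea above can be sketched as a deny-by-default check in an egress proxy. The host names and helper function here are illustrative, not taken from any real proxy:

```python
# Deny-by-default egress filter: only hosts on the allow list may be
# reached, so an injected cryptominer or cred stealer can't phone home.
# The hosts listed here are illustrative.
ALLOWED_HOSTS = {"pypi.org", "files.pythonhosted.org", "github.com"}

def egress_allowed(host: str) -> bool:
    """Return True only for hosts explicitly on the allow list."""
    # Normalize case and a trailing root-zone dot before comparing.
    return host.lower().rstrip(".") in ALLOWED_HOSTS
```

Anything not matched (a miner pool, a Discord webhook) is simply refused at the network edge.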
| chuckadams wrote:
| > C has this solved
|
| "Reflections on Trusting Trust" was a demonstration of this in
| the C ecosystem long before package managers were a thing.
| SethMLarson wrote:
| The recommended guidance is either vendoring dependencies or
| pinning to hashes (pip --require-hashes, poetry.lock, Pipfile.lock).
| When updating your dependencies you should review the actual
| file getting downloaded.
|
| Compiled binaries are harder; you might consider compiling them
| from source and comparing the output. This is where build
| reproducibility comes into play.
|
| There's a lot more coming in the Python packaging security
| space that'll make this easier and just safer in general. Stay
| tuned :)
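A minimal sketch of what hash pinning buys you: rejecting any downloaded file whose digest differs from the one recorded at pin time. The helper name is made up; pip's --require-hashes performs this check for you.

```python
import hashlib

def verify_sha256(path: str, expected_hex: str) -> None:
    """Raise if the file's sha256 differs from the recorded digest,
    mirroring what pip does under --require-hashes."""
    with open(path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    if digest != expected_hex:
        raise ValueError(f"hash mismatch for {path}: got {digest}")
```

A tampered release then fails installation even when its version number and file name look identical to the legitimate one.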
| BirAdam wrote:
| Well... sort of. C has become a standard with several
| implementations. It gains supply chain security by being
| decentralized. Likewise, it has many package managers with
| different repos for language-specific things, and still more
| package managers and repos if we count UNIX/Linux systems,
| with dynamic linking and the like, as C development
| environments.
|
| The issue is, for any given implementation, similar attacks
| could still happen, and the package repos are still probably
| vulnerable.
| dontupvoteme wrote:
| I know that it is zeitgeist exploiting to say this, but seeing
| Boeing listed and not Airbus really says something to me.
|
| Lockheed being listed makes me wonder if the FBI/CIA really will
| (further) step up on cybercrime, because you now have potential
| national security implications in a core supplier to multiple
| military branches.
| Havoc wrote:
| Boeing is a lot more into military tech than Airbus
| dralley wrote:
| That's not entirely true, Airbus is a participant in most
| European military aircraft projects. They participated in
| Eurofighter for example and are part of the FCAS program.
|
| It's true to the extent that the US does a lot more and
| broader military procurement in general, so Boeing gets a
| smaller piece of a much bigger pie. Whereas Airbus is getting
| a piece of most European projects as a member of one
| consortium or another, it's just a smaller pie.
| mshockwave wrote:
| Compared to Boeing's volume? Maybe, but Airbus is one of
| _Europe_'s largest defense contractors
| robertlagrant wrote:
| Well, it does have quite a lot of European state ownership.
| It would be very surprising if it didn't win a lot of
| European contracts.
| shakow wrote:
| Are they? Airbus has its hands in quite a few major military
| programs (Eurofighter, A-400M, Tigre, Super-Puma, ...), as
| well as in space programs, especially satellite
| intelligence.
| gray_-_wolf wrote:
| Hm, from the reading, it seems he was pretty careful not to do
| any harm, but still, is this type of practical research
| actually legal?
| richbell wrote:
| It depends on the company. Many companies have bug bounty or
| vulnerability disclosure programs that explicitly guarantee
| safe harbor+protections for researchers.
|
| However, not all organizations are happy to be contacted about
| security issues. Sometimes doing the right thing can still
| result in (threats of) legal repercussions.
|
| https://arstechnica.com/tech-policy/2021/10/missouri-gov-cal...
| azeemba wrote:
| The bug bounties are usually pretty clear that you aren't
| allowed to make changes in the production systems. Here they
| made many changes - including changing the name of a release.
|
| The bug bounties also prefer seeing a working attack instead
| of theoretical reports. So not sure how they could have
| tested their attack in this situation without making actual
| changes.
| richbell wrote:
| It depends. Sometimes companies only permit testing in
| specific test domains, other times they permit it as long
| as your activity is clearly identifiable (e.g., including a
| custom header in all requests).
|
| It does seem like walking a precarious tightrope.
| richardwhiuk wrote:
| Essentially, generally, no.
|
| Once you've discovered a security hole, exploiting it to see
| how much access you can get is generally frowned upon.
| simonw wrote:
| The key to this attack is: "The result of these settings is that,
| by default, any repository contributor can execute code on the
| self-hosted runner by submitting a malicious PR."
|
| Problem: you need to be a "contributor" to the repo for your PR
| to trigger workflows without someone approving them first.
|
| So: "We needed to be a contributor to the PyTorch repository to
| execute workflows, but we didn't feel like spending time adding
| features to PyTorch. Instead, we found a typo in a markdown file
| and submitted a fix."
|
| I really don't like this aspect of GitHub that people who have
| submitted a typo fix gain additional privileges on the repo by
| default. That's something GitHub can fix: I think "this user gets
| to trigger workflow runs without approval in the future" should
| be an active button repo administrators need to click; maybe in
| the PR flow there could be "Approve this run" and "Approve this
| run and all future runs by user X" buttons.
| trevyn wrote:
| The vast majority of repos should be able to run CI on pull
| requests with no privileges at all. GitHub can manage any
| resource utilization issues on their end.
|
| Is the issue here that a self-hosted runner was needed for some
| hardware tests?
| withinboredom wrote:
| Self-hosted runners are the way to go, IMHO. Especially if you
| have bare metal resources. I love how fast my builds are with
| 16 cores and gobs of RAM.
| ethbr1 wrote:
| What's the GitHub Actions tooling like for ephemeral self-
| hosted runners?
|
| Afaict, a huge portion of this attack came from persistence
| on the self-hosted runner.
|
| Absent that, they would have needed a container jailbreak
| as well, which substantially ups the difficulty.
|
| And if a repo is running <100 builds a day, spin up + kill
| container seems a small per-build price to pay for the
| additional security isolation.
| jackwilsdon wrote:
| GitHub themselves don't seem to provide tooling to make
| the runner environment itself ephemeral. All they allow
| you to do is flag a runner as ephemeral, meaning it will
| be de-registered once a job completes - you need to write
| your own tooling to wipe it yourself (either via starting
| a whole new runner in a new environment and registering
| that or wiping the existing runner and re-registering
| it).
|
| https://docs.github.com/en/actions/hosting-your-own-
| runners/...
| gz5 wrote:
| there are 3rd party foss options (1):
|
| 1. ephemeral + zero implicit trust (2)
| https://blog.openziti.io/my-intern-assignment-call-a-
| dark-we...
|
| 2. zero implicit trust: https://github.com/openziti/ziti-
| webhook-action
|
| (1) disclosure, maintainer (2) zero implicit trust in
| this case = no open inbound ports on underlay; need to
| access via app-specific overlay which requires strong
| identity, authN, authZ
| o11c wrote:
| The problem is that there are fundamentally 2 different kinds
| of builds, but the current tooling is weak:
|
| * pre-merge builds on PRs. These should not have privileges,
| but making the distinction between the two cases requires a
| lot of care.
|
| * official builds of "master" or a feature branch from the
| main repo. These "need" privileges to upload the resulting
| artifacts somewhere. Of course, if all it did was wake up
| some daemon elsewhere, which could download straight from the
| CI in a verified way based on the _CI's_ notion of the job
| name, it would be secure without privileges, but most CI
| systems don't want to preserve huge artifacts, and
| maintaining the separate daemon is also annoying.
| akx wrote:
| For the second point, PyPI's trusted publisher
| implementation does this very well:
| https://docs.pypi.org/trusted-publishers/
| trevyn wrote:
| It's called GitHub secrets.
|
| Builds off of main get the secrets, pull requests from
| randos don't.
|
| And public repos don't pay for CI on GitHub.
|
| Not rocket science, people.
| n2d4 wrote:
| Pytorch did use GH secrets for the valuables and you can
| see that this wasn't enough, right there in the OP,
| because the self-hosted runners are still shared
| adnanthekhan wrote:
| Yup! This is what makes this kind of attack scary and
| unique to GitHub Actions. The baseline GITHUB_TOKEN
| just blows the door open on lateral movement via
| workflow_dispatch and repository_dispatch events.
|
| In several of our other operations, not just PyTorch, we
| leveraged workflow_dispatch to steal a PAT from another
| workflow. Developers tend to over-provision PATs so
| often. More often than not we'd end up with a PAT that
| has all scopes checked and org admin permissions. With
| that one could clean out all of the secrets from an
| organization in minutes using automated tools such as
| https://github.com/praetorian-inc/gato.
| mlazos wrote:
| If they use the same runners, couldn't the attacker just
| wait? The runners would need to be sequestered too
| kevin_nisbet wrote:
| Absolutely. The real difficulty is tests on PR are by
| definition remote code execution by an untrusted source,
| so a full risk analysis and hardening needs to be done.
|
| Here's a similar mistake on an OSS repo a company I
| worked for made: https://goteleport.com/blog/hack-via-
| pull-request/
| Groxx wrote:
| > _The vast majority of repos should be able to run CI on
| pull requests with no privileges at all_
|
| When there are no side effects and no in-container secrets
| and the hosting is free or reasonably limited to prevent
| abusers, ideally yes.
|
| Outside that, heck no, that'd be crazy. You're allowing
| randos to run arbitrary code on your budget. Locking it down
| until it's reviewed is like step 1, they can validate locally
| until then.
| KptMarchewa wrote:
| GH actions are free on public repos.
| 1B05H1N wrote:
| How should one detect this kind of stuff?
|
| Also reverse props to the Meta bug bounty program manager for not
| understanding the finding initially. I know it's difficult
| managing a program, but that's not an excuse to brush something
| like this off.
| SethMLarson wrote:
| Great write-up! There are a few things you can do as either a
| producer or consumer to thwart this sort of attack:
|
| Producers:
|
| * Self-hosted infrastructure should not be running anonymous
| code. PRs should be reviewed before code executes on your
| infrastructure. Potentially should be a GitHub default when using
| self-hosted runners?
|
| * Permissions for workflows and tokens should be minimal and
| fine-grained. "permissions: read-all" should be your default when
| creating a new workflow. Prevents lateral movement via modifying
| workflow code.
|
| * Self-hosted infrastructure should be isolated and ephemeral;
| persistence was key for lateral movement with this attack.
|
| Consumers:
|
| * Use a lock file with pinned hashes, either --require-hashes or
| poetry/pipfile
|
| * Review the diff of the file getting installed, not the GitHub
| source code. This will get easier when build provenance becomes a
| feature of PyPI.
|
| * If your organization is large enough, consider mirroring PyPI
| with approved releases so the manual review effort can be
| amortized.
|
| * More coming in this space for Python, like third-party
| attestations about malware, provenance, build reproducibility,
| etc. Stay tuned! :)
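One consumer-side point above can be made mechanical. pip only enforces --require-hashes when every requirement carries a hash, so CI can fail fast if any lock-file line is missing one. A sketch (the helper is illustrative, not an official tool):

```python
def unhashed_requirements(lock_text: str) -> list[str]:
    """Return requirement lines from a pip-style lock file that carry
    no --hash entry; an empty result means everything is pinned."""
    missing = []
    # Fold backslash-continued lines so a requirement and its hashes
    # are inspected as one logical line.
    for line in lock_text.replace("\\\n", " ").splitlines():
        line = line.strip()
        # Skip blanks, comments, and option lines like --index-url.
        if not line or line.startswith(("#", "-")):
            continue
        if "--hash=sha256:" not in line:
            missing.append(line)
    return missing
```

A pre-merge check that asserts this returns an empty list keeps an unpinned dependency from ever slipping into the lock file.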
| tlarkworthy wrote:
| > Self-hosted infrastructure should be isolated and ephemeral,
| persistence was key for lateral movement with this attack.
|
| Half the point of self hosting is to reuse cached resources.
| SethMLarson wrote:
| Isolation and ephemerality can still be accomplished using
| virtualization while providing the benefits of self-hosted
| resources.
| arcza wrote:
| The author's writing style is nauseating
| pnt12 wrote:
| Why?
| arcza wrote:
| Layered with hyperbole ("Bad. Very bad.") and takes way too
| much prose to get to the point.
| zX41ZdbW wrote:
| Recently, there were two similar attempted supply chain
| attacks on the ClickHouse repository, but:
|
| - they didn't do anything, because CI does not run without
| approval;
|
| - the user's account magically disappeared from GitHub, with
| all pull requests, within a day.
|
| Also worth reading a similar example:
| https://blog.cloudflare.com/cloudflares-handling-of-an-rce-v...
|
| Also, let me recommend our bug bounty program:
| https://github.com/ClickHouse/ClickHouse/issues/38986 It sounds
| easy - pick your favorite fuzzer, find a segfault (it should be
| easy because C++ isn't a memory-safe language), and get your
| paycheck.
| zX41ZdbW wrote:
| But I continue to find garbage in some of our CI scripts.
|
| Here is an example:
| https://github.com/ClickHouse/ClickHouse/pull/58794/files
|
| The right way is to:
|
| - always pin versions of all packages; this includes OS
| package repositories, Docker repositories, as well as pip,
| npm, cargo, and others;
|
| - never download anything from the master/main or other
| branches; specify a commit sha;
|
| - ideally, copy all Docker images to our own private
| registry;
|
| - ideally, calculate hashes after download and compare them
| with what was before;
|
| - frankly speaking, if CI ran air-gapped, it would be much
| better...
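The "specify a commit sha" rule can be enforced mechanically. A sketch (the regex and helper are illustrative, not a ClickHouse tool) that accepts raw GitHub URLs only when they reference a full 40-character commit SHA rather than a branch name:

```python
import re

# Matches raw.githubusercontent.com URLs pinned to a full commit SHA;
# branch names like "master" or "main" will not match.
_SHA_PINNED = re.compile(
    r"^https://raw\.githubusercontent\.com/[^/]+/[^/]+/[0-9a-fA-F]{40}/"
)

def is_sha_pinned(url: str) -> bool:
    """True only if the download URL is pinned to an immutable commit."""
    return bool(_SHA_PINNED.match(url))
```

A CI lint could reject any script that fetches from a URL for which this returns False.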
| EE84M3i wrote:
| I don't understand this PR. How is it an "attack"? It seems
| to just be pinning a package version, was the package
| compromised, or was this more a "vulnerability"?
| korhojoa wrote:
| If they're pulling from master instead of from a known
| version, it could be changed to be malicious, and the next
| time it is fetched, the malicious version would be used
| instead. It's a vulnerability.
| adnanthekhan wrote:
| Oh, you'll like this one then. Until 3 months ago,
| GitHub's runner-images repo was pulling a package
| directly from Aliyun's CDN. It was executed during image
| testing (version check). So anyone with the ability to modify
| Aliyun's CDN in China could have carried out a pretty
| nasty attack. https://github.com/actions/runner-
| images/commit/6a9890362738...
|
| Now it's just anyone with write access to Aliyun's
| repository. :) (p.s. GitHub doesn't consider this a
| security issue).
| 29athrowaway wrote:
| $5,000 bounty? That doesn't seem fair.
|
| You could probably spend the same time on Amazon Mechanical Turk
| and make the same amount of money.
| Klasiaster wrote:
| With GARM (GitHub Actions Runner Manager) it's easy to manage
| ephemeral runners: https://github.com/cloudbase/garm
|
| One should also use workflow approvals for external contributors.
| cjbprime wrote:
| > Thankfully, we exploited this vulnerability before the bad
| guys.
|
| How do you know?
___________________________________________________________________
(page generated 2024-01-14 23:00 UTC)