[HN Gopher] Aggressive Attack on PyPI Attempting to Deliver Rust...
___________________________________________________________________
Aggressive Attack on PyPI Attempting to Deliver Rust Executable
Author : iamspoilt
Score : 131 points
Date : 2023-02-24 20:35 UTC (2 hours ago)
(HTM) web link (blog.phylum.io)
(TXT) w3m dump (blog.phylum.io)
| ashishbijlani wrote:
| We've built Packj [1] to detect packages with install hooks,
| embedded binary blobs, and other such malicious/risky packages.
| It performs static/dynamic/metadata analysis to look for
| "suspicious" attributes.
|
| 1. https://github.com/ossillate-inc/packj
| lrem wrote:
| Why are these things riskier than the plain Python code you
| likely don't read, but go ahead and execute?
| [deleted]
| ashishbijlani wrote:
| A number of academic researchers (including us) have studied
| malware samples from past open-source supply chain attacks
| and identified code/metadata attributes that make packages
| vulnerable to such attacks. Packj scans for several such
| attributes to identify insecure or "weak links" in your
| software supply chain (e.g., missing or incorrect GitHub
| repo, very high version number, use of decode+exec, etc.).
| Full list here:
| https://github.com/ossillate-inc/packj/blob/main/packj/audit...
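|
| A minimal sketch of what a "decode+exec" check could look like,
| using Python's `ast` module. This is an illustration only, not
| Packj's implementation; `find_decode_exec` and the two name sets
| are hypothetical.
|
```python
import ast

# Hypothetical name lists, chosen for illustration only.
DECODE_NAMES = {"b64decode", "a85decode", "unhexlify", "decode"}
EXEC_NAMES = {"exec", "eval"}


def _call_name(node):
    """Return the bare name of the function being called, if recoverable."""
    func = node.func
    if isinstance(func, ast.Name):
        return func.id
    if isinstance(func, ast.Attribute):
        return func.attr
    return None


def find_decode_exec(source):
    """Return line numbers where exec/eval is applied to a decode-style call."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not (isinstance(node, ast.Call) and _call_name(node) in EXEC_NAMES):
            continue
        # Look anywhere inside the arguments for a decode-style call.
        nested_calls = (
            inner
            for arg in node.args
            for inner in ast.walk(arg)
            if isinstance(inner, ast.Call)
        )
        if any(_call_name(inner) in DECODE_NAMES for inner in nested_calls):
            hits.append(node.lineno)
    return sorted(hits)
```
|
| The source is only parsed, never executed, so it is safe to point
| at an untrusted `setup.py`.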
| twodave wrote:
| It's relative, but I assume it's flagging for a certain class
| of known malicious patterns. There's nothing stopping you
| from writing malicious python code, but essentially that
| script will only run while you expect it to in most cases
| unless it interacts with the OS in some way.
|
| It doesn't make plain Python code you blindly execute any
| safer, but at least you've explicitly given those packages
| your trust. I believe this is more geared toward detecting
| compromises of those packages you have given that trust.
| colatkinson wrote:
| Packages can do weird things like auto-loading into the
| interpreter (example: [0]). So in a scenario where a
| malicious package has ended up on your machine, you're a
| bit screwed whether it's a .so or a .py. I believe that was
| the point OP was making -- a pure-Python wheel is not
| really any safer than a wheel with embedded binaries.
|
| [0]: https://github.com/pyston/pyston/blob/1d65d4831912179c26bb27...
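|
| One auto-loading mechanism CPython offers is the `.pth` file: at
| startup, the `site` module executes any line in a site-packages
| `.pth` file that begins with `import`. Whether the truncated link
| above uses exactly this mechanism is an assumption. A rough
| sketch for listing such lines (`executable_pth_lines` is a
| hypothetical helper):
|
```python
from pathlib import Path


def executable_pth_lines(site_dir):
    """Return (file name, line) pairs for .pth lines the site module would execute."""
    found = []
    for pth in sorted(Path(site_dir).glob("*.pth")):
        for line in pth.read_text(errors="replace").splitlines():
            # site.py executes lines starting with "import" followed by space or tab.
            if line.startswith(("import ", "import\t")):
                found.append((pth.name, line))
    return found
```
|
| Calling it on each directory returned by `site.getsitepackages()`
| shows what runs on every interpreter start, before any of your
| own code imports anything.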
| Godel_unicode wrote:
| It's like tor though; everyone that's malicious doesn't
| do this but the ratio is much higher so be much more
| suspicious. Security isn't about silver bullets, it's
| about a compilation of tricks that make malicious things
| more obvious.
| belinder wrote:
| How is the rust part relevant?
| hoppla wrote:
| Because everywhere it's used, it has to be explicitly
| mentioned, like clothing that uses GoreTex(tm).
| throwthere wrote:
| Chatgpt recommended it for the upvotes.
| wheelerof4te wrote:
| "The most beloved programming language used to build and ship
| malware (PHOTO/VIDEO/NSFW)"
| HL33tibCe7 wrote:
| It's unusual for malware.
| Godel_unicode wrote:
| I mean, not really? There's a lot of legacy stuff written in
| other languages, but malware authors have realized that
| people are less skeptical of rust and are actively taking
| advantage of that fact.
| j-krieger wrote:
| Yes, but only for the time being. I've recently published a
| paper on the topic. Rust and Golang are getting immensely
| popular with malware authors.
| hoppla wrote:
| I would love a link so I could read your paper.
| arthurcolle wrote:
| Same! Please share GP
| sidlls wrote:
| We should only refer to Rust when it's included in positive
| events? How is it _not_ relevant here? It was used to build
| executables to inject, likely for malicious purposes. Given its
| newness and all the other hype around it, I'd say it's very
| relevant.
| yabones wrote:
| We like when malware is written in a memory-safe language.
| baguettefurnace wrote:
| just like if a tesla is involved in a car crash, headline must
| mention Tesla
| butterNaN wrote:
| Isn't that because sometimes the tesla software might be at
| fault?
| mftb wrote:
| If a payload is native it's potentially more of a problem than
| a script. If the payload had been c or c++, I wouldn't have
| been surprised if they noted that either.
| i_love_cookies wrote:
| [dead]
| lelandbatey wrote:
| Interestingly, all the packages, even the ones from today, have
| all been taken down. So too have all the files that were being
| hosted on Dropbox.
| photochemsyn wrote:
| Wow this site runs a lot of JavaScript, speaking of aggressive
| data collection.
|
| https://blog.hubspot.com/website/data-mining
| fortran77 wrote:
| But I thought Rust was supposed to be safe?!
| woodruffw wrote:
| I understand that this is meant to be an eye-popping press
| release (and implicitly a product spotlight), but some of these
| claims make me gag.
|
| It's not an attack "on" PyPI, or even an attack at all: someone
| is just spamming the index with packages. There's no evidence
| that these packages are being downloaded _by anyone at all,_ or
| that the person in question has made any serious effort to
| conceal their intentions (it's all stuffed in the setup script
| without any obfuscation, as the post says). The executable in
| question isn't even served through PyPI (for reasons that are
| unclear to me): it's downloaded by the dropper script.
| Ironically, serving the binary directly would probably raise
| fewer red flags.
|
| Supply chain security is important; we should reserve phrases
| like "aggressive attack" for things that aren't script kiddie
| spam.
| asperous wrote:
| I think it's a serious threat, especially with LLMs now because
| people can make believable packages at scale. Not everyone vets
| their packages thoroughly
| [deleted]
| woodruffw wrote:
| You've always been able to make "believable" packages at
| scale. PyPI doesn't enforce uniqueness: you can crank out
| malicious near-duplicates of any package you please.
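|
| To make "near-duplicate" concrete, here is a small sketch that
| flags names resembling a watchlist, using `difflib` from the
| standard library. The watchlist, the 0.85 cutoff, and the
| `lookalikes` helper are all assumptions for illustration; PyPI
| is not known to run this check.
|
```python
import difflib

# Hypothetical watchlist of well-known project names.
POPULAR = ["requests", "numpy", "pandas", "colorama", "urllib3"]


def lookalikes(candidate, known=POPULAR, cutoff=0.85):
    """Return known names that `candidate` closely resembles without matching exactly."""
    name = candidate.lower().replace("_", "-")
    if name in known:
        return []  # An exact match is the real package, not a lookalike.
    return difflib.get_close_matches(name, known, n=3, cutoff=cutoff)
```
|
| A similarity score only catches typo-style names; it says nothing
| about plausible but unrelated names like the ones in this
| campaign.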
| zeven7 wrote:
| And, to parent's point, now LLMs will tell people to use
| them and they will[1].
|
| [1] https://news.ycombinator.com/item?id=34916682
| Groxx wrote:
| Stack Overflow and Google search results were already
| doing that though, at massive scale. I agree it changes
| things _somehow_, but people not thinking before acting
| is not a new problem.
| freeqaz wrote:
| I agree that it is a threat. I don't think this instance is
| (it's too noisy).
|
| I wrote a comment on the NPM thread earlier
| (https://news.ycombinator.com/threads?id=freeqaz) that I'll
| quote here:
|
| > "While being flooded with spam is never good, it gets
| immediately noticed and mitigated. It's harder for open
| source projects to spot and stop rare one-offs"
|
| This is the real problem that NPM and other ecosystems face.
| A determined attacker that is trying to "poison" a popular
| Open Source package just has to pose as a maintainer long
| enough to succeed[0]. Defeating these types of attacks will
| require rethinking how we think about trust of packages.
|
| Projects like Deno are one approach (fork the ecosystem)
| while projects like Packj (mentioned elsewhere here),
| Socket.dev, and LunaTrace[1] are taking the other angle (make
| it harder to install malware).
|
| It's hard to say which approach is better right away.
| (Probably a hybrid of both, realistically) It's just non-
| trivial to fix this in one clean swoop. It's messy.
|
| 0: https://www.trendmicro.com/vinfo/us/security/news/cybercrime...
|
| 1: https://github.com/lunasec-io/lunasec
| wheelerof4te wrote:
| Me, I just use the stdlib and my local packages.
|
| There's something beautiful in knowing you're using pure,
| clean Python. Much easier to install, also.
| codetrotter wrote:
| Speaking of LLMs. Since LLMs like to hallucinate every now
| and then, an LLM could also hallucinate names of packages
| that it tells people to install. And those packages could in
| turn have been squatted by malware authors.
|
| And in this way, malicious packages may be unintentionally
| downloaded by users even when those malicious packages did
| not yet exist when the LLM was trained. Just because the
| hallucinated package name was randomly later taken by someone
| malicious.
| freeqaz wrote:
| I've seen this effect get amplified also when somebody puts
| a "bad" answer in a public place like StackOverflow. It is
| possible to have quite a large blast radius from something
| like this!
| [deleted]
| agolio wrote:
| The most "aggressive" part is that those sweet package names
| like "colorslib" are being stolen.
| komali2 wrote:
| My biggest curiosity here is how they generated over a
| thousand package names ranging from feasible to interesting.
| I expected gibberish.
|
| Lol, maybe, "chatgpt, give me a thousand feasible pypi
| package names"?
| praash wrote:
| The names seem to be simple concatenations of random parts
| like "game", "lib", "vm", "cv", "http".
|
| They do look surprisingly convincing.
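|
| A quick way to test the "concatenation of parts" guess is to see
| whether a name splits cleanly into a small vocabulary. The sketch
| below assumes a tiny part list: the first five entries are the
| ones quoted above, and "colors" is added only so that
| "colorslib" from this thread splits. `splits_into_parts` is a
| hypothetical helper.
|
```python
from functools import lru_cache

# First five parts come from the comment above; "colors" is an added assumption.
PARTS = ("game", "lib", "vm", "cv", "http", "colors")


def splits_into_parts(name, parts=PARTS):
    """Return one decomposition of `name` into known parts, or None if there is none."""

    @lru_cache(maxsize=None)
    def solve(rest):
        if not rest:
            return ()
        for part in parts:
            if rest.startswith(part):
                tail = solve(rest[len(part):])
                if tail is not None:
                    return (part,) + tail
        return None

    return solve(name)
```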
| lelandbatey wrote:
| Thankfully, they're not actually being stolen because all the
| packages were already taken down; they're available for
| legitimate use again: https://pypi.org/project/colorslib/
| worik wrote:
| No. This is very concerning.
|
| Attacking a popular repository like this does not have to have
| a high hit rate.
|
| "Script kiddie spam" is how computers get compromised.
| Unsophisticated mass attack.
|
| This sort of thing, combined with woeful security and fragile
| systems, is causing havoc the world over.
| blibble wrote:
| why does pypi/pip still not have namespacing?
|
| Maven sorted this out 20 years ago
|
| what's a bit sad is that the python packaging authority's survey
| from a few months ago seemed to be mostly interested in vision and
| mission statements
|
| rather than building a functional set of tools
| djbusby wrote:
| Every lang-ecosystem needs to re-implement CPAN the hard way.
| woodruffw wrote:
| Namespacing is not a security boundary: it's a usability
| feature that helps users visually distinguish between packages
| that share the same name but different owners. I don't think it
| would meaningfully affect things like package index spam, which
| this is.
|
| (This is not a reason _not_ to add namespacing; just an
| observation that it's mostly irrelevant to contexts like
| this.)
| pphysch wrote:
| Namespacing is a _lot_ more than just a theoretical name
| collision avoider.
|
| Good namespacing (e.g. in Go), in practice, provides critical
| _context_ about the development/publication of a software
| package.
| blibble wrote:
| obviously, but it allows delegation of trust onto other
| systems (like the DNS)
|
| example: the package named "aws" on pypi was created by some
| random guy and has been abandoned for years
|
| if pypi/pip supported namespacing that would be
| info.randomdude.aws instead
|
| and amazon's packages would be under com.amazon
|
| not being able to namespace internal packages is another
| security issue that is substantially improved with proper
| namespacing
|
| to be blunt: not supporting it at this point is reckless and
| irresponsible
|
| (I note you're part of pypa!)
| georgyo wrote:
| I like the way golang handled this. Imports are the URL to
| the resource. No central distribution mechanism at all. In
| the past few years they implemented an optional caching
| layer, so a dependency going offline doesn't necessarily
| mean that it is unavailable.
| dpedu wrote:
| Who's to say mr randomdude won't claim com.amazon first?
| dragonwriter wrote:
| You could in principle do proof-of-ownership checks like
| Google does for things like Webmaster Tools, so you'd
| need to control a domain to have the corresponding
| namespace.
| sophacles wrote:
| Let's encrypt solved this by doing a proof of control
| over the domain name, and in an automated way.
|
| Pypi could do this. Or, they could require that someone
| demonstrate proof of ownership for a namespace by signing
| it with a certificate tied to the domain name (so you
| couldn't claim the com.bigco namespace without having the
| certs, which you can't get without owning that domain).
| There could even be signature requirements/proof for each
| package and/or version uploaded.
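|
| A rough sketch of the proof-of-control idea: map a reverse-DNS
| namespace to a domain, hand the claimant a random token, and
| check that the domain publishes it in a TXT record. This is not
| how Let's Encrypt or PyPI implement anything; the token prefix,
| the function names, and the injected `fetch_txt_records`
| callable are all hypothetical, and the actual DNS lookup is left
| to the caller.
|
```python
import hmac
import secrets


def namespace_to_domain(namespace):
    """Map a reverse-DNS namespace such as 'com.bigco' to 'bigco.com'."""
    return ".".join(reversed(namespace.split(".")))


def new_challenge():
    """Create a random token the claimant must publish in a DNS TXT record."""
    return "pkg-namespace-verify=" + secrets.token_hex(16)


def claim_is_proven(namespace, challenge, fetch_txt_records):
    """Check whether the domain behind `namespace` publishes `challenge`.

    `fetch_txt_records` is a caller-supplied function (hypothetical here)
    that takes a domain name and returns its TXT record strings.
    """
    domain = namespace_to_domain(namespace)
    return any(
        hmac.compare_digest(record.encode(), challenge.encode())
        for record in fetch_txt_records(domain)
    )
```
|
| Passing the resolver in as a function keeps the check testable
| with a plain dict, and makes it explicit that the result is only
| as trustworthy as the DNS answers behind it.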
| pphysch wrote:
| It's much easier to correct the ownership of a single
| namespace than N packages in the global namespace
| natpalmer1776 wrote:
| Well, in theory you could have a namespace schema that
| differentiates between user-submitted and organization-
| submitted packages such that randomdude's would appear as
| 'public.randomdude.aws' and organization-owned namespaces
| verified by a DNS record would appear as 'com.amazon.aws'
| woodruffw wrote:
| DNS isn't a particularly secure root of trust; Java is
| somewhat unique among package ecosystems for picking it as
| their trust anchor.
|
| It also just kicks the can down the road: Amazon is the
| easy case with `com.amazon`, but it isn't clear a priori
| whether you should trust `net.coolguy.importantpackage` or
| `net.cooldude.importantpackage`. These kinds of trust
| relationships require external communication of a kind that
| package indices are not equipped to supply, and should not
| attempt to solve haphazardly.
|
| > (I note you're part of pypa!)
|
| I am a member of PyPA, but I don't represent anyone's
| opinions but my own. It's a very loose collection of
| projects, and it would be incorrect to read a general
| opinion from mine.
| blibble wrote:
| > Amazon is the easy case with `com.amazon`, but it
| isn't clear a priori whether you should trust
| `net.coolguy.importantpackage` or
| `net.cooldude.importantpackage`
|
| this is a classic example of not letting perfect be the
| enemy of good
|
| there is no perfect solution, there never will be
|
| piggybacking off of DNS works extremely well for Java and
| Go (and the tooling is a pleasure to work with)
|
| meanwhile Python continues to be a complete disaster
| woodruffw wrote:
| I agree there is no perfect solution. But I want a _good_
| solution, and I disagree that DNS is a _good_ one.
| blibble wrote:
| I look forward to another 20 years of no progress!
| woodruffw wrote:
| Your cynicism isn't warranted: we've made significant
| improvements to PyPI over the last 4 years[1][2], and I'm
| currently working on additional features that will make
| secure publishing to PyPI easier[3]. We're also working
| on a codesigning implementation for PyPI, based on
| Sigstore[4].
|
| Security needs to be evidence and outcome-driven, first
| and foremost. That takes a while, but improved outcomes
| make it worth it.
|
| [1]: https://pyfound.blogspot.com/2019/06/pypi-now-supports-two-f...
|
| [2]: https://pythoninsider.blogspot.com/2019/07/pypi-now-supports...
|
| [3]: https://github.com/pypi/warehouse/issues/12465
|
| [4]: https://www.sigstore.dev/
| blibble wrote:
| > That takes a while, but improved outcomes make it worth
| it.
|
| meanwhile the integrity of the supply chain continues to
| be compromised
|
| > Your cynicism isn't warranted
|
| it is: the python packaging situation is _worse_ today
| than it was when I started writing Python in 2005
|
| the legions of meetings, grandiose titles, conferences
| and mountains of unreadable proposals have produced
| tooling that is objectively worse than what Maven offered
| close to two decades ago
| klhanb wrote:
| That's their calling card. Long discussion threads, mails
| spanning whole pages, silencing opposition.
|
| But deliver anything more streamlined and secure? Hell, no!
| ianai wrote:
| Even animals in the wild agree to peace around the watering hole.
| MadSudaca wrote:
| The problem is assuming we're better than animals.
| eternalban wrote:
| You forgot some of those animals have fangs.
|
| It's like NYC's sidewalks. Compare pedestrian behavior at say
| SoHo (daylight) and say LES (nighttime). Amazingly enough, the
| partying and inebriated pedestrians at night all file politely
| in the correct bimodal L|R formation. During the day, it's a
| rather wild and somewhat uncivilized dynamic slalom formation.
| My theory: Fangs. The night creatures know someone potentially
| dangerous may be in their midst.
| readthenotes1 wrote:
| Someone forgot to tell the crocodiles...
|
| https://journals.plos.org/plosone/article?id=10.1371/journal...
| steponlego wrote:
| Yet another attack that requires the biggest malware vector, MS
| Windows. LOL
| throwaway81523 wrote:
| Wonder if that is related to the malware spamming of NPM that I
| saw something about last night.
|
| Python used to have a "batteries included" philosophy which tried
| to put most important stuff into the distro, reducing the number
| of external dependencies any given app needed. They seem to have
| abandoned that now, leaving us to fend for ourselves against the
| malware.
|
| NPM spam:
| https://www.scmagazine.com/analysis/devops/npm-repository-15...
| wheelerof4te wrote:
| "They seem to have abandoned that now, leaving us to fend for
| ourselves against the malware."
|
| Yes, along with reducing the stdlib and directing us to PyPI
| for "alternatives".
| almet wrote:
| It's still the same story: PyPI still doesn't have a way to
| automatically detect interactions with the network and the
| filesystems for the submitted packages. It's a complex thing to
| do for sure, but that would be a welcome addition, I guess.
| woodruffw wrote:
| PyPI still doesn't have this because no packaging ecosystem
| does. It's impossible to do in the general case if your
| packaging schema allows arbitrary code execution, which Python
| (and Ruby, and NPM, and Cargo, etc.) allow.
|
| The closest thing is pattern/AST matching on the package's
| source, but trivial obfuscation defeats that. There's also no
| requirement that a package on PyPI is even uploaded with source
| (binary wheel-only packages are perfectly acceptable).
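|
| To illustrate how little obfuscation it takes, here is a
| deliberately naive AST check and a one-line rewrite that slips
| past it. Both snippets are only parsed, never run;
| `calls_exec_directly`, `PLAIN`, and `OBFUSCATED` are hypothetical
| names, and real scanners are more thorough than this.
|
```python
import ast


def calls_exec_directly(source):
    """Naive static check: is there a call whose callee is literally named `exec`?"""
    return any(
        isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id == "exec"
        for node in ast.walk(ast.parse(source))
    )


PLAIN = "exec(payload)"

# Same behaviour at run time, but the name `exec` never appears as a callee:
# it is assembled from two string pieces and looked up with getattr.
OBFUSCATED = 'import builtins\ngetattr(builtins, "ex" + "ec")(payload)'
```
|
| `calls_exec_directly(PLAIN)` is true and
| `calls_exec_directly(OBFUSCATED)` is false. Teaching the matcher
| about `getattr` just moves the problem to the next trick.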
| eigenvalue wrote:
| This seems eminently solvable though. Why can't every package
| submission cause some minimal sandboxed docker image to
| install the package and call the various functions and
| methods and log all network and disk activity? If anything
| looks suspicious it would be denied and the submitter would
| have to appeal it, explaining why the submission is valid.
| The same applies for NPM and Cargo. I know there is a
| researcher out there who has retrieved and installed every
| single pip package to do an analysis, which is a good start.
| This seems like the kind of thing that wouldn't even cost all
| that much, and big corporate users of python would stand to
| benefit.
| woodruffw wrote:
| For one, because Docker is not a sandbox, and containers
| are not a strong security boundary[1]. What you really need
| here is a strongly isolated VM, at which point you're
| playing cat-and-mouse games with your target: their new
| incentive is to detect your (extremely detectable) VM, and
| your job is to make the VM look as "normal" as possible
| without _actually_ making it behave normally (because this
| would mean getting exploited). That kind of work has a long
| and frustrating tail, and it's not particularly fruitful
| (relative to the other things packaging ecosystems can do
| to improve package security).
|
| > I know there is a researcher out there who has retrieved
| and installed every single pip package to do an analysis,
| which is a good start.
|
| You're probably talking about Moyix, who did indeed
| download every package on PyPI[2], and unintentionally
| executed a bunch of arbitrary code on his local machine in
| the process.
|
| [1]: https://cloud.google.com/blog/products/gcp/exploring-contain...
|
| [2]: https://moyix.blogspot.com/2022/09/someones-been-messing-wit...
| nodogoto wrote:
| [dead]
| spenczar5 wrote:
| "no packaging ecosystem does."
|
| This is a little bit too strong, since packaging doesn't
| require arbitrary code execution. For example, Go doesn't
| permit arbitrary code execution during `go get`. Now - there
| have been bugs which permit code execution (like
| https://github.com/golang/go/issues/22125) but they are
| treated as security vulnerabilities and bugs.
|
| Of course, you're right about Python.
| woodruffw wrote:
| What I meant by that is that no packaging ecosystem (to my
| knowledge) runs arbitrary uploaded code to find network
| activity. Some may do simpler, static analyses, but
| outright execution for dynamic analysis purposes isn't
| something I'm aware of any ecosystem doing.
|
| Python, Ruby, et al. are in an even worse position than
| that baseline, since they have both arbitrary code in the
| package itself _and_ arbitrary code in the package's
| definition. But the problem is a universal one!
| spenczar5 wrote:
| Ah, yep, you're right about that as far as I know too.
| photon12 wrote:
| Smart attackers already add, or will add,
| `sleep(SOME_NUMBER_LONGER_THAN_SCAN_SANDBOX_LIFETIME)` before
| anything that does FS or network access. Not to say that this
| wouldn't be a welcome addition, but the scanning needs to be
| understood in the context of the inherent limitations of large
| scale runtime behavior detection of packages when you have a
| fixed amount of hardware and time for running those scans.
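|
| The budget argument can be put in numbers. The sketch below
| spreads a fixed pool of sandbox workers evenly over a day's
| uploads; the figures in the usage note are invented for
| illustration, not PyPI statistics, and both helpers are
| hypothetical.
|
```python
def per_package_budget_seconds(packages, workers, window_seconds):
    """Wall-clock seconds each package can run if scans are spread evenly."""
    return workers * window_seconds / packages


def sleep_evades_scan(sleep_seconds, budget_seconds):
    """A delay longer than the sandbox lifetime means the scan observes nothing."""
    return sleep_seconds > budget_seconds
```
|
| With an assumed 10,000 uploads per day and 20 workers, each
| package gets 20 * 86,400 / 10,000 = 172.8 seconds, so a
| ten-minute sleep before the first network call already outlasts
| the scan.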
___________________________________________________________________
(page generated 2023-02-24 23:00 UTC)