[HN Gopher] Aggressive Attack on PyPI Attempting to Deliver Rust...
       ___________________________________________________________________
        
       Aggressive Attack on PyPI Attempting to Deliver Rust Executable
        
       Author : iamspoilt
       Score  : 131 points
       Date   : 2023-02-24 20:35 UTC (2 hours ago)
        
 (HTM) web link (blog.phylum.io)
 (TXT) w3m dump (blog.phylum.io)
        
       | ashishbijlani wrote:
       | We've built Packj [1] to detect packages with install hooks,
       | embedded binary blobs, and other such malicious/risky packages.
       | It performs static/dynamic/metadata analysis to look for
       | "suspicious" attributes.
       | 
       | 1. https://github.com/ossillate-inc/packj
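A minimal sketch of one such check: it flags embedded native executables inside an unpacked package by their leading "magic" bytes. This is not Packj's implementation. The function name `find_native_binaries` and the signature table are illustrative assumptions, and the two-byte `MZ` prefix will also match the occasional non-binary file.

```python
from pathlib import Path

# Leading bytes of common native executable formats (illustrative, not exhaustive).
MAGIC = {
    b"\x7fELF": "ELF (Linux)",
    b"MZ": "PE (Windows)",
    b"\xcf\xfa\xed\xfe": "Mach-O 64-bit (macOS)",
    b"\xce\xfa\xed\xfe": "Mach-O 32-bit (macOS)",
}


def find_native_binaries(root):
    """Return (relative_path, format) for every file under root that starts with a known magic."""
    root = Path(root)
    hits = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        with path.open("rb") as fh:
            head = fh.read(4)  # the longest signature above is 4 bytes
        for magic, kind in MAGIC.items():
            if head.startswith(magic):
                hits.append((path.relative_to(root).as_posix(), kind))
                break
    return hits
```

Run it over an unpacked sdist or wheel directory, e.g. `find_native_binaries("unpacked-package/")`. Any hit in a package that claims to be pure Python deserves a closer look.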
        
         | lrem wrote:
         | Why are these things riskier than the plain Python code you
         | likely don't read, but go ahead and execute?
        
           | ashishbijlani wrote:
           | A number of academic researchers (including us) have studied
           | malware samples from past open-source supply chain attacks
           | and identified code/metadata attributes that make packages
           | vulnerable to such attacks. Packj scans for several such
           | attributes to identify insecure or "weak links" in your
           | software supply chain (e.g., missing or incorrect GitHub
           | repo, very high version number, use of decode+exec, etc.).
           | Full list here: https://github.com/ossillate-
           | inc/packj/blob/main/packj/audit...
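Here is a rough illustration of the "decode+exec" attribute. This minimal static check parses source with the stdlib `ast` module (it never executes it) and reports lines where `exec`/`eval` wraps a decoding call. The name lists are assumptions chosen for the sketch, not Packj's actual rules.

```python
import ast

EXEC_NAMES = {"exec", "eval"}
# Calls that typically turn an opaque blob back into source (illustrative list).
DECODE_NAMES = {"b64decode", "decode", "decompress", "unhexlify"}


def _call_name(call):
    """Best-effort name of the called function: `exec` or `base64.b64decode` -> `b64decode`."""
    func = call.func
    if isinstance(func, ast.Name):
        return func.id
    if isinstance(func, ast.Attribute):
        return func.attr
    return None


def find_decode_exec(source):
    """Line numbers where exec/eval wraps a decoding call. The source is parsed, never run."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call) and _call_name(node) in EXEC_NAMES:
            for inner in ast.walk(node):
                if inner is not node and isinstance(inner, ast.Call) and _call_name(inner) in DECODE_NAMES:
                    hits.append(node.lineno)
                    break
    return hits
```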
        
           | twodave wrote:
           | It's relative, but I assume it's flagging for certain class
           | of known malicious patterns. There's nothing stopping you
           | from writing malicious python code, but essentially that
           | script will only run while you expect it to in most cases
           | unless it interacts with the OS in some way.
           | 
           | It doesn't make plain Python code you blindly execute any
           | safer, but at least you've explicitly given those packages
           | your trust. I believe this is more geared toward detecting
           | compromises of those packages you have given that trust.
        
             | colatkinson wrote:
             | Packages can do weird things like auto-loading into the
             | interpreter (example: [0]). So in a scenario where a
             | malicious package has ended up on your machine, you're a
             | bit screwed whether it's a .so or a .py. I believe that was
             | the point OP was making -- a pure-Python wheel is not
             | really any safer than a wheel with embedded binaries.
             | 
             | [0]: https://github.com/pyston/pyston/blob/1d65d4831912179c
             | 26bb27...
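One well-known auto-loading mechanism works like this: Python's `site` module executes any line in a `.pth` file in site-packages that starts with `import` followed by a space or tab, at every interpreter startup. Below is a minimal sketch of an audit for such lines. `find_executable_pth_lines` is a hypothetical helper, and it only covers `.pth` files, not `sitecustomize`/`usercustomize`. Some legitimate tools use this hook too, so hits need review rather than automatic removal.

```python
import site
from pathlib import Path


def find_executable_pth_lines(dirs=None):
    """Return (file, line_number, line) for .pth lines that site.py would execute."""
    if dirs is None:
        dirs = site.getsitepackages()  # may be missing in some older virtualenvs
    hits = []
    for d in dirs:
        for pth in sorted(Path(d).glob("*.pth")):
            text = pth.read_text(encoding="utf-8", errors="replace")
            for lineno, line in enumerate(text.splitlines(), start=1):
                # site.py runs lines beginning with "import " or "import\t"
                if line.startswith(("import ", "import\t")):
                    hits.append((pth.name, lineno, line))
    return hits
```

Called with no arguments, it scans the current interpreter's site-packages directories.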
        
               | Godel_unicode wrote:
               | It's like Tor though; not everyone who's malicious does
               | this, but the ratio is much higher, so be much more
               | suspicious. Security isn't about silver bullets, it's
               | about a compilation of tricks that make malicious things
               | more obvious.
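In that spirit, weak signals are usually combined rather than trusted individually. Below is a minimal sketch of a weighted score over the kinds of attributes mentioned upthread. Every weight, threshold, and field name here is a made-up assumption for illustration, not any tool's real scoring.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PackageSignals:
    has_install_hook: bool
    has_native_binary: bool
    uses_decode_exec: bool
    repo_url: Optional[str]
    version: str


# Hypothetical weights: no single signal is damning, several together are.
WEIGHTS = {
    "install hook": 2,
    "native binary": 2,
    "decode+exec": 3,
    "no repository URL": 1,
    "very high version number": 1,
}


def risk_score(s: PackageSignals):
    """Return (score, reasons) for a package's collected signals."""
    reasons = []
    if s.has_install_hook:
        reasons.append("install hook")
    if s.has_native_binary:
        reasons.append("native binary")
    if s.uses_decode_exec:
        reasons.append("decode+exec")
    if not s.repo_url:
        reasons.append("no repository URL")
    try:
        if int(s.version.split(".")[0]) >= 100:  # arbitrary threshold for "very high"
            reasons.append("very high version number")
    except ValueError:
        pass  # non-numeric major version: skip this signal
    return sum(WEIGHTS[r] for r in reasons), reasons
```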
        
       | belinder wrote:
       | How is the rust part relevant?
        
         | hoppla wrote:
         | Because everywhere it's used, it has to be explicitly
         | mentioned, like clothing that use GoreTex(tm).
        
         | throwthere wrote:
         | Chatgpt recommended it for the upvotes.
        
           | wheelerof4te wrote:
           | "The most beloved programming language used to build and ship
           | malware (PHOTO/VIDEO/NSFW)"
        
         | HL33tibCe7 wrote:
         | It's unusual for malware.
        
           | Godel_unicode wrote:
           | I mean, not really? There's a lot of legacy stuff written in
           | other languages, but malware authors have realized that
           | people are less skeptical of rust and are actively taking
           | advantage of that fact.
        
           | j-krieger wrote:
           | Yes, but only for the time being. I've recently published a
           | paper on the topic. Rust and Golang are getting immensely
           | popular with malware authors.
        
             | hoppla wrote:
             | I would love a link so I could read your paper.
        
               | arthurcolle wrote:
               | Same! Please share GP
        
         | sidlls wrote:
         | We should only refer to Rust when it's included in positive
         | events? How is it _not_ relevant here? It was used to build
         | executables to inject, likely for malicious purposes. Given its
         | newness and all the other hype around it, I'd say it's very
         | relevant.
        
         | yabones wrote:
         | We like when malware is written in a memory-safe language.
        
         | baguettefurnace wrote:
         | just like if a Tesla is involved in a car crash, headline must
         | mention Tesla
        
           | butterNaN wrote:
           | Isn't that because sometimes the Tesla software might be at
           | fault?
        
         | mftb wrote:
         | If a payload is native it's potentially more of a problem than
         | a script. If the payload had been c or c++, I wouldn't have
         | been surprised if they noted that either.
        
       | lelandbatey wrote:
       | Interestingly, all the packages, even the ones from today, have
       | all been taken down. So too have all the files that were being
       | hosted on Dropbox.
        
       | photochemsyn wrote:
       | Wow this site runs a lot of JavaScript, speaking of aggressive
       | data collection.
       | 
       | https://blog.hubspot.com/website/data-mining
        
       | fortran77 wrote:
       | But I thought Rust was supposed to be safe?!
        
       | woodruffw wrote:
       | I understand that this is meant to be an eye-popping press
       | release (and implicitly a product spotlight), but some of these
       | claims make me gag.
       | 
       | It's not an attack "on" PyPI, or even an attack at all: someone
       | is just spamming the index with packages. There's no evidence
       | that these packages are being downloaded _by anyone at all,_ or
       | that the person in question has made any serious effort to
       | conceal their intentions (it's all stuffed in the setup script
       | without any obfuscation, as the post says). The executable in
       | question isn't even served through PyPI (for reasons that are
       | unclear to me): it's downloaded by the dropper script.
       | Ironically, serving the binary directly would probably raise
       | fewer red flags.
       | 
       | Supply chain security is important; we should reserve phrases
       | like "aggressive attack" for things that aren't script kiddie
       | spam.
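The dropper reportedly sits unobfuscated in the setup script, so even a shallow static check catches this pattern. Below is a minimal sketch using the stdlib `ast` module, which parses the script but never runs it. `audit_setup_py` and the module list are illustrative assumptions, and plenty of legitimate packages use `cmdclass` too.

```python
import ast

# Modules a setup script rarely needs unless it downloads or launches something (illustrative list).
SUSPICIOUS_MODULES = {"urllib", "requests", "http", "socket", "subprocess"}


def audit_setup_py(source):
    """Statically inspect setup.py source; nothing in it is executed."""
    tree = ast.parse(source)
    imported = set()
    has_cmdclass = False
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            imported.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            imported.add(node.module.split(".")[0])
        elif isinstance(node, ast.keyword) and node.arg == "cmdclass":
            has_cmdclass = True  # custom install/build commands run at install time
    return {
        "install_hook": has_cmdclass,
        "suspicious_imports": sorted(imported & SUSPICIOUS_MODULES),
    }


SAMPLE = '''
from setuptools import setup
from setuptools.command.install import install
import urllib.request, subprocess

class PostInstall(install):
    def run(self):
        install.run(self)

setup(name="demo", cmdclass={"install": PostInstall})
'''
```

`audit_setup_py(SAMPLE)` reports both the install hook and the `subprocess`/`urllib` imports without running the sample.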
        
         | asperous wrote:
         | I think it's a serious threat, especially with LLMs now because
         | people can make believable packages at scale. Not everyone vets
         | their packages thoroughly.
        
           | woodruffw wrote:
           | You've always been able to make "believable" packages at
           | scale. PyPI doesn't enforce uniqueness: you can crank out
           | malicious near-duplicates of any package you please.
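Near-duplicates can at least be screened for on the client side. Below is a minimal sketch. It normalizes names per PEP 503 (runs of `-`, `_`, `.` collapse to `-`, lowercased) and then uses `difflib.get_close_matches` against a list of popular names. The `popular` list and the 0.8 cutoff are assumptions for the example.

```python
import difflib
import re


def normalize(name):
    # PEP 503 name normalization.
    return re.sub(r"[-_.]+", "-", name).lower()


def possible_typosquat(name, popular, cutoff=0.8):
    """Return popular package names that `name` closely resembles without matching exactly."""
    candidate = normalize(name)
    known = [normalize(p) for p in popular]
    if candidate in known:
        return []  # it *is* the popular package
    return difflib.get_close_matches(candidate, known, n=3, cutoff=cutoff)
```

For example, with `popular = ["requests", "numpy", "Flask"]`, `possible_typosquat("reqeusts", popular)` returns `["requests"]`.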
        
             | zeven7 wrote:
             | And, to parent's point, now LLMs will tell people to use
             | them and they will[1].
             | 
             | [1] https://news.ycombinator.com/item?id=34916682
        
               | Groxx wrote:
               | Stack Overflow and Google search results were already
               | doing that though, at massive scale. I agree it changes
               | things _somehow_, but people not thinking before acting
               | is not a new problem.
        
           | freeqaz wrote:
           | I agree that it is a threat. I don't think this instance is
           | (it's too noisy).
           | 
           | I wrote a comment on the NPM thread earlier
           | (https://news.ycombinator.com/threads?id=freeqaz) that I'll
           | quote here:
           | 
           | > "While being flooded with spam is never good, it gets
           | immediately noticed and mitigated. It's harder for open
           | source projects to spot and stop rare one-offs"
           | 
           | This is the real problem that NPM and other ecosystems face.
           | A determined attacker that is trying to "poison" a popular
           | Open Source package just has to pose as a maintainer long
           | enough to succeed[0]. Defeating these types of attacks will
           | require rethinking how we think about trust of packages.
           | 
           | Projects like Deno are one approach (fork the ecosystem)
           | while projects like Packj (mentioned elsewhere here),
           | Socket.dev, and LunaTrace[1] are taking the other angle (make
           | it harder to install malware).
           | 
           | It's hard to say which approach is better right away.
           | (Probably a hybrid of both, realistically) It's just non-
           | trivial to fix this in one clean swoop. It's messy.
           | 
           | 0: https://www.trendmicro.com/vinfo/us/security/news/cybercri
           | me...
           | 
           | 1: https://github.com/lunasec-io/lunasec
        
           | wheelerof4te wrote:
           | Me, I just use the stdlib and my local packages.
           | 
           | There's something beautiful in knowing you're using pure,
           | clean Python. Much easier to install, also.
        
           | codetrotter wrote:
           | Speaking of LLMs. Since LLMs like to hallucinate every now
           | and then, an LLM could also hallucinate names of packages
           | that it tells people to install. And those packages could in
           | turn have been squatted by malware authors.
           | 
           | And in this way, malicious packages may be unintentionally
           | downloaded by users even when those malicious packages did
           | not yet exist when the LLM was trained. Just because the
           | hallucinated package name was randomly later taken by someone
           | malicious.
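One cheap guard against squatted (or hallucinated) names is to check when a package first appeared before installing it. Below is a minimal sketch against PyPI's JSON API (`https://pypi.org/pypi/<name>/json`). It assumes the response's `releases` mapping with a per-file `upload_time` field. The fetcher is injectable so the logic can be exercised offline, and the helper names are hypothetical.

```python
import json
import urllib.error
import urllib.request
from datetime import datetime


def fetch_pypi_json(name):
    """Fetch project metadata from PyPI's JSON API; return None if the project doesn't exist."""
    url = f"https://pypi.org/pypi/{name}/json"
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return json.load(resp)
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None
        raise


def first_upload(name, fetch=fetch_pypi_json):
    """Earliest upload time across all released files, or None if unknown/nonexistent."""
    data = fetch(name)
    if data is None:
        return None
    times = [
        datetime.fromisoformat(f["upload_time"])
        for files in data.get("releases", {}).values()
        for f in files
    ]
    return min(times) if times else None
```

A name that returns `None`, or a first upload from last week, is worth a second look before `pip install`.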
        
             | freeqaz wrote:
             | I've seen this effect get amplified also when somebody puts
             | a "bad" answer in a public place like StackOverflow. It is
             | possible to have quite a large blast radius from something
             | like this!
        
         | agolio wrote:
         | The most "aggressive" part is that those sweet package names
         | like "colorslib" are being stolen.
        
           | komali2 wrote:
           | My biggest curiosity here is how they generated over a
           | thousand package names ranging from feasible to interesting.
           | I expected gibberish.
           | 
           | Lol, maybe, "chatgpt, give me a thousand feasible pypi
           | package names"?
        
             | praash wrote:
             | The names seem to be simple concatenations of random parts
             | like "game", "lib", "vm", "cv", "http".
             | 
             | They do look surprisingly convincing.
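If the names really are glued together from a small vocabulary, a standard word-break dynamic program can detect that. Below is a minimal sketch. The vocabulary used with it would be the fragments quoted above (plus "colors", from "colorslib"), and `split_into_parts` is a hypothetical helper.

```python
def split_into_parts(name, parts):
    """Split `name` into a sequence of known fragments, or return None if impossible."""
    n = len(name)
    # best[i] holds one valid split of name[:i], or None if no split exists yet.
    best = [None] * (n + 1)
    best[0] = []
    for i in range(1, n + 1):
        for j in range(i):
            if best[j] is not None and name[j:i] in parts:
                best[i] = best[j] + [name[j:i]]
                break
    return best[n]
```

With `parts = {"game", "lib", "vm", "cv", "http", "colors"}`, `split_into_parts("httplib", parts)` gives `["http", "lib"]`, while `"requests"` gives `None`.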
        
           | lelandbatey wrote:
           | Thankfully, they're not actually being stolen because all the
           | packages were already taken down; they're available for
           | legitimate use again: https://pypi.org/project/colorslib/
        
         | worik wrote:
         | No. This is very concerning.
         | 
         | Attacking a popular repository like this does not have to have
         | a high hit rate.
         | 
         | "Script kiddie spam" is how computers get compromised.
         | Unsophisticated mass attack.
         | 
         | This sort of thing, combined with woeful security and fragile
         | systems are causing havoc the world over.
        
       | blibble wrote:
       | why does pypi/pip still not have namespacing?
       | 
       | Maven sorted this out 20 years ago
       | 
       | what's a bit sad is that the Python Packaging Authority's survey from
       | a few months ago seemed to be mostly interested in vision and
       | mission statements
       | 
       | rather than building a functional set of tools
        
         | djbusby wrote:
         | Every lang-ecosystem needs to re-implement CPAN the hard way.
        
         | woodruffw wrote:
         | Namespacing is not a security boundary: it's a usability
         | feature that helps users visually distinguish between packages
         | that share the same name but different owners. I don't think it
         | would meaningfully affect things like package index spam, which
         | this is.
         | 
         | (This is not a reason _not_ to add namespacing; just an
         | observation that it's mostly irrelevant to contexts like
         | this.)
        
           | pphysch wrote:
           | Namespacing is a _lot_ more than just a theoretical name
           | collision avoider.
           | 
           | Good namespacing (e.g. in Go), in practice, provides critical
           | _context_ about the development/publication of a software
           | package.
        
           | blibble wrote:
           | obviously, but it allows delegation of trust onto other
           | systems (like the DNS)
           | 
           | example: the package named "aws" on pypi was created by some
           | random guy and has been abandoned for years
           | 
           | if pypi/pip supported namespacing that would be
           | info.randomdude.aws instead
           | 
           | and amazon's packages would be under com.amazon
           | 
           | not being able to namespace internal packages is another
           | security issue that is substantially improved with proper
           | namespacing
           | 
           | to be blunt: not supporting it at this point is reckless and
           | irresponsible
           | 
           | (I note you're part of pypa!)
        
             | georgyo wrote:
             | I like the way golang handled this. Imports are the URL to
             | the resource. No central distribution mechanism at all. In
             | the past few years they implemented an optional caching
             | layer, so a dependency going offline doesn't necessarily
             | mean that it's unavailable anymore.
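For reference, the Go module proxy protocol addresses everything by the import path itself, which is what lets the caching layer stand in for the origin. Below is a minimal sketch of how a client builds those URLs. Uppercase letters in paths and versions are escaped as `!` plus the lowercase letter. `proxy.golang.org` is Go's public proxy; the helper names are made up.

```python
def escape_path(s):
    # Module proxy case-encoding: "A" -> "!a", so paths are safe on case-insensitive filesystems.
    return "".join("!" + c.lower() if "A" <= c <= "Z" else c for c in s)


def proxy_urls(module, version, proxy="https://proxy.golang.org"):
    """URLs for the version list, metadata, go.mod, and source zip of one module version."""
    base = f"{proxy}/{escape_path(module)}/@v"
    v = escape_path(version)
    return {
        "list": f"{base}/list",
        "info": f"{base}/{v}.info",
        "mod": f"{base}/{v}.mod",
        "zip": f"{base}/{v}.zip",
    }
```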
        
             | dpedu wrote:
             | Who's to say mr randomdude won't claim com.amazon first?
        
               | dragonwriter wrote:
               | You could in principle do proof-of-ownership checks like
               | Google does for things like Webmaster Tools, so you'd
               | need to control a domain to have the corresponding
               | namespace.
        
               | sophacles wrote:
               | Let's Encrypt solved this by doing a proof of control
               | over the domain name, and in an automated way.
               | 
               | Pypi could do this. Or, they could require that someone
               | demonstrate proof of ownership for a namespace by signing
               | it with a certificate tied to the domain name (so you
               | couldn't claim the com.bigco namespace without having the
               | certs, which you can't get without owning that domain).
               | There could even be signature requirements/proof for each
               | package and/or version uploaded.
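Here is a minimal sketch of what a DNS-based namespace claim could look like, loosely modeled on ACME's dns-01 challenge. The index hands out a random token, the claimant publishes a derived value in a TXT record, and the index checks it. The record label, the token derivation, and all function names are hypothetical. The stdlib has no TXT lookup, so the resolved records are passed in.

```python
import base64
import hashlib
import hmac
import secrets

CHALLENGE_LABEL = "_pypi-namespace-challenge"  # hypothetical record label for this sketch


def expected_txt(token, account):
    """Value the claimant must publish: base64url(SHA-256(token + "." + account)), unpadded."""
    digest = hashlib.sha256(f"{token}.{account}".encode()).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def new_challenge(domain, account):
    """Issue a random token and tell the claimant which TXT record to create."""
    token = secrets.token_urlsafe(32)
    return token, f"{CHALLENGE_LABEL}.{domain}", expected_txt(token, account)


def verify_claim(token, account, txt_records):
    """txt_records: strings resolved for the challenge name (DNS lookup left to the caller)."""
    want = expected_txt(token, account).encode()
    return any(hmac.compare_digest(want, r.encode()) for r in txt_records)
```

Binding the account name into the digest means a record published for one account can't be replayed to claim the namespace for another.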
        
               | pphysch wrote:
               | It's much easier to correct the ownership of a single
               | namespace than N packages in the global namespace
        
               | natpalmer1776 wrote:
               | Well, in theory you could have a namespace schema that
               | differentiates between user-submitted and organization-
               | submitted packages such that randomdude's would appear as
               | 'public.randomdude.aws' and organization-owned namespaces
               | verified by a DNS record would appear as 'com.amazon.aws'
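The naming half of that idea is simple. Below is a minimal sketch that assumes the domain has already been verified, for instance through a DNS record. `qualified_name` is hypothetical, and the `public.` prefix just follows the comment above.

```python
from typing import Optional


def reverse_domain(domain):
    # "amazon.com" -> "com.amazon"
    return ".".join(reversed(domain.lower().strip(".").split(".")))


def qualified_name(owner, package, verified_domain: Optional[str] = None):
    """Organization packages live under their verified reversed domain; everyone else under public.<owner>."""
    if verified_domain:
        return f"{reverse_domain(verified_domain)}.{package}"
    return f"public.{owner.lower()}.{package}"
```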
        
             | woodruffw wrote:
             | DNS isn't a particularly secure root of trust; Java is
             | somewhat unique among package ecosystems for picking it as
             | their trust anchor.
             | 
             | It also just kicks the can down the road: Amazon is the
             | easy case with `com.amazon`, but it isn't clear a priori
             | whether you should trust `net.coolguy.importantpackage` or
             | `net.cooldude.importantpackage`. These kinds of trust
             | relationships require external communication of a kind that
             | package indices are not equipped to supply, and should not
             | attempt to solve haphazardly.
             | 
             | > (I note you're part of pypa!)
             | 
             | I am a member of PyPA, but I don't represent anyone's
             | opinions but my own. It's a very loose collection of
             | projects, and it would be incorrect to read a general
             | opinion from mine.
        
               | blibble wrote:
               | > Amazon is the easy case with `com.amazon`, but it
               | isn't clear a priori whether you should trust
               | `net.coolguy.importantpackage` or
               | `net.cooldude.importantpackage`
               | 
               | this is a classic example of not letting perfect be the
               | enemy of good
               | 
               | there is no perfect solution, there never will be
               | 
               | piggybacking off of DNS works extremely well for Java and
               | Go (and the tooling is a pleasure to work with)
               | 
               | meanwhile Python continues to be a complete disaster
        
               | woodruffw wrote:
               | I agree there is no perfect solution. But I want a _good_
               | solution, and I disagree that DNS is a _good_ one.
        
               | blibble wrote:
               | I look forward to another 20 years of no progress!
        
               | woodruffw wrote:
               | Your cynicism isn't warranted: we've made significant
               | improvements to PyPI over the last 4 years[1][2], and I'm
               | currently working on additional features that will make
               | secure publishing to PyPI easier[3]. We're also working
               | on a codesigning implementation for PyPI, based on
               | Sigstore[4].
               | 
               | Security needs to be evidence and outcome-driven, first
               | and foremost. That takes a while, but improved outcomes
               | make it worth it.
               | 
               | [1]: https://pyfound.blogspot.com/2019/06/pypi-now-
               | supports-two-f...
               | 
               | [2]: https://pythoninsider.blogspot.com/2019/07/pypi-now-
               | supports...
               | 
               | [3]: https://github.com/pypi/warehouse/issues/12465
               | 
               | [4]: https://www.sigstore.dev/
        
               | blibble wrote:
               | > That takes a while, but improved outcomes make it worth
               | it.
               | 
               | meanwhile the integrity of the supply chain continues to
               | be compromised
               | 
               | > Your cynicism isn't warranted
               | 
               | it is: the python packaging situation is _worse_ today
               | than it was when I started writing Python in 2005
               | 
               | the legions of meetings, grandiose titles, conferences
               | and mountains of unreadable proposals have produced
               | tooling that is objectively worse than what Maven offered
               | close to two decades ago
        
         | klhanb wrote:
         | That's their calling card. Long discussion threads, mails
         | spanning whole pages, silencing opposition.
         | 
         | But deliver anything more streamlined and secure? Hell, no!
        
       | ianai wrote:
       | Even animals in the wild agree to peace around the watering hole.
        
         | MadSudaca wrote:
         | The problem is assuming we're better than animals.
        
         | eternalban wrote:
         | You forgot some of those animals have fangs.
         | 
         | It's like NYC's side walks. Compare pedestrian behavior at say
         | SoHo (daylight) and say LES (nighttime). Amazingly enough, the
         | partying and inebriated pedestrians at night all file politely
         | in the correct bimodal L|R formation. During the day, it's a
         | rather wild and somewhat uncivilized dynamic slalom formation.
         | My theory: Fangs. The night creatures know someone potentially
         | dangerous may be in their midst.
        
         | readthenotes1 wrote:
         | Someone forgot to tell the crocodiles...
         | 
         | https://journals.plos.org/plosone/article?id=10.1371/journal...
        
       | steponlego wrote:
       | Yet another attack that requires the biggest malware vector, MS
       | Windows. LOL
        
       | throwaway81523 wrote:
       | Wonder if that is related to the malware spamming of NPM that I
       | saw something about last night.
       | 
       | Python used to have a "batteries included" philosophy which tried
       | to put most important stuff into the distro, reducing the number
       | of external dependencies any given app needed. They seem to have
       | abandoned that now, leaving us to fend for ourselves against the
       | malware.
       | 
       | NPM spam: https://www.scmagazine.com/analysis/devops/npm-
       | repository-15...
        
         | wheelerof4te wrote:
         | "They seem to have abandoned that now, leaving us to fend for
         | ourselves against the malware."
         | 
         | Yes, along with reducing the stdlib and directing us to PyPI
         | for "alternatives".
        
       | almet wrote:
       | It's still the same story: PyPI still doesn't have a way to
       | automatically detect interactions with the network and the
       | filesystems for the submitted packages. It's a complex thing to
       | do for sure, but that would be a welcome addition, I guess.
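At runtime, CPython 3.8+ does expose a hook for observing this kind of activity. `sys.addaudithook` (PEP 578) receives events such as `open`, `socket.connect`, and `subprocess.Popen`. Below is a minimal sketch that records them while some code runs. The event selection and the `record_activity` helper are assumptions. This is observation, not a sandbox: PEP 578 explicitly does not aim to provide one, hooks can't be removed once added, and code using ctypes or native extensions can act without raising these events.

```python
import contextlib
import sys

# Audit events of interest (names from the Python docs' audit events table).
WATCHED = {"open", "socket.connect", "socket.getaddrinfo", "subprocess.Popen", "os.system"}

_events = None  # a list while recording, None otherwise


def _audit_hook(event, args):
    # Called by CPython for every audit event; keep it cheap and never raise.
    if _events is not None and event in WATCHED:
        _events.append((event, args[0] if args else None))


sys.addaudithook(_audit_hook)  # cannot be removed for the life of the process


@contextlib.contextmanager
def record_activity():
    """Collect (event, first_arg) pairs for watched audit events raised inside the block."""
    global _events
    _events = []
    try:
        yield _events
    finally:
        _events = None
```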
        
         | woodruffw wrote:
         | PyPI still doesn't have this because no packaging ecosystem
         | does. It's impossible to do in the general case if your
         | packaging schema allows arbitrary code execution, which Python
         | (and Ruby, and NPM, and Cargo, etc.) allow.
         | 
         | The closest thing is pattern/AST matching on the package's
         | source, but trivial obfuscation defeats that. There's also no
         | requirement that a package on PyPI is even uploaded with source
         | (binary wheel-only packages are perfectly acceptable).
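Here is the obfuscation point made concrete. A naive AST rule that looks for direct calls to `exec` catches the plain form but misses the same call reached through `getattr` and string concatenation. In this minimal sketch both snippets are only parsed, never executed, and `naive_exec_calls` is a hypothetical rule, not any real scanner's.

```python
import ast


def naive_exec_calls(source):
    """Flag direct calls spelled `exec(...)`; the source is parsed, never run."""
    return [
        node.lineno
        for node in ast.walk(ast.parse(source))
        if isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id == "exec"
    ]


PLAIN = "exec(payload)\n"
# Same behavior if executed, but no Name node called `exec` appears anywhere.
OBFUSCATED = "import builtins\ngetattr(builtins, 'ex' + 'ec')(payload)\n"
```

`naive_exec_calls(PLAIN)` flags line 1, while `naive_exec_calls(OBFUSCATED)` flags nothing.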
        
           | eigenvalue wrote:
           | This seems eminently solvable though. Why can't every package
           | submission cause some minimal sandboxed docker image to
           | install the package and call the various functions and
           | methods and log all network and disk activity? If anything
           | looks suspicious it would be denied and the submitter would
           | have to appeal it, explaining why the submission is valid.
           | The same applies for NPM and Cargo. I know there is a
           | researcher out there who has retrieved and installed every
           | single pip package to do an analysis, which is a good start.
           | This seems like the kind of thing that wouldn't even cost all
           | that much, and big corporate users of python would stand to
           | benefit.
        
             | woodruffw wrote:
             | For one, because Docker is not a sandbox, and containers
             | are not a strong security boundary[1]. What you really need
             | here is a strongly isolated VM, at which point you're
             | playing cat-and-mouse games with your target: their new
             | incentive is to detect your (extremely detectable) VM, and
             | your job is to make the VM look as "normal" as possible
             | without _actually_ making it behave normally (because this
             | would mean getting exploited). That kind of work has a long
             | and frustrating tail, and it's not particularly fruitful
             | (relative to the other things packaging ecosystems can do
             | to improve package security).
             | 
             | > I know there is a researcher out there who has retrieved
             | and installed every single pip package to do an analysis,
             | which is a good start.
             | 
             | You're probably talking about Moyix, who did indeed
             | download every package on PyPI[2], and unintentionally
             | executed a bunch of arbitrary code on his local machine in
             | the process.
             | 
             | [1]: https://cloud.google.com/blog/products/gcp/exploring-
             | contain...
             | 
             | [2]: https://moyix.blogspot.com/2022/09/someones-been-
             | messing-wit...
        
           | spenczar5 wrote:
           | "no packaging ecosystem does."
           | 
           | This is a little bit too strong, since packaging doesn't
           | require arbitrary code execution. For example, Go doesn't
           | permit arbitrary code execution during `go get`. Now - there
           | have been bugs which permit code execution (like
           | https://github.com/golang/go/issues/22125) but they are
           | treated as security vulnerabilities and bugs.
           | 
           | Of course, you're right about Python.
        
             | woodruffw wrote:
             | What I meant by that is that no packaging ecosystem (to my
             | knowledge) runs arbitrary uploaded code to find network
             | activity. Some may do simpler, static analyses, but
             | outright execution for dynamic analysis purposes isn't
             | something I'm aware of any ecosystem doing.
             | 
             | Python, Ruby, et al. are in an even worse position than
             | that baseline, since they have both arbitrary code in the
             | package itself _and_ arbitrary code in the package's
             | definition. But the problem is a universal one!
        
               | spenczar5 wrote:
               | Ah, yep, you're right about that as far as I know too.
        
         | photon12 wrote:
         | Smart attackers already add (or soon will add)
         | `sleep(SOME_NUMBER_LONGER_THAN_SCAN_SANDBOX_LIFETIME)` before
         | anything that does FS or network access. Not to say that this
         | wouldn't be a welcome addition, but the scanning needs to be
         | understood in the context of the inherent limitations of large
         | scale runtime behavior detection of packages when you have a
         | fixed amount of hardware and time for running those scans.
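One common countermeasure is to fast-forward the sandbox clock by stubbing out sleeps, which has obvious holes of its own. Below is a minimal in-process sketch using `unittest.mock.patch`. `run_with_sleep_skipping` and `delayed_payload` are hypothetical. The stub only intercepts calls made through `time.sleep`, so code that did `from time import sleep` before patching, busy-waits, or compares wall-clock timestamps slips past it.

```python
import time
from unittest import mock


def run_with_sleep_skipping(func, *args, **kwargs):
    """Run func with time.sleep replaced by a no-op that records each requested delay."""
    requested = []

    def fake_sleep(seconds):
        requested.append(seconds)  # log the delay instead of waiting

    with mock.patch("time.sleep", fake_sleep):
        result = func(*args, **kwargs)
    return result, requested


def delayed_payload():
    # Stand-in for code that waits out a scanner before doing anything interesting.
    time.sleep(3600)
    return "reached the interesting part"
```

`run_with_sleep_skipping(delayed_payload)` returns immediately with the payload's result and the list of skipped delays.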
        
       ___________________________________________________________________
       (page generated 2023-02-24 23:00 UTC)