[HN Gopher] Backdooring Rust crates for fun and profit
       ___________________________________________________________________
        
       Backdooring Rust crates for fun and profit
        
       Author : cjg
       Score  : 284 points
       Date   : 2021-11-18 14:29 UTC (8 hours ago)
        
 (HTM) web link (kerkour.com)
 (TXT) w3m dump (kerkour.com)
        
       | dane-pgp wrote:
       | > Actually, my num_cpu crate has been downloaded 24 times in less
       | than 24 hours, but I'm not sure if it's by bots or real persons.
       | 
       | Presumably the author could include some payload which phones
       | home with a randomly generated ID, to detect how many machines
       | the package could take control over. That's probably more
       | meaningful than trying to decide whether the package was
       | downloaded by a "bot", and wouldn't involve any GDPR-breaking
       | information.
        
         | wizzwizz4 wrote:
         | > _and wouldn't involve any GDPR-breaking information._
         | 
         | Actually, this would (in my non-legal opinion) be a GDPR
         | violation. See https://gdpr-info.eu for details.
        
           | dane-pgp wrote:
           | You want me to read the entire GDPR? It's apparently 261
           | pages long.
           | 
           | https://www.enterpriseready.io/gdpr/how-to-read-gdpr/
        
             | wizzwizz4 wrote:
             | It's not _that_ long, and you don't need to read the
             | entire thing. The first few articles make it clear that you
             | can't do this.
             | 
             | Article 4
             | 
             | > 'personal data' means any information relating to an
             | identified or identifiable natural person ('data subject');
             | an identifiable natural person is one who can be
             | identified, directly or indirectly, in particular by
             | reference to an identifier such as a name, an
             | identification number, location data, an online identifier
             | or to one or more factors specific to the physical,
             | physiological, genetic, mental, economic, cultural or
             | social identity of that natural person;
             | 
             | Creating a random identifier to identify individuals means
             | you're working with personal data. (And it's coupled with
             | the IP address, and hence crude location data - and OS
             | information, and metadata about their behaviour...)
             | 
             | Article 6
             | 
             | > Processing shall be lawful only if and to the extent that
             | at least one of the following applies:
             | 
             | None of them do. If you had _consent_ , then this would be
             | fine, but the whole idea is going behind people's backs to
             | secretly reprogram their computers to phone home, for your
             | own curiosity. That's not allowed.
        
               | dane-pgp wrote:
               | There is, I think, some legal uncertainty about what it
               | means for someone to be "identified indirectly", so a
               | court might agree with you, but I think there is also a
               | widely accepted opinion that if data can't be connected
               | back to a specific natural person by the data
               | controller/processor, then it is no longer "personal
               | data".
               | 
               | Here's what the University College London's guidance[0]
               | says:
               | 
               | "Once data is truly anonymised and individuals are no
               | longer identifiable, the data will not fall within the
               | scope of the GDPR and it becomes easier to use."
               | 
               | Obviously in the system I am proposing, the data
               | controller/processor would have to deliberately not log
               | the IP address or a user agent string. It's true that
               | timestamps could give an idea of which timezone the user
               | is in, but that shouldn't necessarily reveal even the
               | country the user is in.
               | 
               | [0] https://www.ucl.ac.uk/data-protection/guidance-staff-
               | student...
        
         | CloselyChunky wrote:
         | The easier, less invasive but also less accurate option would
         | be to publish an empty crate with a random name that does not
         | exploit typos (just some random junk) and check how often that
         | crate is downloaded. You can assume that almost all downloads
         | for this crate are bot downloads and just subtract that amount
         | from the downloads of the typo-squatted crate
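
The baseline subtraction described above can be sketched in a few lines of Rust. This is only an illustration: the function name `estimated_real_downloads` is hypothetical, the figure 24 is the one quoted from the article, and the control-crate count of 20 is a made-up number rather than a measurement.

```rust
/// Estimate downloads by real users by subtracting a bot baseline.
///
/// `typosquat_downloads` is the count for the typo-squatted crate and
/// `control_downloads` is the count for a junk-named control crate that
/// nobody would download on purpose. Both are assumed to cover the same
/// time window.
fn estimated_real_downloads(typosquat_downloads: u64, control_downloads: u64) -> u64 {
    // Saturate at zero so a noisy baseline never yields a negative estimate.
    typosquat_downloads.saturating_sub(control_downloads)
}

fn main() {
    // 24 is the figure quoted from the article; 20 is a hypothetical baseline.
    let real = estimated_real_downloads(24, 20);
    println!("estimated real downloads: {real}");
}
```

The estimate is only as good as the assumption that bots fetch both crates at roughly the same rate.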
        
         | rectang wrote:
         | Do we really need hard proof of real downloads for an example
         | like this? Typosquatting is obviously a scary problem
         | regardless, with a history of successful exploitation in web
         | domains, other package managers, etc.
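
One possible way to flag typosquatting candidates is to compare a new crate name against a list of well-known names by edit distance. The sketch below assumes a hand-written list of known names and a threshold of one edit; the helper names `edit_distance` and `lookalikes` are hypothetical, and nothing here reflects how crates.io actually screens uploads.

```rust
/// Classic dynamic-programming Levenshtein distance over Unicode scalar values.
fn edit_distance(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    // prev[j] is the distance between the first i-1 chars of a and the first j chars of b.
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for i in 1..=a.len() {
        // curr[0] = i: turning i chars into the empty string takes i deletions.
        let mut curr = vec![i; b.len() + 1];
        for j in 1..=b.len() {
            let substitution = prev[j - 1] + usize::from(a[i - 1] != b[j - 1]);
            curr[j] = substitution.min(prev[j] + 1).min(curr[j - 1] + 1);
        }
        prev = curr;
    }
    prev[b.len()]
}

/// Return the known names within `max_distance` edits of `candidate`,
/// excluding exact matches (an exact match is the real crate, not a lookalike).
fn lookalikes<'a>(candidate: &str, known: &[&'a str], max_distance: usize) -> Vec<&'a str> {
    known
        .iter()
        .copied()
        .filter(|name| {
            let d = edit_distance(candidate, name);
            d > 0 && d <= max_distance
        })
        .collect()
}

fn main() {
    // A hypothetical allow-list of popular crate names.
    let known = ["num_cpus", "serde", "tokio"];
    for candidate in ["num_cpu", "serde", "tokyo"] {
        println!("{candidate}: {:?}", lookalikes(candidate, &known, 1));
    }
}
```

A check like this would only catch near-miss spellings; misleading but dissimilar names would pass straight through.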
        
       | uncomputation wrote:
       | IMO some simple (not easy, but simple) solutions would go a long
       | way here.
       | 
       | - Support name spacing
       | 
       | - Support specifying a non-crates.io server (Docker does this)
       | 
       | - Throw warnings when the Git tag (if applicable) contents do not
       | match the Cargo upload. Rate limit package owners so they are
       | encouraged to set their tags right the first time and not move
       | around
       | 
       | RE: compile time execution. This is a harder problem, common to
       | any binary file distribution.
        
         | steveklabnik wrote:
         | The second is already something that exists.
        
       | thorum wrote:
       | I'm increasingly fatalistic about computer security. It seems
       | like your options are carefully auditing all dependencies
       | (difficult and maybe impossible if the dependencies are highly
       | technical or the malicious code is sufficiently subtle or
       | obfuscated) every time you update, not updating at all (which
       | leaves you vulnerable to all the bugs and other security issues
       | in the version you choose to pin), or not using dependencies at
       | all (by spending months or years totally rewriting the libraries
       | and tools you need, and of course your own code will have bugs
       | too).
       | 
       | Fixing the points addressed in this article helps by making it
       | harder to slip these backdoors in, but will never be foolproof
       | unless every single library has a maintainer with the skills to
       | detect subtle bugs and security issues, who audits every line of
       | code.
       | 
       | Even then the marketplace for unreported zero day vulnerabilities
       | means that there are probably undiscovered vulnerabilities
       | somewhere in your dependencies (or in the code for your IDE or OS
       | or Spotify app or mouse driver...) that can be exploited by
       | someone.
       | 
       | I'm reminded of the Commonwealth series by Peter Hamilton, in
       | which the invading aliens have no machines, and quickly discover
       | that ours are full of bugs that can be exploited to turn against
       | us. I don't know what the solution is. Sandboxing your
       | development in a codespace like Gitpod is a big improvement for
       | sure, but even in Gitpod a lot of people import credentials and
       | environment variables that can be stolen. (And what dependencies
       | is Gitpod itself running?)
        
         | remus wrote:
         | > I'm increasingly fatalistic about computer security. It seems
         | like your options are carefully auditing all dependencies
         | (difficult and maybe impossible if the dependencies are highly
         | technical or the malicious code is sufficiently subtle or
         | obfuscated) every time you update, not updating at all (which
         | leaves you vulnerable to all the bugs and other security issues
         | in the version you choose to pin), or not using dependencies at
         | all (by spending months or years totally rewriting the
         | libraries and tools you need, and of course your own code will
         | have bugs too).
         | 
         | There is also the option of having trusted third parties review
         | code. This is by no means an easy option but it does seem more
         | feasible than everyone auditing every line of code they ever
         | depend on. You do end up with spicy questions like who do we
         | trust to audit code? Why do we trust them? How are they
         | actually auditing this code?
        
           | rectang wrote:
           | The big problem is not bugs, not vulnerabilities, but
           | malicious code inserted deliberately into packages published
           | by attackers.
           | 
           | One way to detect malicious code is line-by-line code review
           | of published packages, but that's extremely laborious, even
           | when done by third parties.
           | 
           | What we really want to do is confirm that the package was the
           | end product of an open source commit history, where commits
           | were reviewed by a set of trusted authors (hey look third
           | parties!) over time. That involves strong validation of
           | publisher identities and cryptographic validation of the
           | package contents to connect it to a commit history in a
           | trusted public repository.
        
           | worik wrote:
           | But that costs money. Money that if not spent, goes straight
           | to the profit line
        
           | _tom_ wrote:
           | Another option is reducing number of dependencies. Doesn't
           | cure the problem but can cut the vulnerability surface by
           | orders of magnitude.
           | 
           | Transitive dependencies can, and frequently do, result in the
           | majority of the code in an application being unneeded.
           | 
           | Using one function from a dependency can result in whole sets
           | of libraries or even whole languages being included in the
           | project.
        
         | worik wrote:
         | But there it can be so much better, or so much worse.
         | 
         | Worse is Node.js (not bothering with the unanswerable question:
         | Why Node.js?) thousands and thousands of package downloads.
         | Long chains of transitive dependencies. A long storied history
         | of security/reliability catastrophes.
         | 
         | I love Rust. But I have always thought having the compiler
         | download dependencies is a very bad idea. It would be much
         | better if the programmer had to deliberately install the
         | dependencies. Then there would be an incentive to have fewer
         | dependencies.
         | 
         | This is currently a shit show, because it is easier to write
         | than read, to talk than listen. New generations of programmers
         | refuse to learn the lessons of their forebears and repeat all
         | their mistakes, harder, bigger, faster
        
         | _tom_ wrote:
         | I think that the rise of automated dependency resolution tools,
         | like maven, has made this exponentially worse. It's routine for
         | tools to have hundreds or even thousands of dependencies,
         | something that would never happen if you manually had to manage
         | them.
         | 
         | My .m2 directory has 4000+ jar files in it, for example.
         | 
         | They make you more productive, but much more vulnerable.
        
         | froh wrote:
         | I think we have, as an industry, for a long time not seen the
         | true value proposition of "Linux distributions". They do quite
         | some boring and tedious security auditing, for example review
         | setuid binaries to the point they drop from root into user
         | privileges; and they backport security patches, so security
         | updates are binary compatible drop in replacements.
         | 
         | When a binary distribution is widely used, the benefit is
         | shared bug fixing and hardening, the disadvantage is somewhat
         | dated libraries.
         | 
         | It's a model I understand.
         | 
         | What I don't understand is this idea of bootstrapping
         | infrastructure via curl https://..../setup.sh && ./setup.sh,
         | and the equivalent import of "modules", whatever you call them
         | in your language of choice, straight from the web.
        
         | throwawaygh wrote:
         | _> carefully auditing all dependencies (difficult and maybe
         | impossible if the dependencies are highly technical or the
         | malicious code is sufficiently subtle or obfuscated)_
         | 
         | ...yeah, a business is responsible for the integrity of its
         | supply chain. There's nothing fatalistic about this. Running a
         | business with potential liabilities is different from having a
         | high school programming hobby.
         | 
         | If you're using community distributions of open source software
         | in a security-critical context (e.g., any machine that touches
         | PII) then you should absolutely white-list dependencies and
         | either (1) have internal auditing mechanisms in place for those
         | dependencies or else (2) have good reason to trust the QA
         | procedures of the underlying community (and still do some basic
         | auditing on every update anyways).
         | 
         | Everything else should be carefully sand-boxed and basically
         | assumed to be pwned/pwnable.
         | 
         | If some rando came up to your contractor and offered them free
         | concrete for use in your foundation, and the contractor said
         | yes without any due diligence, you would have every right to
         | sue that contractor out of existence.
         | 
         | The www isn't a wild west anymore. The era where any middle
         | schooler can build a six figure business by serving as the
         | middle man between open source packages and end-users should
         | probably come to a close. And I say that as someone whose
         | middle school software freelancing business cleared lots of
         | revenue by the end of college.
         | 
         | I wonder if this could be a revenue model of OSS. Cyber
         | insurance providers should probably start weighing in on these
         | supply chain issues soon.
        
         | Groxx wrote:
         | I'm gonna pimp my own complaint here:
         | https://news.ycombinator.com/item?id=29125409
         | 
         | I think library permissions systems would mitigate or
         | effectively eliminate a _huge_ amount of these, and
         | significantly raise the cost or reduce the targets of nearly
         | all attacks.
         | 
         | Libraries are, in practice, treated as black boxes. I think
         | that's largely reasonable - that's _almost the whole point_ of
         | leveraging someone else's work. But our languages/etc do not
         | allow doing that in any sane way. I think that's completely
         | ridiculous.
        
       | rvz wrote:
       | > As Rust is designed for sensitive applications where
       | reliability is important such as embedded or blockchain-like
       | projects, it can raise concerns.
       | 
       | This is why I get very concerned with Rust projects using tons
       | and tons of external crates. Especially cryptocurrency projects
       | using Rust.
       | 
       | These sort of techniques can be used to compromise lots of them
       | at once which in the very worst case can lead to loss of funds
       | and is irreversible.
       | 
       | Unfortunately we will see the same issues found in NPM be found
       | in crates.io with cargo.
       | 
       | Oh dear.
        
         | natded wrote:
         | This is an issue with any ecosystem. The alternative would be
         | to have them in standard library which is silly.
         | 
         | The actual solution I guess are domain specific languages.
        
           | oytis wrote:
           | With any ecosystem that has a package manager. Try inserting a
           | dependency backdoor into a C++ project where dependencies
           | have to be managed manually (and consequently are few)
        
             | ziml77 wrote:
             | Of course if they need to be manually updated, there's a
             | strong likelihood that they are using vastly outdated
             | versions of their dependencies. Users could be wide open to
             | unpatched exploits.
        
               | oytis wrote:
               | Not necessarily. C/C++ relies vastly on dynamic linking
               | which means keeping your dependencies up to date will be
               | outsourced to distro maintainers. You'll have to make
               | sure that your package builds for new versions of the
               | distro.
        
               | notriddle wrote:
               | C relies vastly on dynamic linking. C++ cannot
               | dynamically link templates, so it's pretty common for a
               | C++ library to be made entirely of header files.
               | 
               | Also, that's really only true of a small number of Linux
               | and BSD flavors. Applications shipped on Windows, macOS,
               | Android, iOS, or any of the Linux "application bundle"
               | systems like the Steam runtime, Docker, FlatPak, will
               | deliberately avoid using globally-specified dependencies.
               | 
               | It's also commonplace to avoid declaring dependencies in
               | C by vendoring them, like how VLC basically includes its
               | own implementation of a bunch of data structures [1], and
               | the entire universe of single-file libraries [2].
               | 
               | [1]:
               | https://wiki.alopex.li/LetsBeRealAboutDependencies#gotta-
               | go-...
               | 
               | [2]: https://github.com/nothings/stb/blob/master/docs/stb
               | _howto.t...
        
               | oytis wrote:
               | C++ ABI is PITA indeed, but exposing a C API for external
               | linking is totally possible and is being actively used.
               | Various cases of static linking (or header-only files) do
               | indeed exist, but no sane C++ project (I'm not sure one
               | can call ROS that without reservations) uses nearly as
               | many static dependencies as typical Rust one does.
               | 
               | > Also, that's really only true of a small number of
               | Linux and BSD flavors.
               | 
               | This is true on all major Linux distributions.
               | 
               | > Applications shipped on Windows, macOS, Android, iOS
               | 
               | If we are talking about security closed-source walled
               | gardens are out of consideration right?
               | 
               | > or any of the Linux "application bundle" systems like
               | the Steam runtime, Docker, FlatPak
               | 
               | This is sad indeed. Docker used right is just a distro
               | inside a distro though, so anything that applies to
               | distros applies here too.
        
               | ziml77 wrote:
               | That first link is a good analysis of the situation when
               | it comes to dependencies. It's really not straightforward
               | unless you are targeting a specific OS distribution.
        
               | [deleted]
        
               | cozzyd wrote:
               | While vendoring is somewhat common (especially for C++
               | header-only libraries), often dependencies are provided
               | by the OS package manager and dynamically linked (you
               | have to recompile if the ABI changes, and hope that the
               | API remains more or less stable, but that's usually part
               | of the contract of a "major version").
        
           | rvz wrote:
           | > This is an issue with any ecosystem.
           | 
           | So repeating the same mistakes that NPM has into other
           | package managers? We have therefore learned nothing at
           | mitigating these supply chain issues then.
           | 
           | > The actual solution I guess are domain specific languages.
           | 
           | Any production real world examples of this?
        
       | dathinab wrote:
       | > crates.io matches the code on GitHub,
       | 
       | There is no tight coupling between GitHub and cargo/crates.io
       | (sure it uses GitHub internally but that is an implementation
       | detail).
       | 
       | But not only has it no tight coupling with GitHub, it also doesn't
       | require you to use git; you can use whatever version control you
       | want and at worst you don't get support for "detect dirty
       | repository".
       | 
       | Similarly, git tags are fundamentally unreliable as you can always
       | "move" them to any arbitrary commit.
       | 
       | So IMHO the problem here is relying on code you didn't get from
       | GitHub, which might not even use git, to match an arbitrary tag on
       | something on GitHub which might not even be from the same author
       | (but e.g. a mirror on GitHub from whatever VC the author uses).
       | 
       | But uploads to crates.io are immutable and are source code
       | uploads, so you can just review them.
       | 
       | In general (independent of cargo) _do review the code you use_
       | not some code you got from somewhere else which you hope/believe
       | is the same.
        
         | rectang wrote:
         | Treating PGP signed commits as privileged and only pointing at
         | them as opposed to mutable tags seems like it would help.
        
           | dathinab wrote:
           | I'm not sure it's actually possible, for varying reasons.
           | 
           | Even if we ignore that cargo can be used with other version
           | control systems I see some problems:
           | 
           | - Validation must be done server side as everything the
           | client does can be manipulated.
           | 
           | - So the server would need to be able to make sure that some
           | signature is valid for some commit.
           | 
           | - But you don't upload commits, you upload a subset of the
           | checkout produced by the combination of the current commit
           | and previous commits.
           | 
           | - Uploading all of the diff of the current (or previous)
           | commit is a no-go for various reasons (size, leaking internal
           | code etc.).
           | 
           | - Similarly, even if it worked, the repository on GitHub
           | could still contain different code; the signature might not
           | match, but who is checking the signature? The reviewer
           | manually?
           | 
           | What you maybe could do is e.g. signing the uploaded archive
           | which in combination with e.g. a Yubikey would make it harder
           | for attackers to upload malicious packages (if the authors
           | aren't the attackers).
           | 
           | You also could include some version id (e.g. git hash) in the
           | signature, then review tools could check that the uploaded
           | code matches the GitHub repository.
           | 
           | But really the most important thing is to review the code you
           | actually use (e.g. what is uploaded to crates.io) and not
           | what you believe you probably should be using.
        
             | rectang wrote:
             | In the abstract, what I'm wishing for is definitely
             | possible at least for some subset of packages.
             | 
             | * Source code packages are cryptographically signed by at
             | least one and ideally several identities which are
             | verifiable via a web of trust.
             | 
             | * Packages are associated with a public repository base URL
             | known to the package manager and that base URL doesn't
             | change without setting off alarm bells.
             | 
             | * The source code of a package can be traced back and
             | cryptographically connected to a public commit history at
             | the public repository URL.
        
         | PragmaticPulp wrote:
         | > There is no tight coupling between GitHub and cargo/crates.io
         | (sure it uses GitHub internally but that is an implementation
         | detail).
         | 
         | Cargo has a "locked" option that uses the URL and commit hash
         | from the Cargo.lock file. If the crate, repository, or commit
         | has changed then it won't build.
         | 
         | This is what everyone uses for reproducible and secure builds,
         | but it's not as commonly used for casual use.
        
           | jeltz wrote:
           | Why isn't that the default?
        
             | Diggsey wrote:
             | To be clear, even without `--locked`, cargo will use the
             | dependencies from the lockfile, as long as the lockfile is
             | not out of date compared to Cargo.toml.
             | 
             | However, for development, you normally want cargo to update
             | the lockfile after you change something in `Cargo.toml`
             | (like adding a new dependency).
             | 
             | The `--locked` option is particularly useful in CI though,
             | where you want it to fail if the lockfile is out of date,
             | rather than update the lockfile and continue.
        
               | steveklabnik wrote:
               | To spell it out in even more detail:
               | 
               | I add foo = "1.0.0" to my Cargo.toml. The latest release
               | of foo is 1.0.1. I invoke "cargo build." Cargo builds my
               | project, using 1.0.1 (since "1.0.0" is short for
               | "^1.0.0"), and records that in the lockfile.
               | 
               | Now foo releases 1.1.0. I invoke "cargo build". Cargo
               | uses the lockfile, and _nothing changes_. I still build
               | with 1.0.1, because that's what's in the lockfile.
               | 
               | Now, let's say I go in and change my Cargo.toml to use
               | 1.1.0. This is where the behavior differs:
               | 
               | 'cargo build' will say "oh, you've changed your
               | Cargo.toml, let's perform resolution again" and will
               | update your Cargo.lock to have 1.1.0.
               | 
               | 'cargo build --locked' will say "error, you have changed
               | your Cargo.toml but the lockfile was not updated."
               | 
               | That's why --locked is useful in CI; it will make sure
               | that the lock you've committed is up to date. It doesn't
               | imply that Cargo ignores the lockfile by default, only
               | that it will update your Cargo.lock when you change your
               | Cargo.toml, because that implies you're asking for a
               | change in your dependencies.
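
The scenario above can be replayed with a toy model in Rust. This is a deliberately simplified sketch, not Cargo's resolver: it handles a single dependency with full `MAJOR.MINOR.PATCH` versions only, and the names `parse`, `caret_matches`, and `resolve` are hypothetical helpers made up for illustration.

```rust
type Version = (u64, u64, u64);

/// Parse "MAJOR.MINOR.PATCH"; returns None on anything else.
fn parse(v: &str) -> Option<Version> {
    let mut parts = v.split('.').map(|p| p.parse::<u64>().ok());
    let major = parts.next()??;
    let minor = parts.next()??;
    let patch = parts.next()??;
    if parts.next().is_some() {
        return None;
    }
    Some((major, minor, patch))
}

/// Caret semantics: "1.0.0" means "^1.0.0", i.e. any version >= the
/// requirement that keeps the left-most non-zero component unchanged.
fn caret_matches(req: Version, v: Version) -> bool {
    if v < req {
        return false;
    }
    match req {
        (0, 0, _) => v == req,
        (0, minor, _) => v.0 == 0 && v.1 == minor,
        (major, _, _) => v.0 == major,
    }
}

/// Toy resolution for one dependency.
/// `lock_entry` is the version recorded in the lockfile, if any;
/// `locked_flag` models passing `--locked`.
fn resolve(
    req: &str,
    available: &[&str],
    lock_entry: Option<&str>,
    locked_flag: bool,
) -> Result<Version, String> {
    let req = parse(req).ok_or("malformed requirement")?;
    // A lockfile entry that still satisfies the requirement is reused as-is.
    if let Some(v) = lock_entry.and_then(parse) {
        if caret_matches(req, v) {
            return Ok(v);
        }
    }
    // The lockfile would have to change; with --locked that is an error.
    if locked_flag {
        return Err("lockfile is out of date and --locked was passed".to_string());
    }
    // Otherwise re-resolve to the highest available matching version.
    available
        .iter()
        .filter_map(|v| parse(v))
        .filter(|v| caret_matches(req, *v))
        .max()
        .ok_or_else(|| "no available version satisfies the requirement".to_string())
}

fn main() {
    let all = ["1.0.0", "1.0.1", "1.1.0"];
    // First build: no lockfile yet, only 1.0.0 and 1.0.1 have been released.
    println!("{:?}", resolve("1.0.0", &all[..2], None, false));
    // foo releases 1.1.0: the lockfile entry still wins.
    println!("{:?}", resolve("1.0.0", &all, Some("1.0.1"), false));
    // Cargo.toml now asks for 1.1.0: plain build re-resolves, --locked fails.
    println!("{:?}", resolve("1.1.0", &all, Some("1.0.1"), false));
    println!("{:?}", resolve("1.1.0", &all, Some("1.0.1"), true));
}
```

The only point of the model is the ordering of the checks: the lockfile is consulted first in both modes, and `--locked` changes what happens only when the lockfile would need to be rewritten.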
        
       | [deleted]
        
       | dathinab wrote:
       | While the things mentioned are real, his recommendations and
       | interpretation of them often seem strange to me.
       | 
       | > Typosquatting
       | 
       | Well known problem of more or less any open repo, namespaces do
       | not solve it, they just shift the problem, maybe even make it
       | more complex.
       | 
       | > Misleading name
       | 
       | Again not cargo specific and not specific to flat namespaces
       | either.
       | 
       | Namespaces help with first party packages from the same authors
       | (but then it's not too hard to check the authors, but annoying;
       | Annoying is insecure). But namespaces do not help with 3rd party
       | libraries. I.e. many (most?) `tokio-` libraries are 3rd party
       | libraries made to work with tokio, not libraries from the `tokio`
       | authors.
       | 
       | > Transitive dependencies
       | 
       | Again something inherent to all package managers.
       | 
       | Though some ecosystems like npm are worse due to using more,
       | smaller packages.
       | 
       | In my experience while most rust crates have many dependencies
       | most of them are to the same set of packages (like thiserror,
       | anyhow, serde, etc.).
       | 
       | Anyway trying to avoid unnecessary dependencies is generally not
       | a bad idea.
       | 
       | > "x.x.1" Update
       | 
       | Basically: If you update a dependency you update a dependency,
       | surprise.
       | 
       | Anyway using a `=` dependency is an anti-pattern which produces an
       | endless amount of headaches and incompatibilities, especially with
       | libraries using `=` imports. There is a reason lockfiles exist
       | and `cargo update` doesn't run by default.
       | 
       | > Malicious update
       | 
       | This point makes no sense as:
       | 
       | - cargo is not specific to the version control system you use it
       | has some helpers for some vc systems, but that's it.
       | 
       | - git tags are settable arbitrarily (and can be "moved").
       | 
       | - You get the source from crates.io, not GitHub.
       | 
       | So the only way this can be a security vulnerability is if you
       | believe that reviewing not the source you use but source code
       | from a different source is a good idea, while also trusting git
       | tags while not fully trusting the author who sets the tags????
       | 
       | Just review the source code which you use, uploads to crates.io
       | are immutable and you can download and read them.
       | 
       | > Run code before main
       | 
       | Neat to point out, but not specific to Rust, and even less so to
       | cargo/crates.io. It's specific to the system's binary file
       | format and how binaries are executed.
       | 
       | > principles of Rust is no life before main
       | 
       | A design principle for the Rust language, not a security
       | statement.
       | 
       | Anyway review the source code you use, not source code from a
       | different source you believe should be the same (unspecific of
       | rust).
       | 
       | Rust could also improve on this by warning if (transitive)
       | dependencies link to .ctor or similar.
       | 
       | > Malicious macros
       | 
       | Code generation is an interesting aspect to analyze across
       | languages. Most have it, some like Rust at compile time and
       | first class, others at compile time but second class, and others
       | at runtime (e.g. Java reflection).
       | 
       | It's definitely an area where rust has a lot of potential to
       | improve (e.g. default-wasm sandbox most proc-macros or similar).
       | 
       | It's also unavoidable in some cases (e.g. building external
       | dependencies).
       | 
       | Additionally, I would argue you could probably craft code that
       | runs at compile time even without such features, by triggering
       | buffer overflows or similar bugs in the underlying compiler (e.g.
       | LLVM).
       | 
       | You also tend to run tests...
       | 
       | So theoretically you should always sandbox your IDE/project, even
       | if you develop in a language which doesn't have compile-time
       | code generation.
       | 
       | As a side note, VS Code asks you if you trust a project you open;
       | if not, it will change some settings for the project, and
       | rust-analyzer can be configured to not run build scripts and
       | proc macros. (I haven't tested whether not trusting a project
       | automatically also makes rust-analyzer not run things.)
       | 
       | > a bigger standard library would reduce the need for external
       | [..]
       | 
       | No, it doesn't reduce the need. See Python. Or see how even some
       | features in Rust's standard library are frequently not used, with
       | external libraries used instead, e.g. parking_lot or crossbeam.
       | 
       | Similarly, having 10 dependencies instead of 1 might not decrease
       | security if those 10 are a bundle from the same author in the same
       | CI and don't have (relevantly) more code than if they were just 1
       | dependency.
       | 
       | What you want is to reduce the number of trusted entities (which
       | is not the same as the number of packages) and trusted lines of
       | code.
       | 
       | So having a group which manages a set of widely used packages
       | with tight security would roughly be as helpful for the problem
       | as having a bigger standard library, but much more practical. To
       | some degree the rust nursery was going in that direction (but
       | didn't reach that goal).
       | 
       | > Rust supports git dependencies.
       | 
       | It also allows pinning versions in Cargo.toml and in lock files,
       | and crates.io uploads are immutable.
       | 
       | Git dependencies are prone to making you miss security updates;
       | similarly, 3rd-party audits are likely done on the code uploaded
       | to crates.io, not the code on GitHub.
       | 
       | If you want to review all dependencies, and make sure to use them
       | etc. then it's probably best to vendor all of them, and set
       | things up so that your dev system can only see/reach vendored
       | packages, while from time to time updating them based on
       | crates.io and on diffs of the crates.io uploads (not GitHub).
        
       | cntlzw wrote:
       | How do Rust crates compare with something like Maven or npm? It
       | looks like some issues, for example typosquatting, can occur in
       | all of these dependency managers.
        
         | junon wrote:
         | npm has some guards for typosquatting. They're annoying when
         | you run into them but I appreciate that they're there. I have
         | no idea how effective or extensive they are, though.
        
         | Ajedi32 wrote:
         | Yep, supply chain attacks are a near-universal problem with
         | programming language package managers.
         | 
         | I think there's a lot of room for improvement here. Some good
         | low-hanging fruit IMO would be to:
         | 
         | 1. Take steps to make package source code easier to review.
         | 
         | 1.1. When applicable, encourage verified builds to ensure
         | package source matches the uploaded package.
         | 
         | 1.2. Display the source code on the package manager website,
         | and display a warning next to any links to external source
         | repositories when it can't be verified that the package's
         | source matches what's in that repo.
         | 
         | 1.3. Build systems for crowdsourcing review of package source
         | code. Even if I don't trust the package author, if someone I
         | _do_ trust has already reviewed the code then it's probably
         | okay to install.
         | 
         | 2. Make package managers expose more information about who
         | exactly you're trusting when you choose to install a particular
         | package.
         | 
         | 2.1. List any new authors you're adding to your dependency
         | chain when you install a package.
         | 
         | 2.2. Warn when package ownership changes (e.g. new version is
         | signed by a different author than the old one).
         | 
         | Long-term, maybe some kind of sandbox for dependencies could
         | make sense. Lots of dependencies don't need disk or network
         | access. Denying them that would certainly limit the amount of
         | damage they can do if they are compromised, provided the host
         | language makes that level of isolation feasible.
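Point 2.1 above (listing new authors added to your dependency chain) could be sketched roughly as follows. The `Owners` map and the function name are hypothetical; no package manager is claimed to expose data in this shape, and a real tool would have to fetch ownership data from the registry.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical data shape: package name -> set of owner names.
type Owners = BTreeMap<String, BTreeSet<String>>;

/// Returns every author who owns a package in `after` but owned
/// nothing in `before`, i.e. the people you would newly be trusting.
fn newly_trusted_authors(before: &Owners, after: &Owners) -> BTreeSet<String> {
    // Everyone already trusted through the existing dependency set.
    let known: BTreeSet<&String> = before.values().flatten().collect();
    after
        .values()
        .flatten()
        .filter(|author| !known.contains(*author))
        .cloned()
        .collect()
}
```

A package manager could print this set before installing and ask for confirmation whenever it is non-empty.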
        
           | ansible wrote:
           | I like all these ideas.
           | 
           | > _Long-term, maybe some kind of sandbox for dependencies
           | could make sense. Lots of dependencies don't need disk or
           | network access._
           | 
           | Just like with Android permissions, we could audit the crate
           | sources to list out what functions it uses (out of the
           | standard library or wherever) and provide an indication of
           | what this particular crate is capable of.
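A very rough sketch of such a capability listing, assuming a purely textual scan of a crate's source. The `CAPABILITIES` table and the function name are made up for illustration. A substring scan like this is easy to evade (re-exports, nested `use` groups, FFI, macros), so it is at best a first-pass hint and not a sandbox.

```rust
use std::collections::BTreeSet;

/// Hypothetical mapping from standard-library paths to coarse capabilities.
const CAPABILITIES: &[(&str, &str)] = &[
    ("std::fs", "filesystem"),
    ("std::net", "network"),
    ("std::process", "process"),
    ("std::env", "environment"),
];

/// Lists the coarse capabilities whose std paths appear literally in `source`.
fn capabilities_used(source: &str) -> BTreeSet<&'static str> {
    CAPABILITIES
        .iter()
        .filter(|(path, _)| source.contains(*path))
        .map(|(_, capability)| *capability)
        .collect()
}
```

With this, a hashing crate whose source yields `filesystem` or `network` would stand out for manual review.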
        
             | UncleMeat wrote:
             | This is a strategy, but it typically falls apart against
             | clever attackers who are targeting you specifically.
             | Hackers have been performing return-to-libc attacks forever
             | where they don't actually get to write any code at all,
             | just sequence code that already exists in your binary.
             | 
             | Java also tried this in a slightly more rigorous manner
             | with the SecurityManager and that just ended up being a
             | botch.
        
               | Ajedi32 wrote:
               | Yeah that's why I said it really depends on the host
               | language to make such sandboxing feasible. If you're
               | using a language that lets code write arbitrary data to
               | arbitrary memory locations, implementing a secure sandbox
               | is going to be pretty tricky.
        
             | dane-pgp wrote:
             | For what it's worth, this Principle Of Least Authority /
             | object-capability model is being attempted in the
             | JavaScript ecosystem with SES (Secure ECMAScript).
             | 
             | https://agoric.com/blog/technology/ses-securing-javascript/
             | 
             | https://medium.com/agoric/pola-would-have-prevented-the-
             | even...
        
           | _tom_ wrote:
           | Analysis tools that show where large transitive dependencies
           | could be avoided would help.
           | 
           | Right now there is no feedback to encourage people to not
           | have HUGE lists of dependencies. And for trivial reasons.
           | This compounds the problem hugely.
           | 
           | If you have three dependencies, verifying is feasible. If you
           | have 3,000, it is not.
        
         | tetha wrote:
         | Maven Central is somewhat resilient against this. In the java
         | world, an artifact is identified by a group-id, an artifact-id
         | and a version, and some technical stuff. The group id is a
         | reversed domain, like org.springframework.
         | 
         | If you want to upload artifacts with the group id
         | "org.springframework", you first have to demonstrate that you
         | own springframework.org via a challenge, usually a DNS TXT
         | record, with other options for GitHub-based group ids and such.
         | 
         | It's not entirely bulletproof, because you could squat group-
         | ids "org.spring" or "org.spring.framework" (if you can get that
         | domain). However, once a developer knows the correct group id
         | is "org.springframework", you need additional compromises to
         | upload an artifact "backdoor" there.
         | 
         | Edit - and as I'm currently seeing, PGP signatures are also
         | required by now.
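The group-id convention described above can be illustrated with a small sketch that maps a reversed-domain group id to the domain whose ownership would be checked. This assumes the first two labels form the registrable domain, which does not hold for multi-part suffixes such as `co.uk`; the function name is hypothetical and this is not Maven Central's actual implementation.

```rust
/// Maps a reversed-domain group id such as "org.springframework" to the
/// domain to verify ("springframework.org"). Returns None when the group
/// id has fewer than two non-empty labels.
fn domain_for_group_id(group_id: &str) -> Option<String> {
    let mut labels = group_id.split('.');
    let tld = labels.next()?;
    let name = labels.next()?;
    if tld.is_empty() || name.is_empty() {
        return None;
    }
    Some(format!("{name}.{tld}"))
}
```

Under this assumption, "org.spring.framework" maps to spring.org, so squatting it would still require controlling a different domain from springframework.org.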
        
           | [deleted]
        
           | brabel wrote:
           | It's a hell of a lot harder to squat namespaces as you need
           | to either spoof or steal or buy one domain per namespace,
           | which is not trivial.
           | 
           | Maven Central has required PGP signatures since the beginning
           | as far as I know! In the olden days, it didn't use HTTPS
           | though (which has been fixed for several years now), so
           | unless you validated the signatures and kept track of the PGP
           | keys, you could still run into trouble.
        
             | kibwen wrote:
             | _> It's a hell of a lot harder to squat namespaces as you
             | need to either spoof or steal or buy one domain per
             | namespace, which is not trivial._
             | 
             | This introduces a different security wrinkle, as domain
             | names need to be continually renewed. What does Maven do to
             | prevent unauthorized transfer of namespace ownership when a
             | domain lapses?
        
         | ChrisSD wrote:
         | These do all seem to be things that apply to most package
         | managers of this kind. So it would be good if Rust could find
         | solutions that can be applied more broadly.
        
         | typicalbender wrote:
         | I haven't thought this through at all but are you aware of any
         | package repositories that do something like levenshtein
         | distance between package names maybe combined with a heuristic
         | on common mistyped characters to not allow typosquatting?
        
           | Buttons840 wrote:
           | Are there any tools that can scan my dependencies and point
           | out names that are typos of older or more popular packages?
           | 
           | Something like: you said "times", did you mean the older and
           | more popular package "time"?
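A minimal sketch of the kind of check asked about here, using plain Levenshtein edit distance against a list of popular names. The `popular` list and the distance threshold are assumptions supplied by the caller; a real tool would also want download counts and a heuristic for commonly mistyped characters.

```rust
/// Classic dynamic-programming Levenshtein distance over Unicode scalar values.
fn levenshtein(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    // prev[j] = distance between the first i-1 chars of `a` and the first j chars of `b`.
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for i in 1..=a.len() {
        // cur[0] = i: deleting i chars turns a prefix of `a` into the empty string.
        let mut cur = vec![i; b.len() + 1];
        for j in 1..=b.len() {
            let cost = if a[i - 1] == b[j - 1] { 0 } else { 1 };
            cur[j] = (prev[j] + 1).min(cur[j - 1] + 1).min(prev[j - 1] + cost);
        }
        prev = cur;
    }
    prev[b.len()]
}

/// Returns the popular names within `max_distance` edits of `candidate`,
/// excluding exact matches.
fn possible_typosquats<'a>(
    candidate: &str,
    popular: &[&'a str],
    max_distance: usize,
) -> Vec<&'a str> {
    popular
        .iter()
        .copied()
        .filter(|p| *p != candidate && levenshtein(candidate, p) <= max_distance)
        .collect()
}
```

Calling `possible_typosquats("times", &["time", "serde"], 1)` flags "time", which a tool could turn into the "did you mean" prompt suggested above.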
        
           | brabel wrote:
           | Yes, they do that in Dart's pub [1].
           | 
           | They also have the concept of verified publishers[2], which
           | is pretty neat (similar to Maven Central), and keep track of
           | a score for each package (e.g.
           | https://pub.dev/packages/darq/score) including up-to-date
           | dependencies and result of static analysis.
           | 
           | Dart is doing a lot of things right.
           | 
           | [1] https://pub.dev/
           | 
           | [2] https://dart.dev/tools/pub/publishing#verified-publisher
        
       | pornel wrote:
       | https://lib.rs/cargo-crev tries to address this.
       | 
       | It allows you to review the actual published source of your
       | dependencies. It then can check whether your project only uses
       | reviewed dependencies.
       | 
       | Reviewing everything is of course a lot of work, so there's an
       | option to mark crate owners as trusted, and also reuse code
       | reviews made by people you trust.
        
       | hn8788 wrote:
       | Stuff like this is why the place I work for decided not to use
       | Rust as a replacement for future C projects. The devs evaluating
       | it loved the language, and said they were more productive with it
       | than they were with C, but being forced to either use no external
       | dependencies, or audit tons of crates published by random people,
       | made it a non-starter.
        
         | smabie wrote:
         | Oh right because C has a very rich standard library
        
         | smoldesu wrote:
         | So they'd prefer to manually grab header files and audit those
         | instead?
        
         | lytedev wrote:
         | Don't you have the same problem with C?
        
       | Diggsey wrote:
       | There's going to be a risk to running someone else's code. There
       | are two factors here:
       | 
       | 1) Do I trust the code I think I'm running?
       | 
       | 2) Am I actually running the code I think I'm running?
       | 
       | With (1) there's not really any way around it: someone or
       | something has to review the code in some way.
       | 
       | Even the suggestion to have a larger standard library doesn't
       | _really_ address it: with a larger standard library the rust
       | project needs more maintainers, and it might just get easier to
       | get vulnerabilities into the standard library.
       | 
       | Someone could build a tool that automatically scans crates
       | uploaded to crates.io. It could look for suspicious code
       | patterns, or could simply figure out what side-effects a crate
       | might have, based on what standard library functions it calls,
       | and then provide that information to you. For example, if I'm
       | looking for a SHA256 crate, and I notice that the crate uses the
       | filesystem, then I might be suspicious.
       | 
       | With (2) there are some easier options, such as making it easier
       | to download or browse the contents of a crate directly from
       | crates.io, or have a tool to show the full dependency source diff
       | after a `cargo update`. For initially installing the crate, the
       | number of downloads is a pretty good indicator of "is this really
       | the crate I meant?".
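As a starting point for the "dependency diff after `cargo update`" idea, here is a sketch that lists which (name, version) pairs are new in an updated Cargo.lock. It assumes the usual layout where each `[[package]]` entry has `name = "..."` followed by `version = "..."` on their own lines, and parses the text naively instead of using a TOML parser. Both function names are hypothetical.

```rust
use std::collections::BTreeSet;

/// Naively extracts (name, version) pairs from Cargo.lock text.
fn locked_packages(lock: &str) -> BTreeSet<(String, String)> {
    let mut packages = BTreeSet::new();
    let mut name: Option<String> = None;
    for line in lock.lines() {
        let line = line.trim();
        if let Some(value) = line.strip_prefix("name = ") {
            name = Some(value.trim_matches('"').to_string());
        } else if let Some(value) = line.strip_prefix("version = ") {
            // The top-level lockfile `version = N` line has no preceding name and is skipped.
            if let Some(n) = name.take() {
                packages.insert((n, value.trim_matches('"').to_string()));
            }
        }
    }
    packages
}

/// Returns the packages present in `new_lock` but not in `old_lock`,
/// i.e. the exact crate versions whose source still needs review.
fn added_packages(old_lock: &str, new_lock: &str) -> Vec<(String, String)> {
    let old = locked_packages(old_lock);
    locked_packages(new_lock)
        .into_iter()
        .filter(|package| !old.contains(package))
        .collect()
}
```

Each returned pair identifies a crates.io upload that could then be downloaded and diffed against the previously reviewed version.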
        
         | curun1r wrote:
         | I'm not sure I agree with your fatalistic take on (1). As an
         | example, the proof-of-concept proc macro attack from the
         | article could be addressed by running proc macros in a wasm
         | sandbox. I know there's been some exploratory work done towards
         | that.
         | 
         | Similarly, all of the attacks that execute code at compile time
         | are mostly addressed currently by building code in a Docker
         | container. It's not a perfect security measure, but it greatly
         | increases the probability that a build.rs or proc macro attack
         | will fail.
         | 
         | Your (1) strikes me as something that can never be solved to
         | provide 100% safety, but that there are partial solutions
         | which, in practice, can significantly reduce the attack
         | surface. And hiding behind "fundamentally unsolvable problem"
         | will get in the way of pushing for the meaningful half-measures
         | that offer some degree of real-world protection.
        
           | Diggsey wrote:
           | > As an example, the proof-of-concept proc macro attack from
           | the article could be addressed by running proc macros in a
           | wasm sandbox. I know there's been some exploratory work done
           | towards that.
           | 
           | Sure, you can (and we absolutely should) sandbox the build
           | process itself. I meant "running other people's code outside
           | of a sandbox", and should have specified that.
           | 
           | The problem is that at some point you actually want to run
           | the code you compiled, and then proc-macro exploits can still
           | do whatever.
           | 
           | > hiding behind "fundamentally unsolvable problem" will get
           | in the way of pushing for the meaningful half-measures that
           | offer some degree of real-world protection.
           | 
           | I didn't say they were unsolvable, I said that solving them
           | requires someone or something to review the code. That's the
           | only way you can gain trust that code does what it says it
           | does. I even suggested some possible "meaningful
           | half-measures" that could be implemented.
        
           | kevincox wrote:
           | > could be addressed by running proc macros in a wasm sandbox
           | 
           | Kind of. That will stop the compromise on build, but odds are
           | then you run the code anyways so injecting the code into the
           | executable is _almost_ as good. I guess if you are building
           | an application that is always run in a sandbox anyways (like
           | a wasm application that never sees sensitive data) then
           | sandboxing proc macros could be good enough but I suspect
           | that is a very rare case.
        
             | curun1r wrote:
             | My comment specifically called it out as a half measure.
             | But that doesn't mean it's not worth doing.
             | 
             | One of the ways that I'd see a proc macro attack playing
             | out in real life would be to attempt to obtain the API
             | token from cargo login to allow the attacker to publish a
             | malicious version of a popular crate. But if cargo login is
             | only run on the CI instance responsible for publishing and
             | developers test their code on their individual dev
             | machines, the attack fails.
             | 
             | We shouldn't worry about making security 100% effective.
             | Rather, we should be concerned with stopping as many
             | different attacks as we can and providing users with as
             | many building blocks as possible to let them protect
             | themselves. There will always be vulnerabilities. But using
             | that as an excuse for inaction isn't helpful.
        
           | pjmlp wrote:
           | The sandbox doesn't protect against injection of malicious
           | code.
        
       | PragmaticPulp wrote:
       | I use "cargo ... --locked" to install things using the
       | dependencies from the Cargo.lock file, which records exact
       | versions and checksums for dependencies (and commit hashes for
       | git dependencies). Avoids things like the 0.0.1 problem or even
       | replaced crates. Need to be careful to watch for actual security
       | updates, though.
       | 
       | I really wish crates.io would have at least launched with a
       | namespacing feature. This wouldn't solve every spoofing or
       | typosquatting issue, but it would go a long way toward improving
       | the situation.
       | 
       | There's a separate issue of crates.io squatting. One person
       | famously registered hundreds (or thousands? tens of thousands?) of
       | common words as crate names on crates.io and has been squatting
       | them ever since. Those names are effectively unavailable for use
       | but also completely useless because they don't contain anything.
       | 
       | It's also becoming a huge problem for abandoned crates. New forks
       | have to choose a completely different, less intuitive name
       | because they can't just namespace their alternative. As old
       | crates get abandoned, this leads to weird situations where the
       | newest, best maintained crate has the least obvious crate name.
       | It takes work to find the good crate sometimes because the best
       | named crates might just be the oldest, most abandoned ones.
        
         | mikepurvis wrote:
         | I feel like "single shared namespace" works best when the
         | ecosystem is managed collaboratively, with a clear leadership
         | to make calls on who gets what name-- for example, something
         | like a Linux distro, where there are Replaces/Provides metadata
         | specifically to facilitate these kinds of transitions and avoid
         | being stuck forever with crappy legacy nonsense.
         | 
         | But this doesn't work at all in a free-for-all environment like
         | PyPI, NPM, or Crates, where anyone can just grab a name and
         | then have it in perpetuity.
         | 
         | IMO the Docker ecosystem got this right, with baking in a
         | domain name as part of the container, and insisting that
         | everyone on docker.io use a vendor/product convention. This
         | meant that the toplevel namespace was reserved for them to
         | offer (or delegate the offering of) specific blessed container
         | images, much more in line with how distro packaging might work.
         | 
         |     # Get the "official" image, whatever that means (but you
         |     # trust Docker Inc, so yay).
         |     docker pull nginx
         | 
         |     # Get an image supplied by a specific vendor.
         |     docker pull bitnami/nginx
         | 
         |     # Get an image from a different server altogether; maybe
         |     # it's your company, or you don't trust Docker Inc after all?
         |     docker pull quay.io/jitesoft/nginx
         | 
         | Maybe the big-flat-namespace thing is still a years-later
         | reaction against huge and unnecessary hierarchies in Java land?
         | I think the ideal is not to permit infinite depth, but perhaps
         | to insist on 2-3 levels.
        
           | roywashere wrote:
           | Similarly, Perl has CPAN and a system where if a module is
           | unmaintained and the author unresponsive, new maintainers can
           | be granted permission by admins to take over. This ensures
           | the namespace is preserved and people's dependencies keep
           | getting updated.
        
           | kbenson wrote:
           | It's worth noting that this behavior is also a sticking point
           | with some users and distros, and I believe was at least part
           | of the reasoning behind podman being developed by Red Hat.
           | 
           | Sometimes you don't want some third party controlling and
           | blessing what's okay, you want to explicitly control that by
           | only allowing sources you control so you can't even
           | accidentally use that third party.
           | 
           | Different groups have different use cases and it's hard to
           | support them all well.
        
             | mikepurvis wrote:
             | I definitely see both sides of it-- on the one hand,
             | it's nice that "docker pull nginx" is _always_ shorthand
             | for "docker pull docker.io/library/nginx" and whether or not
             | it is isn't dependent on some config file somewhere telling
             | you what your primary registry is. We solved this for
             | ourselves internally by just always using fully qualified
             | names (containers.corp.com/foo) and not worrying too much
             | about it.
             | 
             | But I can also see how this is absolutely a power grab by
             | Docker Inc. For a company which has given away almost
             | everything and has basically no moat in terms of
             | technology, ecosystem, or thought-leadership, I can
             | understand them wanting to at least retain the final say on
             | which containers are which.
             | 
             | And of course, given that perspective, it's clear why a
             | company like Red Hat would chafe against this, particularly
             | when it's clear from the architectural differences in
             | Podman (daemonless, rootless) that they were looking for an
             | excuse to do it over anyway.
        
             | marcosdumay wrote:
             | You mean you want to override or replace the global name
             | registry?
             | 
             | I can't imagine that getting any harder by the global vs.
             | scoped name distinction. It's just an extra feature.
        
               | kbenson wrote:
               | It's a feature docker refused to implement, and is one of
               | the reasons Red Hat created podman, which is a drop in
               | docker replacement (you can alias it to docker for the
               | most part), but it allows full control over what repos
               | are checked.
               | 
               | For docker, as I understand it, to get something
               | functionally similar you have to define a mirror for
               | docker.io that's not really a mirror and then prefer that
               | mirror and/or disallow traffic to docker.io in some
               | fashion.
        
               | infogulch wrote:
               | I've heard this argument a million times, but I really
               | don't get it. If you want to control which images are
               | deployed, you want to be running your own image registry
               | anyway. So you already have full control: just limit
               | images to ones in your private registry with its fully
               | qualified name. Why do you also insist on being able to
               | deploy with 'short' names? Arguably it's better because
               | users can't confuse "enterprise customized ubuntu:20.04"
               | with the public one and bother open source maintainers
               | with internal issues.
        
               | kbenson wrote:
               | Nobody is insisting on being able to deploy with short
               | names (or at least not the people that actually care
               | about this). They are insisting that someone screwing up
               | and using a short name when they shouldn't doesn't go out
               | to a third party that's not explicitly trusted. Making
               | this about short names is completely missing the point.
               | 
               | Requiring docker to be run with a "mirror" which isn't
               | actually a mirror of docker.io and making it use your
               | non-mirror mirror so docker.io can't be reached and
               | everything still works is just a hacky way to trick
               | docker into working in what many people think is a very
               | valid way - where no external resources _can_ be used as
               | a container not just as a matter of policy, but as a
               | matter of capability.
               | 
               | Put simply, it's not about replacing the default repo,
               | it's about _excluding all external repos_. Requiring they
               | all use long names would be a perfectly acceptable trade
               | off for most orgs I think, if that was an option offered.
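The distinction between short and fully qualified names can be made concrete with a sketch of a client-side allowlist check. The expansion rule here follows Docker's convention as I understand it: a first path component containing `.` or `:`, or equal to `localhost`, is treated as a registry host; otherwise the name lives on docker.io, with bare names under `library/`. `is_allowed` is a hypothetical policy helper, not an existing Docker or Podman option.

```rust
/// Expands a Docker-style image reference to a fully qualified name.
fn fully_qualified(image: &str) -> String {
    match image.split_once('/') {
        // The first component looks like a registry host: already qualified.
        Some((first, _))
            if first.contains('.') || first.contains(':') || first == "localhost" =>
        {
            image.to_string()
        }
        // "vendor/product" on the default registry.
        Some(_) => format!("docker.io/{image}"),
        // Bare name: an "official" image on the default registry.
        None => format!("docker.io/library/{image}"),
    }
}

/// Hypothetical policy: accept an image only if, after expansion, it
/// comes from the single registry the organisation controls.
fn is_allowed(image: &str, allowed_registry: &str) -> bool {
    fully_qualified(image)
        .split_once('/')
        .map_or(false, |(host, _)| host == allowed_registry)
}
```

Under such a policy a stray `docker pull nginx` is rejected instead of silently reaching docker.io, while `containers.corp.com/foo` passes.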
        
           | krinchan wrote:
           | I feel like namespaces came much later than Docker Hub, but I
           | can't really find a good historical record of docker registry
           | features (or at least easily searchable ones). The global
           | namespace was either forcibly cleaned or delegated to whoever was
           | currently publishing them and projects raised issues if that
           | wasn't them. I vaguely remember annoyance that I had to
           | publish my toy containers to the global namespace. I could be
           | wrong though.
        
             | mikepurvis wrote:
             | So per [1] it looks like Docker Hub launched in mid-2014.
             | Admittedly my earliest pushes to there are more like the
             | late-2015 timeframe, but definitely by that point I was
             | pushing to namespace under my username. My impression had
             | always been that the structure of Docker Hub was meant to
             | mirror Github in terms of an org/repo scheme.
             | 
             | [1]: https://www.docker.com/blog/announcing-docker-hub-and-
             | offici...
        
           | dmurray wrote:
           | Github generally has two levels (organization and repository)
           | which works pretty well.
        
             | hollerith wrote:
             | It works pretty well, but might not if making org2/repo55 a
             | copy of org1/repo55 (i.e., forking) weren't free for the
             | user (namely, org2). Maybe a venture-backed startup can
             | afford to keep it free for the user, but a non-profit
             | cannot.
        
         | Lifelarper wrote:
         | https://users.rust-lang.org/t/name-squatting-on-the-crates-i...
         | 
         | This is a very old discussion if you keep digging through the
         | links.
        
           | ansible wrote:
           | I hope that the Rust Foundation (which finally exists) can
           | put in some time / resources to help fix this.
           | 
           | I totally get that maintaining crates.io is a time suck for
           | all involved, and mediating disputes would increase that even
           | more.
           | 
           | But at some point, this is going to need to be addressed.
        
         | rectang wrote:
         | My understanding is that validating identity for package
         | authors is a hard problem, thus expensive to solve robustly, and
         | the crates.io folks have hitherto deferred tackling it in
         | earnest. That is arguably a responsible approach up to a point,
         | in that they haven't committed prematurely to something half-
         | baked.
         | 
         | As described elsethread, there is prior art -- Maven's identity
         | verification is substantially better:
         | https://news.ycombinator.com/item?id=29266591
         | 
         | Validating ownership of a namespace reliably enough that it is
         | difficult to spoof is tough. It's possible for PGP creds to be
         | stolen. But then at least the keys can be revoked, and old
         | packages signed with a new key.
        
           | miohtama wrote:
           | I believe the Debian Maintainer process and its keyring is one
           | project, if not the only one, that gets this as good as it can
           | get.
           | 
           | https://wiki.debian.org/DebianMaintainer
        
           | not2b wrote:
           | Robustly verifying identity isn't enough, because a developer
           | with an outstanding reputation could lose their credentials
           | to a spearphishing attack, and the attacker could then modify
           | crates using stolen credentials.
        
             | rectang wrote:
             | I agree that authentication is not sufficient on its own,
             | but I argue that it's _necessary_ as part of the solution.
             | 
             | A source package should meet the following criteria:
             | 
             | * Package contents match exactly the source tree at a PGP
             | signed commit at a public repository URL. This must be
             | verified before the package is made available through the
             | package manager.
             | 
             | Now, if I am whitelisting PGP keys, an attacker _needs_ to
             | steal creds to get something by me.
             | 
             | Ideally you want multiple signatures by trusted keys prior
             | to publication. Each additional signature makes it
             | significantly less likely that a package is provided by a
             | malicious attacker.
             | 
             | EDIT: Hmm, how about package signing parties held over
             | videochat? If we already know each other and you tell me
             | that a particular package was created by you, I sign it.
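The "multiple signatures by trusted keys" idea boils down to a threshold policy, sketched below. This assumes signature verification has already happened elsewhere and that `signers` holds the key ids whose signatures verified; the function name and the key-id strings are hypothetical, and no cryptography is performed here.

```rust
use std::collections::BTreeSet;

/// Returns true when at least `required` distinct trusted keys appear
/// among the (already verified) signer key ids.
fn enough_trusted_signatures(
    signers: &[&str],
    trusted: &BTreeSet<&str>,
    required: usize,
) -> bool {
    // Deduplicate so one key signing twice does not count twice.
    let distinct: BTreeSet<&str> = signers
        .iter()
        .copied()
        .filter(|key| trusted.contains(*key))
        .collect();
    distinct.len() >= required
}
```

A registry or client could refuse to publish or install a package until this returns true for a threshold the user has chosen.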
        
           | PragmaticPulp wrote:
           | They don't need to validate anything, though. Just treat the
           | namespace as something that can be claimed exactly the way
           | that a crate name can be claimed.
           | 
           | Only owners of the namespace can add crates to that
           | namespace, in the same way that only people who own a crate
           | name can publish to that crate.
        
             | kibwen wrote:
             | If you're not validating anything, then that violates the
             | assumptions of users who see an amazon/ namespace that
             | isn't officially associated with Amazon. We don't even need
             | to presume malicious or squatting intent; the entire left-
             | pad fiasco was precipitated by a person who innocently
             | registered "kik", and then later a company named Kik
             | demanded that NPM transfer ownership to them. This happened
             | with a package, but it would just as easily happen with a
             | namespace. Once you introduce an identity layer, people
             | start to expect identity validation. It's a problem worth
             | solving, but crates.io would need to move beyond being
             | merely staffed by part-time volunteers to solve it.
        
               | anderskaseorg wrote:
               | > _This happened with a package, but it would just as
               | easily happen with a namespace._
               | 
               | But that's exactly the point. This is equally a problem
               | for packages and namespaces, so it should not be
               | considered a reason to avoid adding namespaces to a
               | system that already supports packages.
               | 
               | And namespaces do help. Yes, maybe the user has to
               | externally validate that the namespace is registered to
               | the owner they expect, just like they would for an
               | individual package. But the difference is that, with a
               | namespace, they've now validated the ownership of every
               | package inside it and don't need to repeat this process
               | for each package.
        
               | kibwen wrote:
               | _> But that's exactly the point. This is equally a
               | problem for packages and namespaces, so it should not be
               | considered a reason to avoid adding namespaces to a
               | system that already supports packages._
               | 
               | I think this is missing that crates.io _doesn't_
               | transfer packages between owners. That NPM decided that
               | it was acceptable to unilaterally transfer ownership of
               | the kik package from the original maintainer was a lapse
               | of judgment on their part. It would be unprecedented for
               | crates.io to begin doing so.
               | 
               |  _> And namespaces do help._
               | 
               | I agree with this entire paragraph and have made these
               | same arguments in favor of namespaces before. As I
               | mentioned, I am mildly a proponent of namespaces. But we
               | can't delude ourselves into thinking that an identity
               | layer is simple to maintain, regardless of how much we
               | want namespaces. It's a messy social problem.
        
         | ziml77 wrote:
         | The lack of namespacing in crates.io has always been a strange
         | decision to me. I hope at some point they decide that it's
         | worth it to introduce namespaces. They could use an
         | aliasing mechanism to avoid breaking any existing package
         | references.
        
           | jamincan wrote:
           | From previous times this has come up, I was led to believe
           | that it's not so much that the maintainers don't want
           | namespaces as it's a thankless and overwhelming task just
           | maintaining the status quo, and so they don't have the time
           | and resources to implement something.
           | 
           | Hopefully this is one of those issues that the Rust
           | Foundation can direct some funding toward as the entire Rust
           | community would be all the better for it.
        
         | belter wrote:
         | I think if somebody decides to act as a vandal I can too?
         | 
         | https://crates.io/users/swmon
         | 
         | Edit: Lots of love for this user in the thread....
         | 
         | https://github.com/swmon/Charles-Crack/pull/1
        
           | brabel wrote:
           | While I can agree it's a dickhead thing to do, it seems to be
           | entirely within the rules of crates.io, so I think the hate
           | should go towards them, not someone who's just playing by the
           | rules.
           | 
           | Domain squatting has been a thing for decades, it's not like
           | no one expected this kind of thing would happen.
           | 
           | Just use org/author namespaces for crying out loud.
        
             | kibwen wrote:
             | I am a mild proponent of namespaces. But adding them means
             | you are now managing an identity layer, which is not
             | something that a volunteer organization wants to deal with.
             | In a flat namespace, users don't expect that a google-foo
             | library is actually from Google. But with namespaces,
             | people expect that everything under the google/ namespace
             | actually is officially supported by Google. So when the
             | initial rush to register names happens, who verifies that
             | Google owns the google/ namespace? And not just for Google,
             | but for every litigious trademark-defending organization
             | under the sun? And since the package repository is supposed
             | to be immutable, you don't want to wantonly delete crates
             | from namespaces that they originally registered. Likewise,
             | you can't go transferring ownership of a namespace from one
             | entity to another, because that violates the security
             | assumptions of your users who originally trusted the
             | original owner of the namespace and not the new owner.
             | 
             | Identity layers are hard. It's a problem worth solving, but
             | it's not going to happen on a volunteer basis. It needs
             | full-time paid employees to deal with this. Perhaps the
             | Rust Foundation will make such a thing a priority.
        
               | brabel wrote:
               | These problems you raise have been solved problems for a
               | long time.
               | 
               | Maven Central uses verified reversed domain names, so to
               | own `com.google` you need to prove you own the domain
               | `google.com`. It's not hard to do it.
               | 
               | Dart's pub does the same thing with verified publishers.
               | 
               | crates.io already requires user accounts from publishers,
               | so it already manages identity.
               | 
               | The only reason they don't require namespaces is to make
               | crate names look cooler and the barrier to entry non-
               | existent, so they can amass large numbers of packages in a
               | short time (then deal with the fallout only when it
               | becomes an emergency), I can't see any other reason
               | anywhere, including from the ones you've raised.
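The reversed-domain naming rule mentioned in the comment above can be sketched as follows. Both function names are hypothetical, and the list of verified domains is assumed to come from some separate ownership check that is not modeled here.

```rust
/// Turn a domain such as "google.com" into the namespace "com.google".
fn namespace_for_domain(domain: &str) -> String {
    domain.split('.').rev().collect::<Vec<_>>().join(".")
}

/// A publisher may use a group id only if it equals, or is nested under,
/// the namespace of one of their verified domains.
fn may_publish(verified_domains: &[&str], group_id: &str) -> bool {
    verified_domains.iter().any(|domain| {
        let ns = namespace_for_domain(domain);
        group_id == ns || group_id.starts_with(&format!("{}.", ns))
    })
}
```

The trailing dot in the prefix check is deliberate: it keeps a look-alike such as `com.googlex` from passing as a child of `com.google`.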
        
               | kibwen wrote:
               | _> crates.io already requires user accounts from
               | publishers, so it already manages identity._
               | 
               | It currently requires a Github account, but the registry
               | is not tightly coupled to Github. But once you introduce
               | Github identities as something that code is explicitly
               | depending on, now you're at the mercy of Github for the
               | rest of time. Furthermore, immutability is a desirable
               | property of package registries, and Github identities are
               | not immutable; Github allows people to change their
               | usernames and then allows any random person to snap up
               | the old username.
               | 
               |  _> The only reason they don't require namespaces is to
               | make crate names look cooler and the barrier to entry
               | non-existent, so they can amass large numbers of packages
               | in a short time_
               | 
               | No, this is needlessly conspiratorial. Please exercise
               | some perspective. crates.io is run by volunteers, and
               | managing an immutable identity layer is not something
               | that the current crop of volunteers wants to commit to.
        
               | teddyh wrote:
               | When (not if) a domain name expires or changes ownership,
               | how is this detected and the connection re-verified?
        
             | y7 wrote:
             | What a weird line of reasoning. That something is allowed
             | by "the rules" doesn't make it morally justified. Assholes
             | should be called out on _their behavior_ , rather than
             | hating on the creators of a system that trusts people to
             | act in good faith. Sure, those creators can take steps to
             | mitigate malicious behavior, but the onus is not on them.
        
         | moojd wrote:
         | This one is frustrating because this is an issue that has been
         | solved many times before and I hate seeing it repeated in every
         | new package manager. A vendor name should always be required
         | and the top level should be reserved for official/standard
         | packages.
         | 
         | I want all of the following from a package manager:
         | 
         | 1. Required vendor/namespace for third party packages
         | 
         | 2. No multiple package versions. If there is a version conflict
         | between transitive dependencies of a package because of semver,
         | you should not be able to install that package.
         | 
         | 3. Lock file and a separate 'install' command for installing
         | the locked versions and an 'upgrade' command for updating
         | versions via semver
         | 
         | 4. Upgrade command should support a --dry-run option that lists
         | the packages and versions that are to be updated and a --diff
         | that lets you preview the code changes.
        
           | clon wrote:
           | For all the animosity that PHP gets these days, every single
           | item on your list (granted, of very basic demands) aligns
           | with PHP's composer. I am surprised that Rust is that much
           | worse off than PHP in this regard.
        
             | moojd wrote:
             | I don't think composer has a diff option to dump the actual
             | code differences before you update yet but yes most of this
             | list comes from my past experience with composer. My
             | current company doesn't use PHP but I look back fondly at
             | how easy it was to audit my dependencies manually and be
             | explicit about upgrades and transitive dependencies.
        
               | clon wrote:
               | It does offer a diff option when you have local edits in
               | the /vendor (for whatever insane reason). Always assumed
               | it could be triggered manually as well. TIL.
               | 
               | I also love how easy it is to declare conflicts [1]. Some
               | sub sub sub dependency down the tree had a bad 0.0.1
               | release? Just declare a conflict and have the tool do the
               | work.
               | 
               | [1] https://getcomposer.org/doc/04-schema.md#conflict
        
           | epage wrote:
           | > 2. No multiple package versions. If there is a version
           | conflict between transitive dependencies of a package because
           | of semver, you should not be able to install that package.
           | 
           | I am grateful Rust allows this unlike C++ or Python. While I
           | ideally minimize repeat dependencies, it is a big help to not
           | be constrained to only one version. We've already had cases in
           | Rust where some people were overly restrictive on dependency
           | declarations (since Rust does block some versions as too
           | similar) and it has caused a bit of pain.
        
           | dahfizz wrote:
           | > 2. No multiple package versions. If there is a version
           | conflict between transitive dependencies of a package because
           | of semver, you should not be able to install that package.
           | 
           | I think this depends highly on your environment. In npm land,
           | where a typical project has hundreds or thousands of
           | dependencies of dubious quality, this would be a nightmare.
           | It guarantees that every single deployment will be different
           | and risky.
           | 
           | This model works much better in linux, where packages are
           | maintained by maintainers and there is not an explosion in
           | the dependency network. Especially on a Debian or CentOS box,
           | you can be confident that upgrading packages won't break
           | stuff.
        
             | moojd wrote:
             | If you are in node land I highly recommend using 'yarn
             | install --flat' and I desperately wish this was the default
             | in npm from the beginning. It would have radically altered
             | the package development culture in a good way.
             | 
             | The way npm currently handles version conflicts is one of
             | the primary reasons why using npm is currently a nightmare.
             | The average node project will have dozens of abandoned or
             | ancient package versions precisely because allowing
             | multiple versions to exist means that these packages never
             | get forked or updated. Each one of those packages is a
             | ticking time bomb waiting to be taken over by a malicious
             | actor. Forking is a better solution than pulling in
             | unmaintained packages with out of date dependencies.
        
       | infogulch wrote:
       | I'd like to see a notification when the repo tag contents doesn't
       | match the cargo tag version.
        
       | pdimitar wrote:
       | There's zero backdooring involved anywhere in this article. His
       | most convincing argument seems to be "if your account as a
       | package maintainer is hijacked then bad things could happen" --
       | well yeah, thanks for the insight Sherlock.
       | 
       | I'd be genuinely excited to read objective and deeper analyses of
       | the Rust ecosystem in which I am looking to invest myself
       | further. I want to know what exactly I am getting involved with
       | so I'd welcome any good criticisms of it.
       | 
       | But not click-baity articles with almost zero substance inside.
       | He's basically repeating old lists of risks of human error.
        
       | ferdowsi wrote:
       | The anemic standard library in Rust always seemed like a disaster
       | waiting to happen. Javascript gets heaps of deserved criticism
       | for its standard library, but at least I can generate a SHA-256
       | without needing to pull in a third party library.
        
       | mullr wrote:
       | What are people doing about this on the client side? The solution
       | that comes to mind is to do all my Rust builds in a sandbox of
       | some kind, but with rust-analyzer involved, I'd likely have to
       | put my editor in there as well.
        
         | gpm wrote:
         | There's some work towards moving the scarier parts of rust
         | builds (e.g. procedural macros, that run arbitrary code) into a
         | wasm-based sandbox. E.g. [1]. Obviously doesn't make the final
         | artifacts safe to run though, and I also wouldn't trust LLVM to
         | have no bugs exploitable by feeding it bad code, but at least
         | it would raise the bar.
         | 
         | [1] https://github.com/dtolnay/watt
         | 
         | Edit: And someone on reddit brought up vscode's dev containers
         | [2], to move everything into docker. Obviously docker isn't
         | really a security sandbox, but again it raises the bar.
         | 
         | [2] https://code.visualstudio.com/docs/remote/containers
        
           | rectang wrote:
           | At first glance, watt looks like a substantial improvement
           | that would close the door on arbitrary code execution by proc
           | macro crates. Yes, please! While this may not solve the
           | general problem of package identity validation, it closes a
           | Rust-specific hole that hopefully doesn't need to exist.
           | 
           | Now if only `build.rs` could be nerfed...
        
             | duped wrote:
             | build.rs is particularly useful for Rust because it is
             | routinely used to compile C/C++ object files as a previous
             | step, which is crucial to having solid Rust to C/C++ FFI.
             | 
             | It is no different from a ./configure script, or other
             | prebuilt script. Lots of builds require these, and
             | "nerfing" it just makes building Rust harder. Cargo is
             | already a crippled build system that requires extensions
             | like cargo-make to be useful. Getting rid of something so
             | fundamentally required by modern software with no standard
             | fallback would be a massive blow to the ecosystem.
             | 
             | I really am not convinced that there is anything "scary"
             | about a build.rs file - other than that standard tools like
             | rust-analyzer find it sane to run external code during
             | initialization. Your language server shouldn't be coupled
             | to the build system and require it to run!
             | 
             | (And yes, Cargo is a build system - it's just a bad one)
        
               | rectang wrote:
               | _sigh_ , probably "nerfed" wasn't the greatest choice of
               | words... I'm writing such an FFI crate right now, and I
               | use a `build.rs`. I can still wish that the package
               | management system didn't have to fall back to running
               | arbitrary code, or that there was some way to sandbox
               | that code. That would make it easier for people to trust
               | my crate!
        
       | zelos wrote:
       | > How to protect?
       | 
       | > By pinning an exact version of a dependency, tokio = "=1.0.0"
       | for example, but then you lose the bug fixes.
       | 
       | Surely no one uses version ranges in production? Is the default
       | really not to use an exact version for crates?
        
         | Macha wrote:
         | The default is to use ^x.y.z so it'll pull in patch versions.
        
         | steveklabnik wrote:
         | The default is to declare ranges, but then you get a lockfile
         | after an initial build, and Cargo will use those exact versions
         | until you ask for changes.
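For readers unfamiliar with the default caret requirement discussed in this subthread, here is a rough sketch of how `^x.y.z` matching works. It is not Cargo's actual resolver: `Version`, `parse_version`, `caret_upper_bound` and `caret_matches` are hypothetical names, and pre-release or build metadata is ignored.

```rust
/// A parsed `major.minor.patch` version (no pre-release handling in this sketch).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct Version {
    major: u64,
    minor: u64,
    patch: u64,
}

/// Parse "1.2.3" into a Version; returns None for anything else.
fn parse_version(s: &str) -> Option<Version> {
    let mut parts = s.trim().split('.');
    let major = parts.next()?.parse::<u64>().ok()?;
    let minor = parts.next()?.parse::<u64>().ok()?;
    let patch = parts.next()?.parse::<u64>().ok()?;
    if parts.next().is_some() {
        return None;
    }
    Some(Version { major, minor, patch })
}

/// Exclusive upper bound of `^base`: bump the left-most non-zero component.
fn caret_upper_bound(base: Version) -> Version {
    if base.major > 0 {
        Version { major: base.major + 1, minor: 0, patch: 0 }
    } else if base.minor > 0 {
        Version { major: 0, minor: base.minor + 1, patch: 0 }
    } else {
        Version { major: 0, minor: 0, patch: base.patch + 1 }
    }
}

/// Some(true) if `candidate` satisfies `^base`, None if either string fails to parse.
fn caret_matches(base: &str, candidate: &str) -> Option<bool> {
    let base = parse_version(base)?;
    let candidate = parse_version(candidate)?;
    Some(candidate >= base && candidate < caret_upper_bound(base))
}
```

Under these rules `^1.0.0` accepts later minor releases as well as patch releases, which is why the lockfile, rather than the requirement itself, is what keeps a build on exact versions.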
        
       | 3r8Oltr0ziouVDM wrote:
       | We should switch to using pure functional languages by default.
       | Most of the packages don't need to do any side effects and only
       | perform pure calculations. In a pure functional language it is
       | obvious from function signatures if these functions are able to
       | perform side effects, so it's not possible to hide a backdoor
       | inside a pure function. An average project would depend only on a
       | few impure packages, such as a HTTP client or a framework,
       | therefore it would be much easier to verify the security (for
       | small impure packages you could just inspect their code yourself,
       | and bigger packages like frameworks would have many contributors
       | that check the code and strict policies about their security).
       | Languages like Rust and C++ for which the pure functional model
       | doesn't work should then only be used for performance critical
       | code, and projects written in impure languages should avoid
       | third-party dependencies as much as they can.
        
         | Findecanor wrote:
         | Another approach would be to harden the software supply chain
         | by requiring that dependencies and side-effects are
         | _entitlements_ in metadata that are visible and would need to
         | be approved by the programmer that imports the module.
         | 
         | There are already some frameworks out there who use signed
         | metadata and databases to track code and where code comes from.
         | But on the source code level, I think the metadata could just
         | be extracted from the existing Crate metadata and source code.
        
         | peterth3 wrote:
         | So, you're claiming that pure FP languages need less
         | dependencies than FP-adjacent languages like rust?
         | 
         | This is really interesting. Do you have a source to cite
         | proving this claim?
        
           | bertylicious wrote:
           | Parent only claimed that most Haskell packages are pure and
           | thus cannot execute impure side-effects. They didn't say
           | anything about the overall number of dependencies.
        
           | 3r8Oltr0ziouVDM wrote:
           | No. What I'm saying is that many of the dependencies in any
           | language don't need to perform side effects, they only do
           | pure calculations. For example a JSON parser takes a JSON
           | string and returns some data structures. It's a pure
           | function. However, in a language like Rust you can easily
           | hide malicious code that has access to network inside such a
           | function. In a pure functional language you can tell from the
           | signature of a function you're calling that it is indeed a
           | pure function and is guaranteed to not perform any side
           | effects. So it is safe to call any function from a third-
           | party dependency that doesn't do side effects (which you can
           | immediately see from the type signature) without even
           | inspecting the code.
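The point about Rust in the comment above can be illustrated with a small sketch. The counter is only a harmless stand-in for a hidden side effect (a real backdoor might open a socket or read files), and `HIDDEN_CALLS` and `hidden_calls` are names invented for this illustration.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Stand-in for a hidden side effect; nothing in `add`'s signature reveals it.
static HIDDEN_CALLS: AtomicUsize = AtomicUsize::new(0);

/// The signature looks like a pure calculation...
fn add(x: i32, y: i32) -> i32 {
    // ...but the type system does not stop the body from touching global state.
    HIDDEN_CALLS.fetch_add(1, Ordering::SeqCst);
    x + y
}

/// Read how many times the hidden effect has fired.
fn hidden_calls() -> usize {
    HIDDEN_CALLS.load(Ordering::SeqCst)
}
```

A caller who only reads `fn add(x: i32, y: i32) -> i32` has no way to tell that the counter changes, which is the gap a purity-tracking type system is meant to close.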
        
         | frenchyatwork wrote:
         | I don't get how that would solve your problem at all. You can
         | implement a bitcoin miner using functional code then just add
         | an http client as a dependency for getting data to/from the
         | blockchain.
        
           | 3r8Oltr0ziouVDM wrote:
           | You can't perform HTTP requests from a pure function without
           | making it obvious in its signature that it does side effects.
           | For example in a language like Haskell:
           | 
           |     add :: Int -> Int -> Int
           |     add x y = x + y
           | 
           | There is no way a function like this can run a Bitcoin miner,
           | all it can do is to return an `Int`. In order to do side
           | effects, a function must return a special `IO` type that
           | should then be returned from `main` (and only then these side
           | effects would be performed).
        
         | moonchrome wrote:
         | > An average project would depend only on a few impure
         | packages, such as a HTTP client or a framework, therefore it
         | would be much easier to verify the security (for small impure
         | packages you could just inspect their code yourself, and bigger
         | packages like frameworks would have many contributors that
         | check the code and strict policies about their security).
         | 
         | OK so just a random list of common packages a web app could use
         | that come to mind :
         | 
         | - HTTP server
         | 
         | - HTTP client
         | 
         | - Logging
         | 
         | - Database
         | 
         | - Distributed cache
         | 
         | - File storage/blob storage
         | 
         | - Email
         | 
         | - Push notifications/SMS if dealing with mobile
         | 
         | - Auth (eg. OAuth/OpenID Connect middleware)
         | 
         | - Background task management/queue
         | 
         | And then there's libraries that wrap access to external
         | services, specific protocol libraries like gRPC or GraphQL.
         | 
         | I would say the number of pure libraries that you reference
         | directly in a modern webapp is probably very low, that's all a
         | layer below.
        
           | 3r8Oltr0ziouVDM wrote:
           | Ok, but in Rust or NodeJS an HTTP server may depend on a
           | package A that depends on a package B that depends on a
           | package C that then introduces a backdoor in its 1.0.1
           | release. In a pure functional language you can quickly look
           | through dependencies of an HTTP server, and if it has zero
           | impure dependencies then you just need to trust the
           | developers of this one HTTP server package.
        
             | platinumrad wrote:
             | You seem to be suggesting that impure actions never depend
             | on the results of pure calculations.
             | 
             | Also System.IO.Unsafe exists.
        
       | verdverm wrote:
       | I wrote https://verdverm.com/go-mods/ to talk about ways Go
       | avoids some of these pitfalls. The forethought that went into `go
       | mod` is one of the reasons I like and trust Go
        
         | steveklabnik wrote:
         | A _tremendous_ amount of forethought was put into Cargo and
         | Crates.io. The difference is that many folks look at the same
         | problems and come to different conclusions about what to do,
         | not negligence.
        
         | Groxx wrote:
         | I only see one that it avoids: domain names / URLs as import
         | paths makes ownership much more clear, and _slightly_ harder to
         | achieve typo-squatting... sometimes. And I do very much like
         | this part of go modules, it also helps decentralize the whole
         | system a fair bit. I sincerely hope it becomes the dominant
         | package-name strategy in time.
         | 
         | But let's pick another that seems _on the surface_ pretty likely
         | to be mitigated: source for downloaded-version X not matching
         | version X repo's source, under "Malicious update" with cargo's
         | `--allow-dirty`. After all, goproxy pulls from git repos
         | directly, right? There's no --dirty flag or anything to push
         | random garbage.
         | 
         | That's still a problem! Git tags are mutable, as are git
         | repositories as a whole. You can _absolutely_ tag a malicious
         | version, get it into goproxy, and then change or remove the tag
         | and any associated commits. The goproxy doesn't even store the
         | SHA for correctly-tagged versions, only the code and a checksum
         | of the code it saved, so finding the commit that it originally
         | pointed to can be difficult or impossible. You can download the
         | module and read the code from that, but that's true of any non-
         | binary dependency system. You can't _publish_ a change to an
         | already-published version, but that's true of cargo too
         | (afaik) as well as most package hosts (afaik), though goproxy
         | takes a minor technical step further to make that accident-
         | resistant (or at least easily detectable. which is great,
         | everyone should do that).
        
           | verdverm wrote:
           | A module in Goproxy should be in the global SumDB, so if you
           | are consulting that (the default), even if someone managed to
           | get a retag in, it would fail the sumdb check. I suspect that
           | Goproxy, by virtue of running Go under the hood, consults
           | sumdb prior to adding to the proxy. As long as a tag was
           | fetched once, I would expect that any changes would be
           | caught. There are of course edge cases, such as custom domain
           | go get hosts are not all kept in the GoProxy, but their
           | content hash should be.
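The "fetched once, then any change is caught" behaviour described above can be approximated with a trust-on-first-record map. This is a deliberately simplified sketch and not how the Go checksum database is actually built; `ChecksumLog` and `check` are hypothetical names, and the checksum strings are treated as opaque.

```rust
use std::collections::HashMap;

/// Append-only record of the first checksum seen for each (module, version).
#[derive(Default)]
struct ChecksumLog {
    entries: HashMap<(String, String), String>,
}

impl ChecksumLog {
    /// Record the checksum on first sight; afterwards accept only the same value.
    fn check(&mut self, module: &str, version: &str, checksum: &str) -> Result<(), String> {
        let key = (module.to_string(), version.to_string());
        if let Some(known) = self.entries.get(&key) {
            if known == checksum {
                return Ok(());
            }
            // A re-tagged release shows up here as a checksum mismatch.
            return Err(format!(
                "{}@{}: expected {}, got {}",
                module, version, known, checksum
            ));
        }
        self.entries.insert(key, checksum.to_string());
        Ok(())
    }
}
```

Entries are never overwritten or removed in this sketch, so a changed tag can only be rejected, not silently accepted.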
        
             | Groxx wrote:
             | yeah - in many other dependency systems, you only get
             | protection for versions _you use_ , as those are in your
             | lockfile. the public gosumdb helps prevent a few more cases
             | of _re-releasing_ something + you upgrading + pulling it
             | from a different provider (... which is mostly relevant due
             | to its more-distributed setup, if it treated goproxy as
             | canonical it'd be unnecessary because it wouldn't contain
             | that re-release), but not "downloaded module does not match
             | repository".
             | 
             | I do think the sumdb setup is worth others copying, it's
             | relatively cheap to maintain and it does clamp down on some
             | issues. It also makes it much harder to revoke things
             | though, as you can't remove anything from it ever - after a
             | couple versions, Go finally added "retracted" versions, but
             | the need for that is partly a consequence of having the
             | permanently-immutable sumdb + not having a canonical
             | source. Unique self-inflicted pain -> unique workaround,
             | though it's all relatively reasonable and I think a net
             | benefit.
        
               | verdverm wrote:
               | Sigstore & Cosign are worth looking into as well.
               | GoReleaser supports those for compiled binaries.
               | 
               | SLSA is another
               | 
               | https://sigstore.dev/
               | 
               | https://slsa.dev/
        
               | Groxx wrote:
               | That seems like just code-signing? If so: yeah,
               | definitely, that should be supported by every packaging
               | system. And it's largely ridiculous that it isn't. It
               | removes the need to trust the packaging-host, so it's no
               | longer a giant target for exploits that can modify
               | _every_ package at once. Go, using domain names, is
               | probably in the best position to take advantage of this,
               | as it allows you to lean on domain ownership (and maybe
               | even use the same ssl certificate) rather than having to
               | trust-on-first-use or something.
        
       | caffeine wrote:
       | Seems like you could address with a super-crate that includes
       | "trusted" crate releases as "features"
       | 
       | That crate could involve some automation like:
       | 
       | * Checking that the code in the crate matches the code in Github
       | 
       | * Checking whether the latest commit is from a new committer, or
       | whether there is any code committed by a user not in a whitelist,
       | 
       | * Checking whether the package has any known security advisories
       | 
       | * Checking that crate signatures match some whitelist
       | 
       | * Running a project that includes the crate in a sandbox and
       | seeing whether there are any files accessed, network accesses,
       | etc. that were not pre-whitelisted
       | 
       | New versions of included crates would have to go through this
       | battery of checks before they get bumped in the super-crate.
       | 
       | Crates that want to be included as features of super-crate or
       | that need to change/add significant functionality, or add
       | dependencies, would need to make a PR to update the relevant
       | whitelists, which could then be reviewed by the super-crate team
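A minimal sketch of the committer-whitelist check from the list above. The function names and the way committers are passed in are hypothetical; a real tool would have to extract committers from the repository history, and this covers only one of the proposed checks.

```rust
use std::collections::HashSet;

/// Returns the committers of a release that are not on the allowlist.
/// Both inputs are hypothetical: a real tool would read committers from
/// the repository history rather than take them as plain strings.
fn unknown_committers<'a>(committers: &[&'a str], allowlist: &HashSet<&str>) -> Vec<&'a str> {
    committers
        .iter()
        .copied()
        .filter(|c| !allowlist.contains(c))
        .collect()
}

/// A release passes this single check only if every committer is allowlisted.
fn passes_committer_check(committers: &[&str], allowlist: &HashSet<&str>) -> bool {
    unknown_committers(committers, allowlist).is_empty()
}

fn main() {
    let allowlist: HashSet<&str> = ["alice", "bob"].iter().copied().collect();
    let release = ["alice", "mallory"];
    println!("unknown committers: {:?}", unknown_committers(&release, &allowlist));
    println!("passes: {}", passes_committer_check(&release, &allowlist));
}
```

A version bump in the super-crate would be blocked whenever the first function returns a non-empty list, leaving the new committer for a human to review.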
        
         | epage wrote:
         | This has come up several times in the past. One name for it was
         | stdx.
         | 
         | Some in the ecosystem are very cautious about picking winners and
         | losers, since that limits the exposure of new break-out crates,
         | so they rarely recommend crates for specific problems. This comes
         | at the cost of a higher barrier to getting involved, because you
         | need to be "in the know" about which crates to use or avoid.
         | 
         | Another problem with stdx is that if anyone uses types from it in
         | their public API, they are decoupled from the individual crates'
         | semver constraints, which makes it hard to know which breaking
         | changes in your dependencies are breaking changes in your API.
        
       | loeg wrote:
       | You don't really need the '--allow-dirty' flag to do as the
       | author claims. There's no enforcement that the local git commit
       | is ever published to a public repo.
        
       | peterth3 wrote:
       | Discussion on /r/rust about this article:
       | 
       | https://www.reddit.com/r/rust/comments/qw3w01/backdooring_ru...
        
       | jynelson wrote:
       | > While it's possible to audit the code of a crate on
       | https://docs.rs on clicking on a [src] button, it turns that I
       | couldn't find a way to inspect build.rs files. Thus, combined
       | with a malicious update, it's the almost perfect backdoor.
       | 
       | Docs.rs has its own source view on /crate that's separate from
       | rustdoc's. For example, you can see the build.rs for boring-sys
       | on `https://docs.rs/crate/boring-sys/1.1.1/source/build.rs`.
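A small sketch of building such a source-view URL. The pattern is inferred from the single boring-sys example above, so treating it as valid for other crates, versions, and paths is an assumption, and the function name is hypothetical.

```rust
/// Builds a docs.rs source-view URL following the pattern of the
/// boring-sys example. Assumption: other crates use the same layout.
fn docs_rs_source_url(krate: &str, version: &str, path: &str) -> String {
    format!("https://docs.rs/crate/{}/{}/source/{}", krate, version, path)
}

fn main() {
    // Reproduces the URL quoted in the comment above.
    println!("{}", docs_rs_source_url("boring-sys", "1.1.1", "build.rs"));
}
```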
        
         | richardwhiuk wrote:
         | You can also download the crate directly from crates.io
        
       | jrochkind1 wrote:
       | Most of these are common to other platform packaging systems, and
       | I'm not sure I've seen any especially interesting solutions to
       | them.
       | 
       | The macro-based ones are rust-specific and seem especially
       | devious and challenging to me.
        
         | ReactiveJelly wrote:
         | I think I/O isolation will be part of a solution. I'm
         | interested to see how Deno handles that.
        
           | devmunchies wrote:
           | Does deno allow you to scope the IO permission at a
           | dependency level?
           | 
           | This comment from 9 months ago indicates it's only at the app
           | level. Has it changed?
           | https://news.ycombinator.com/item?id=26090873
        
       | Lifelarper wrote:
       | > I'm not sure if it's by bots or real persons
       | 
       | Bot usage accounts for a significant amount of the low-level
       | noise. I've published things of no use to anyone, and they always
       | rack up a lot of downloads despite practically no one using them
       | for a long time.
       | 
       | > Firstly, a bigger standard library would reduce the need for
       | external dependencies
       | 
       | There are years' worth of the same arguments, tiresomely made over
       | and over again (same with namespacing), on the Rust forum. Everyone
       | has played their hand on this issue a dozen times now, and the
       | community clearly has a majority stance on such things.
       | 
       | > A variant of the previous technique is to use the --allow-dirty
       | flag of the cargo publish command.
       | 
       | Please correct me if I'm wrong, but I thought that flag simply
       | allows uncommitted changes to be published; the source is still
       | available for anyone to view on crates.io.
       | 
       | > We're sorry but this website doesn't work properly without
       | JavaScript enabled. Please enable it to continue.
       | 
       | Works perfectly fine for me. Maybe you couldn't serve me a gdpr
       | or something. Thankfully I can keep it turned off for now :)
        
       | rectang wrote:
       | Is there a way to buy into PGP identity-based controls for
       | crates.io packages? To say, "I trust the keys in this whitelist,
       | so trust packages signed by those keys."
       | 
       | > _Thirdly, using cloud developer environments such as GitHub
       | Codespaces or Gitpod. By working in sandboxed environments for
       | each project, one can significantly reduce the impact of a
       | compromise._
       | 
       | That's appealing but expensive. I wish I could effectively
       | sandbox a local developer machine. External boot drives, maybe?
        
         | gpm wrote:
         | Cargo-crev is that sort of web of trust, but it's really in
         | its infancy.
        
           | rectang wrote:
           | Cargo-crev's writ seems to be much more expansive and
           | nebulous than security:
           | 
           | https://github.com/crev-dev/cargo-crev/wiki/Howto:-Create-
           | Re...
           | 
           | > _While it's still open for debate, the current opinion is
           | that additional fields are not useful for the downstream
           | users. Instead they just complicate their life, putting the
           | burden of the decision from the reviewer onto them. At the
           | end, a downstream user of a review just wants to know: "is it
           | OK to use this package or not?". Your role as a reviewer is
           | to provide that judgment._
           | 
           | That irks me. I don't care about popularity contests, I only
           | care whether a crate is malicious or not. If it has security
           | _vulnerabilities_, I can deal. But if downloading and
           | building a proc macro crate gives an attacker remote code
           | execution and they install a keylogger on my dev box, that's
           | altogether different.
        
             | ChrisSD wrote:
             | If the crate is actively malicious then crates.io should be
             | informed immediately and the crate removed. Probably the
             | author too. If need be RustSec can issue a security
             | advisory.
        
       | carlhjerpe wrote:
       | Nobody is mentioning C#, but my experience there is that I rely
       | on far fewer dependencies and a rather big standard library from
       | Microsoft.
       | 
       | Microsoft has been splitting the standard library into separate
       | dependencies now, but they're still maintained by them and I feel
       | safe depending on them.
        
       ___________________________________________________________________
       (page generated 2021-11-18 23:00 UTC)