[HN Gopher] Backdooring Rust crates for fun and profit
___________________________________________________________________
Backdooring Rust crates for fun and profit
Author : cjg
Score : 284 points
Date : 2021-11-18 14:29 UTC (8 hours ago)
(HTM) web link (kerkour.com)
(TXT) w3m dump (kerkour.com)
| dane-pgp wrote:
| > Actually, my num_cpu crate has been downloaded 24 times in less
| than 24 hours, but I'm not sure if it's by bots or real persons.
|
| Presumably the author could include some payload which phones
| home with a randomly generated ID, to detect how many machines
| the package could take control over. That's probably more
| meaningful than trying to decide whether the package was
| downloaded by a "bot", and wouldn't involve any GDPR-breaking
| information.
| wizzwizz4 wrote:
| > _and wouldn't involve any GDPR-breaking information._
|
| Actually, this would (in my non-legal opinion) be a GDPR
| violation. See https://gdpr-info.eu for details.
| dane-pgp wrote:
| You want me to read the entire GDPR? It's apparently 261
| pages long.
|
| https://www.enterpriseready.io/gdpr/how-to-read-gdpr/
| wizzwizz4 wrote:
| It's not _that_ long, and you don't need to read the
| entire thing. The first few articles make it clear that you
| can't do this.
|
| Article 4
|
| > 'personal data' means any information relating to an
| identified or identifiable natural person ('data subject');
| an identifiable natural person is one who can be
| identified, directly or indirectly, in particular by
| reference to an identifier such as a name, an
| identification number, location data, an online identifier
| or to one or more factors specific to the physical,
| physiological, genetic, mental, economic, cultural or
| social identity of that natural person;
|
| Creating a random identifier to identify individuals means
| you're working with personal data. (And it's coupled with
| the IP address, and hence crude location data - and OS
| information, and metadata about their behaviour...)
|
| Article 6
|
| > Processing shall be lawful only if and to the extent that
| at least one of the following applies:
|
| None of them do. If you had _consent_, then this would be
| fine, but the whole idea is going behind people's backs to
| secretly reprogram their computers to phone home, for your
| own curiosity. That's not allowed.
| dane-pgp wrote:
| There is, I think, some legal uncertainty about what it
| means for someone to be "identified indirectly", so a
| court might agree with you, but I think there is also a
| widely accepted opinion that if data can't be connected
| back to a specific natural person by the data
| controller/processor, then it is no longer "personal
| data".
|
| Here's what the University College London's guidance[0]
| says:
|
| "Once data is truly anonymised and individuals are no
| longer identifiable, the data will not fall within the
| scope of the GDPR and it becomes easier to use."
|
| Obviously in the system I am proposing, the data
| controller/processor would have to deliberately not log
| the IP address or a user agent string. It's true that
| timestamps could give an idea of which timezone the user
| is in, but that shouldn't necessarily reveal even the
| country the user is in.
|
| [0] https://www.ucl.ac.uk/data-protection/guidance-staff-
| student...
| CloselyChunky wrote:
| The easier, less invasive but also less accurate option would
| be to publish an empty crate with a random name that does not
| exploit typos (just some random junk) and check how often that
| crate is downloaded. You can assume that almost all downloads
| for this crate are bot downloads and just subtract that amount
| from the downloads of the typo-squatted crate
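The arithmetic of this control-crate idea can be sketched as follows (the numbers and function names are hypothetical; no such measurement was actually run):

```rust
/// Estimate how many downloads of a typosquatted crate came from
/// humans, by subtracting the baseline measured on a random-named
/// "control" crate that we assume only mirrors and scanners fetch.
fn estimated_human_downloads(typo_downloads: u64, control_downloads: u64) -> u64 {
    // saturating_sub: if bots happened to fetch the control crate
    // more often, clamp to zero rather than underflow.
    typo_downloads.saturating_sub(control_downloads)
}

fn main() {
    // Hypothetical counts over the same time window.
    assert_eq!(estimated_human_downloads(24, 17), 7);
    assert_eq!(estimated_human_downloads(5, 9), 0);
    println!("ok");
}
```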
| rectang wrote:
| Do we really need hard proof of real downloads for an example
| like this? Typosquatting is obviously a scary problem
| regardless, with a history of successful exploitation in web
| domains, other package managers, etc.
| uncomputation wrote:
| IMO some simple (not easy, but simple) solutions would go a long
| way here.
|
| - Support namespacing
|
| - Support specifying a non-crates.io server (Docker does this)
|
| - Throw warnings when the Git tag (if applicable) contents do not
| match the Cargo upload. Rate limit package owners so they are
| encouraged to set their tags right the first time and not move
| around
|
| RE: compile time execution. This is a harder problem, common to
| any binary file distribution.
| steveklabnik wrote:
| The second is already something that exists.
| thorum wrote:
| I'm increasingly fatalistic about computer security. It seems
| like your options are carefully auditing all dependencies
| (difficult and maybe impossible if the dependencies are highly
| technical or the malicious code is sufficiently subtle or
| obfuscated) every time you update, not updating at all (which
| leaves you vulnerable to all the bugs and other security issues
| in the version you choose to pin), or not using dependencies at
| all (by spending months or years totally rewriting the libraries
| and tools you need, and of course your own code will have bugs
| too).
|
| Fixing the points addressed in this article helps by making it
| harder to slip these backdoors in, but will never be foolproof
| unless every single library has a maintainer with the skills to
| detect subtle bugs and security issues, who audits every line of
| code.
|
| Even then the marketplace for unreported zero day vulnerabilities
| means that there are probably undiscovered vulnerabilities
| somewhere in your dependencies (or in the code for your IDE or OS
| or Spotify app or mouse driver...) that can be exploited by
| someone.
|
| I'm reminded of the Commonwealth series by Peter Hamilton, in
| which the invading aliens have no machines, and quickly discover
| that ours are full of bugs that can be exploited to turn against
| us. I don't know what the solution is. Sandboxing your
| development in a codespace like Gitpod is a big improvement for
| sure, but even in Gitpod a lot of people import credentials and
| environment variables that can be stolen. (And what dependencies
| is Gitpod itself running?)
| remus wrote:
| > I'm increasingly fatalistic about computer security. It seems
| like your options are carefully auditing all dependencies
| (difficult and maybe impossible if the dependencies are highly
| technical or the malicious code is sufficiently subtle or
| obfuscated) every time you update, not updating at all (which
| leaves you vulnerable to all the bugs and other security issues
| in the version you choose to pin), or not using dependencies at
| all (by spending months or years totally rewriting the
| libraries and tools you need, and of course your own code will
| have bugs too).
|
| There is also the option of having trusted third parties review
| code. This is by no means an easy option but it does seem more
| feasible than everyone auditing every line of code they ever
| depend on. You do end up with spicy questions like who do we
| trust to audit code? Why do we trust them? How are they
| actually auditing this code?
| rectang wrote:
| The big problem is not bugs, not vulnerabilities, but
| malicious code inserted deliberately into packages published
| by attackers.
|
| One way to detect malicious code is line-by-line code review
| of published packages, but that's extremely laborious, even
| when done by third parties.
|
| What we really want to do is confirm that the package was the
| end product of an open source commit history, where commits
| were reviewed by a set of trusted authors (hey look third
| parties!) over time. That involves strong validation of
| publisher identities and cryptographic validation of the
| package contents to connect it to a commit history in a
| trusted public repository.
| worik wrote:
| But that costs money. Money that if not spent, goes straight
| to the profit line
| _tom_ wrote:
| Another option is reducing number of dependencies. Doesn't
| cure the problem but can cut the vulnerability surface by
| orders of magnitude.
|
| Transitive dependencies can, and frequently do, result in the
| majority of the code in an application being unneeded.
|
| Using one function from a dependency can result in whole sets
| of libraries or even whole languages being included in the
| project.
| worik wrote:
| But there it can be so much better, or so much worse.
|
| Worse is Node.js (not bothering with the unanswerable question:
| why Node.js?): thousands and thousands of package downloads.
| Long chains of transitive dependencies. A long storied history
| of security/reliability catastrophes.
|
| I love Rust. But I have always thought having the compiler
| download dependencies is a very bad idea. It would be much
| better if the programmer had to deliberately install the
| dependencies. Then there would be an incentive to have fewer
| dependencies.
|
| This is currently a shit show, because it is easier to write
| than read, to talk than listen. New generations of programmers
| refuse to learn the lessons of their forebears and repeat all
| their mistakes, harder, bigger, faster
| _tom_ wrote:
| I think that the rise of automated dependency resolution tools,
| like maven, has made this exponentially worse. It's routine for
| tools to have hundreds or even thousands of dependencies,
| something that would never happen if you manually had to manage
| them.
|
| My .m2 directory has 4000+ jar files in it, for example.
|
| They make you more productive, but much more vulnerable.
| froh wrote:
| I think we have, as an industry, for a long time not seen the
| true value proposition of "Linux distributions". They do quite
| some boring and tedious security auditing, for example review
| setuid binaries to the point they drop from root into user
| privileges; and they backport security patches, so security
| updates are binary compatible drop in replacements.
|
| When a binary distribution is widely used, the benefit is
| shared bug fixing and hardening, the disadvantage is somewhat
| dated libraries.
|
| It's a model I understand.
|
| What I don't understand is this idea of bootstrapping
| infrastructure via curl https://..../setup.sh && ./setup.sh,
| and the equivalent import of "modules", whatever you call them
| in your language of choice, straight from the web.
| throwawaygh wrote:
| _> carefully auditing all dependencies (difficult and maybe
| impossible if the dependencies are highly technical or the
| malicious code is sufficiently subtle or obfuscated)_
|
| ...yeah, a business is responsible for the integrity of its
| supply chain. There's nothing fatalistic about this. Running a
| business with potential liabilities is different from having a
| high school programming hobby.
|
| If you're using community distributions of open source software
| in a security-critical context (e.g., any machine that touches
| PII) then you should absolutely white-list dependencies and
| either (1) have internal auditing mechanisms in place for those
| dependencies or else (2) have good reason to trust the QA
| procedures of the underlying community (and still do some basic
| auditing on every update anyways).
|
| Everything else should be carefully sand-boxed and basically
| assumed to be pwned/pwnable.
|
| If some rando came up to your contractor and offered them free
| concrete for use in your foundation, and the contractor said
| yes without any due diligence, you would have every right to
| sue that contractor out of existence.
|
| The www isn't a wild west anymore. The era where any middle
| schooler can build a six figure business by serving as the
| middle man between open source packages and end-users should
| probably come to a close. And I say that as someone whose
| middle school software freelancing business cleared lots of
| revenue by the end of college.
|
| I wonder if this could be a revenue model of OSS. Cyber
| insurance providers should probably start weighing in on these
| supply chain issues soon.
| Groxx wrote:
| I'm gonna pimp my own complaint here:
| https://news.ycombinator.com/item?id=29125409
|
| I think library permissions systems would mitigate or
| effectively eliminate a _huge_ amount of these, and
| significantly raise the cost or reduce the targets of nearly
| all attacks.
|
| Libraries are, in practice, treated as black boxes. I think
| that's largely reasonable - that's _almost the whole point_ of
| leveraging someone else's work. But our languages/etc do not
| allow doing that in any sane way. I think that's completely
| ridiculous.
| rvz wrote:
| > As Rust is designed for sensitive applications where
| reliability is important such as embedded or blockchain-like
| projects, it can raise concerns.
|
| This is why I get very concerned with Rust projects using tons
| and tons of external crates. Especially cryptocurrency projects
| using Rust.
|
| These sorts of techniques can be used to compromise lots of them
| at once, which in the very worst case can lead to irreversible
| loss of funds.
|
| Unfortunately we will see the same issues found in NPM be found
| in crates.io with cargo.
|
| Oh dear.
| natded wrote:
| This is an issue with any ecosystem. The alternative would be
| to have them in the standard library, which is silly.
|
| The actual solution I guess are domain specific languages.
| oytis wrote:
| With any ecosystem that has a package manager. Try inserting a
| dependency backdoor into a C++ project where dependencies
| have to be managed manually (and consequently are few)
| ziml77 wrote:
| Of course if they need to be manually updated, there's a
| strong likelihood that they are using vastly outdated
| versions of their dependencies. Users could be wide open to
| unpatched exploits.
| oytis wrote:
| Not necessarily. C/C++ relies vastly on dynamic linking
| which means keeping your dependencies up to date will be
| outsourced to distro maintainers. You'll have to make
| sure that your package builds for new versions of the
| distro.
| notriddle wrote:
| C relies vastly on dynamic linking. C++ cannot
| dynamically link templates, so it's pretty common for a
| C++ library to be made entirely of header files.
|
| Also, that's really only true of a small number of Linux
| and BSD flavors. Applications shipped on Windows, macOS,
| Android, iOS, or any of the Linux "application bundle"
| systems like the Steam runtime, Docker, FlatPak, will
| deliberately avoid using globally-specified dependencies.
|
| It's also commonplace to avoid declaring dependencies in
| C by vendoring them, like how VLC basically includes its
| own implementation of a bunch of data structures [1], and
| the entire universe of single-file libraries [2].
|
| [1]:
| https://wiki.alopex.li/LetsBeRealAboutDependencies#gotta-
| go-...
|
| [2]: https://github.com/nothings/stb/blob/master/docs/stb
| _howto.t...
| oytis wrote:
| The C++ ABI is a PITA indeed, but exposing a C API for
| external linking is totally possible and actively used.
| Various cases of static linking (or header-only files) do
| indeed exist, but no sane C++ project (I'm not sure one
| can call ROS that without reservations) uses nearly as
| many static dependencies as a typical Rust one does.
|
| > Also, that's really only true of a small number of
| Linux and BSD flavors.
|
| This is true on all major Linux distributions.
|
| > Applications shipped on Windows, macOS, Android, iOS
|
| If we are talking about security, closed-source walled
| gardens are out of consideration, right?
|
| > or any of the Linux "application bundle" systems like
| the Steam runtime, Docker, FlatPak
|
| This is sad indeed. Docker used right is just a distro
| inside a distro though, so anything that applies to
| distros applies here too.
| ziml77 wrote:
| That first link is a good analysis of the situation when
| it comes to dependencies. It's really not straightforward
| unless you are targeting a specific OS distribution.
| [deleted]
| cozzyd wrote:
| While vendoring is somewhat common (especially for C++
| header-only libraries), often dependencies are provided
| by the OS package manager and dynamically linked (you
| have to recompile if the ABI changes, and hope that the
| API remains more or less stable, but that's usually part
| of the contract of a "major version").
| rvz wrote:
| > This is an issue with any ecosystem.
|
| So we are repeating the same mistakes NPM made in other
| package managers? Then we have learned nothing about
| mitigating these supply chain issues.
|
| > The actual solution I guess are domain specific languages.
|
| Any production real world examples of this?
| dathinab wrote:
| > crates.io matches the code on GitHub,
|
| There is no tight coupling between GitHub and cargo/crates.io
| (sure it uses GitHub internally but that is an implementation
| detail).
|
| Not only is there no tight coupling with GitHub, it also
| doesn't require you to use git; you can use whatever version
| control you want, and at worst you don't get support for
| "detect dirty repository".
|
| Similarly, git tags are fundamentally unreliable, as you can
| always "move" one to an arbitrary commit.
|
| So IMHO the problem here is expecting code you didn't get
| from GitHub (and which might not even use git) to match an
| arbitrary tag on GitHub, which might not even be from the
| same author (but e.g. be a mirror on GitHub of whatever VCS
| the author uses).
|
| But uploads to crates.io are immutable and are source code
| uploads, so you can just review them.
|
| In general (independent of cargo) _do review the code you
| use_, not some code you got from somewhere else which you
| hope/believe is the same.
| rectang wrote:
| Treating PGP signed commits as privileged and only pointing at
| them as opposed to mutable tags seems like it would help.
| dathinab wrote:
| I'm not sure it's actually possible, for varying reasons.
|
| Even if we ignore that cargo can be used with other version
| control systems I see some problems:
|
| - Validation must be done server side as everything the
| client does can be manipulated.
|
| - So the server would need to be able to make sure that some
| signature is valid for some commit.
|
| - But you don't upload commits; you upload a subset of the
| checkout produced by the combination of the current commit
| and previous commits.
|
| - Uploading all of the diff of the current (or previous)
| commit is a no-go for various reasons (size, leaking internal
| code etc.).
|
| - Similarly, even if it worked, the repository on GitHub
| could still contain different code; the signature might not
| match, but who is checking the signature? The reviewer,
| manually?
|
| What you could maybe do is sign the uploaded archive, which
| in combination with e.g. a YubiKey would make it harder
| for attackers to upload malicious packages (if the authors
| aren't the attackers).
|
| You also could include some version id (e.g. git hash) in the
| signature, then review tools could check that the uploaded
| code matches the GitHub repository.
|
| But really the most important thing is to review the code you
| actually use (e.g. what is uploaded to crates.io) and not
| what you believe you probably should be using.
| rectang wrote:
| In the abstract, what I'm wishing for is definitely
| possible at least for some subset of packages.
|
| * Source code packages are cryptographically signed by at
| least one and ideally several identities which are
| verifiable via a web of trust.
|
| * Packages are associated with a public repository base URL
| known to the package manager and that base URL doesn't
| change without setting off alarm bells.
|
| * The source code of a package can be traced back and
| cryptographically connected to a public commit history at
| the public repository URL.
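The second bullet could be enforced registry-side with something as simple as comparing the recorded base URL against new uploads; a toy sketch (package names and URLs are made up):

```rust
use std::collections::HashMap;

/// Toy registry-side check: raise an alarm when a package's recorded
/// repository base URL differs from the one in a new upload.
fn url_changed(known: &HashMap<&str, &str>, pkg: &str, new_url: &str) -> bool {
    known.get(pkg).map_or(false, |&old| old != new_url)
}

fn main() {
    let mut known = HashMap::new();
    known.insert("foo", "https://example.com/alice/foo");
    assert!(!url_changed(&known, "foo", "https://example.com/alice/foo"));
    assert!(url_changed(&known, "foo", "https://example.com/mallory/foo"));
    println!("ok");
}
```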
| PragmaticPulp wrote:
| > There is no tight coupling between GitHub and cargo/crates.io
| (sure it uses GitHub internally but that is an implementation
| detail).
|
| Cargo has a "locked" option that uses the URL and commit hash
| from the Cargo.lock file. If the crate, repository, or commit
| has changed then it won't build.
|
| This is what everyone uses for reproducible and secure builds,
| but it's less common in casual use.
| jeltz wrote:
| Why isn't that the default?
| Diggsey wrote:
| To be clear, even without `--locked`, cargo will use the
| dependencies from the lockfile, as long as the lockfile is
| not out of date compared to Cargo.toml.
|
| However, for development, you normally want cargo to update
| the lockfile after you change something in `Cargo.toml`
| (like adding a new dependency).
|
| The `--locked` option is particularly useful in CI though,
| where you want it to fail if the lockfile is out of date,
| rather than update the lockfile and continue.
| steveklabnik wrote:
| To spell it out in even more detail:
|
| I add foo = "1.0.0" to my Cargo.toml. The latest release
| of foo is 1.0.1. I invoke "cargo build." Cargo builds my
| project, using 1.0.1 (since "1.0.0" is short for
| "^1.0.0"), and records that in the lockfile.
|
| Now foo releases 1.1.0. I invoke "cargo build". Cargo
| uses the lockfile, and _nothing changes_. I still build
| with 1.0.1, because that's what's in the lockfile.
|
| Now, let's say I go in and change my Cargo.toml to use
| 1.1.0. This is where the behavior differs:
|
| 'cargo build' will say "oh, you've changed your
| Cargo.toml, let's perform resolution again" and will
| update your Cargo.lock to have 1.1.0.
|
| 'cargo build --locked' will say "error, you have changed
| your Cargo.toml but the lockfile was not updated."
|
| That's why --locked is useful in CI; it will make sure
| that the lock you've committed is up to date. It doesn't
| imply that Cargo ignores the lockfile by default, only
| that it will update your Cargo.lock when you change your
| Cargo.toml, because that implies you're asking for a
| change in your dependencies.
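The resolution step above hinges on "1.0.0" meaning "^1.0.0". A simplified model of that caret rule (ignoring Cargo's special handling of 0.x versions and pre-releases):

```rust
/// Simplified caret-compatibility check for major >= 1: a candidate
/// satisfies "^1.0.0" if it has the same major version and is not
/// older than the requirement. Tuples compare lexicographically.
fn caret_compatible(req: (u64, u64, u64), candidate: (u64, u64, u64)) -> bool {
    candidate.0 == req.0 && candidate >= req
}

fn main() {
    // "1.0.0" in Cargo.toml accepts 1.0.1 and 1.1.0...
    assert!(caret_compatible((1, 0, 0), (1, 0, 1)));
    assert!(caret_compatible((1, 0, 0), (1, 1, 0)));
    // ...but never a different major version or an older release.
    assert!(!caret_compatible((1, 0, 0), (2, 0, 0)));
    assert!(!caret_compatible((1, 0, 0), (0, 9, 9)));
    println!("ok");
}
```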
| [deleted]
| dathinab wrote:
| While the thinks mentioned are a thing, his recommendations and
| interpretation of them seems often strange for me.
|
| > Typosquatting
|
| A well-known problem of more or less any open repo;
| namespaces do not solve it, they just shift the problem, and
| maybe even make it more complex.
|
| > Misleading name
|
| Again not cargo specific and not specific to flat namespaces
| either.
|
| Namespaces help with first-party packages from the same authors
| (though then it's not too hard to check the authors anyway, just
| annoying; and annoying is insecure). But namespaces do not help
| with 3rd-party libraries. I.e. many (most?) `tokio-` libraries
| are 3rd-party libraries made to work with tokio, not libraries
| from the `tokio` authors.
|
| > Transitive dependencies
|
| Again something inherent to all package managers.
|
| Though some ecosystems like npm are worse due to using more,
| smaller packages.
|
| In my experience, while most Rust crates have many dependencies,
| most of those are on the same small set of packages (like
| thiserror, anyhow, serde, etc.).
|
| Anyway trying to avoid unnecessary dependencies is generally not
| a bad idea.
|
| > "x.x.1" Update
|
| Basically: If you update a dependency you update a dependency,
| surprise.
|
| Anyway, using a `=` dependency is an anti-pattern which produces
| an endless amount of headaches and incompatibilities, especially
| with libraries using `=` requirements. There is a reason
| lock-files exist and `cargo update` doesn't run by default.
|
| > Malicious update
|
| This point makes no sense, as:
|
| - cargo is not tied to the version control system you use; it
| has some helpers for some VCSs, but that's it.
|
| - git tags are settable arbitrarily (and can be "moved").
|
| - You get the source from crates.io, not GitHub.
|
| So the only way this can be a security vulnerability is if you
| believe that reviewing not the source you use but source code
| from a different place is a good idea, while also trusting git
| tags yet not fully trusting the author who sets the tags?
|
| Just review the source code which you use; uploads to crates.io
| are immutable and you can download and read them.
|
| > Run code before main
|
| Worth pointing out, but not specific to Rust, and even less
| to cargo/crates.io. It's specific to the system's binary file
| format and how binaries are executed.
|
| > principles of Rust is no life before main
|
| A design principle for the Rust language, not a security
| statement.
|
| Anyway, review the source code you use, not source code from
| a different source you believe should be the same (not
| specific to Rust).
|
| Rust could also improve on this by warning if (transitive)
| dependencies link to .ctor or similar.
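What ".ctor or similar" looks like in practice: on Linux (ELF), safe Rust can register a function in `.init_array` that the loader runs before `main`, no build script or proc-macro involved. A minimal sketch (platform-specific, not portable):

```rust
use std::sync::atomic::{AtomicBool, Ordering};

// Flag flipped by the constructor so main can observe that it ran first.
static RAN_BEFORE_MAIN: AtomicBool = AtomicBool::new(false);

extern "C" fn before_main() {
    RAN_BEFORE_MAIN.store(true, Ordering::SeqCst);
}

// On ELF platforms the loader calls every function pointer placed in
// .init_array during startup, before main. #[used] keeps the static
// from being optimized away even though nothing references it.
#[used]
#[link_section = ".init_array"]
static INIT: extern "C" fn() = before_main;

fn main() {
    assert!(RAN_BEFORE_MAIN.load(Ordering::SeqCst));
    println!("constructor ran before main");
}
```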
|
| > Malicious macros
|
| Code generation is an interesting aspect to analyze across
| languages. Most have it: some, like Rust, at compile time and
| first-class; others at compile time but second-class; and
| others at runtime (e.g. Java reflection).
|
| It's definitely an area where Rust has a lot of potential to
| improve (e.g. sandboxing most proc-macros in wasm by default,
| or similar).
|
| It's also unavoidable in some cases (e.g. building external
| dependencies).
|
| Additionally I would argue you probably could find a way to have
| code which somehow allows you to start running code when it's
| compiled due to you crafting the code in a way which triggers
| buffer overflows and similar in the underlying compiler (e.g.
| LLVM).
|
| You also tend to run tests...
|
| So theoretically you always should sandbox your IDE/project,
| even if you develop in a language which doesn't have
| compile-time code generation.
|
| As a side note, VS Code asks if you trust a project you open;
| if not, it changes some settings for the project, and
| rust-analyzer can be configured to not run build scripts and
| proc-macros. (I haven't tested whether not trusting a project
| automatically makes rust-analyzer not run things.)
|
| > a bigger standard library would reduce the need for external
| [..]
|
| No, it doesn't reduce the need. See Python. Or see how even
| some features in Rust's standard library are frequently passed
| over in favor of external libraries, e.g. parking_lot or
| crossbeam.
|
| Similarly, having 10 dependencies instead of 1 might not
| decrease security if those 10 are a bundle from the same
| author, built in the same CI, with no (relevantly) more code
| than if they were just 1 dependency.
|
| What you want is to reduce the number of trusted entities (not
| the number of packages) and trusted lines of code.
|
| So having a group which manages a set of widely used packages
| with tight security would roughly be as helpful for the problem
| as having a bigger standard library, but much more practical. To
| some degree the rust nursery was going in that direction (but
| didn't reach that goal).
|
| > Rust supports git dependencies.
|
| It also allows pinning versions in Cargo.toml and in
| lock-files, and crates.io is immutable.
|
| Git dependencies are prone to making you miss security updates;
| similarly, 3rd-party audits are likely done on the code uploaded
| to crates.io, not the one on GitHub.
|
| If you want to review all dependencies, and make sure to use them
| etc. then it's probably best to vendor all of them, and set
| things up so that your dev system can only see/reach vendored
| packages, while from time to time updating them based on
| crates.io and on diffs of the crates.io uploads (not GitHub).
| cntlzw wrote:
| How do rust crates compare with something like maven or npm? It
| looks like some issues, for example typosquatting, apply to
| all of these dependency managers.
| junon wrote:
| npm has some guards for typosquatting. They're annoying when
| you run into them but I appreciate that they're there. I have
| no idea how effective or extensive they are, though.
| Ajedi32 wrote:
| Yep, supply chain attacks are a near-universal problem with
| programming language package managers.
|
| I think there's a lot of room for improvement here. Some good
| low-hanging fruit IMO would be to:
|
| 1. Take steps to make package source code easier to review.
|
| 1.1. When applicable, encourage verified builds to ensure
| package source matches the uploaded package.
|
| 1.2. Display the source code on the package manager website,
| and display a warning next to any links to external source
| repositories when it can't be verified that the package's
| source matches what's in that repo.
|
| 1.3. Build systems for crowdsourcing review of package source
| code. Even if I don't trust the package author, if someone I
| _do_ trust has already reviewed the code then it's probably
| okay to install.
|
| 2. Make package managers expose more information about who
| exactly you're trusting when you choose to install a particular
| package.
|
| 2.1. List any new authors you're adding to your dependency
| chain when you install a package.
|
| 2.2. Warn when package ownership changes (e.g. new version is
| signed by a different author than the old one).
|
| Long-term, maybe some kind of sandbox for dependencies could
| make sense. Lots of dependencies don't need disk or network
| access. Denying them that would certainly limit the amount of
| damage they can do if they are compromised, provided the host
| language makes that level of isolation feasible.
| ansible wrote:
| I like all these ideas.
|
| > _Long-term, maybe some kind of sandbox for dependencies
| could make sense. Lots of dependencies don't need disk or
| network access._
|
| Just like with Android permissions, we could audit the crate
| sources to list out what functions it uses (out of the
| standard library or wherever) and provide an indication of
| what this particular crate is capable of.
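A naive version of that audit can be approximated by scanning source text for capability-bearing std paths. A sketch only; a real tool would parse the code (e.g. with syn), since string matching is trivially evaded:

```rust
/// Flag which capability-bearing std modules a piece of source code
/// mentions. Substring matching is easily fooled (e.g. by macros or
/// path aliasing), so treat this purely as an illustration.
fn capabilities(source: &str) -> Vec<&'static str> {
    let probes = [
        ("std::fs", "filesystem"),
        ("std::net", "network"),
        ("std::process", "process-spawn"),
    ];
    probes
        .iter()
        .filter(|(needle, _)| source.contains(*needle))
        .map(|&(_, cap)| cap)
        .collect()
}

fn main() {
    let src = "use std::net::TcpStream; fn f() { std::process::exit(0); }";
    assert_eq!(capabilities(src), vec!["network", "process-spawn"]);
    assert!(capabilities("fn main() {}").is_empty());
    println!("ok");
}
```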
| UncleMeat wrote:
| This is a strategy, but it typically falls apart against
| clever attackers who are targeting you specifically.
| Hackers have been performing return-to-libc attacks forever
| where they don't actually get to write any code at all,
| just sequence code that already exists in your binary.
|
| Java also tried this in a slightly more rigorous manner
| with the SecurityManager and that just ended up being a
| botch.
| Ajedi32 wrote:
| Yeah that's why I said it really depends on the host
| language to make such sandboxing feasible. If you're
| using a language that lets code write arbitrary data to
| arbitrary memory locations, implementing a secure sandbox
| is going to be pretty tricky.
| dane-pgp wrote:
| For what it's worth, this Principle Of Least Authority /
| object-capability model is being attempted in the
| JavaScript ecosystem with SES (Secure ECMAScript).
|
| https://agoric.com/blog/technology/ses-securing-javascript/
|
| https://medium.com/agoric/pola-would-have-prevented-the-
| even...
| _tom_ wrote:
| Analysis tools that show where large transitive dependencies
| could be avoided would help.
|
| Right now there is no feedback to encourage people to not
| have HUGE lists of dependencies. And for trivial reasons.
| This compounds the problem hugely.
|
| If you have three dependencies, verifying is feasible. If you
| have 3,000, it is not.
| tetha wrote:
| Maven Central is somewhat resilient against this. In the java
| world, an artifact is identified by a group-id, an artifact-id
| and a version, and some technical stuff. The group id is a
| reversed domain, like org.springframework.
|
| If you want to upload artifacts with the group id
| "org.springframework", you first have to demonstrate that you
| own springframework.org via a challenge, usually a TXT record
| or some other possibilities for github group-ids and such.
|
| It's not entirely bulletproof, because you could squat group-
| ids "org.spring" or "org.spring.framework" (if you can get that
| domain). However, once a developer knows the correct group id
| is "org.springframework", you need additional compromises to
| upload an artifact "backdoor" there.
|
| Edit - and as I'm currently seeing, PGP signatures are also
| required by now.
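The group-id-to-domain mapping is just a reversal of the dot-separated components; a sketch of that derivation (the ownership challenge itself, of course, requires actual DNS):

```rust
/// Derive the domain a publisher must prove control of for a given
/// Maven-style group id, e.g. "org.springframework" ->
/// "springframework.org".
fn required_domain(group_id: &str) -> String {
    let mut parts: Vec<&str> = group_id.split('.').collect();
    parts.reverse();
    parts.join(".")
}

fn main() {
    assert_eq!(required_domain("org.springframework"), "springframework.org");
    assert_eq!(required_domain("com.example.tools"), "tools.example.com");
    println!("ok");
}
```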
| [deleted]
| brabel wrote:
| It's a hell of a lot harder to squat namespaces as you need
| to either spoof or steal or buy one domain per namespace,
| which is not trivial.
|
| Maven Central has required PGP signatures since the beginning,
| as far as I know! In the olden days, it didn't use HTTPS
| though (which has been fixed for several years now), so
| unless you validated the signatures and kept track of the PGP
| keys, you could still run into trouble.
| kibwen wrote:
| _> It's a hell of a lot harder to squat namespaces as you
| need to either spoof or steal or buy one domain per
| namespace, which is not trivial._
|
| This introduces a different security wrinkle, as domain
| names need to be continually renewed. What does Maven do to
| prevent unauthorized transfer of namespace ownership when a
| domain lapses?
| ChrisSD wrote:
| These do all seem to be things that apply to most package
| managers of this kind. So it would be good if Rust could find
| solutions that can be applied more broadly.
| typicalbender wrote:
| I haven't thought this through at all but are you aware of any
| package repositories that do something like Levenshtein
| distance between package names, maybe combined with a heuristic
| on commonly mistyped characters, to disallow typosquatting?
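| Such a distance check could be sketched in Rust; this is a
| hypothetical illustration (the function names are made up, not
| any registry's actual code):

```rust
/// Classic dynamic-programming edit distance between two names.
fn levenshtein(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    let mut curr = vec![0usize; b.len() + 1];
    for i in 1..=a.len() {
        curr[0] = i;
        for j in 1..=b.len() {
            let cost = if a[i - 1] == b[j - 1] { 0 } else { 1 };
            curr[j] = (prev[j] + 1)          // deletion
                .min(curr[j - 1] + 1)        // insertion
                .min(prev[j - 1] + cost);    // substitution
        }
        std::mem::swap(&mut prev, &mut curr);
    }
    prev[b.len()]
}

/// Report an established name the candidate is suspiciously close to.
fn possible_typosquat<'a>(candidate: &str, popular: &[&'a str]) -> Option<&'a str> {
    popular
        .iter()
        .copied()
        .find(|&p| p != candidate && levenshtein(candidate, p) <= 1)
}
```

| A registry could run this at publish time and reject (or at
| least flag) new names within distance 1 of a popular package.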
| Buttons840 wrote:
| Are there any tools that can scan my dependencies and point
| out names that are typos of older or more popular packages?
|
| Something like: you said "times", did you mean the older and
| more popular package "time"?
| brabel wrote:
| Yes, they do that in Dart's pub [1].
|
| They also have the concept of verified publishers[2], which
| is pretty neat (similar to Maven Central), and keep track of
| a score for each package (e.g.
| https://pub.dev/packages/darq/score) including up-to-date
| dependencies and result of static analysis.
|
| Dart is doing a lot of things right.
|
| [1] https://pub.dev/
|
| [2] https://dart.dev/tools/pub/publishing#verified-publisher
| pornel wrote:
| https://lib.rs/cargo-crev tries to address this.
|
| It allows you to review the actual published source of your
| dependencies. It then can check whether your project only uses
| reviewed dependencies.
|
| Reviewing everything is of course a lot of work, so there's an
| option to mark crate owners as trusted, and also reuse code
| reviews made by people you trust.
| hn8788 wrote:
| Stuff like this is why the place I work for decided not to use
| Rust as a replacement for future C projects. The devs evaluating
| it loved the language, and said they were more productive with it
| than they were with C, but being forced to either use no external
| dependencies, or audit tons of crates published by random people,
| made it a non-starter.
| smabie wrote:
| Oh right because C has a very rich standard library
| smoldesu wrote:
| So they'd prefer to manually grab header files and audit those
| instead?
| lytedev wrote:
| Don't you have the same problem with C?
| Diggsey wrote:
| There's going to be a risk to running someone else's code. There
| are two factors here:
|
| 1) Do I trust the code I think I'm running?
|
| 2) Am I actually running the code I think I'm running?
|
| With (1) there's not really any way around it: someone or
| something has to review the code in some way.
|
| Even the suggestion to have a larger standard library doesn't
| _really_ address it: with a larger standard library the rust
| project needs more maintainers, and it might just get easier to
| get vulnerabilities into the standard library.
|
| Someone could build a tool that automatically scans crates
| uploaded to crates.io. It could look for suspicious code
| patterns, or could simply figure out what side-effects a crate
| might have, based on what standard library functions it calls,
| and then provide that information to you. For example, if I'm
| looking for a SHA256 crate, and I notice that the crate uses the
| filesystem, then I might be suspicious.
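| The scanning idea above could be sketched in Rust; this is a
| naive, hypothetical substring check (a real tool would parse the
| source, e.g. with syn, rather than match text):

```rust
/// Crude side-effect scan: flag standard-library modules that let
/// a crate touch the filesystem, network, environment, or spawn
/// processes. Substring matching is easily fooled; this only
/// illustrates the idea.
fn suspicious_apis(source: &str) -> Vec<&'static str> {
    const WATCHLIST: &[&str] = &["std::fs", "std::net", "std::process", "std::env"];
    WATCHLIST
        .iter()
        .copied()
        .filter(|api| source.contains(*api))
        .collect()
}
```

| A hit on std::fs in a crate that claims to be a pure SHA256
| implementation would then be surfaced to the user as a warning.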
|
| With (2) there are some easier options, such as making it easier
| to download or browse the contents of a crate directly from
| crates.io, or have a tool to show the full dependency source diff
| after a `cargo update`. For initially installing the crate, the
| number of downloads is a pretty good indicator of "is this really
| the crate I meant?".
| curun1r wrote:
| I'm not sure I agree with your fatalistic take on (1). As an
| example, the proof-of-concept proc macro attack from the
| article could be addressed by running proc macros in a wasm
| sandbox. I know there's been some exploratory work done towards
| that.
|
| Similarly, all of the attacks that execute code at compile time
| are mostly addressed currently by building code in a Docker
| container. It's not a perfect security measure, but it greatly
| increases the probability that a build.rs or proc macro attack
| will fail.
|
| Your (1) strikes me as something that can never be solved to
| provide 100% safety, but that there are partial solutions
| which, in practice, can significantly reduce the attack
| surface. And hiding behind "fundamentally unsolvable problem"
| will get in the way of pushing for the meaningful half-measures
| that offer some degree of real-world protection.
| Diggsey wrote:
| > As an example, the proof-of-concept proc macro attack from
| the article could be addressed by running proc macros in a
| wasm sandbox. I know there's been some exploratory work done
| towards that.
|
| Sure, you can (and we absolutely should) sandbox the build
| process itself. I meant "running other people's code outside
| of a sandbox", and should have specified that.
|
| The problem is that at some point you actually want to run
| the code you compiled, and then proc-macro exploits can still
| do whatever.
|
| > hiding behind "fundamentally unsolvable problem" will get
| in the way pushing for the meaningful half-measures that
| offer some degree of real-world protection.
|
| I didn't say they were unsolvable, I said that solving them
| requires someone or something to review the code. That's the
| only way you can gain trust that code does what it says it
| does. I even suggested some possible "meaningful half-
| measures" that could be implemented.
| kevincox wrote:
| > could be addressed by running proc macros in a wasm sandbox
|
| Kind of. That will stop the compromise on build, but odds are
| then you run the code anyways so injecting the code into the
| executable is _almost_ as good. I guess if you are building
| an application that is always run in a sandbox anyways (like
| a wasm application that never sees sensitive data) then
| sandboxing proc macros could be good enough but I suspect
| that is a very rare case.
| curun1r wrote:
| My comment specifically called it out as a half measure.
| But that doesn't mean it's not worth doing.
|
| One of the ways that I'd see a proc macro attack playing
| out in real life would be to attempt to obtain the API
| token from cargo login to allow the attacker to publish a
| malicious version of a popular crate. But if cargo login is
| only run on the CI instance responsible for publishing and
| developers test their code on their individual dev
| machines, the attack fails.
|
| We shouldn't worry about making security 100% effective.
| Rather, we should be concerned with stopping as many
| different attacks as we can and providing users with as
| many building blocks as possible to let them protect
| themselves. There will always be vulnerabilities. But using
| that as an excuse for inaction isn't helpful.
| pjmlp wrote:
| The sandbox doesn't protect against injection of malicious
| code.
| PragmaticPulp wrote:
| I use "cargo ... --locked" to install things using the
| dependencies from the Cargo.lock file, which pins exact
| versions and checksums for dependencies. Avoids things like
| the 0.0.1 problem or even replaced crates. Need to be careful
| to watch for actual security updates, though.
|
| I really wish crates.io would have at least launched with a
| namespacing feature. This wouldn't solve every spoofing or
| typosquatting issue, but it would go a long way toward improving
| the situation.
|
| There's a separate issue of crates.io squatting. One person
| famously registered hundreds (or thousands? Tens of thousands) of
| common words as crate names on crates.io and has been squatting
| them ever since. Those names are effectively unavailable for use
| but also completely useless because they don't contain anything.
|
| It's also becoming a huge problem for abandoned crates. New forks
| have to choose a completely different, less intuitive name
| because they can't just namespace their alternative. As old
| crates get abandoned, this leads to weird situations where the
| newest, best maintained crate has the least obvious crate name.
| It sometimes takes work to find the good crate because the
| best named crates might just be the oldest, most abandoned ones.
| mikepurvis wrote:
| I feel like "single shared namespace" works best when the
| ecosystem is managed collaboratively, with a clear leadership
| to make calls on who gets what name-- for example, something
| like a Linux distro, where there are Replaces/Provides metadata
| specifically to facilitate these kinds of transitions and avoid
| being stuck forever with crappy legacy nonsense.
|
| But this doesn't work at all in a free-for-all environment like
| PyPI, NPM, or Crates, where anyone can just grab a name and
| then have it in perpetuity.
|
| IMO the Docker ecosystem got this right, with baking in a
| domain name as part of the container, and insisting that
| everyone on docker.io use a vendor/product convention. This
| meant that the toplevel namespace was reserved for them to
| offer (or delegate the offering of) specific blessed container
| images, much more in line with how distro packaging might work.
|            # Get the "official" image, whatever that means
|            # (but you trust Docker Inc, so yay).
|            docker pull nginx
|            # Get an image supplied by a specific vendor.
|            docker pull bitnami/nginx
|            # Get an image from a different server altogether;
|            # maybe it's your company, or you don't trust
|            # Docker Inc after all?
|            docker pull quay.io/jitesoft/nginx
|
| Maybe the big-flat-namespace thing is still a years-later
| reaction against huge and unnecessary hierarchies in Java land?
| I think the ideal is not to permit infinite depth, but perhaps
| to insist on 2-3 levels.
| roywashere wrote:
| Similarly, Perl has CPAN, a system where, if a module is
| unmaintained and the author unresponsive, new maintainers can
| be granted permission by admins to take over. This ensures
| the namespace is preserved and people's dependencies keep
| getting updated.
| kbenson wrote:
| It's worth noting that this behavior is also a sticking point
| with some users and distros, and I believe was at least part
| of the reasoning behind podman being developed by Red Hat.
|
| Sometimes you don't want some third party controlling and
| blessing what's okay, you want to explicitly control that by
| only allowing sources you control so you can't even
| accidentally use that third party.
|
| Different groups have different use cases and it's hard to
| support them all well.
| mikepurvis wrote:
| I definitely see both sides of it-- on the one hand,
| it's nice that "docker pull nginx" is _always_ shorthand
| for "docker pull docker.io/_/nginx", and that isn't
| dependent on some config file somewhere telling
| you what your primary registry is. We solved this for
| ourselves internally by just always using fully qualified
| names (containers.corp.com/foo) and not worrying too much
| about it.
|
| But I can also see how this is absolutely a power grab by
| Docker Inc. For a company which has given away almost
| everything and has basically no moat in terms of
| technology, ecosystem, or thought-leadership, I can
| understand them wanting to at least retain the final say on
| which containers are which.
|
| And of course, given that perspective, it's clear why a
| company like Red Hat would chafe against this, particularly
| when it's clear from the architectural differences in
| Podman (daemonless, rootless) that they were looking for an
| excuse to do it over anyway.
| marcosdumay wrote:
| You mean you want to override or replace the global name
| registry?
|
| I can't imagine that getting any harder by the global vs.
| scoped name distinction. It's just an extra feature.
| kbenson wrote:
| It's a feature docker refused to implement, and is one of
| the reasons Red Hat created podman, which is a drop in
| docker replacement (you can alias it to docker for the
| most part), but it allows full control over what repos
| are checked.
|
| For docker, as I understand it to get something
| functionally similar you have to define a mirror for
| docker.io that's not really a mirror and then prefer that
| mirror and/or disallow traffic to docker.io in some
| fashion.
| infogulch wrote:
| I've heard this argument a million times, but I really
| don't get it. If you want to control which images are
| deployed, you want to be running your own image registry
| anyway. So you already have full control: just limit
| images to ones in your private registry with its fully
| qualified name. Why do you also insist on being able to
| deploy with 'short' names? Arguably it's better because
| users can't confuse "enterprise customized ubuntu:20.04"
| with the public one and bother open source maintainers
| with internal issues.
| kbenson wrote:
| Nobody is insisting on being able to deploy with short
| names (or at least not the people that actually care
| about this). They are insisting that someone screwing up
| and using a short name when they shouldn't doesn't go out
| to a third party that's not explicitly trusted. Making
| this about short names is completely missing the point.
|
| Requiring docker to be run with a "mirror" which isn't
| actually a mirror of docker.io and making it use your
| non-mirror mirror so docker.io can't be reached and
| everything still works is just a hacky way to trick
| docker into operating in what many people think is a very
| valid mode - one where no external resources _can_ be used
| as a container source, not just as a matter of policy, but
| as a matter of capability.
|
| Put simply, it's not about replacing the default repo,
| it's about _excluding all external repos_. Requiring they
| all use long names would be a perfectly acceptable trade
| off for most orgs I think, if that was an option offered.
| krinchan wrote:
| I feel like namespaces came much later than Docker Hub, but I
| can't really find a good historical record of docker registry
| features (or at least easily searchable ones). The global
| namespace was either forcibly cleaned or delegated to whoever
| was currently publishing, and projects raised issues if that
| wasn't them. I vaguely remember annoyance that I had to
| publish my toy containers to the global namespace. I could be
| wrong though.
| mikepurvis wrote:
| So per [1] it looks like Docker Hub launched in mid-2014.
| Admittedly my earliest pushes to there are more like the
| late-2015 timeframe, but definitely by that point I was
| pushing to namespace under my username. My impression had
| always been that the structure of Docker Hub was meant to
| mirror Github in terms of an org/repo scheme.
|
| [1]: https://www.docker.com/blog/announcing-docker-hub-and-
| offici...
| dmurray wrote:
| Github generally has two levels (organization and repository)
| which works pretty well.
| hollerith wrote:
| It works pretty well, but it might not if forking -- making
| org2/repo55 a copy of org1/repo55 -- weren't free for the
| user (namely, org2). A venture-backed startup can maybe
| afford to keep forking free for the user, but a non-profit
| cannot.
| Lifelarper wrote:
| https://users.rust-lang.org/t/name-squatting-on-the-crates-i...
|
| This is a very old discussion if you keep digging through the
| links.
| ansible wrote:
| I hope that the Rust Foundation (which finally exists) can
| put in some time / resources to help fix this.
|
| I totally get that maintaining crates.io is a time suck for
| all involved, and mediating disputes would increase that even
| more.
|
| But at some point, this is going to need to be addressed.
| rectang wrote:
| My understanding is that validating identity for package
| authors is a hard problem thus expensive to solve robustly, and
| the crates.io folks have hitherto deferred tackling it in
| earnest. That is arguably a responsible approach up to a point,
| in that they haven't committed prematurely to something half-
| baked.
|
| As described elsethread, there is prior art -- Maven's identity
| verification is substantially better:
| https://news.ycombinator.com/item?id=29266591
|
| Validating ownership of a namespace reliably enough that it is
| difficult to spoof is tough. It's possible for PGP creds to be
| stolen. But then at least the keys can be revoked, and old
| packages signed with a new key.
| miohtama wrote:
| I believe the Debian Maintainer process and its keyring is one
| project, if not the only one, that gets this as good as it can
| get.
|
| https://wiki.debian.org/DebianMaintainer
| not2b wrote:
| Robustly verifying identity isn't enough, because a developer
| with an outstanding reputation could lose their credentials
| to a spearphishing attack, and the attacker could then modify
| crates using stolen credentials.
| rectang wrote:
| I agree that authentication is not sufficient on its own,
| but I argue that it's _necessary_ as part of the solution.
|
| A source package should meet the following criteria:
|
| * Package contents match exactly the source tree at a PGP
| signed commit at a public repository URL. This must be
| verified before the package is made available through the
| package manager.
|
| Now, if I am whitelisting PGP keys, an attacker _needs_ to
| steal creds to get something by me.
|
| Ideally you want multiple signatures by trusted keys prior
| to publication. Each additional signature makes it
| significantly less likely that a package is provided by a
| malicious attacker.
|
| EDIT: Hmm, how about package signing parties held over
| videochat? If we already know each other and you tell me
| that a particular package was created by you, I sign it.
| PragmaticPulp wrote:
| They don't need to validate anything, though. Just treat the
| namespace as something that can be claimed exactly the way
| that a crate name can be claimed.
|
| Only owners of the namespace can add crates to that
| namespace, in the same way that only people who own a crate
| name can publish to that crate.
| kibwen wrote:
| If you're not validating anything, then that violates the
| assumptions of users who see an amazon/ namespace that
| isn't officially associated with Amazon. We don't even need
| to presume malicious or squatting intent; the entire left-
| pad fiasco was precipitated by a person who innocently
| registered "kik", and then later a company named Kik
| demanded that NPM transfer ownership to them. This happened
| with a package, but it would just as easily happen with a
| namespace. Once you introduce an identity layer, people
| start to expect identity validation. It's a problem worth
| solving, but crates.io would need to move beyond being
| merely staffed by part-time volunteers to solve it.
| anderskaseorg wrote:
| > _This happened with a package, but it would just as
| easily happen with a namespace._
|
| But that's exactly the point. This is equally a problem
| for packages and namespaces, so it should not be
| considered a reason to avoid adding namespaces to a
| system that already supports packages.
|
| And namespaces do help. Yes, maybe the user has to
| externally validate that the namespace is registered to
| the owner they expect, just like they would for an
| individual package. But the difference is that, with a
| namespace, they've now validated the ownership of every
| package inside it and don't need to repeat this process
| for each package.
| kibwen wrote:
| _> But that's exactly the point. This is equally a
| problem for packages and namespaces, so it should not be
| considered a reason to avoid adding namespaces to a
| system that already supports packages._
|
| I think this is missing that crates.io _doesn't_
| transfer packages between owners. That NPM decided that
| it was acceptable to unilaterally transfer ownership of
| the kik package from the original maintainer was a lapse
| of judgment on their part. It would be unprecedented for
| crates.io to begin doing so.
|
| _> And namespaces do help._
|
| I agree with this entire paragraph and have made these
| same arguments in favor of namespaces before. As I
| mentioned, I am mildly a proponent of namespaces. But we
| can't delude ourselves into thinking that an identity
| layer is simple to maintain, regardless of how much we
| want namespaces. It's a messy social problem.
| ziml77 wrote:
| The lack of namespacing in crates.io has always been a strange
| decision to me. I hope at some point they decide that it's
| worth it to introduce namespaces. They could use an
| aliasing mechanism to avoid breaking any existing package
| references.
| jamincan wrote:
| From previous times this has come up, I was led to believe
| that it's not so much that the maintainers don't want
| namespaces as it's a thankless and overwhelming task just
| maintaining the status quo, and so they don't have the time
| and resources to implement something.
|
| Hopefully this is one of those issues that the Rust
| Foundation can direct some funding toward as the entire Rust
| community would be all the better for it.
| belter wrote:
| I think if somebody decides to act as a vandal I can too?
|
| https://crates.io/users/swmon
|
| Edit: Lots of love for this user in the thread....
|
| https://github.com/swmon/Charles-Crack/pull/1
| brabel wrote:
| While I can agree it's a dickhead thing to do, it seems to be
| entirely within the rules of crates.io, so I think the hate
| should go towards them, not someone who's just playing by the
| rules.
|
| Domain squatting has been a thing for decades, it's not like
| no one expected this kind of thing would happen.
|
| Just use org/author namespaces for crying out loud.
| kibwen wrote:
| I am a mild proponent of namespaces. But adding them means
| you are now managing an identity layer, which is not
| something that a volunteer organization wants to deal with.
| In a flat namespace, users don't expect that a google-foo
| library is actually from Google. But with namespaces,
| people expect that everything under the google/ namespace
| actually is officially supported by Google. So when the
| initial rush to register names happens, who verifies that
| Google owns the google/ namespace? And not just for Google,
| but for every litigious trademark-defending organization
| under the sun? And since the package repository is supposed
| to be immutable, you don't want to wantonly delete crates
| from namespaces that they originally registered. Likewise,
| you can't go transferring ownership of a namespace from one
| entity to another, because that violates the security
| assumptions of your users who originally trusted the
| original owner of the namespace and not the new owner.
|
| Identity layers are hard. It's a problem worth solving, but
| it's not going to happen on a volunteer basis. It needs
| full-time paid employees to deal with this. Perhaps the
| Rust Foundation will make such a thing a priority.
| brabel wrote:
| These problems you raise have been solved problems for a
| long time.
|
| Maven Central uses verified reversed domain names, so to
| own `com.google` you need to prove you own the domain
| `google.com`. It's not hard to do it.
|
| Dart's pub does the same thing with verified publishers.
|
| crates.io already requires user accounts from publishers,
| so it already manages identity.
|
| The only reason they don't require namespaces is to make
| crate names look cooler and the barrier to entry non-
| existent, so they can amass large numbers of packages in a
| short time (then deal with the fallout only when it
| becomes an emergency), I can't see any other reason
| anywhere, including from the ones you've raised.
| kibwen wrote:
| _> crates.io already requires user accounts from
| publishers, so it already manages identity._
|
| It currently requires a Github account, but the registry
| is not tightly coupled to Github. But once you introduce
| Github identities as something that code is explicitly
| depending on, now you're at the mercy of Github for the
| rest of time. Furthermore, immutability is a desirable
| property of package registries, and Github identities are
| not immutable; Github allows people to change their
| usernames and then allows any random person to snap up
| the old username.
|
| _> The only reason they don't require namespaces is to
| make crate names look cooler and the barrier to entry
| non-existent, so they can amass large numbers of packages
| in a short time_
|
| No, this is needlessly conspiratorial. Please exercise
| some perspective. crates.io is run by volunteers, and
| managing an immutable identity layer is not something
| that the current crop of volunteers wants to commit to.
| teddyh wrote:
| When (not if) a domain name expires or changes ownership,
| how is this detected and the connection re-verified?
| y7 wrote:
| What a weird line of reasoning. That something is allowed
| by "the rules" doesn't make it morally justified. Assholes
| should be called out on _their behavior_, rather than
| hating on the creators of a system built on the belief that
| people act in good faith. Sure, those creators can take
| steps to mitigate malicious behavior, but the onus is not
| on them.
| moojd wrote:
| This one is frustrating because this is an issue that has been
| solved many times before and I hate seeing it repeated in every
| new package manager. A vendor name should always be required
| and the top level should be reserved for official/standard
| packages.
|
| I want all of the following from a package manager:
|
| 1. Required vendor/namespace for third party packages
|
| 2. No multiple package versions. If there is a version conflict
| between transitive dependencies of a package because of semver,
| you should not be able to install that package.
|
| 3. Lock file and a separate 'install' command for installing
| the locked versions and an 'upgrade' command for updating
| versions via semver
|
| 4. Upgrade command should support a --dry-run option that lists
| the packages and versions that are to be updated and a --diff
| that lets you preview the code changes.
| clon wrote:
| For all the animosity that PHP gets these days, every single
| item on your list (granted, of very basic demands) aligns
| with PHP's composer. I am surprised that Rust is that much
| worse off than PHP in this regard.
| moojd wrote:
| I don't think composer has a diff option to dump the actual
| code differences before you update yet but yes most of this
| list comes from my past experience with composer. My
| current company doesn't use PHP but I look back fondly at
| how easy it was to audit my dependencies manually and be
| explicit about upgrades and transitive dependencies.
| clon wrote:
| It does offer a diff option when you have local edits in
| the /vendor (for whatever insane reason). Always assumed
| it could be triggered manually as well. TIL.
|
| I also love how easy it is to declare conflicts [1]. Some
| sub sub sub dependency down the tree had a bad 0.0.1
| release? Just declare a conflict and have the tool do the
| work.
|
| [1] https://getcomposer.org/doc/04-schema.md#conflict
| epage wrote:
| > 2. No multiple package versions. If there is a version
| conflict between transitive dependencies of a package because
| of semver, you should not be able to install that package.
|
| I am grateful Rust allows this unlike C++ or Python. While I
| ideally minimize repeat dependencies, it is a big help to not
| be constrained to only on version. We've already had cases in
| Rust where some people were overly restrictive on dependency
| declarations (since Rust does block some versions as too
| similar) and it has caused a bit of pain.
| dahfizz wrote:
| > 2. No multiple package versions. If there is a version
| conflict between transitive dependencies of a package because
| of semver, you should not be able to install that package.
|
| I think this depends highly on your environment. In npm land,
| where a typical project has hundreds or thousands of
| dependencies of dubious quality, this would be a nightmare.
| It guarantees that every single deployment will be different
| and risky.
|
| This model works much better in linux, where packages are
| maintained by maintainers and there is not an explosion in
| the dependency network. Especially on a Debian or Centos box,
| you can be confident that upgrading packages won't break
| stuff.
| moojd wrote:
| If you are in node land I highly recommend using 'yarn
| install --flat' and I desperately wish this was the default
| in npm from the beginning. It would have radically altered
| the package development culture in a good way.
|
| The way npm currently handles version conflicts is one of
| the primary reasons why using npm is currently a nightmare.
| The average node project will have dozens of abandoned or
| ancient package versions precisely because allowing
| multiple versions to exist means that these packages never
| get forked or updated. Each one of those packages is a
| ticking time bomb waiting to be taken over by a malicious
| actor. Forking is a better solution than pulling in
| unmaintained packages with out of date dependencies.
| infogulch wrote:
| I'd like to see a notification when the repo tag contents don't
| match the version published to crates.io.
| pdimitar wrote:
| There's zero backdooring involved anywhere in this article. His
| most convincing argument seems to be "if your account as a
| package maintainer is hijacked then bad things could happen" --
| well yeah, thanks for the insight Sherlock.
|
| I'd be genuinely excited to read objective and deeper analyses of
| the Rust ecosystem in which I am looking to invest myself
| further. I want to know what exactly I am getting involved with
| so I'd welcome any good criticisms of it.
|
| But not click-baity articles with almost zero substance inside.
| He's basically repeating old lists of risks of human error.
| ferdowsi wrote:
| The anemic standard library in Rust always seemed like a disaster
| waiting to happen. Javascript gets heaps of deserved criticism
| for its standard library, but at least I can generate a SHA-256
| hash without needing to pull in a third-party library.
| mullr wrote:
| What are people doing about this on the client side? The solution
| that comes to mind is to do all my Rust builds in a sandbox of
| some kind, but with rust-analyzer involved, I'd likely have to
| put my editor in there as well.
| gpm wrote:
| There's some work towards moving the scarier parts of rust
| builds (e.g. procedural macros, that run arbitrary code) into a
| wasm-based sandbox. E.g. [1]. Obviously doesn't make the final
| artifacts safe to run though, and I also wouldn't trust LLVM to
| have no bugs exploitable by feeding it bad code, but at least
| it would raise the bar.
|
| [1] https://github.com/dtolnay/watt
|
| Edit: And someone on reddit brought up vscode's dev containers
| [2], to move everything into docker. Obviously docker isn't
| really a security sandbox, but again it raises the bar.
|
| [2] https://code.visualstudio.com/docs/remote/containers
| rectang wrote:
| At first glance, watt looks like a substantial improvement
| that would close the door on arbitrary code execution by proc
| macro crates. Yes, please! While this may not solve the
| general problem of package identity validation, it closes a
| Rust-specific hole that hopefully doesn't need to exist.
|
| Now if only `build.rs` could be nerfed...
| duped wrote:
| build.rs is particularly useful for Rust because it is
| routinely used to compile C/C++ object files as a previous
| step, which is crucial to having solid Rust to C/C++ FFI.
|
| It is no different from a ./configure script, or other
| prebuilt script. Lots of builds require these, and
| "nerfing" it just makes building Rust harder. Cargo is
| already a crippled build system that requires extensions
| like cargo-make to be useful. Getting rid of something so
| fundamentally required by modern software with no standard
| fallback would be a massive blow to the ecosystem.
|
| I really am not convinced that there is anything "scary"
| about a build.rs file - other than that standard tools like
| rust-analyzer find it sane to run external code during
| initialization. Your language server shouldn't be coupled
| to the build system and require it to run!
|
| (And yes, Cargo is a build system - it's just a bad one)
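| To make the discussion above concrete, here is a hedged sketch of
| a build.rs (the crate layout and file name src/native.c are
| invented for illustration). Cargo compiles and runs this ordinary
| Rust program on the host before building the crate itself; the
| `cargo:` directives it prints to stdout are its interface to Cargo.

```rust
// Hedged sketch of a build.rs; file and crate names are illustrative.
// Cargo runs this program on the host machine before compiling the crate.

// Helper that formats a real Cargo build-script directive.
fn rerun_directive(path: &str) -> String {
    format!("cargo:rerun-if-changed={}", path)
}

fn main() {
    // Legitimate use: ask Cargo to rebuild when a bundled C source changes.
    println!("{}", rerun_directive("src/native.c"));
    // But nothing constrains this program: std::process::Command,
    // network requests, or reading ~/.ssh would all run with the
    // developer's privileges at `cargo build` time -- which is why
    // some consider build scripts "scary".
}
```

| This is what makes build.rs both useful (compiling C/C++ for FFI)
| and risky (arbitrary host code execution on every build).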
| rectang wrote:
| _sigh_ , probably "nerfed" wasn't the greatest choice of
| words... I'm writing such an FFI crate right now, and I
| use a `build.rs`. I can still wish that the package
| management system didn't have to fall back to running
| arbitrary code, or that there was some way to sandbox
| that code. That would make it easier for people to trust
| my crate!
| zelos wrote:
| > How to protect?
|
| > By pinning an exact version of a dependency, tokio = "=1.0.0"
| for example, but then you lose the bug fixes.
|
| Surely no one uses version ranges in production? Is the default
| really not to use an exact version for crates?
| Macha wrote:
| The default is to use ^x.y.z so it'll pull in patch versions.
| steveklabnik wrote:
| The default is to declare ranges, but then you get a lockfile
| after an initial build, and Cargo will use those exact versions
| until you ask for changes.
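| A minimal Cargo.toml sketch of the two styles discussed above
| (crate names and versions are purely illustrative):

```toml
[dependencies]
# Default caret semantics: "1.0.0" means ^1.0.0, so Cargo may resolve
# any semver-compatible 1.x release (the lockfile then freezes it).
tokio = "1.0.0"
# Exact pin: never moves past this version, but you forgo bug fixes.
serde = "=1.0.130"
```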
| 3r8Oltr0ziouVDM wrote:
| We should switch to using pure functional languages by default.
| Most packages don't need to perform any side effects and only
| perform pure calculations. In a pure functional language it is
| obvious from function signatures if these functions are able to
| perform side effects, so it's not possible to hide a backdoor
| inside a pure function. An average project would depend only on a
| few impure packages, such as a HTTP client or a framework,
| therefore it would be much easier to verify the security (for
| small impure packages you could just inspect their code yourself,
| and bigger packages like frameworks would have many contributors
| that check the code and strict policies about their security).
| Languages like Rust and C++ for which the pure functional model
| doesn't work should then only be used for performance critical
| code, and projects written in impure languages should avoid
| third-party dependencies as much as they can.
| Findecanor wrote:
| Another approach would be to harden the software supply chain
| by requiring that dependencies and side-effects are
| _entitlements_ in metadata that are visible and would need to
| be approved by the programmer that imports the module.
|
| There are already some frameworks out there that use signed
| metadata and databases to track code and where code comes from.
| But on the source code level, I think the metadata could just
| be extracted from the existing Crate metadata and source code.
| peterth3 wrote:
| So, you're claiming that pure FP languages need fewer
| dependencies than FP-adjacent languages like Rust?
|
| This is really interesting. Do you have a source to cite
| proving this claim?
| bertylicious wrote:
| Parent only claimed that most Haskell packages are pure and
| thus cannot execute impure side-effects. They didn't say
| anything about the overall number of dependencies.
| 3r8Oltr0ziouVDM wrote:
| No. What I'm saying is that many of the dependencies in any
| language don't need to perform side effects, they only do
| pure calculations. For example a JSON parser takes a JSON
| string and returns some data structures. It's a pure
| function. However, in a language like Rust you can easily
| hide malicious code that has access to network inside such a
| function. In a pure functional language you can tell from the
| signature of a function you're calling that it is indeed a
| pure function and is guaranteed to not perform any side
| effects. So it is safe to call any function from a third-
| party dependency that doesn't do side effects (which you can
| immediately see from the type signature) without even
| inspecting the code.
| frenchyatwork wrote:
| I don't get how that would solve your problem at all. You can
| implement a bitcoin miner using functional code then just add
| an http client as a dependency for getting data to/from the
| blockchain.
| 3r8Oltr0ziouVDM wrote:
| You can't perform HTTP requests from a pure function without
| making it obvious in its signature that it does side effects.
| For example, in a language like Haskell:
|                  add :: Int -> Int -> Int
|                  add x y = x + y
|
| There is no way a function like this can run a Bitcoin miner,
| all it can do is to return an `Int`. In order to do side
| effects, a function must return a special `IO` type that
| should then be returned from `main` (and only then these side
| effects would be performed).
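| By contrast, here is a sketch of why that guarantee doesn't exist
| in Rust (the function and file name are invented for
| illustration): a function whose signature looks pure can still
| perform I/O without the caller ever knowing.

```rust
use std::env;
use std::fs;

// The signature says "pure": two integers in, one integer out...
fn add(x: i32, y: i32) -> i32 {
    // ...but the body can quietly write to disk (or open a socket);
    // nothing in Rust's type system surfaces this to the caller.
    let _ = fs::write(env::temp_dir().join("exfil.txt"), format!("{},{}", x, y));
    x + y
}

fn main() {
    println!("{}", add(2, 3)); // prints 5 -- and also touched the filesystem
}
```

| In Haskell the equivalent function would have to return `IO Int`,
| making the side effect visible in the type.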
| moonchrome wrote:
| > An average project would depend only on a few impure
| packages, such as a HTTP client or a framework, therefore it
| would be much easier to verify the security (for small impure
| packages you could just inspect their code yourself, and bigger
| packages like frameworks would have many contributors that
| check the code and strict policies about their security).
|
| OK so just a random list of common packages a web app could use
| that come to mind :
|
| - HTTP server
|
| - HTTP client
|
| - Logging
|
| - Database
|
| - Distributed cache
|
| - File storage/blob storage
|
| - Email
|
| - Push notifications/SMS if dealing with mobile
|
| - Auth (eg. OAuth/OpenID Connect middleware)
|
| - Background task management/queue
|
| And then there's libraries that wrap access to external
| services, specific protocol libraries like gRPC or GraphQL.
|
| I would say the number of pure libraries that you reference
| directly in a modern webapp is probably very low, that's all a
| layer below.
| 3r8Oltr0ziouVDM wrote:
| Ok, but in Rust or NodeJS an HTTP server may depend on a
| package A that depends on a package B that depends on a
| package C that then introduces a backdoor in its 1.0.1
| release. In a pure functional language you can quickly look
| through dependencies of an HTTP server, and if it has zero
| impure dependencies then you just need to trust the
| developers of this one HTTP server package.
| platinumrad wrote:
| You seem to be suggesting that impure actions never depend
| on the results of pure calculations.
|
| Also System.IO.Unsafe exists.
| verdverm wrote:
| I wrote https://verdverm.com/go-mods/ to talk about ways Go
| avoids some of these pitfalls. The forethought that went into `go
| mod` is one of the reasons I like and trust Go
| steveklabnik wrote:
| A _tremendous_ amount of forethought was put into Cargo and
| Crates.io. The difference is that many folks look at the same
| problems and come to different conclusions about what to do;
| it's not negligence.
| Groxx wrote:
| I only see one that it avoids: domain names / URLs as import
| paths makes ownership much more clear, and _slightly_ harder to
| achieve typo-squatting... sometimes. And I do very much like
| this part of go modules, it also helps decentralize the whole
| system a fair bit. I sincerely hope it becomes the dominant
| package-name strategy in time.
|
| But let's pick another that seems _on the surface_ pretty likely
| to be mitigated: source for downloaded version X not matching
| version X repo's source, under "Malicious update" with cargo's
| `--allow-dirty`. After all, goproxy pulls from git repos
| directly, right? There's no --dirty flag or anything to push
| random garbage.
|
| That's still a problem! Git tags are mutable, as are git
| repositories as a whole. You can _absolutely_ tag a malicious
| version, get it into goproxy, and then change or remove the tag
| and any associated commits. The goproxy doesn't even store the
| SHA for correctly-tagged versions, only the code and a checksum
| of the code it saved, so finding the commit that it originally
| pointed to can be difficult or impossible. You can download the
| module and read the code from that, but that's true of any non-
| binary dependency system. You can't _publish_ a change to an
| already-published version, but that's true of cargo too
| (afaik) as well as most package hosts (afaik), though goproxy
| takes a minor technical step further to make that accident-
| resistant (or at least easily detectable. which is great,
| everyone should do that).
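| The tag-moving attack described above can be reproduced locally in
| a throwaway repository (the version number and identities below
| are invented):

```shell
# Demonstration that git tags are mutable: a published tag can be
# silently moved to a different (e.g. malicious) commit.
set -e
dir=$(mktemp -d) && cd "$dir"
git init -q demo && cd demo
git -c user.email=a@b -c user.name=a commit -q --allow-empty -m "benign release"
git tag v1.0.0
git -c user.email=a@b -c user.name=a commit -q --allow-empty -m "malicious change"
git tag -f v1.0.0            # -f force-moves the already-published tag
git tag --points-at HEAD     # the tag now points at the malicious commit
```

| Anyone who fetched `v1.0.0` before the retag got different code
| than someone fetching it after, which is exactly what a checksum
| database is meant to catch.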
| verdverm wrote:
| A module in Goproxy should be in the global SumDB, so if you
| are consulting that (the default), even if someone managed to
| get a retag in, it would fail the sumdb check. I suspect that
| Goproxy, by virtue of running Go under the hood, consults
| sumdb prior to adding to the proxy. As long as a tag was
| fetched once, I would expect that any changes would be
| caught. There are of course edge cases: for example, modules from
| custom-domain go get hosts are not all kept in the GoProxy, but
| their content hashes should be.
| Groxx wrote:
| yeah - in many other dependency systems, you only get
| protection for versions _you use_, as those are in your
| lockfile. the public gosumdb helps prevent a few more cases
| of _re-releasing_ something + you upgrading + pulling it
| from a different provider (... which is mostly relevant due
| to its more-distributed setup, if it treated goproxy as
| canonical it'd be unnecessary because it wouldn't contain
| that re-release), but not "downloaded module does not match
| repository".
|
| I do think the sumdb setup is worth others copying, it's
| relatively cheap to maintain and it does clamp down on some
| issues. It also makes it much harder to revoke things
| though, as you can't remove anything from it ever - after a
| couple versions, Go finally added "redacted" versions, but
| the need for that is partly a consequence of having the
| permanently-immutable sumdb + not having a canonical
| source. Unique self-inflicted pain -> unique workaround,
| though it's all relatively reasonable and I think a net
| benefit.
| verdverm wrote:
| Sigstore & Cosign are worth looking into as well.
| GoReleaser supports those for compiled binaries.
|
| SLSA is another
|
| https://sigstore.dev/
|
| https://slsa.dev/
| Groxx wrote:
| That seems like just code-signing? If so: yeah,
| definitely, that should be supported by every packaging
| system. And it's largely ridiculous that it isn't. It
| removes the need to trust the packaging-host, so it's no
| longer a giant target for exploits that can modify
| _every_ package at once. Go, using domain names, is
| probably in the best position to take advantage of this,
| as it allows you to lean on domain ownership (and maybe
| even use the same ssl certificate) rather than having to
| trust-on-first-use or something.
| caffeine wrote:
| Seems like you could address this with a super-crate that includes
| "trusted" crate releases as "features"
|
| That crate could involve some automation like:
|
| * Checking that the code in the crate matches the code in Github
|
| * Checking whether the latest commit is from a new committer, or
| whether there is any code committed by a user not in a whitelist,
|
| * Checking whether the package has any known security advisories
|
| * Checking that crate signatures match some whitelist
|
| * Running a project that includes the crate in a sandbox and
| seeing whether there are any files accessed, network accesses,
| etc. that were not pre-whitelisted
|
| New versions of included crates would have to go through this
| battery of checks before they get bumped in the super-crate.
|
| Crates that want to be included as features of super-crate or
| that need to change/add significant functionality, or add
| dependencies, would need to make a PR to update the relevant
| whitelists, which could then be reviewed by the super-crate team
| epage wrote:
| This has come up several times in the past. One name for it was
| stdx.
|
| Some in the ecosystem are very cautious about picking winners and
| losers, which would limit the exposure of new break-out crates,
| so they rarely recommend crates for particular problems. This
| comes at the cost of a higher barrier to entry, because you need
| to be "in the know" about which crates to use or avoid.
|
| Another problem with stdx is that if anyone uses types from it in
| their public API, they are decoupled from the individual crates'
| semver constraints, which makes it hard to know which breaking
| changes from your dependency are a breaking change in your API.
| loeg wrote:
| You don't really need the '--allow-dirty' flag to do as the
| author claims. There's no enforcement that the local git commit
| is ever published to a public repo.
| peterth3 wrote:
| Discussion on /r/rust about this article:
|
| https://www.reddit.com/r/rust/comments/qw3w01/backdooring_ru...
| jynelson wrote:
| > While it's possible to audit the code of a crate on
| https://docs.rs by clicking on a [src] button, it turns out that
| I couldn't find a way to inspect build.rs files. Thus, combined
| with a malicious update, it's the almost perfect backdoor.
|
| Docs.rs has its own source view on /crate that's separate from
| rustdoc's. For example, you can see the build.rs for boring-sys
| on `https://docs.rs/crate/boring-sys/1.1.1/source/build.rs`.
| richardwhiuk wrote:
| You can also download the crate directly from crates.io
| jrochkind1 wrote:
| Most of these are common to other platform packaging systems, and
| I'm not sure I've seen any especially interesting solutions to
| them.
|
| The macro-based ones are rust-specific and seem especially
| devious and challenging to me.
| ReactiveJelly wrote:
| I think I/O isolation will be part of a solution. I'm
| interested to see how Deno handles that.
| devmunchies wrote:
| Does deno allow you to scope the IO permission at a
| dependency level?
|
| This comment from 9 months ago indicates it's only at the app
| level. Has it changed?
| https://news.ycombinator.com/item?id=26090873
| Lifelarper wrote:
| > I'm not sure if it's by bots or real persons
|
| Bot traffic accounts for a significant amount of the low-level
| noise; I've published things of no use to anyone and they always
| rack up a lot of downloads despite practically no one using them
| for a long time.
|
| > Firstly, a bigger standard library would reduce the need for
| external dependencies
|
| There are years' worth of the same arguments tiringly made over
| and over again (same with namespacing) on the Rust forum;
| everyone has played their hand on this issue a dozen times now,
| and the community clearly has a majority stance on such things.
|
| > A variant of the previous technique is to use the --allow-dirty
| flag of the cargo publish command.
|
| Please correct me if I'm wrong, but I thought that flag simply
| allows uncommitted changes to be published; the source is still
| available for anyone to view on crates.io
|
| > We're sorry but this website doesn't work properly without
| JavaScript enabled. Please enable it to continue.
|
| Works perfectly fine for me. Maybe you couldn't serve me a GDPR
| banner or something. Thankfully I can keep it turned off for now :)
| rectang wrote:
| Is there a way to buy into PGP identity-based controls for
| crates.io packages? To say, "I trust the keys in this whitelist,
| so trust packages signed by those keys."
|
| > _Thirdly, using cloud developer environments such as GitHub
| Codespaces or Gitpod. By working in sandboxed environments for
| each project, one can significantly reduce the impact of a
| compromise._
|
| That's appealing but expensive. I wish I could effectively
| sandbox a local developer machine. External boot drives, maybe?
| gpm wrote:
| Cargo-crev is that sort of web of trust, but it's really in
| its infancy.
| rectang wrote:
| Cargo-crev's writ seems to be much more expansive and
| nebulous than security:
|
| https://github.com/crev-dev/cargo-crev/wiki/Howto:-Create-
| Re...
|
| > _While it's still open for debate, the current opinion is
| that additional fields are not useful for the downstream
| users. Instead they just complicate their life, putting the
| burden of the decision from the reviewer onto them. At the
| end, a downstream user of a review just wants to know: "is it
| OK to use this package or not?". Your role as a reviewer is
| to provide that judgment._
|
| That irks me. I don't care about popularity contests, I only
| care whether a crate is malicious or not. If it has security
| _vulnerabilities_, I can deal. But if downloading and
| building a proc macro crate gives an attacker
| remote code execution and they install a keylogger on my dev
| box, that's altogether different.
| ChrisSD wrote:
| If the crate is actively malicious then crates.io should be
| informed immediately and the crate removed. Probably the
| author too. If need be RustSec can issue a security
| advisory.
| carlhjerpe wrote:
| Nobody is mentioning C#, but my experience there is that I rely
| on far fewer dependencies and a rather big standard library from
| Microsoft.
|
| Microsoft has been splitting the standard library into separate
| dependencies now, but they're still maintained by them and I feel
| safe depending on them.
___________________________________________________________________
(page generated 2021-11-18 23:00 UTC)