[HN Gopher] Abusing Go's Infrastructure
       ___________________________________________________________________
        
       Abusing Go's Infrastructure
        
       Author : efge
       Score  : 297 points
       Date   : 2024-05-25 12:50 UTC (10 hours ago)
        
 (HTM) web link (reverse.put.as)
 (TXT) w3m dump (reverse.put.as)
        
       | arccy wrote:
       | it's a known issue https://github.com/golang/go/issues/31866
        
         | yjftsjthsd-h wrote:
         | That fix would help with accidents, but wouldn't someone doing
         | it _intentionally_ just add a .mod and .go file to the root?
        
           | jerf wrote:
           | How do you "fix" that at all?
           | 
           | In the end, there is no definition of "a source control
           | repository that is a Go module" that is robust to this sort
           | of "attack"... although calling it an "attack" is kind of
           | dubious, the reasons why this is a bad thing strike me as
           | very strained and relatively weak. Mostly it hurts Google by
           | hosting too much stuff, but, good luck bringing them down
           | that way.
        
         | oooyay wrote:
         | Color me unsurprised Marwan is on this issue. He and Aaron
         | wrote Athens, Marwan wrote (to my knowledge) the first Go
         | download protocol implementation that Athens is based on.
         | 
         | This issue is kind of curious because Athens already uses the
         | go mod download -json command mentioned as a preflight check
         | for module verification. More or less, if the repo passes the
         | go module command's understanding of a module, then Athens
         | will serve it. In more verbose terms:
         | 
         | - a module version, pseudo version, or +incompatible must be
         | able to be formulated
         | 
         | - that module (and its dependencies) must produce a valid
         | checksum
         | 
         | The checksum of modules just has to do with the current .mod
         | and all files + recursively for each dependency. So, as the
         | author pointed out you can have lots of space for arbitrary
         | files by design so long as you have a basic go program.
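         | 
         | A minimal sketch of the per-module "h1:" hash, assuming it
         | works the way golang.org/x/mod/sumdb/dirhash describes it:
         | hash each file, then hash the sorted list of
         | "<hex sha256>  <name>" lines. The module path and file names
         | below are made up for illustration, and the function covers
         | one module's file tree only, not its dependencies.

```go
package main

import (
	"crypto/sha256"
	"encoding/base64"
	"fmt"
	"sort"
)

// hashH1 sketches the "h1:" directory hash: every file is hashed with
// SHA-256, and the sorted "<hex hash>  <name>\n" lines are hashed again.
// Any file in the tree, Go source or not, contributes to the result.
func hashH1(files map[string][]byte) string {
	names := make([]string, 0, len(files))
	for name := range files {
		names = append(names, name)
	}
	sort.Strings(names)

	summary := sha256.New()
	for _, name := range names {
		sum := sha256.Sum256(files[name])
		fmt.Fprintf(summary, "%x  %s\n", sum, name)
	}
	return "h1:" + base64.StdEncoding.EncodeToString(summary.Sum(nil))
}

func main() {
	// Hypothetical module contents: a stub Go program plus an arbitrary blob.
	files := map[string][]byte{
		"example.com/demo@v0.1.0/go.mod":   []byte("module example.com/demo\n"),
		"example.com/demo@v0.1.0/main.go":  []byte("package main\n\nfunc main() {}\n"),
		"example.com/demo@v0.1.0/blob.bin": {0x00, 0x01, 0x02},
	}
	fmt.Println(hashH1(files))
}
```

         | Nothing in the hash distinguishes blob.bin from main.go,
         | which is why the checksum step alone cannot reject
         | arbitrary payloads.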
        
       | kyrra wrote:
       | Googler, opinions are my own. I know nothing about this space.
       | 
       | I would hope the Go team collaborated with GCP and Drive, as
       | hosting malicious files is something Google has to deal with all
       | the time. This isn't much different from other endpoints Google
       | already allows people to put random data on.
        
       | 8organicbits wrote:
       | I know pypi has some non-python projects as well. Python needs
       | the ability to distribute wheels, which are compiled binaries, as
       | the user may not be able to compile library code. Lots of that
       | code is written in C, but Golang[1] is also possible. I can't
       | find an example, but I believe I've seen this used for
       | distributing applications (not libraries) as well. It's kinda
       | cool to write some app in C, upload to pypi, and then ask users
       | to install with `pip install`.
       | 
       | [1] https://github.com/popatam/gopy_build_wheel_example
        
         | bee_rider wrote:
         | Hypothetically if they did try to add some requirement to use
         | Python, people could just comply maliciously by providing the
         | most minimal stub of Python code, right? Linux, but ls is
         | written in Python. So it is probably better just to not play
         | games.
        
           | Maxatar wrote:
           | You could embed the binary data in a Python string and then
           | have the installer dump that string to a file.
        
         | rfoo wrote:
         | pip install cmake
         | 
         | or even proprietary binaries, pip install nvidia-cudnn-cu12
        
           | IshKebab wrote:
           | Yeah I copied CMake's idea of using PyPI and I also use it to
           | distribute some pure Rust CLI tools using Maturin. It works
           | really well. Pip is... well it's about on par with most other
           | package managers, i.e. not great, not terrible, but it has
           | some pretty huge advantages over any other software
           | distribution method on Linux:
           | 
           | * Very likely to be installed already on Linux and probably
           | Mac too.
           | 
           | * Doesn't require root to install. You can even have isolated
           | installs via pyenv.
           | 
           | * I don't have to ask anyone's permission to publish a
           | package.
           | 
           | * I only have to make one package.
           | 
           | If anyone can think of a better option I'm all ears but until
           | then I'm fairly happy with this hack.
        
             | Too wrote:
             | Some of those arguments are becoming harder to make, as pip
             | and distros are pushing for the use of venvs and now require
             | a scary --break-system-packages argument if you use the
             | preinstalled launcher.
        
               | IshKebab wrote:
               | That is a good point. Distro package managers have
               | somehow screwed this up too.
        
         | mort96 wrote:
         | I guess that was much more useful as a use case before pip
         | started requiring you to be in a
         | venv/virtualenv/pipenv/pyenv/whatever to download packages
        
           | 12_throw_away wrote:
           | I've never encountered this requirement in many years of
           | daily use - pip for me has always happily installed anything
           | if it can.
           | 
           | Now I've definitely seen customized distributions of python
           | from package managers that have taken steps to prevent you
           | from using pip. IIRC, the python you get from `apt-get
           | install python` in Debian does this? I.e., it's designed to
           | support system utilities, not as a user's general purpose
           | python environment, and they want `apt-get` to control this
           | environment, not pip. So they've removed pip and ensure_pip
           | and easy_install from your core system python environment.
           | 
           | TLDR: In my experience, that requirement doesn't come from
           | pip, it's your distro taking steps to prevent
           | https://xkcd.com/1987/
        
             | thayne wrote:
             | I'm not sure if that is upstream or an Ubuntu or Debian
             | patch, but that is the case on Ubuntu 24.04, at least
             | unless you pass the --break-system-packages option.
        
             | garblegarble wrote:
             | Sorta, although it is a Python feature for distros to use:
             | installations can be marked as externally managed and pip
             | will refuse (without being forced) to make changes unless
             | in a venv[1][2]
             | 
             | 1: PEP 668 https://peps.python.org/pep-0668/
             | 
             | 2: https://packaging.python.org/en/latest/specifications/ex
             | tern...
        
       | verdverm wrote:
       | CUE's module system is finally rolling out, MVS like Go's, but
       | built on OCI infra. If you are interested in dependency
       | management systems, here are some links
       | 
       | - proposal:
       | https://github.com/cue-lang/proposal/tree/main/designs/modul...
       | 
       | - custom registry:
       | https://cuelang.org/docs/tutorial/working-with-a-custom-modu...
       | 
       | - road map: https://github.com/orgs/cue-lang/projects/10/views/8
       | 
       | - in 0.9.0-alpha-5, modules become enabled by default:
       | https://github.com/cue-lang/cue/releases/tag/v0.9.0-alpha.5
       | 
       | For Go Sum, the Trillian project backs the transparency log:
       | https://github.com/google/trillian
       | 
       | CUE plans to piggyback on the OCI options with attestations and
       | such
        
         | dgellow wrote:
         | What does that have to do with the linked article?
        
           | verdverm wrote:
           | The CUE team worked with the Go team on the module system.
           | From these interactions, and community input, they decided
           | against using a proxy like Go has. The "exploit" in the
           | article was one of the reasons they made this decision, and
           | chose to use OCI registries instead. The V1 proposal actually
           | proposed using the same Go proxy servers as a stopgap, which
           | received significant pushback from the community (I was
           | probably the loudest voice against the idea). The Go team was
           | supportive at the time, but this would have been exactly what
           | OP talks about, having non-Go projects in the proxy/sumdb.
           | 
           | So CUE's module design can be seen as an evolution on Go's,
           | building on the good parts while addressing some of the
           | shortcomings.
           | 
           | Fun fact, CUE started as a fork of Go, mainly for the
           | internal compiler tooling and packages
        
       | ithkuil wrote:
       | I toyed with the idea of piggybacking on (i.e. abusing) the
       | golang proxy and sumdb to have a free transparent log of
       | checksums of arbitrary URLs
       | 
       | https://getsum.pub/
        
         | arccy wrote:
         | sounds convoluted. If you just want a public transparency log,
         | the public rekor instance under the sigstore project is much
         | more appropriate for that.
         | 
         | https://www.sigstore.dev/
         | 
         | https://docs.sigstore.dev/logging/overview/
        
           | lpapez wrote:
           | Sure, but the gosum database is a critical piece of worldwide
           | software infrastructure, so you can count on it being
           | accessible behind many firewalls and always up. And it's
           | completely free and anonymous.
           | 
           | Perfect for the purpose.
        
           | ithkuil wrote:
           | Yeah, when I did that there was no public rekor instance run
           | by the sigstore project, so I chose the only available public
           | transparency log I could bend to my needs (x509 transparency
           | logs were an alternative but it'd quickly hit rate limits by
           | acme providers)
        
           | skybrian wrote:
           | Interesting! Looks like it's being used by some npm packages
           | [1] and soon homebrew will be using it [2]. Any other
           | interesting usage?
           | 
           | As a user, the npm usage doesn't seem very prominent. On an
           | npm package's web page, there's a checkmark next to the version
           | number on the right side that I hadn't paid any attention to
           | before, with more information at the very bottom of the page.
           | Here's an example. [3]
           | 
           | [1] https://blog.sigstore.dev/npm-provenance-ga/
           | [2] https://blog.sigstore.dev/homebrew-build-provenance/
           | [3] https://www.npmjs.com/package/fast-check
        
             | mynameisvlad wrote:
             | It's at the bottom of the page _on mobile_. On desktop,
             | that's the first thing on the right hand side of the screen
             | IIRC.
        
               | skybrian wrote:
               | For me on desktop, the version seems to be the fourth
               | thing down in the right column, under weekly downloads,
               | and there's a checkmark. (Or maybe I'm missing
               | something.)
        
       | miki123211 wrote:
       | Any online service that lets users upload material that is then
       | publicly visible will eventually be used for command-and-control,
       | copyright infringement and hosting CSAM. This is especially true
       | for services that have other important uses besides file hosting
       | and hence are hard to block.
       | 
       | This already happened to Twitter[1], Telegram[2], and even the
       | PGP key infrastructure[3], not to mention obvious suspects like
       | GitHub.
       | 
       | [1] https://pentestlab.blog/2017/09/26/command-and-control-twitt...
       | [2] https://www.blazeinfosec.com/post/leveraging-telegram-as-a-c...
       | [3] https://torrentfreak.com/openpgp-keyservers-now-store-irremo...
        
         | liquidgecka wrote:
         | And Gmail and Google groups, and Google drive, and Gchat, on
         | and on. The data you store doesn't even have to be public. With
         | Gmail they would distribute credentials to log in and read
         | attachments that they uploaded via imap.
         | 
         | (I am a former Google SAD-SRE [Spam, Abuse, Delivery])
        
           | howenterprisey wrote:
           | Just curious, "Delivery" doesn't seem to be the same sort of
           | thing as "Spam" and "Abuse": why are the three grouped?
        
             | fred256 wrote:
             | No inside information, but presumably this means Delivery
             | to other organizations, which, among other things, includes
             | maintaining outbound IP reputation, which is closely
             | related to Spam and Abuse.
        
             | romafirst3 wrote:
             | Delivery is what happens if it's not spam or abuse.
        
           | zepolen wrote:
           | Question, how would you know without invading the user's
           | privacy?
        
             | kccqzy wrote:
             | An algorithm that processes private user data is by itself
             | not invading anyone's privacy. It's clear to me that
             | invasion of privacy only happens when humans look at
             | private user data directly, or look at user data that's not
             | sufficiently processed by an algorithm.
             | 
             | Otherwise, something as simple as a spell checker would be
             | an invasion of privacy because it literally looks at every
             | word in an email you write. That's absurd.
        
               | _heimdall wrote:
               | At least in my opinion, there's a big difference with
               | where the data lives and where the checking algorithm is
               | run. I don't think a spell checker would fall into what
               | I'd consider a privacy concern as long as the spell
               | checker is running locally on my device.
        
               | imachine1980_ wrote:
               | I don't work in the area of email nor Google but I see
               | two problems.
               | 
               | 1) You need to constantly update the spell checker, so each
               | time you say "this is a word" or something like that, the
               | data is most likely sent off, and the problem is part of
               | that data. I assume Google does something similar with mail
               | marked as spam or not spam. That is a full email redirect
               | and analysis, not a partial one like old word processing.
               | 
               | 2) I feel AI makes this even harder: you can't check
               | patterns as simply as before, and you need to check the
               | whole content constantly.
        
             | carom wrote:
             | Companies are legally obligated to scan for CSAM in the US.
        
               | toast0 wrote:
               | I don't think that's accurate... Do you have a link?
               | 
               | I do think there is an obligation to report if any is
               | found, but I don't think they need to look.
        
           | gloryjulio wrote:
           | Just a side note, I found the name sad sre funny and blursed
           | at the same time
        
             | cdelsolar wrote:
             | What's blursed mean?
        
               | jprete wrote:
               | Simultaneously blessed and cursed.
        
           | follower wrote:
           | > I am a former Google SAD-SRE
           | 
           | From long enough ago that I should apologize to you for
           | libgmail: https://libgmail.sourceforge.net ? :D
        
         | nerdponx wrote:
         | It seems like it would be pretty easy to use PyPI for this,
         | because packages can contain arbitrary non-Python files. And
         | you can also do things like base 64 encoding your files in
         | strings in Python code.
        
         | weinzierl wrote:
         | Not sure if it has already happened, but the not so obvious one
         | is HuggingFace.
        
           | miki123211 wrote:
           | No idea if it's used for CSAM or malware, but copyright
           | infringement on a massive scale?
           | https://huggingface.co/datasets/EleutherAI/pile
        
       | 9dev wrote:
       | Off topic: took a look at the domain, had a foreboding on the
       | innuendo, found mostly what I expected on put.as ...
        
         | moonlion_eth wrote:
         | I made the mistake of doing that at work
        
         | yazzku wrote:
         | Not off-topic at all; came here just for this.
        
         | KolmogorovComp wrote:
         | https://put.as/ Mildly NSFW.
        
         | mdtrooper wrote:
         | Yes, it is a Spanish plural word. But....
        
       | IshKebab wrote:
       | Maybe I'm being stupid but what exactly is the issue here? It's
       | probably a bit wasteful of the proxy to cache non-Go repos, but
       | even if it didn't you could make it store arbitrary data just by
       | having it cache a Go repo surely? Sounds like a complete non-
       | issue unless I've missed something.
        
         | arandomusername wrote:
         | you're right.
        
         | gnfargbl wrote:
         | I don't think you've missed anything. The news here appears to
         | be that an unsecured public proxy is willing to proxy things and
         | make them available to the public in an unsecured fashion.
         | 
         | The article does make the point that some monitored networks
         | might trust golang proxy URLs more than arbitrary web URLs and
         | that this could be used for bypassing reputation filters etc --
         | but there are already several ways to do that, and this one
         | doesn't seem _particularly_ special.
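         | 
         | For reference, a rough sketch of what such a proxy URL looks
         | like, assuming the documented GOPROXY layout
         | (<base>/<module>/@v/<version>.zip) and its case-escaping
         | rule, where each uppercase letter becomes "!" plus the
         | lowercase letter. The module path here is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// escapePath applies the proxy's case-escaping: "A" becomes "!a", so that
// paths stay distinct on case-insensitive file systems.
func escapePath(s string) string {
	var b strings.Builder
	for _, r := range s {
		if r >= 'A' && r <= 'Z' {
			b.WriteByte('!')
			b.WriteRune(r + 'a' - 'A')
		} else {
			b.WriteRune(r)
		}
	}
	return b.String()
}

// proxyZipURL builds the URL a client would fetch for a module zip.
func proxyZipURL(base, module, version string) string {
	return base + "/" + escapePath(module) + "/@v/" + escapePath(version) + ".zip"
}

func main() {
	// Hypothetical repository; any cached module has a URL of this shape.
	fmt.Println(proxyZipURL("https://proxy.golang.org", "github.com/ExampleUser/Payload", "v0.1.0"))
}
```

         | The resulting URL sits on the same host that every Go
         | toolchain talks to, which is the whole basis of the
         | reputation-filter argument.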
        
       | palata wrote:
       | That's maybe naive, but... how is that different than just
       | pushing files to e.g. a GitHub repository? Is it just the fact
       | that you need to create an account for GitHub? Because I can
       | store arbitrary data there, too. Without the 500M limit...
        
         | withinboredom wrote:
         | GitHub has some pretty stringent rate limits for anon requests.
        
       | erik_seaberg wrote:
       | W3C laid the groundwork for _everything_ on the Web to be heavily
       | cacheable, so it's weird that there are so few general-purpose
       | proxy caches. Are publishers sending short "Cache-Control: max-
       | age" or "Vary: Cookie" responses when they didn't need to? Are
       | too many ISPs paying for transit rather than peering?
        
         | lmz wrote:
         | In general there's no way to ensure the cache hasn't tampered
         | with the contents (e.g. ISP proxy ad injection on non HTTPS
         | sites). For software downloads usually there are signatures and
         | checksums. Arbitrary content, not so much.
        
           | arccy wrote:
           | There was HTTP SXG (signed exchanges) but it never seemed to
           | get any traction https://web.dev/articles/signed-exchanges
        
             | iainmerrick wrote:
             | Even if the content is signed, there's still the issue that
             | the proxy gets to see everything you read, right?
        
           | yencabulator wrote:
           | Maybe use cache only when Subresource Integrity is present.
           | 
           | https://developer.mozilla.org/en-
           | US/docs/Web/Security/Subres...
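           | 
           | A minimal sketch of the check a cache-consuming client could
           | run, assuming a single sha384 integrity token of the form
           | "sha384-<base64 digest>". Real integrity attributes may list
           | several tokens and algorithms, which this does not handle;
           | the function names are my own.

```go
package main

import (
	"crypto/sha512"
	"encoding/base64"
	"fmt"
)

// sriSHA384 computes an integrity token for a response body.
func sriSHA384(body []byte) string {
	sum := sha512.Sum384(body)
	return "sha384-" + base64.StdEncoding.EncodeToString(sum[:])
}

// verifySRI reports whether a body fetched from an untrusted cache matches
// the integrity token published by the origin.
func verifySRI(body []byte, integrity string) bool {
	return sriSHA384(body) == integrity
}

func main() {
	original := []byte("console.log('hello');")
	token := sriSHA384(original)

	// A cache that injects content changes the digest and fails the check.
	tampered := append([]byte("/* injected ad */"), original...)
	fmt.Println(verifySRI(original, token), verifySRI(tampered, token))
}
```

           | This only covers tampering; it does nothing about the proxy
           | seeing what you fetch.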
        
       ___________________________________________________________________
       (page generated 2024-05-25 23:00 UTC)