[HN Gopher] S3 as a Git remote and LFS server
       ___________________________________________________________________
        
       S3 as a Git remote and LFS server
        
       Author : kbumsik
       Score  : 186 points
       Date   : 2024-10-19 10:37 UTC (1 days ago)
        
        
       | mdaniel wrote:
       | All this mocking when moto exists is just :-(
       | https://github.com/awslabs/git-remote-s3/blob/v0.1.19/test/r...
       | 
       | Actually, moto is just one bandaid for that problem - there are
       | SO MANY s3 storage implementations, including the pre-license-
       | switch Apache 2 version of minio (one need not use a bleeding
       | edge for something as relatively stable as the S3 API)
        
         | SahAssar wrote:
         | Do you mean boto (the python SDK for AWS)?
         | 
         | EDIT: They probably do not, I'm guessing they mean
         | https://docs.getmoto.org/en/latest/index.html ?
        
           | flakes wrote:
           | moto server for testing S3 is pretty great. It's about the
           | same experience as using a minio container to run integration
           | tests against.
           | 
           | I use this, and testing.postgresql for unit testing my api
           | servers with barely any mocks used at all.
        
             | neeleshs wrote:
             | There is also testcontainers. Supports multiple languages.
             | Uses containers though.
             | 
             | https://testcontainers-python.readthedocs.io/en/latest/
        
           | mdaniel wrote:
           | Happy 10,000th Day to you :-D Yes, moto and its friend
           | localstack are just fantastic for being able to play with AWS
           | without spending money, or to reproduce kabooms that only
           | happen once a month with the real API
           | 
           | I believe moto has an "embedded" version such that one need
           | not even have it listen on a network port, but I find it
           | much, much less mental gymnastics to just supersede the
           | "endpoint" address in the actual AWS SDKs to point to
           | 127.0.0.1:4566 and off to the races. The AWS SDKs are even so
           | friendly as to not mandate TLS or have allowlists of endpoint
           | addresses, unlike their misguided Azure colleagues
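
A minimal sketch of the endpoint override described above, assuming boto3 is installed and a local emulator such as localstack or a moto server is already listening on 127.0.0.1:4566. The helper names, the dummy credentials, and the region are assumptions for illustration; only `boto3.client` and its `endpoint_url` argument come from the SDK itself.

```python
from urllib.parse import urlparse


def local_s3_client_kwargs(endpoint_url="http://127.0.0.1:4566"):
    """Build keyword arguments for boto3.client("s3", ...) aimed at a local emulator.

    Hypothetical helper: the port matches the address mentioned in the
    comment above; the credentials and region are placeholders.
    """
    parsed = urlparse(endpoint_url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        raise ValueError(f"not a usable endpoint URL: {endpoint_url!r}")
    return {
        "endpoint_url": endpoint_url,        # overrides the real AWS endpoint
        "region_name": "us-east-1",          # assumed; any region the emulator accepts
        "aws_access_key_id": "testing",      # dummy credentials, never real ones
        "aws_secret_access_key": "testing",
    }


def make_local_s3_client(endpoint_url="http://127.0.0.1:4566"):
    """Create an S3 client that talks to the local emulator instead of AWS."""
    import boto3  # imported lazily so the helper above works without boto3

    return boto3.client("s3", **local_s3_client_kwargs(endpoint_url))
```

With an emulator running, `make_local_s3_client().create_bucket(Bucket="demo")` would then go to the local server rather than to AWS.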
        
             | SahAssar wrote:
             | > Happy 10,000th Day to you :-D
             | 
             | Sorry, not sure what you mean?
        
               | mdaniel wrote:
               | https://xkcd.com/1053/
        
               | misnome wrote:
               | How do you know they are in the US?
        
         | notpushkin wrote:
         | > there are SO MANY s3 storage implementations
         | 
         | I suppose given this is under the AWS Labs org, they don't
         | really care about non-AWS S3 implementations.
        
           | mdaniel wrote:
           | Well, I look forward to their
           | `docker run awslabs/the-real-s3:latest` implementation then.
           | Until such time, monkeypatching api calls to always give the
           | exact answer the consumer is looking for is damn cheating
        
             | chrsig wrote:
             | it wouldn't be unprecedented. dynamodb-local exists.
        
             | notpushkin wrote:
             | Agreed, haha. Well, I think it _should_ work with Minio &
             | co. just as well, but be prepared to have your issues
             | closed as unsupported. (Personally, I might give it a go
             | with Backblaze B2 just to play around, yeah)
        
         | remram wrote:
         | Unfortunately there have been a few vulnerabilities since that old
         | Minio release. For something you expose to users, it's a
         | problem.
        
           | mdaniel wrote:
           | I would hope my mentioning moto made it clear my comment was
           | about having an S3 implementation _for testing_. Presumably
           | one should not expose moto to users, either
        
       | philsnow wrote:
       | I'm surprised they just punt on concurrent updates [0] instead of
       | locking with something like dynamodb, like terraform does.
       | 
       | [0] https://github.com/awslabs/git-remote-s3?tab=readme-ov-file#...
        
         | mdaniel wrote:
         | I thank goodness I have access to a non-stupid Terraform state
         | provider[1] so I've never tried that S3+dynamodb setup but, if
         | I understand the situation correctly, introducing Yet Another
         | AWS Service ™ into this mix would mandate that _callers_
         | also be given a `dynamo:WriteSomething` IAM perm, which is
         | actually different from S3 in that in S3 one can -- at their
         | discretion -- set the policies on the _bucket_ such that it
         | would work without any explicit caller IAM
         | 
         | 1:
         | https://docs.gitlab.com/ee/user/infrastructure/iac/terraform...
        
         | ncruces wrote:
         | Google Cloud Storage is good enough to implement locks all by
         | itself:
         | https://reddit.com/r/golang/comments/t52d4f/gmutex_a_global_...
         | 
         | Doesn't S3 provide primitives to do the same? At least since
         | moving to strong read-after-write consistency?
         | 
         | PS: I wrote the above package. Happy to answer questions about
         | it.
        
           | kbumsik wrote:
           | Conditional writes were just added to S3 two months ago:
           | https://aws.amazon.com/about-aws/whats-new/2024/08/amazon-s3...
        
             | laurencerowe wrote:
             | Unfortunately this functionality is much more limited in S3
             | as you can only use `If-None-Match: *` to prevent
             | overwrites.
             | https://docs.aws.amazon.com/AmazonS3/latest/userguide/condit...
             | 
             | GCS also allows for conditional overwrites using `If-Match:
             | <etag>` which means you can do optimistic concurrency
             | control.
             | https://cloud.google.com/storage/docs/request-preconditions
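
To make the difference between the two preconditions concrete, here is a toy in-memory model. It is not a real S3 or GCS client: `ToyBucket`, `try_lock`, and `update_ref` are hypothetical names, and the version-counter ETags are an assumption. It only illustrates that `If-None-Match: *` is enough for a create-if-absent lock file, while `If-Match: <etag>` allows a compare-and-swap update of an existing object.

```python
import threading


class PreconditionFailed(Exception):
    """Stands in for an HTTP 412 Precondition Failed response."""


class ToyBucket:
    """In-memory stand-in for an object store (illustration only)."""

    def __init__(self):
        self._objects = {}            # key -> (etag, body)
        self._lock = threading.Lock()
        self._counter = 0

    def get(self, key):
        """Return (etag, body) for an existing key."""
        with self._lock:
            return self._objects[key]

    def put(self, key, body, if_none_match=None, if_match=None):
        """Write an object, honouring the two preconditions from the thread."""
        with self._lock:
            current = self._objects.get(key)
            # If-None-Match: * -> only succeed when the key does not exist yet.
            if if_none_match == "*" and current is not None:
                raise PreconditionFailed(key)
            # If-Match: <etag> -> only succeed when the stored ETag is unchanged.
            if if_match is not None and (current is None or current[0] != if_match):
                raise PreconditionFailed(key)
            self._counter += 1
            etag = f"v{self._counter}"
            self._objects[key] = (etag, body)
            return etag

    def delete(self, key):
        with self._lock:
            self._objects.pop(key, None)


def try_lock(bucket, key, owner):
    """Create-if-absent lock: works with If-None-Match: * alone."""
    try:
        bucket.put(key, owner, if_none_match="*")
        return True
    except PreconditionFailed:
        return False


def update_ref(bucket, key, new_value, expected_etag):
    """Optimistic concurrency: overwrite only if nobody else wrote in between."""
    try:
        bucket.put(key, new_value, if_match=expected_etag)
        return True
    except PreconditionFailed:
        return False
```

In this model a second `try_lock` on the same key fails until the holder deletes the lock object, and a writer holding a stale ETag loses the `update_ref` race instead of silently overwriting the newer value.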
        
         | noctune wrote:
         | S3 recently got conditional writes and you can do locking
         | entirely in S3 - I don't think they are using this though. Must
         | be too recent an addition.
        
       | fortran77 wrote:
       | Amazon has deprecated Amazon Code Commit, so this may be an
       | interesting alternative.
        
         | adobrawy wrote:
         | In what use case can it be an interesting alternative?
         | 
         | Limited access control (e.g. CI pass required), so not very
         | useful for end users. For machine-to-machine it's an additional
         | layer of abstraction when a regular tarball is fine.
        
       | Scribbd wrote:
       | This is something I was trying to implement myself. I am
       | surprised it can be done with just an s3 bucket. I was messing
       | with API Gateways, Lambda functions and DynamoDB tables to
       | support the s3 bucket. It didn't occur to me to implement it
       | client side. I might have stuck a bit too much to the lfs test
       | server implementation. https://github.com/git-lfs/lfs-test-server
        
         | chx wrote:
         | Client side is, while interesting, of limited use as every CI
         | and similar tool won't work with this. This seems like a sort of
         | automation of wormhole which I guess is neat
         | https://github.com/cxw42/git-tools/blob/master/wormhole
        
       | tonymet wrote:
       | how does it handle incremental changes? If it's writing your
       | entire repo on a loop, I could see why AWS would promote it.
        
         | afro88 wrote:
         | Looks like it uses bundles, which handle incremental changes:
         | https://github.com/awslabs/git-remote-s3?tab=readme-ov-file#...
        
       | Evidlo wrote:
       | For the LFS part there is also dvc which works better than git-
       | lfs and natively supports S3.
        
         | bagavi wrote:
         | DVC is a great tool!
        
           | lenova wrote:
           | I haven't heard of dvc, so I had to google it, which took me
           | to: https://dvc.org/
           | 
           | But I'm still confused as to what dvc is after a cursory
           | glance at their homepage.
        
             | chatmasta wrote:
             | It was on the front page contemporaneously with this
             | comment that recommended it, so you know it was an unbiased
             | recommendation. :)
        
         | matrss wrote:
         | There is also git-annex, which supports S3 as well as a bunch
         | of other storage backends (and it is very easy to implement
         | your own, it just has to loosely resemble a key-value store).
         | Git-annex can use any of its special remotes as git remotes,
         | like what the presented tool does for just S3.
        
         | kernelsanderz wrote:
         | Also worth checking out https://github.com/jasonwhite/rudolfs
         | 
         | Been using it to store datasets via lfs. Written in rust and
         | has been very reliable.
        
       | x3n0ph3n3 wrote:
       | Wow, AWS _really_ wants to get rid of CodeCommit.
        
       | zmmmmm wrote:
       | Just remember, the minimum billing increment for file size is
       | 128KB in real AWS S3. So your Git repo may be a lot more
       | expensive than you would think if you have a giant source tree
       | full of small files.
        
         | afro88 wrote:
         | Looks like it uses bundles rather than raw files:
         | https://github.com/awslabs/git-remote-s3?tab=readme-ov-file#...
        
         | justin_oaks wrote:
         | That 128KB only applies to non-standard S3 storage tiers
         | (glacier, infrequent access, one zone, etc)
         | 
         | S3 standard, which is likely what people would use for git
         | storage, doesn't have that minimum file size charge.
         | 
         | See the asterisk sections in https://aws.amazon.com/s3/pricing/
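
A rough sketch of the arithmetic behind this sub-thread, assuming a storage class that bills every object as at least 128 KB (taken here as 128 * 1024 bytes, which is an assumption about the unit). The function name and the example file sizes are made up for illustration; per the comment above, S3 Standard would not apply the minimum at all.

```python
MIN_BILLABLE_BYTES = 128 * 1024  # assumed minimum billable object size


def billable_bytes(object_sizes, min_billable=MIN_BILLABLE_BYTES):
    """Sum object sizes, rounding each object up to the minimum billable size."""
    return sum(max(size, min_billable) for size in object_sizes)


# Hypothetical source tree: 10,000 files of 2 KiB each.
small_files = [2048] * 10_000
actual = sum(small_files)

# Stored as individual objects, every file is billed as 128 KiB,
# which is 64 times its real size (131072 / 2048).
as_raw_files = billable_bytes(small_files)

# Stored as one bundle of the same total size, the minimum no longer matters.
as_one_bundle = billable_bytes([actual])

# With no minimum (min_billable=0), billed bytes equal actual bytes.
no_minimum = billable_bytes(small_files, min_billable=0)
```

This is why storing bundles rather than raw files, as noted elsewhere in the thread, sidesteps the small-object penalty even on tiers where it applies.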
        
           | zmmmmm wrote:
           | Thank you for highlighting that, I had remembered it wrongly.
        
         | chrsig wrote:
         | also the puts are 5x as expensive as the get operations
        
       | milkey_mouse wrote:
       | You can also do this with Cloudflare Workers for fewer setup
       | steps/moving parts:
       | 
       | https://github.com/milkey-mouse/git-lfs-s3-proxy
        
       | xena wrote:
       | How do you install this? Homebrew broke global pip install. Is
       | there a homebrew package or something?
        
         | mdaniel wrote:
         | FWIW, their helpers make things pretty cheap to create new
         | Formula by yourself
         | 
         |     $ brew create --python --set-license Apache-2 https://github.com/awslabs/git-remote-s3/archive/refs/tags/v0.1.19.tar.gz
         |     Formula name [git-remote-s3]:
         |     ==> Downloading https://github.com/awslabs/git-remote-s3/archive/refs/tags/v0.1.19.tar.gz
         |     ==> Downloading from https://codeload.github.com/awslabs/git-remote-s3/tar.gz/refs/tags/v0.1.19
         |     ##O=-#   #
         |     Warning: Cannot verify integrity of '84b0a9a6936ebc07a39f123a3e85cd23d7458c876ac5f42e9f3ffb027dcb3a0f--git-remote-s3-0.1.19.tar.gz'.
         |     No checksum was provided.
         |     For your reference, the checksum is:
         |       sha256 "3faa1f9534c4ef2ec130fac2df61428d4f0a525efb88ebe074db712b8fd2063b"
         |     ==> Retrieving PyPI dependencies for "https://github.com/awslabs/git-remote-s3/archive/refs/tags/v0.1.19.tar.gz"...
         |     ==> Retrieving PyPI dependencies for excluded ""...
         |     ==> Getting PyPI info for "boto3==1.35.44"
         |     ==> Getting PyPI info for "botocore==1.35.44"
         |     ==> Excluding "git-remote-s3==0.1.19"
         |     ==> Getting PyPI info for "jmespath==1.0.1"
         |     ==> Getting PyPI info for "python-dateutil==2.9.0.post0"
         |     ==> Getting PyPI info for "s3transfer==0.10.3"
         |     ==> Getting PyPI info for "six==1.16.0"
         |     ==> Getting PyPI info for "urllib3==2.2.3"
         |     ==> Updating resource blocks
         |     Please run the following command before submitting:
         |       HOMEBREW_NO_INSTALL_FROM_API=1 brew audit --new git-remote-s3
         |     Editing /usr/local/Homebrew/Library/Taps/homebrew/homebrew-core/Formula/g/git-remote-s3.rb
         | 
         | They also support building from git directly, if you want to
         | track non-tagged releases (see the "--head" option to create)
        
       | CGamesPlay wrote:
       | If you are interested in using S3 as a git remote but are
       | concerned with privacy, I built a tool a while ago to use S3 as
       | an untrusted git remote using Restic.
       | https://github.com/CGamesPlay/git-remote-restic
        
       | mattxxx wrote:
       | This seems wrong, since you can't push transactionally +
       | consistently in S3.
       | 
       | They address this directly in their section on concurrent writes:
       | https://github.com/awslabs/git-remote-s3?tab=readme-ov-file#...
       | 
       | And in their design:
       | https://github.com/awslabs/git-remote-s3?tab=readme-ov-file#...
       | 
       | But it seems like this is just the wrong tool for the job
       | (hosting git repos).
        
       | WhyNotHugo wrote:
       | git-annex also has native support for s3.
        
         | matrss wrote:
         | I think this is more about storing the entire repository on s3,
         | not just large files as git-lfs and git-annex are usually
         | concerned with. But coincidentally, git-annex somewhat recently
         | got the feature to use any of its special remotes as normal git
         | remotes (https://git-annex.branchable.com/git-remote-annex/),
         | including s3, webdav, anything that rclone supports, and a few
         | more.
        
       | doctorpangloss wrote:
       | https://alanedwardes.com/blog/posts/serverless-git-lfs-for-g...
       | 
       | I've used this guy's CloudFormation template since forever for
       | LFS on S3.
       | 
       | GitHub has to lower its egregious LFS pricing.
        
       | kernelsanderz wrote:
       | I've been using https://github.com/jasonwhite/rudolfs - which is
       | written in rust. It's high performance but doesn't have all the
       | features (auth) that you might need.
        
       ___________________________________________________________________
       (page generated 2024-10-20 23:02 UTC)