[HN Gopher] S3 as a Git remote and LFS server
___________________________________________________________________
S3 as a Git remote and LFS server
Author : kbumsik
Score : 186 points
Date : 2024-10-19 10:37 UTC (1 day ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| mdaniel wrote:
| All this mocking when moto exists is just :-(
| https://github.com/awslabs/git-remote-s3/blob/v0.1.19/test/r...
|
| Actually, moto is just one bandaid for that problem - there are
| SO MANY S3 storage implementations, including the pre-license-
| switch Apache 2 version of minio (one need not use a bleeding
| edge for something as relatively stable as the S3 API)
| SahAssar wrote:
| Do you mean boto (the python SDK for AWS)?
|
| EDIT: They probably do not, I'm guessing they mean
| https://docs.getmoto.org/en/latest/index.html ?
| flakes wrote:
| moto server for testing S3 is pretty great. It's about the
| same experience as using a minio container to run integration
| tests against.
|
| I use this, and testing.postgresql for unit testing my api
| servers with barely any mocks used at all.
| neeleshs wrote:
| There is also testcontainers. Supports multiple languages.
| Uses containers though.
|
| https://testcontainers-python.readthedocs.io/en/latest/
| mdaniel wrote:
| Happy 10,000th Day to you :-D Yes, moto and its friend
| localstack are just fantastic for being able to play with AWS
| without spending money, or to reproduce kabooms that only
| happen once a month with the real API
|
| I believe moto has an "embedded" version such that one need
| not even have it listen on a network port, but I find it
| much, much less mental gymnastics to just supersede the
| "endpoint" address in the actual AWS SDKs to point to
| 127.0.0.1:4566 and off to the races. The AWS SDKs are even so
| friendly as to not mandate TLS or have allowlists of endpoint
| addresses, unlike their misguided Azure colleagues
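The endpoint override described above can be sketched as follows. This is not boto3's actual wiring, just a small dependency-free helper that builds the keyword arguments one would pass to `boto3.client("s3", ...)`; the port (4566 is localstack's default), the `AWS_ENDPOINT_URL` variable fallback, and the dummy credentials are all assumptions of the sketch.

```python
# Sketch: point an AWS SDK client at a local S3 fake (moto/localstack)
# by overriding the endpoint instead of mocking individual API calls.
import os

def s3_client_kwargs(default_endpoint="http://127.0.0.1:4566"):
    """Return kwargs for boto3.client("s3"), honoring the standard
    AWS_ENDPOINT_URL variable and falling back to a local fake."""
    endpoint = os.environ.get("AWS_ENDPOINT_URL", default_endpoint)
    return {
        "endpoint_url": endpoint,
        # Local fakes accept any credentials; these are dummies.
        "aws_access_key_id": "test",
        "aws_secret_access_key": "test",
        "region_name": "us-east-1",
    }

# boto3.client("s3", **s3_client_kwargs()) would then talk to the fake
# instead of real AWS, with no per-call mocking needed.
```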
| SahAssar wrote:
| > Happy 10,000th Day to you :-D
|
| Sorry, not sure what you mean?
| mdaniel wrote:
| https://xkcd.com/1053/
| misnome wrote:
| How do you know they are in the US?
| notpushkin wrote:
| > there are SO MANY s3 storage implementations
|
| I suppose given this is under the AWS Labs org, they don't
| really care about non-AWS S3 implementations.
| mdaniel wrote:
| Well, I look forward to their `docker run awslabs/the-
| real-s3:latest` implementation then. Until such time,
| monkeypatching API calls to always give the exact answer the
| consumer is looking for is damn cheating
| chrsig wrote:
| it wouldn't be unprecedented. dynamodb-local exists.
| notpushkin wrote:
| Agreed, haha. Well, I think it _should_ work with Minio &
| co. just as well, but be prepared to have your issues
| closed as unsupported. (Personally, I might give it a go
| with Backblaze B2 just to play around, yeah)
| remram wrote:
| Unfortunately there have been a few vulnerabilities since that old
| Minio release. For something you expose to users, it's a
| problem.
| mdaniel wrote:
| I would hope my mentioning moto made it clear my comment was
| about having an S3 implementation _for testing_. Presumably
| one should not expose moto to users, either
| philsnow wrote:
| I'm surprised they just punt on concurrent updates [0] instead of
| locking with something like dynamodb, like terraform does.
|
| [0] https://github.com/awslabs/git-remote-s3?tab=readme-ov-
| file#...
| mdaniel wrote:
| I thank goodness I have access to a non-stupid Terraform state
| provider[1] so I've never tried that S3+dynamodb setup but, if
| I understand the situation correctly, introducing Yet Another
| AWS Service ™ into this mix would mandate that _callers_
| also be given a `dynamo:WriteSomething` IAM perm, which is
| actually different from S3 in that in S3 one can -- at their
| discretion -- set the policies on the _bucket_ such that it
| would work without any explicit caller IAM
|
| 1:
| https://docs.gitlab.com/ee/user/infrastructure/iac/terraform...
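The bucket-policy point above might look something like this sketch. It is a generic S3 resource-based policy, not anything from git-remote-s3 itself; the account ID, role name, and bucket name are placeholders. Because the grant lives on the bucket, the caller needs no identity-side IAM policy of its own.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "GitRemoteS3Objects",
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::111122223333:role/ci-runner" },
      "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
      "Resource": "arn:aws:s3:::my-git-bucket/*"
    },
    {
      "Sid": "GitRemoteS3List",
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::111122223333:role/ci-runner" },
      "Action": "s3:ListBucket",
      "Resource": "arn:aws:s3:::my-git-bucket"
    }
  ]
}
```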
| ncruces wrote:
| Google Cloud Storage is good enough to implement locks all by
| itself:
| https://reddit.com/r/golang/comments/t52d4f/gmutex_a_global_...
|
| Doesn't S3 provide primitives to do the same? At least since
| moving to strong read-after-write consistency?
|
| PS: I wrote the above package. Happy to answer questions about
| it.
| kbumsik wrote:
| Conditional writes were just added to S3 two months ago:
| https://aws.amazon.com/about-aws/whats-
| new/2024/08/amazon-s3...
| laurencerowe wrote:
| Unfortunately this functionality is much more limited in S3
| as you can only use `If-None-Match: *` to prevent
| overwrites. https://docs.aws.amazon.com/AmazonS3/latest/use
| rguide/condit...
|
| GCS also allows for conditional overwrites using `If-Match:
| <etag>` which means you can do optimistic concurrency
| control. https://cloud.google.com/storage/docs/request-
| preconditions
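To make the difference concrete, here is a toy in-memory model of the two precondition styles (not the real S3 or GCS APIs; `Store`, `put_if_absent`, and `put_if_match` are invented names). `If-None-Match: *` can only guard creation of an object, while an `If-Match: <etag>` check enables a full compare-and-swap, which is what optimistic concurrency control needs.

```python
# Toy model of conditional writes against a key-value store.
import uuid

class Store:
    def __init__(self):
        self._data = {}  # key -> (etag, value)

    def put_if_absent(self, key, value):
        """If-None-Match: * -- succeeds only when the key does not exist
        (the form S3 gained in 2024)."""
        if key in self._data:
            return None  # precondition failed (HTTP 412)
        etag = uuid.uuid4().hex
        self._data[key] = (etag, value)
        return etag

    def put_if_match(self, key, expected_etag, value):
        """If-Match: <etag> -- GCS-style compare-and-swap: overwrite only
        if nobody else has updated the object since we read it."""
        current = self._data.get(key)
        if current is None or current[0] != expected_etag:
            return None  # lost the race; re-read and retry
        etag = uuid.uuid4().hex
        self._data[key] = (etag, value)
        return etag

store = Store()
tag1 = store.put_if_absent("refs/heads/main", "sha-aaa")
assert store.put_if_absent("refs/heads/main", "sha-bbb") is None  # create-only
tag2 = store.put_if_match("refs/heads/main", tag1, "sha-bbb")     # CAS update
assert store.put_if_match("refs/heads/main", tag1, "sha-ccc") is None  # stale etag
```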
| noctune wrote:
| S3 recently got conditional writes, and you could do locking
| entirely in S3 - I don't think they are using this though. Must
| be too recent an addition.
| fortran77 wrote:
| Amazon has deprecated Amazon Code Commit, so this may be an
| interesting alternative.
| adobrawy wrote:
| In what use case can it be an interesting alternative?
|
| Limited access control (e.g. CI pass required), so not very
| useful for end users. For machine-to-machine it's an additional
| layer of abstraction when a regular tarball is fine.
| Scribbd wrote:
| This is something I was trying to implement myself. I am
| surprised it can be done with just an s3 bucket. I was messing
| with API Gateways, Lambda functions and DynamoDB tables to
| support the s3 bucket. It didn't occur to me to implement it
| client side. I might have stuck a bit too much to the lfs test
| server implementation. https://github.com/git-lfs/lfs-test-server
| chx wrote:
| Client side is, while interesting, of limited use as every CI
| and similar tool won't work with this. This seems like a sort of
| automation of wormhole which I guess is neat
| https://github.com/cxw42/git-tools/blob/master/wormhole
| tonymet wrote:
| how does it handle incremental changes? If it's writing your
| entire repo on a loop, I could see why AWS would promote it.
| afro88 wrote:
| Looks like it uses bundles, which handle incremental changes:
| https://github.com/awslabs/git-remote-s3?tab=readme-ov-file#...
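As a quick illustration of why bundles suit object storage: a bundle is a single file that git can clone from and fetch from as if it were a remote, so a repo can live as one object in a bucket. The commands below are a generic `git bundle` demo, not the helper's actual layout.

```shell
# Generic git-bundle demo: pack a repo into one file, then use
# that file directly as a clone source.
set -e
tmp=$(mktemp -d)
git init -q "$tmp/demo"
git -C "$tmp/demo" -c user.email=a@example.com -c user.name=a \
    commit -q --allow-empty -m "first"
# Pack every ref into a single file...
git -C "$tmp/demo" bundle create "$tmp/repo.bundle" --all
# ...and clone straight from that file, as if it were a remote URL.
git clone -q "$tmp/repo.bundle" "$tmp/clone"
git -C "$tmp/clone" log --format=%s
```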
| Evidlo wrote:
| For the LFS part there is also dvc which works better than git-
| lfs and natively supports S3.
| bagavi wrote:
| Dvc is a great tool!
| lenova wrote:
| I haven't heard of dvc, so I had to google it, which took me
| to: https://dvc.org/
|
| But I'm still confused as to what dvc is after a cursory
| glance at their homepage.
| chatmasta wrote:
| It was on the front page contemporaneously with this
| comment that recommended it, so you know it was an unbiased
| recommendation. :)
| matrss wrote:
| There is also git-annex, which supports S3 as well as a bunch
| of other storage backends (and it is very easy to implement
| your own, it just has to loosely resemble a key-value store).
| Git-annex can use any of its special remotes as git remotes,
| like what the presented tool does for just S3.
| kernelsanderz wrote:
| Also worth checking out https://github.com/jasonwhite/rudolfs
|
| Been using it to store datasets via lfs. Written in rust and
| has been very reliable.
| x3n0ph3n3 wrote:
| Wow, AWS _really_ wants to get rid of CodeCommit.
| zmmmmm wrote:
| Just remember, the minimum billing increment for file size is
| 128KB in real AWS S3. So your Git repo may be a lot more
| expensive than you would think if you have a giant source tree
| full of small files.
| afro88 wrote:
| Looks like it uses bundles rather than raw files:
| https://github.com/awslabs/git-remote-s3?tab=readme-ov-file#...
| justin_oaks wrote:
| That 128KB only applies to non-standard S3 storage tiers
| (glacier, infrequent access, one zone, etc)
|
| S3 standard, which is likely what people would use for git
| storage, doesn't have that minimum file size charge.
|
| See the asterisk sections in https://aws.amazon.com/s3/pricing/
| zmmmmm wrote:
| Thank you for highlighting that, I had remembered it wrongly.
| chrsig wrote:
| also the puts are 5x as expensive as the get operations
| milkey_mouse wrote:
| You can also do this with Cloudflare Workers for fewer setup
| steps/moving parts:
|
| https://github.com/milkey-mouse/git-lfs-s3-proxy
| xena wrote:
| How do you install this? Homebrew broke global pip install. Is
| there a homebrew package or something?
| mdaniel wrote:
| FWIW, their helpers make things pretty cheap to create a new
| Formula yourself:
|
|     $ brew create --python --set-license Apache-2 \
|         https://github.com/awslabs/git-remote-s3/archive/refs/tags/v0.1.19.tar.gz
|     Formula name [git-remote-s3]:
|     ==> Downloading https://github.com/awslabs/git-remote-s3/archive/refs/tags/v0.1.19.tar.gz
|     ==> Downloading from https://codeload.github.com/awslabs/git-remote-s3/tar.gz/refs/tags/v0.1.19
|     Warning: Cannot verify integrity of '84b0a9a6936ebc07a39f123a3e85cd23d7458c876ac5f42e9f3ffb027dcb3a0f--git-remote-s3-0.1.19.tar.gz'.
|     No checksum was provided.
|     For your reference, the checksum is:
|       sha256 "3faa1f9534c4ef2ec130fac2df61428d4f0a525efb88ebe074db712b8fd2063b"
|     ==> Retrieving PyPI dependencies for "https://github.com/awslabs/git-remote-s3/archive/refs/tags/v0.1.19.tar.gz"...
|     ==> Retrieving PyPI dependencies for excluded ""...
|     ==> Getting PyPI info for "boto3==1.35.44"
|     ==> Getting PyPI info for "botocore==1.35.44"
|     ==> Excluding "git-remote-s3==0.1.19"
|     ==> Getting PyPI info for "jmespath==1.0.1"
|     ==> Getting PyPI info for "python-dateutil==2.9.0.post0"
|     ==> Getting PyPI info for "s3transfer==0.10.3"
|     ==> Getting PyPI info for "six==1.16.0"
|     ==> Getting PyPI info for "urllib3==2.2.3"
|     ==> Updating resource blocks
|     Please run the following command before submitting:
|       HOMEBREW_NO_INSTALL_FROM_API=1 brew audit --new git-remote-s3
|     Editing /usr/local/Homebrew/Library/Taps/homebrew/homebrew-core/Formula/g/git-remote-s3.rb
|
| They also support building from git directly, if you want to
| track non-tagged releases (see the "--head" option to create)
| CGamesPlay wrote:
| If you are interested in using S3 as a git remote but are
| concerned with privacy, I built a tool a while ago to use S3 as
| an untrusted git remote using Restic.
| https://github.com/CGamesPlay/git-remote-restic
| mattxxx wrote:
| This seems wrong, since you can't push transactionally +
| consistently in S3.
|
| They address this directly in their section on concurrent writes:
| https://github.com/awslabs/git-remote-s3?tab=readme-ov-file#...
|
| And in their design: https://github.com/awslabs/git-
| remote-s3?tab=readme-ov-file#...
|
| But it seems like this is just the wrong tool for the job
| (hosting git repos).
| WhyNotHugo wrote:
| git-annex also has native support for s3.
| matrss wrote:
| I think this is more about storing the entire repository on s3,
| not just large files as git-lfs and git-annex are usually
| concerned with. But coincidentally, git-annex somewhat recently
| got the feature to use any of its special remotes as normal git
| remotes (https://git-annex.branchable.com/git-remote-annex/),
| including s3, webdav, anything that rclone supports, and a few
| more.
| doctorpangloss wrote:
| https://alanedwardes.com/blog/posts/serverless-git-lfs-for-g...
|
| I've used this guy's CloudFormation template since forever for
| LFS on S3.
|
| GitHub has to lower its egregious LFS pricing.
| kernelsanderz wrote:
| I've been using https://github.com/jasonwhite/rudolfs - which is
| written in rust. It's high performance but doesn't have all the
| features (auth) that you might need.
___________________________________________________________________
(page generated 2024-10-20 23:02 UTC)