[HN Gopher] serverless-registry: A Docker registry backed by Workers and R2
___________________________________________________________________
serverless-registry: A Docker registry backed by Workers and R2
Author : tosh
Score : 97 points
Date : 2024-09-05 16:34 UTC (6 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| Fire-Dragon-DoL wrote:
| How's the pricing with low usage? I suspect this is great. I
| wanted an image registry so that I can use it to deploy with
| Kamal, but the $5 plan is overpriced given that I push an image
| maybe once every 3 months. This could solve that.
| bayesianbot wrote:
| I don't use Cloudflare services much, but it seems pretty
| cheap: $0.015/GB-month for storage (+10GB free), and Workers
| are charged per request and CPU time, both of which would
| probably be quite low for a registry, so the free plan would
| go quite far?
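|
| (Rough example using those prices: 50GB of stored layers would
| be (50 - 10 free) x $0.015 = $0.60/month, so a low-traffic
| private registry plausibly stays at or near free.)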
|
| I just set up the official registry on a VPS (for a similar
| usage pattern), and it was a bit of work and probably much
| more expensive. This seems quite attractive, unless I've
| misunderstood something.
| Fire-Dragon-DoL wrote:
| Yeah, it does sound great. The alternative for me is to host
| my own Docker registry on my home server. That would cost me
| essentially nothing (I have good internet at home).
| Alifatisk wrote:
| > regitry
| thangngoc89 wrote:
| I think this is wonderful. I'm running a Gitea instance on one
| of our dev machines just for a private registry. Keeping that
| instance up has been extra work for us.
|
| But the 500MB layer size limit is a dealbreaker for AI-related
| workflows.
| geek_at wrote:
| Gitea also ships its own registry, though. If you self-host,
| you can also use LFS for unlimited file sizes.
| thangngoc89 wrote:
| I'm self-hosting Gitea just for its private Docker registry.
| LFS is actually slow for heavy deep learning workflows with
| millions of small files. I'm using DVC [1] instead.
|
| [1]: https://dvc.org
| arjvik wrote:
| Absolutely love DVC for data version control! What storage
| backend are you using with DVC?
| thangngoc89 wrote:
| Local (mounted NFS) from our internal NAS
| victorbjorklund wrote:
| Nice. I've been seriously thinking about building exactly this
| (but I'm glad someone smarter made it already).
| jacobwg wrote:
| The annoying thing about trying to implement a Docker registry on
| Workers and R2 is that it's _so close_ to having everything you
| need, but the 500MB request body limit means Workers is unable to
| accept pushes of layers larger than 500MB. The limit is even
| lower at 100MB on the Pro plan[0].
|
| We are running a registry that does store content on R2[1],
| and today this is implemented as the unholy chimera of
| Cloudflare Workers, AWS CloudFront, Lambda@Edge, regular
| Lambda, S3, and R2.
|
| Pushes go to CloudFront + Lambda@Edge, content is saved in S3
| first, then moved to R2 by background jobs. Once it has
| transited to R2, pulls are served from R2.
|
| I would so love for Workers + R2 to actually be able to accept
| pushes of large layers; unfortunately, I have yet to talk to
| anyone at Cloudflare who believes it's possible. Especially in
| this era of AI/ML models, some container images can have single
| layers in the 10-100GB range!
|
| [0]
| https://developers.cloudflare.com/workers/platform/limits/#r...
|
| [1] https://depot.dev/docs/guides/ephemeral-registry
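|
| For what it's worth, the pull side really is simple on Workers
| + R2. A minimal sketch, assuming a bucket binding named BUCKET
| and content-addressed keys (hypothetical, not our actual code):
|
|     // Hypothetical Worker serving blob pulls straight from R2.
|     export default {
|       async fetch(req: Request, env: { BUCKET: R2Bucket }) {
|         // e.g. GET /v2/<name>/blobs/sha256:<hex>
|         const digest =
|           new URL(req.url).pathname.split("/blobs/")[1] ?? "";
|         const obj = digest
|           ? await env.BUCKET.get(`blobs/${digest}`)
|           : null;
|         if (!obj) return new Response("not found", { status: 404 });
|         return new Response(obj.body, {
|           headers: {
|             "Content-Type": "application/octet-stream",
|             "Docker-Content-Digest": digest,
|           },
|         });
|       },
|     };
|
| It's only the push path that forces the S3 detour.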
| amenghra wrote:
| Can you generate a signed URL to upload directly to R2? Or
| perform the upload in chunks?
| jacobwg wrote:
| Uploading in chunks could definitely solve the issue, and the
| OCI Distribution Specification does actually have some
| language about an optional chunked push API[0].
|
| Unfortunately, very few registry clients actually support
| this; critically, containerd does not[1], which means your
| regular `docker push` and a whole lot of ecosystem tooling
| doesn't work.
|
| This also means the single PUT must support very large pushes
| in one request, possibly even larger than what R2 or S3 would
| allow without multipart upload. So you actually need a server
| to accept the PUT, then do its own chunked upload to object
| storage or otherwise stage the content before it's finally
| saved.
|
| This rules out presigned URLs for push too, since the PUT
| request made to the presigned URL can be too large for the
| backing object storage to accept.
|
| There's also other processing that ideally happens on push
| (like hash digest verification of the pushed layer) that means
| a server somewhere needs to be involved.
|
| [0] https://github.com/opencontainers/distribution-
| spec/blob/mai...
|
| [1] https://github.com/containerd/containerd/blob/192679b0591
| 7b5...
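|
| To make the shape concrete, here's a hypothetical Workers
| handler for the monolithic PUT; R2's put() can verify a sha256
| checksum server-side, but the request body limit above still
| caps how big this PUT can be (binding and route are made up):
|
|     // Sketch of a digest-verified monolithic blob PUT.
|     export default {
|       async fetch(req: Request, env: { BUCKET: R2Bucket }) {
|         const digest =
|           new URL(req.url).searchParams.get("digest") ?? "";
|         if (req.method !== "PUT" || !digest.startsWith("sha256:"))
|           return new Response("bad request", { status: 400 });
|         // R2 rejects the write if the body hash doesn't match,
|         // which covers the digest-verification step.
|         await env.BUCKET.put(`blobs/${digest}`, req.body, {
|           sha256: digest.slice("sha256:".length),
|         });
|         return new Response(null, {
|           status: 201,
|           headers: { "Docker-Content-Digest": digest },
|         });
|       },
|     };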
| telgareith wrote:
| What am I missing such that presigned URLs aren't the solution
| to this issue?
| andreasmetsala wrote:
| R2 is ridiculously cheap compared to S3. The price difference
| was more than 40x when I last looked at it.
| compootr wrote:
| Ridiculously cheap until their sales team shows up and says
| otherwise!
| kobalsky wrote:
| Mind explaining?
| fragmede wrote:
| https://robindev.substack.com/p/cloudflare-took-down-our-
| web...
| zimbatm wrote:
| Does the worker need to process the body?
|
| Sometimes, the worker can return a signed URL and have the
| client directly upload to R2.
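|
| A rough sketch of that, using aws4fetch against R2's
| S3-compatible endpoint (bucket name, key, and env fields are
| placeholders):
|
|     import { AwsClient } from "aws4fetch";
|
|     // Presign a PUT so the client can upload straight to R2.
|     async function presignPut(
|       key: string,
|       env: {
|         ACCOUNT_ID: string;
|         R2_ACCESS_KEY_ID: string;
|         R2_SECRET_ACCESS_KEY: string;
|       },
|     ): Promise<string> {
|       const client = new AwsClient({
|         accessKeyId: env.R2_ACCESS_KEY_ID,
|         secretAccessKey: env.R2_SECRET_ACCESS_KEY,
|       });
|       const url = new URL(
|         `https://${env.ACCOUNT_ID}.r2.cloudflarestorage.com` +
|           `/my-bucket/${key}`,
|       );
|       url.searchParams.set("X-Amz-Expires", "3600"); // 1 hour
|       const signed = await client.sign(
|         new Request(url, { method: "PUT" }),
|         { aws: { signQuery: true } },
|       );
|       return signed.url;
|     }
|
| Though per the sibling thread, a single presigned PUT still
| hits object-size limits for very large layers.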
| koolba wrote:
| > We are running a registry that does store content on R2[1],
| and today this is implemented as the unholy chimera of
| Cloudflare Workers, AWS CloudFront, Lambda@Edge, regular
| Lambda, S3, and R2.
|
| What's the advantage over just using ECR? Cost of storage? Cost
| of bandwidth to read? Hosting provider genetic diversity?
| gyre007 wrote:
| Storage and egress costs are wildly higher on ECR compared to
| CF+R2
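|
| (Rough numbers, worth double-checking: ECR storage is about
| $0.10/GB-month vs $0.015 on R2, and AWS egress is roughly
| $0.09/GB while R2 egress is free, so a 1GB image pulled 1,000
| times in a month is on the order of $90 of egress on ECR and
| $0 on R2.)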
| Shakahs wrote:
| ECR is _slow_. Despite being a static datastore presumably
| backed by S3, it will only serve container image layers at
| around 150 Mbps; when dealing with large (10GB) container
| images, this is a problem. R2 will happily serve the same data
| at multi-gigabit speeds.
| mikeocool wrote:
| Have any container runtimes or tools just implemented basic S3
| compatibility for pushing/pulling images? If your registry
| doesn't accept pushes from untrusted sources, there doesn't
| seem to be a ton of value in having "smarts" in the registry
| server itself.
|
| When you push, the client could just PUT a metadata file and
| an object for each layer into the object store; pulling would
| read the metadata file, which tells it where to get each
| layer, and could use ETags to skip layers that have already
| been downloaded (sketched below).
|
| For auth just use the standard S3 auth.
|
| It would be compatible with S3/R2/any other S3-compatible
| storage.
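|
| A sketch of what that metadata object might look like (all
| names hypothetical, loosely mirroring an OCI manifest):
|
|     // Pointer file the client PUTs alongside the layer objects.
|     interface ImageMetadata {
|       schemaVersion: number;
|       config: { key: string; digest: string };
|       layers: {
|         key: string; // object key in the bucket
|         digest: string; // sha256, doubles as an integrity check
|         size: number; // bytes
|       }[];
|     }
|
|     const example: ImageMetadata = {
|       schemaVersion: 1,
|       config: { key: "blobs/sha256:ab12", digest: "sha256:ab12" },
|       layers: [
|         {
|           key: "blobs/sha256:cd34",
|           digest: "sha256:cd34",
|           size: 123456,
|         },
|       ],
|     };
|
| A puller fetches the metadata, then GETs each layer key,
| skipping any whose ETag/digest it already has locally.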
| champtar wrote:
| I would love it if the container pull protocol stopped using
| custom headers and content types, so we could use any dumb
| HTTP server.
| mayli wrote:
| this? https://github.com/NicolasT/static-container-registry and
| this? https://github.com/jpetazzo/registrish
| RyeCombinator wrote:
| Great feat.
|
| However, I am ever more confused about what Cloudflare does
| and builds. They have everything from CDN and DNS to Orange
| Meets, and now this?
| rozenmd wrote:
| There's a developer platform: https://workers.cloudflare.com/
| yazaddaruvala wrote:
| My understanding is Cloudflare is a competitor of AWS, Azure,
| and GCP.
| yecuken wrote:
| I'm using this registry with regctl[0] to chunk uploads (to
| circumvent the 100MB limit); it works just fine for huge
| layers containing models. With regctl you also get the 'mount'
| query parameter at upload initialization with the proper blob
| name, so you can skip an additional R2 copy at multipart
| upload finalisation, which speeds up the upload (and avoids
| crashes on larger blobs). That part is not in the Docker
| registry API, so I never got around to PRing it.
|
| [0] https://github.com/regclient/regclient
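|
| For the curious, the chunked flow regctl speaks is roughly the
| following (a client-side sketch; auth, error handling, and the
| 'mount' trick omitted):
|
|     // OCI chunked blob push: open session, PATCH chunks, close.
|     async function pushBlobChunked(
|       registry: string,
|       repo: string,
|       digest: string,
|       blob: Uint8Array,
|       chunkSize = 50 * 1024 * 1024,
|     ): Promise<void> {
|       // 1. Open an upload session.
|       let res = await fetch(
|         `${registry}/v2/${repo}/blobs/uploads/`,
|         { method: "POST" },
|       );
|       let location = res.headers.get("Location")!;
|       // 2. Send the blob as Content-Range'd PATCH chunks.
|       for (let start = 0; start < blob.length; start += chunkSize) {
|         const chunk = blob.slice(start, start + chunkSize);
|         res = await fetch(location, {
|           method: "PATCH",
|           headers: {
|             "Content-Type": "application/octet-stream",
|             "Content-Range": `${start}-${start + chunk.length - 1}`,
|           },
|           body: chunk,
|         });
|         location = res.headers.get("Location") ?? location;
|       }
|       // 3. Close the session; the registry verifies the digest.
|       const sep = location.includes("?") ? "&" : "?";
|       await fetch(`${location}${sep}digest=${digest}`, {
|         method: "PUT",
|       });
|     }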
| miohtama wrote:
| When you switch from a private Docker or GitHub registry to
| Cloudflare, are you effectively just trading one vendor
| lock-in for another, or is there more to this?
| vineyardmike wrote:
| None of these are really vendor lock-in. The registry protocol
| is an open standard; this is just one more implementation of
| it. So you're only locked in insofar as your data is stored
| _somewhere_, but that data is behind an open API, so there's
| minimal risk.
| spikey_sanju wrote:
| Interesting. Just wish it handled larger image layers a bit
| better!
| jzelinskie wrote:
| If you are a Cloudflare employee reading this, you should get
| involved with the OCI Distribution group that develops the
| standards for the registry:
| https://github.com/opencontainers/distribution-spec
| gyre007 wrote:
| OCI is demonstrably broken as a specification body, as the
| referrers API shows. The distribution spec as it stands is
| just a very poorly written technical doc.
| beeboobaa3 wrote:
| Can you explain a bit more what you mean?
| fswd wrote:
| Using this same architecture, it would be cool to build a
| serverless-git.
| mdaniel wrote:
| I would have thought for sure someone would have already tried
| that, but regrettably trying to search for "serverless git"
| coughs up innumerable references to the _framework_ that is
| hosted on _GIThub_
|
| Anyway, I was curious how much work such a stunt would be,
| and based on the git-http-backend docs <https://git.github.io
| /git-scm.com/docs/git-http-backend#Docu...> it seems like
| there are actually a manageable number of endpoints.
|
| I actually prefer their second scenario of splitting out
| access to the objects and pack files, but since doing that
| would still require a function (or, ahem, a web server
| running), I suspect that optimization is not within the scope
| of what you had in mind.
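|
| For the dumb-protocol scenario the function really is tiny; a
| hypothetical Worker fronting an R2 bucket that holds a bare
| repo (read-only, so clone/fetch but no push):
|
|     // Serve git's "dumb" HTTP protocol straight from R2.
|     export default {
|       async fetch(req: Request, env: { REPO: R2Bucket }) {
|         const path = new URL(req.url).pathname.replace(/^\/+/, "");
|         // Only the paths a dumb-protocol client requests.
|         const ok =
|           path === "HEAD" ||
|           path === "info/refs" ||
|           path === "objects/info/packs" ||
|           /^objects\/([0-9a-f]{2}\/[0-9a-f]{38}|pack\/pack-[0-9a-f]{40}\.(pack|idx))$/.test(path);
|         if (!ok) return new Response("not found", { status: 404 });
|         const obj = await env.REPO.get(path);
|         if (!obj) return new Response("not found", { status: 404 });
|         return new Response(obj.body);
|       },
|     };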
| airocker wrote:
| Is there a registry that would work on extremely cheap disk
| storage if the use case is only pushes and very infrequent
| pulls?
| qudat wrote:
| This is pretty nice. Does it support an API for deleting
| images (and having them properly garbage-collected)? It looks
| like maybe this does it: https://github.com/cloudflare
| /serverless-registry/blob/13c4e...
|
| We have a managed Docker registry and definitely could have
| used this project!
|
| Slightly unrelated, but we've been experimenting with using
| SSH to authenticate with a Docker registry, if anyone is
| interested: https://github.com/picosh/tunkit?tab=readme-ov-
| file#why
| bravetraveler wrote:
| Interesting approach, given that running one of these on your
| 'LAN' is relatively easy.
|
| Though, to be fair, the pull-through mechanism has been kind
| of goofy for years. Ask me how I know /s
___________________________________________________________________
(page generated 2024-09-05 23:00 UTC)