[HN Gopher] serverless-registry: A Docker registry backed by Wor...
       ___________________________________________________________________
        
       serverless-registry: A Docker registry backed by Workers and R2
        
       Author : tosh
       Score  : 97 points
       Date   : 2024-09-05 16:34 UTC (6 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | Fire-Dragon-DoL wrote:
       | How's the pricing with low usage? I suspect this is great. I
       | wanted an image registry so that I can use it to deploy with
       | Kamal, but the $5 plan is overpriced, given I push an image maybe
       | once every 3 months. This could solve that
        
          | bayesianbot wrote:
          | I don't use many Cloudflare services, but it seems kinda
          | cheap: $0.015/GB-month for storage (plus 10GB free), and
          | Workers are charged per request and CPU time, both of which
          | would probably be quite low for a registry, so the free plan
          | would go quite far?
          | 
          | I just set up the official registry on a VPS (for a similar
          | usage pattern) and it was a bit of work and probably much
          | more expensive, so this seems quite attractive unless I've
          | misunderstood something.
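
As a back-of-envelope check on the storage math above (assuming the quoted $0.015/GB-month rate and 10GB free tier; a real bill would also include Workers requests and R2 operation charges):

```python
# Rough monthly R2 storage cost for a small private registry,
# assuming the $0.015/GB-month rate and 10 GB free tier quoted above.
RATE_PER_GB_MONTH = 0.015
FREE_GB = 10

def monthly_storage_cost(stored_gb: float) -> float:
    billable = max(0.0, stored_gb - FREE_GB)
    return billable * RATE_PER_GB_MONTH

print(monthly_storage_cost(50))  # 40 billable GB, about $0.60/month
```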
        
           | Fire-Dragon-DoL wrote:
           | Yeah it does sound great. The alternative for me is to host
           | my own docker registry on my home server. That would cost me
           | 0 essentially (I have good internet at home)
        
       | Alifatisk wrote:
       | > regitry
        
        | thangngoc89 wrote:
        | I think this is wonderful. I'm running a Gitea instance on one
        | of our dev machines just for a private registry. Keeping the
        | instance running has been extra work for us.
        | 
        | But the 500MB limit on layer size is a dealbreaker for AI-
        | related workflows.
        
          | geek_at wrote:
          | Gitea does also include its own registry, though. If you
          | self-host you can also use LFS for unlimited file sizes
        
            | thangngoc89 wrote:
            | I'm self-hosting Gitea just for its private Docker
            | registry. LFS is actually slow for heavy deep learning
            | workflows with millions of small files. I'm using DVC [1]
            | instead.
           | 
           | [1]: https://dvc.org
        
             | arjvik wrote:
             | Absolutely love DVC for data version control! What storage
             | backend are you using with DVC?
        
               | thangngoc89 wrote:
               | Local (mounted NFS) from our internal NAS
        
        | victorbjorklund wrote:
        | Nice. I've been seriously thinking about building exactly this
        | (but I'm glad someone smarter made it already)
        
       | jacobwg wrote:
       | The annoying thing about trying to implement a Docker registry on
       | Workers and R2 is that it's _so close_ to having everything you
       | need, but the 500MB request body limit means Workers is unable to
       | accept pushes of layers larger than 500MB. The limit is even
       | lower at 100MB on the Pro plan[0].
       | 
       | We are running a registry that does store content on R2[1], and
       | today this is implemented as the unholy chimera of Cloudflare
       | Workers, AWS CloudFront, Lambda@edge, regular Lambda, S3, and R2.
       | 
        | Pushes go first to CloudFront + Lambda@Edge, content is saved
        | in S3, then moved to R2 by background jobs. Once it has
        | transited to R2, pulls are served from R2.
       | 
       | I would so love for Workers + R2 to actually be able to accept
       | pushes of large layers, unfortunately I have yet to talk to
       | anyone at Cloudflare who believes it's possible. Especially in
       | this era of AI/ML models, some container images can have single
       | layers in the 10-100GB range!
       | 
       | [0]
       | https://developers.cloudflare.com/workers/platform/limits/#r...
       | 
       | [1] https://depot.dev/docs/guides/ephemeral-registry
        
         | amenghra wrote:
         | Can you generate a signed url to upload directly to R2? Or
         | perform the upload in chunks?
        
           | jacobwg wrote:
           | Uploading in chunks could definitely solve the issue, and the
           | OCI Distribution Specification does actually have some
           | language about an optional chunked push API[0].
           | 
           | Unfortunately very few of the registry clients actually
           | support this, critically containerd does not[1], so this
           | means your regular `docker push` and a whole lot of ecosystem
           | tooling does not work.
           | 
           | This also means that the single PUT must be able to support
           | very large pushes as a single request, possibly even larger
           | than what R2 or S3 would allow without using multipart
           | upload. This means you actually need a server to accept the
           | PUT, then do its own chunked upload to object storage or
           | otherwise stage the content before it's finally saved in
           | object storage.
           | 
           | This rules out presigned URLs for push too, since the PUT
           | request made to the presigned URL can be too large for the
           | backing object storage to accept.
           | 
            | There's also other processing that ideally happens on push
            | (like hash digest verification of the pushed layer) that
            | means a server somewhere needs to be involved.
           | 
           | [0] https://github.com/opencontainers/distribution-
           | spec/blob/mai...
           | 
           | [1] https://github.com/containerd/containerd/blob/192679b0591
           | 7b5...
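
For reference, the optional chunked push flow mentioned above looks roughly like the following sketch. It only builds the request sequence (no network I/O); the repo name and upload-session path are illustrative, while the endpoint shapes and Content-Range usage follow the OCI Distribution Specification:

```python
import hashlib

def chunked_push_requests(repo: str, blob: bytes, chunk_size: int):
    """Yield the (method, path, headers, body) sequence for an OCI
    chunked blob push, per the spec's optional chunked upload API.
    No network I/O; the /uploads/<id> path stands in for the Location
    URL a real registry would return."""
    digest = "sha256:" + hashlib.sha256(blob).hexdigest()
    # 1. Open an upload session.
    yield ("POST", f"/v2/{repo}/blobs/uploads/", {}, b"")
    # 2. Send the blob in chunks via PATCH with Content-Range headers.
    offset = 0
    while offset < len(blob):
        chunk = blob[offset:offset + chunk_size]
        headers = {
            "Content-Type": "application/octet-stream",
            "Content-Range": f"{offset}-{offset + len(chunk) - 1}",
            "Content-Length": str(len(chunk)),
        }
        yield ("PATCH", f"/v2/{repo}/blobs/uploads/<id>", headers, chunk)
        offset += len(chunk)
    # 3. Close the session with a PUT carrying the final digest.
    yield ("PUT", f"/v2/{repo}/blobs/uploads/<id>?digest={digest}", {}, b"")

# 250 bytes pushed in 100-byte chunks: 1 POST + 3 PATCHes + 1 PUT
reqs = list(chunked_push_requests("myorg/app", b"x" * 250, chunk_size=100))
```

The point of the sketch is that each PATCH is an independent, bounded-size request, which is exactly what a Worker with a 100-500MB body limit could accept if clients actually used this API.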
        
          | telgareith wrote:
          | What am I missing such that presigned URLs aren't the
          | solution to this issue?
        
           | andreasmetsala wrote:
           | R2 is ridiculously cheap compared to S3. The price difference
           | was more than 40x when I last looked at it.
        
              | compootr wrote:
              | Ridiculously cheap until their sales team shows up and
              | says otherwise!
        
               | kobalsky wrote:
               | Mind explaining?
        
               | fragmede wrote:
               | https://robindev.substack.com/p/cloudflare-took-down-our-
               | web...
        
         | zimbatm wrote:
         | Does the worker need to process the body?
         | 
         | Sometimes, the worker can return a signed URL and have the
         | client directly upload to R2.
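
A minimal stdlib sketch of the presigned-URL pattern zimbatm describes, using SigV4 query-string signing against R2's S3-compatible endpoint. The endpoint shape, the 'auto' region, and the parameter handling follow the standard S3 presigning scheme and have not been verified against R2, so treat this as illustrative only:

```python
import hashlib, hmac, urllib.parse
from datetime import datetime, timezone

def presign_put(bucket, key, access_key, secret_key, account_id, expires=3600):
    """Build a SigV4 query-string-presigned PUT URL for an R2
    (S3-compatible) bucket, using only the standard library."""
    host = f"{account_id}.r2.cloudflarestorage.com"
    region, service = "auto", "s3"
    now = datetime.now(timezone.utc)
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    datestamp = now.strftime("%Y%m%d")
    scope = f"{datestamp}/{region}/{service}/aws4_request"
    params = {
        "X-Amz-Algorithm": "AWS4-HMAC-SHA256",
        "X-Amz-Credential": f"{access_key}/{scope}",
        "X-Amz-Date": amz_date,
        "X-Amz-Expires": str(expires),
        "X-Amz-SignedHeaders": "host",
    }
    qs = urllib.parse.urlencode(sorted(params.items()))
    # Canonical request: method, path, query, headers, signed headers,
    # and UNSIGNED-PAYLOAD (presigned PUTs don't hash the body).
    canonical = "\n".join([
        "PUT", f"/{bucket}/{key}", qs, f"host:{host}\n", "host",
        "UNSIGNED-PAYLOAD",
    ])
    to_sign = "\n".join([
        "AWS4-HMAC-SHA256", amz_date, scope,
        hashlib.sha256(canonical.encode()).hexdigest(),
    ])
    def hkey(k, msg):
        return hmac.new(k, msg.encode(), hashlib.sha256).digest()
    # Derive the signing key from the secret via the SigV4 HMAC chain.
    k = hkey(("AWS4" + secret_key).encode(), datestamp)
    for part in (region, service, "aws4_request"):
        k = hkey(k, part)
    sig = hmac.new(k, to_sign.encode(), hashlib.sha256).hexdigest()
    return f"https://{host}/{bucket}/{key}?{qs}&X-Amz-Signature={sig}"
```

As jacobwg notes in the sibling thread, though, this only helps if the backing store can accept the whole PUT in one request, which is exactly where very large layers break down.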
        
         | koolba wrote:
         | > We are running a registry that does store content on R2[1],
         | and today this is implemented as the unholy chimera of
         | Cloudflare Workers, AWS CloudFront, Lambda@edge, regular
         | Lambda, S3, and R2.
         | 
         | What's the advantage over just using ECR? Cost of storage? Cost
         | of bandwidth to read? Hosting provider genetic diversity?
        
           | gyre007 wrote:
           | Cost of storage and egress cost are wildly more expensive on
           | ECR compared to CF+R2
        
            | Shakahs wrote:
            | ECR is _slow_. Despite being a static datastore presumably
            | backed by S3, it will only serve container image layers at
            | around 150 Mbps; when dealing with large (10GB) container
            | images this is a problem. R2 will happily serve the same
            | data at multi-gigabit speed.
        
        | mikeocool wrote:
        | Have any container running tools just implemented basic S3
        | compatibility for pushing/pulling images? If your registry
        | doesn't accept pushes from untrusted sources, it doesn't seem
        | like there is a ton of value in having "smarts" in the
        | registry server itself.
        | 
        | When you push, the client could just PUT a metadata file and
        | an object for each layer in the object store, and pulling
        | would just read the metadata file, which would tell it where
        | to get each layer. It could also use ETags to skip layers that
        | have already been downloaded.
        | 
        | For auth, just use the standard S3 auth.
        | 
        | It would be compatible with S3/R2/any other S3-compatible
        | storage.
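
The scheme described above fits in a few lines. Here the object store is just a dict standing in for S3/R2, and the key layout and manifest format are invented for illustration, not any real tool's format:

```python
import hashlib, json

# Toy model: an S3-like object store is just key -> bytes.
store: dict[str, bytes] = {}

def push(repo: str, tag: str, layers: list[bytes]) -> None:
    digests = []
    for layer in layers:
        d = "sha256:" + hashlib.sha256(layer).hexdigest()
        store[f"{repo}/blobs/{d}"] = layer       # content-addressed layer
        digests.append(d)
    meta = json.dumps({"layers": digests}).encode()
    store[f"{repo}/manifests/{tag}"] = meta      # the metadata file

def pull(repo: str, tag: str) -> list[bytes]:
    meta = json.loads(store[f"{repo}/manifests/{tag}"])
    return [store[f"{repo}/blobs/{d}"] for d in meta["layers"]]
```

Because layers are content-addressed, a client that already holds a blob with a given digest (or sees a matching ETag) can skip the download entirely.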
        
       | champtar wrote:
       | I would love if the container pull protocol stopped using custom
       | headers or content-type, so we could use any dumb http server.
        
         | mayli wrote:
         | this? https://github.com/NicolasT/static-container-registry and
         | this? https://github.com/jpetazzo/registrish
        
        | RyeCombinator wrote:
        | Great feat.
        | 
        | However, I am ever more confused now about what Cloudflare
        | does and builds. They have everything from CDN and DNS to
        | Orange Meets, and now this?
        
         | rozenmd wrote:
         | There's a developer platform: https://workers.cloudflare.com/
        
         | yazaddaruvala wrote:
         | My understanding is Cloudflare is a competitor of AWS, Azure,
         | and GCP.
        
        | yecuken wrote:
        | I'm using this registry with regctl[0] to chunk uploads (to
        | circumvent the 100MB limit), and it works just fine for huge
        | layers with models. With regctl you also get the 'mount' query
        | parameter for upload initialization with the proper blob name,
        | so you can skip the additional R2 copy during multipart upload
        | finalization, which speeds up the upload (and avoids crashes
        | on larger blobs). This is not part of the Docker registry API,
        | so I never got around to PRing it.
       | 
       | [0] https://github.com/regclient/regclient
        
        | miohtama wrote:
        | When you switch from a private Docker or GitHub registry to
        | Cloudflare, are you effectively just trading one vendor lock-
        | in for another, or is there more to this?
        
          | vineyardmike wrote:
          | None of these are really vendor lock-in. The registry
          | protocol is an open standard, and this is just one more
          | source that implements it. So you're only locked in insofar
          | as your data is stored _somewhere_, but that data is behind
          | an open API, so there's minimal risk there.
        
       | spikey_sanju wrote:
       | Interesting. Just wish it handled larger image layers a bit
       | better!
        
       | jzelinskie wrote:
       | If you are a CloudFlare employee reading this, you should get
       | involved with the OCI Distribution group that develops the
       | standards for the registry:
       | https://github.com/opencontainers/distribution-spec
        
          | gyre007 wrote:
          | OCI is demonstrably broken as a specification body, as
          | demonstrated by the referrers API. The Distribution spec as
          | it stands is just a very poorly written technical doc.
        
           | beeboobaa3 wrote:
           | Can you explain a bit more what you mean?
        
       | fswd wrote:
       | using this same architecture, it would be cool to build a
       | serverless-git
        
         | mdaniel wrote:
         | I would have thought for sure someone would have already tried
         | that, but regrettably trying to search for "serverless git"
         | coughs up innumerable references to the _framework_ that is
         | hosted on _GIThub_
         | 
         | Anyway, I was curious how much work such a stunt would be and
         | based on the git-http-backend docs <https://git.github.io/git-
         | scm.com/docs/git-http-backend#Docu...> it seems like there are
         | actually a manageable number of endpoints
         | 
         | I actually prefer their second scenario of splitting out access
         | to the objects and pack files but since doing that would still
         | require a function (or, ahem, a web server running) I suspect
         | that optimization is not within the scope of what you had in
         | mind
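
Per the git-http-backend docs referenced above, the smart HTTP protocol really does need only a handful of routes; roughly these (descriptions paraphrased, not an exhaustive or verified list):

```python
# The smart-HTTP routes git-http-backend serves; a Workers port would
# need handlers for roughly these, with packfile bodies stored in R2.
SMART_HTTP_ROUTES = {
    ("GET", "/info/refs?service=git-upload-pack"): "advertise refs for fetch/clone",
    ("POST", "/git-upload-pack"): "serve pack data for fetch/clone",
    ("GET", "/info/refs?service=git-receive-pack"): "advertise refs for push",
    ("POST", "/git-receive-pack"): "receive pack data on push",
}
```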
        
        | airocker wrote:
        | Is there a registry that would work on extremely cheap disk
        | storage if the use case is only push and very infrequent
        | pulls?
        
       | qudat wrote:
       | This is pretty nice. Does it support an API for deleting images
       | (and having it properly garbage-collected)? It looks like maybe
       | this does it? https://github.com/cloudflare/serverless-
       | registry/blob/13c4e...
       | 
       | We have a managed docker registry and could have definitely used
       | this project!
       | 
       | Slightly unrelated, but we've been experimenting with using SSH
       | for authenticating with a docker registry if anyone is
       | interested: https://github.com/picosh/tunkit?tab=readme-ov-
       | file#why
        
        | bravetraveler wrote:
        | Interesting approach, given that running one of these on your
        | 'LAN' is relatively easy
       | 
       | Though, to be fair, the pull-through mechanism has been kind of
       | goofy for years. Ask me how I know /s
        
       ___________________________________________________________________
       (page generated 2024-09-05 23:00 UTC)