[HN Gopher] Video streaming at scale with Kubernetes and RabbitMQ
___________________________________________________________________
Video streaming at scale with Kubernetes and RabbitMQ
Author : thunderbong
Score : 129 points
Date : 2023-10-09 17:51 UTC (5 hours ago)
(HTM) web link (alexandreolive.medium.com)
(TXT) w3m dump (alexandreolive.medium.com)
| com2kid wrote:
| This is nice if you only have to deliver in one format, but as
| soon as you want to show up on TVs you are stuck delivering in a
| _lot_ of formats, and life gets complicated quickly.
|
| Throw subtitles in multiple languages, and different audio
| tracks, into the mix, and all of a sudden streaming video becomes
| a nightmare.
|
| Finally, if you are dealing with copyrighted materials, you have
| to be aware of which country your user is physically in while
| accessing the videos, as you likely don't have a license to
| stream all your videos in every country at once.
|
| Throw this all into a blender and what is needed is a very fancy
| asset catalog management system, and that part right there ends
| up being annoyingly complicated.
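A minimal sketch of the kind of record such a catalog has to track, assuming each title carries a set of licensed countries (ISO 3166-1 alpha-2 codes) and that the viewer's country has already been resolved, e.g. via geo-IP. The `Asset` fields and the `playable_assets` helper are hypothetical; real catalogs also model license windows, per-territory metadata, and much more.

```python
from dataclasses import dataclass, field


@dataclass
class Asset:
    """One title in a hypothetical asset catalog."""
    asset_id: str
    renditions: list[str]          # e.g. ["1080p", "720p"]
    audio_languages: list[str]     # e.g. ["en", "fr"]
    subtitle_languages: list[str]  # e.g. ["en", "fr", "de"]
    # Countries where we hold streaming rights (ISO 3166-1 alpha-2).
    licensed_countries: set[str] = field(default_factory=set)


def playable_assets(catalog: list[Asset], country: str) -> list[str]:
    """Return the IDs of assets a viewer located in `country` may stream."""
    code = country.upper()
    return [a.asset_id for a in catalog if code in a.licensed_countries]
```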
| grzes wrote:
| "fancy asset catalog management system" - I was thinking about
| building such a solution lately. Do you know any open-source
| solutions of this kind?
| dbrueck wrote:
| Oh, this is just the tip of the iceberg. Many parts of on-
| demand video streaming are largely commoditized at this point.
| Add in support for linear (live) streaming and ad insertion and
| things start to get really interesting. :)
| mannyv wrote:
| Fuck K8s. You literally don't need it. Maybe he needs it because
| he's building on Google Cloud.
|
| AWS is easier, but you can do it with anything. The basic steps
| are:
|
| 1. Upload the file somewhere
| 2. Transcode it
| 3. Put the parts somewhere
| 4. Serve the parts
|
| You should really transcode everything into HLS. It's 2023, and
| everything that matters supports it. If you want 4k you can use
| HLS or the other thing (which I keep forgetting the acronym for).
|
| If you want to get fancy you can do rendition audio, which not
| everything supports. Rendition audio means sharing one audio
| stream among N video streams.
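In HLS, what the commenter calls rendition audio is expressed with `#EXT-X-MEDIA:TYPE=AUDIO` entries that share a `GROUP-ID`, which every `#EXT-X-STREAM-INF` variant then references through its `AUDIO` attribute (RFC 8216). The sketch below builds such a master playlist; the group name, URIs, and bandwidth figures are placeholders, and a production playlist would normally also carry a `CODECS` attribute.

```python
def master_playlist(video_variants, audio_tracks, group_id="aud"):
    """Build an HLS master playlist in which all video variants share one audio group.

    video_variants: list of (bandwidth_bps, "WIDTHxHEIGHT", uri) tuples
    audio_tracks:   list of (language, display_name, uri) tuples; the first is the default
    """
    lines = ["#EXTM3U"]
    # One EXT-X-MEDIA line per audio track, all in the same group.
    for i, (lang, name, uri) in enumerate(audio_tracks):
        default = "YES" if i == 0 else "NO"
        lines.append(
            f'#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="{group_id}",NAME="{name}",'
            f'LANGUAGE="{lang}",DEFAULT={default},AUTOSELECT=YES,URI="{uri}"'
        )
    # Each video-only variant points at the shared audio group instead of muxing audio in.
    for bandwidth, resolution, uri in video_variants:
        lines.append(
            f'#EXT-X-STREAM-INF:BANDWIDTH={bandwidth},RESOLUTION={resolution},AUDIO="{group_id}"'
        )
        lines.append(uri)
    return "\n".join(lines) + "\n"
```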
|
| You can use FFmpeg to transcode, but I'd suggest using AWS
| MediaConvert. It's cheap, fast, and probably does everything you
| want. Using FFmpeg directly works, but why bother. You will get
| an option wrong and screw everything up. You don't want your
| video to not work on some random device that 50k people are using
| in some country you didn't think about.
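For anyone who does drive FFmpeg directly, here is a minimal sketch that builds the argument list for a single H.264/AAC HLS rendition. The bitrates, the `main` profile, and the 6-second segments are illustrative assumptions, not a device-compatibility recommendation; forcing keyframes at segment boundaries keeps chunk lengths close to `-hls_time`.

```python
def hls_transcode_cmd(src, out_dir, height=720, video_bitrate="3000k", segment_seconds=6):
    """Return an ffmpeg argv that transcodes `src` into one H.264/AAC HLS rendition."""
    return [
        "ffmpeg", "-i", src,
        "-vf", f"scale=-2:{height}",  # keep aspect ratio, force an even width
        "-c:v", "libx264", "-profile:v", "main", "-b:v", video_bitrate,
        # Put a keyframe at every segment boundary so segments can be cut cleanly.
        "-force_key_frames", f"expr:gte(t,n_forced*{segment_seconds})",
        "-c:a", "aac", "-b:a", "128k",
        "-f", "hls",
        "-hls_time", str(segment_seconds),  # target segment length in seconds
        "-hls_playlist_type", "vod",
        "-hls_segment_filename", f"{out_dir}/seg_%03d.ts",
        f"{out_dir}/index.m3u8",
    ]
```

Run it with `subprocess.run(hls_transcode_cmd("input.mp4", "out"), check=True)` after creating `out/`. A full bitrate ladder repeats this per resolution, which is exactly the kind of option juggling the comment warns about.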
|
| He's using RabbitMQ, but you should use SQS, because SQS can
| trigger Lambdas, which means no polling is required. But use
| whatever queue you want.
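With an SQS event source mapping, the Lambda service does the polling for you and invokes the function with a batch in `event["Records"]`, each message's text in `body`. This sketch assumes the producer sends JSON with `bucket` and `key` fields, and that `ReportBatchItemFailures` is enabled on the mapping so that returning `batchItemFailures` retries only the failed messages. `transcode` is a hypothetical placeholder.

```python
import json


def transcode(bucket: str, key: str) -> None:
    """Hypothetical placeholder: submit a transcode job (e.g. to MediaConvert)."""
    print(f"would transcode s3://{bucket}/{key}")


def handler(event, context):
    """Lambda entry point for an SQS event source mapping."""
    failures = []
    for record in event["Records"]:
        try:
            job = json.loads(record["body"])
            transcode(job["bucket"], job["key"])
        except Exception:
            # Report just this message as failed; the rest of the batch is deleted.
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}
```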
|
| You can kick the process off by attaching a Lambda to S3, which
| will start the process when the file is uploaded.
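An S3 event notification delivers records whose object key is URL-encoded (a space arrives as `+`), so it has to be decoded before use. The sketch extracts bucket and key from each record and starts processing only for video-looking uploads; the extension filter and the idea of enqueueing a job at that point are assumptions.

```python
from urllib.parse import unquote_plus


def uploaded_objects(event):
    """Yield (bucket, key) for each object in an S3 event notification."""
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Keys arrive URL-encoded, so decode them before using them.
        key = unquote_plus(record["s3"]["object"]["key"])
        yield bucket, key


def handler(event, context):
    """Start processing for each uploaded video file; ignore everything else."""
    started = []
    for bucket, key in uploaded_objects(event):
        if key.lower().endswith((".mp4", ".mov")):
            started.append(key)  # hypothetical: enqueue a transcode job here
    return {"started": started}
```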
|
| You can kick your "availability activation" off by attaching a
| Lambda to the S3 output bucket.
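The output bucket receives many objects per video (every segment and every playlist), so an activation hook has to ignore most of them. This sketch assumes a hypothetical `<video_id>/master.m3u8` layout and flips an in-memory flag when the master playlist appears; whether that file is written last depends on the transcoder, so a real system may prefer a job-completion notification instead.

```python
from urllib.parse import unquote_plus

MASTER_NAME = "master.m3u8"  # assumed name of the top-level playlist
AVAILABLE = set()            # stand-in for an "is available" flag in a database


def activation_targets(event):
    """Return the IDs of videos whose master playlist just landed in the output bucket."""
    ids = []
    for record in event.get("Records", []):
        key = unquote_plus(record["s3"]["object"]["key"])
        video_id, _, name = key.rpartition("/")
        if name == MASTER_NAME and video_id:
            ids.append(video_id)
    return ids


def mark_available(video_id):
    """Hypothetical placeholder for publishing the video to users."""
    AVAILABLE.add(video_id)


def handler(event, context):
    for video_id in activation_targets(event):
        mark_available(video_id)
```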
|
| Background: I help run a streaming service and built the backend
| pipeline.
|
| This omits the entire "metadata management and analytics" side as
| well. That's left as an exercise for the user.
| jonnycoder wrote:
| What would you recommend using as an alternative to being
| locked into AWS?
| jiggawatts wrote:
| This post is somewhat unfairly voted down.
|
| Cloud services like S3 and Azure Storage were invented
| specifically for hosting images and video. That's their origin
| story, their foundation, their very reason for being.
|
| Similarly, cloud functions / lambda were invented for
| background processing of blobs. The first demos were always of
| resizing images!
|
| Building out this infrastructure yourself is a little insane.
| Unless you're Netflix, don't bother. Just dump your videos into
| blobs.
|
| It's like driving to your cousin's place, but step one is
| building your own highway because you couldn't be bothered to
| check the map to see if one already existed.
|
| PS: Netflix serves video from BSD running directly on bare
| metal because at that scale efficiency matters! If efficiency
| doesn't matter _that much_, use blobs. Kubernetes is going to
| be even worse.
| totallyunknown wrote:
| While the article provides guidance on utilizing standard
| software and services to construct a basic video upload platform,
| it lacks deeper insights into advanced scaling techniques.
| andrewstuart wrote:
| I have to ask, why bother with Kubernetes and all the associated
| config and pain? Why not just start a new spot instance? I can't
| see any reason for Kubernetes in this architecture even though
| it's the title of the post.
|
| Also, personally I wouldn't use RabbitMQ ... it's pretty
| heavyweight... there are lots of lightweight queues out there.
| Overall this architecture looks like it could be simplified.
|
| Also, the post doesn't mention whether the video encoding uses GPU
| hardware acceleration. That makes a big difference, especially if
| using spot instances .... ffmpeg on the CPU is extremely
| computationally expensive.
|
| Presumably all input videos need reencoding to convert them to
| HLS.
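One way to act on the GPU point is to check what the installed FFmpeg build offers before choosing an encoder. The sketch parses `ffmpeg -hide_banner -encoders` and prefers NVIDIA's `h264_nvenc` when it is listed, falling back to the CPU encoder `libx264`. A listed encoder can still fail at runtime without a supported GPU and driver, so treat this as a first filter only.

```python
import shutil
import subprocess


def available_encoders() -> str:
    """Return the text of `ffmpeg -hide_banner -encoders`, or "" if ffmpeg is not installed."""
    if shutil.which("ffmpeg") is None:
        return ""
    result = subprocess.run(
        ["ffmpeg", "-hide_banner", "-encoders"],
        capture_output=True, text=True, check=False,
    )
    return result.stdout


def pick_h264_encoder(encoders_text: str) -> str:
    """Prefer NVENC (GPU) if this ffmpeg build lists it, else fall back to libx264 (CPU)."""
    for line in encoders_text.splitlines():
        fields = line.split()
        # Lines look like: " V....D h264_nvenc   NVIDIA NVENC H.264 encoder (codec h264)"
        if len(fields) >= 2 and fields[1] == "h264_nvenc":
            return "h264_nvenc"
    return "libx264"
```

Typical use is `pick_h264_encoder(available_encoders())`, passing the result as the `-c:v` value.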
| malux85 wrote:
| This is what I was wondering. In the article it looks like
| Kubernetes is just used to launch the node containers - why are
| the database and RabbitMQ outside of Kubernetes? This
| architecture looks like it's been cobbled together by a junior.
| baq wrote:
| There are some of us who still perform four extra steps before
| putting any DB in k8s, and we have good reasons.
| robertlagrant wrote:
| Kubernetes loves stateless services. Zero wrong with moving
| RabbitMQ or a database outside of it.
| malux85 wrote:
| Except kubernetes has a whole storage provisioning system
| that gives you redundancy and automatic failover, if you're
| going to the trouble of running kubernetes why not just run
| your whole infra on it?
|
| I run https://atomictessellator.com solo, using kubernetes,
| and my database, Minio object store, application servers,
| quantum workers, everything is all on kubernetes, it's self
| healing and much simpler to run all the infrastructure the
| same.
|
| Recently I had a node failure while I was sleeping and the
| whole system healed itself while I slept, the monitoring
| system didn't even alarm me because the small blip of
| increased latency while the pods rebalanced wasn't above
| the alert threshold so it didn't even wake me up.
|
| What happens in the article's infra when the RabbitMQ or
| database nodes fail? The whole system goes offline, which
| seems a very silly setup when you have Kubernetes sitting
| right there, whose primary function is to handle all of
| this.
| robertlagrant wrote:
| What happens when your storage detaches from your k8s
| cluster? Your services start 503ing, hopefully, because
| you didn't design your system thinking that k8s == 100%
| uptime.
| malux85 wrote:
| Anybody can invent random problems ad nauseam - that
| doesn't prove anything.
|
| I'm not claiming that it's totally bullet proof, I never
| said that - I'm saying that if you had a kubernetes
| cluster anyway why not benefit from its abilities?
| Especially when the alternative is single node, single
| points of failure, which is clearly inferior.
|
| The "what if the storage detaches" argument could easily
| apply to the single node VMs too, in which case the
| outcome would be a total system failure.
|
| We are discussing the contrast between the article's
| architecture and running everything on K8s ... and I'm
| saying that running everything on K8s is clearly better.
| pyrophane wrote:
| Why do you say RabbitMQ is heavyweight? What queues do you
| consider more lightweight and what would be your go-to in a
| situation like this?
| alexandreolive wrote:
| Hello, I'm the writer of the article. We are using Kubernetes
| for our whole architecture, consisting of around 40
| microservices and cron jobs. In this article, I just wanted to
| give an example of asynchronous architecture using Kubernetes
| and RabbitMQ.
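The article doesn't show its worker code. As a hedged sketch, a RabbitMQ transcode worker using the `pika` client (an assumption) would typically use manual acknowledgements and `prefetch_count=1`, so each pod holds one long-running job at a time and unacknowledged work is redelivered if a pod dies. The queue name and JSON message format are hypothetical.

```python
import json


def make_callback(process):
    """Wrap `process(job_dict)` as a pika on_message_callback with manual acks."""
    def on_message(channel, method, properties, body):
        try:
            process(json.loads(body))
        except Exception:
            # Reject without requeue so a poison message can't loop forever;
            # if the queue has a dead-letter exchange, RabbitMQ routes it there.
            channel.basic_nack(delivery_tag=method.delivery_tag, requeue=False)
        else:
            # Ack only after the work succeeded, so a crash mid-encode means redelivery.
            channel.basic_ack(delivery_tag=method.delivery_tag)
    return on_message


def run_worker(process, queue="video.transcode", host="localhost"):
    """Consume jobs forever; needs the third-party `pika` package."""
    import pika
    connection = pika.BlockingConnection(pika.ConnectionParameters(host=host))
    channel = connection.channel()
    channel.queue_declare(queue=queue, durable=True)
    channel.basic_qos(prefetch_count=1)  # at most one unacked job per worker
    channel.basic_consume(queue=queue, on_message_callback=make_callback(process))
    channel.start_consuming()
```

Each pod would call `run_worker(my_process_fn)`; scaling the Deployment then adds consumers, and RabbitMQ spreads messages across them.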
|
| We are using RabbitMQ because it's my company's target solution.
| There might be a better or lighter solution that would fit us, but
| having just one for every solution is easier to maintain.
|
| Great comment about GPU hardware acceleration for encoding, I'm
| going to look this up.
| andrewstuart wrote:
| So Kubernetes is only in this architecture because other
| systems use it and it's required by the parent company, but it's
| not needed.
|
| That's pretty important context.
| alexandreolive wrote:
| That's not what I said; sorry if that was not clear. The
| parent company requires RabbitMQ, we are using Kubernetes
| because managing 40 microservices without it would be hell.
| In the article, I only showed 1 user-facing API, but it's
| actually multiple services, I just did not want to
| complicate it too much.
| mihaitodor wrote:
| I believe loads of auxiliary microservices have been omitted
| for brevity. Of course, those also don't require Kubernetes,
| but maybe they have some standardised deployment system which
| keeps things manageable. Don't forget about Observability and
| whatnot.
| schott12521 wrote:
| I thoroughly appreciated this article as I've been building a
| short-form video content streaming service and the performance
| hasn't been what I expected.
|
| Granted, I knew that my service needs to be able to scale at
| different bottlenecks, but a lot of "build your own video
| service!" tutorials start with:
|
| - Build a backend, return a video file
|
| - Build a frontend, embed the video
|
| And that leaves a lot to be desired in terms of performance. I
| think the actual steps should be:
|
| - Build a backend that consists of:
|   - Video ingestion service
|   - Video upload / processing service that saves the video into
|     chunks
|   - Streaming service that returns video chunks
|
| - Build a frontend that consists of:
|   - Build or use a video streaming library that can play video
|     chunks as a stream
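To make the "chunks" idea concrete: with HLS, the player first fetches a small text playlist listing the chunks, then downloads them in order, so playback can start after the first chunk arrives. The sketch generates a VOD media playlist assuming fixed-length chunks named `seg_000.ts`, `seg_001.ts`, and so on; in practice FFmpeg's HLS muxer writes this file for you, with lengths that follow the actual keyframes.

```python
import math


def media_playlist(duration_s: float, segment_s: float = 6.0, name: str = "seg_{:03d}.ts") -> str:
    """Build an HLS VOD media playlist for a video cut into fixed-length chunks."""
    count = math.ceil(duration_s / segment_s)
    lines = [
        "#EXTM3U",
        "#EXT-X-VERSION:3",  # version 3 allows fractional EXTINF durations
        f"#EXT-X-TARGETDURATION:{math.ceil(segment_s)}",
        "#EXT-X-MEDIA-SEQUENCE:0",
        "#EXT-X-PLAYLIST-TYPE:VOD",
    ]
    for i in range(count):
        length = min(segment_s, duration_s - i * segment_s)  # the last chunk may be shorter
        lines.append(f"#EXTINF:{length:.3f},")
        lines.append(name.format(i))
    lines.append("#EXT-X-ENDLIST")  # tells the player no more chunks will be added
    return "\n".join(lines) + "\n"
```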
|
| Edit: From the author's links, I found this website which is very
| informative: https://howvideo.works/
| John23832 wrote:
| I built a similar project, and had great results with
| Cloudflare Stream.
| mmcclure wrote:
| I helped work on howvideo.works, fun to see it helping people!
| The world of video is, I'd argue, one of those technical spaces
| that is extremely iceberg-y. You can get reasonably far
| using S3 + the HTML5 video tag, which I think creates a
| perception among some that video is just images but a little
| bigger, but that couldn't be further from the truth. You can
| really pick just about any step along the video pipeline from
| production to playback and go as deep for as many years as
| you'd like.
|
| This is both a semi-shameless plug _and_ probably a few levels
| deeper than what you're looking for, but I organize a
| conference for video developers called Demuxed. The YouTube
| channel[1] has 8 years worth of conference videos about
| streaming video (and the 9th year is happening in a couple of
| weeks). The bullet points you mentioned are definitely covered
| across a few talks, but it's certainly not in any kind of "how
| to" format.
|
| [1]: https://youtube.com/demuxed
| alexandreolive wrote:
| I'm the writer of the article; I LOVE howvideo.works. It
| helped me quite a lot when I started working on video
| processing. I'm still a beginner and always fall back to it
| when I'm unsure about something fundamental. Thanks for your
| work. I'll take a look at your YouTube channel.
| Uehreka wrote:
| Something like this?
| https://github.com/streamlinevideo/streamline
| andrewstuart wrote:
| >> I've been building a short-form video content streaming
| service
|
| What does it do?
| schott12521 wrote:
| Right now I'm basically trying to just re-create the TikTok /
| Youtube Shorts / Instagram Reels experience of infinitely
| scrolling videos.
|
| Mostly just building for fun though.
| alexandreolive wrote:
| I'm the writer of the article; thanks for your lovely comment.
| I skipped many essential parts of the architecture in the
| article to keep it concise. The following articles will be
| about the technical implementation of what I discussed in this
| one.
| FaisalMahmoud wrote:
| For general video streaming, Mux.com has greatly decreased my
| development time. Getting playback working is straightforward.
| And for advanced use cases, like real time editing and preview in
| a web browser, it works as expected and doesn't get in the way.
| dvliman wrote:
| I built a similar video pipeline, not on Kubernetes but using EC2
| instances for the hungry FFmpeg encoders.
|
| The system differs in that it was not user-generated video
| content. It came from the cameras in our fitness studio.
|
| Here is the article if anyone is interested in reading about it:
| https://dev.to/dvliman/building-a-live-streaming-app-in-cloj...
| devgoth wrote:
| awesome article! curious -- why clojure?
| robinduckett wrote:
| Because CS Degree
| dvliman wrote:
| No specific reason. It could have been built in any language.
| It was just the language we were using and enjoyed at that
| time.
| thomasjudge wrote:
| OP mentions that "I would love to be a little mouse and peek at
| YouTube's complete architecture to see how far we are from them."
| You can occasionally find posts (often linked here) from another
| player in streaming video which you might have heard of,
| discussing technical architecture. For example, this might be a
| little lower level than you may be interested in, as it relates to
| kernel optimizations to jack up bit throughput rates, but I dig
| this sort of thing:
|
| https://www.youtube.com/watch?v=36qZYL5RlgY
| andrewstuart wrote:
| Must be expensive to run on Google Cloud.
|
| Also looks pretty complex.
|
| The stabilization step presumably does a video encode .... that's
| extremely expensive in terms of time, compute and money. I wonder
| why it's necessary.
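The article doesn't say how stabilization is done. For a sense of the cost being discussed, one common open-source route is FFmpeg's libvidstab filters (available only in builds configured with `--enable-libvidstab`), which take two passes: a motion-analysis pass that writes a transforms file, then a pass that applies it and re-encodes. The file names are placeholders.

```python
def stabilize_cmds(src: str, dst: str, transforms: str = "transforms.trf") -> list[list[str]]:
    """Return the two ffmpeg invocations for vidstab-based stabilization.

    Pass 1 only analyses motion (decode, output discarded); pass 2 applies the
    correction and re-encodes, so the clip is decoded twice and encoded once.
    """
    detect = [
        "ffmpeg", "-i", src,
        "-vf", f"vidstabdetect=result={transforms}",
        "-f", "null", "-",  # discard the decoded frames; we only want the transforms file
    ]
    transform = [
        "ffmpeg", "-i", src,
        "-vf", f"vidstabtransform=input={transforms}",
        "-c:v", "libx264", "-c:a", "copy", dst,
    ]
    return [detect, transform]
```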
| tehlike wrote:
| I was thinking the same. CF on the front would improve on it
| but still.
|
| Hetzner or other bare metal providers would probably be a
| better idea.
| hotnfresh wrote:
| CF meaning Cloudflare? If you're serving video through them,
| then you're in "enterprise plan" territory. You can't do that
| on the free or "self-serve" paid plans. $5k+/m depending on
| bandwidth needs (and if you just need a cdn to push bits, CF
| won't be competitive on price--their enterprise prices are
| tailored for companies that want all sorts of managed
| services and private networking stuff)
| alexandreolive wrote:
| Hello, I'm the writer of the article. Our solution gets videos
| from random people who present products we sent them. We get
| dodgy videos filmed on bad devices, and the process of
| contacting the user and getting him to re-upload another video
| in better quality is time-consuming for our team. We'd rather
| spend a little bit more in computing to try and save time
| overall. I hope this answers your question.
| latchkey wrote:
| Not necessarily. GCP, when used correctly, can be super cheap.
| You also don't know the contractual deals they have with GCP.
| klaussilveira wrote:
| I wonder if it wouldn't be cheaper to run an on-prem farm of
| BestBuy-grade "gamer PC" for smaller scale networks like that.
| andrewstuart wrote:
| Slap one of these puppies in....
|
| AMD Alveo MA35D Media Accelerator
|
| https://www.xilinx.com/applications/data-center/video-imagin...
| goeiedaggoeie wrote:
| I've used Xilinx a fair bit for encoding. Once you get past
| the pain of compiling your tooling for it, it does speed up
| VOD encoding significantly.
___________________________________________________________________
(page generated 2023-10-09 23:00 UTC)