[HN Gopher] YouTube is now building its own video-transcoding chips
___________________________________________________________________
YouTube is now building its own video-transcoding chips
Author : beambot
Score : 232 points
Date : 2021-05-04 06:31 UTC (16 hours ago)
(HTM) web link (arstechnica.com)
(TXT) w3m dump (arstechnica.com)
| gigatexal wrote:
| Honestly I'm surprised they didn't do this earlier
| colonwqbang wrote:
| If anything it's just strange that such a part didn't exist
| before (if it really didn't). Accelerated encoding is hugely more
| power efficient than software encoding.
| wishysgb wrote:
| Well, I do believe video-transcoding chips have been around
| forever. But I think these ones are tailored to their exact
| application, making them more efficient.
| fireattack wrote:
| The alternative to "building its own video-transcoding chips"
| isn't just software encoding though. Google/Youtube could have
| already been using hardware encoders, just with generic GPUs
| or whatever existing hardware.
| izacus wrote:
| I've worked on an IPTV broadcasting system and this isn't as
| obvious as you'd think.
|
| The big issue here is quality - most hardware encoders hugely
| lag behind software encoders. By quality I mean visual quality
| per bit of bandwidth. Which means that using a quality software
| encoder like x264 will save you a massive amount of money in
| bandwidth costs, because you can simply go significantly lower
| in bitrate than you can with a hardware encoding block.
|
| At the time, our comparisons showed that you could get away
| with as low as 1.2 Mbps for a 720p stream, where with an
| enterprise HW encoder you'd have to do about 2-3 Mbps to get
| the same picture quality.
|
| That's one consideration. The other consideration is density -
| at the time most hardware encoders could do up to about 4
| streams per 1U rack unit. Those rack units cost about half the
| price of a fully loaded 24+ core server. Even GPUs like nVidia
| at the time could do at most 2 encoding sessions with any kind
| of performance. On CPU, we could encode 720p on about 2 Xeon
| cores which means that a fully loaded server box with 36+ cores
| could easily do 15-20 sessions of SD and HD and we could scale
| the load as necessary.
|
| And the last consideration was price - all HW encoders were
| significantly more expensive than buying large core count rack-
| mount servers. Funny enough, many of those "HW specialised
| encoding" boxes were running x86 cores internally too so they
| weren't even more power efficient.
|
| So in the end the calculation was simple - software encoding
| saved a ton of money on bandwidth, it allowed a better quality
| product because we could deliver high quality video to people
| with poor internet connectivity, it made procuring hardware
| simple, it made the solution more scalable, and all that at the
| cost of some power consumption. Easy trade. Of course the
| calculation is a bit different with modern formats like VP9 and
| H.265/HEVC - the software encoders are still very CPU intensive,
| so it might make sense to buy cards these days.
|
| Of course, we weren't Google and couldn't design and
| manufacture our own hardware. But seeing the list of codecs
| YouTube uses, there's also one more consideration: flexibility.
| HW encoding blocks are usually very limited in what they can do
| - most of them will do H.264, some of them will stretch to
| H.265 and maaaaaybe VP9. CPUs will encode into everything. Even
| when a new format is needed, you just deploy new software, not
| a whole chip.
| jng wrote:
| Very interesting description. Are you familiar at all with
| the details of FPGAs for these very same tasks, especially
| the EV family of Xilinx Zynq Ultrascale+ MPSoC? They include
| hardened video codec units, but I don't know how they compare
| quality/performance-wise. Thanks!
| izacus wrote:
| I'm afraid I don't have any experience with those devices.
| Most HW encoders however struggle with one thing - the fact
| that encoding is very costly when it comes to memory
| bandwidth.
|
| The most important performance/quality related process in
| encoding is motion estimation: the encoder takes each block
| (piece) of the frame being encoded and scans the reference
| (previous) frame to see whether that block already exists
| there and where it moved from. The larger the area the codec
| scans, the more likely it is to find where the piece of
| image came from. This allows it to write just a motion
| vector instead of actually encoding image data.
|
| This process is hugely memory bandwidth intensive and most
| HW encoders severely limit the area each thread can access
| to keep memory bandwidth costs down and performance up.
| This is also a fundamental limitation for CUDA/gpGPU
| encoders, where you're also facing a huge performance loss
| if there's too much memory accessed by each thread.
|
| Most "realtime" encoders severely limit the macroblock scan
| area because of how expensive it is - which also makes them
| significantly less efficient. I don't see FPGAs really
| solving this issue - I'd bet more on Intel/nVidia encoding
| blocks paired with copious amount of onboard memory. I
| heard Ampere nVidia encoding blocks are good (although they
| can only handle a few streams).
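|
| A minimal sketch of that block-matching search (plain Python
| with NumPy, illustrative only - real encoders use SAD/SATD in
| SIMD or fixed-function logic); note how the reference area
| touched grows with the square of the search range:
|
|   import numpy as np
|
|   def motion_search(ref, cur, bx, by, block=16, search=16):
|       # Find where the block at (bx, by) of the current frame
|       # came from in the reference frame, by minimizing the
|       # sum of absolute differences. Every extra pixel of
|       # `search` range means touching (block + 2*search)^2
|       # reference pixels - the memory-bandwidth cost above.
|       target = cur[by:by+block, bx:bx+block].astype(np.int32)
|       best, best_mv = None, (0, 0)
|       for dy in range(-search, search + 1):
|           for dx in range(-search, search + 1):
|               y, x = by + dy, bx + dx
|               if (y < 0 or x < 0 or y + block > ref.shape[0]
|                       or x + block > ref.shape[1]):
|                   continue
|               cand = ref[y:y+block, x:x+block].astype(np.int32)
|               sad = np.abs(target - cand).sum()
|               if best is None or sad < best:
|                   best, best_mv = sad, (dx, dy)
|       return best_mv, best
|
|   # Toy check: a frame shifted right by 3 px gives mv (-3, 0).
|   ref = np.random.randint(0, 255, (64, 64), dtype=np.uint8)
|   cur = np.roll(ref, 3, axis=1)
|   print(motion_search(ref, cur, 32, 32))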
| spuz wrote:
| That is interesting context for this quote from the
| article:
|
| > "each encoder core can encode 2160p in realtime, up to
| 60 FPS (frames per second) using three reference frames."
|
| Apparently reference frames are the frames that a codec
| scans for similarity in the next frame to be encoded. If
| it really is that expensive to reference a single frame
| then it puts into perspective how effective this VPU
| hardware must be to be able to do 3 reference frames of
| 4K at 60 fps.
| daniellarusso wrote:
| I always thought of reference frames as like the sampling
| rate, so in that sense, is it how few reference frames
| can it get away with, without being noticeable?
|
| Would that also depend on the content?
|
| Aren't panning shots more difficult to encode?
| izacus wrote:
| > I always thought of reference frames as like the
| sampling rate, so in that sense, is it how few reference
| frames can it get away with, without being noticeable?
|
| Actually not quite - "reference frames" means how far
| back (or forward!) the encoded frame can reference other
| frames. In plain words, "max reference frames 3" means
| that frame 5 in a stream can say "here goes block 3 of
| frame 2" but isn't allowed to say "here goes block 3 of
| frame 1" because that's out of range.
|
| This has obvious consequences for decoders: they need enough
| memory to keep that many decoded, uncompressed frames around
| in case a future frame references them. It also has
| consequences for encoders: while they don't have to reference
| frames far back, it increases efficiency if they can reuse the
| same stored block of image data across as many frames as
| possible. This of course means that they need to scan more
| frames for each processed input frame to try to find as much
| reusable data as possible.
|
| You can easily get away with "1" reference frame (MPEG-2 has
| this limit, for example), but it'll encode the same data
| multiple times, lowering overall efficiency and leaving less
| space to store detail.
|
| > Would that also depend on the content?
|
| It does depend on the content - in my testing it works best
| for animated content, because the visuals are static for a
| long time, so referencing data from half a second ago makes
| a lot of sense. It doesn't add a lot for content with a lot
| of scene cuts and action, like a Michael Bay movie combat
| scene.
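|
| A toy model of that window (a sketch, not any real codec's
| API): the decoder keeps at most `max_refs` decoded pictures
| around, and a block may only point back into that buffer.
|
|   from collections import deque
|
|   class RefWindow:
|       # Decoded-picture buffer limited to `max_refs` frames.
|       def __init__(self, max_refs=3):
|           self.frames = deque(maxlen=max_refs)  # oldest drops
|
|       def push(self, decoded_frame):
|           self.frames.append(decoded_frame)
|
|       def can_reference(self, frames_back):
|           # "Copy this block from N frames ago" is only legal
|           # if that frame is still held in the buffer.
|           return 1 <= frames_back <= len(self.frames)
|
|   dpb = RefWindow(max_refs=3)
|   for n in range(5):
|       dpb.push(f"frame {n}")
|   print(dpb.can_reference(3))  # True  - still in the window
|   print(dpb.can_reference(4))  # False - already evicted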
| colonwqbang wrote:
| My work is in IP cameras so I'm aware of these tradeoffs.
|
| I guess what I didn't expect is that Google could design
| their own encoder IP that beats the current offerings by a
| big factor at the task of general video coding. I guessed
| that Google had actually built an ASIC with customised IP
| blocks from some other vendor.
|
| But maybe Google did do just that?
| kevincox wrote:
| At least in Google's case, YouTube videos are usually
| transcoded in idle datacenters (for example locations where the
| locals are sleeping). This means that the cost of CPU is much
| lower than a naive estimate. These new accelerators can only be
| used for transcoding video; the rest of the time they will sit
| idle (or you will keep them loaded but the regular servers will
| be idle). This means that the economics aren't necessarily an
| obvious win.
|
| Of course if you do enough transcoding that you are buying
| servers for the job then these start to save money. So I guess
| someone finally decided that the R&D would likely pay off due
| to the current combination of cyclical traffic, adjustable load
| and the cost savings of the accelerator.
| p0nce wrote:
| There is one such chip in your phone and in your GPU.
| brigade wrote:
| Intel has had PCIe cards targeted at this market, reusing their
| own HW encoder, e.g. the VCA2 could do up to 14 real-time 4K
| transcodes at under 240W, and the upcoming Xe cards would
| support VP9 encode. (XG310 is similar albeit more targeted at
| cloud gaming servers)
| dogma1138 wrote:
| These PCIe cards just run a low power Xeon CPU with the iGPU
| doing the majority of the heavy lifting.
|
| It was always an interesting and weird product; it even runs
| its own OS.
| sidcool wrote:
| For YouTube's scale it makes sense, since a small saving or
| efficiency boost would accumulate at their scale.
| kevingadd wrote:
| Not just cost reduction or efficiency, the faster encodes you
| can get through dedicated hardware mean they can potentially
| reduce the delay between a video being uploaded and a video
| being available to the public (right now even if you don't
| spend time waiting in the processing queue, it takes a bit for
| your videos to get encoded)
|
| You can handle larger volumes of incoming video by spinning up
| more encoder machines, but the only solution for lowering
| latency is _faster encodes_, and with the way the CPU and GPU
| markets are these days a dedicated encoder chip is probably
| your best bet.
| Dylan16807 wrote:
| You can split a video up across cores or even across servers.
| Encoding speed does not need to have a significant impact on
| publishing latency.
| mauricio wrote:
| Impressive. I wonder if Google will sell servers with these cards
| via Google Cloud. Seems like it could be pretty competitive in
| the transcoding space and also help them push AV1 adoption.
| jankeymeulen wrote:
| You can transcode as a service on Google Cloud:
| https://cloud.google.com/transcoder/docs
| vhiremath4 wrote:
| I'm just always blown away that Google transcodes into as many
| formats as they do upfront. I wonder if they do a mix of just in
| time transcoding on top of queue-based.
| sgarland wrote:
| For VP9/x264, almost certainly not. If you jump on a newly-
| uploaded video, you'll see that higher resolution comes later.
| It's common to see 720p nearly immediately, then 1080p, then
| 4K.
|
| For pre-x264 codecs, they probably could, but between the
| relatively small file sizes at the low resolutions those
| codecs would be supporting, and the cost difference between
| compute and storage, I'd bet everything is encoded beforehand.
| kevincox wrote:
| > Google's warehouse-scale computing system.
|
| That is quite the understatement. Google's computing system is
| dozens of connected "warehouses" around the world.
| spuz wrote:
| > Google probably only provides stats about growth (like "500
| hours of video are uploaded to YouTube every minute") because the
| total number of videos is so large, it's an unknowable amount.
|
| I suppose you could sample random YouTube urls to find out how
| many of them link to public videos. Given the total number of
| possible URLs, it would give you an idea of what percentage of
| them have been used and therefore how many videos YouTube has in
| total. It would not tell you how many private videos or Google
| Drive / Photos videos exist of course.
| warent wrote:
| It doesn't seem like this would work. I think you could sample
| trillions of YouTube IDs with a high likelihood of all of them
| being unused. They're supposed to be unique after all
| espadrine wrote:
| Let's do the math.
|
| IDs are 64-bit integers. The number of tries before an event
| with probability P occurs is a geometric distribution. If V
| is the number of valid IDs (that have a video), the number of
| tries is 2^64/V. Assuming 1 megatries per second, since we
| can parallelize it, we would find the first video in 20
| seconds on average, with a conservative estimate of V = 10^12
| (a trillion videos).
|
| To have a sample of ~100 videos, it'd take about half an
| hour.
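|
| Working that estimate through (same assumptions: a 2^64 ID
| space, V = 10^12 valid videos, 10^6 tries per second):
|
|   ID_SPACE = 2 ** 64        # assumed 64-bit ID space
|   VALID = 10 ** 12          # assumed one trillion videos
|   TRIES_PER_SEC = 10 ** 6   # assumed aggregate probe rate
|
|   mean_tries = ID_SPACE / VALID   # mean of the geometric dist.
|   secs_per_hit = mean_tries / TRIES_PER_SEC
|   print(f"{secs_per_hit:.0f} s per hit")               # ~18 s
|   print(f"{100 * secs_per_hit / 60:.0f} min for 100")  # ~31 min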
| chrisseaton wrote:
| What do you mean 'supposed to be unique'? How can an ID not
| be unique?
| zakki wrote:
| Maybe when they reach 62^11 + 1?
| rococode wrote:
| Clicked into a couple random videos, looks like all of their
| video IDs are 11 characters, alphanumeric with cases. So
| 26+26+10 = 62 choices for each char, 62^11 = 5.2e+19 = 52
| quintillion unique IDs (52 million trillions).
|
| So, yeah, sampling would be a mostly futile effort since
| you're looking to estimate about 8 to 10 decimal digits of
| precision. Though it's technically still possible since you'd
| expect about 1 in every 50 million - 5 billion IDs to work
| (assuming somewhere between a trillion and 10 billion
| videos).
|
| My statistics knowledge is rusty, but I guess if you could
| sample, say, 50 billion urls you could actually make a very
| coarse estimate with a reasonable confidence level. That's a
| lot but, ignoring rate limits, well within the range of usual
| web-scale stuff.
| toxik wrote:
| If there are N IDs to draw from and M videos on YouTube,
| then P(ID used) = M/N if the ID is drawn from a uniform
| distribution, and P(At least one of K IDs used) = 1 - (1 -
| M/N)^K (not accounting for replacement).
|
| If M [?] 1e9 and N [?] 1e18, and you sample K = 1000 URLs,
| then it's about one in 1e-09 that you hit a used ID.
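|
| Plugging those rough numbers in (M and N are the assumptions
| from the comment above, not real figures):
|
|   M = 1e9      # assumed number of videos
|   N = 1e18     # assumed size of the ID space
|   K = 1000     # URLs sampled
|
|   p_single = M / N
|   p_any = 1 - (1 - p_single) ** K
|   print(p_single)  # 1e-09 per sampled ID
|   print(p_any)     # ~1e-06 for the whole sample of 1000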
| spuz wrote:
| Thanks for doing the maths - it does seem the sampling
| method would not be feasible. Taking the statistic of "500
| hours uploaded per minute" and assuming the average video
| length is 10 minutes, we can say about 1.5bn videos are
| uploaded to YouTube every year or 15bn every 10 years. So
| it seems likely that YouTube has less than 1tn videos in
| total.
| dpatterbee wrote:
| They also use "_" and "-" according to Tom Scott.
|
| https://www.youtube.com/watch?v=gocwRvLhDf8
| _0ffh wrote:
| Which would bring it up to a nice 64 choices, making it
| exactly 6 bits per character.
| slver wrote:
| It's a URL-friendly form of base64.
|
| 11 chars encode 66 bits, but actually 2 bits are likely
| not used and it's simply an int64 encoded to base64.
|
| Given everyone and their grandma is pushing 128-bit UUID
| for distributed entity PK, it's interesting to see
| YouTube keep it short and sweet.
|
| Int64 is my go-to PK as well; if I have to distribute it, I
| make it hierarchical, but I don't do UUIDs.
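|
| A sketch of that encoding (assuming it really is a plain
| int64, which is speculation): 8 bytes base64-encode to 12
| characters with one '=' of padding, so dropping the padding
| leaves exactly 11 URL-safe characters.
|
|   import base64, struct
|
|   def to_video_id(n: int) -> str:
|       raw = struct.pack(">Q", n)        # 8 bytes, big-endian
|       return (base64.urlsafe_b64encode(raw)
|               .rstrip(b"=").decode())
|
|   def from_video_id(vid: str) -> int:
|       raw = base64.urlsafe_b64decode(vid + "=")  # re-pad
|       return struct.unpack(">Q", raw)[0]
|
|   vid = to_video_id(123456789)
|   print(vid, len(vid))       # 11 characters
|   print(from_video_id(vid))  # 123456789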
| littlestymaar wrote:
| > Given everyone and their grandma is pushing 128-bit
| UUID for distributed entity PK, it's interesting to see
| YouTube keep it short and sweet.
|
| The trade-off you make when using short IDs is that you
| can't generate them at random. With 128-bit IDs, you can't
| realistically have collisions, but with 64-bit ones, because
| of the birthday paradox, as soon as you have more than 2^32
| elements you're really likely to have collisions.
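|
| The usual back-of-the-envelope birthday bound, for the
| curious (a sketch; around 2^32 random 64-bit IDs the
| collision probability is already tens of percent):
|
|   import math
|
|   def collision_prob(n, bits):
|       # Approximate probability of any collision among n
|       # uniformly random `bits`-bit IDs.
|       space = 2.0 ** bits
|       return 1.0 - math.exp(-n * (n - 1) / (2.0 * space))
|
|   for n in (10**6, 10**9, 2**32):
|       print(n, collision_prob(n, 64), collision_prob(n, 128))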
| quantumofalpha wrote:
| YouTube video ids used to be just the base64 of a 3DES-
| encrypted MySQL primary key, a sequential 64-bit int -
| collisions are of zero concern there. By the birthday paradox
| that's about as good as a 128-bit UUID generated without a
| centralized component like a database row counter, when you
| do have to care about collisions.
|
| However theft of the encryption key is a concern, since
| you can't rotate it and it just sat there in the code.
| Nowadays they do something a bit smarter to ensure ex-
| employees can't enumerate all unlisted videos.
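|
| A sketch of that original scheme (assuming the pycryptodome
| package for 3DES; the key and exact construction here are
| made up for illustration): the row counter stays sequential
| internally, but the published ID looks random.
|
|   import base64, struct
|   from Crypto.Cipher import DES3  # pip install pycryptodome
|
|   KEY = b"0123456789abcdef"       # made-up 16-byte 3DES key
|   cipher = DES3.new(KEY, DES3.MODE_ECB)
|
|   def public_id(row_id: int) -> str:
|       block = cipher.encrypt(struct.pack(">Q", row_id))
|       return (base64.urlsafe_b64encode(block)
|               .rstrip(b"=").decode())
|
|   # Sequential rows 1, 2, 3 come out looking unrelated:
|   for row in (1, 2, 3):
|       print(row, public_id(row))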
| slver wrote:
| You seem to know about their architecture. What do they
| do now?
| quantumofalpha wrote:
| > You seem to know about their architecture. What do they
| do now?
|
| Random 64-bit primary keys in mysql for newer videos.
| These may sometimes collide but then I suppose you could
| have the database reject insert and retry with a
| different id.
| slver wrote:
| So a single cluster produces those keys? I thought it's
| more decentralized.
| quantumofalpha wrote:
| With random database keys I would think they can just be
| generated at random by any frontend server running
| anywhere. Ultimately, a request to insert that key would
| come to the database - which is the centralized
| gatekeeper in this design and can accept or reject it.
| But with replication, sharding, caching even SQL
| databases scale extremely well. Just avoid expensive
| operations like joins.
| slver wrote:
| The reason why we want ids to be purely random is so we
| don't have to do the work of coordinating distributed id
| generation. But if you don't mind coordinating, then none
| of this matters.
|
| Surely if it was a great chore for YouTube to have
| random-looking int64 ids, they would switch to int128.
| But they haven't.
|
| I'm a big fan of the "works 99.99999999% of the time"
| mentality, but if anything happens to your PRNGs, you risk
| countless collisions slipping past you in production before
| you realize what happened. It's good to design your identity
| system in a way that'd catch that, regardless of how unlikely
| it seems in the abstract.
|
| The concept of hierarchical ids is undervalued. You can
| have a machine give "namespaces" to others, and they can
| generate locally and check for collisions locally in a
| very basic way.
| tatersolid wrote:
| > but if anything happens to your PRNGs, you risk countless
| collisions slipping past you in production before you realize
| what happened.
|
| UUID generation basically has to use a CSPRNG to avoid
| collisions (or at least a very large-state insecure
| PRNG).
|
| Because of the low volume simply using /dev/urandom on
| each node makes the most sense. If /dev/urandom is broken
| so is your TLS stack and a host of other security-
| critical things; at that point worrying about video ID
| collisions seems silly.
| slver wrote:
| I worry about state corrupting problems, because they
| tend to linger long after you have a fix.
| trinovantes wrote:
| Is the extra 64 bits simply used to lower the risk of
| collision?
| nannal wrote:
| I tried this for some time, I was looking for unlisted
| videos.
|
| Just generate a random valid link and then check if it
| gives a video or not.
|
| I found exactly 0 videos.
| dr-detroit wrote:
| Of course. They are using some modulo arithmetic:
| 1. Start from the rightmost digit (i.e. the check digit)
| 2. Multiply every second digit by 2 (i.e. digits at even
| positions)
| 3. If the result in step 2 is more than one digit, add the
| digits up (e.g. 12: 1+2 = 3)
| 4. Add the resulting digits to the digits at the odd
| positions
| slver wrote:
| > all of their video IDs are 11 characters, alphanumeric
| with cases
|
| It's an int64, encoded as URL-friendly base64 (i.e.
| alphanumeric with _ and -).
| jasoncartwright wrote:
| Surprised they don't also mention Nest, which I assume also has
| an interesting & significant video encoding operation.
| slver wrote:
| > Google probably only provides stats about growth (like "500
| hours of video are uploaded to YouTube every minute") because
| the total number of videos is so large, it's an unknowable
| amount.
|
| I can totally see it being non-trivial to count your videos,
| which is a funny problem to have. But I doubt it's unknowable.
| More like they don't care/want us to know that.
| nine_k wrote:
| Quite likely they have a good approximate number.
|
| But knowing the _exact_ number can indeed be hard. It would
| take stopping the entire uploading and deletion activity. Of
| course they may have counters of uploads and deletions on
| every node which handles them, but the notion of 'the same
| instant' is tricky in distributed systems, so the exact
| number still remains elusive.
| iainmerrick wrote:
| It's not just tricky and elusive, I think it's literally
| unknowable -- not a well-defined question. Like asking
| about the simultaneity of disconnected events in special
| relativity.
| zodiac wrote:
| You can modify the question to be well-defined and not
| suffer those measurement problems, eg "the total number
| of videos uploaded before midnight UTC on 2021-04-29"
| suprfsat wrote:
| Interesting, I wonder if a distributed database could be
| developed to consistently answer queries phrased in this
| way.
| hderms wrote:
| Seems like it would either need to be append only or have
| some kind of snapshot isolation
| mmcconnell1618 wrote:
| Google Developed Spanner which is a globally distributed
| database that uses atomic clocks to keep things
| synchronized: https://static.googleusercontent.com/media/
| research.google.c...
| zodiac wrote:
| I think the Chandy-Lamport snapshot algorithm tries to do
| something like this for all distributed systems (in their
| model, and it tries to get any consistent snapshot, not
| allowing you to specify the "time"); not sure if it's
| actually useful IRL though
| nine_k wrote:
| All these nodes are in the same light cone, so we
| theoretically can stop all mutation and wait for the
| exact final state to converge to a precise number of
| uploaded videos.
|
| But the question of the precise number of videos before
| that moment is indeed ill-defined.
| iainmerrick wrote:
| You can theoretically stop all mutation, but the users
| might start to complain!
| saalweachter wrote:
| I read "an unknowable amount" as "a meaninglessly large
| number to our monkey brains".
|
| It's like knowing the distance to the Sun is 93 million
| miles. The difficulty there isn't that measuring the distance
| from the Sun to the Earth exactly is hard, although it is, or
| that the distance is constantly changing, although it is, or
| that the question is ill-defined, because the Earth is an
| object 8000 miles across and the Sun is 100 times bigger, and
| what points are you measuring between?
|
| The distance is "unknowable" because while we know what "93
| million miles" means, it's much harder to say we know what it
| "means". Even when we try to rephrase it to smaller numbers
| like "it's the distance you could walk in 90 human lifetimes"
| is still hard to really _feel_ beyond "it's really really
| far."
|
| Likewise, does it matter if YouTube has 100, 1000, or 10,000
| millennia of video content? Does that number have any real
| meaning beyond back-of-the-envelope calculations of how much
| storage they're using? Or is "500 years per minute" the most
| comprehensible number they can give?
| kyrra wrote:
| Googler, opinions are my own.
|
| Youtube isn't the only platform where Google does video
| transcoding. I don't know them all, but here are a few other
| places where video plays a part:
|
| Meet - I'm guessing participants on different devices
| (desktop, Android, iOS) and with different bandwidth will get
| different video feed quality. Though the real-time nature of
| this may not work as well? Meet does have a live stream[0]
| feature for when your meeting is over 250 people, which gives
| you a youtube-like player, so this likely is transcoded.
|
| Duo - more video chat.
|
| Photos - when you watch a photos video stored at google (or
| share it with someone), it will likely be transcoded.
|
| Video Ads. I'd guess these are all pre-processed for every
| platform type for efficient delivery. While these are mainly on
| youtube, they show up on other platforms as well.
|
| Play Movies.
|
| Nest cameras. These send a 24/7 stream of data to the cloud,
| and some people pay to have X days of video saved.
|
| [0]
| https://support.google.com/meet/answer/9308630?co=GENIE.Plat...
| jankeymeulen wrote:
| Also Googler, you can add Stadia here as well. Needs fast and
| low-latency transcoding.
| kyrra wrote:
| You're right. I thought stadia did something special, but I
| guess not. So yes, Stadia.
|
| Another: Youtube TV
|
| EDIT: Stadia: I did some searching around the interwebs and
| found a year-old interview[0] that hints at something
| different.
|
| > It's basically you have the GPU. We've worked with AMD to
| build custom GPUs for Stadia. Our hardware--our specialized
| hardware--goes after the GPU. Think of it as it's a
| specialized ASIC.
|
| [0] https://www.techrepublic.com/article/how-google-stadia-
| encod...
| NavinF wrote:
| Yeah that should work. If I do 1 request per second with 64 IP
| addresses, I'd expect to find ~110 videos after 1 year of
| random sampling if there are 1T videos on YouTube.
|
| 110=(64*3.154e+7)*(1,000,000,000,000/2^64)
|
| (The other thread that assumes a 62-character set is wrong
| because they forgot about '-' and '_'. I'm fairly certain a
| video ID is a urlsafe base64 encoded 64-bit int. 64^11==2^66)
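|
| Spelling that arithmetic out (same assumptions: 64 requests
| per second for one year, 10^12 videos in a 2^64 ID space):
|
|   REQS_PER_SEC = 64            # 1 req/s from each of 64 IPs
|   SECONDS_PER_YEAR = 3.154e7
|   VIDEOS = 1_000_000_000_000   # assumed 1T videos
|   ID_SPACE = 2 ** 64
|
|   probes = REQS_PER_SEC * SECONDS_PER_YEAR
|   print(round(probes * VIDEOS / ID_SPACE))  # 109, i.e. ~110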
| samwestdev wrote:
| Why don't they encode on the uploader machine?
| absove wrote:
| Sounds like something for the next version of recaptcha
| devit wrote:
| Because society results in companies being incentivized to
| babysit users rather than cutting off those who are unable to
| learn simple technical skills like optimally encoding a video
| respecting a maximum bitrate requirement.
| giantrobot wrote:
| > simple technical skills like optimally encoding a video
| respecting a maximum bitrate requirement.
|
| This is in no way a "simple skill" as maximum video bitrate
| is only one of a number of factors for encoding video. For
| streaming to end users there's questions of codecs, codec
| profiles, entropy coding options, GOP sizes, frame rates, and
| frame sizes. This also applies for your audio but replacing
| frame rates and sizes with sample rate and number of
| channels.
|
| Streaming to ten random devices will require different
| combinations of any or all of those settings. There's no one
| single optimum setting. YouTube encodes dozens of
| combinations of audio and video streams from a single source
| file.
|
| Video it turns out is pretty complicated.
| acdha wrote:
| What about cutting off those who condescend to others without
| recognizing the limits of their own understanding?
|
| I'm not an expert in this but I know that "optimally encoding
| a video" is an actual job. That's because there's no global
| definition of optimal (it varies depending on the source
| material and target devices, not to mention the costs of your
| compute, bandwidth, and time); you're doing it multiple times
| using different codecs, resolutions, bandwidth targets, etc.;
| and those change regularly so you need to periodically
| reprocess without asking people to come back years later to
| upload the iPhone 13 optimized version.
|
| This brings us to a second important concept: YouTube is a
| business which pays for bandwidth. Their definition of
| optimal is not the same as yours (every pixel of my
| masterpiece must be shown exactly as I see it!) and they have
| a keen interest in managing that over time even if you don't
| care very much because an old video isn't bringing you much
| (or any) revenue. They have the resources to heavily optimize
| that process but very few of their content creators do.
| pta2002 wrote:
| Can't trust user input, you'd have to spend quite a bit of
| energy just checking to see if it's good. You also want to
| transcode multiple resolutions, it'd end up being quite slow if
| it's done using JS.
| londons_explore wrote:
| Checking the result is good shouldn't be too hard - a simple
| spot check of a few frames should be sufficient, and it isn't
| like the uploader gets a massive advantage for uploading
| corrupt files.
|
| The CPU and bandwidth costs of transcoding to 40+ different
| audio and video formats would be massive though. I could
| imagine a 5 minute video taking more than 24 hours to
| transcode on a phone.
| simcop2387 wrote:
| > Checking the result is good shouldn't be too hard - a
| simple spot check of a few frames should be sufficient, and
| it isn't like the uploader gets a massive advantage for
| uploading corrupt files.
|
| Uploading corrupt files could allow the uploader to execute
| code on future client machines. You _must_ check every
| frame and the full encoding of the video.
| kevincox wrote:
| Must is a strong word. In theory browsers and other
| clients treat all video streams as untrusted and it is
| safe to watch an arbitrary video. However, complex formats
| like video are a huge attack surface.
|
| So yes, for the bigger names like Google this is an
| unacceptable risk. They will generally avoid serving any
| user-generated complex format like video, images or audio
| to users directly. Everything is transcoded to reduce the
| likelihood that an exploit was included.
| amelius wrote:
| Verification is simpler than encoding, I suppose.
| mschuster91 wrote:
| Because of the massive bandwidth and data requirements.
| Assuming I as the source have 20 Mbit/s content that is 30
| min long - that's about 4.5 GB of data.
|
| Given your average DSL uplink of 5 MBit/s, that's 2 hours
| uploading for the master version... and if I had to upload a
| dozen smaller versions myself, that could easily add five times
| the data and upload time.
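|
| The arithmetic behind those figures (using the numbers from
| the comment; "a dozen smaller versions" is taken as roughly
| 5x the master's data, as stated):
|
|   MASTER_MBIT_S = 20      # source bitrate
|   DURATION_S = 30 * 60    # 30-minute video
|   UPLINK_MBIT_S = 5       # typical DSL uplink
|
|   master_gb = MASTER_MBIT_S * DURATION_S / 8 / 1000
|   upload_h = MASTER_MBIT_S * DURATION_S / UPLINK_MBIT_S / 3600
|   print(master_gb, upload_h)  # ~4.5 GB, ~2.0 hours
|   # Renditions at ~5x the data would add roughly 10 more
|   # hours of uploading on the same link.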
| greenknight wrote:
| Imagine someone using a 10-year-old computer to upload a
| 1-hour video. Not only do they need to transcode to multiple
| different resolutions, but also codecs. This would not be
| practical from a business / client-relationship standpoint.
| They want their client (the uploader) to spend as little time
| as possible and to get their videos up as quickly as possible.
| ThatPlayer wrote:
| I wouldn't even say 10 year old computer. Think phones or
| tablets. As well as the battery drain. Or imagine trying to
| upload something over 4G.
| bufferoverflow wrote:
| > _Or imagine trying to upload something over 4G._
|
| 4G is perfectly fine for uploading videos. It can hit up to
| 50 Mbps. LTE-Advanced can do 150 Mbps.
| greenknight wrote:
| Though that being said, it would be great to be able to say
| "hey Google, I'll do the conversions for you!" - but then they
| would have to trust that the bitrate isn't too high / not
| going to crash their servers, etc.
| lbotos wrote:
| Because I make one output file and they optimize for like 7
| different resolutions. If they make it longer for me to upload
| I'd wager that would lower the video upload rate.
| arghwhat wrote:
| YouTube needs to re-encode occasionally (new
| codecs/settings/platforms), it would be easy to abuse and send
| too high a bitrate or otherwise wrong content, and a lot of
| end-user devices simply aren't powerful enough to complete the
| task in a reasonable amount of time.
| chrisseaton wrote:
| > Why don't they encode on the uploader machine?
|
| Are you basically asking why they don't take a performance-
| sensitive, specialised, and parallel task and run it on a low-
| performance, unspecialised, and sequential system?
|
| Would take hours and be super inefficient.
| 8K832d7tNmiQ wrote:
| The real news here is that they still use GPUs to transcode
| their videos, whilst other services such as the search engine
| have already been using TPUs for almost a decade now.
|
| I thought they'd already been using custom chips for
| transcoding for decades.
| kungito wrote:
| I think they do more general purpose things like downsampling,
| copyright detection etc. which don't have globally available
| custom ASICs. I think GPUs don't do encoding/decoding
| themselves; they have separate ASICs built in which do the
| standardised encodings.
| boomlinde wrote:
| Are TPUs particularly useful for this kind of workload,
| compared to specialized encoders/decoders available on GPUs?
| rland wrote:
| GPUs have specialized hardware for video transcoding, no? So
| this actually makes sense. The product was already made
| (although, perhaps not up to Youtube's standard) by GPU
| manufacturers.
| numlock86 wrote:
| The specialized hardware in GPUs is targeted at encoding
| content on the fly. While you could use this to encode a
| video for later playback it has a couple of drawbacks when it
| comes to size and quality, namely h264, keyframes, static
| frame allocations, no multipass encoding, etc. ... This is
| why video production software that supports GPU encoding
| usually marks this option as "create a preview, fast!". It's
| fast but that's it. If you want a good quality/size ratio you
| would use something like VP9 for example. Because of missing
| specialized hardware and internals of the codec itself
| currently this is very slow. Add multipass encoding,
| something like 4K at 60 frames, adaptive codec bitrates, and
| suddenly encoding a single second takes over two minutes ...
| the result is the need for specialized hardware.
| baybal2 wrote:
| They were actually transcoding on CPUs before, not GPUs
| madeofpalk wrote:
| Yeah I was surprised that it's taken them this long to build
| custom hardware for encoding videos.
| jng wrote:
| Is there any solid information about Google using TPU for the
| search engine, or is this an assumption you're making?
| alarmingfox wrote:
| This[0] Google blog from 2017 states they were using TPU for
| RankBrain which is what powers the search engine
|
| [0] - https://cloud.google.com/blog/products/ai-machine-
| learning/g...
| JohnJamesRambo wrote:
| All that power to show me recipes that hem and haw around and
| spend ten paragraphs to use all the right SEO words.
|
| I feel like the Google results were better 20 years ago,
| what did they use back then before TPUs?
| mda wrote:
| I think search results 20 years ago were laughably worse
| than today.
| KeplerBoy wrote:
| The web just got worse in a lot of ways, because
| everything needs to generate money.
| londons_explore wrote:
| They had to get special permission from the US government
| to export TPUs abroad to use in their datacenters. The
| TPUs fell under ITAR regulations (like many machine
| learning chips). The US government granted permission, but
| put in some restriction like 'they must always be supervised
| by an American citizen', which I imagine leads to some very
| well paid foreign security guard positions for someone with
| the correct passport...
|
| Read all that on some random government document portal,
| but can't seem to find it now...
| anonymoushn wrote:
| Right, at a competing video site we had vendors trying to sell
| us specialized encoding hardware most of a decade ago.
| pengaru wrote:
| How long before the ads are realtime encoded into the video
| streams such that even youtube-dl can't bypass them without a
| premium login?
|
| I've been surprised this wasn't already the case, but assumed it
| was just an encoding overhead issue vs. just serving pre-encoded
| videos for both the content and ads with necessarily well-defined
| stream boundaries separating them.
| IshKebab wrote:
| That sounds like an enormous pain in the arse just to piss off
| a vocal minority of users.
| whywhywhywhy wrote:
| A vocal minority who are not bringing in any revenue for the
| site.
|
| That said, the day they finally succeed in making ads
| unskippable will be the time for a competitor to move in.
| JohnWhigham wrote:
| Yeah, if YT wanted to insert unskippable ads on the
| backend, they would have years ago. The tech is not the
| hard part. They know it'd be a PR disaster for them.
| corobo wrote:
| When it's possible to skip baked in ads (SponsorBlock[1])
| -- the whack-a-mole will continue no matter what. Even if
| it means you can't watch videos in realtime but have to
| wait for them to fully download to rip the ad out, someone
| will figure it out.
|
| At that time everyone starts talking about it and I gotta
| imagine a bunch of new people become adblocking users.
|
| [1] https://news.ycombinator.com/item?id=26886275
| KingMachiavelli wrote:
| SponsorBlock only works because the sponsored segments
| are at the same location for every viewer. If Youtube
| spliced in their own ads they could easily do it at
| variable intervals preventing any crowd sourced database
| of ad segment timestamps. To be honest, nothing really
| stops Youtube from just turning on Widevine encryption
| for all videos (not just purchased/rented TV & movies)
| besides breaking compatibility with old devices. Sure
| widevine can be circumvented but most of the best/working
| cracks are not public.
| martin-adams wrote:
| I suspect doing personalised ads obliterates any caching method
| on cheaper hardware than transcoding servers. Interesting
| problem to solve though.
| jwandborg wrote:
| The ads are not part of the encoded video AFAICT, they are
| probably served as a separate stream which the client
| requests alongside the regular video stream, this means that
| videos and ads can be cached using traditional techniques.
| amelius wrote:
| > Interesting problem to solve though.
|
| Ah, smart people and ads ...
| garblegarble wrote:
| You wouldn't even need to do real-time encoding for that, you
| can simply mux them in at any GOP boundary (other services
| already do real-time ad insertion in MPEG-DASH manifests)
|
| Example: https://www.youtube.com/watch?v=LFHEko3vC98
| elithrar wrote:
| Right, using DAI means you don't have to actually touch the
| original video (good!) but doesn't stop a smart enough client
| (youtube-dl) from pattern matching and ignoring those
| segments when stitching the final video together.
|
| I am not, however, suggesting that encoding ads into the
| final stream is appropriate or scalable, though!
| kevincox wrote:
| The client doesn't even have to know that there is an ad
| playing if they really want to thwart ad blockers. If you
| are talking about pattern-matching the actual video stream
| ad-blockers could do that today and just seek forwards but
| none do yet.
| callamdelaney wrote:
| Then you could just skip the ad in the video, unless the player
| has some meta-data around when the ad is; in which case
| youtube-dl can chop it out.
| pengaru wrote:
| Not if you tightly control the streaming rate to not get far
| ahead of a realtime playback, just mete out the video stream
| at a rate appropriate for watching, not as fast as the pipe
| can suck it down.
| londons_explore wrote:
| I'm kinda surprised Google doesn't do this... They would
| need to keep track of user seeks and stuff, but it still
| seems do-able. One simple model is for the server to know
| when ad-breaks should happen, and prevent any more
| downloading for the duration of the ad.
|
| Sure, it would break people who want to watch at 2x
| realtime, but they seem small-fry compared to those with
| adblockers.
| giantrobot wrote:
| The issue there is scale: MPEG-DASH/HLS let the edge
| servers for video be simple. The servers don't need to
| do much more than serve up bytes via HTTP. This ends up
| being better for clients, especially mobile clients,
| since they can choose streams based on their local
| conditions the server couldn't know about like
| downgrading from LTE to UMTS.
|
| Google would end up having to maintain a lot of extra
| client state on their edge servers if they wanted to do
| that all in-band. Right now it's done out of band with
| their JavaScript player. Chasing down youtube-dl users
| isn't likely worth that extra cost.
| londons_explore wrote:
| The edge server could implement this without much extra
| complexity.
|
| For example each chunk URL could be signed with a
| "donotdeliverbefore" timestamp.
|
| Now the edge server has zero state.
|
| Similar things are done to prevent signed URLs from being
| shared with other users.
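|
| A sketch of such a stateless scheme (hypothetical parameter
| names, assuming the application servers and edge share an
| HMAC secret): the application server signs a not-before
| timestamp into each chunk URL, and the edge only has to
| recompute one HMAC and compare it to the wall clock.
|
|   import hashlib, hmac, time
|
|   SECRET = b"shared-edge-secret"   # made-up shared key
|
|   def sign_chunk_url(path: str, not_before: int) -> str:
|       msg = f"{path}|{not_before}".encode()
|       sig = hmac.new(SECRET, msg, hashlib.sha256).hexdigest()
|       return f"{path}?nbf={not_before}&sig={sig}"
|
|   def edge_allows(path: str, nbf: int, sig: str) -> bool:
|       msg = f"{path}|{nbf}".encode()
|       good = hmac.new(SECRET, msg, hashlib.sha256).hexdigest()
|       return (hmac.compare_digest(good, sig)
|               and time.time() >= nbf)
|
|   # Chunk after an ad break: not servable for another 30 s.
|   url = sign_chunk_url("/video/abc/chunk42.mp4",
|                        int(time.time()) + 30)
|   print(url)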
| giantrobot wrote:
| There's no shared wall clock between the server and
| client with HTTP-based streaming. There's also no
| guarantee the client's stream will play continuously or
| even hit the same edge server for two individual
| segments. That's state an edge server needs to maintain
| and even share between nodes. It would be different for
| every client and every stream served from that node.
|
| For streaming you actually _want_ the client to have a
| buffer past the play head. If the client can buffer the
| whole stream it makes sense to let them in many cases.
| The client buffers the whole stream and then leaves your
| infrastructure alone even if they skip around or pause
| the content for a long time. The only limits that really
| make sense are individual connection bandwidth limits and
| overall connection limits.
|
| The whole point of HTTP-based streaming is to minimize
| the amount of work required on the server and push more
| capability to the client. It's meant to allow servers to
| be dumb and stateless. Any state you add, even if it's
| negligible _per client_, ends up being a lot of state in
| aggregate. If a scheme meant edge servers could handle 1%
| less traffic, that means server costs increase by 1%.
| Unless the ad impressions skipped by youtube-dl users come
| anywhere close to 1% of ad revenue, it's pointless for
| Google to bother.
| londons_explore wrote:
| > skipped by youtube-dl users
|
| It's also ublock and adblock plus users. Estimated at
| about 25% of youtube viewership.
|
| Also, the shared clock only needs to be between edge
| servers and application servers. And only to an accuracy
| of a couple of seconds. I bet they have that in place
| already.
| Traubenfuchs wrote:
| The dangerous case of custom hardware making a software business
| significantly more efficient: This makes disruption and
| competition even harder.
| CyberRabbi wrote:
| Usually competition for a general platform like YouTube comes
| in the form of unbundling and in that case these last mile
| optimizations will matter little.
| narrator wrote:
| The main competitors to YouTube are the sites that have non-
| illegal content that YouTube won't host. e.g: Porn and
| controversial political stuff.
| CyberRabbi wrote:
| That might be true, but I think sites like Odysee are more
| popular than controversial political video sites.
| wishysgb wrote:
| as long as you can buy the parts or have the HDL to deploy it
| on an FPGA you should be fine
| blihp wrote:
| It's the seemingly infinite bandwidth that Google throws at
| YouTube that makes competition hard. Then there's the inability
| to monetize. Transcoding is probably about 20th on the list of
| issues.
| azurezyq wrote:
| It's inevitable, and this applies to other kinds of
| optimizations as well. This place is too mature, disruption
| might be easier elsewhere.
| c7DJTLrn wrote:
| What is there to compete for? Video hosting is a money-losing
| business unless you have exclusives, like Floatplane.
| endless1234 wrote:
| What is Floatplane? Never heard of it. Seemingly a YT
| competitor by a somewhat popular YouTuber. The app on Android
| has "10k+" installs. Isn't it _way_ too early to say it
| wouldn't be a money-losing business?
| bobsmooth wrote:
| Linus' goal for Floatplane is "If it doesn't fly, it'll at
| least float." There's only 20 creators on it and it's
| intended to complement YouTube, not replace it.
| oblio wrote:
| My guess is that the commenter is either a Floatplane
| insider or possibly just optimistic :-)
| throwaway3699 wrote:
| Think of Floatplane as more of a Patreon competitor with a
| video component, than a YouTube competitor.
| human_error wrote:
| What's floatplane? Hadn't heard of it. The website doesn't
| say much.
| pcmill wrote:
| Floatplane is a video service built by the people behind
| the popular Youtube channel LinusTechTips. It is not a
| direct competitor to Youtube though. The platform makes it
| easier to let paying fans get videos earlier but it is not
| meant to build an audience.
| fancyfredbot wrote:
| This will keep costs down, but I am not sure the cost of
| transcoding is the major barrier to entry. I think the network
| effect (everyone is on YouTube) has already made disruption
| pretty difficult!
| NicoJuicy wrote:
| Depends how you look at it. There could be someone making
| these chips and then a competitor with lower startup costs
| than before.
| londons_explore wrote:
| Things like youtube run on super-thin margins. Bandwidth and
| storage costs are massive, compute costs quite big, and ad
| revenue really quite low.
|
| A competitor would need either a different model to keep
| costs low (limit video length/quality, the vimeo model of
| forcing creators to pay, or go for the netflix-like model of
| having a very limited library), or very deep pockets to run
| at a loss until they reach youtube-scale.
|
| I'm still mystified how TikTok apparently manages to turn a
| profit. I have a feeling they are using the 'deep pockets'
| approach, although the short video format might also bring in
| more ad revenue per hour of video stored/transcoded/served.
| Traster wrote:
| To be honest I suspect it isn't actually a differentiator. It's
| good for Google that they can produce this chip and trim their
| hardware costs by some percentage, but it's not going to give
| them a competitive advantage in the market of video sharing.
| Especially in a business like youtube with network effects,
| getting the audience is the difficult bit, the technical
| solutions are interesting but you're not going to beat google
| by having 5% cheaper encoding costs.
| ant6n wrote:
| Perhaps. But the big issues for YouTube right now aren't
| efficiency per se, but copyright, monetization, AI tagging and
| social clout. If a YouTube competitor can get the content
| creators and offer them viewers, competition could perhaps
| work. This fight is probably not fought at the margins of
| hardware optimization.
| mrtksn wrote:
| It's like crystallisation of the software. When you decide that
| this is the best version of an algorithm, you make a hardware
| that is extremely efficient in running that algorithm.
|
| It probably means that, unless you have a groundbreaking
| algorithm on something that is available as hardware, you
| simply do software on something that is not "perfected".
|
| It trims marginal improvements.
| imwillofficial wrote:
| This is intense. ASICs are making a comeback again. It's weird
| how the computer market is so cyclical with regard to trends.
| justinzollars wrote:
| Youtube blog post on this topic: https://blog.youtube/inside-
| youtube/new-era-video-infrastruc...
| bradfa wrote:
| The paper linked in the ARS article
| (https://dl.acm.org/doi/abs/10.1145/3445814.3446723) seems to be
| how they developed it. I find it interesting that they went from
| C++ to hardware in order to optimize the development and
| verification time.
|
| In my past experience working with FPGA designers, I was always
| told that any C-to-H(ardware) tooling was always quicker to
| develop but often had significant performance implications for
| the resulting design in that it would consume many more gates and
| run significantly slower. But, if you have a huge project to
| undertake and your video codec is only likely to be useful for a
| few years, you need to get an improvement (any improvement!) as
| quick as possible and so the tradeoff was likely worth it for
| Google.
|
| Or possibly the C-to-H tooling has gotten significantly better
| recently? Anyone aware of what the state of the art is now with
| this to shed some light on it?
| pclmulqdq wrote:
| It has not, and the type of design they show in the paper has a
| lot of room to improve (FIFOs everywhere, inefficient blocks,
| etc). However, video transcoding is suited to that approach
| since the operations you do are so wide that you can't avoid a
| speedup compared to software.
| rurban wrote:
| "now" being 2015. They are talking about the new 2nd generation
| chip here, which is a bit faster.
| pizza234 wrote:
| A couple of interesting bits:
|
| - without dedicated processors, VP9's encoding is roughly 4.5x as
| slow as H.264, while with the VPUs, the two formats perform the
| same; this is a big win for the open format(s)
|
| - sad extract: "Google has aggressively fought to keep the site's
| cost down, often reinventing Internet infrastructure and
| _copyright_ in order to make it happen " (emphasis mine)
| sillysaurusx wrote:
| Why is that sad? No company in Google's position could've done
| better, probably. Youtube was about to be sued into oblivion
| till Google purchased it.
| serf wrote:
| >Why is that sad?
|
| because the wholesale destruction and minimization of
| knowledge, education, and information to appease (often
| arbitrary) intellectual protectionism laws is sad, regardless
| of who perpetrates it.
|
| non-Google example : What.cd was a site centered around music
| piracy, but that potentially illegal market created a huge
| amount of original labels and music that still exists now in
| the legal sphere.
|
| No one would defend the legal right for what.cd to continue
| operating, it was obviously illegal; but the unique, novel,
| and creative works that came from the existence of this
| illegal enterprise would be sad to destroy.
|
| Swinging back to the Google example: YouTube systematically
| destroys creations that they feel (often wrongly) infringe
| upon IP. This is often not even the case; Google routinely
| makes wrong decisions, erring on the side of the legal team.
|
| This destruction of creative work is _sad_ - in my opinion
| it's more sad than the un-permitted use of work.
|
| Of course, Google as a corporation _should_ act that way, but
| it's _sad_ in certain human aspects.
| nolok wrote:
| It's not just google as a corporation, it's google as a
| legal entity.
|
| Have your own site in your own individual name with no
| corporate entity nor search for profit offering to host
| people's videos for free, and I guarantee you that within
| 24h you are dealing with things ranging from pedophilia to
| copyright violations and the like. And if you don't clear
| them out, you're the one responsible.
|
| Google is acting the way society has decided they should
| act, through the laws it voted for. Could they act another,
| more expensive way in order to save a bit more of the content
| that gets caught by mistake? Definitely, but why would they,
| as a company, when the law says any mistake or delay is their
| fault.
|
| Source: like many people, I once made a free image hosting
| thingy. It was overrun by pedos within a week, to my
| absolute horror and shock. Copyright infringement is
| obviously not the same at all, BUT the way the law acts
| toward the host is not that different: "ensure there is none
| and be proactive in cleaning, or else ...".
| sfgweilr4f wrote:
| Your free image hosting thingy is an example of low
| barrier to entry both in cost and anonymity. If you had
| made the cost trivial but traceable I wonder what the
| outcome would have been. I wonder if a site like
| lobste.rs but for video would work better. A graph of who
| is posting what and a graph of how they got onto the site
| in the first place.
|
| If you vouch for someone who is dodgy now you are also
| seen as a little dodgier than you were before. This
| doesn't necessarily mean you lose your account because
| you happened to vouch for someone, but it might mean that
| your vouching means less in future.
| maxerickson wrote:
| It's not destroyed, it just isn't published. Or is the idea
| that they should be the canonical archive of all uploads?
| dev_tty01 wrote:
| They aren't destroying anything. They are just not allowing
| the material on their site. Are you saying that anyone who
| creates a video hosting site must allow ANY content on
| their site? I don't see any practical basis for that
| contention.
| [deleted]
| asdfasgasdgasdg wrote:
| I don't see any justification in the linked article for the
| claim that YouTube has in any way reinvented copyright. It
| seems like a throw-away line that is unsupported by any facts.
| ximeng wrote:
| https://www.reddit.com/r/videos/comments/n29fxn/piano_teache.
| ..
|
| https://www.reddit.com/r/videos/comments/n4a4l0/huge_history.
| ..
|
| Even if not supported in the article here are two examples in
| the last couple of days of how YouTube is de facto defining
| copyright regulation.
| asdfasgasdgasdg wrote:
| These are examples of YouTube following copyright laws
| imperfectly, which is basically guaranteed to happen on a
| regular basis at their scale. Definitely not what I would
| consider YouTube redefining copyright.
| grecy wrote:
| > _These are examples of YouTube following copyright laws
| imperfectly, which is basically guaranteed to happen on a
| regular basis at their scale_
|
| Given their entire copyright takedown system is
| (in)famously entirely automated, I would have thought it
| would be trivial for it to _always_ follow copyright laws
| to the letter.. if they wanted it to.
| asdfasgasdgasdg wrote:
| If channel A uploads a video copied from channel B, then
| makes a copyright claim against channel B, how does an
| automated system determine which owns the rights?
| Certainly it would seem in most cases that we should
| presume channel B has the copyright, since they uploaded
| first. But there is a very economically important class
| of videos where infringers will tend to be the first to
| upload (movies, TV shows, etc.). I don't really see how
| an automated system solves this problem without making
| any mistakes. Especially because the law (DMCA) puts the
| onus on the service provider to take down or face
| liability.
| toast0 wrote:
| It would be trivial to follow copyright laws to the
| letter if authorship and user identity were trivial and
| fair use exceptions were trivial to determine.
|
| None of those things are trivial, and that's before
| rights assignment.
|
| YouTube's system is built primarily to placate
| rightsholders and avoid human labor paid for by Google.
| jsight wrote:
| How would that work? Infringement and even ownership are
| sometimes subjective or disputed. Automating it doesn't
| make those issues any easier.
| ksec wrote:
| >- without dedicated processors, VP9's encoding is roughly 4.5x
| as slow as H.264, while with the VPUs, the two formats perform
| the same; this is a big win for the open format(s)
|
| H.264 is an open format, just not royalty free. The baseline
| profile of H.264 will soon be "free" once those patents expire
| in 2023 (or basically MPEG-5 EVC).
|
| The hardware encoding speed for VP9 being the same as H.264 is
| mostly due to hardware specifically optimised for VP9 and not
| H.264. The complexity difference is still there.
| threeseed wrote:
| And VP9 is patent encumbered but they have a license from
| MPEG-LA.
|
| So it's definitely not any more open than H.264.
| gruez wrote:
| Source for this? https://en.wikipedia.org/wiki/VP9 says
| some companies claimed patents on it, but google basically
| ignored them.
| vbezhenar wrote:
| While Google might ignore them, can small company ignore
| them? I don't think that Google will fight for some guy
| using VP9 and getting sued.
| dmitriid wrote:
| > without dedicated processors, VP9's encoding is roughly 4.5x
| as slow as H.264
|
| > this is a big win for the open format(s)
|
| How is this a big win if you need dedicated processors for it
| to be as fast?
| selfhoster11 wrote:
| It increases adoption of the open standard on the supply
| side.
| dmitriid wrote:
| wat?
|
| I honestly can't parse this sentence.
|
| "Google creates dedicated custom proprietary processors
| which can process VP9 at roughly the same speed as a
| 20-year-old codec". How is this a win for opensource
| codecs?
| selfhoster11 wrote:
| You won't get adoption until the word gets around that
| Big Company X is using Format Y, and they supply content
| prominently in Format Y. That's when Chinese SoC
| manufacturers start taking things seriously, add hardware
| decode blocks to their designs, and adoption just spirals
| out from there.
| IshKebab wrote:
| Because VP9 achieves better compression than H.264.
| virtue3 wrote:
| because the largest video site in the world will be
| encoding as VP9.
| dmitriid wrote:
| It's a codec developed by Google, for Google, and Google
| will happily abandon it. From the article:
|
| > After pushing out and upgrading to VP8 and VP9, Google
| is moving on to its next codec, called "AV1," which it
| hopes will someday see a wide rollout.
|
| I still can't see how this is a win.
| saynay wrote:
| VP9 is meant to be a parallel to h264, and AV1 to h265?
|
| VP9 running on custom circuits being equivalent speed to
| h264 running on custom circuits seems like a win for VP9?
| Since VP9 isn't royalty encumbered the way h264 is, that
| could well be a win for the rest of us too.
| dmitriid wrote:
| > Since VP9 isn't royalty encumbered the way h264 is,
| that could well be a win for the rest of us too.
|
| I can only repeat myself: "Google creates dedicated
| custom proprietary processors which can process VP9 at
| roughly the same speed as a 20-year-old codec". How is
| this a win for anyone but Google (who is already looking
| to replace VP9 with AV1)?
|
| "The rest of us" are very unlikely to run Google's custom
| chips. "The rest of us" are much more likely to run this
| in software, for which, to quote the comment I was
| originally replying to, "without dedicated processors,
| VP9's encoding is roughly 4.5x as slow as H.264".
|
| Note: I'm not questioning the codec itself. I'm
| questioning the reasoning declaring this "a big win for
| the open format(s)".
| virtue3 wrote:
| Aren't VP8/9/AV1 all open codecs though? I don't really
| see what the issue is.
|
| VP8 seems to be the most supported on all platforms
| without melting your Intel CPU, at least from when I was
| deploying a WebRTC solution to a larger customer base
| last year.
|
| > In May 2010, after the purchase of On2 Technologies,
| Google provided an irrevocable patent promise on its
| patents for implementing the VP8 format
|
| This is significantly better than even h264 in terms of
| patents/royalties.
|
| Would you mind elaborating on your hate? There's nothing
| for Google to abandon here; it's already out in the wild.
| dmitriid wrote:
| > Would you mind elaborating on your hate?
|
| _How_ did you even come up with this question?
|
| Please go and re-read my original question:
| https://news.ycombinator.com/item?id=27035059 and the
| follow-up from me:
| https://news.ycombinator.com/item?id=27035112 and from
| another person:
| https://news.ycombinator.com/item?id=27036150
|
| But yeah, sure, I _hate hate hate_ VP9 smh
| threeseed wrote:
| They have always been encoding in VP9, though. But that
| doesn't mean they will be serving it.
|
| For example, OSX doesn't support it at all, and iOS only
| supports VP9 in 4K/HDR mode, and only for certain
| versions.
| sdenton4 wrote:
| There is a pile of transcodes of every video, each served
| as appropriate for the device. Serving VP9 to non-OSX
| devices is still a big win, at scale.
| acdha wrote:
| It's a relatively modest win versus H.265 unless you're
| willing to substantially sacrifice quality, and that has
| to be balanced against the extra CPU time and storage.
|
| YouTube uses so much bandwidth that this is still
| measured in millions of dollars, but it's really worth
| remembering that "at scale" to them is beyond what almost
| anyone else hits.
| xodice wrote:
| I'm watching a YouTube video in Safari right now being
| served via VP9 (Verified under "Stats for nerds").
|
| Heck, even the M1 supports VP9 in hardware up to 8K60.
|
| I'm not sure where you got the idea that macOS has no VP9
| support; it works quite well.
| acdha wrote:
| What impact does this really have, though? Are they making
| better VP9 tools available to other people? Browsers
| already have highly-tuned playback engines, and YouTube
| actively combats efforts to make downloaders or other
| things which use their videos. Is there a path I'm missing
| where this has much of an impact on the rest of the
| Internet?
| CyberRabbi wrote:
| Wow, a 33x throughput improvement for VP9 at the same hardware
| cost. That seems excessive, but their benchmark uses ffmpeg.
| Is ffmpeg's VP9 encoder known to be state of the art in
| throughput? Or is there any way of knowing whether their
| hardware IP block is structured equivalently to the ffmpeg
| software algorithm? I know that custom hardware will always
| beat general-purpose hardware, but 33x is a very large
| improvement. Contemporary core counts coupled with very wide
| SIMD make CPUs functionally similar to ASICs/FPGAs in many
| cases.
| brigade wrote:
| The only OSS VP9 encoder is Google's own libvpx, which is what
| ffmpeg uses.
| rndgermandude wrote:
| By now Intel has released an open source encoder (BSD-2 +
| patent grants), tuned for their Xeons:
|
| https://github.com/OpenVisualCloud/SVT-VP9
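| For a rough sense of what the software side of such a
| comparison looks like, here is a minimal sketch (assuming a
| local clip named input.mp4 and an ffmpeg build with libx264
| and libvpx-vp9 enabled; the filenames and settings are just
| placeholders, not the article's benchmark) that times the
| two software encoders back to back:
|
|     # Hypothetical timing sketch: x264 vs. libvpx-vp9 via ffmpeg.
|     import subprocess
|     import time
|
|     def encode(codec, extra_args, out_file):
|         # Run ffmpeg with the given video codec, drop audio,
|         # and return the wall-clock time in seconds.
|         cmd = ["ffmpeg", "-y", "-i", "input.mp4",
|                "-c:v", codec, *extra_args, "-an", out_file]
|         start = time.time()
|         subprocess.run(cmd, check=True)
|         return time.time() - start
|
|     t_h264 = encode("libx264",
|                     ["-preset", "medium", "-crf", "23"],
|                     "out_h264.mp4")
|     # -b:v 0 with -crf selects libvpx-vp9's constant-quality
|     # mode; -row-mt and -cpu-used trade quality for speed.
|     t_vp9 = encode("libvpx-vp9",
|                    ["-b:v", "0", "-crf", "32",
|                     "-row-mt", "1", "-cpu-used", "2"],
|                    "out_vp9.webm")
|     print(f"x264 {t_h264:.1f}s, vp9 {t_vp9:.1f}s, "
|           f"ratio {t_vp9 / t_h264:.1f}x")
|
| The ratio moves a lot with -cpu-used and the source material,
| which is part of why headline numbers like the 4.5x figure
| quoted earlier in the thread are hard to pin down.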
| bumbada wrote:
| That doesn't look so excessive to me. We regularly get a
| hundred or a thousand times more efficiency and performance
| using custom electronics for things like 3D or audio
| recognition.
|
| But programming fixed electronics in parallel is also way
| harder than programming flexible CPUs.
|
| "Contemporary core counts coupled with very wide simd makes
| CPUs functionally similar to ASIC/fpga in many cases."
|
| I don't think so. For things that have a way to be solved in
| parallel, you can get at least a 100x advantage easily.
|
| There are lots of problems that you could solve in the
| CPU(serially) that you just can't solve in parallel(because
| they have inter dependencies).
|
| Today CPUs delegate the video load to video coprocessors of one
| type or another.
| bumbada wrote:
| BTW: multiple CPU cores are not parallel programming in the
| sense that FPGAs or ASICs (or even GPUs) are.
|
| Multiple cores work like multiple machines, but parallel
| units work choreographically in sync at lower speeds (with
| quadratic energy consumption). They could share everything
| and have only the electronics needed to do the job.
| CyberRabbi wrote:
| Well, transistors are cheap and synchronization is not a
| bottleneck for embarrassingly parallel video encoding jobs
| like these. Contemporary CPUs already downclock when they
| can to save power and limit heat.
| CyberRabbi wrote:
| >> Contemporary core counts coupled with very wide SIMD make
| CPUs functionally similar to ASICs/FPGAs in many cases.
|
| > I don't think so. For things that can be solved in
| parallel, you can easily get at least a 100x advantage.
|
| That's kind of my point. CPUs are incredibly parallel now in
| their interface. Let's say you have 32 cores and use 256-bit
| SIMD for 4 64-bit ops per core. That would give you a ~128x
| improvement compared to doing all those ops serially. It's
| just a matter of writing your program to exploit the
| available parallelism.
|
| There's also implicit ILP going on as well, but I think
| explicitly using SIMD usually keeps the execution ports
| filled.
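| As a back-of-the-envelope check on that arithmetic, a toy
| sketch of the numbers in the comment above (nothing
| encoder-specific):
|
|     # 32 cores, 256-bit SIMD registers, 64-bit operands.
|     cores = 32
|     simd_width_bits = 256
|     operand_bits = 64
|     lanes_per_core = simd_width_bits // operand_bits  # 4
|     peak_parallel_ops = cores * lanes_per_core         # 128
|     print(f"~{peak_parallel_ops}x vs. one scalar 64-bit lane")
|
| Real encoders fall well short of that peak because of
| branching, dependencies between blocks, and memory bandwidth,
| which the replies below get into.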
| WJW wrote:
| TBH 32 or even 64 cores does not sound all that impressive
| compared to the thousands of cores available on modern GPUs
| and presumably even more that could be squeezed into a
| dedicated ASIC.
|
| In any case, wouldn't you run out of memory bandwidth long
| before you can fill all those cores? It doesn't really
| matter how many cores you have in that case.
| CyberRabbi wrote:
| Those thousands of cores are all much simpler, don't have
| SIMD, and pay a huge penalty for branching.
| gigel82 wrote:
| I'm wondering if this is related to the recent Roku argument;
| perhaps YouTube is trying to force Roku to incorporate a hardware
| decoding chip (maybe with an increased cost) in future products
| as a condition to stay on the platform.
| conk wrote:
| I don't think YouTube cares if you use hardware or software
| decoding. I also don't think they care if you use their
| hardware decoder or someone else's. The issue with Roku is
| that they don't want to include any extra hardware to support
| VP9, and they use such cheap/low-spec hardware that they
| can't reliably decode it in software.
| qwertox wrote:
| I wonder how the power YouTube spends transcoding the most
| useless / harmful videos compares to Bitcoin's power
| consumption. Maybe even every video should be included in the
| calculation, since Bitcoin also has its positive aspects.
|
| I've never heard how much power YouTube's transcoding
| consumes, but transcoding has always been one of those very
| CPU-intensive tasks (hence it was one of the first tasks to
| be moved over to the GPU).
___________________________________________________________________
(page generated 2021-05-04 23:03 UTC)