[HN Gopher] YouTube is now building its own video-transcoding chips
       ___________________________________________________________________
        
       YouTube is now building its own video-transcoding chips
        
       Author : beambot
       Score  : 232 points
       Date   : 2021-05-04 06:31 UTC (16 hours ago)
        
 (HTM) web link (arstechnica.com)
 (TXT) w3m dump (arstechnica.com)
        
       | gigatexal wrote:
       | Honestly I'm surprised they didn't do this earlier
        
       | colonwqbang wrote:
       | If anything it's just strange that such a part didn't exist
       | before (if it really didn't). Accelerated encoding is hugely more
       | power efficient than software encoding.
        
         | wishysgb wrote:
          | Well, I do believe video-transcoding chips have been around
          | forever. But chips tailored to their exact application, like
          | these, should be more efficient.
        
         | fireattack wrote:
          | The alternative to "building its own video-transcoding chips"
          | isn't just software encoding, though. Google/YouTube could have
          | already been using hardware encoding, just with generic GPUs or
          | whatever existing hardware.
        
         | izacus wrote:
          | I've worked on an IPTV broadcasting system and this isn't as
          | obvious as you'd think.
          | 
          | The big issue here is quality - most hardware encoders hugely
          | lag behind software encoders. By quality I mean visual quality
          | per bit/s. Which means that using a quality software encoder
          | like x264 will save you a massive amount of money in bandwidth
          | costs, because you can simply go significantly lower in bitrate
          | than you can with a hardware encoding block.
         | 
          | At the time, our comparisons showed that you could get away
          | with as low as 1.2 Mbps for a 720p stream, where with an
          | enterprise HW encoder you'd have to do about 2-3 Mbps to get
          | the same picture quality.
         | 
         | That's one consideration. The other consideration is density -
         | at the time most hardware encoders could do up to about 4
         | streams per 1U rack unit. Those rack units cost about half the
         | price of a fully loaded 24+ core server. Even GPUs like nVidia
         | at the time could do at most 2 encoding sessions with any kind
         | of performance. On CPU, we could encode 720p on about 2 Xeon
         | cores which means that a fully loaded server box with 36+ cores
         | could easily do 15-20 sessions of SD and HD and we could scale
         | the load as necessary.
         | 
         | And the last consideration was price - all HW encoders were
         | significantly more expensive than buying large core count rack-
         | mount servers. Funny enough, many of those "HW specialised
         | encoding" boxes were running x86 cores internally too so they
         | weren't even more power efficient.
         | 
          | So in the end the calculation was simple - software encoding
          | saved a ton of money on bandwidth, it allowed a better-quality
          | product because we could deliver high-quality video to people
          | with poor internet connectivity, it made procuring hardware
          | simple, and it made the solution more scalable, all at the
          | cost of some power consumption. Easy trade. Of course the
         | computation is a bit different with modern formats like VP9 and
         | H.265/HEVC - the software encoders are still very CPU intensive
         | so it might make sense to buy cards these days.
         | 
         | Of course, we weren't Google and couldn't design and
         | manufacture our own hardware. But seeing the list of codecs
         | YouTube uses, there's also one more consideration: flexibility.
          | HW encoding blocks are usually very limited in what they can do
         | - most of them will do H.264, some of them will stretch to
         | H.265 and maaaaaybe VP9. CPUs will encode into everything. Even
         | when a new format is needed, you just deploy new software, not
         | a whole chip.
        
           | jng wrote:
           | Very interesting description. Are you familiar at all with
           | the details of FPGAs for these very same tasks, especially
           | the EV family of Xilinx Zynq Ultrascale+ MPSoC? They include
           | hardened video codec units, but I don't know how they compare
           | quality/performance-wise. Thanks!
        
             | izacus wrote:
             | I'm afraid I don't have any experience with those devices.
             | Most HW encoders however struggle with one thing - the fact
             | that encoding is very costly when it comes to memory
             | bandwidth.
             | 
              | The most important performance/quality-related process in
              | encoding is having the encoder take each block (piece) of the
              | previous frame and scan the current frame to see whether it
             | still exists and where it moved. The larger area the codec
             | scans, the more likely it'll find the area where the piece
             | of image moved to. This allows it to write just a motion
             | vector instead of actually encoding image data.
             | 
             | This process is hugely memory bandwidth intensive and most
             | HW encoders severely limit the area each thread can access
             | to keep memory bandwidth costs down and performance up.
             | This is also a fundamental limitation for CUDA/gpGPU
             | encoders, where you're also facing a huge performance loss
             | if there's too much memory accessed by each thread.
             | 
             | Most "realtime" encoders severely limit the macroblock scan
             | area because of how expensive it is - which also makes them
             | significantly less efficient. I don't see FPGAs really
             | solving this issue - I'd bet more on Intel/nVidia encoding
              | blocks paired with copious amounts of onboard memory. I
             | heard Ampere nVidia encoding blocks are good (although they
             | can only handle a few streams).
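              | 
              | To make the motion-search cost concrete, here's a toy
              | Python sketch of the basic block-matching loop (block size,
              | search radius and the frame arrays are made-up parameters;
              | real encoders use fixed-function logic and SIMD, not naive
              | loops like this):
              | 
              |     import numpy as np
              | 
              |     def sad(a, b):
              |         # Sum of absolute differences between two blocks.
              |         d = a.astype(np.int32) - b.astype(np.int32)
              |         return int(np.abs(d).sum())
              | 
              |     def motion_search(ref, cur, bx, by, bs=16, radius=16):
              |         # Find where the block at (bx, by) in `cur` best
              |         # matches in `ref`, within +/- `radius` pixels.
              |         block = cur[by:by+bs, bx:bx+bs]
              |         best = (0, 0, sad(ref[by:by+bs, bx:bx+bs], block))
              |         for dy in range(-radius, radius + 1):
              |             for dx in range(-radius, radius + 1):
              |                 y, x = by + dy, bx + dx
              |                 if (0 <= y <= ref.shape[0] - bs and
              |                         0 <= x <= ref.shape[1] - bs):
              |                     cost = sad(ref[y:y+bs, x:x+bs], block)
              |                     if cost < best[2]:
              |                         best = (dx, dy, cost)
              |         return best  # (mv_x, mv_y, residual cost)
              | 
              | Every candidate position reads bs*bs reference pixels, so
              | widening the search radius grows memory traffic roughly
              | quadratically - that's the bandwidth wall described above.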
        
               | spuz wrote:
               | That is interesting context for this quote from the
               | article:
               | 
               | > "each encoder core can encode 2160p in realtime, up to
               | 60 FPS (frames per second) using three reference frames."
               | 
               | Apparently reference frames are the frames that a codec
               | scans for similarity in the next frame to be encoded. If
               | it really is that expensive to reference a single frame
               | then it puts into perspective how effective this VPU
               | hardware must be to be able to do 3 reference frames of
               | 4K at 60 fps.
        
               | daniellarusso wrote:
               | I always thought of reference frames as like the sampling
               | rate, so in that sense, is it how few reference frames
               | can it get away with, without being noticeable?
               | 
               | Would that also depend on the content?
               | 
               | Aren't panning shots more difficult to encode?
        
               | izacus wrote:
               | > I always thought of reference frames as like the
               | sampling rate, so in that sense, is it how few reference
               | frames can it get away with, without being noticeable?
               | 
               | Actually not quite - "reference frames" means how far
               | back (or forward!) the encoded frame can reference other
               | frames. In plain words, "max reference frames 3" means
               | that frame 5 in a stream can say "here goes block 3 of
               | frame 2" but isn't allowed to say "here goes block 3 of
               | frame 1" because that's out of range.
               | 
                | This has obvious consequences for decoders: they need to
                | have enough memory to keep that many decoded, uncompressed
                | frames around in case a future frame references them. It
                | also has consequences for encoders: while they don't have
                | to reference frames far back, it'll increase efficiency if
                | they can reuse the same stored block of image data across
                | as many frames as possible. This of course means that they
                | need to scan more frames for each processed input frame to
                | try to find as much reusable data as possible.
               | 
                | You can easily get away with "1" reference frame (MPEG-2
                | has this limit, for example), but it'll encode the same
                | data multiple times, lowering overall efficiency and
                | leaving less space to store detail.
               | 
               | > Would that also depend on the content?
               | 
               | It does depend on the content - in my testing it works
               | best for animated content because the visuals are static
               | for a long time so referencing data from half a second
               | ago makes a lot of sense. It doesn't add a lot for
                | content where there are a lot of scene cuts and a lot of
                | action, like a Michael Bay movie combat scene.
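                | 
                | A toy way to picture that decoder-side limit (frame
                | numbers only, no real pixels; MAX_REFS = 3 matches the
                | example above, everything else is made up):
                | 
                |     from collections import deque
                | 
                |     MAX_REFS = 3
                |     dpb = deque(maxlen=MAX_REFS)  # decoded picture buffer
                | 
                |     def can_reference(ref_frame_no):
                |         # A frame may only reuse blocks from frames that
                |         # are still sitting in the buffer.
                |         return ref_frame_no in dpb
                | 
                |     for n in range(1, 7):        # decode frames 1..6
                |         if n == 5:
                |             # frame 5 may copy a block from frame 2
                |             # (still buffered) but not from frame 1
                |             # (already evicted).
                |             print(can_reference(2), can_reference(1))
                |         dpb.append(n)            # oldest frame drops out
                | 
                | Prints "True False", matching the frame 5 / frame 2 /
                | frame 1 example: a bigger MAX_REFS means more uncompressed
                | frames held in memory and more frames to search.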
        
           | colonwqbang wrote:
           | My work is in IP cameras so I'm aware of these tradeoffs.
           | 
            | I guess what I didn't expect is that Google could design
            | their own encoder IP that beats the current offerings by a
            | big factor at the task of general video coding. I would have
            | guessed that Google actually built an ASIC with customised IP
            | from some other vendor.
           | 
           | But maybe Google did do just that?
        
         | kevincox wrote:
         | At least for Google's case YouTube videos are usually
         | transcoded in idle datacenters (for example locations where the
         | locals are sleeping). This means that the cost of CPU is much
         | lower than a naive estimate. These new accelerators can only be
         | used for transcoding video, the rest of the time they will sit
         | idle (or you will keep them loaded but the regular servers will
          | be idle). This means that the economics aren't necessarily an
          | obvious win.
         | 
         | Of course if you do enough transcoding that you are buying
         | servers for the job then these start to save money. So I guess
         | someone finally decided that the R&D would likely pay off due
         | to the current combination of cyclical traffic, adjustable load
         | and the cost savings of the accelerator.
        
         | p0nce wrote:
         | There is one such chip in your phone and in your GPU.
        
         | brigade wrote:
         | Intel has had PCIe cards targeted at this market, reusing their
         | own HW encoder, e.g. the VCA2 could do up to 14 real-time 4K
         | transcodes at under 240W, and the upcoming Xe cards would
         | support VP9 encode. (XG310 is similar albeit more targeted at
         | cloud gaming servers)
        
           | dogma1138 wrote:
           | These PCIe cards just run a low power Xeon CPU with the iGPU
           | doing the majority of the heavy lifting.
           | 
            | It was always an interesting and weird product; it even runs
            | its own OS.
        
       | sidcool wrote:
       | For YouTube's scale it makes sense, since a small saving or
       | efficiency boost would accumulate at their scale.
        
         | kevingadd wrote:
          | It's not just cost reduction or efficiency: the faster encodes
          | you can get through dedicated hardware mean they can
          | potentially reduce the delay between a video being uploaded and
          | a video being available to the public (right now, even if you
          | don't spend time waiting in the processing queue, it takes a
          | bit for your videos to get encoded).
         | 
         | You can handle larger volumes of incoming video by spinning up
         | more encoder machines, but the only solution for lowering
         | latency is _faster encodes_ , and with the way the CPU and GPU
         | markets are these days a dedicated encoder chip is probably
         | your best bet.
        
           | Dylan16807 wrote:
           | You can split a video up across cores or even across servers.
           | Encoding speed does not need to have a significant impact on
           | publishing latency.
        
       | mauricio wrote:
       | Impressive. I wonder if Google will sell servers with these cards
       | via Google Cloud. Seems like it could be pretty competitive in
       | the transcoding space and also help them push AV1 adoption.
        
         | jankeymeulen wrote:
         | You can transcode as a service on Google Cloud:
         | https://cloud.google.com/transcoder/docs
        
       | vhiremath4 wrote:
       | I'm just always blown away that Google transcodes into as many
       | formats as they do upfront. I wonder if they do a mix of just in
       | time transcoding on top of queue-based.
        
         | sgarland wrote:
         | For VP9/x264, almost certainly not. If you jump on a newly-
          | uploaded video, you'll see that higher resolutions come later.
         | It's common to see 720p nearly immediately, then 1080p, then
         | 4K.
         | 
          | For pre-x264 codecs, they probably could, but between the
          | relatively small sizes required for the low resolutions those
          | codecs would be supporting, and the cost difference between
          | compute and storage, I'd bet everything is encoded beforehand.
        
       | kevincox wrote:
       | > Google's warehouse-scale computing system.
       | 
       | That is quite the understatement. Google's computing system is
       | dozens of connected "warehouses" around the world.
        
       | spuz wrote:
       | > Google probably only provides stats about growth (like "500
       | hours of video are uploaded to YouTube every minute") because the
       | total number of videos is so large, it's an unknowable amount.
       | 
       | I suppose you could sample random YouTube urls to find out how
       | many of them link to public videos. Given the total number of
       | possible URLs, it would give you an idea of what percentage of
       | them have been used and therefore how many videos YouTube has in
       | total. It would not tell you how many private videos or Google
       | Drive / Photos videos exist of course.
        
         | warent wrote:
         | It doesn't seem like this would work. I think you could sample
         | trillions of YouTube IDs with a high likelihood of all of them
         | being unused. They're supposed to be unique after all
        
           | espadrine wrote:
           | Let's do the math.
           | 
           | IDs are 64-bit integers. The number of tries before an event
           | with probability P occurs is a geometric distribution. If V
           | is the number of valid IDs (that have a video), the number of
           | tries is 2^64/V. Assuming 1 megatries per second, since we
           | can parallelize it, we would find the first video in 20
           | seconds on average, with a conservative estimate of V = 10^12
            | (a trillion videos).
           | 
           | To have a sample of ~100 videos, it'd take about half an
           | hour.
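            | 
            | Or as a quick back-of-the-envelope in Python (V and the try
            | rate are the assumptions above, not known figures):
            | 
            |     V = 10**12            # assumed number of live videos
            |     SPACE = 2**64         # size of the ID space
            |     RATE = 10**6          # assumed tries per second
            | 
            |     secs_per_hit = SPACE / V / RATE
            |     print(secs_per_hit)               # ~18 s per video found
            |     print(100 * secs_per_hit / 60)    # ~31 min for 100 videos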
        
           | chrisseaton wrote:
           | What do you mean 'supposed to be unique'? How can an ID not
           | be unique?
        
             | zakki wrote:
             | Maybe when they reach 62^11 + 1?
        
           | rococode wrote:
           | Clicked into a couple random videos, looks like all of their
           | video IDs are 11 characters, alphanumeric with cases. So
           | 26+26+10 = 62 choices for each char, 62^11 = 5.2e+19 = 52
           | quintillion unique IDs (52 million trillions).
           | 
           | So, yeah, sampling would be a mostly futile effort since
           | you're looking to estimate about 8 to 10 decimal digits of
           | precision. Though it's technically still possible since you'd
           | expect about 1 in every 50 million - 5 billion IDs to work
           | (assuming somewhere between a trillion and 10 billion
           | videos).
           | 
           | My statistics knowledge is rusty, but I guess if you could
           | sample, say, 50 billion urls you could actually make a very
           | coarse estimate with a reasonable confidence level. That's a
           | lot but, ignoring rate limits, well within the range of usual
           | web-scale stuff.
        
             | toxik wrote:
             | If there are N IDs to draw from and M videos on YouTube,
             | then P(ID used) = M/N if the ID is drawn from a uniform
             | distribution, and P(At least one of K IDs used) = 1 - (1 -
             | M/N)^K (not accounting for replacement).
             | 
              | If M ≈ 1e9 and N ≈ 1e18, and you sample K = 1000 URLs, the
              | chance of hitting a used ID is about 1e-9 per sample, or
              | roughly 1e-6 for the whole batch.
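              | 
              | Plugging those in (M, N and K as assumed above):
              | 
              |     M, N, K = 1e9, 1e18, 1000
              |     p_single = M / N                  # ~1e-9 per sample
              |     p_any = 1 - (1 - p_single) ** K   # ~1e-6 per batch
              |     print(p_single, p_any)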
        
             | spuz wrote:
             | Thanks for doing the maths - it does seem the sampling
             | method would not be feasible. Taking the statistic of "500
             | hours uploaded per minute" and assuming the average video
             | length is 10 minutes, we can say about 1.5bn videos are
             | uploaded to YouTube every year or 15bn every 10 years. So
             | it seems likely that YouTube has less than 1tn videos in
             | total.
        
             | dpatterbee wrote:
             | They also use "_" and "-" according to Tom Scott.
             | 
             | https://www.youtube.com/watch?v=gocwRvLhDf8
        
               | _0ffh wrote:
               | Which would bring it up to a nice 64 choices, making it
               | exactly 6 bits per character.
        
               | slver wrote:
               | It's a URL-friendly form of base64.
               | 
               | 11 chars encode 66 bits, but actually 2 bits are likely
               | not used and it's simply an int64 encoded to base64.
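                | 
                | The 11-character shape falls out of the standard library
                | directly (illustrative only - this is just the encoding
                | arithmetic, not YouTube's code):
                | 
                |     import base64, struct
                | 
                |     def to_id(n):
                |         # 64-bit int -> 8 bytes -> URL-safe base64,
                |         # padding stripped -> 11 chars of [A-Za-z0-9_-]
                |         raw = struct.pack(">Q", n)
                |         enc = base64.urlsafe_b64encode(raw)
                |         return enc.rstrip(b"=").decode()
                | 
                |     print(to_id(0x123456789ABCDEF0))  # EjRWeJq83vA
                |     print(len(to_id(2**64 - 1)))      # 11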
               | 
               | Given everyone and their grandma is pushing 128-bit UUID
               | for distributed entity PK, it's interesting to see
               | YouTube keep it short and sweet.
               | 
                | Int64 is my go-to PK as well; if I have to distribute it,
                | I make it hierarchical, but I don't do UUID.
        
               | littlestymaar wrote:
               | > Given everyone and their grandma is pushing 128-bit
               | UUID for distributed entity PK, it's interesting to see
               | YouTube keep it short and sweet.
               | 
               | The trade-off you make when using short IDs is that you
                | can't generate them at random. With 128-bit IDs, you can't
               | realistically have collisions, but with 64-bit ones,
               | because of the birthday paradox, as soon as you have more
               | than 2^32 elements, you're really likely to have
               | collisions.
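                | 
                | The usual approximation makes the 2^32 figure concrete
                | (just the generic birthday bound, nothing YouTube-
                | specific):
                | 
                |     import math
                | 
                |     def p_collision(n, space=2**64):
                |         # P(any collision) ~ 1 - exp(-n^2 / (2 * space))
                |         return 1 - math.exp(-n * n / (2 * space))
                | 
                |     print(p_collision(2**30))   # ~3%, still comfortable
                |     print(p_collision(2**32))   # ~39%, collisions likely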
        
               | quantumofalpha wrote:
                | Youtube video ids used to be just base64 of a 3DES-
                | encrypted mysql primary key, a sequential 64-bit int -
                | collisions are of zero concern there. Birthday-paradox-
                | wise, it's about as good as a 128-bit UUID generated
                | without a centralized component like the database's row
                | counter, where you do have to care about collisions.
               | 
               | However theft of the encryption key is a concern, since
               | you can't rotate it and it just sat there in the code.
               | Nowadays they do something a bit smarter to ensure ex-
               | employees can't enumerate all unlisted videos.
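                | 
                | The shape of that old scheme is easy to sketch. Below, a
                | toy keyed 64-bit permutation stands in for 3DES (it is
                | NOT secure and NOT their code - the key, round count and
                | names are all made up; it just shows sequential key in,
                | random-looking 11-char ID out):
                | 
                |     import base64, hashlib, struct
                | 
                |     KEY = b"demo-key"   # illustrative only
                | 
                |     def permute64(n, rounds=4):
                |         # Toy Feistel network over a 64-bit integer.
                |         L, R = n >> 32, n & 0xFFFFFFFF
                |         for r in range(rounds):
                |             h = hashlib.sha256(
                |                 KEY + bytes([r]) + R.to_bytes(4, "big"))
                |             L, R = R, L ^ int.from_bytes(h.digest()[:4], "big")
                |         return (L << 32) | R
                | 
                |     def video_id(seq_pk):
                |         raw = struct.pack(">Q", permute64(seq_pk))
                |         enc = base64.urlsafe_b64encode(raw)
                |         return enc.rstrip(b"=").decode()
                | 
                |     for pk in (1, 2, 3):     # consecutive DB keys...
                |         print(video_id(pk))  # ...unrelated-looking IDs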
        
               | slver wrote:
               | You seem to know about their architecture. What do they
               | do now?
        
               | quantumofalpha wrote:
               | > You seem to know about their architecture. What do they
               | do now?
               | 
               | Random 64-bit primary keys in mysql for newer videos.
               | These may sometimes collide but then I suppose you could
               | have the database reject insert and retry with a
               | different id.
        
               | slver wrote:
               | So a single cluster produces those keys? I thought it's
               | more decentralized.
        
               | quantumofalpha wrote:
               | With random database keys I would think they can just be
               | generated at random by any frontend server running
               | anywhere. Ultimately, a request to insert that key would
               | come to the database - which is the centralized
               | gatekeeper in this design and can accept or reject it.
               | But with replication, sharding, caching even SQL
               | databases scale extremely well. Just avoid expensive
               | operations like joins.
        
               | slver wrote:
               | The reason why we want ids to be purely random is so we
               | don't have to do the work of coordinating distributed id
               | generation. But if you don't mind coordinating, then none
               | of this matters.
               | 
               | Surely if it was a great chore for YouTube to have
               | random-looking int64 ids, they would switch to int128.
               | But they haven't.
               | 
               | I'm a big fan of the "works 99.99999999% of the time"
               | mentality, but if anything happens to your PRNGs, you
                | risk countless collisions slipping by you in production
               | before you realize what happened. It's good to design
               | your identity system in a way that'd catch that,
               | regardless of how unlikely it seems in the abstract.
               | 
               | The concept of hierarchical ids is undervalued. You can
               | have a machine give "namespaces" to others, and they can
               | generate locally and check for collisions locally in a
               | very basic way.
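                | 
                | A minimal sketch of what I mean by hierarchical (the bit
                | split is arbitrary, purely to show the idea):
                | 
                |     def make_id(namespace, local_seq):
                |         # e.g. 24 bits of handed-out namespace plus a
                |         # 40-bit counter owned by that node = 64 bits.
                |         assert namespace < 2**24 and local_seq < 2**40
                |         return (namespace << 40) | local_seq
                | 
                | Each node only has to avoid collisions within its own
                | 40-bit counter, which it can do locally.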
        
               | tatersolid wrote:
               | > but if anything happens to your PRNGs, you risk
                | countless collisions slipping by you in production
               | before you realize what happened.
               | 
               | UUID generation basically has to use a CSPRNG to avoid
               | collisions (or at least a very large-state insecure
               | PRNG).
               | 
               | Because of the low volume simply using /dev/urandom on
               | each node makes the most sense. If /dev/urandom is broken
               | so is your TLS stack and a host of other security-
               | critical things; at that point worrying about video ID
               | collisions seems silly.
        
               | slver wrote:
               | I worry about state corrupting problems, because they
               | tend to linger long after you have a fix.
        
               | trinovantes wrote:
               | Is the extra 64 bits simply used to lower the risk of
               | collision?
        
             | nannal wrote:
             | I tried this for some time, I was looking for unlisted
             | videos.
             | 
             | Just generate a random valid link and then check if it
             | gives a video or not.
             | 
             | I found exactly 0 videos.
        
               | dr-detroit wrote:
                | Of course. They are using some modulo arithmetic:
                | 
                | 1. Start from the rightmost digit (i.e. the check digit)
                | 
                | 2. Multiply every second digit by 2 (i.e. digits at even
                | positions)
                | 
                | 3. If the result in step 2 is more than one digit, add
                | the digits up (e.g. 12: 1+2 = 3)
                | 
                | 4. Add the resulting digits to the digits at the odd
                | positions
        
             | slver wrote:
             | > all of their video IDs are 11 characters, alphanumeric
             | with cases
             | 
             | It's an int64, encoded as URL-friendly base64 (i.e.
             | alphanumeric with _ and -).
        
         | jasoncartwright wrote:
         | Surprised they don't also mention Nest, which I assume also has
         | an interesting & significant video encoding operation.
        
         | slver wrote:
         | > Google probably only provides stats about growth (like "500
         | hours of video are uploaded to YouTube every minute") because
         | the total number of videos is so large, it's an unknowable
         | amount.
         | 
         | I can totally see it being non-trivial to count your videos,
         | which is a funny problem to have. But I doubt it's unknowable.
         | More like they don't care/want us to know that.
        
           | nine_k wrote:
           | Quite likely they have a good approximate number.
           | 
           | But knowing the _exact_ number can indeed be hard. It would
           | take stopping the entire uploading and deletion activity. Of
           | course they may have counters of uploads and deletions on
           | every node which handles them, but the notion of  'the same
           | instant' is tricky in distributed systems, so the exact
           | number still remains elusive.
        
             | iainmerrick wrote:
             | It's not just tricky and elusive, I think it's literally
             | unknowable -- not a well-defined question. Like asking
             | about the simultaneity of disconnected events in special
             | relativity.
        
               | zodiac wrote:
               | You can modify the question to be well-defined and not
               | suffer those measurement problems, eg "the total number
               | of videos uploaded before midnight UTC on 2021-04-29"
        
               | suprfsat wrote:
               | Interesting, I wonder if a distributed database could be
               | developed to consistently answer queries phrased in this
               | way.
        
               | hderms wrote:
               | Seems like it would either need to be append only or have
               | some kind of snapshot isolation
        
               | mmcconnell1618 wrote:
                | Google developed Spanner, which is a globally distributed
               | database that uses atomic clocks to keep things
               | synchronized: https://static.googleusercontent.com/media/
               | research.google.c...
        
               | zodiac wrote:
               | I think the Chandy-Lamport snapshot algorithm tries to do
               | something like this for all distributed systems (in their
               | model, and it tries to get any consistent snapshot, not
               | allowing you to specify the "time"); not sure if it's
               | actually useful IRL though
        
               | nine_k wrote:
               | All these nodes are in the same light cone, so we
               | theoretically can stop all mutation and wait for the
               | exact final state to converge to a precise number of
               | uploaded videos.
               | 
               | But the question of the precise number of videos before
               | that moment is indeed ill-defined.
        
               | iainmerrick wrote:
               | You can theoretically stop all mutation, but the users
               | might start to complain!
        
           | saalweachter wrote:
           | I read "an unknowable amount" as "a meaninglessly large
           | number to our monkey brains".
           | 
           | It's like knowing the distance to the Sun is 93 million
           | miles. The difficulty there isn't that measuring the distance
           | from the Sun to the Earth exactly is hard, although it is, or
           | that the distance is constantly changing, although it is, or
           | that the question is ill-defined, because the Earth is an
           | object 8000 miles across and the Sun is 100 times bigger, and
           | what points are you measuring between?
           | 
           | The distance is "unknowable" because while we know what "93
           | million miles" means, it's much harder to say we know what it
           | "means". Even when we try to rephrase it to smaller numbers
           | like "it's the distance you could walk in 90 human lifetimes"
           | is still hard to really _feel_ beyond  "it's really really
           | far."
           | 
           | Likewise, does it matter if YouTube has 100, 1000, or 10,000
           | millennia of video content? Does that number have any real
           | meaning beyond back-of-the-envelope calculations of how much
           | storage they're using? Or is "500 years per minute" the most
           | comprehensible number they can give?
        
         | kyrra wrote:
         | Googler, opinions are my own.
         | 
         | Youtube isn't the only platform where Google does video
         | transcoding. I don't know them all, but here are a few other
         | places where video plays a part:
         | 
          | Meet - I'm guessing participants that are on different devices
          | (desktop, Android, iOS), depending on their bandwidth, will get
          | different video feed quality. Though the real-time nature of
          | this may not work as well? Meet also has a live stream[0]
          | feature for when your meeting is over 250 people, which gives
          | you a youtube-like player, so this likely is transcoded.
         | 
         | Duo - more video chat.
         | 
         | Photos - when you watch a photos video stored at google (or
         | share it with someone), it will likely be transcoded.
         | 
         | Video Ads. I'd guess these are all pre-processed for every
          | platform type for efficient delivery. While these are mainly on
         | youtube, they show up on other platforms as well.
         | 
         | Play Movies.
         | 
          | Nest cameras. This is a 24/7 stream of data to the cloud that
         | some people pay to have X days of video saved.
         | 
         | [0]
         | https://support.google.com/meet/answer/9308630?co=GENIE.Plat...
        
           | jankeymeulen wrote:
           | Also Googler, you can add Stadia here as well. Needs fast and
           | low-latency transcoding.
        
             | kyrra wrote:
             | You're right. I thought stadia did something special, but I
             | guess not. So yes, Stadia.
             | 
             | Another: Youtube TV
             | 
             | EDIT: Stadia: I did some searching around the interwebs and
             | found a year-old interview[0] that hints at something
             | different.
             | 
             | > It's basically you have the GPU. We've worked with AMD to
             | build custom GPUs for Stadia. Our hardware--our specialized
             | hardware--goes after the GPU. Think of it as it's a
             | specialized ASIC.
             | 
             | [0] https://www.techrepublic.com/article/how-google-stadia-
             | encod...
        
         | NavinF wrote:
         | Yeah that should work. If I do 1 request per second with 64 IP
         | addresses, I'd expect to find ~110 videos after 1 year of
         | random sampling if there are 1T videos on YouTube.
         | 
         | 110=(64*3.154e+7)*(1,000,000,000,000/2^64)
         | 
         | (The other thread that assumes a 62-character set is wrong
         | because they forgot about '-' and '_'. I'm fairly certain a
         | video ID is a urlsafe base64 encoded 64-bit int. 64^11==2^66)
        
       | samwestdev wrote:
       | Why don't they encode on the uploader machine?
        
         | absove wrote:
         | Sounds like something for the next version of recaptcha
        
         | devit wrote:
         | Because society results in companies being incentivized to
         | babysit users rather than cutting off those who are unable to
         | learn simple technical skills like optimally encoding a video
         | respecting a maximum bitrate requirement.
        
           | giantrobot wrote:
           | > simple technical skills like optimally encoding a video
           | respecting a maximum bitrate requirement.
           | 
           | This is in no way a "simple skill" as maximum video bitrate
           | is only one of a number of factors for encoding video. For
           | streaming to end users there's questions of codecs, codec
           | profiles, entropy coding options, GOP sizes, frame rates, and
           | frame sizes. This also applies for your audio but replacing
           | frame rates and sizes with sample rate and number of
           | channels.
           | 
           | Streaming to ten random devices will require different
           | combinations of any or all of those settings. There's no one
           | single optimum setting. YouTube encodes dozens of
           | combinations of audio and video streams from a single source
           | file.
           | 
            | Video, it turns out, is pretty complicated.
        
           | acdha wrote:
            | What about cutting off those who condescend to others without
           | recognizing the limits of their own understanding?
           | 
           | I'm not an expert in this but I know that "optimally encoding
           | a video" is an actual job. That's because there's no global
           | definition of optimal (it varies depending on the source
           | material and target devices, not to mention the costs of your
           | compute, bandwidth, and time); you're doing it multiple times
           | using different codecs, resolutions, bandwidth targets, etc.;
           | and those change regularly so you need to periodically
           | reprocess without asking people to come back years later to
           | upload the iPhone 13 optimized version.
           | 
           | This brings us to a second important concept: YouTube is a
           | business which pays for bandwidth. Their definition of
           | optimal is not the same as yours (every pixel of my
           | masterpiece must be shown exactly as I see it!) and they have
           | a keen interest in managing that over time even if you don't
           | care very much because an old video isn't bringing you much
           | (or any) revenue. They have the resources to heavily optimize
           | that process but very few of their content creators do.
        
         | pta2002 wrote:
          | Can't trust user input; you'd have to spend quite a bit of
          | energy just checking to see if it's good. You also want to
          | transcode to multiple resolutions, and it'd end up being quite
          | slow if it's done using JS.
        
           | londons_explore wrote:
           | Checking the result is good shouldn't be too hard - a simple
           | spot check of a few frames should be sufficient, and it isn't
           | like the uploader gets a massive advantage for uploading
           | corrupt files.
           | 
           | The CPU and bandwidth costs of transcoding to 40+ different
           | audio and video formats would be massive though. I could
           | imagine a 5 minute video taking more than 24 hours to
           | transcode on a phone.
        
             | simcop2387 wrote:
             | > Checking the result is good shouldn't be too hard - a
             | simple spot check of a few frames should be sufficient, and
             | it isn't like the uploader gets a massive advantage for
             | uploading corrupt files.
             | 
             | Uploading corrupt files could allow the uploader to execute
             | code on future client machines. You _must_ check every
             | frame and the full encoding of the video.
        
               | kevincox wrote:
               | Must is a strong word. In theory browsers and other
                | clients treat all video streams as untrusted and it is
               | safe to watch an arbitrary video. However complex formats
               | like videos are a huge attack surface.
               | 
               | So yes, for the bigger names like Google this is an
               | unacceptable risk. They will generally avoid serving any
               | user-generated complex format like video, images or audio
               | to users directly. Everything is transcoded to reduce the
               | likelihood that an exploit was included.
        
           | amelius wrote:
           | Verification is simpler than encoding, I suppose.
        
         | mschuster91 wrote:
         | Because of the massive bandwidth and data requirements.
          | Assuming I as the source have 20 MBit/s content that is 30 min
          | long - that's about 4.5 GB of data.
         | 
         | Given your average DSL uplink of 5 MBit/s, that's 2 hours
         | uploading for the master version... and if I had to upload a
         | dozen smaller versions myself, that could easily add five times
         | the data and upload time.
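          | 
          | The arithmetic, for the curious (bitrate and uplink as assumed
          | above):
          | 
          |     mbit, minutes, uplink = 20, 30, 5
          |     size_gb = mbit * minutes * 60 / 8 / 1000
          |     upload_h = mbit * minutes * 60 / uplink / 3600
          |     print(size_gb, upload_h)   # 4.5 GB, 2.0 hours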
        
         | greenknight wrote:
         | Imagine someone using a 10 year old computer to upload a 1 hour
          | video. Not only do they need to transcode to multiple different
          | resolutions, but also to multiple codecs. This would not be
          | practical from a business/client-relationship standpoint. They
          | want their client (the uploader) to spend as little time as
          | possible and get their videos up as quickly as possible.
        
           | ThatPlayer wrote:
           | I wouldn't even say 10 year old computer. Think phones or
           | tablets. As well as the battery drain. Or imagine trying to
           | upload something over 4G.
        
             | bufferoverflow wrote:
             | > _Or imagine trying to upload something over 4G._
             | 
             | 4G is perfectly fine for uploading videos. It can hit up to
             | 50 Mbps. LTE-Advanced can do 150 Mbps.
        
           | greenknight wrote:
            | Though that being said, it would be great to be like "hey
            | Google, I'll do the conversions for you!" But then they would
            | have to trust that the bitrate isn't too high / not going to
            | crash their servers, etc. etc. etc.
        
         | lbotos wrote:
         | Because I make one output file and they optimize for like 7
          | different resolutions. If they made uploading take longer for
          | me, I'd wager that would lower the video upload rate.
        
         | arghwhat wrote:
         | YouTube needs to re-encode occasionally (new
          | codecs/settings/platforms), it would be easy to abuse by
          | sending too high a bitrate or otherwise wrong content, and a
          | lot of end-user devices simply aren't powerful enough to
          | complete the task in a reasonable amount of time.
        
         | chrisseaton wrote:
         | > Why don't they encode on the uploader machine?
         | 
         | Are you basically asking why they don't take a performance-
         | sensitive, specialised, and parallel task and run it on a low-
         | performance, unspecialised, and sequential system?
         | 
         | Would take hours and be super inefficient.
        
       | 8K832d7tNmiQ wrote:
        | The real news here is that they still use GPUs to transcode their
        | videos, whilst other services such as the search engine have
        | already used TPUs for almost a decade now.
        | 
        | I thought they'd already been using custom chips for transcoding
        | for decades.
        
         | kungito wrote:
          | I think they do more general-purpose things like downsampling,
          | copyright detection, etc., which don't have globally available
          | custom ASICs. I think GPUs don't do encoding/decoding
          | themselves; they have separate ASICs built in which do the
          | standardised encodings.
        
         | boomlinde wrote:
         | Are TPUs particularly useful for this kind of workload,
         | compared to specialized encoders/decoders available on GPUs?
        
         | rland wrote:
         | GPUs have specialized hardware for video transcoding, no? So
         | this actually makes sense. The product was already made
         | (although, perhaps not up to Youtube's standard) by GPU
         | manufacturers.
        
           | numlock86 wrote:
           | The specialized hardware in GPUs is targeted at encoding
           | content on the fly. While you could use this to encode a
           | video for later playback it has a couple of drawbacks when it
           | comes to size and quality, namely h264, keyframes, static
           | frame allocations, no multipass encoding, etc. ... This is
           | why video production software that supports GPU encoding
           | usually marks this option as "create a preview, fast!". It's
            | fast but that's it. If you want a good quality/size ratio you
            | would use something like VP9, for example. Because of missing
            | specialized hardware and the internals of the codec itself,
            | this is currently very slow. Add multipass encoding,
            | something like 4K at 60 frames, and adaptive codec bitrates,
            | and suddenly encoding a second takes over two minutes ... the
            | result is the need for specialized hardware.
        
         | baybal2 wrote:
         | They were actually transcoding on CPUs before, not GPUs
        
         | madeofpalk wrote:
         | Yeah I was surprised that it's taken them this long to build
         | custom hardware for encoding videos.
        
         | jng wrote:
         | Is there any solid information about Google using TPU for the
         | search engine, or is this an assumption you're making?
        
           | alarmingfox wrote:
           | This[0] Google blog from 2017 states they were using TPU for
           | RankBrain which is what powers the search engine
           | 
           | [0] - https://cloud.google.com/blog/products/ai-machine-
           | learning/g...
        
             | JohnJamesRambo wrote:
              | All that power to show me recipes that hem and haw around
              | and spend ten paragraphs to use all the right SEO words.
             | 
             | I feel like the Google results were better 20 years ago,
             | what did they use back then before TPUs?
        
               | mda wrote:
               | I think search results 20 years ago were laughably worse
               | than today.
        
               | KeplerBoy wrote:
               | The web just got worse in a lot of ways, because
               | everything needs to generate money.
        
             | londons_explore wrote:
             | They had to get special permission from the US government
              | to export TPUs abroad to use in their datacenters. The
              | TPUs fell under ITAR regulations (like many machine
              | learning chips). The US government granted permission, but
              | put some restriction like 'they must always be supervised
              | by an American citizen', which I imagine leads to some very
              | well-paid foreign security guard positions for someone with
             | the correct passport...
             | 
             | Read all that on some random government document portal,
             | but can't seem to find it now...
        
         | anonymoushn wrote:
         | Right, at a competing video site we had vendors trying to sell
         | us specialized encoding hardware most of a decade ago.
        
       | pengaru wrote:
       | How long before the ads are realtime encoded into the video
       | streams such that even youtube-dl can't bypass them without a
       | premium login?
       | 
       | I've been surprised this wasn't already the case, but assumed it
       | was just an encoding overhead issue vs. just serving pre-encoded
       | videos for both the content and ads with necessarily well-defined
       | stream boundaries separating them.
        
         | IshKebab wrote:
         | That sounds like an enormous pain in the arse just to piss off
         | a vocal minority of users.
        
           | whywhywhywhy wrote:
           | A vocal minority who are not bringing in any revenue for the
           | site.
           | 
            | That said, though, the day they finally succeed in making ads
            | unskippable will be the time for a competitor to move in.
        
             | JohnWhigham wrote:
             | Yeah, if YT wanted to insert unskippable ads on the
             | backend, they would have years ago. The tech is not the
             | hard part. They know it'd be a PR disaster for them.
        
             | corobo wrote:
             | When it's possible to skip baked in ads (SponsorBlock[1])
             | -- the whack-a-mole will continue no matter what. Even if
             | it means you can't watch videos in realtime but have to
             | wait for them to fully download to rip the ad out, someone
             | will figure it out.
             | 
             | At that time everyone starts talking about it and I gotta
             | imagine a bunch of new people become adblocking users.
             | 
             | [1] https://news.ycombinator.com/item?id=26886275
        
               | KingMachiavelli wrote:
               | SponsorBlock only works because the sponsored segments
               | are at the same location for every viewer. If Youtube
               | spliced in their own ads they could easily do it at
               | variable intervals preventing any crowd sourced database
               | of ad segment timestamps. To be honest, nothing really
               | stops Youtube from just turning on Widevine encryption
               | for all videos (not just purchased/rented TV & movies)
               | besides breaking compatibility with old devices. Sure
               | widevine can be circumvented but most of the best/working
               | cracks are not public.
        
         | martin-adams wrote:
         | I suspect doing personalised ads obliterates any caching method
         | on cheaper hardware than transcoding servers. Interesting
         | problem to solve though.
        
           | jwandborg wrote:
           | The ads are not part of the encoded video AFAICT, they are
           | probably served as a separate stream which the client
           | requests alongside the regular video stream, this means that
           | videos and ads can be cached using traditional techniques.
        
           | amelius wrote:
           | > Interesting problem to solve though.
           | 
           | Ah, smart people and ads ...
        
         | garblegarble wrote:
         | You wouldn't even need to do real-time encoding for that, you
         | can simply mux them in at any GOP boundary (other services
         | already do real-time ad insertion in MPEG-DASH manifests)
         | 
         | Example: https://www.youtube.com/watch?v=LFHEko3vC98
        
           | elithrar wrote:
           | Right, using DAI means you don't have to actually touch the
           | original video (good!) but doesn't stop a smart enough client
           | (youtube-dl) from pattern matching and ignoring those
           | segments when stitching the final video together.
           | 
           | I am not, however, suggesting that encoding ads into the
            | final stream is appropriate or scalable!
        
             | kevincox wrote:
             | The client doesn't even have to know that there is an ad
             | playing if they really want to thwart ad blockers. If you
             | are talking about pattern-matching the actual video stream
             | ad-blockers could do that today and just seek forwards but
             | none do yet.
        
         | callamdelaney wrote:
         | Then you could just skip the ad in the video, unless the player
         | has some meta-data around when the ad is; in which case
         | youtube-dl can chop it out.
        
           | pengaru wrote:
           | Not if you tightly control the streaming rate to not get far
           | ahead of a realtime playback, just mete out the video stream
           | at a rate appropriate for watching, not as fast as the pipe
           | can suck it down.
        
             | londons_explore wrote:
             | I'm kinda surprised Google doesn't do this... They would
             | need to keep track of user seeks and stuff, but it still
             | seems do-able. One simple model is for the server to know
             | when ad-breaks should happen, and prevent any more
             | downloading for the duration of the ad.
             | 
             | Sure, it would break people who want to watch at 2x
             | realtime, but they seem small-fry compared to those with
             | adblockers.
        
               | giantrobot wrote:
                | The issue there is scale: MPEG-DASH/HLS let the edge
                | servers for video be simple. The servers don't need to
               | do much more than serve up bytes via HTTP. This ends up
               | being better for clients, especially mobile clients,
               | since they can choose streams based on their local
               | conditions the server couldn't know about like
               | downgrading from LTE to UMTS.
               | 
               | Google would end up having to maintain a lot of extra
               | client state on their edge servers if they wanted to do
               | that all in-band. Right now it's done out of band with
               | their JavaScript player. Chasing down youtube-dl users
               | isn't likely worth that extra cost.
        
               | londons_explore wrote:
               | The edge server could implement this without much extra
               | complexity.
               | 
               | For example each chunk URL could be signed with a
               | "donotdeliverbefore" timestamp.
               | 
               | Now the edge server has zero state.
               | 
                | Similar things are done to prevent signed URLs from being
                | shared with other users.
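                | 
                | Something like this sketch (made-up URL scheme, key and
                | parameter names - not anything YouTube actually does):
                | 
                |     import hashlib, hmac, time
                | 
                |     KEY = b"edge-signing-key"   # hypothetical shared secret
                | 
                |     def sign(path, not_before):
                |         msg = f"{path}|{not_before}".encode()
                |         sig = hmac.new(KEY, msg, hashlib.sha256).hexdigest()
                |         return f"{path}?nbf={not_before}&sig={sig[:16]}"
                | 
                |     def edge_ok(path, nbf, sig):
                |         # Stateless check: signature valid AND time reached.
                |         msg = f"{path}|{nbf}".encode()
                |         want = hmac.new(KEY, msg, hashlib.sha256).hexdigest()
                |         return (hmac.compare_digest(want[:16], sig)
                |                 and time.time() >= int(nbf))
                | 
                |     url = sign("/video/abc/seg042.mp4", int(time.time()) + 30)
                |     print(url)   # chunk only fetchable 30 s from now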
        
               | giantrobot wrote:
               | There's no shared wall clock between the server and
               | client with HTTP-based streaming. There's also no
               | guarantee the client's stream will play continuously or
               | even hit the same edge server for two individual
               | segments. That's state an edge server needs to maintain
               | and even share between nodes. It would be different for
               | every client and every stream served from that node.
               | 
               | For streaming you actually _want_ the client to have a
               | buffer past the play head. If the client can buffer the
               | whole stream it makes sense to let them in many cases.
               | The client buffers the whole stream and then leaves your
               | infrastructure alone even if they skip around or pause
               | the content for a long time. The only limits that really
               | make sense are individual connection bandwidth limits and
               | overall connection limits.
               | 
               | The whole point of HTTP-based streaming is to minimize
               | the amount of work required on the server and push more
               | capability to the client. It's meant to allow servers to
                | be dumb and stateless. Any state you add, even if it's
                | negligible _per client_, ends up being a lot of state in
                | aggregate. If a system meant edge servers could handle 1%
                | less traffic, that means server costs increase by 1%.
                | Unless the ad impressions skipped by youtube-dl users
                | come anywhere close to 1% of ad revenue, it's pointless
                | for Google to bother.
        
               | londons_explore wrote:
               | > skipped by youtube-dl users
               | 
               | It's also ublock and adblock plus users. Estimated at
               | about 25% of youtube viewership.
               | 
               | Also, the shared clock only needs to be between edge
               | servers and application servers. And only to an accuracy
               | of a couple of seconds. I bet they have that in place
               | already.
        
       | Traubenfuchs wrote:
       | The dangerous case of custom hardware making a software business
       | significantly more efficient: This makes disruption and
       | competition even harder.
        
         | CyberRabbi wrote:
         | Usually competition for a general platform like YouTube comes
         | in the form of unbundling and in that case these last mile
         | optimizations will matter little.
        
           | narrator wrote:
            | The main competitors to YouTube are the sites that host legal
            | content that YouTube won't, e.g. porn and controversial
            | political stuff.
        
             | CyberRabbi wrote:
             | That might be true but I think sites like odyssey are more
             | popular than controversial political video sites.
        
         | wishysgb wrote:
         | as long as you can buy the parts or have the HDL to deploy it
         | on an FPGA you should be fine
        
         | blihp wrote:
         | It's the seemingly infinite bandwidth that Google throws at
         | YouTube that make competition hard. Then there's the inability
         | to monetize. Transcoding is probably about 20th on the list of
         | issues.
        
         | azurezyq wrote:
         | It's inevitable, and this applies to other kinds of
         | optimizations as well. This place is too mature, disruption
         | might be easier elsewhere.
        
         | c7DJTLrn wrote:
         | What is there to compete for? Video hosting is a money-losing
         | business unless you have exclusives, like Floatplane.
        
           | endless1234 wrote:
            | What is Floatplane? Never heard of it. Seemingly a YT
            | competitor by a somewhat popular YouTuber. The app on Android
            | has "10k+" installs. Isn't it _way_ too early to say it
            | wouldn't be a money-losing business?
        
             | bobsmooth wrote:
             | Linus' goal for Floatplane is "If it doesn't fly, it'll at
             | least float." There's only 20 creators on it and it's
              | intended to complement YouTube, not replace it.
        
             | oblio wrote:
             | My guess is that the commenter is either a Floatplane
             | insider or possibly just optimistic :-)
        
             | throwaway3699 wrote:
             | Think of Floatplane as more of a Patreon competitor with a
             | video component, than a YouTube competitor.
        
           | human_error wrote:
           | What's floatplane? Hadn't heard of it. The website doesn't
           | say much.
        
             | pcmill wrote:
             | Floatplane is a video service built by the people behind
             | the popular Youtube channel LinusTechTips. It is not a
             | direct competitor to Youtube though. The platform makes it
             | easier to let paying fans get videos earlier but it is not
             | meant to build an audience.
        
         | fancyfredbot wrote:
         | This will keep costs down but I am not sure cost of transcoding
         | is the major barrier to entry? I think the network effect
         | (everyone is on YouTube) has already made disruption pretty
         | difficult!
        
           | NicoJuicy wrote:
           | Depends how you look at it. There could be someone making
           | these chips commercially, and then a competitor would have
           | lower startup costs than before.
        
           | londons_explore wrote:
           | Things like youtube run on super-thin margins. Bandwidth and
           | storage costs are massive, compute costs quite big, and ad
           | revenue really quite low.
           | 
           | A competitor would need either a different model to keep
           | costs low (limit video length/quality, the vimeo model of
           | forcing creators to pay, or go for the netflix-like model of
           | having a very limited library), or very deep pockets to run
           | at a loss until they reach youtube-scale.
           | 
           | I'm still mystified how TikTok apparently manages to turn a
           | profit. I have a feeling they are using the 'deep pockets'
           | approach, although the short video format might also bring in
           | more ad revenue per hour of video stored/transcoded/served.
        
         | Traster wrote:
         | To be honest I suspect it isn't actually a differentiator. It's
         | good for Google that they can produce this chip and trim their
         | hardware costs by some percentage, but it's not going to give
         | them a competitive advantage in the market of video sharing.
         | Especially in a business like YouTube with network effects,
         | getting the audience is the difficult bit. The technical
         | solutions are interesting, but you're not going to beat Google
         | by having 5% cheaper encoding costs.
        
         | ant6n wrote:
         | Perhaps. But the big issues for YouTube right now aren't
         | efficiency per se, but copyright, monetization, AI tagging, and
         | social clout. If a YouTube competitor can get the content
         | creators and offer them viewers, competition could perhaps
         | work. This fight is probably not fought at the margins of
         | hardware optimization.
        
         | mrtksn wrote:
         | It's like crystallisation of the software. When you decide that
         | this is the best version of an algorithm, you make hardware
         | that is extremely efficient at running that algorithm.
         | 
         | It probably means that, unless you have a groundbreaking
         | algorithm for something that is already available as hardware,
         | you only do software for things that aren't yet "perfected".
         | 
         | It trims marginal improvements.
        
       | imwillofficial wrote:
       | This is intense. ASICs are making a comeback again. It's weird how
       | the computer market is so cyclical with regard to trends.
        
       | justinzollars wrote:
       | Youtube blog post on this topic: https://blog.youtube/inside-
       | youtube/new-era-video-infrastruc...
        
       | bradfa wrote:
       | The paper linked in the ARS article
       | (https://dl.acm.org/doi/abs/10.1145/3445814.3446723) seems to
       | describe how they developed it. I find it interesting that they
       | went from C++ to hardware in order to optimize the development
       | and verification time.
       | 
       | In my past experience working with FPGA designers, I was always
       | told that C-to-H(ardware) tooling was quicker to develop with
       | but often had significant performance implications for the
       | resulting design: it would consume many more gates and run
       | significantly slower. But if you have a huge project to
       | undertake and your video codec is only likely to be useful for a
       | few years, you need to get an improvement (any improvement!) as
       | quickly as possible, so the tradeoff was likely worth it for
       | Google.
       | 
       | Or has the C-to-H tooling gotten significantly better recently?
       | Is anyone aware of the current state of the art who can shed
       | some light on this?
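       |
       | For a rough sense of what this C-to-hardware (HLS) style looks
       | like, here is a minimal, hypothetical sketch of a
       | sum-of-absolute-differences kernel (a common motion-estimation
       | primitive), annotated with Vivado/Vitis-HLS-style pragmas. It is
       | illustrative only, not taken from Google's design or the linked
       | paper:
       |
       |     #include <stdint.h>
       |
       |     // Hypothetical 16x16 SAD kernel in the C-to-hardware style
       |     // discussed above. The pragmas ask the HLS tool to fully
       |     // unroll the inner loop and pipeline the outer one, so a new
       |     // row of pixels is processed every clock cycle.
       |     uint32_t sad_16x16(const uint8_t cur[16][16],
       |                        const uint8_t ref[16][16]) {
       |         uint32_t acc = 0;
       |     row_loop:
       |         for (int y = 0; y < 16; y++) {
       |     #pragma HLS PIPELINE II=1
       |     col_loop:
       |             for (int x = 0; x < 16; x++) {
       |     #pragma HLS UNROLL
       |                 int d = (int)cur[y][x] - (int)ref[y][x];
       |                 acc += (uint32_t)(d < 0 ? -d : d);
       |             }
       |         }
       |         return acc;
       |     }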
        
         | pclmulqdq wrote:
         | It has not, and the type of design they show in the paper has a
         | lot of room to improve (FIFOs everywhere, inefficient blocks,
         | etc). However, video transcoding is suited to that approach
         | since the operations you do are so wide that you can't avoid a
         | speedup compared to software.
        
       | rurban wrote:
       | "now" being 2015. They are talking about the new 2nd generation
       | chip here, which is a bit faster.
        
       | pizza234 wrote:
       | A couple of interesting bits:
       | 
       | - without dedicated processors, VP9's encoding is roughly 4.5x as
       | slow as H.264, while with the VPUs, the two formats perform the
       | same; this is a big win for the open format(s)
       | 
       | - sad extract: "Google has aggressively fought to keep the site's
       | cost down, often reinventing Internet infrastructure and
        | _copyright_ in order to make it happen" (emphasis mine)
        
         | sillysaurusx wrote:
         | Why is that sad? No company in Google's position could've done
         | better, probably. Youtube was about to be sued into oblivion
         | till Google purchased it.
        
           | serf wrote:
           | >Why is that sad?
           | 
           | because the wholesale destruction and minimization of
           | knowledge, education, and information to appease (often
           | arbitrary) intellectual protectionism laws is sad, regardless
           | of who perpetrates it.
           | 
           | non-Google example: What.cd was a site centered around music
           | piracy, but that potentially illegal market created a huge
           | amount of original labels and music that still exists now in
           | the legal sphere.
           | 
           | No one would defend the legal right for what.cd to continue
           | operating; it was obviously illegal. But the unique, novel,
           | and creative works that came from the existence of this
           | illegal enterprise would be sad to destroy.
           | 
           | Swinging back to the Google example : YouTube systematically
           | destroys creations that they feel (often wrongly) infringe
           | upon IP. This is often not even the case; Google routinely
           | makes wrong decisions, erring on the side of the legal team.
           | 
           | This destruction of creative work is _sad_; in my opinion
           | it's sadder than the unpermitted use of work.
           | 
           | Of course, Google as a corporation _should_ act that way, but
           | it's _sad_ in certain human aspects.
        
             | nolok wrote:
             | It's not just google as a corporation, it's google as a
             | legal entity.
             | 
             | Run your own site, in your own individual name, with no
             | corporate entity and no profit motive, offering to host
             | people's videos for free, and I guarantee you that within
             | 24h you are dealing with things ranging from pedophilia to
             | copyright violations and the like. And if you don't clear
             | them out, you're the one responsible.
             | 
             | Google is acting the way society has decided they should
             | act through the laws it voted for. Could they act another,
             | more expensive way in order to save a bit more of the
             | content that gets caught by mistake? Definitely, but why
             | would they as a company, when the law says any mistake or
             | delay is their fault?
             | 
             | Source: like many people, I once made a free image hosting
             | thingy. It was overrun by pedos within a week to my
             | absolute horror and shock. Copyright infringement is
             | obviously not the same at all, BUT the way the law acts
             | toward the host is not that different: "ensure there is
             | none and be proactive in cleaning, or else ...".
        
               | sfgweilr4f wrote:
               | Your free image hosting thingy is an example of low
               | barrier to entry both in cost and anonymity. If you had
               | kept the cost trivial but made users traceable, I wonder
               | what the outcome would have been. I wonder if a site like
               | lobste.rs but for video would work better. A graph of who
               | is posting what and a graph of how they got onto the site
               | in the first place.
               | 
               | If you vouch for someone who is dodgy now you are also
               | seen as a little dodgier than you were before. This
               | doesn't necessarily mean you lose your account because
               | you happened to vouch for someone, but it might mean that
               | your vouching means less in future.
        
             | maxerickson wrote:
             | It's not destroyed, it just isn't published. Or is the idea
             | that they should be the canonical archive of all uploads?
        
             | dev_tty01 wrote:
             | They aren't destroying anything. They are just not allowing
             | the material on their site. Are you saying that anyone who
             | creates a video hosting site must allow ANY content on
             | their site? I don't see any practical basis for that
             | contention.
        
               | [deleted]
        
         | asdfasgasdgasdg wrote:
         | I don't see any justification in the linked article for the
         | claim that YouTube has in any way reinvented copyright. It
         | seems like a throw-away line that is unsupported by any facts.
        
           | ximeng wrote:
           | https://www.reddit.com/r/videos/comments/n29fxn/piano_teache.
           | ..
           | 
           | https://www.reddit.com/r/videos/comments/n4a4l0/huge_history.
           | ..
           | 
            | Even if not supported in the article, here are two examples in
           | the last couple of days of how YouTube is de facto defining
           | copyright regulation.
        
             | asdfasgasdgasdg wrote:
             | These are examples of YouTube following copyright laws
             | imperfectly, which is basically guaranteed to happen on a
             | regular basis at their scale. Definitely not what I would
             | consider YouTube redefining copyright.
        
               | grecy wrote:
               | > _These are examples of YouTube following copyright laws
               | imperfectly, which is basically guaranteed to happen on a
               | regular basis at their scale_
               | 
               | Given their entire copyright takedown system is
               | (in)famously automated, I would have thought it would be
               | trivial for it to _always_ follow copyright laws to the
               | letter... if they wanted it to.
        
               | asdfasgasdgasdg wrote:
               | If channel A uploads a video copied from channel B, then
               | makes a copyright claim against channel B, how does an
               | automated system determine which owns the rights?
               | Certainly it would seem in most cases that we should
               | presume channel B has the copyright, since they uploaded
               | first. But there is a very economically important class
               | of videos where infringers will tend to be the first to
               | upload (movies, TV shows, etc.). I don't really see how
               | an automated system solves this problem without making
               | any mistakes. Especially because the law (DMCA) puts the
               | onus on the service provider to take down or face
               | liability.
        
               | toast0 wrote:
               | It would be trivial to follow copyright laws to the
               | letter if authorship and user identity were trivial and
               | fair use exceptions were trivial to determine.
               | 
               | None of those things are trivial, and that's before
               | rights assignment.
               | 
               | YouTube's system is built primarily to placate
               | rightsholders and avoid human labor paid for by Google.
        
               | jsight wrote:
               | How would that work? Infringement and even ownership are
               | sometimes subjective or disputed. Automating it doesn't
               | make those issues any easier.
        
         | ksec wrote:
         | >- without dedicated processors, VP9's encoding is roughly 4.5x
         | as slow as H.264, while with the VPUs, the two formats perform
         | the same; this is a big win for the open format(s)
         | 
         | H.264 is an open format, just not royalty-free. The baseline of
         | H.264 will soon be "free" once those patents expire in 2023
         | (or basically MPEG-5 EVC).
         | 
         | The hardware encoding for VP9 being as fast as H.264 is mostly
         | due to hardware specifically optimised for VP9 and not H.264.
         | The complexity difference is still there.
        
           | threeseed wrote:
           | And VP9 is patent encumbered but they have a license from
           | MPEG-LA.
           | 
           | So it's definitely not any more open than H.264.
        
             | gruez wrote:
             | Source for this? https://en.wikipedia.org/wiki/VP9 says
             | some companies claimed patents on it, but google basically
             | ignored them.
        
               | vbezhenar wrote:
               | While Google might ignore them, can a small company
               | ignore them? I don't think that Google will fight for
               | some guy using VP9 who is getting sued.
        
         | dmitriid wrote:
         | > without dedicated processors, VP9's encoding is roughly 4.5x
         | as slow as H.264
         | 
         | > this is a big win for the open format(s)
         | 
         | How is this a big win if you need dedicated processors for it
         | to be as fast?
        
           | selfhoster11 wrote:
           | It increases adoption of the open standard on the supply
           | side.
        
             | dmitriid wrote:
             | wat?
             | 
             | I honestly can't parse this sentence.
             | 
             | "Google creates dedicated custom proprietary processors
             | which can process VP9 at roughly the same speed as a
             | 20-year-old codec". How is this a win for opensource
             | codecs?
        
               | selfhoster11 wrote:
               | You won't get adoption until the word gets around that
               | Big Company X is using Format Y, and they supply content
               | prominently in Format Y. That's when Chinese SoC
               | manufacturers start taking things seriously, add hardware
               | decode blocks to their designs, and adoption just spirals
               | out from there.
        
               | IshKebab wrote:
               | Because VP9 achieves better compression than H.264.
        
               | virtue3 wrote:
               | because the largest video site in the world will be
               | encoding as VP9.
        
               | dmitriid wrote:
               | It's a codec developed by Google, for Google, and Google
               | will happily abandon it. From the article:
               | 
               | > After pushing out and upgrading to VP8 and VP9, Google
               | is moving on to its next codec, called "AV1," which it
               | hopes will someday see a wide rollout.
               | 
               | I still can't see how this is a win.
        
               | saynay wrote:
               | VP9 is meant to be a parallel to h264, and AV1 to h265?
               | 
               | VP9 running on custom circuits being equivalent in speed
               | to h264 running on custom circuits seems like a win for
               | VP9?
               | Since VP9 isn't royalty encumbered the way h264 is, that
               | could well be a win for the rest of us too.
        
               | dmitriid wrote:
               | > Since VP9 isn't royalty encumbered the way h264 is,
               | that could well be a win for the rest of us too.
               | 
               | I can only repeat myself: "Google creates dedicated
               | custom proprietary processors which can process VP9 at
               | roughly the same speed as a 20-year-old codec". How is
               | this a win for anyone but Google (who is already looking
               | to replace VP9 with AV1)?
               | 
               | "The rest of us" are very unlikely to run Google's custom
               | chips. "The rest of us" are much more likely to run this
               | in software, for which, to quote the comment I was
               | originally replying to, "without dedicated processors,
               | VP9's encoding is roughly 4.5x as slow as H.264".
               | 
               | Note: I'm not questioning the codec itself. I'm
               | questioning the reasoning declaring this "a big win for
               | the open format(s)".
        
               | virtue3 wrote:
               | Aren't Vp8/9/av1 all open codecs tho? I don't really see
               | what the issue is.
               | 
               | Vp8 seems to be the most supported on all platforms
               | without melting your intel CPU. At least from when I was
               | deploying a webRTC solution to a larger customer base
               | last year.
               | 
               | > In May 2010, after the purchase of On2 Technologies,
               | Google provided an irrevocable patent promise on its
               | patents for implementing the VP8 format
               | 
               | This is significantly better than even h264 in terms of
               | patents/royalties.
               | 
               | Would you mind elaborating on your hate? There's nothing
               | for google to abandon here? It's already out in the wild.
        
               | dmitriid wrote:
               | > Would you mind elaborating on your hate?
               | 
               | _How_ did you even come up with this question?
               | 
               | Please go and re-read my original question:
               | https://news.ycombinator.com/item?id=27035059 and the
               | follow-up from me:
               | https://news.ycombinator.com/item?id=27035112 and from
               | another person:
               | https://news.ycombinator.com/item?id=27036150
               | 
               | But yeah, sure, I _hate hate hate_ VP9 smh
        
               | threeseed wrote:
               | They have always been encoding in VP9 though. But it
               | doesn't mean they will be serving it.
               | 
               | For example OSX doesn't support it at all and iOS only
               | supports VP9 in 4K/HDR mode and only for certain
               | versions.
        
               | sdenton4 wrote:
               | There are a pile of transcodes of every video, served
               | appropriately for the device. Serving VP9 to non-OSX
               | devices is still a big win, at scale.
        
               | acdha wrote:
               | It's a relatively modest win versus H.265 unless you're
               | willing to substantially sacrifice quality and that has
               | to be balanced against the extra CPU time and storage.
               | 
               | YouTube uses so much bandwidth that this is still
               | measured in millions of dollars but it's really worth
               | remembering that "at scale" to them is beyond what almost
               | anyone else hits.
        
               | xodice wrote:
               | I'm watching a YouTube video in Safari right now being
               | served via VP9 (Verified under "Stats for nerds").
               | 
               | Heck, even the M1 supports VP9 in hardware up to 8K60.
               | 
               | I'm not sure where you got the idea that macOS has no VP9
               | support; it works quite well.
        
             | acdha wrote:
             | What impact does this really have, though? Are they making
             | better VP9 tools available to other people? Browsers
             | already have highly-tuned playback engines and YouTube
             | actively combats efforts to make downloaders or other
             | things which use their videos, is there a path I'm missing
             | where this has much of an impact on the rest of the
             | Internet?
        
       | CyberRabbi wrote:
       | Wow 33x throughput improvement for vp9 for the same hardware
       | cost. That seems excessive but their benchmark is using ffmpeg.
       | Is ffmpeg known to have the theoretically highest throughput
       | possible state of the art vp9 encoder algorithms? Or is there any
       | way of knowing if their hardware IP block is structured
       | equivalently to the ffmpeg software algorithm? I know that custom
       | hardware will always beat general hardware but 33x is a very
       | large improvement. Contemporary core counts coupled with very
       | wide simd makes CPUs functionally similar to ASIC/fpga in many
       | cases.
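       |
       | For reference, the software baseline being discussed is libvpx
       | (which is what ffmpeg wraps for VP9). A minimal, hypothetical
       | sketch of a direct libvpx VP9 encode, assuming the standard
       | vpx_encoder.h API and not the benchmark setup from the paper:
       |
       |     #include <stdio.h>
       |     #include <string.h>
       |     #include <vpx/vp8cx.h>
       |     #include <vpx/vpx_encoder.h>
       |
       |     int main(void) {
       |         const int w = 1280, h = 720, frames = 30;
       |
       |         // Start from the default VP9 encoder configuration.
       |         vpx_codec_enc_cfg_t cfg;
       |         if (vpx_codec_enc_config_default(vpx_codec_vp9_cx(),
       |                                          &cfg, 0))
       |             return 1;
       |         cfg.g_w = w;
       |         cfg.g_h = h;
       |         cfg.g_timebase.num = 1;
       |         cfg.g_timebase.den = 30;
       |         cfg.rc_target_bitrate = 1500;  // kbit/s
       |         cfg.g_threads = 4;
       |
       |         vpx_codec_ctx_t ctx;
       |         if (vpx_codec_enc_init(&ctx, vpx_codec_vp9_cx(), &cfg, 0))
       |             return 1;
       |
       |         // One I420 frame buffer, filled with flat mid-gray.
       |         vpx_image_t img;
       |         if (!vpx_img_alloc(&img, VPX_IMG_FMT_I420, w, h, 1))
       |             return 1;
       |         memset(img.planes[VPX_PLANE_Y], 128,
       |                (size_t)img.stride[VPX_PLANE_Y] * h);
       |         memset(img.planes[VPX_PLANE_U], 128,
       |                (size_t)img.stride[VPX_PLANE_U] * (h / 2));
       |         memset(img.planes[VPX_PLANE_V], 128,
       |                (size_t)img.stride[VPX_PLANE_V] * (h / 2));
       |
       |         for (int i = 0; i < frames; i++) {
       |             if (vpx_codec_encode(&ctx, &img, i, 1, 0,
       |                                  VPX_DL_GOOD_QUALITY))
       |                 return 1;
       |             vpx_codec_iter_t iter = NULL;
       |             const vpx_codec_cx_pkt_t *pkt;
       |             while ((pkt = vpx_codec_get_cx_data(&ctx, &iter)))
       |                 if (pkt->kind == VPX_CODEC_CX_FRAME_PKT)
       |                     printf("frame %d: %zu bytes\n", i,
       |                            pkt->data.frame.sz);
       |         }
       |         // Flush the encoder by passing a NULL image.
       |         vpx_codec_encode(&ctx, NULL, -1, 1, 0,
       |                          VPX_DL_GOOD_QUALITY);
       |
       |         vpx_img_destroy(&img);
       |         vpx_codec_destroy(&ctx);
       |         return 0;
       |     }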
        
         | brigade wrote:
         | The only OSS VP9 encoder is Google's own libvpx, which is what
         | ffmpeg uses.
        
           | rndgermandude wrote:
           | By now Intel has released an open source encoder (BSD-2 +
           | patent grants), tuned for their Xeons:
           | 
           | https://github.com/OpenVisualCloud/SVT-VP9
        
         | bumbada wrote:
         | That doesn't look so excessive to me. We regularly get a
         | hundred or a thousand times more efficiency and performance
         | using custom electronics for things like 3D or audio
         | recognition.
         | 
         | But programming fixed electronics in parallel is also way
         | harder than programming flexible CPUs.
         | 
         | "Contemporary core counts coupled with very wide simd makes
         | CPUs functionally similar to ASIC/fpga in many cases."
         | 
         | I don't think so. For things that have a way to be solved in
         | parallel, you can get at least a 100x advantage easily.
         | 
         | There are lots of problems that you could solve on the CPU
         | (serially) that you just can't solve in parallel (because they
         | have interdependencies).
         | 
         | Today CPUs delegate the video load to video coprocessors of one
         | type or another.
        
           | bumbada wrote:
           | BTW: Multiple CPU cores are not parallel programming in the
           | sense FPGAs or ASICs (or even GPUs) are.
           | 
           | Multiple cores work like multiple machines, but parallel
           | units work choreographically in sync at lower speeds (with
           | quadratic energy consumption). They could share everything
           | and have only the electronics needed to do the job.
        
             | CyberRabbi wrote:
             | Well, transistors are cheap and synchronization is not a
             | bottleneck for embarrassingly parallel video encoding jobs
             | like these. Contemporary CPUs already downclock when they
             | can to save power and limit heat.
        
           | CyberRabbi wrote:
           | >> Contemporary core counts coupled with very wide simd makes
           | CPUs functionally similar to ASIC/fpga in many cases.
           | 
           | > I don't think so. For things that have a way to be solved
           | in parallel, you can get at least a 100x advantage easily.
           | 
           | That's kind of my point. CPUs are incredibly parallel now in
           | their interface. Let's say you have 32 cores and use 256-bit
           | SIMD for 4 64-bit ops per instruction. That would give you a
           | ~128x improvement compared to doing all those ops serially.
           | It's just a matter of writing your program to exploit the
           | available parallelism.
           | 
           | There's implicit ILP going on as well, but I think explicitly
           | using SIMD usually keeps the execution ports filled.
        
             | WJW wrote:
             | TBH 32 or even 64 cores do not sound all that impressive
             | compared to the thousands of cores available on modern GPUs
             | and presumably even more that could be squeezed into a
             | dedicated ASIC.
             | 
             | In any case, wouldn't you run out of memory bandwidth long
             | before you can fill all those cores? It doesn't really
             | matter how many cores you have in that case.
        
               | CyberRabbi wrote:
               | Those thousands of cores are all much simpler, don't have
               | SIMD, and have a huge penalty for branching.
        
       | gigel82 wrote:
       | I'm wondering if this is related to the recent Roku argument;
       | perhaps YouTube is trying to force Roku to incorporate a hardware
       | decoding chip (maybe with an increased cost) in future products
       | as a condition to stay on the platform.
        
         | conk wrote:
         | I don't think YouTube cares if you use hardware or software
         | decoding. I also don't think they care if you use their
         | hardware decoder or someone else's. The issue with roku is they
         | don't want to include any extra hardware to support vp9, and
         | they use such cheap/low spec hardware they can't reliably
         | decode in software.
        
       | qwertox wrote:
       | I wonder how YouTube's power consumption in transcoding the most
       | useless / harmful videos relates to Bitcoin's power consumption.
       | Maybe even every video should be included in the calculation,
       | since Bitcoin also has its positive aspects.
       | 
       | I've never heard about how much power YouTube's transcoding is
       | consuming, but transcoding has always been one of those very CPU-
       | intensive tasks (hence it was one of the first tasks to be moved
       | over to the GPU).
        
       ___________________________________________________________________
       (page generated 2021-05-04 23:03 UTC)