[HN Gopher] We accidentally solved robotics by watching 1M hours...
       ___________________________________________________________________
        
       We accidentally solved robotics by watching 1M hours of YouTube
        
       Author : alexcos
       Score  : 39 points
       Date   : 2025-06-29 16:08 UTC (6 hours ago)
        
 (HTM) web link (ksagar.bearblog.dev)
 (TXT) w3m dump (ksagar.bearblog.dev)
        
       | okdood64 wrote:
       | Does YouTube allow massive scraping like this in their ToS?
        
         | dangoodmanUT wrote:
         | What ToS
        
           | bobmcnamara wrote:
           | https://www.youtube.com/static?template=terms ?
        
         | mouse_ wrote:
         | Probably not.
         | 
         | Who cares at this point? No one is stopping ML sets from being
         | primarily pirated. The current power is effectively dismantling
         | copyright for AI related work.
        
           | perching_aix wrote:
           | > The current power is effectively dismantling copyright for
           | AI related work.
           | 
           | Out of the loop apparently, could you elaborate? By "the
           | current power" I take it you mean the current US administration?
        
             | bgwalter wrote:
             | Trump fired the head of the copyright office:
             | 
             | https://www.heise.de/en/news/After-criticism-of-AI-
             | training-...
             | 
             | The "Big Beautiful Bill" contains a clause that prohibits
             | state "AI" legislation.
             | 
             | Trump has a "Crypto and AI czar" who is very active in
             | promoting "AI" on his YouTube propaganda outlet. The same
             | czar also promoted, pre-election of course, accelerated
             | peace with Russia and then stopped talking about the
             | subject altogether.
        
               | perching_aix wrote:
               | Oh wow okay, genuinely missed these. Thanks.
        
           | snickerdoodle12 wrote:
           | > Who cares at this point
           | 
           | Anyone who has a shred of integrity. I'm not a fan of
           | overreaching copyright laws, but they've been strictly
           | enforced for years now. Decades, even. They've ruined _many_
           | lives, like how they killed Aaron Swartz.
           | 
           | But now, suddenly, violating copyright is totally okay and
           | carries no consequences whatsoever because the billionaires
           | decided that's how they can get richer now?
           | 
           | If you want to even try to pretend you don't live in a
           | plutocracy and that the rule of law matters at all these
           | developments should concern you.
        
         | MaxPock wrote:
         | They don't, and neither do I allow my site, whose content I
         | found on Gemini, to be scraped.
        
         | klysm wrote:
         | I don't think they can legally prevent it
        
         | perching_aix wrote:
         | My "lawyer" (gpt4o) claims that since YouTube is merely a non-
         | exclusive licensee of the user content uploaded to their service,
         | even if they have such restrictions in their ToS (they do),
         | they likely would not hold up in court, citing [0]. Something
         | about that non-exclusivity meaning they cannot constrain the
         | copyright further on their own terms. Which I guess makes
         | sense?
         | 
         | And since scraping of publicly available data is not illegal
         | (in the US, according to the aforementioned "lawyer"), it seems
         | like it's okay?
         | 
         | Not legal advice.
         | 
         | [0]
         | https://www.skadden.com/insights/publications/2024/05/distri...
        
       | rzzzt wrote:
       | Friendly unit conversion man at your service: 114 years.
        
         | isoprophlex wrote:
         | How much is that in football fields?
        
           | forks wrote:
           | If you accept 30 years as the average lifespan of an NFL
           | stadium, 3.8.
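
The two conversions in this sub-thread can be checked with a few lines of arithmetic. This is only a sketch: the 365.25-day year is an assumption made here, and the 30-year stadium lifespan is the figure proposed in the comment above, not a sourced statistic.

```python
# Sanity check of the unit conversions quoted in the thread.
HOURS_OF_VIDEO = 1_000_000        # "1M hours" from the article title
HOURS_PER_YEAR = 24 * 365.25      # assumption: average year including leap days
STADIUM_LIFESPAN_YEARS = 30       # figure suggested by the commenter

years = HOURS_OF_VIDEO / HOURS_PER_YEAR
stadium_lifespans = years / STADIUM_LIFESPAN_YEARS

print(f"{years:.0f} years, {stadium_lifespans:.1f} stadium lifespans")
```

Both rounded figures match the ones given by the commenters.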
        
         | ReptileMan wrote:
         | So half a Zoom meeting... or 1/3 of a Teams one.
        
           | perching_aix wrote:
           | I genuinely wish there was a cost estimation feature built
           | into them. It doesn't have to be even remotely close to the
           | true cost: if it's anything like the meetings I attend, there
           | will be enough people and it will go on for long enough to
           | make up for it.
        
             | ReptileMan wrote:
             | I worked as a consultant and started billing at normal
             | hourly rates for meetings. You would be surprised how fast
             | the company's desire for my participation in them decreased.
        
               | hobs wrote:
               | Why would you do anything but that? If you want to just
               | chat with me forever, the rate is the rate.
        
       | contingencies wrote:
       | This is interesting for generalized problems (_"make me a
       | sandwich"_) but not useful for most real world functions
       | (_"perform x within y space at z cost/speed"_). I think the number
       | of people on the humanoid bandwagon trying to implement
       | generalized applications is staggering right now. The physics
       | tells you they will never be as fast as purpose-built devices,
       | nor as small, nor as cheap. That's not to say there's zero value
       | there, but really we're - uh - grasping at straws...
        
         | foobarian wrote:
         | I wonder if a generalized machine would have an advantage from
         | scale, and then putting all the specialized stuff into
         | software. We have seen this play out before.
        
         | ahmedbaracat wrote:
         | Well, there's a middle ground, kinda: using more specialized
         | hardware (e.g., cobots) but deploying state-of-the-art Physical
         | AI (ML/Computer Vision) on them. We're building one such startup
         | at ko-br (https://ko-br.com/) :))
        
           | contingencies wrote:
           | Quite a few startups in your space. Many deployed with
           | customers. Good luck finding a USP!
        
         | jjangkke wrote:
         | Very good point! This area faces a misalignment of goals
         | similar to the one rampant in today's LLMs: it tries to be a
         | generic, one-size-fits-all solution.
         | 
         | We made a sandwich, but it cost 10x more than it would with a
         | human and took longer. That might slowly become faster and
         | more efficient, but by the time the model gets really good at
         | it, the skill is simply not transferable unless the model is
         | genuinely able to make the leap into other domains, as humans
         | naturally do.
         | 
         | I'm afraid this is where the barrier between general
         | intelligence and human intelligence lies. With enough of these
         | geospatial motor skill databases, we might get something that
         | mimics humans very well but still runs into problems at the
         | edge, and this last-mile problem really is a hindrance in so
         | many domains where we come close but never complete.
         | 
         | I wonder if this will change with some sort of change in
         | computing, as well as in how we interface with digital systems
         | (without mouse or keyboard); that might be able to close the
         | 'last mile gap'.
        
           | esjeon wrote:
           | Note that the username here is a Korean derogatory term for
           | Chinese people.
        
         | jes5199 wrote:
         | analogy: a CPU is more expensive, more complicated, more energy
         | demanding than custom made circuitry, in most cases.
        
       | imranq wrote:
       | This was a bit hard to read. It would be good to have a narrative
       | structure and a clearer explanation of concepts.
        
         | signal-intel wrote:
         | Very intentional. Their response would be: "if you need
         | narrative structure and clear explanation of concepts, yngmi".
        
       | richard___ wrote:
       | Solved??? Where?
        
       | pr337h4m wrote:
       | IMO, VideoMimic is a better proof-of-concept
       | 
       | https://www.videomimic.net/
       | 
       | https://www.videomimic.net/page1.html
        
         | Keyframe wrote:
         | Looks like it was trained on Shaolin Drunken Fist videos. Does
         | it look drunk because of the videos or because there's a
         | discrepancy between videos and it not accounting for gravity
         | and physics in general?
        
       | throwaway198846 wrote:
       | I wonder how much language this model understands. If we pan
       | across text, will it fill in a sensible next word? How good will
       | it be?
        
       | ErrorNoBrain wrote:
       | Someone watched 'Devs'?
       | 
       | If you haven't, highly recommended.
        
         | andruby wrote:
         | Do you have a link or a less generic search term?
        
           | hshshshshsh wrote:
           | Bro, ChatGPT exists.
        
             | conception wrote:
             | Do we have a "let me ChatGPT that for you.." site yet?
        
           | VladVladikoff wrote:
           | It's a TV show made by Alex Garland:
           | https://m.imdb.com/title/tt8134186/ It's pretty good sci-fi
           | IMHO.
        
       | hahaxdxd123 wrote:
       | Extremely oversold article.
       | 
       | > the core insight: predict in representation space, not pixels
       | 
       | We've been doing this since 2014? Not only that, others have been
       | doing it at a similar scale. e.g. Nvidia's world foundation
       | models (although those are generative).
       | 
       | > zero-shot generalization (aka the money shot)
       | 
       | This is easily beaten by flow-matching imitation learning models
       | like what Pi has.
       | 
       | > accidentally solved robotics
       | 
       | They're doing 65% success on very simple tasks.
       | 
       | The research is good. This article however misses a lot of other
       | work in the literature. I would recommend you don't read it as an
       | authoritative source.
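
For readers unfamiliar with the phrase "predict in representation space, not pixels" quoted above, the difference between the two objectives can be illustrated with a toy sketch. Everything here is hypothetical: the random linear `encode` function, the sizes, and the trivial "copy the last frame" predictor are assumptions chosen for illustration, and this is not the model the article or the comment describes.

```python
import random

def encode(frame, weights):
    """Project a flat list of pixels to a short latent vector
    (hypothetical fixed linear encoder, for illustration only)."""
    return [sum(w * p for w, p in zip(row, frame)) for row in weights]

def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

random.seed(0)
N_PIXELS, N_LATENT = 64, 4
weights = [[random.gauss(0, 1) / N_PIXELS for _ in range(N_PIXELS)]
           for _ in range(N_LATENT)]

# Two consecutive toy "frames": the second is slightly brighter.
current_frame = [random.random() for _ in range(N_PIXELS)]
next_frame = [min(1.0, p + 0.01) for p in current_frame]

# Pixel-space objective: the error is measured over all 64 pixel values.
predicted_pixels = current_frame            # trivial copy-last-frame predictor
pixel_loss = mse(predicted_pixels, next_frame)

# Representation-space objective: both frames are encoded first, and the
# error is measured over the 4 latent values instead of the raw pixels.
predicted_latent = encode(current_frame, weights)
target_latent = encode(next_frame, weights)
latent_loss = mse(predicted_latent, target_latent)

print(len(next_frame), "pixel terms vs", len(target_latent), "latent terms")
```

In a real system the encoder and predictor would be learned networks; the sketch only shows where the loss is computed, which is the distinction the quoted sentence draws.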
        
       | accidentallfact wrote:
       | https://news.ycombinator.com/item?id=44073183
        
       | Voloskaya wrote:
       | This article contains so many falsehoods and history rewrites
       | that it's pretty painful to read.
        
       | rozab wrote:
       | I just wrote a reply to a comment talking about the AI tells this
       | writing has, but it got flagged so my comment disappeared when I
       | hit post. I'll rephrase out of spite:
       | 
       | My first thought upon reading this was that an LLM had been
       | instructed to add a pithy meme joke to each paragraph. They don't
       | make sense in context, and while some terminally online people do
       | speak in memes, those people aren't quoting doge in 2025.
       | 
       | There's also a sense of incoherence in the whole piece. For
       | instance, this section:
       | 
       | "- after: 22 million videos + 1 million images (now we're
       | talking)
       | 
       | they basically hoovered up everything: something-something v2,
       | kinetics, howto100m, and a billion youtube videos"
       | 
       | Was it a billion vids or 22m? It turns out the latter sentence is
       | just rephrasing the list of sources in a cool casual way, and the
       | last one is called YT-Temporal-1B. That's a billion frames of
       | video, not a billion videos.
        
       ___________________________________________________________________
       (page generated 2025-06-29 23:01 UTC)