[HN Gopher] We accidentally solved robotics by watching 1M hours...
___________________________________________________________________
We accidentally solved robotics by watching 1M hours of YouTube
Author : alexcos
Score : 39 points
Date : 2025-06-29 16:08 UTC (6 hours ago)
(HTM) web link (ksagar.bearblog.dev)
(TXT) w3m dump (ksagar.bearblog.dev)
| okdood64 wrote:
| Does YouTube allow massive scraping like this in their ToS?
| dangoodmanUT wrote:
| What ToS
| bobmcnamara wrote:
| https://www.youtube.com/static?template=terms ?
| mouse_ wrote:
| Probably not.
|
| Who cares at this point? No one is stopping ML sets from being
| primarily pirated. The current power is effectively dismantling
| copyright for AI related work.
| perching_aix wrote:
| > The current power is effectively dismantling copyright for
| AI related work.
|
| Out of the loop apparently, could you elaborate? By "the
| current power" I take it you mean the current US administration?
| bgwalter wrote:
| Trump fired the head of the copyright office:
|
| https://www.heise.de/en/news/After-criticism-of-AI-
| training-...
|
| The "Big Beautiful Bill" contains a clause that prohibits
| state "AI" legislation.
|
| Trump has a "Crypto and AI czar" who is very active in
| promoting "AI" on his YouTube propaganda outlet. The same
| czar also promoted, pre-election of course, accelerated
| peace with Russia and then stopped talking about the
| subject altogether.
| perching_aix wrote:
| Oh wow okay, genuinely missed these. Thanks.
| snickerdoodle12 wrote:
| > Who cares at this point
|
| Anyone who has a shred of integrity. I'm not a fan of
| overreaching copyright laws, but they've been strictly
| enforced for years now. Decades, even. They've ruined _many_
| lives, like how they killed Aaron Swartz.
|
| But now, suddenly, violating copyright is totally okay and
| carries no consequences whatsoever because the billionaires
| decided that's how they can get richer now?
|
| If you want to even try to pretend you don't live in a
| plutocracy and that the rule of law matters at all, these
| developments should concern you.
| MaxPock wrote:
| They don't, and neither do I allow my site (whose content I
| found on Gemini) to be scraped.
| klysm wrote:
| I don't think they can legally prevent it
| perching_aix wrote:
| My "lawyer" (gpt4o) claims that since YouTube is merely a
| non-exclusive licensee of the user content uploaded to their service,
| even if they have such restrictions in their ToS (they do),
| they likely would not hold up in court, citing [0]. Something
| about that non-exclusivity meaning they cannot constrain the
| copyright further on their own terms. Which I guess makes
| sense?
|
| And since scraping of publicly available data is not illegal
| (in the US, according to the aforementioned "lawyer"), it seems
| like it's okay?
|
| Not legal advice.
|
| [0]
| https://www.skadden.com/insights/publications/2024/05/distri...
| rzzzt wrote:
| Friendly unit conversion man at your service: 114 years.
| isoprophlex wrote:
| How much is that in football fields?
| forks wrote:
| If you accept 30 years as the average lifespan of an nfl
| stadium, 3.8
| ReptileMan wrote:
| So half a Zoom meeting... or 1/3 of a Teams one.
| perching_aix wrote:
| I genuinely wish there was a cost estimation feature built
| into them. It doesn't have to be even remotely close to the
| true cost: if it's anything like the meetings I attend, there
| will be enough people and it will go on for long enough to
| make up for it.
| ReptileMan wrote:
| I worked as a consultant and started billing at normal
| hourly rates for meetings. You would be surprised how fast
| the company's desire for my participation in them decreased.
| hobs wrote:
| Why would you do anything but that? If you want to just chat
| with me forever, the rate is the rate.
| contingencies wrote:
| This is interesting for generalized problems ( _" make me a
| sandwich"_) but not useful for most real world functions ( _"
| perform x within y space at z cost/speed"_). I think the number
| of people on the humanoid bandwagon trying to implement
| generalized applications is staggering right now. The physics
| tells you they will never be as fast as purpose-built devices,
| nor as small, nor as cheap. That's not to say there's zero value
| there, but really we're - uh - grasping at straws...
| foobarian wrote:
| I wonder if a generalized machine would have an advantage from
| scale, and then putting all the specialized stuff into
| software. We have seen this play out before.
| ahmedbaracat wrote:
| Well, there's a middle ground, kinda: using more specialized
| hardware (e.g. cobots) but deploying state-of-the-art Physical AI
| (ML/Computer Vision) on them. We're building one such startup
| at ko-br (https://ko-br.com/) :))
| contingencies wrote:
| Quite a few startups in your space. Many deployed with
| customers. Good luck finding a USP!
| jjangkke wrote:
| Very good point! This area suffers from the same misalignment of
| goals that is rampant with today's LLMs: it tries to be a generic
| one-size-fits-all solution.
|
| We made a sandwich, but it cost 10x more than a human would and
| was slower. It might slowly become faster and more efficient,
| but by the time you get really good at it, it's simply not
| transferable unless the model is genuinely able to make the
| leap across into other domains, as humans naturally do.
|
| I'm afraid this is where the barrier between general intelligence
| and human intelligence lies. With enough of these geospatial
| motor skill databases, we might get something that mimics humans
| very well but still runs into problems at the edge, and this
| last-mile problem really is a hindrance to so many domains
| where we come close but never complete.
|
| I wonder if this will change with some shift in computing and
| in how we interface with digital systems (without mouse or
| keyboard); that might be able to close that 'last mile gap'.
| esjeon wrote:
| Note that the username here is a Korean derogatory term for
| Chinese people.
| jes5199 wrote:
| analogy: a CPU is more expensive, more complicated, more energy
| demanding than custom made circuitry, in most cases.
| imranq wrote:
| This was a bit hard to read. It would be good to have a narrative
| structure and a clearer explanation of concepts.
| signal-intel wrote:
| Very intentional. Their response would be: "if you need
| narrative structure and clear explanation of concepts, yngmi".
| richard___ wrote:
| Solved??? Where?
| pr337h4m wrote:
| IMO, VideoMimic is a better proof-of-concept
|
| https://www.videomimic.net/
|
| https://www.videomimic.net/page1.html
| Keyframe wrote:
| Looks like it was trained on Shaolin Drunken Fist videos. Does
| it look drunk because of the videos or because there's a
| discrepancy between videos and it not accounting for gravity
| and physics in general?
| throwaway198846 wrote:
| I wonder how much language this model understands. If we pan
| across text, will it fill in a sensible next word? How good will it
| be?
| ErrorNoBrain wrote:
| Someone watched 'Devs'?
|
| If you haven't, highly recommended.
| andruby wrote:
| Do you have a link or a less generic search term?
| hshshshshsh wrote:
| Bro, ChatGPT exists.
| conception wrote:
| Do we have a "let me ChatGPT that for you.." site yet?
| VladVladikoff wrote:
| It's a TV show made by Alex Garland:
| https://m.imdb.com/title/tt8134186/ It's pretty good sci-fi
| IMHO
| hahaxdxd123 wrote:
| Extremely oversold article.
|
| > the core insight: predict in representation space, not pixels
|
| We've been doing this since 2014? Not only that, others have been
| doing it at a similar scale. e.g. Nvidia's world foundation
| models (although those are generative).
|
| > zero-shot generalization (aka the money shot)
|
| This is easily beaten by flow-matching imitation learning models
| like what Pi has.
|
| > accidentally solved robotics
|
| They're doing 65% success on very simple tasks.
|
| The research is good. This article however misses a lot of other
| work in the literature. I would recommend you don't read it as an
| authoritative source.
| accidentallfact wrote:
| https://news.ycombinator.com/item?id=44073183
| Voloskaya wrote:
| This article contains so many falsehoods and history rewrites
| that it's pretty painful to read.
| rozab wrote:
| I just wrote a reply to a comment talking about the AI tells this
| writing has, but it got flagged so my comment disappeared when I
| hit post. I'll rephrase out of spite:
|
| My first thought upon reading this was that an LLM had been
| instructed to add a pithy meme joke to each paragraph. They don't
| make sense in context, and while some terminally online people do
| speak in memes, those people aren't quoting doge in 2025.
|
| There's also a sense of incoherence in the whole piece. For
| instance, this section:
|
| "- after: 22 million videos + 1 million images (now we're
| talking)
|
| they basically hoovered up everything: something-something v2,
| kinetics, howto100m, and a billion youtube videos"
|
| Was it a billion vids or 22m? It turns out the latter sentence is
| just rephrasing the list of sources in a cool casual way, and the
| last one is called YT-Temporal-1B. That's a billion frames of
| video, not a billion videos.
___________________________________________________________________
(page generated 2025-06-29 23:01 UTC)