[HN Gopher] Veo 2: Our video generation model
       ___________________________________________________________________
        
       Veo 2: Our video generation model
        
       Author : mvoodarla
       Score  : 260 points
       Date   : 2024-12-16 17:04 UTC (5 hours ago)
        
 (HTM) web link (deepmind.google)
 (TXT) w3m dump (deepmind.google)
        
       | jsheard wrote:
        | Judging by how they've been trying to ram AI into YouTube
        | creators' workflows, I suppose it's only a matter of time before
        | they try to automate the entire pipeline from idea, to execution,
        | to "engaging" with viewers. It won't be _good_ at doing any of
        | that, but when did that ever stop them?
       | 
       | https://www.youtube.com/watch?v=26QHXElgrl8
       | 
       | https://x.com/surri01/status/1867433782992879617
        
         | larodi wrote:
         | And then suddenly this is not something that fascinates people
         | anymore... in 10 years as non-synthetic becomes the new bio or
         | artisan or whatever you like.
         | 
          | Humanity has its ways of objecting to accelerationism.
        
           | turnsout wrote:
           | Put another way, over time people devalue things which can be
           | produced with minimal human effort. I suspect it's less about
           | humanity's values, and more about the way money closely
           | tracks "time" (specifically the duration of human effort).
        
             | PittleyDunkin wrote:
             | https://en.wikipedia.org/wiki/Labor_theory_of_value
        
               | turnsout wrote:
               | Yes, exactly. Marx had this right. Money is a way to
               | trade time.
        
             | EGreg wrote:
             | I strongly disagree. How many clothes do you buy that have
             | 100 thread count, and are machine-made, vs hand-knit
             | sweaters or something?
             | 
              | When did you last ask people for directions, or other big
              | questions, instead of Google?
              | 
              | You can wax poetic about wanting "the human touch", but at
              | the end of the day, the market speaks -- people will just
              | prefer everything automated. Including their partners:
              | once your boyfriend can remember every little detail about
              | you, notice everything including your pupils dilating,
              | know exactly how you like it and when you like it, never
              | get angry unless it's to spice things up, and has been
              | trained on 1000 other partners, how could you go back? The
              | same goes when robots can raise children better than
              | parents, with patience, discipline, and individual
              | attention, and know 1000 ways to mold their behavior and
              | achieve healthier outcomes. Everything people do is being
              | commodified as we speak. Soon it will be humor,
              | entertainment, nursing, etc. Then personal relations.
             | 
             | Just extrapolate a decade or three into the future. Best
             | case scenario: if we nail alignment, we build a zoo for
             | ourselves where we have zero power and are treated like
             | animals who have sex and eat and fart all day long. No one
             | will care about whatever you have to offer, because
             | everyone will be surrounded by layers of bots from the time
             | they are born.
             | 
             | PS: anything you write on HN can already have been written
             | by AI, pretty soon you may as well quit producing any
             | content at all. No one will care whether you wrote it.
        
               | vouaobrasil wrote:
               | > PS: anything you write on HN can already have been
               | written by AI, pretty soon you may as well quit producing
               | any content at all. No one will care whether you wrote
               | it.
               | 
                | People theoretically would care, but the internet has
                | already made producing things pseudo-anonymous, so we
                | have forgotten the value of actually having a human being
                | behind content. That's why AI is so successful, and it's
                | a damn shame.
        
               | skeledrew wrote:
               | What exactly is the value of having a human behind
               | content if it gets to the point that content generated by
               | AI is indistinguishable from content generated by humans?
        
               | turnsout wrote:
               | I think "indistinguishable" is a receding horizon. People
               | are already good at picking out AI text, and AI video is
               | even easier. Even if it looks 100% realistic on the
               | surface, the content itself (writing, concept, etc) will
               | have a kind of indescribable "sameness" that will give it
               | away.
               | 
               | If there's one thing that connects all media made in
               | human history, it's that humans find humans interesting.
               | No technology (like literally no technology ever) will
               | change that.
        
               | vouaobrasil wrote:
               | The fact that anyone would ask this question is
               | incredible!
               | 
                | It's so we can, in a fraction of those cases, develop
                | real relationships with the people behind the content!
                | The whole point of sharing is to develop connections with
                | real people. If all you want to do is consume
                | independently of that, you are effectively a soulless
                | machine.
        
               | realce wrote:
               | What does indistinguishable even mean here?
               | 
               | If a fish could write a novel, would you find what it
               | wrote interesting, or would it seem like a fish wrote it?
               | Humans absorb information relative to the human
               | experience, and without living a human existence the
               | information will feel fuzzy or uncanny. AI can
               | approximate that but can't live it for real. Since it is
               | a derivative of an information set, it can never truly
                | express the full resolution of its primary source.
        
               | turnsout wrote:
               | I have both machine-made and hand-knit sweaters. In
               | general, I expect handmade clothes to be more expensive
               | than machine-made, which kinda proves my point. I never
               | said machine-made things had zero value. I said we will
               | tend to devalue them relative to more human-intensive
               | things.
               | 
               | Asking for directions is a bad example, because it takes
               | very little time for both humans and machines to give you
               | directions. Therefore it would be highly unusual for
               | anyone to pay for this service (LOL)
        
           | echelon wrote:
           | Are you kidding?
           | 
           | TikTok is one of the easiest platforms to create for, and
           | look at how much human attention it has sucked up.
           | 
           | The attention/dopamine magnet is accelerating its
           | transformation into a gravitational singularity for human
           | minds.
        
             | tokioyoyo wrote:
              | TikTok's main attraction is the people, not just the
              | videos. Trends, drama, etc. all involve real humans doing
              | real human stuff, so it's relatable.
             | 
             | I might be wrong, but AI videos are on the same path as AI
             | generated images. Cool for the first year, then "ah ok,
             | zero effort content".
        
           | gom_jabbar wrote:
            | Sure, humanity has its ways of objecting to Accelerationism, but
           | the process fundamentally challenges human identity:
           | 
           | "The Human Security System is structured by delusion. What's
           | being protected there is not some real thing that is mankind,
           | it's the structure of illusory identity. Just as at the more
           | micro level it's not that humans as an organism are being
           | threatened by robots, it's rather that your self-
           | comprehension as an organism becomes something that can't be
           | maintained beyond a certain threshold of ambient networked
           | intelligence." [0]
           | 
           | See also my research project on the core thesis of
           | Accelerationism that capitalism is AI. [1]
           | 
           | [0] https://syntheticzero.net/2017/06/19/the-only-thing-i-
           | would-...
           | 
           | [1] https://retrochronic.com/
        
           | vouaobrasil wrote:
            | > Humanity has its ways of objecting to accelerationism.
           | 
           | Actually, typically human objection only slows it down and
           | often it becomes a fringe movement, while the masses continue
           | to consume the lowest common denominator. Take the revival of
           | the flip phone, typewriter, etc. Sadly, technology marches on
           | and life gets worse.
        
             | adolph wrote:
             | Does life get worse for the majority of people or do the
             | fruits of new technology rarely address any individual
             | person's progress toward senescence? (The latter feels like
             | tech moves forward but life gets worse.)
        
               | vouaobrasil wrote:
               | Of course, it depends on how you define "worse". If you
               | use life expectancy, infant mortality, and disease, then
               | life has in the past gotten better (although the
               | technology of the past 20 years has RARELY contributed to
               | any of that).
               | 
               | If you use 'proximity to wild nature', 'clean air', 'more
               | space', then life has gotten worse.
               | 
               | But people don't choose between these two. They choose
               | between alternatives that give them analgesics in an
                | already corrupt society, creating a series of descending
                | local maxima.
        
         | noch wrote:
         | > Judging by how they've been trying to ram AI into YouTube
          | creators' workflows [...]
         | 
         | Thanks for sharing that video and post!
         | 
         | One way to think about this stuff is to imagine that you are 14
         | and starting to create videos, art, music, etc in order to
         | build a platform online. Maybe you dream of having 7 channels
         | at the same time for your sundry hobbies and building
         | audiences.
         | 
          | For that 14-year-old, these tools are available everywhere by
         | default and are a step function above what the prior generation
         | had. If you imagine these tools improving even faster in
         | usability and capability than prior generations' tools did ...
         | 
         | If you are of a certain age you'll remember how we were
         | harangued endlessly about "remix culture" and how mp3s were
         | enabling us to steal creativity without making an effort at
         | being creative ourselves, about how photobashing in Photoshop
         | (pirated cracked version anyway) was not real art, etc.
         | 
         | And yet, halfway through the linked video, the speaker, who has
         | misgivings, was laughing out loud at the inventiveness of the
         | generated replies and I was reminded that someone once said
         | that one true IQ test is the ability to make other humans
         | laugh.
        
           | jsheard wrote:
           | > laughing out loud at the inventiveness of the generated
           | replies
           | 
           | Inventive is one way of putting it, but I think he was
           | laughing at how bizarre or out-of-character the responses
           | would be if he used them. Like the AI suggesting that he post
           | "it is indeed a beverage that would make you have a hard time
           | finding a toilet bowl that can hold all of that liquid" as if
           | those were his own words.
        
           | handsaway wrote:
           | "remix culture" required skill and talent. Not everyone could
           | be Girl Talk or make The Grey Album or Wugazi. The artists
           | creating those projects clearly have hundreds if not
           | thousands of hours of practice differentiating them from
           | someone who just started pasting MP3s together in a DAW
           | yesterday.
           | 
           | If this is "just another tool" then my question is: does the
           | output of someone who has used this tool for one thousand
           | hours display a meaningful difference in quality to someone
           | who just picked it up?
           | 
           | I have not seen any evidence that it does.
           | 
            | Another idea: What the pro generative AI crowd doesn't seem
            | to understand is that good art is not about _execution_,
            | it's about _making deliberate choices_. While a master
            | painter or guitarist may indeed pull off incredible
            | technical feats, their execution is not the art in and of
            | itself; it widens the number of choices they can make. The
            | more generative AI steps into the role of making these
            | choices, the more useless, ironically, it becomes.
            | 
            | And lastly: I've never met anyone who has spent significant
            | time creating art who reacts to generative AI as anything
            | more than a toy.
        
             | Philpax wrote:
             | > does the output of someone who has used this tool for one
             | thousand hours display a meaningful difference in quality
             | to someone who just picked it up?
             | 
              | Yes. A thousand hours gives you a much greater
              | understanding of what it's capable of, its constraints, and
              | how best to take advantage of them.
             | 
             | By comparison, consider photography: it is ostensibly only
             | a few controls and a button, but getting quality results
             | requires the user to understand the language of the medium.
             | 
              | > What the pro generative AI crowd doesn't seem to
              | understand is that good art is not about _execution_,
              | it's about _making deliberate choices_. While a master
              | painter or guitarist may indeed pull off incredible
              | technical feats, their execution is not the art in and of
              | itself; it widens the number of choices they can make.
             | 
             | This is often not true, as evidenced by the pre-existing
             | fields of generative art and evolutionary art. It's also a
             | pretty reductive definition of art: viewers can often find
             | art in something with no intentional artistry behind it.
             | 
              | > I've never met anyone who has spent significant time
              | creating art who reacts to generative AI as anything more
              | than a toy.
             | 
             | It's a big world out there, and you haven't met everyone ;)
             | Just this last week, I went to two art exhibitions in Paris
             | that involved generative AI as part of the artwork; here's
             | one of the pieces:
             | https://www.muhka.be/en/exhibitions/agnieszka-polska-
             | flowers...
        
               | noch wrote:
               | > Just this last week, I went to two art exhibitions in
               | Paris that involved generative AI as part of the artwork;
               | here's one of the pieces
               | 
               | The exhibition you shared is rather beautiful. Thank you
               | for the link!
        
             | dragonwriter wrote:
             | > If this is "just another tool" then my question is: does
             | the output of someone who has used this tool for one
             | thousand hours display a meaningful difference in quality
             | to someone who just picked it up?
             | 
             | Yes, absolutely. Not necessarily in apparent execution
             | without knowledge of intent (though, often, there, too),
              | but in the scope of meaningful choices that they can make
             | and reflect with the tools, yes.
             | 
             | This is probably even more pronounced with use of open
             | models than the exclusively hosted ones, because more
             | choices and controls are exposed to the user (with the
             | right toolchain) than with most exclusively-hosted models.
        
             | noch wrote:
             | > "remix culture" required skill and talent.
             | 
              | We were told that what we were doing didn't require as much
              | skill as whatever the previous generation were doing to
              | sample music and make new tracks. In hindsight, of course
              | _you_ find it easy to cite the prominent successes _that
              | you know_ from that generation. That's arguing from
              | _survivorship bias_ and _availability bias_.
              | 
              | But those successes were never the point: the publishers
              | and artists were pissed off at the tens of thousands of
              | teenagers remixing stuff for their own enjoyment and
              | forming small yet numerous communities and subcultures
              | globally over the net. Many of us never became famous, so
              | there's no fame for you to cite as proof of skill, but we
              | made money hosting parties at the local raves with beats we
              | remixed together ad hoc and that others enjoyed.
             | 
             | > The artists creating those projects clearly have hundreds
             | if not thousands of hours of practice differentiating them
             | from someone who just started pasting MP3s together in a
             | DAW yesterday.
             | 
             | But they all began as I did, by being someone who "just
             | started pasting MP3s together" in my bedroom. Darude,
             | Skrillex, Burial, and all the others simply kept doing it
             | longer than those who decided they had to get an office job
             | instead.
             | 
             | The teenagers today are in exactly the same position,
             | except with vastly more powerful tools and the entire
             | corpus of human creativity free to download, whether in the
             | public domain or not.
             | 
             | I guess in response to your "required skill and talent",
             | I'm saying that skill is something that's developed within
             | the context of the technology a generation has available.
             | But it is always developed, then viewed as such in
             | hindsight.
        
         | spankalee wrote:
         | They basically already have this:
         | https://workspace.google.com/products/vids/
        
           | cj wrote:
           | Last week I started seeing a banner in Google Docs along the
           | lines of "Create a video based on the content of this doc!"
           | with a call to action that brought me to Google Vids.
        
             | lukan wrote:
             | Hey, it's AI and so it is good, right?
             | 
             | Seriously, it sounds like something kids can have fun with,
             | or bored deskworkers. But a serious use case, at the
             | current state of the art? I doubt it.
        
         | EGreg wrote:
         | Who needs viewers anyway? Automate the whole thing. I just see
          | the endgame for the internet as
         | https://en.wikipedia.org/wiki/Dead_Internet_theory
        
       | zb3 wrote:
       | We should collectively ignore these announcements of unavailable
       | models. There are models you can use today, even in the EU.
        
         | ilaksh wrote:
         | Actually there is a pretty significant new model announced
         | today and available now: "MiniMax (Hailuo)Video-01-Live"
         | https://blog.fal.ai/introducing-minimax-hailuo-video-01-live...
         | 
         | Although I tried that and it has the same issue all of them
         | seem to have for me: if you are familiar with the face but they
         | are not really famous then the features in the video are never
         | close enough to be able to recognize the same person.
        
           | creativenolo wrote:
           | It was announced weeks ago.
           | 
           | 50 cents per video. Far more when accounting for a cherrypick
           | rate.
        
         | the8thbit wrote:
         | I don't see why, unless you think they're lying and they filmed
         | their demos, or used some other preexisting model. I didn't
          | ignore the JWST launch just because I haven't been granted the
          | ability to use the telescope.
        
           | zb3 wrote:
           | Back when Imagen was not public, they didn't properly
           | validate whether you were a "trusted tester" on the backend,
           | so I managed to generate a few images..
           | 
           | ..and that's when I realized how much cherry picking we have
           | in these "demos". These demos are about deceiving you into
           | thinking the model is much better than it actually is.
           | 
            | This incentivizes not making the models available, because
            | people then compare their extrapolation from the demo images
            | with the actual outputs. This can trick people into thinking
            | Google is winning the game.
        
       | tauntz wrote:
       | Google being Google:
       | 
       | > VideoFX isn't available in your country yet.
        
         | jjbinx007 wrote:
         | Give it a few months and it'll get cancelled
        
           | warkdarrior wrote:
           | Why would the country get cancelled?
        
             | Jabrov wrote:
             | He means the project, obviously
        
         | ilaksh wrote:
         | Don't worry, even if it was "available" in your country, it's
         | not really available. I am in the US and I just see a waitlist
         | sign up.
        
       | xnx wrote:
       | This looks great, but I'm confused by this part:
       | 
       | > Veo sample duration is 8s, VideoGen's sample duration is 10s,
       | and other models' durations are 5s. We show the full video
       | duration to raters.
       | 
       | Could the positive result for Veo 2 mean the raters like longer
       | videos? Why not trim Veo 2's output to 5s for a better controlled
       | test?
       | 
        | I'm not surprised this isn't open to the public by Google yet;
        | there's still a huge amount of volunteer red-teaming to be done
        | by the public on other services like hailuoai.video.
       | 
       | P.S. The skate tricks in the final video are delightfully insane.
        
         | echelon wrote:
         | > I'm not surprised this isn't open to the public by Google
         | yet,
         | 
         | Closed models aren't going to matter in the long run. Hunyuan
         | and LTX both run on consumer hardware and produce videos
         | similar in quality to Sora Turbo, yet you can train them and
         | prompt them on anything. They fit into the open source
         | ecosystem which makes building plugins and controls super easy.
         | 
         | Video is going to play out in a way that resembles images.
          | Stable Diffusion- and Flux-like players will win. There might
          | be room for one or two Midjourney-type players, but by and
          | large most of the activity happens in the open ecosystem.
        
           | sorenjan wrote:
           | > Hunyuan and LTX both run on consumer hardware
           | 
           | Are there other versions than the official?
           | 
            | > An NVIDIA GPU with CUDA support is required.
            | 
            | > Recommended: We recommend using a GPU with 80GB of memory
            | for better generation quality.
           | 
           | https://github.com/Tencent/HunyuanVideo
           | 
           | > I am getting CUDA out of memory on an Nvidia L4 with 24 GB
           | of VRAM, even after using the bfloat16 optimization.
           | 
           | https://github.com/Lightricks/LTX-Video/issues/64
        
             | jcims wrote:
             | Yes. Lots of folks on reddit running it on 24gb cards.
        
             | jokethrowaway wrote:
             | Yes you can, with some limitations
             | 
             | https://github.com/Tencent/HunyuanVideo/issues/109
        
           | dyauspitr wrote:
           | Stable Diffusion and Flux did not win though. Midjourney and
           | chatGPT won.
        
             | griomnib wrote:
             | "Won" what exactly? I have no issues running stable
             | diffusion locally.
             | 
             | Since Llama3.3 came out it is my first stop for coding
             | questions, and I'm only using closed models when llama3.3
             | has trouble.
             | 
             | I think it's fairly clear that between open weights and
             | LLMs plateauing, the game will be who can build what on top
             | of largely equivalent base models.
        
               | dyauspitr wrote:
                | The quality of SD is nowhere near that of the clear leaders.
        
           | WillyWonkaJr wrote:
           | I wonder if the more decisive aspect is the data, not the
           | model. Will closed data win over open data?
           | 
           | With the YouTube corpus at their disposal, I don't see how
           | anyone can beat Google for AI video generation.
        
       | sigmar wrote:
        | Winning 2:1 in user preference versus Sora Turbo is impressive.
        | It seems to have very similar limitations to Sora. For example:
        | the leg swapping in the ice skating video, and the beekeeper
        | picking up the jar is at a very unnatural acceleration (like it
        | pops up). Though to my eye it's maybe slightly better at
        | emulating natural movement and physics in comparison to Sora.
        | The blog post has slightly more info:
       | 
       | >at resolutions up to 4K, and extended to minutes in length.
       | 
       | https://blog.google/technology/google-labs/video-image-gener...
        
         | BugsJustFindMe wrote:
         | > _the jar is at a very unnatural acceleration (like it pops
         | up)._
         | 
         | It does pop up. Look at where his hand is relative to the jar
         | when he grabs it vs when he stops lifting it. The hand and the
          | jar are both moving, but the jar isn't physically attached to
          | the grab.
        
         | torginus wrote:
          | It looks like Sora is actually the worst performer in the
          | benchmarks, with Kling being the best and others not far
          | behind.
          | 
          | Anyways, I strongly suspect that the funny meme content that
          | seems to be the practical use case of these video generators
          | won't be possible on either Veo or Sora, because of copyright,
          | PC, famous people, or other 'safety'-related reasons.
        
       | alsodumb wrote:
       | My theory as to why all the bigtech companies are investing so
       | much money in video generation models is simple: they are trying
       | to eliminate the threat of influencers/content creators to their
       | ad revenue.
       | 
       | Think about it, almost everyone I know rarely clicks on ads or
       | buys from ads anymore. On the other hand, a lot of people
       | including myself look into buying something advertised implicitly
       | or explicitly by content creators we follow. Say a router
       | recommended by LinusTechTips. A lot of brands started moving
        | their ad spending to influencers too.
        | 
        | Google doesn't have a lot of control over these influencers. But
        | if they can get good video generation models, they can control
        | this ad space too without having a human in the loop.
        
         | PittleyDunkin wrote:
         | > Think about it, almost everyone I know rarely clicks on ads
         | or buys from ads anymore.
         | 
         | I remember saying this to a google VP fifteen years ago.
         | Somehow people are still clicking on ads today.
        
         | dragonwriter wrote:
         | > Think about it, almost everyone I know rarely clicks on ads
         | or buys from ads anymore.
         | 
         | Most people have claimed not to be influenced by ads since long
         | before networked computers were a major medium for delivering
         | them.
        
         | spankalee wrote:
         | It's so much simpler than that:
         | 
         | 1) AI is a massive wave right now and everyone's afraid that
         | they're going to miss it, and that it will change the world.
         | They're not obviously wrong!
         | 
         | 2) AI is showing real results in some places. Maybe a lot of us
         | are numb to what gen AI can do by now, but the fact that it can
         | generate the videos in this post is actually astounding! 10
         | years ago it would have been borderline unbelievable. Of course
         | they want to keep investing in that.
        
         | summerlight wrote:
         | > Think about it, almost everyone I know rarely clicks on ads
         | or buys from ads anymore.
         | 
         | This is a typical tech echo chamber. There is a significant
         | number of people who make direct purchases through ads.
         | 
         | > But if they can get good video generations models, they can
         | control this ad space too without having human in the loop.
         | 
          | This looks like it's based on a misguided assumption. Format
          | might have a significant impact on reach, but the deciding
          | factor is trust in the reviewer. The video format itself does
          | not guarantee a decent CTR/CVR. It's true that those ad
          | companies find this space lucrative, but they're smart enough
          | to acknowledge this complexity.
        
           | the8thbit wrote:
           | > This is a typical tech echo chamber. There is a significant
           | number of people who make direct purchases through ads.
           | 
            | Even if it's not, TV ads, newspaper ads, magazine ads,
           | billboards, etc... get exactly 0 clickthrus, and yet, people
           | still bought (and continue to buy) them. Why do we act like
           | impressions are hunky-dory for every other medium, but
           | worthless for web ads?
        
         | vinayuck wrote:
          | I hadn't thought about that angle yet, but I have to admit, I
          | agree. I rarely ever even pay attention to the YT ads and kind
          | of just zone out, but the recommendations by content creators I
          | usually watch are one of the main ways I keep up with new
          | products and decide what to buy.
        
       | veryrealsid wrote:
       | FWIW it feels like Google should dominate text/image -> video
        | since they have unfettered access to YouTube. Excited to see what
       | the reception is here.
        
         | paxys wrote:
         | Everyone has access to YouTube. It's safe to assume that Sora
         | was trained on it as well.
        
           | Jeff_Brown wrote:
           | All you can eat? Surely they charge a lot for that, at least.
           | And how would you even find all the videos?
        
             | griomnib wrote:
             | They already did it, and I'm guessing they were using some
              | of the various YouTube downloaders Google has been going
             | after.
        
             | HeatrayEnjoyer wrote:
             | Who says they've talked to Google about it at all?
             | 
             | I can't speak to OpenAI but ByteDance isn't waiting for
             | permission.
        
           | bangaladore wrote:
            | Does everyone have "legal" access to YouTube?
            | 
            | In theory that should matter to something like
            | Open(Closed)AI. But who knows.
        
             | dheera wrote:
             | I mean, I have trained myself on Youtube.
             | 
             | Why can't a silicon being train itself on Youtube as well?
        
               | dmonitor wrote:
               | Because silicon is a robot. A camcorder can't catch a
               | flick with me in the theater even if I dress it up like a
               | muppet.
        
         | hirako2000 wrote:
          | They also had a good chunk of the web's text indexed, millions
          | of people's emails sent every day, Google Scholar papers, and
          | the massive Google Books effort that digitized most books ever
          | published -- and they even discovered transformers.
        
       | lukol wrote:
       | Last time Google made a big Gemini announcement, OpenAI owned
       | them by dropping the Sora preview shortly after.
       | 
       | This feels like a bit of a comeback as Veo 2 (subjectively)
       | appears to be a step up from what Sora is currently able to
       | achieve.
        
       | Jotalea wrote:
        | Random fact: Veo means "I see" in Spanish. Take it any way you
        | want.
        
         | espadrine wrote:
         | Hernan Moraldo is from Argentina. That may be all there is to
         | it.
        
         | arnaudsm wrote:
          | While "video" means "I see" in Latin.
        
       | dangan wrote:
       | Is it just me or do all these models generate everything in a
       | weird pseudo-slow motion framerate?
        
       | thatfrenchguy wrote:
        | The example of a "Renaissance palace chamber" is historically
        | inaccurate by around a century or two; the generated video looks
        | a lot like a pastiche of Versailles from the Age of Enlightenment
        | instead. I guess that's what you get by training on the internet.
        
         | ralfd wrote:
          | I watched that 10 times because the details are bonkers, and I
          | find it amazing that she and the candle are visible in the
          | mirror! Speaking of inaccuracy though, are those
          | pencils/text markers/pens on the desk? ;)
        
       | Retr0id wrote:
       | Huge swathes of social media users are going to love this shit.
       | It makes me so sad.
        
       | jasonjmcghee wrote:
        | I appreciate that they posted the skateboarding video. It's
        | wildly unrealistic whenever he performs a trick - just morphing
        | body parts.
       | 
       | Some of the videos look incredibly believable though.
        
         | dyauspitr wrote:
          | The honey, Peruvian women, swimming dog, beekeeper, DJ, etc.
          | are stunning. They're short, but I can barely find any
          | artifacts.
        
         | johndough wrote:
          | It is great to see a limitations section. What would be even
         | more honest is a very large list of videos generated without
         | any cherry picking to judge the expected quality for the
         | average user. Anyway, the lack of more videos suggests that
         | there might be something wrong somewhere.
        
         | cyv3r wrote:
         | I don't know why they say the model understands physics when it
         | makes mistakes like that still.
        
         | bahmboo wrote:
         | Cracks in the system are often places where artists find the
         | new and interesting. The leg swapping of the ice skater is
         | mesmerizing in its own way. It would be useful to be able to
         | direct the models in those directions.
        
         | mattigames wrote:
          | Just pretend it's a movie about a shape-shifting alien that's
          | just trying its best at ice skating; art is subjective like
          | that, isn't it? I bet Salvador Dali would have found those
          | morphing body parts highly amusing.
        
         | visnup wrote:
         | our only hope for verifying truth in the future is that state
         | officials give their speeches while doing kick flips and
         | frontside 360s.
        
           | markus_zhang wrote:
            | Maybe they will do more in-person talks, I guess. Back to the
           | old times.
        
           | stabbles wrote:
           | sadly it's likely that video gen models will master this
           | ability faster than state officials
        
             | mikepurvis wrote:
             | Remember when the iPhone came out and BlackBerry smugly
             | advertised that their products were "tools not toys"?
             | 
             | I remember saying to someone at the time that I was pretty
             | sure iPhone was going to get secure corporate email and
             | device management faster than BlackBerry was going to get
             | an approachable UI, decent camera, or app ecosystem.
        
         | kaonwarb wrote:
         | This was my favorite of all of the videos. There's no uncanny
         | valley; it's openly absurd, and I watched it 4-5 times with
         | increasing enjoyment.
        
       | qwertox wrote:
       | OpenAI is like the super luxurious yacht all pretty and shiny,
       | while Google's AI department is the humongous nuclear submarine
       | at least 5 times bigger than the yacht with a relatively cool
       | conning tower, but not that spectacular to look at.
       | 
        | Like a tanker that is still steering to fully align with the
        | course people expect it to take: they don't recognize that it
        | will soon be there, capable of rolling over everything that
        | comes in its way.
        | 
        | If OpenAI claims they're close to having AGI, Google most likely
        | already has it and is doing its shenanigans with the US
        | government under the radar, while Microsoft plays the cool guy
        | and Amazon is still trying to get its act together.
        
         | byyoung3 wrote:
          | Google definitely does not have AGI hahaha
        
           | JeremyNT wrote:
           | Yeah pretty bad example from parent but the point stands I
           | think... I mostly just assume that for everything ChatGPT
           | hypes/teases Google probably has something equivalent
           | internally that they just aren't showing off to the public.
        
             | YetAnotherNick wrote:
              | I know that Google's internal ChatGPT alternative was
              | significantly worse than ChatGPT (confirmed both in the
              | news and by Googlers) around a year back. So you might say
              | they could overtake OpenAI because of more resources, but
              | they aren't significantly ahead of OpenAI.
        
           | simultsop wrote:
           | ex-googler confirms :/
        
         | tokioyoyo wrote:
          | All it took was good old competition with the potential to
          | steal users from Google's core search product. Nice to be back
          | in the competitive era of web tech.
        
         | griomnib wrote:
          | Or, using Occam's Razor: Sundar is a shit CEO and is playing
          | catch-up with a company largely fueled by innovations created
          | at Google but never brought to market because they would eat
          | into ad revenue.
          | 
          | That, or they have a secret superhuman intelligence under
          | wraps at the Pentagon.
        
       | klabb3 wrote:
        | It's telling that safety and responsibility get so many fluff
        | words and the technical details are fairly extensive, but there's
        | no mention of the training data? It's clearly relevant for both
        | performance and ethical discussions.
        | 
        | Maybe it's just me who couldn't find it (the website barely
        | works at all on FF iOS)...
        
         | tokioyoyo wrote:
          | Most people called it: the second one of these companies stops
          | caring about safety, the others will stop as well. People hate
          | being told what they're not supposed to do. And now companies
          | will go forward with abandoning their responsible use policies.
        
       | gamesbrainiac wrote:
       | This might be a dumb question to ask, but what exactly is this
       | useful for? B-Roll for YouTube videos? I'm not sure why so much
       | effort is being put into something like this when the
       | applications are so limited.
        
         | Philpax wrote:
         | Are they that limited? It's a machine that can make videos from
         | user input: it can ostensibly be used wherever you need video,
         | including for creative, technical and professional
         | applications.
         | 
         | Now, it may not be the best fit for those yet due to its
         | limitations, but you've gotta walk before you can run: compare
         | Stable Diffusion 1.x to FLUX.1 with ControlNet to see where
         | quality and controllability could head in the future.
        
         | terhechte wrote:
         | Back when computers took up a whole room, you'd also have
         | asked: "but what exactly is this useful for? B-Roll some simple
         | calculations that anybody can do with a piece of paper and a
         | pen."?
         | 
         | Think 5-10 years into the future, this is a stepping stone
        
           | alectroem wrote:
           | That's comparing apples to oranges though isn't it?
           | Generating videos is the output of the technology, not the
           | tech itself. It would be like someone asking "this computer
           | that takes up a whole room printed out ascii art, what is
           | this useful for?"
        
           | code_for_monkey wrote:
            | This is kind of an unfair comparison. What's the endpoint of
            | generating AI videos? What can this do that is useful,
            | contributes something to society, has artistic value, etc.?
            | We can make educational videos with a script, but it's also
            | pretty easy for motivated parties to do that already, and
            | it's getting easier as cameras get better and smaller. I
            | think asking "what's the point of this" is at least fair.
        
             | mindwok wrote:
             | They're a way firo
        
           | carlosjobim wrote:
            | They were calculating missile trajectories; everybody
           | understood what they were useful for.
        
             | terhechte wrote:
             | https://www.lexology.com/library/detail.aspx?g=164a442a-1b9
             | 0...
        
         | hnuser123456 wrote:
         | Because it's pretty cool to be able to imagine any kind of
         | scene in your head, put it into words, then see it be made into
         | a video file that you can actually see and share and refine.
        
           | carlosjobim wrote:
           | Use your imagination.
        
         | picafrost wrote:
         | I have observed some musicians creating their own music videos
         | with tools like this.
        
           | aenvoker wrote:
           | This silly music video was put together by one person in
           | about 10 hours.
           | 
           | https://www.reddit.com/r/aivideo/comments/1hbnyi2/comment/m1.
           | ..
           | 
           | Another more serious music video also made entirely by one
           | person. https://www.youtube.com/watch?v=pdqcnRGzH5c Don't
           | know how long it took though.
        
         | krunck wrote:
         | Streaming services where there is no end to new content that
         | matches your viewing patterns.
        
           | code_for_monkey wrote:
           | this sounds awful haha
        
         | drusepth wrote:
         | We're preparing to use video generation (specifically
         | image+text => video so we can also include an initial
         | screenshot of the current game state for style control) for
         | generating in-game cutscenes at our video game studio.
         | Specifically, we're generating them at play-time in a sandbox-
         | like game where the game plays differently each time, and
         | therefore we don't want to prerecord any cutscenes.
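          | 
          | (For the curious, a rough sketch of how that could be wired up
          | at play-time; everything here -- the endpoint, the helper
          | names, the parameters -- is a hypothetical placeholder, not
          | any real API:)
          | 
          |   import requests  # assuming some hosted image+text->video API
          | 
          |   def generate_cutscene(shot_png: bytes, prompt: str) -> bytes:
          |       """Hypothetical call: send game state + text, get video."""
          |       resp = requests.post(
          |           "https://example.com/v1/video",   # placeholder URL
          |           files={"image": shot_png},        # style/continuity
          |           data={"prompt": prompt, "duration_s": 8},
          |           timeout=300,
          |       )
          |       resp.raise_for_status()
          |       return resp.content                   # raw video bytes
          | 
          |   # At play-time: capture the current frame, describe what just
          |   # happened, and hand the result to the in-game video player.
          |   # video = generate_cutscene(capture_screenshot(),
          |   #                           "The hero enters the ruined tower")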
        
           | moritonal wrote:
           | Okay, so is the aim to run this locally on a client's
           | computer or served from a cloud? How does the math work out
           | where it's not just easier at that point to render it in
           | game?
        
         | notatoad wrote:
          | in its current state, it's already useful for b-roll, video
         | backgrounds for websites, and any other sort of "generic"
         | application where the point of the shot is just to establish
         | mood and fill time.
         | 
         | but more than anything it's useful as a stepping stone to more
         | full-featured video generation that can maintain characters and
         | story across multiple scenes. it seems clear that at some point
         | tools like this will be able to generate full videos, not just
         | shots.
        
         | jonas21 wrote:
         | If you want to train a model to have a general understanding of
         | the physical world, one way is to show it videos and ask it to
         | predict what comes next, and then evaluate it on how close it
         | was to what actually came next.
         | 
         | To really do well on this task, the model basically has to
         | understand physics, and human anatomy, and all sorts of
         | cultural things. So you're forcing the model to learn all these
         | things about the world, but it's relatively easy to train
         | because you can just collect a lot of videos and show the model
         | parts of them -- you know what the next frame is, but the model
         | doesn't.
         | 
         | Along the way, this also creates a video generation model - but
         | you can think of this as more of a nice side effect rather than
         | the ultimate goal.
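          | 
          | (A minimal toy sketch of that next-frame objective, assuming a
          | tiny PyTorch model and random tensors standing in for real
          | video clips; nothing here reflects how Veo 2 is actually
          | trained:)
          | 
          |   import torch
          |   import torch.nn as nn
          | 
          |   # Hypothetical toy predictor; any video model could sit here.
          |   class NextFramePredictor(nn.Module):
          |       def __init__(self, channels=3, hidden=64):
          |           super().__init__()
          |           self.net = nn.Sequential(
          |               nn.Conv2d(channels, hidden, 3, padding=1),
          |               nn.ReLU(),
          |               nn.Conv2d(hidden, channels, 3, padding=1),
          |           )
          | 
          |       def forward(self, frame):
          |           return self.net(frame)  # guess the next frame
          | 
          |   model = NextFramePredictor()
          |   opt = torch.optim.Adam(model.parameters(), lr=1e-4)
          | 
          |   # clip: (batch, time, channels, height, width) of video frames
          |   clip = torch.rand(2, 8, 3, 64, 64)
          | 
          |   for t in range(clip.shape[1] - 1):
          |       pred = model(clip[:, t])              # predict frame t+1
          |       loss = nn.functional.mse_loss(pred, clip[:, t + 1])
          |       opt.zero_grad()
          |       loss.backward()                       # learn from the miss
          |       opt.step()
          | 
          | The model only gets credit for matching what actually happens
          | next, which is why it has to pick up physics, anatomy, and so
          | on as a side effect.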
        
         | tucnak wrote:
         | You really think making videos with computers is not useful? Is
         | this a joke?
        
         | wnolens wrote:
         | TV commercials / youtube ads. You don't need a video team
         | anymore to make an ad.
        
       | ible wrote:
        | That product name sucks for Veo, the AI sports video camera
        | company that literally makes a product called the Veo 2.
       | (https://www.veo.co)
        
       | theorangejuica wrote:
       | Time and money are better spent on creating actual video,
       | animation, and art than this gen AI drivel.
        
       | demarq wrote:
        | Just to remind everyone that state of the art was Will Smith
        | Eating Spaghetti in March of 2023
       | 
       | https://arstechnica.com/information-technology/2023/03/yes-v...
       | 
       | We're not even done with 2024.
       | 
       | Just imagine what's waiting for us in 2025.
        
       | seanvelasco wrote:
       | as OpenAI released a feature that hit Google where it hurts,
       | Google released Veo 2 to utterly destroy OpenAI's Sora.
       | 
       | Google won.
        
       | markus_zhang wrote:
       | My friend working in a TV station is already using these tools to
       | generate videos for public advertising programs. It has been a
       | blast.
        
       | stabbles wrote:
       | It's interesting they host these videos on YouTube, cause it
       | signals they're fine with AI generated content. I wonder if
       | Google forgets that the creators themselves are what makes
       | YouTube interesting for viewers.
        
       | sylware wrote:
        | Does anybody realize this is very sad?
        | 
        | Namely, that it takes so few neurons to get a picture into our
        | heads.
        | 
        | I guess end-of-the-world scenarios may lead us to create that
        | superintelligence with a gigantic, ultra-performant artificial
        | "brain".
        
       ___________________________________________________________________
       (page generated 2024-12-16 23:00 UTC)