[HN Gopher] Lumiere: A space-time diffusion model for realistic ...
       ___________________________________________________________________
        
       Lumiere: A space-time diffusion model for realistic video
       generation
        
       Author : jonbaer
       Score  : 295 points
       Date   : 2024-01-24 05:49 UTC (17 hours ago)
        
 (HTM) web link (lumiere-video.github.io)
 (TXT) w3m dump (lumiere-video.github.io)
        
       | codetrotter wrote:
       | Their GitHub doesn't have anything other than the linked page
       | currently
       | 
       | https://github.com/lumiere-video
       | 
        | Nor did they claim it would. But I had to check anyway, and
        | there wasn't any link I could see to the GitHub profile. So
        | here's a link for anyone else who wants to check and doesn't
        | want to type out the profile URL manually from the hosted
        | website URL.
        
         | gardnr wrote:
          | A popular move with AI/ML folks: use GitHub to publish
          | information about a thing that is not open. Then "it's on
          | GitHub".
        
           | sho_hn wrote:
           | I see this a lot as well, and I think we really ought to call
           | it out more often. It should be clear whether a GitHub
           | publication enables downstream use or contribution.
        
           | whywhywhywhy wrote:
            | So tired of the academia-brained ML researchers. Can't wait
            | for the next generation of teenagers to completely change
            | this space and bypass this silliness.
        
             | jampekka wrote:
              | Isn't this more for-profit-corporation-brained ML
              | researchers? ML from academia tends to be released as
              | open source nowadays.
        
             | Der_Einzige wrote:
             | There are few better ways to get a cushy 300K a year plus
             | jobs, and publishing Ml research is one of those ways. The
             | new generation will simply do more publishing.
        
       | smusamashah wrote:
        | The examples are a lot more consistent and longer than those
        | from other techniques we have seen before. Legs are not sliding
        | on the floor as much as they do with other models. On the other
        | hand, human faces didn't look good, e.g. the Mona Lisa smiling.
        | 
        | To me this looks like the first good video generation model.
        | 
        | EDIT: Just noticed it's by Google. NVM, it will never be
        | released publicly.
        
         | ithkuil wrote:
          | No, but researchers will build on this research, as
          | researchers do, and eventually some company will ship a
          | successful product based on the results of a lot of research
          | that includes this, and we'll be bitching about Google
          | falling behind.
          | 
          | Google is sponsoring a lot of cutting-edge research and
          | sharing it openly. How cool is that? How long will it last?
        
         | i-use-nixos-btw wrote:
         | If it were to be released publicly, I'd give it a week before
         | NSFW models based on it were uploaded to Civitai.
        
           | Frost1x wrote:
           | A lot of current AI techniques are making people reevaluate
           | their perspectives on free speech.
           | 
            | We seem to value freedom of speech (and expression) only up
            | to a tipping point where it begins to invade other aspects
            | of life. So far the noise and its rate have been low enough
            | that people at large support free speech, but newer
            | information techniques are making it possible to generate a
            | lot more realistic noise (faux signal, if you will) at
            | higher rates (it's becoming cheaper and easier to do and to
            | scale).
            | 
            | So while you certainly have a point I mostly agree with,
            | we're letting private entities' policies dictate the
            | limitations of expression, at least for the time being
            | (until someone comes along and makes these widely available
            | for free or cheap without such ethical policies). It does
            | go to show just how much sway industries have on markets
            | through their policies, with no public oversight, which to
            | me is concerning.
        
             | kjqgqkejbfefn wrote:
              | I've been experimenting with story generation/RP with
              | ChatGPT and now use jailbreaks systematically because
              | they make the stories so much better. It's not just about
              | what's allowed or not, but what's expressed by default.
              | Without jailbreaks ChatGPT will always give narration a
              | positive twist, not to mention injecting the same
              | sponsored themes of environmentalism and feminism.
              | Nothing wrong with that. But I don't want a third of my
              | stories to revolve around these themes.
        
               | kridsdale1 wrote:
               | Similarly I wanted to use it to illustrate my friend's
               | wizard character using a gorgon head to freeze some giant
               | evil bees.
               | 
               | The OpenAI content policies are pretty strictly opposed
               | to the holding and wielding of severed heads.
        
               | lbeltrame wrote:
                | I got lectured by Bard when I asked for help improving
                | the description of an action scene, which involves
                | people getting hurt (at least on the losing side), even
                | if marginally. I suppose you can still jailbreak
                | ChatGPT? I didn't know it was still a thing.
        
               | alec_irl wrote:
               | Sincere question - and maybe I'm missing the point here -
               | but why not just write stories yourself?
        
               | kjqgqkejbfefn wrote:
               | I'm trying to build a text-based open-world massively
               | multiplayer game in the style of GTA. Trying. It's really
               | difficult. My bet is on driving the game with narration
               | so my prompts are fueled with abstract notions borrowed
               | from the various theories in
               | https://en.wikipedia.org/wiki/Narratology, and this is
               | why I complain about ChatGPT's default ideas.
        
               | gs17 wrote:
               | > Nothing wrong with that.
               | 
               | The themes maybe, but the forced positivity is
               | frustrating. Trying to get stock ChatGPT to run a DnD-
               | type encounter is hilarious because it's so opposed to
               | initiating combat.
        
               | devbent wrote:
                | You can easily prompt GPT to write dark stories. When
                | asked to write in the style of Game of Thrones, GPT-3.5
                | will happily write about people doing horrible things
                | to each other.
                | 
                | > Without jailbreaks ChatGPT will always give narration
                | a positive twist
                | 
                | Most modern stories in Western literature have a
                | positive twist. It is only natural that GPT's output
                | will reflect that!
        
             | throwuwu wrote:
             | I don't see why freedom of speech would be impacted by
             | this. Existing laws around copyright and libel will need to
             | be applied and litigated on a case by case basis but they
             | should cover the malicious uses. Anything that falls
             | outside of that is just noise and we have plenty of noise
             | already.
             | 
             | Even if we wind up at a point where no one trusts photos or
             | videos is that really a disaster? Blindly trusting a photo
             | or video that someone else, especially some anonymous
             | account, gives you is a terrible way to shape your
              | perception of the world. Ensuring that fewer people default
             | to trusting random videos may even be good for society. It
             | would force you to think about where the video came from,
             | if it's corroborated by other reports from various sources
             | and if you're able to verify the events through other
             | channels available to you. You have to do the same work
             | when evaluating any other claim after all.
        
           | turnsout wrote:
           | Eventually it will happen--if not this model, another one. AI
           | is going to absolutely decimate the porn industry.
        
             | whamlastxmas wrote:
             | Agreed - being able to watch a porn video and change
             | anything on the fly is going to be wild. Bigger boobs,
             | different eye color, speaking different language, etc.
        
         | gardnr wrote:
         | I wonder how many of the samples from this demo video are
         | authentic:
         | 
         | https://arstechnica.com/information-technology/2023/12/googl...
        
         | Archelaos wrote:
          | > e.g. the Mona Lisa smiling
         | 
         | This was not Leonardo da Vinci's "Mona Lisa"[1], but Johannes
         | Vermeer's "Girl with a Pearl Earring"[2].
         | 
         | [1] https://en.wikipedia.org/wiki/Mona_Lisa
         | 
         | [2] https://en.wikipedia.org/wiki/Girl_with_a_Pearl_Earring
        
           | idiotsecant wrote:
           | Keep scrolling.
        
             | Archelaos wrote:
             | Ah, I see. On my screen resolution, this image was hidden
             | in a carousel.
        
       | feverzsj wrote:
       | As realistic as my blurry dream.
        
         | yard2010 wrote:
          | Give it a break; until 2 years ago you wouldn't even dream
          | of a fraction of what's out already.
         | 
         | Everything is relative!
        
       | wiz21c wrote:
       | newb question:
       | 
       | Do these models actually learn a 3D representation or do they
        | just learn "something" that is good enough to produce a very
       | convincing impression of 3D ?
       | 
       | Subquestion: if they don't learn 3D, can we say that models
       | learning a 3D representation first will lead to even better
       | productions ?
        
         | astrange wrote:
         | > Do these models actually learn a 3D representation or do they
         | just learn "something" that is good enough to produce a very
         | convincing impression of 3D ?
         | 
         | The second, but at the limit it's the same thing of course.
         | 
         | > Subquestion: if they don't learn 3D, can we say that models
         | learning a 3D representation first will lead to even better
         | productions ?
         | 
          | Generally speaking, manual feature engineering almost always
          | turns out to be a waste of time if you can just make the
          | model bigger; this is called "the bitter lesson".
        
         | l33tman wrote:
          | It has been shown that at least still-image generators learn
          | a 3D representation internally and use it to bootstrap their
          | generation. If you think about it, this is the only way they
          | can be so good at shadows and reflections, perspective and
          | lighting, etc.
        
       | sho_hn wrote:
       | With the weird creepy dream-like nature of these little AI video
       | gen samples, I'm perpetually disappointed that none of these
       | papers ever include a "dreaming of electric sheep" prompt as an
       | easter egg.
        
       | 55555 wrote:
       | The negative comments here shock me. This is the most amazing
        | text-to-video we've ever seen, by a long shot. It's good enough
       | for many uses. Absolutely mindblowing. Great job to the people
       | who worked on this.
        
         | boesboes wrote:
         | What negative comments?
        
           | saurik wrote:
           | Yeah... I have only found a single negative comment about the
           | quality--"As realistic as my blurry dream."--and it comes
           | across as more of a cynical joke than a true negative review.
        
         | eurekin wrote:
         | Just chiming in to second your opinion.
         | 
          | Years ago, I wouldn't even have dared to dream it would be
          | possible. It's nowhere near what people are used to watching
          | normally, but the fact it's even trying to compete is insane.
        
         | danielbln wrote:
          | I'm excited about these text-to-video models; what I'm not
          | excited about is that it's Google publishing this. That means
          | no code, nothing deployed to try, and most likely we will
          | never hear about this ever again, or maybe it quietly hits
          | Vertex AI in 2 years (like Imagen) and no one will care.
         | 
         | Also, ever since the Gemini marketing video shenanigans, I
         | don't really feel like trusting whatever Google's research says
         | they have, if I can't test it myself.
        
           | sho_hn wrote:
           | > Also, ever since the Gemini marketing video shenanigans, I
           | don't really feel like trusting whatever Google's research
           | says they have, if I can't test it myself.
           | 
           | The video was released by Google product marketing for a
           | launch to customers, not research.
           | 
            | I'm still somewhat confused by this one. I understand the
            | community has decided to be harsh on Google for that video
            | to draw a line (fair: truth in advertising, etc.), but at
            | the same time, we all had an understanding of where that
            | tech currently is and the pace it progresses at. Did anyone
            | watching it really assume it was realtime? Can we not
            | differentiate between technical publications and marketing
            | anymore? Do we have to vilify everyone in an R&D department
            | for the sins of the product marketing wing?
        
             | ImprobableTruth wrote:
              | The issue isn't that it's not real-time/sped up; it's
              | that it doesn't actually take video as input, but
              | multiple hand-picked stills.
        
             | whywhywhywhy wrote:
              | We're all harsh on it because we all had to see it being
              | posted around by naive people as amazing, when it's
              | completely faked and omits half of the prompting and all
              | of the latency.
              | 
              | It was completely dishonest. Considering how trash
              | Google's actual AI products are, they deserve to be
              | dragged even more over that video.
        
               | IshKebab wrote:
               | How is it completely faked? The video didn't give me the
               | impression that the results were calculated instantly, or
               | that no prompts were required.
        
           | sjwhevvvvvsj wrote:
           | "PoC or GTFO" as they say.
        
             | addandsubtract wrote:
             | PapersWithCode or GTFO.
             | 
             | [0] https://paperswithcode.com/
        
               | baldgeek wrote:
               | 2 clicks from the Posted Link: "Read Paper", then "Code,
               | Data and Media" tab will get you the dataset used
               | (https://paperswithcode.com/dataset/ucf101)
        
               | sjwhevvvvvsj wrote:
                | Well, in the AI/ML era maybe "models or GTFO" is
                | better. Training data is just Common Crawl for half
                | these LMs.
        
         | whywhywhywhy wrote:
          | How can you get excited about a company that has shipped few
          | enough pieces of research as products that you can count them
          | on one hand, during the 12 or so years they've been telling
          | us about their work?
          | 
          | Google can publish whatever research they want; it literally
          | doesn't matter and literally changes nothing, because they
          | can't turn it into a product anyone can use and never will.
        
           | ryandvm wrote:
            | Indeed. These recent AI demos are pretty damn impressive
            | (even knowing there are smoke and mirrors), but it's hard
            | to get excited about what's happening with their R&D when
            | my Google Home device seems to be regressing on a daily
            | basis. It is now basically only useful for alarms and
            | timers.
        
             | lbeltrame wrote:
              | Perhaps OT, but I often see these comments on HN. How do
              | these devices (I don't own one) lose functionality over
              | time? Features removed through updates?
        
               | pbronez wrote:
               | Yes.
               | 
               | The home assistant speakers aren't making enough money to
               | justify the large teams behind them. Thus we've seen
               | significant layoffs on those teams in the past year.
               | 
               | BigCos are looking for other ways to reduce costs.
               | Killing features is one way to do it.
               | 
                | There have also been situations where a feature was
                | removed because of legal action: lawsuits alleging the
                | feature violates a patent.
                | 
                | Live updates giveth, live updates taketh away!
        
           | smoldesu wrote:
           | I'm excited because I despise Google's products anyways and
           | would rather use the research myself. Did that with Google's
           | _BERT_ model a few years back to make a particularly clueless
           | Discord bot.
        
           | sjwhevvvvvsj wrote:
           | Hey now, Google is going to use these technologies to fire
           | their own employees to save costs for the next quarterly
           | earnings call!
           | 
           | Of course building products for actual users is no longer a
           | "thing", but think of the stock price.
        
         | tomcam wrote:
         | Agreed. And the dancing bear is an instant classic.
        
         | endisneigh wrote:
         | Well it's Google and a lot of folks foam at the mouth at Google
         | so it's no surprise. They can't dissociate the research from
         | the creator.
        
         | heisgone wrote:
          | It's indeed impressive. Stable Diffusion is progressing so
          | fast. That being said, I find myself picking up on more and
          | more cues that an image is AI-generated. There is a feel to
          | it. It's no different from the best movie CGI. As Christopher
          | Nolan pointed out, no matter how good it is, it's not the
          | real deal.
        
         | malka wrote:
         | It's by google. It will rot somewhere, never to be used.
        
       | throwuwu wrote:
       | We will see the first feature length AI generated movie this
       | year. If you think I'm crazy then consider that even way back at
       | the dawn of cinema the average shot length was 12 seconds and
       | today it is only 2.5 seconds.
       | 
        | There are a few important techniques still to be refined, such
        | as keeping subjects consistent between generations, but I could
        | see many inconsistencies being made up for by applying existing
        | methods: separating layers based on depth to allow more static
        | images to be used, or creating simple textured 3D models where
        | more depth is needed. With enough effort and skill someone
        | could probably do it with existing technologies.
        
         | __loam wrote:
         | It will probably be utter dogshit like every other piece of
         | media people are pumping out with this crap.
        
           | throwuwu wrote:
            | 90 percent of everything is crap, but I've seen plenty of
            | creative people make compelling films with digital tools.
            | This technology puts that capability within reach of people
            | who aren't also 3D modellers or graphic artists, so we're
            | bound to get more output, good and bad. Same deal as when
            | film cameras became cheap and widely available, or digital
            | cameras, or iPhones.
        
         | seydor wrote:
         | why would we make a "movie" instead of one storyline where
         | viewers can customize the costumes at will?
        
         | felipeerias wrote:
          | It's easy to imagine a filmmaker creating multiple draft
          | versions of a movie to polish the script and the
          | cinematography, similar to how they use storyboards now.
        
       | rysertio wrote:
        | The amount of computing resources it would take to retrain the
        | model is enormous. So most of us will have to wait for a big
        | company to publish or leak its weights before we get to use
        | anything described in the paper.
        
       | qwertox wrote:
       | If this isn't the bell ringing, announcing that an entire
       | industry will soon collapse, then I don't know what could
       | announce it more clearly.
       | 
        | I give it 5 years until it is normal to see AI-generated TV/YT
        | ads, and 10 to 15 until traditionally made ones are in the
        | minority.
       | 
        | In the beginning just a bunch of geeks in front of computers
        | crafting the prompts; later everyone will be able to make it.
       | 
       | It will probably be access to computing resources which will be
       | the limiting factor.
        
         | tetris11 wrote:
          | Yep, and my cynical side is just hoping that the GPU vendors
          | aren't going to deliberately limit the amount of user-
          | accessible hardware in order to force people to depend on
          | their cloud platforms.
        
           | __loam wrote:
           | There's probably going to be more and more specialized
           | hardware for this stuff. Things like H100s are already pretty
           | inaccessible to consumers.
        
         | dkjaudyeqooe wrote:
         | That sounds like a great increase in productivity.
         | 
         | But also you're making the mistake of extrapolating against the
         | realities of the techniques.
         | 
          | Things may improve over time, but prompts and random seeds
          | aren't great for detailed work, so there are limitations that
          | seriously constrain the usefulness. "Everyone will be able to
          | make it" is likely true, but the specialist work will likely
          | remain, and those users will likely be made more productive.
          | It's those in the middle who will lose out.
         | 
         | That an industry is destroyed is neither here nor there. Sucks
         | to have your business/job taken away but that's how the system
         | works. That which created your business also will destroy it.
        
           | wegfawefgawefg wrote:
            | Have you played with ControlNet over ComfyUI? Try it. You
            | can pose arbitrary figures. There are gonna be full kits
            | that provide control over every aspect of generation.
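            | 
            | A rough sketch of the same idea outside ComfyUI, using the
            | diffusers ControlNet pipeline (checkpoint names are the
            | public Hub ones; treat it as a sketch, not a recipe):
            | 
            |     import torch
            |     from diffusers import (ControlNetModel,
            |         StableDiffusionControlNetPipeline)
            |     from diffusers.utils import load_image
            | 
            |     # OpenPose-conditioned ControlNet: the skeleton image
            |     # dictates the generated figure's pose.
            |     controlnet = ControlNetModel.from_pretrained(
            |         "lllyasviel/sd-controlnet-openpose",
            |         torch_dtype=torch.float16)
            |     pipe = StableDiffusionControlNetPipeline.from_pretrained(
            |         "runwayml/stable-diffusion-v1-5",
            |         controlnet=controlnet,
            |         torch_dtype=torch.float16).to("cuda")
            | 
            |     pose = load_image("pose_skeleton.png")  # OpenPose image
            |     out = pipe("a knight in silver armor, studio lighting",
            |                image=pose, num_inference_steps=30).images[0]
            |     out.save("posed_knight.png")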
        
         | coldcode wrote:
        | OK, let's see it make full-sized videos first; making tiny demo
        | videos is a long way from showing it at 4K. Also, let's see the
        | entire paper, and note how much computing was required to build
        | the models. Until everyone can try it for themselves, we have
        | no idea how cherry-picked the examples were.
        
         | rexreed wrote:
        | I give it 12 months until the pharmaceutical industry starts
        | using this in a significant way. Currently, most pharma ads on
        | TV look like stock footage of random people doing random things
        | with text and a voice-over. So AI-generated? Sure, as if people
        | are even watching the video action in any detail at all in
        | pharma ads. AI-gen video companies that focus on pharma will
        | rake it in for sure in the short term.
        | 
        | [video prompt: Two elderly people taking a stroll on a
        | boardwalk, partaking in various boardwalk activities.] [AI gen
        | voice: Suffering from chronic blorgoriopsy? Try Neuvoplaxadip
        | by Excelon Pharmaceuticals. Reported side effects include...
        | Ask your doctor.]
        
           | Filligree wrote:
            | Well, that would be a US-only thing. I don't think you can
            | build an industry on that.
        
             | Xirgil wrote:
             | What? The industry already exists. There's clearly money
             | there. The idea that you can't have an industry just
             | because it's specific to the richest country on earth is
             | silly.
        
         | kouru225 wrote:
         | I think you're overestimating how useful this is. Just like
         | image AI, this stuff will only be useful in combination with
         | existing techniques.
        
         | Escapado wrote:
          | I just want to feed an LLM Hunter x Hunter episodes and get
          | out new ones.
         | 
          | But on a more serious note, I vividly remember when GANs were
          | the next big thing when I was in university, and the output
          | quality and variability were laughable compared to what
          | Midjourney and the like can produce today (my mind was still
          | blown back then). So I would be in no way surprised if we got
          | to a point in the next decade where we have a "Midjourney"
          | for video generation. So I wholeheartedly agree.
         | 
          | I also think the computational problem is being tackled from
          | so many angles in the field of ML. You have Nvidia releasing
          | absolute beasts of GPUs, some promising startups pushing for
          | specialized hardware, a new paper on more optimized training
          | methods every week, Mamba bursting onto the scene, higher-
          | quality datasets, merging of models, framework optimizations
          | here and there. Just the other day I saw a post here about
          | locally running larger LLMs. Stable Diffusion is already
          | available for iPhones at acceptable quality and speed (given
          | the device's power).
         | 
          | What I wonder about the most, though, is whether we will get
          | more robust orchestration of different models, or multimodal
          | models. It's one thing to have a model which, given a text
          | prompt, generates a short video snippet. But what if I
          | instruct my model(s) to come up with a new ad for a sports
          | drink, and they/it does research, consolidates relevant data
          | about the target group, comes up with a proper script for an
          | ad, creates the ad, figures out an evaluation strategy for
          | the ad, applies it, and eventually gives me back a "well
          | thought out" video? And all I had to do was provide a little
          | bit of an intro and then let the thing do its magic for an
          | hour. I know we have LangChain and BabyAGI, but they are not
          | as robust as they would need to be to displace a bunch of
          | jobs just yet (but I assume they will be soon enough).
        
         | kranke155 wrote:
         | It is the bell ringing. I work in CGI for advertising and this
         | is clearly going the way of still genAI.
         | 
          | Single-image genAI went from unusable to indistinguishable
          | from reality in 18-24 months.
        
         | wegfawefgawefg wrote:
          | I have already seen AI scenes in TV ads and anime. Half the
          | YouTube thumbnails I see are AI now. So... it might not even
          | be five years. It might be two.
        
       | richrichardsson wrote:
        | The video inpainting is interesting. My kids were watching old
        | Spongebob episodes recently, and the 4:3 aspect ratio was
        | jarring to me. I thought it would be an interesting use case to
        | inpaint the side borders to bring it back to 16:9, but I
        | suppose it would need some careful fine-tuning with some kind
        | of look-ahead for objects that enter the frame from the sides.
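        | 
        | The mask setup itself is simple; a minimal per-frame sketch
        | with PIL (the padded frame and mask would then go to any
        | still-image inpainting model, which is exactly where the
        | missing look-ahead would hurt):
        | 
        |     from PIL import Image
        | 
        |     def pad_to_16_9(frame: Image.Image):
        |         w, h = frame.size        # e.g. 960x720 (4:3)
        |         new_w = h * 16 // 9      # 1280 at 720p
        |         pad = (new_w - w) // 2
        |         canvas = Image.new("RGB", (new_w, h))
        |         canvas.paste(frame, (pad, 0))
        |         # White = side bars for the model to fill in;
        |         # black = keep the original pixels.
        |         mask = Image.new("L", (new_w, h), 255)
        |         mask.paste(0, (pad, 0, pad + w, h))
        |         return canvas, mask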
        
         | araes wrote:
         | That actually sounds like a product somebody in the television
         | and movie industry might buy.
         | 
         | Dynamic adjustment of fixed aspect ratio film imagery to non-
         | native sizes without stretch or obvious distortion. Guess all
         | the added edges accurately enough that audiences won't notice.
         | 
         | 4:3 <-> 16:9 <-> 143:100 (IMAX) <-> 11:8 (Academy) <-> 3:2
         | (35mm) <-> 16:10 (tablets/desktops)
         | 
         | Make a new movie look like a classic b/w silent, then give it
         | the correct frame.
         | 
         | Adapt any movie to smoothly work on IMAX displays.
        
           | berniedurfee wrote:
            | Or make those darn newfangled vertical TikTok-style videos
            | watchable!
        
       | mdrzn wrote:
       | DAMN! Take this announcement back just 2-3 years and it would
       | have been MIND BLOWING.
       | 
        | I know we're all used to new releases like this coming very
        | soon and very fast, but I'm amazed. I can't wait to have
        | software with these abilities. edit: nvm, it's by Google. I'll
        | wait for an open-source version to be released.
        
       | pmontra wrote:
       | > Hover over the video to see the input prompt
       | 
        | That doesn't work on a phone. I hoped they had added an event
        | handler for touching the animations. Instead they forgot they
        | have a mobile OS and that they sell phones.
        
         | johnnymellor wrote:
         | At least on Chrome for Android, you can long-press to trigger
         | the hover effect. Works on many websites. (There are
         | inconvenient side-effects like selecting text, but it's better
         | than nothing.)
        
         | itishappy wrote:
         | Did they? Works fine on my Pixel.
        
         | 7734128 wrote:
          | Worked in Kiwi, which is a Chrome derivative
        
         | pmontra wrote:
          | OP here: my bad. I hadn't enabled enough JS sites in
          | NoScript. It works now by touching the images. Thanks to
          | everybody who replied to me.
        
       | 88j88 wrote:
        | Is this all real, or faked a la Gemini?
        
       | ilaksh wrote:
        | Congratulations to the researchers. It would be nice if it
        | weren't Google, though, because we'll probably have to wait 3-6
        | months for it to show up in their Vertex API. For special
        | customers only.
        
       | sorenjan wrote:
        | This is very impressive, but their approach of generating the
        | whole temporal duration at once limits it to short clips. I
        | guess one of the next steps is to make overlapping "clips" that
        | then become longer videos.
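        | 
        | The naive version of that is just cross-fading the overlapping
        | frames; a toy numpy sketch (a pixel-space fade only hides the
        | seam, so real systems would more likely condition the next
        | clip on the previous clip's last frames):
        | 
        |     import numpy as np
        | 
        |     def crossfade_concat(a, b, overlap):
        |         # a, b: (T, H, W, C) float arrays; overlap in frames.
        |         w = np.linspace(0, 1, overlap)[:, None, None, None]
        |         blend = (1 - w) * a[-overlap:] + w * b[:overlap]
        |         return np.concatenate(
        |             [a[:-overlap], blend, b[overlap:]])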
        
       | pylua wrote:
       | This is absolutely unbelievable. Truly impressive.
       | 
       | I felt like this was maybe 5-10 years away.
        
       | araes wrote:
       | Pixel themed post for pixel themed paper.
       | 
        | It's rather impressive and will quite quickly result in a huge
        | horde of "make a movie with a paragraph" programs.
        | 
        | It's Google - it will probably go in a box and be a Rick and
        | Morty gadget we never see.
        | 
        | It has a cool author-list format I like. The 1,2,3,4,*,+ scheme
        | is nice for lead authors, institute attribution, and core
        | contributors. I read so many astronomy and physics papers that
        | are 10+ authors long, and I have no idea who did anything. The
        | arXiv link, for example, shows no similar formatting.
        | 
        | It will probably be immediately used for abusive porn. Walking
        | Woman example, 5th variation: "Wearing no clothing".
        
         | whamlastxmas wrote:
          | This hadn't occurred to me, but yeah, abusive porn is about
          | to be rampant with this sort of tech. Every single person in
          | the world will soon have graphic, realistic-looking
          | pornography with their face on it.
        
       | StarterPro wrote:
       | What is the point of this? I feel like it only serves to hinder
       | real artists who could use the money that people are paying for
       | these services and models. Maybe I'm too poor or short-sighted to
       | see it.
       | 
        | I would rather have an actual animator create something
        | beautiful for me than have an AI spit out something that needs
        | to be worked on by an actual animator ANYWAY.
        
         | chankstein38 wrote:
         | You're clearly not the target audience for this then. That's
         | usually my assumption when I can't figure out a use case for
         | some research a bunch of people are excited about.
        
           | sofixa wrote:
            | Crypto in general and NFTs in particular are good
            | indicators that things can get people excited and have no
            | substance. Even scams and Ponzi schemes have "target
            | audiences", but that doesn't make them useful or good.
        
             | chankstein38 wrote:
              | Right, but this is generating decent-quality video in
              | segments longer than your average movie shot. I'm sure
              | it'd take some fiddling, but I'm excited for a model this
              | good to come out so I can try some fancy multi-shot
              | videos.
              | 
              | I saw someone else say "I'm sure it'll be crap like all of
              | the other AI stuff I've seen", but that's a naive view.
              | Things that have been 100% created by AI, sure, they're
              | kind of boring a lot of the time. But this kind of tech
              | gives people with a creative mind, but no money or time or
              | resources to create a storytelling movie/video, the
              | resources to do it. Obviously ignoring the fact that
              | Google will never release this: if something like this did
              | come out, it'd be game-changing for a lot of people.
              | 
              | Think about something like RPG Maker. Yeah, we've had a
              | ton of random garbage come out of that platform, but there
              | were also incredible games.
              | 
              | AI isn't just some garbage maker. It is a paintbrush that
              | enables people who are alone in their room to make
              | something bigger than themselves.
        
               | evilduck wrote:
                | I've used SD to generate novel clipart with my kid for
                | their school project to make a board game. It isn't
                | taking away from an artist; I would never in a million
                | years pay an artist to create throwaway art for a
                | corner of a spray-painted cardboard box. The
                | alternative would be nothing, or my kids scribbling in
                | something of their own hand. But they were interested
                | and it was available, so it went from simple and plain
                | to "custom" and rather nice and polished looking.
               | 
                | FWIW, my kid also designed their own board game pieces
                | in TinkerCAD and we 3D printed them. It's nothing
                | special, but it's frankly astounding how far kids can
                | now go towards creating something not just imaginative
                | but almost professional quality with the tools at their
                | disposal. For throwaway school projects. It may not be
                | my kids, but I'm excited for what the next generation
                | will be able to accomplish without massive capital
                | requirements to fulfill their vision and create
                | something.
        
           | StarterPro wrote:
            | I understand the use case. I'm saying: in terms of the
            | human collateral, what is the point of it?
            | 
            | Like, we build these things and show them off without any
            | thought to the ramifications they could lead to. Maybe
            | I'm catastrophizing, but all this tech lately seems very
            | unregulated/dangerous.
        
         | sofixa wrote:
         | The same can be said for generators like Midjourney or Stable
         | Diffusion.
         | 
          | The target market is people and organisations who
          | like/want/need the speed and low cost of generated "art" and
          | prefer not dealing with external real-world artists who need
          | to be fairly compensated and will take time to produce an art
          | piece.
          | 
          | Also, laws are very murky on this for the moment (naturally,
          | since it's a very recent thing), and some consider that AI
          | "art" can't be copyrighted. The EU is currently working on a
          | new AI framework which will probably cover that.
        
         | gedy wrote:
          | Many of these examples are combinations of realistic objects
          | and scenes from the real world; these aren't in need of
          | artistic interpretation or manual re-creation or animation.
        
       | seydor wrote:
        | I wonder who's going to make a model that creates and textures
        | a 3D world with AI. It's going to be a necessity for VR goggles
        | to find some non-gimmicky use cases.
        
       | RcouF1uZ4gsC wrote:
        | Sorry, I discount all AI text/image/video generation work that
        | doesn't have a demo site where I can put in prompts and see
        | what is being generated.
       | 
       | It is so easy to game and tweak examples, especially since there
       | is a random component to them. For example, you could do a prompt
       | 1 million times and only show the best response. Or you could use
       | prompts that it's optimized for.
       | 
        | The reason ChatGPT and DALL-E captured the public's imagination
        | is that the public could actually put in their prompts and see
        | the results.
        
       | alkonaut wrote:
       | If "translator" was the victim of LLMs and "stock photographer"
       | of diffusion models, which job is the first to be threatened by
       | diffusion models for moving pictures? OnlyFans streamers?
        
         | rwmj wrote:
         | The people involved in producing TV adverts.
        
       | thih9 wrote:
        | Looks like they're frequently mixing old images with a modern
        | dataset; if I took a portrait of George Washington and prompted
        | for "a man smiling", would I see dentures[1] or pearly whites?
       | 
       | [1] https://en.wikipedia.org/wiki/George_Washington%27s_teeth
        
         | mattnewton wrote:
          | I think you'd have to provide that out-of-distribution data
          | in the prompt, of course. It's not clear these models have
          | built large world models of facts like some of the larger
          | LLMs need to; they are figuring out how things move. Most of
          | the time people have pearly whites to show in the dataset,
          | and there are no videos of Washington's mouth, so I would
          | expect that to be the default unless prompted with a detailed
          | description of the dentures you are looking for.
        
       | macawfish wrote:
       | This is remarkable, it would have been unthinkable 5 years ago.
        
       | abkolan wrote:
       | How soon can Google _Productize_ it?
        
       | ativzzz wrote:
       | Me, watching the video and looking at samples, excitement level
       | high
       | 
       | Me, scanning for a download link or a prompt to run the model and
       | not finding any, excitement level medium
       | 
       | Me, realizing it's by google, excitement level zero
        
         | zitterbewegung wrote:
          | Don't worry, OpenAI will copy it and put it in ChatGPT.
        
         | baldgeek wrote:
         | Here is the dataset they used:
         | https://paperswithcode.com/dataset/ucf101
        
       | Aerbil313 wrote:
        | Eh. I knew this day would come. Video is no evidence of
        | anything now.
        
       | harha_ wrote:
       | This pace of progress almost scares me.
        
       | max_ wrote:
        | Why won't Google publish a product that does this?
        
       | wantsanagent wrote:
       | I find it deeply offensive that this work is presented under the
       | auspices of scientific research.
       | 
       | The only way to describe this is bragging, advertising, or
        | marketing. There are no reproducible processes described. While
        | the diagram of their architecture may inspire others, it does
        | not allow for the most crucial aspect of the scientific
        | endeavor: falsification.
       | 
       | There is no way we can know if Google is lying because there's no
       | way to check. It should be assumed that every example has been
       | cherry-picked and post processed. It should be assumed that the
       | data used to train the model (if one was trained at all) was
        | illicitly acquired. We _have_ to start from a mindset of extreme
        | skepticism because Google now routinely makes claims that cannot
        | be demonstrated. When the performance of Gemini in Bard is
        | compared to GPT-4, for example, it falls far short. When they
        | released a video claiming to show an interaction with a model,
        | it turned out it wasn't anything of the kind.
       | 
       | Ideally _no_ organization would operate like this but Google has
       | become a particularly egregious repeat offender.
        
         | bugglebeetle wrote:
         | > There is no way we can know if Google is lying because
         | there's no way to check it.
         | 
         | We can gather that they are likely to be lying or cherry-
         | picking examples to make themselves look better, since they
         | were already caught faking an AI demo. In the world of actual
         | research, if you got caught doing this, all your subsequent and
         | prior work would be under severe scrutiny.
        
         | GaggiX wrote:
         | >When the performance of Gemini in bard is compared to GPT-4
         | for example, it falls far short.
         | 
         | How did people get access to Gemini Ultra? Or are you talking
         | about Gemini Pro, the one that compares to GPT-3.5?
        
         | Workaccount2 wrote:
         | Just an FYI, it's not illegal to use data to train a model.
         | It's illegal to have a model output that (identical) data for
         | commercial gain.
         | 
         | This difference is purposely muddied, but important to
         | understand.
        
           | leereeves wrote:
           | > it's not illegal to use data to train a model
           | 
           | That's not at all settled law. AI companies are hoping to use
           | the fair use exception to protect their businesses, but it
           | looks like it will soon be clarified the other way.
           | 
           | Wired summed it up: "Congress Wants Tech Companies to Pay Up
           | for AI Training Data"
           | 
           | https://www.wired.com/story/congress-senate-tech-
           | companies-p...
           | 
           | And Ars wrote "Media orgs want AI firms to license content
           | for training, and Congress is sympathetic."
           | 
           | https://arstechnica.com/information-technology/2024/01/at-
           | se...
           | 
           |  _" [Senator] Hawley expressed concerns that if the tech
           | companies' expansive interpretation of fair use prevails, it
           | would be like "the mouse that ate the elephant"--an exception
           | that would make copyright law toothless."_
        
             | Workaccount2 wrote:
              | Again, fair use concerns the production of copyrighted
              | works; it has nothing to do with the training. If this
              | were the case, every person who could draw a Batman symbol
              | from memory would be in violation of copyright.
              | 
              | "Using copyrighted works for monetary gain" refers to
              | using the art itself as the product. Knowing what Apple's
              | logo is and making a logo in that style is not a violation
              | of copyright. However, using Apple's logo (or something
              | strikingly close) is a violation.
              | 
              | The reason this is muddied is that legally artists don't
              | really have a leg to stand on for "my art cannot be
              | trained on by a computer", whereas they do have strong
              | legal precedent (and actual laws) for "my art cannot be
              | reproduced by a computer".
        
               | leereeves wrote:
               | > fair use concerns the production of copyrighted works,
               | it has nothing to do with the training
               | 
               | Training is the "production" of a derivative work (a
               | model) based on the training data.
               | 
               | AI companies claim that this is covered by fair use, but
               | this is simply a claim that has not yet been tested in
               | court.
               | 
               | And even if courts rule in favor of the AI companies, it
               | sounds likely (based on what I've read) that Congress
               | will soon rewrite the law to support the artists'
               | position.
        
             | summerlight wrote:
          | Currently, neither party has strong legal ground, and it may
          | take another landmark case to fully settle the question.
        
               | leereeves wrote:
               | If Congress doesn't get there first.
        
               | summerlight wrote:
                | Even if Congress makes a law, it can be effectively
                | delayed by injunctions until the Supreme Court makes
                | the ultimate decision. And I'm pretty sure big tech
                | will challenge it with an army of lawyers.
        
         | summerlight wrote:
         | > There is no way we can know if Google is lying because
         | there's no way to check. It should be assumed that every
         | example has been cherry-picked and post processed. It should be
         | assumed that the data used to train the model (if one was
         | trained at all) was illicitly acquired. We have to start from a
         | mindset of extreme skepticism because Google now routinely
         | makes claims that cannot be demonstrated.
         | 
          | This doesn't sound like a productive stance for science. You
          | don't trust their results? It's fine to ignore all the claimed
          | artifacts and just take the core idea. You don't have to
          | assume any malice to invalidate their so-called advertisement.
          | 
          | While this kind of stance might make you feel a bit better, it
          | will also make your claim political and slow you down if the
          | work happens to be true, given the history that many of
          | Google's papers have eventually become the foundation of other
          | useful technologies even though almost all of them didn't
          | contain reproducible artifacts.
        
         | whamlastxmas wrote:
          | This video is almost certainly aimed mostly at Google
          | investors: look, we aren't dying, search isn't dying!
          | Dancing bears!
          | 
          | That said, if this tech is as advertised, it is extremely
          | impressive to me.
        
       | vessenes wrote:
       | Some comments: Google, so we'll probably never get to use this
       | directly.
       | 
       | That said, the idea is very interesting -- train the model to
       | generate a small full-time representation of the video, then
       | upscale on both time and pixels.
       | 
       | Essentially, we have seen models adding depth maps. This one adds
       | a 'time map' as another dimension.
       | 
        | Coherence is pretty good, to my eye. The jankiness seems to be
        | more about the model deciding what something should 'do' over
        | time, whereas a lot of models struggle to keep coherence frame
        | by frame. The big insight from the Googlers is that you could
        | condition / train / generate on coherence as its own thing,
        | then fill in the frames.
       | 
       | I think this is likely copyable by any number of the model
       | providers out there; nothing jumps out as not implementable by
       | Stability, for instance.
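        | 
        | In pipeline terms the shape is roughly this (a toy sketch, not
        | Lumiere's code: base_model is a hypothetical stand-in for their
        | space-time U-Net, and bilinear upsampling stands in for the
        | spatial super-resolution stage):
        | 
        |     import torch.nn.functional as F
        | 
        |     def generate(prompt):
        |         # Whole clip at once: full duration, low resolution.
        |         T, H, W = 80, 128, 128
        |         coarse = base_model(prompt, shape=(T, 3, H, W))
        |         # Temporal coherence is decided at this coarse stage;
        |         # only space is upscaled afterwards, unlike keyframe-
        |         # then-interpolate cascades.
        |         return F.interpolate(coarse, scale_factor=8,
        |                              mode="bilinear")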
        
       | Peritract wrote:
       | "We made _Girl with a Pearl Earring_ smile and wink "
       | demonstrates the fundamental failure of this (and similar)
       | technology: it's the promise of generating art, made by people
       | who really don't understand what art is.
        
         | adrenvi wrote:
         | The same was probably said about photography, moving film, film
         | with sound, computer graphics, etc.
        
           | Peritract wrote:
            | No it wasn't; absolutely no one ever thought the issue with
            | films with sound was that their creators fundamentally
            | misunderstood _Girl with a Pearl Earring_. Some people
            | thought that [new medium] wasn't art; they didn't think it
            | was driven by and for people who didn't understand any art.
            | 
            | I do enjoy the irony, though, of you copy-and-pasting a
            | generic pro-AI rebuttal to a comment you didn't understand.
        
       | interestica wrote:
       | Video Inpainting: 4:3 --> 16:9 conversions
        
       ___________________________________________________________________
       (page generated 2024-01-24 23:02 UTC)