[HN Gopher] Lumiere: A space-time diffusion model for realistic ...
___________________________________________________________________
Lumiere: A space-time diffusion model for realistic video
generation
Author : jonbaer
Score : 295 points
Date : 2024-01-24 05:49 UTC (17 hours ago)
(HTM) web link (lumiere-video.github.io)
(TXT) w3m dump (lumiere-video.github.io)
| codetrotter wrote:
| Their GitHub doesn't have anything other than the linked page
| currently
|
| https://github.com/lumiere-video
|
| Nor did they claim it would. But I had to check anyway, and there
| wasn't any link I could see to the GitHub profile. So here's a
| link for anyone else who wants to check and doesn't want to type
| the URL of their profile manually based on the hosted website's
| URL.
| gardnr wrote:
| A popular move with AI/ML folks: use GitHub to publish
| information about a thing that is not open. Then "It's on
| GitHub".
| sho_hn wrote:
| I see this a lot as well, and I think we really ought to call
| it out more often. It should be clear whether a GitHub
| publication enables downstream use or contribution.
| whywhywhywhy wrote:
| So tired of the academia-brained ML researchers. Can't wait
| for the next generation of teenagers to completely change
| this space and bypass this silliness completely.
| jampekka wrote:
| Isn't this more for-profit-corporation-brained ML
| researchers? ML from academia tends to be released open
| source nowadays.
| Der_Einzige wrote:
| There are few better ways to get a cushy $300K-a-year-plus
| job, and publishing ML research is one of those ways. The
| new generation will simply do more publishing.
| smusamashah wrote:
| The examples are a lot more consistent and longer than those from
| other techniques we have seen before. Legs are not sliding on the
| floor as much as they do with other models. On the other hand,
| human faces didn't look good, e.g. the Mona Lisa smiling.
|
| To me this looks like the first good video generation model.
|
| EDIT: Just noticed it's by Google. NVM, it will never be released
| publicly.
| ithkuil wrote:
| No, but researchers will build on this research, as researchers
| do, and eventually some company will run a successful product
| based on the result of a lot of research that includes this, and
| we'll be bitching about Google falling behind.
|
| Google is sponsoring a lot of cutting edge research and sharing
| it openly. How cool is that? How long will it last?
| i-use-nixos-btw wrote:
| If it were to be released publicly, I'd give it a week before
| NSFW models based on it were uploaded to Civitai.
| Frost1x wrote:
| A lot of current AI techniques are making people reevaluate
| their perspectives on free speech.
|
| We seem to value freedom of speech (and expression) only up to
| the tipping point where it begins to invade other aspects of
| life. So far the noise and its rate have been low enough that
| people at large support free speech, but newer information
| techniques are making it possible to generate a lot more
| realistic noise (faux signal, if you will) at higher rates (it's
| becoming cheaper and easier to do and to scale).
|
| So while you certainly have a point I mostly agree with,
| we're letting private entities' policies dictate the
| limitations of expression, at least for the time being (until
| someone comes along and makes these widely available for free
| or cheap without such ethical policies). It does go to show
| just how much sway industries have on markets through their
| policies with no public oversight, which to me is concerning.
| kjqgqkejbfefn wrote:
| I've been experimenting with story generation/RP with
| ChatGPT and now use jailbreaks systematically because it
| makes the stories so much better. It's not just about
| what's allowed or not, but what's expressed by default.
| Without jailbreaks ChatGPT will always give narration a
| positive twist, not to mention injecting the same sponsored
| themes of environmentalism and feminism. Nothing wrong with
| that. But I don't want a third of my stories to revolve around
| these themes.
| kridsdale1 wrote:
| Similarly I wanted to use it to illustrate my friend's
| wizard character using a gorgon head to freeze some giant
| evil bees.
|
| The OpenAI content policies are pretty strictly opposed
| to the holding and wielding of severed heads.
| lbeltrame wrote:
| I got lectured by Bard when I asked for help improving the
| description of an action scene, which involves people
| getting hurt (at least on the losing side), even if only
| marginally. I suppose you can still jailbreak ChatGPT? I
| didn't know it was still a thing.
| alec_irl wrote:
| Sincere question - and maybe I'm missing the point here -
| but why not just write stories yourself?
| kjqgqkejbfefn wrote:
| I'm trying to build a text-based open-world massively
| multiplayer game in the style of GTA. Trying. It's really
| difficult. My bet is on driving the game with narration
| so my prompts are fueled with abstract notions borrowed
| from the various theories in
| https://en.wikipedia.org/wiki/Narratology, and this is
| why I complain about ChatGPT's default ideas.
| gs17 wrote:
| > Nothing wrong with that.
|
| The themes maybe, but the forced positivity is
| frustrating. Trying to get stock ChatGPT to run a DnD-
| type encounter is hilarious because it's so opposed to
| initiating combat.
| devbent wrote:
| You can easily prompt GPT to write dark stories. When
| asked to write in the style of Game of Thrones, GPT-3.5
| will happily write about people doing horrible things to
| each other.
|
| > Without jailbreaks ChatGPT will always give narration a
| positive twist
|
| Most modern stories in Western literature have a positive
| twist. It is only natural that GPT's output will reflect
| that!
| throwuwu wrote:
| I don't see why freedom of speech would be impacted by
| this. Existing laws around copyright and libel will need to
| be applied and litigated on a case by case basis but they
| should cover the malicious uses. Anything that falls
| outside of that is just noise and we have plenty of noise
| already.
|
| Even if we wind up at a point where no one trusts photos or
| videos is that really a disaster? Blindly trusting a photo
| or video that someone else, especially some anonymous
| account, gives you is a terrible way to shape your
| perception of the world. Ensuring that fewer people default
| to trusting random videos may even be good for society. It
| would force you to think about where the video came from,
| if it's corroborated by other reports from various sources
| and if you're able to verify the events through other
| channels available to you. You have to do the same work
| when evaluating any other claim after all.
| turnsout wrote:
| Eventually it will happen--if not this model, another one. AI
| is going to absolutely decimate the porn industry.
| whamlastxmas wrote:
| Agreed - being able to watch a porn video and change
| anything on the fly is going to be wild. Bigger boobs,
| different eye color, speaking different language, etc.
| gardnr wrote:
| I wonder how many of the samples from this demo video are
| authentic:
|
| https://arstechnica.com/information-technology/2023/12/googl...
| Archelaos wrote:
| > e.g. the mona lisa smiling
|
| This was not Leonardo da Vinci's "Mona Lisa"[1], but Johannes
| Vermeer's "Girl with a Pearl Earring"[2].
|
| [1] https://en.wikipedia.org/wiki/Mona_Lisa
|
| [2] https://en.wikipedia.org/wiki/Girl_with_a_Pearl_Earring
| idiotsecant wrote:
| Keep scrolling.
| Archelaos wrote:
| Ah, I see. On my screen resolution, this image was hidden
| in a carousel.
| feverzsj wrote:
| As realistic as my blurry dream.
| yard2010 wrote:
| Give it a break; until 2 years ago you wouldn't even dream
| of a fraction of what's out already.
|
| Everything is relative!
| wiz21c wrote:
| newb question:
|
| Do these models actually learn a 3D representation, or do they
| just learn "something" that is good enough to produce a very
| convincing impression of 3D?
|
| Subquestion: if they don't learn 3D, can we say that models
| learning a 3D representation first will lead to even better
| productions?
| astrange wrote:
| > Do these models actually learn a 3D representation or do they
| just learn "something" that is good enough to produce a very
| convincing impression of 3D?
|
| The second, but at the limit it's the same thing of course.
|
| > Subquestion: if they don't learn 3D, can we say that models
| learning a 3D representation first will lead to even better
| productions?
|
| Generally speaking manual feature engineering almost always
| turns out to be a waste of time if you can just make the model
| bigger; this is called "the bitter lesson".
| l33tman wrote:
| It has been shown that at least still-image generators learn a
| 3D representation internally and use it to bootstrap their
| generation. If you think about it this is the only way they can
| be so good at shadows and reflections, perspective and lighting
| etc.
| sho_hn wrote:
| With the weird creepy dream-like nature of these little AI video
| gen samples, I'm perpetually disappointed that none of these
| papers ever include a "dreaming of electric sheep" prompt as an
| easter egg.
| 55555 wrote:
| The negative comments here shock me. This is the most amazing
| text-to-video we've ever seen, by a longshot. It's good enough
| for many uses. Absolutely mindblowing. Great job to the people
| who worked on this.
| boesboes wrote:
| What negative comments?
| saurik wrote:
| Yeah... I have only found a single negative comment about the
| quality--"As realistic as my blurry dream."--and it comes
| across as more of a cynical joke than a true negative review.
| eurekin wrote:
| Just chiming in to second your opinion.
|
| Years ago, I wouldn't even have dared to dream it would be
| possible. It's nowhere near what people are used to watching
| normally, but
| the fact it's even trying to compete is insane.
| danielbln wrote:
| I'm excited about these text-to-video models; what I'm not
| excited about is that it's Google publishing this. That means
| no code, nothing deployed to try, and most likely we will never
| hear about this again, or maybe it quietly hits Vertex AI in 2
| years (like Imagen) and no one will care.
|
| Also, ever since the Gemini marketing video shenanigans, I
| don't really feel like trusting whatever Google's research says
| they have, if I can't test it myself.
| sho_hn wrote:
| > Also, ever since the Gemini marketing video shenanigans, I
| don't really feel like trusting whatever Google's research
| says they have, if I can't test it myself.
|
| The video was released by Google product marketing for a
| launch to customers, not research.
|
| I'm still somewhat confused by this one. I understand the
| community has decided to be harsh on Google for that video to
| draw a line - fair, truth in advertising, etc. - but at the
| same time, we all had an understanding of where that tech is
| at currently and the pace it progresses at. Did anyone
| watching it really assume it was realtime? Can we not
| differentiate between technical publications and marketing
| anymore? Do we have to vilify everyone in an R&D department
| for the sins of the product marketing wing?
| ImprobableTruth wrote:
| The issue isn't that it's not real-time/sped-up, it's that
| it doesn't actually take video as input, but multiple hand
| picked stills.
| whywhywhywhy wrote:
| We're all harsh on it because we all had to see it being
| posted around by naive people as being amazing when it's
| completely faked and omits half of the prompting and all of
| the latency.
|
| It was completely dishonest. Considering how trash Google's
| actual AI products are, they deserve to be dragged even more
| over that video.
| IshKebab wrote:
| How is it completely faked? The video didn't give me the
| impression that the results were calculated instantly, or
| that no prompts were required.
| sjwhevvvvvsj wrote:
| "PoC or GTFO" as they say.
| addandsubtract wrote:
| PapersWithCode or GTFO.
|
| [0] https://paperswithcode.com/
| baldgeek wrote:
| 2 clicks from the Posted Link: "Read Paper", then "Code,
| Data and Media" tab will get you the dataset used
| (https://paperswithcode.com/dataset/ucf101)
| sjwhevvvvvsj wrote:
| Well, in the AI/ML era maybe "models or GTFO" is better.
| Training data is just Common Crawl for half these LLMs.
| whywhywhywhy wrote:
| How can you get excited about a company that has shipped few
| enough pieces of research that you can count them on one hand
| during the 12 or so years they've been telling us about their
| work?
|
| Google can publish whatever research they want; it literally
| doesn't matter and literally changes nothing, because they can't
| turn it into a product anyone can use, and never will.
| ryandvm wrote:
| Indeed. These recent AI demos are pretty damn impressive
| (even knowing there are smoke and mirrors), but it's hard to
| get excited about what's happening with their R&D when my
| Google Home device seems to be regressing on a daily basis.
| It is now basically only useful for alarms and timers.
| lbeltrame wrote:
| Perhaps OT, but I often see these comments on HN. How do
| these devices (I don't own one) lose functionality over
| time? Features removed through updates?
| pbronez wrote:
| Yes.
|
| The home assistant speakers aren't making enough money to
| justify the large teams behind them. Thus we've seen
| significant layoffs on those teams in the past year.
|
| BigCos are looking for other ways to reduce costs.
| Killing features is one way to do it.
|
| There have also been situations where a feature is
| removed because of legal action; lawsuits alleging the
| feature violates a patent.
|
| Live updates giveth, live updates taketh away!
| smoldesu wrote:
| I'm excited because I despise Google's products anyways and
| would rather use the research myself. Did that with Google's
| _BERT_ model a few years back to make a particularly clueless
| Discord bot.
| sjwhevvvvvsj wrote:
| Hey now, Google is going to use these technologies to fire
| their own employees to save costs for the next quarterly
| earnings call!
|
| Of course building products for actual users is no longer a
| "thing", but think of the stock price.
| tomcam wrote:
| Agreed. And the dancing bear is an instant classic.
| endisneigh wrote:
| Well it's Google and a lot of folks foam at the mouth at Google
| so it's no surprise. They can't dissociate the research from
| the creator.
| heisgone wrote:
| It's indeed impressive. Stable Diffusion is progressing so
| fast. That being said, I find myself picking up more and more
| cues that an image is AI-generated. There is a feel to it. It's
| no different than the best movie CGI. As Christopher Nolan
| pointed out, no matter how good it is, it's not the real deal.
| malka wrote:
| It's by google. It will rot somewhere, never to be used.
| throwuwu wrote:
| We will see the first feature length AI generated movie this
| year. If you think I'm crazy then consider that even way back at
| the dawn of cinema the average shot length was 12 seconds and
| today it is only 2.5 seconds.
|
| There are a few important techniques still to be refined, such
| as keeping subjects consistent between generations, but I could
| see many inconsistencies being made up for by applying existing
| methods: separating layers by depth so that more static images
| can be used, or creating simple textured 3D models where more
| depth is needed. With enough effort and skill someone could
| probably do it with existing technologies.
| __loam wrote:
| It will probably be utter dogshit like every other piece of
| media people are pumping out with this crap.
| throwuwu wrote:
| 90 percent of everything is crap, but I've seen plenty of
| creative people make compelling films with digital tools.
| This technology puts that capability within reach of people
| who aren't also 3D modellers or graphic artists, so we're
| bound to get more output, good and bad. Same deal as when
| film cameras became cheap and widely available, or digital
| cameras, or iPhones.
| seydor wrote:
| why would we make a "movie" instead of one storyline where
| viewers can customize the costumes at will?
| felipeerias wrote:
| It's easy to imagine a film maker creating multiple draft
| versions of a movie to polish the script and the
| cinematography, similar to how they now use storyboards.
| rysertio wrote:
| The amount of computing resources it's going to take to retrain
| the model is enormous. So most of us will have to wait for a
| big company to publish or leak its weights before we get to use
| anything described in the paper.
| qwertox wrote:
| If this isn't the bell ringing, announcing that an entire
| industry will soon collapse, then I don't know what could
| announce it more clearly.
|
| I give it 5 years until it is normal to see AI-generated TV/YT
| ads, and 10 to 15 until traditionally made ones are in the
| minority.
|
| In the beginning just a bunch of geeks in front of computers
| crafting the prompts, later everyone will be able to make it.
|
| It will probably be access to computing resources which will be
| the limiting factor.
| tetris11 wrote:
| Yep, and my cynical side is just hoping that the GPU vendors
| aren't going to deliberately limit user-accessible resources
| to force people to depend on their cloud platforms.
| __loam wrote:
| There's probably going to be more and more specialized
| hardware for this stuff. Things like H100s are already pretty
| inaccessible to consumers.
| dkjaudyeqooe wrote:
| That sounds like a great increase in productivity.
|
| But also you're making the mistake of extrapolating against the
| realities of the techniques.
|
| Things may improve over time, but prompts and random seeds
| aren't great for detailed work, so there are limitations which
| seriously constrain the usefulness. "Everyone will be able to make
| it" is likely true, but the specialist stuff will likely remain
| and those users will likely be made more productive. It's those
| in the middle that will lose out.
|
| That an industry is destroyed is neither here nor there. Sucks
| to have your business/job taken away but that's how the system
| works. That which created your business also will destroy it.
| wegfawefgawefg wrote:
| Have you played with ControlNet in ComfyUI? Try it. You
| can pose arbitrary figures. There's gonna be full kits that
| provide control over every aspect of generation.
| coldcode wrote:
| OK, let's see it make full-sized videos first; making tiny demo
| videos is a long way from showing it at 4K. Also, let's see the
| entire paper and note how much computing power was
| required to build the models. Until everyone can try it for
| themselves, we have no idea how cherry-picked the examples
| were.
| rexreed wrote:
| I give it 12 months until the Pharmaceutical industry starts
| using this in a significant way. Currently, most Pharma ads on
| TV look like stock footage of random people doing random things
| with text and voice-over. So AI-generated? Sure, as if people
| are even watching the video action in any detail at all in
| Pharma ads. AI gen video companies that focus on pharma will
| rake it in for sure in the short term.
|
| [video prompt: Two elderly people taking a stroll on a
| boardwalk, partaking in various boardwalk activities.] [AI gen
| voice: Suffering from chronic blorgoriopsy? Try Neuvoplaxadip
| by Excelon Pharmaceuticals. Reported side effects include...
| Ask your doctor.]
| Filligree wrote:
| Well, that would be a US only thing. I don't think you can
| build an industry on that.
| Xirgil wrote:
| What? The industry already exists. There's clearly money
| there. The idea that you can't have an industry just
| because it's specific to the richest country on earth is
| silly.
| kouru225 wrote:
| I think you're overestimating how useful this is. Just like
| image AI, this stuff will only be useful in combination with
| existing techniques.
| Escapado wrote:
| I just want to feed an LLM Hunter x Hunter episodes and get out
| new ones.
|
| But on a more serious note, I vividly remember when GANs were
| the next big thing when I was in university and the output
| quality and variability were laughable compared to what
| Midjourney and the like can produce today (my mind was still
| blown back then). So I would be in no way surprised if we got to
| a point in the next decade where we have a "Midjourney" for video
| generation. So I wholeheartedly agree.
|
| I also think the computational problem is tackled from so many
| angles in the field of ML. You have Nvidia releasing absolute
| beasts of GPUs, some promising startups pushing for
| specialized hardware, a new paper on more optimized training
| methods every week, mamba bursting on the scene, higher quality
| data sets, merging of models, framework optimizations here and
| there. Just the other day I think I saw a post here about
| locally running larger LLMs. Stable Diffusion is already
| available for iPhones at acceptable quality and speed (given
| the device's power).
|
| What I wonder about the most though is whether we will get more
| robust orchestration of different models or multi modal models.
| It's one thing to have a model which given a text prompt
| generates a short video snippet. But what if I instruct my
| model(s) to come up with a new ad for a sports drink and
| they/it does research, consolidates relevant data about the
| target group, comes up with a proper script for an ad, creates
| the ad, figures out an evaluation strategy for the ad, applies
| it and eventually gives me back a "well thought out" video. And
| all I had to do was provide a little bit of an intro and then
| let the thing do its magic for an hour. I know we have LangChain
| and BabyAGI, but they are not as robust as they would
| need to be to displace a bunch of jobs just yet (but I assume
| they will soon enough).
| kranke155 wrote:
| It is the bell ringing. I work in CGI for advertising and this
| is clearly going the way of still genAI.
|
| Single image genAI went from unusable to indistinguishable from
| reality in 18-24 months.
| wegfawefgawefg wrote:
| I have already seen AI scenes in TV ads and anime. Half the
| YouTube thumbnails I see are AI now. So it might not even be
| five years; might be 2.
| richrichardsson wrote:
| The video inpainting is interesting. My kids were watching old
| Spongebob episodes recently and the 4:3 aspect ratio was jarring
| to me. I thought it would be an interesting use case to in-paint
| the side borders to bring it back into 16:9 aspect, but I suppose
| it would need some careful fine-tuning with some kind of look-
| ahead for objects that enter frame from the sides.
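|
| Roughly, I imagine the setup would just be a mask over the side
| borders that a video inpainting model fills in. A minimal sketch
| of that masking step, illustrative only; video_inpaint() below is
| a hypothetical stand-in, not anything from the paper:
|
|   import numpy as np
|
|   def pad_to_16x9(frames: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
|       """Center 4:3 frames on a 16:9 canvas and return the canvas
|       plus a mask marking the side borders a model should fill.
|       frames: (T, H, W, 3) uint8 video at 4:3 aspect."""
|       t, h, w, c = frames.shape
|       new_w = int(round(h * 16 / 9))
|       pad = (new_w - w) // 2
|       canvas = np.zeros((t, h, new_w, c), dtype=frames.dtype)
|       canvas[:, :, pad:pad + w] = frames
|       mask = np.ones((t, h, new_w), dtype=bool)
|       mask[:, :, pad:pad + w] = False  # True = region to synthesize
|       return canvas, mask
|
|   # canvas, mask = pad_to_16x9(episode_frames)
|   # widescreen = video_inpaint(canvas, mask)  # hypothetical model call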
| araes wrote:
| That actually sounds like a product somebody in the television
| and movie industry might buy.
|
| Dynamic adjustment of fixed aspect ratio film imagery to non-
| native sizes without stretch or obvious distortion. Guess all
| the added edges accurately enough that audiences won't notice.
|
| 4:3 <-> 16:9 <-> 143:100 (IMAX) <-> 11:8 (Academy) <-> 3:2
| (35mm) <-> 16:10 (tablets/desktops)
|
| Make a new movie look like a classic b/w silent, then give it
| the correct frame.
|
| Adapt any movie to smoothly work on IMAX displays.
| berniedurfee wrote:
| Or make those darn new fangled vertical TikTok style videos
| watchable!
| mdrzn wrote:
| DAMN! Take this announcement back just 2-3 years and it would
| have been MIND BLOWING.
|
| I know we're all used to new releases like this coming very soon
| and very fast, but I'm amazed. I can't wait to have software
| with these abilities. edit: nvm, it's by Google. I'll wait for an
| open source version to be released.
| pmontra wrote:
| > Hover over the video to see the input prompt
|
| That doesn't work on a phone. I hoped they had added an event handler
| for touching the animations. Instead they forgot they have a
| mobile OS and that they sell phones.
| johnnymellor wrote:
| At least on Chrome for Android, you can long-press to trigger
| the hover effect. Works on many websites. (There are
| inconvenient side-effects like selecting text, but it's better
| than nothing.)
| itishappy wrote:
| Did they? Works fine on my Pixel.
| 7734128 wrote:
| Worked in Kiwi, which is a Chrome derivative.
| pmontra wrote:
| OP here: my bad. I hadn't enabled enough JS sites in NoScript.
| It works now by touching the images. Thanks to everybody who
| replied to me.
| 88j88 wrote:
| Is this all real, or faked a la Gemini?
| ilaksh wrote:
| Congratulations to the researchers. It would be nice if it wasn't
| Google though, because we probably will have to wait 3-6 months
| for it to show up in their Vertex API. For special customers only.
| sorenjan wrote:
| This is very impressive, but their approach of generating the
| whole temporal duration at once limits it to short clips. I
| guess one of the next steps is to make overlapping "clips" that
| are then blended into longer videos.
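|
| Something like a crossfade over the overlapping frames might be
| enough to hide the seams. A minimal sketch of that idea, purely
| illustrative: generate_clip() is a hypothetical stand-in for the
| model, and the triangular weighting is my own assumption.
|
|   import numpy as np
|
|   def generate_clip(prompt: str, n: int, seed: int) -> np.ndarray:
|       # Hypothetical stand-in for a text-to-video call; returns an
|       # array of shape (n, H, W, 3).
|       rng = np.random.default_rng(seed)
|       return rng.random((n, 64, 64, 3), dtype=np.float32)
|
|   def long_video(prompt: str, total: int, clip_len=80, overlap=16):
|       """Stitch fixed-length clips into a longer video by linearly
|       crossfading the overlapping frames."""
|       stride = clip_len - overlap
|       out = np.zeros((total, 64, 64, 3), dtype=np.float32)
|       weight = np.zeros((total, 1, 1, 1), dtype=np.float32)
|       # Triangular per-frame weights so overlaps fade in and out.
|       ramp = np.minimum(np.arange(1, clip_len + 1),
|                         np.arange(clip_len, 0, -1)).astype(np.float32)
|       ramp = ramp[:, None, None, None]
|       for i, start in enumerate(range(0, total - overlap, stride)):
|           end = min(start + clip_len, total)
|           clip = generate_clip(prompt, clip_len, seed=i)[:end - start]
|           out[start:end] += clip * ramp[:end - start]
|           weight[start:end] += ramp[:end - start]
|       return out / np.maximum(weight, 1e-8)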
| pylua wrote:
| This is absolutely unbelievable. Truly impressive.
|
| I felt like this was maybe 5-10 years away.
| araes wrote:
| Pixel themed post for pixel themed paper.
|
| It's rather impressive and will likely quite quickly result in a
| huge horde of "make a movie with a paragraph" programs.
|
| It's Google - It will probably go in a box and be a Rick and
| Morty gadget we never see.
|
| It has a cool author list format I like. The 1,2,3,4,*,+ thing is
| nice for lead authors, institute attribution, and core
| contributors. I read so many astronomy and physics papers that
| are 10+ authors long, and I have no idea who did anything. The
| arXiv link for example shows no similar formatting.
|
| It will probably be immediately used for abusive porn. Walking
| Woman Example: (5th variation) "Wearing no clothing"
| whamlastxmas wrote:
| This didn't occur to me but yeah, abusive porn is about to be
| rampant with this sort of tech. Every single person in the
| world will soon have graphic, realistic-looking pornography
| with their face on it.
| StarterPro wrote:
| What is the point of this? I feel like it only serves to hinder
| real artists who could use the money that people are paying for
| these services and models. Maybe I'm too poor or short-sighted to
| see it.
|
| I would rather an actual animator create something beautiful for
| me rather than an AI spit out something that needs to be worked
| on by an actual animator ANYWAY.
| chankstein38 wrote:
| You're clearly not the target audience for this then. That's
| usually my assumption when I can't figure out a use case for
| some research a bunch of people are excited about.
| sofixa wrote:
| Crypto generally and NFTs in particular are good indicators
| that things can get people excited and have no substance.
| Even scams and Ponzi schemes have "target audiences" but that
| doesn't make them useful or good.
| chankstein38 wrote:
| Right but this is generating decent-quality video in
| segments longer than the time of your average movie shot.
| I'm sure it'd take some fiddling but I'm excited for a
| model this good to come out so I can try some fancy multi-
| shot videos.
|
| I saw someone else say "I'm sure it'll be crap like all of
| the other AI stuff I've seen" but that's a naive view.
| Things that have been 100% created by AI, sure they're kind
| of boring a lot of the time. But this kind of tech gives
| people with a creative mind, but no money or time or
| resources to create a storytelling movie/video, the
| resources to do it. Obv ignoring the fact that Goog will
| never release this, if something like this did come out,
| it'd be game changing for a lot of people.
|
| Think about something like RPG Maker. Yeah, we've had a ton
| of random garbage come out of that platform, but there have
| also been some incredible games.
|
| AI isn't just some garbage maker. It is a paint brush that
| enables people who are alone in their room to make
| something bigger than them.
| evilduck wrote:
| I've used SD to generate novel clipart with my kid for
| their school project to make a board game. It isn't
| taking away from an artist, I would never in a million
| years pay an artist to create throwaway art for a corner
| of a spray painted cardboard box. The alternative would
| be nothing or my kids scribbling in something of their
| own hand. But they were interested and it was available
| so it went from simple and plain to "custom" and rather
| nice and polished looking.
|
| FWIW, my kid also designed their own board game pieces in
| TinkerCAD and we 3D printed them. It's nothing special
| but it's frankly astounding how far kids can go now
| towards creating something not just imaginative but
| almost professional quality with the tools at their
| disposal now. For throwaway school projects. It may not
| be my kids, but I'm excited for what the next generation
| will be able to accomplish without massive capital
| requirements to fulfill their vision and create
| something.
| StarterPro wrote:
| I understand the use case. I'm saying from a human collateral
| sense, what is the point of it?
|
| Like we build these things and show them off, without any
| thought to the ramifications that they could lead to. Maybe
| I'm catastrophizing, but all this tech lately seems very
| unregulated/dangerous.
| sofixa wrote:
| The same can be said for generators like Midjourney or Stable
| Diffusion.
|
| The target market is people and organisations who
| like/want/need the speed and low cost of generated "art" and
| prefer not dealing with external real world artists that need
| to be fairly compensated and will take time to produce an art
| piece.
|
| Also laws are very murky on this for the moment (naturally,
| since it's a very recent new thing), and some consider that AI
| "art" can't be copyrighted. The EU is currently working on a
| new AI framework which will probably cover that.
| gedy wrote:
| Many of these examples are combinations of realistic objects
| and scenes from the real world; these aren't in need of artistic
| interpretation or manual re-creation or animation.
| seydor wrote:
| I wonder who's going to make a model that creates and textures a
| 3D world with AI. It's going to be a necessity for VR goggles to
| find some non-gimmicky use cases.
| RcouF1uZ4gsC wrote:
| Sorry, I discount all AI text/image/video generation that
| doesn't actually have a demo site where I can put in prompts and
| see what is being generated.
|
| It is so easy to game and tweak examples, especially since there
| is a random component to them. For example, you could do a prompt
| 1 million times and only show the best response. Or you could use
| prompts that it's optimized for.
|
| The reason ChatGPT and Dall-e captured the public's imagination
| is that the public could actually put in their prompts and see
| the results.
| alkonaut wrote:
| If "translator" was the victim of LLMs and "stock photographer"
| of diffusion models, which job is the first to be threatened by
| diffusion models for moving pictures? OnlyFans streamers?
| rwmj wrote:
| The people involved in producing TV adverts.
| thih9 wrote:
| Looks like they're frequently mixing old images with a modern
| dataset; if I took a portrait of George Washington and prompted for
| "a man smiling", would I see dentures[1] or pearly whites?
|
| [1] https://en.wikipedia.org/wiki/George_Washington%27s_teeth
| mattnewton wrote:
| I think you'd have to provide that out-of-distribution data in
| the prompt, of course - it's not clear these models have built
| large world models of facts like some of the larger LLMs need
| to; they are figuring out how things move. Most of the time
| people have pearly whites to show in the dataset, and there are
| no videos of Washington's mouth, so I would expect that to be
| the default unless prompted with a detailed description of the
| dentures you are looking for.
| macawfish wrote:
| This is remarkable, it would have been unthinkable 5 years ago.
| abkolan wrote:
| How soon can Google _Productize_ it?
| ativzzz wrote:
| Me, watching the video and looking at samples, excitement level
| high
|
| Me, scanning for a download link or a prompt to run the model and
| not finding any, excitement level medium
|
| Me, realizing it's by google, excitement level zero
| zitterbewegung wrote:
| Don't worry, OpenAI will copy it and put it in ChatGPT.
| baldgeek wrote:
| Here is the dataset they used:
| https://paperswithcode.com/dataset/ucf101
| Aerbil313 wrote:
| Eh. I knew this day would come. Video is no evidence of anything
| now.
| harha_ wrote:
| This pace of progress almost scares me.
| max_ wrote:
| Why won't Google publish a product that does this?
| wantsanagent wrote:
| I find it deeply offensive that this work is presented under the
| auspices of scientific research.
|
| The only way to describe this is bragging, advertising, or
| marketing. There are no reproducible processes described. While
| the diagram of their architecture may inspire others it does not
| allow for the most crucial aspect of the scientific endeavor,
| falsification.
|
| There is no way we can know if Google is lying because there's no
| way to check. It should be assumed that every example has been
| cherry-picked and post processed. It should be assumed that the
| data used to train the model (if one was trained at all) was
| illicitly acquired. We _have_ to start from a mindset of extreme
| skepticism because Google now routinely makes claims that cannot
| be demonstrated. When the performance of Gemini in bard is
| compared to GPT-4 for example, it falls far short. When they
| release a video claiming to be an interaction with a model, it
| turns out it wasn't anything of the kind.
|
| Ideally _no_ organization would operate like this but Google has
| become a particularly egregious repeat offender.
| bugglebeetle wrote:
| > There is no way we can know if Google is lying because
| there's no way to check it.
|
| We can gather that they are likely to be lying or cherry-
| picking examples to make themselves look better, since they
| were already caught faking an AI demo. In the world of actual
| research, if you got caught doing this, all your subsequent and
| prior work would be under severe scrutiny.
| GaggiX wrote:
| >When the performance of Gemini in bard is compared to GPT-4
| for example, it falls far short.
|
| How did people get access to Gemini Ultra? Or are you talking
| about Gemini Pro, the one that compares to GPT-3.5?
| Workaccount2 wrote:
| Just an FYI, it's not illegal to use data to train a model.
| It's illegal to have a model output that (identical) data for
| commercial gain.
|
| This difference is purposely muddied, but important to
| understand.
| leereeves wrote:
| > it's not illegal to use data to train a model
|
| That's not at all settled law. AI companies are hoping to use
| the fair use exception to protect their businesses, but it
| looks like it will soon be clarified the other way.
|
| Wired summed it up: "Congress Wants Tech Companies to Pay Up
| for AI Training Data"
|
| https://www.wired.com/story/congress-senate-tech-
| companies-p...
|
| And Ars wrote "Media orgs want AI firms to license content
| for training, and Congress is sympathetic."
|
| https://arstechnica.com/information-technology/2024/01/at-
| se...
|
| _" [Senator] Hawley expressed concerns that if the tech
| companies' expansive interpretation of fair use prevails, it
| would be like "the mouse that ate the elephant"--an exception
| that would make copyright law toothless."_
| Workaccount2 wrote:
| Again, fair use concerns the production of copyrighted
| works; it has nothing to do with the training. If this were
| the case, every person who could draw a Batman symbol from
| memory would be in violation of copyright.
|
| "Using copyrighted works for monetary gain" refers to using
| art itself as the product. Knowing what Apple's logo is and
| making a logo in that style is not a violation of
| copyright. However using Apple's logo (or something
| strikingly close) is a violation.
|
| The reason this is muddied is because legally artists don't
| really have a leg to stand on for "my art cannot be trained
| on by a computer" whereas they do have strong legal
| precedent (and actual laws) for "my art cannot be
| reproduced by a computer".
| leereeves wrote:
| > fair use concerns the production of copyrighted works,
| it has nothing to do with the training
|
| Training is the "production" of a derivative work (a
| model) based on the training data.
|
| AI companies claim that this is covered by fair use, but
| this is simply a claim that has not yet been tested in
| court.
|
| And even if courts rule in favor of the AI companies, it
| sounds likely (based on what I've read) that Congress
| will soon rewrite the law to support the artists'
| position.
| summerlight wrote:
| Currently, neither party has strong legal ground yet, and it may
| require another landmark case to fully settle the matter.
| leereeves wrote:
| If Congress doesn't get there first.
| summerlight wrote:
| Even if Congress made a law, it could be effectively
| delayed by injunctions until the Supreme Court made the
| ultimate decision. And I'm pretty sure big tech will
| challenge it with an army of lawyers.
| summerlight wrote:
| > There is no way we can know if Google is lying because
| there's no way to check. It should be assumed that every
| example has been cherry-picked and post processed. It should be
| assumed that the data used to train the model (if one was
| trained at all) was illicitly acquired. We have to start from a
| mindset of extreme skepticism because Google now routinely
| makes claims that cannot be demonstrated.
|
| This doesn't sound like a productive stance for science. You
| don't trust their result? It's fine to ignore all the claimed
| artifacts and just take the core idea. You don't have
| to assume any malice to dismiss their so-called
| advertisement.
|
| While this kind of stance might make you feel a bit better, it
| will also make your claim political and slow you down if it
| happens to be true, given the history that many of Google's
| papers have eventually become the foundation of other useful
| technologies even though almost all of them didn't contain
| reproducible artifacts.
| whamlastxmas wrote:
| This video is almost certainly done mostly for Google
| investors: look, we aren't dying, search isn't dying! dancing
| bears!
|
| That said, if this tech is as advertised, it is extremely
| impressive to me.
| vessenes wrote:
| Some comments: Google, so we'll probably never get to use this
| directly.
|
| That said, the idea is very interesting -- train the model to
| generate a small full-time representation of the video, then
| upscale on both time and pixels.
|
| Essentially, we have seen models adding depth maps. This one adds
| a 'time map' as another dimension.
|
| Coherence is pretty good, to my eye. The jankiness seems to be
| more about the model deciding what something should 'do' over
| time, where a lot of models struggle on keeping coherence frame
| by frame. The big insight from the Googlers is that you could
| condition / train / generate on coherence as its own thing, then
| fill in the frames.
|
| I think this is likely copyable by any number of the model
| providers out there; nothing jumps out as not implementable by
| Stability, for instance.
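|
| If I'm reading the figures right, the overall shape of the
| pipeline is roughly the sketch below. Illustrative PyTorch-style
| pseudocode only; the real model isn't public, and base_unet() /
| spatial_sr() are hypothetical stand-ins:
|
|   import torch
|   import torch.nn.functional as F
|
|   def base_unet(x: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
|       # Stand-in for the space-time U-Net: it denoises a video that
|       # is small in BOTH time and space, so a single pass covers the
|       # clip's full duration at once (the 'time map').
|       return torch.zeros_like(x)
|
|   def spatial_sr(video: torch.Tensor) -> torch.Tensor:
|       # Stand-in for the spatial super-resolution stage.
|       return F.interpolate(video, scale_factor=(1, 4, 4),
|                            mode="trilinear", align_corners=False)
|
|   def generate(steps: int = 50) -> torch.Tensor:
|       # (batch, channels, frames, height, width): the whole clip at
|       # low spatio-temporal resolution.
|       x = torch.randn(1, 3, 16, 64, 64)
|       for step in reversed(range(steps)):
|           t = torch.full((1,), step)
|           # Heavily simplified denoising update, for illustration.
|           x = x - base_unet(x, t) / steps
|       # Upscale time first (16 -> 80 frames), then pixels.
|       x = F.interpolate(x, scale_factor=(5, 1, 1),
|                         mode="trilinear", align_corners=False)
|       return spatial_sr(x)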
| Peritract wrote:
| "We made _Girl with a Pearl Earring_ smile and wink "
| demonstrates the fundamental failure of this (and similar)
| technology: it's the promise of generating art, made by people
| who really don't understand what art is.
| adrenvi wrote:
| The same was probably said about photography, moving film, film
| with sound, computer graphics, etc.
| Peritract wrote:
| No it wasn't; absolutely no one ever thought the issue with
| films with sound is that their creators fundamentally
| misunderstood _Girl with a Pearl Earring_. Some people
| thought that [new medium] wasn't art; they didn't think it
| was driven by and for people who didn't understand any art.
|
| I do enjoy the irony though of you copy-and-pasting a generic
| pro-AI rebuttal to a comment you didn't understand.
| interestica wrote:
| Video Inpainting: 4:3 --> 16:9 conversions
___________________________________________________________________
(page generated 2024-01-24 23:02 UTC)