[HN Gopher] Imagen Video: high definition video generation with ...
___________________________________________________________________
Imagen Video: high definition video generation with diffusion
models
Author : jasondavies
Score : 435 points
Date : 2022-10-05 17:38 UTC (5 hours ago)
(HTM) web link (imagen.research.google)
(TXT) w3m dump (imagen.research.google)
| jupp0r wrote:
| What's the business value of publishing this research in the
| first place vs keeping it private? Following this train of
| thought will lead you to the answer to your implied question.
|
| Apart from that - they publish the paper and anybody can
| reimplement and train the same model. It's not trivial, but it's
| also completely feasible for lots of hobbyists in the field to
| do in a matter of a few days. Google doesn't need to publish a
| free-to-use trained model themselves and associate it with their
| brand.
|
| That being said, I agree with you, the "ethics" of imposing
| trivially bypassable restrictions on these models is silly.
| Ethics should be applied to what people use these models for.
| amelius wrote:
| > Sprouts in the shape of text 'Imagen' coming out of a fairytale
| book.
|
| That's more like:
|
| > Sprouts coming out of book, with the text "Imagen" written
| above it.
| Kiro wrote:
| The prompt actually says "Imagen Video" and the sprouts form
| the word "video". Even if they didn't, it's still extremely
| impressive. No one expects this to be perfect. That would be
| science fiction.
| montebicyclelo wrote:
| We've been seeing very fast progress in AI since ~2012, but this
| swift jump from text-to-image models to text-to-video models will
| hopefully make it easier for people not following closely to
| appreciate the speed at which things are advancing.
| nullc wrote:
| > We have decided not to release the Imagen Video model or its
| source code
|
| ...until they're able to engineer biases into it to make the
| output non-representative of the internet.
| kranke155 wrote:
| I'm going to post an Ask HN about what I'm supposed to do when
| I'm "disrupted". I work in film / video / CG where the bread and
| butter is short form advertising for Youtube, Instagram and TV.
|
| It's painfully obvious that in 1 year the job might be vastly
| more difficult than it is now.
| dkjaudyeqooe wrote:
| Adapt, it's what humans excel at.
|
| Instead of feeling threatened by the new tools, think about how
| you can use them to enable your work.
|
| One of the ironies* of these tools is that they only work
| because there is so much existing material they can be trained
| on. Absent that, they wouldn't exist. That makes me think: why
| not train your own models that capture your own style? Is that
| practical, how could you make it work, and how might you
| deploy it in your own work?
|
| Something that everyone is sticking their heads in the sand
| about is the real possibility that training models on
| copyrighted work is a copyright violation. I can't see how such
| a mechanical transformation of others' work is anything but.
| People accept that violating one person's copyright is a
| violation, but if you do it at scale it somehow isn't.
|
| * ironic because they seem creative but they create nothing by
| themselves, they merely "repackage" other people's creativity.
| inerte wrote:
| It depends where you are in the industry.
|
| If you're on the creative, storyboard, come up with ideas and
| marketing side, you will be fine.
|
| If you're in actual production, booking sets, unfolding stairs
| to tape infinite background, picking up the best looking fruits
| in the grocery store... yeah, not looking good.
|
| Go up in the value chain and learn marketing, how to tell
| stories, etc... you don't want to be approached by clients
| telling you what you should be doing, you want to be approached
| and asked what the clients should be doing.
| j_k_eter wrote:
| I first predicted this tech 5 years ago, but I thought it was
| 15 years out. What I just said is beginning to happen with
| pretty much everything. There's a third sentence, but if I
| write it 10 people will gainsay me. If I omit it, there's a
| better chance that 10 people will write it for me.
| adamsmith143 wrote:
| Learning how to use these models is the easiest answer. Prompt
| Engineering (getting a model to output what you actually want)
| is going to be something of an art form and I would expect it
| to be in demand.
| ijidak wrote:
| It won't be easy. But below are my thoughts:
|
| #1: Master these new tools
| #2: Build a workflow that incorporates these tools
| #3: Master storytelling
| #4: Master ad tracking and analytics
| #5: Get better at marketing yourself so that you stand out
|
| The market for your skillset may shrink, but I doubt it will
| disappear...
|
| Think about it this way...
|
| Humans in cheaper countries are already much more capable than
| any AI we've built.
|
| Yet, even now, there are practical limits on outsourcing.
|
| It's hard for me to see how this will be much different for
| creative work.
|
| It's one thing to casually look at images or videos, when there
| is no specific money-making ad in mind.
|
| But as soon as someone is spending thousands to run an ad
| campaign, just taking whatever the AI spits out is unlikely to
| be the real workflow.
|
| I guess I'm suggesting a more optimistic take...
|
| View it as a tool to learn and incorporate in your workflow
|
| I don't know if you gain much by stressing too much about being
| replaced.
|
| And I'm not even sure that's reality.
|
| I'm almost certain most of the people who lose their jobs will
| be those who, out of fear or stubbornness, refuse to get
| better, refuse to incorporate these tools, and are thus
| unable to move up the value chain.
| alcover wrote:
| Get better [...] so that you stand out
|
| Please bear with me but this kind of advice is often a bit
| puzzling to me. I suppose you don't know the person you're
| replying to, so I read your advice as a general one - useful
| to anyone in the parent's position. If you were close to her,
| it would make sense to help her 'stand out' to the detriment -
| logically - of strangers in her field. But here you're kind
| of helping every reader stand out.
|
| I realise this comment is a bit vain. And I like the human
| touch of you helping a stranger.
| PinkMilkshake wrote:
| I [...] don't [...] like [...] helping a stranger.
|
| That's not very nice. The world would be a better place if
| we helped strangers more.
| metadat wrote:
| Here's the link to kranke155's submission:
| https://news.ycombinator.com/item?id=33099182
| baron816 wrote:
| Quite the opposite: you're going to be in even higher demand
| and will make more money.
|
| Yes, it will be possible for one person to do the work of many,
| but that just means each person becomes more valuable.
|
| It's also a law in economics that supply often drives demand,
| and that's definitely the case in your field. Companies and
| individuals will want even more of what you make. It's not like
| laundry detergent (one can only consume so much of that).
| There's almost no limit to how much of what you supply that
| people could consume.
|
| The way I see it, your output could multiply 100 fold. You
| could build out large, complex projects that used to take
| massive teams all by yourself, and in a fraction of the time.
| Companies can then monetize that for consumers.
|
| AI is just a tool. Software engineers got rich when their tools
| got better. More engineers entered the field, and they just
| kept getting richer. That's because the value of each engineer
| increased as they became more productive, and that value helped
| drive demand.
| naillo wrote:
| Whatever insights and expertise you've gained up until now can
| probably be used to gain enough of a competitive advantage in
| this future industry to be employed. I doubt the people that
| will spend their time on this professionally will be former
| coders etc. (I've seen the stable diffusion outputs that coders
| will tweet. It's a good illustration that taste is still hugely
| important.)
| altcognito wrote:
| I think there will be tons of jobs that resemble software
| development for proper, quick high quality generation of
| video/images.
|
| That being said, it's possible that it won't pay anywhere
| near what you're used to. Either way, it will probably be a
| solid decade before you really feel the pain of disruption.
| MP3s, which were a far more straightforward path to
| disruption, took at least that long from conception.
| jstummbillig wrote:
| > That being said, it's possible that it won't pay anywhere
| near what you're used to.
|
| It also won't require nearly the amount of work it used to.
| joshuahaglund wrote:
| I like your optimism but OP's job is to take text
| instructions and turn them into video, for advertisements. If
| Google (who already control so much of the advertising space)
| can take text instructions and turn them into advertisements,
| what's left for OP to do here? Even if there's some
| additional editing required this seems like it will greatly
| reduce the hours an editor is needed. And it can probably
| iterate options and work faster than a human.
| pyfork wrote:
| OP probably does more than it seems by interpreting what
| their client is asking for. Clients ask for some weird shit
| sometimes, and being able to parse the nonsense and get to
| the meat is where a lot of skill comes into play.
|
| I think Cleo Abram on YT recently tackled this exact
| question. She tried to generate art using DALL-E along with
| a professional artist, and after letting the public vote
| blindly, the pro artist clearly 'made' better content, even
| though they were both just typing into a text prompt.
|
| Here's the link if you're interested:
| https://www.youtube.com/watch?v=NiJeB2NJy1A
|
| I could see a lot of digital artists actually getting
| _better_ at their job because of this, not getting totally
| displaced.
| simonw wrote:
| Maybe OP's future involves being able to do their work 10x
| faster, while producing much higher quality results than
| people who have been given access to a generative AI model
| without first spending a decade+ learning what makes a good
| film clip.
|
| The optimistic view of all of this is that these tools will
| give people with skill and experience a massive
| productivity boost, allowing them to do the best work of
| their careers.
|
| There are plenty of pessimistic views too. In a few years
| time we'll be able to look back on this and see which
| viewpoints won.
| gjs278 wrote:
| Keyframe wrote:
| What happened to volume of web and graphic designers when
| templates+wordpress hit them?
| yehAnd wrote:
| We employed a bunch of people to enter data into a template.
|
| Bit of an apples/oranges comparison to tech that will
| (eventually) generate an endless supply of content with less
| effort than writing a Tweet.
|
| The era of inventing layers of abstraction and indirection
| that simplify computer use down to structured data entry is
| coming to an end. A whole lot of IT jobs are not safe either.
| Ops is a lot of sending parameters over the wire to APIs for
| others to compute. Why hire them when "production EKS
| cluster" can output a TF template?
| jstummbillig wrote:
| A lot of additional work, because the industry was growing
| like crazy in tandem.
| visarga wrote:
| Exactly. We have a blindspot, we can't imagine second and
| higher order effects of a new technology. So we're left
| with first order effects which seem pessimistic for jobs.
| Thaxll wrote:
| It won't be ready anytime soon imo, looks impressive but who
| can use that? 512x512 output of bad quality, with those
| weird-looking moving parts that you find everywhere in
| AI-generated art, etc...
| odessacubbage wrote:
| i really think it's going to take much longer than people think
| for this technology to go from 'pretty good' to actually being
| able to meet a production standard of quality with little to no
| human involvement. at this point, cleaning up after an ai is
| still probably more labor intensive than simply using the
| cheatcodes that already exist for quick and cheap realism. i
| expect in the midterm, diffusion models will largely exist in
| the same space as game engines like unity and unreal where it's
| relatively easy for an illiterate like me to stay within the
| rails and throw a bunch of premade assets together but getting
| beyond _NINTENDO HIRE THIS MAN!_ and the stock 'look' of the
| engine still takes a great deal of expertise.
| >https://www.youtube.com/watch?v=C1Y_d_Lhp60
| victor9000 wrote:
| Don't watch from the sidelines. Become adept at using these
| tools and use your experience to differentiate yourself from
| those entering the market.
| jeffbee wrote:
| When you animate a horse, does it have 5 legs with weird
| backwards joints? If not, your job is probably safe for now.
| spoonjim wrote:
| Think about where this stuff was 2 years ago and then think
| about where it will be 2 years from now.
| rcpt wrote:
| Relationships between objects has been a problem with
| computer vision for a long time.
|
| 10 years ago: https://karpathy.github.io/2012/10/22/state-of-computer-visi...
|
| Now: https://arxiv.org/pdf/2204.13807
|
| Given that this is what makes photos and videos interesting
| I think it's still a while before artists are automated.
| visarga wrote:
| Take a look at Flamingo "solving" the joke:
| https://pbs.twimg.com/media/FSFwYL7WUAEgxqQ?format=jpg&name=...
| kranke155 wrote:
| How long do you think until the horse looks perfect? 12
| months? 5 years? I'm still 30 and I don't see how my industry
| won't be entirely disrupted by this within the next decade.
|
| And that's my optimistic projection. It could be we have
| amazing output in 24 months.
| visarga wrote:
| IT has been disrupting itself for six decades and there are
| more developers than ever, with high pay.
| bitL wrote:
| It's not about random short clips - imagine introducing a
| character like Mickey Mouse and reusing him everywhere as the
| same consistent character - my guess is it's going to take a
| while until "transfer" like that works reliably.
| fragmede wrote:
| Dreambooth and textual inversion are already here, and it's
| been just over a month since Stable Diffusion was
| released, so I'd bet on sooner rather than later.
|
| https://github.com/XavierXiao/Dreambooth-Stable-Diffusion
|
| https://textual-inversion.github.io/
| Vetch wrote:
| Have to temper expectations with the fact that a generated
| video of a thing is also a recording of a simulation of the
| thing. For long video, you'd want everything from temporal
| consistency and emotional affect maintenance to
| conservation of energy, angular momentum and respecting
| this or that dynamics.
|
| A bunch of fields would be simultaneously impacted. From
| computational physics to 3D animation (if you have a 3D
| renderer and video generator, you can compose both). While
| it's not completely unfounded to extrapolate that progress
| will be as fast as with everything prior, consequences
| would be a lot more profound while complexities are much
| compounded. I down weight accordingly even though I'd
| actually prefer to be wrong.
| boh wrote:
| There's a huge gap between "that's pretty cool" and a feature
| length film. People want to create specific stories with
| specific scenes in specific places that look a specific way. A
| "Couple kissing in the rain" prompt isn't going to produce
| something people are going to pay to see.
|
| It's more likely that you're still going to be
| filming/editing/animating but will have an AI layer on top that
| produces extra effects or generates pieces of a scene. Think
| "green screen plus", vs fully AI entertainment.
|
| People will over-hype this tech like they did with voice and
| driverless cars but don't let it scare you. Everything is
| possible, but it's like a person from the 1920's telling
| everyone the internet will be a thing. Yes it's correct, but
| also irrelevant at the same time. You already have AI assisted
| software being used in your industry. Just expect more of that
| and learn how to use the tools.
| oceanplexian wrote:
| I actually think it's the opposite, AI will probably be
| writing the stories and humans might occasionally film a few
| scenes. ~95% of TV shows and movies are cookie-cutter
| content, with cookie-cutter acting and production values,
| with the same hooks and the same tropes regurgitated over and
| over again. Heck they can't even figure out how to make new
| IP so they keep making reruns of the same old stuff like Star
| Wars, Marvel, etc, and people eat it right up. There's
| nothing better at figuring out how to maximize profit and
| hook people to watch another episode than a good algorithm.
| [deleted]
| CuriouslyC wrote:
| AI might take an outline and write
| dialogue/descriptions/etc, but it's not going to be
| generating the story or creating the characters. They might
| use AI to tune what people come up with (ala "market
| research") but there will still be a human that can be
| blamed or celebrated at the creative helm.
| kranke155 wrote:
| The first thing to go away will be short content. Instagram
| and YouTube ads will be AI generated. The thing is - that's
| the bread and butter of the industry
| trention wrote:
| Why would I want to watch AI-generated content?
| throwaway743 wrote:
| It'll eventually get to the point where it's high quality
| and the media you consume will be generated just for you
| based on your individual preferences, rather than a
| curated list of already made options made for widespread
| audiences.
| CuriouslyC wrote:
| Procedurally generated games can be quite fun, if AI
| content gets good enough, why wouldn't you want to watch
| it?
| trention wrote:
| Because anything that an AI can produce, no matter how
| "intrinsically" good, becomes trivial, tedious, and of zero
| value (both economic and general).
| cercatrova wrote:
| That's a weird sentiment. If you can concede that it
| could be "intrinsically" good, then why do you care where
| it came from?
|
| It reminds me of part of the book trilogy Three Body
| Problem, where these aliens create human culture better
| than humans (in the humans' own perspective, in the book)
| by decoding and analyzing our radio waves to then make
| content. It feels to me much the same here where an
| unknown entity creates media, and we might like it
| regardless of who actually made it.
| gbear605 wrote:
| Imagine you're watching a show, it's really funny and
| you're enjoying it. You're streaming it, but you'd
| probably have paid a few dollars to rent it back in the
| Blockbuster days. You're then told that the show was
| produced by an AI. Do you suddenly lose interest because
| you don't want to watch something produced by an AI? Or
| is your hypothesis that an AI could never produce a show
| that you liked to that degree?
|
| If you mean the former, then I frankly think you're an
| outlier and lots of people would have no problem with
| that. If you mean the latter, then I guess we'll just
| have to wait and see. We're certainly not there yet, but
| that doesn't mean that it's impossible. I've definitely
| read stories that were produced by an AI and preferred it
| to a lot of fiction that was written by humans!
| trention wrote:
| You may want to familiarize yourself with this thought
| experiment and think how a slightly modified version
| applies to AIs and their output:
| https://en.wikipedia.org/wiki/Experience_machine
|
| As to whether I am an outlier: Hundreds of thousands of
| people worldwide watch Magnus Carlsen. How many have
| watched AlphaZero play chess when it came about and how
| many watch it when it ceased to be a novelty?
| armchairhacker wrote:
| The last-mile problem applies here too. GPT-3 text is
| convincing at a distance but when you look closely there is
| no coherence, no real understanding of plot or emotional
| dynamics or really anything. TV shows and movies are filled
| with plot holes and bad writing but it's not _that_ bad.
|
| Also I think "a good algorithm" is more than just
| repetitive content. The plots are reused and generic, but
| there's real skill involved in figuring out which series to
| reuse with a generic plot so that it's still guaranteed not
| to flop, whether because nobody actually wants to see reruns
| of that series or because they accidentally screwed up a
| major plot point.
| karmasimida wrote:
| I think short advertisements would be affected most by this, it
| seems.
|
| But here is the catch, there is the same last mile problem for
| those AI models. Currently it feels like the model can achieve
| like 80-90% of what a trained human expert can do, but the last
| 10-20% needed to reach human fidelity will be extra hard. It might
| take years, or it might never happen.
|
| That being said, I think anyone who dismisses AI-assisted
| creative workflows as a fad is dead wrong; anyone who refuses
| these shiny new tools is likely to be eliminated by sheer
| market dynamics. They can't compete on efficiency.
| echelon wrote:
| Start making content and charging for it. You no longer need
| institutional capital to make a Disney- or Pixar-like
| experience.
|
| Small creators will win under this new regime of tools. It's a
| democratizing force.
| yehAnd wrote:
| Outcome uncertain. Why would I need to buy content when I can
| generate my own with a local GPU?
|
| Eventually the data model will be abstracted into
| deterministic code using a seed value; think implications of
| E=mc^2 being unpacked. The only "data" to download will be
| the source.
|
| And the real world politics have not gone anywhere; none of
| us own the machines that produce the machines to run this.
| They could just sell locked down devices that will only
| iterate on their data structures.
|
| There is no certainty "this time" we'll pop "the grand
| illusion."
| visarga wrote:
| > It's a democratizing force.
|
| I'm wondering why the open source community doesn't get this.
| So many voices were raised against Codex. Now artists against
| Diffusion models. But the model itself is a distillation of
| everything we created, it can compactly encode it and
| recreate it in any shape and form we desire. That means
| everyone gets to benefit, all skills are available for
| everyone, all tailored to our needs.
| echelon wrote:
| > all skills are available for everyone
|
| Exactly this!
|
| We no longer have to pay the 10,000 hours to specialize.
|
| The opportunity cost to choose our skill sets is huge. In
| the future, we won't have to contend with that horrible
| choice anymore. Anyone will be able to paint, play the
| piano, act, code, and more.
| operator-name wrote:
| A 1 year timespan seems deeply optimistic. Creativity is still
| hugely important, as is communicating with clients.
|
| From what I see, these technologies have just lowered the bar
| for everyone to create something, but creating something good
| still takes thought, time, effort and experience, especially in
| the advertising space.
|
| Nor is AI in the near term going to be able to translate
| client requirements: the feedback cycle, the iterations,
| managing client expectations, etc.
| natch wrote:
| Fix spam filtering, Google.
| tobr wrote:
| I recently watched Light & Magic, which among other things told
| the story of how difficult it was for many pioneers in special
| effects when the industry shifted from practical to digital in
| the span of a few years. It looks to me like a similar shift is
| about to happen again.
| mkaic wrote:
| And there you have it. As an aspiring filmmaker and an AI
| researcher, I'm going to relish the next decade or so where my
| talents are still relevant. We're entering the golden age of art,
| where the AIs are just good enough to be used as tools to create
| more and more creative things, but not good enough yet to fully
| replace the artist. I'm excited for the golden age, and uncertain
| about what comes after it's over, but regardless of what the
| future holds I'm gonna focus on making great art here and now,
| because that's what makes me happy!
| amelius wrote:
| Don't worry. If you can place eyes, nose and mouth of a human
| in a correct relative position and thereby create a symmetric
| face that's not in the uncanny valley, you are still lightyears
| ahead of AI.
| lucasmullens wrote:
| > fully replace the artist
|
| I doubt the artist would ever be "fully" replaced, or even
| mostly replaced. People very much care about the artist when
| they buy art in pretty much any form. Mass produced art has
| always been a thing, but I'm not alone in not wanting some $15
| print from IKEA on my wall, even if it were to be unique and
| beautiful. Etsy successfully sells tons of hand-made goods,
| even though factories can produce a lot of those things
| cheaper.
| visarga wrote:
| I think the distinction between creating and enjoying art is
| going to blur, we're going to create more things just for us,
| just for one use, creating and enjoying are going to be the
| same thing. Like games.
| Thaxll wrote:
| Can someone explain the technical limitation behind the size
| (512x512) of this AI-generated art?
| thakoppno wrote:
| byte alignment has always been a consideration for high
| performance computing.
|
| this alludes to a fascinating, yet elementary, fact about
| computer science to me: there's a physical atomic constraint in
| every algorithm.
| dekhn wrote:
| that's not byte alignment, though- those constraints are what
| can be held in GPU RAM during a training batch, which is
| subject to a number of limits, such as "optimal texture size
| is a power of 2 or the next power of 2 larger than your
| preferred size".
|
| Byte alignment would be more like "it's three channels of
| data, but we use 4 bytes (wasting 1 byte) to keep the data
| aligned on a platform that only allows word-level access"
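dekhn's three-channels-in-four-bytes example can be made concrete. A minimal Python sketch (my own illustration, not from the thread): each RGB pixel is packed as four bytes, with one pad byte wasted per pixel so that every pixel starts on a 4-byte word boundary.

```python
import struct

def pack_rgb_word_aligned(pixels):
    # Pack each (r, g, b) tuple as 4 bytes: three channel bytes plus
    # one pad byte ('x' in the struct format), wasting 1 byte per
    # pixel to keep each pixel word-aligned.
    buf = bytearray()
    for r, g, b in pixels:
        buf += struct.pack("BBBx", r, g, b)
    return bytes(buf)

packed = pack_rgb_word_aligned([(255, 0, 0), (0, 255, 0)])
assert len(packed) == 8   # 4 bytes per pixel, not 3
assert packed[3] == 0     # the wasted alignment byte
```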
| thakoppno wrote:
| thanks for the insight. you obviously understand the domain
| better than me. let me try and catch up before I say
| anything more.
| fragmede wrote:
| It's limited by the RAM on the GPU, with most consumer-grade
| cards having closer to 8 GiB VRAM than the 80 GiB VRAM
| datacenter cards have.
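fragmede's VRAM point can be put in back-of-envelope terms. The channel count and batch size below are illustrative assumptions, not any model's actual configuration; the point is that activation memory grows with H*W, so doubling the side length quadruples it (and self-attention over pixel positions scales with (H*W)^2, which is worse still).

```python
def feature_map_bytes(h, w, channels, batch, bytes_per_el=2):
    # Memory for a single fp16 activation tensor of shape
    # (batch, channels, h, w): 2 bytes per element.
    return batch * channels * h * w * bytes_per_el

GIB = 1024 ** 3
for side in (256, 512, 1024):
    b = feature_map_bytes(side, side, channels=320, batch=8)
    print(f"{side}x{side}: {b / GIB:.2f} GiB per activation tensor")
```

A training step keeps many such tensors alive for backpropagation, which is why an 8 GiB consumer card runs out of memory at resolutions an 80 GiB datacenter card can still handle.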
| throwaway23597 wrote:
| Google continues to blow my mind with these models, but I think
| their ethics strategy is totally misguided and will result in
| them failing to capture this market. The original Google Search
| gave similarly never-before-seen capabilities to people, and you
| could use it for good or bad - Google did not seem to have any
| ethical concerns around, for example, letting children use their
| product and come across NSFW content (as a kid who grew up with
| Google you can trust me on this).
|
| But now with these models they have such a ridiculously heavy
| handed approach to the ethics and morals. You can't type any
| prompt that's "unsafe", you can't generate images of people,
| there are so many stupid limitations that the product is
| practically useless other than niche scenarios, because Google
| thinks it knows better than you and needs to control what you are
| allowed to use the tech for.
|
| Meanwhile other open source models like Stable Diffusion have no
| such restrictions and are already publicly available. I'd expect
| this pattern to continue under Google's current ideological
| leadership - Google comes up with innovative revolutionary model,
| nobody gets to use it because "safety", and then some scrappy
| startup comes along, copies the tech, and eats Google's lunch.
|
| Google: stop being such a scared, risk averse company. Release
| the model to the public, and change the world once more. You're
| never going to revolutionize anything if you continue to cower
| behind "safety" and your heavy handed moralizing.
| j_k_eter wrote:
| Google has no practical way to address ethics at Google-scale.
| Their ability to operate at all depends as ever upon
| outsourcing ethics to machine learning algorithms.
| FrasiertheLion wrote:
| Why did you create a throwaway to post this? I've seen a lot of
| Stable Diffusion promoters on various platforms recently, with
| similarly new accounts. What is up with that?
| throwaway23597 wrote:
| It's quite simply because I'm on my work computer, and I
| wanted to fire off a comment here. No nefarious purposes. My
| regular account is uejfiweun.
| Kiro wrote:
| What previous models are you actually referring to?
| OpenAI/Dall-E has these restrictions but they are not Google.
| rcoveson wrote:
| Maybe I'm reading into it too much, but could it be that you're
| posting this comment with a throwaway account for the same
| reason that Google is trying to enforce Church WiFi Rules with
| its new tech? Seems like everybody with anything to lose is
| acting scared.
| ALittleLight wrote:
| Personally, I find it infuriating that Google seems to believe
| they are the arbiters of morality and truth simply because some
| of their predecessors figured out good internet search and how
| to profitably place ads. Google has no special claim to be able
| to responsibly use these models just because they are rich.
| kajecounterhack wrote:
| It's not that they are arbiters of morality and truth -- it's
| that they have a _responsibility_ to do the least harm. They
| spent money and time to train these models, so it's also up
| to them to see that they aren't causing issues by making such
| things widely available.
|
| They won't be using the models they train to commit crimes,
| for example. Someone who gets access to their best models may
| very well do that. It'd be really funny (lol, no) if Google's
| abuse team started facing issues because people are making
| more robust fake user accounts...by using google provided
| models.
| ALittleLight wrote:
| Ahh, how silly of me. Here I was thinking that Google kept
| their models private because they were hoping to monetize
| them. But now that you say it, it's obvious that this is
| just Google being morally responsible. Thanks Google!
|
| I'm sorry to be sarcastic. I generally try not to be, but I
| just can't fathom the level of naivete required to think
| that mega-corps act out of their moral responsibility
| rather than their profit-interest.
| trention wrote:
| >Google has no special claim to be able to responsibly use
| these models
|
| Well, they do have the "special claim" of inventing the model
| and not owing its release to anyone.
| TigeriusKirk wrote:
| It's trained on our data, and so its release is in fact
| owed to us.
| Kiro wrote:
| You are confusing this with OpenAI like everyone else in
| this thread.
| ALittleLight wrote:
| First, that isn't a claim of any kind regarding responsible
| use. If a child is the first one to discover a gun in the
| woods, that is no kind of claim that the child will use the
| gun responsibly. Second, Google's invention builds off of
| public research that was made available to them. They just
| choose to keep their iterations private.
| [deleted]
| alphabetting wrote:
| Providing search results of the internet is not comparable to
| publishing a tool that can create any explicit scene your
| fingers can type out.
| holoduke wrote:
| Google image search is widely used. Imagine they incorporate
| ai generated content in the search results. That means people
| stay on the Google site, and Google gets an extra impression
| for its paid advertising.
| faeriechangling wrote:
| I've heard a lot of "data is the new oil" talk and the
| inevitability of google's dominance yet I'm inclined to agree
| with you. Stable diffusion was a big wakeup call where it was
| clear how much value freedom and creativity really had.
|
| The ethics problem is an artifact of Google's model of keeping
| their AI under lock and key, carefully controlled, and opaque
| to outsiders as to how the sausage gets made and what it's
| made out of. Ultimately I think many of these products will
| fail because there is a misalignment between what Google
| thinks you should be able to do with their AI and what people
| want to do with AI.
|
| Whenever I see an AI ethicist speak I can't help but think of
| priests attempting to control the printing press to prevent
| the spread of dangerous ideas, completely sure of their own
| morality. History will remember them as villains.
| alphabetting wrote:
| I agree the ethicist types are very lame, but if they were
| trying to be opaque and obscure how the sausage is made, I
| don't think they would have released as many AI papers as they
| have over the past decade. It also seems to me that Imagen is
| way better than Stable Diffusion. They're not aiming for a
| product that caters to AI creatives. They're aiming for tools
| that would benefit a 3B+ userbase.
| londons_explore wrote:
| If you want to hire good researchers, you have to let them
| publish.
|
| Good researchers won't work somewhere that doesn't allow
| the publishing of papers. And without good researchers, you
| won't be at the forefront of tech. That's why nearly all
| tech companies publish.
| evouga wrote:
| > History will remember them as villains.
|
| Interesting analogy. Google, like the priests, is acting out
| of a mix of good intentions (protecting the public from
| perceived dangers) and self-interest (maintaining secular
| power, vs. a competitive advantage in the AI space). In the
| case of the priests, time has shown that their good
| intentions were misguided. I have a pretty hard time
| believing that history will be as unkind towards those who
| tried to protect minorities from biased tech, though of
| course that's impossible to judge in the moment.
| ipaddr wrote:
| History will treat them the same way native residential
| schools are being treated now. At the time, taking these kids
| from their homes and giving them a "real education" with a
| path into modern society was seen as protecting minorities.
| Today anyone associated with residential schools is seen as
| having done great harm to minorities.
|
| In the name of protecting [minorities, child, women, lgbt,
| etc] many harms will be done.
| saurik wrote:
| > I have a pretty hard time believing that history will be
| as unkind towards those who tried to protect minorities
| from biased tech..
|
| Most of the ethicists I see actually doing gatekeeping from
| direct use of models--as opposed to "merely" attempting
| model bias corrections or trying to convince people to
| avoid its overuse (which isn't at all the same)--are not
| trying to deal with the "AI copies our human biases"
| problem but are trying to prevent people from either
| building a paperclip optimizer that ends the world or (and
| this is the issue with all of these image models) making
| "bad content" like fake photographs of real people in
| compromising or unlikely scenarios that turn into "fake
| news" or are used for harassment.
|
| (I do NOT agree with the latter people, to be clear: I
| believe the world will be MUCH BETTER OFF if such "bad"
| image generation were fully commoditized and people stopped
| trying to centrally police information in general, as I
| maintain they are CAUSING the ACTUAL problem of
| misinformation feeling more rare or difficult to generate
| than it actually already is, which results in people
| trusting random people because "clearly some gatekeeper
| would have filtered this if it weren't true". But this just
| isn't the same thing as the people who I-think-rightfully
| point out "you should avoid outsourcing something to an AI
| if you care about it being biased".)
| blagie wrote:
| My experience is that corporations use self-serving
| pseudoethical arguments all the time. "We'd like to keep
| this proprietary.... Ummmm.. DEI! We can't release it due
| to DEI concerns!"
| kajecounterhack wrote:
| It's not as simple as this. Google Search came without Safe
| Search & other guards at first because _implementing privacy &
| age controls is hard_. It's a second-order product after the
| initial product. Bad capabilities (e.g. cyberstalking) are
| side-effects of a product that "organizes the world's
| information and makes it universally accessible and useful,"
| and if anything, over time Google has sought to build in more
| safety.
|
| It's 2022 and we can be more thoughtful. Yes there are
| tradeoffs between unleashing new capabilities quickly vs being
| thoughtful and potentially conservative in what is made
| publicly available. I don't think it's bad that Google makes
| those tradeoffs.
|
| FWIW Google open sources _tons_ of models that aren't LLMs /
| diffusion models. It's just that LLMs & powerful generative
| models have particular ethical considerations that are worth
| thinking about (hopefully something was learned from the whole
| Timnit thing).
| waynecochran wrote:
| I imagine their lawyers guide them on some of this.
| abeppu wrote:
| I will say, I've enjoyed playing with stable diffusion, I've
| been impressed with the explosion of tools built around it, and
| the stuff people are creating ... But all the stuff about bias
| in data is true. It really likes to render white people, unless
| you really specifically tell it something else ... in which
| case, you may receive an exaggerated stereotype. It seems to
| like producing younger adults. If all stock photography from
| tomorrow forward were replaced with Stable Diffusion images,
| even ignoring the weird bodies and messed up faces and stuff, I
| think it would create negative effects. And once models are
| naively trained on images produced by the previous generation,
| how much worse will it be?
|
| I don't think "don't let the plebes have the models" is a good
| stance. But neither is pretending that the ethics and bias
| issues aren't here.
| pwython wrote:
| I've only had awesome experiences with Midjourney when it
| comes to generating non-white prompts. Here's some examples I
| did last month: https://imgur.com/a/6jitj73
| iso1337 wrote:
| The fact that white is the default is already problematic.
| ipaddr wrote:
| That goes back to the data available to the crawler, which
| is mostly white because the English-language internet is
| mostly white. If they trained on a different language, the
| default person would be whatever is most common in that
| language's data. For example, using a Chinese search
| engine's data for training would default the images to
| Chinese people.
|
| Most people represented in photos are younger. Same
| story.
|
| The actual problematic issue is that the media has morphed
| reality with unreal images of people/families that don't
| match society; those unreal expectations make people think
| that having white people generated from a white dataset is
| problematic.
| karencarits wrote:
| "Default" makes it sound like a deliberate decision or
| setting, but that is not how these models work. But I
| guess it would be trivial to actually make a setting to
| autmatically add specific terms (gender, race, style,
| ...) to all prompts if that is a desired feature
| holoduke wrote:
| Please no. I am all for neutrality, but the underlying
| cause is the training dataset. Change that if you want
| different results, but do not alter it artificially.
| geysersam wrote:
| Of course there are issues with bias. But those issues are
| just reflections of the world. Their solution is not a
| technical one.
| abeppu wrote:
| I think that's refusing to meaningfully engage with the
| problem. It's not reflecting the _world_, which is not
| majority white. It's reflecting images in their dataset,
| which reflects the way they went about gathering images
| paired with English language text.
|
| There are lots of other ways you could get training data,
| but they might not be so cheap. You could have humans give
| English descriptions to images from other language
| contexts. I'm guessing there are interesting things to do
| with translation. But all the weird stuff about bodies,
| physical objects intersecting etc ... maybe it should also
| be rendering training images from parametric 3d models?
| Maybe they should be commissioning new images with phrases
| that are likely to the language model but unlikely to the
| image model. Maybe they should build classifiers on images
| for race/gender/age and do stratified sampling to match
| some population statistics (yes I'm aware this has its own
| issues). There are lots of potential technical tools one
| could try to improve the situation.
|
| Implying that the whole world must change before one
| project becomes less biased is just asking for more biased
| tech in the world
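The stratified-sampling mitigation abeppu sketches (classify images by an attribute, then resample to match target population statistics) can be illustrated in a few lines. This is a hypothetical toy, not code from any model discussed in the thread; the helper name, labels, and corpus are all invented:

```python
import random
from collections import Counter

def stratified_resample(items, label_fn, targets, n, seed=0):
    """Resample `items` so label proportions match `targets`.

    `targets` maps label -> desired fraction (fractions sum to 1).
    Rare labels are oversampled with replacement to hit the quota.
    """
    rnd = random.Random(seed)
    by_label = {}
    for it in items:
        by_label.setdefault(label_fn(it), []).append(it)
    out = []
    for label, frac in targets.items():
        pool = by_label.get(label, [])
        k = round(frac * n)
        if pool:
            out.extend(rnd.choices(pool, k=k))  # sample w/ replacement
    rnd.shuffle(out)
    return out

# Skewed toy corpus: 90% label "a", 10% label "b"
corpus = [("img%d" % i, "a") for i in range(90)] + \
         [("img%d" % i, "b") for i in range(90, 100)]
balanced = stratified_resample(corpus, lambda x: x[1],
                               {"a": 0.5, "b": 0.5}, n=200)
print(Counter(lbl for _, lbl in balanced))
```

As the comment notes, this has its own issues: oversampling a small minority pool with replacement just repeats the same few images, which is one reason stratification alone doesn't fix dataset bias.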
| jonas21 wrote:
| It makes sense though. The biggest threat to Google right now
| isn't some scrappy startup eating their lunch. It's the looming
| regulatory action over antitrust and privacy that could weaken
| or destroy their core business. As this is a political problem
| (not a technical one), they don't want to do anything that
| could upset politicians or turn public opinion against them.
| Personally, I doubt they have serious ethical concerns over
| releasing the model. I do believe they have serious "AI ethics
| 'thought leaders' and politicians will use this against us"
| concerns.
| londons_explore wrote:
| And that concern is well placed. Having the Google brand
| attached makes it a far more juicy target for newspapers...
| IshKebab wrote:
| I agree, but I also think that the ethics is just an excuse not
| to release the source code & models. The AI community clearly
| disapproves of papers without code. This is a way to skirt
| around that disapproval. You get to keep the code and models
| private and (they hope) not be criticised for it.
|
| With Stable Diffusion I think they just didn't expect someone
| to produce a truly open version. There are plenty of AI models
| that Google have made where they've maintained a competitive
| advantage for many years by not releasing the code/models, e.g.
| speech recognition.
| whatgoodisaroad wrote:
| Perhaps Google hasn't found the right balance in this case, but
| as a general rule, less ethics === more market. This isn't
| unique in that way.
| breck wrote:
| Another way to look at it is the people at Google are all now
| quasi-retired with kids and wouldn't be so mad if some scrappy
| startups ate their business lunches (while they are at home
| with their fams). Perhaps they are just subsidizing research.
| jiggawatts wrote:
| "But then the inevitable might occur!" -- someone at Google
| probably.
| yreg wrote:
| >You can't type any prompt that's "unsafe", you can't generate
| images of people, there are so many stupid limitations that the
| product is practically useless other than niche scenarios
|
| Imagen and Imagen Video are not released to the public at all.
| You might be confusing it with OpenAI's models.
| burkaman wrote:
| They are probably confusing OpenAI with DeepMind, which is
| owned by Google.
| dougmwne wrote:
| Google is absolutely not going to start taking more risks. They
| are at the part of the business lifecycle where they squeeze
| the juice out of the cash cow and protect it jealously in the
| meantime. While Google gets much recognition for this research,
| I believe they are incapable as a corporate entity of creating
| a product out of it because they are no longer capable of
| taking risks. That is going to fall to other companies still
| building their product and able to gamble on risk-reward.
| alphabetting wrote:
| We're about a week into text-to-video models and they're already
| this impressive. Insane to imagine what the future holds in this
| space.
| kertoip_1 wrote:
| How is it possible that all of them just started to appear at
| the same time? Is it possible that those models were designed
| and trained in the last few weeks? Has some "magic key" to
| content generation been just unexpectedly discovered? Or the
| topic became trendy and everyone is just publishing what
| they've got so far, so they hope to benefit from media
| attention?
| schleck8 wrote:
| This is why
|
| https://www.reddit.com/r/singularity/comments/xwdzr5/the_num.
| ..
| trention wrote:
| >We're about a week into text-to-video models
|
| It's at the very least 5 years old:
| https://arxiv.org/abs/1710.00421
| amilios wrote:
| There's a significant quality difference however if you look
| at the generated samples in the paper. Imagen Video is
| leagues ahead. The progress is still quite drastic.
| J5892 wrote:
| Insane, terrifying, incredible, etc.
|
| We're rapidly stumbling into the future of media.
|
| Who would've imagined a year ago that trivial AI image
| generation would not only be this advanced, but also this
| pervasive in the mainstream.
|
| And now video is already this good. We'll have full audio/video
| clips within a month.
| joshcryer wrote:
| Audio is the next thing that Stability AI is dropping, then
| video. In a few months you'll be able to conjure up anything
| you want if you have a few GPU cores. Pretty incredible.
| astrange wrote:
| I won't be impressed until it can generate smells.
| croddin wrote:
| You joke, but that is in the works as well (would require
| special hardware though)
| https://ai.googleblog.com/2022/09/digitizing-smell-using-
| mol...
| astrange wrote:
| Oh, it wasn't really a joke. Didn't know they were
| working on it though - I've always wanted to see use of
| all the senses in UIs, especially VR.
|
| Plus then maybe we could get a computer to tell us what
| thioacetone smells like without actually having to
| experience it.
| dagmx wrote:
| I'll be honest, as someone who worked in the film industry for a
| decade, this thread is depressing.
|
| It's not the technology, it's all the people in these comments
| who have never worked in the industry clamouring for its demise.
|
| One could brush it off as tech heads being over exuberant, but
| it's the lack of understanding of how much fine control goes into
| each and every shot of a film that is depressing.
|
| If I, as a creative, made a statement that security or
| programming is easy while pointing to GitHub Copilot, these same
| people would get defensive about it because they'd see where the
| deficiencies are.
|
| However, because they're so distanced from the creative
| process, they don't see how big a jump it is from where this
| or Stable Diffusion is to where even a medium- or high-tier
| artist is.
|
| You don't see how much choice goes into each stroke or
| wrinkle fold, how much choice goes into subtle movements.
| More importantly, you don't see the iterations or emotional
| storytelling choices even in a character drawing or pose. You
| don't see the combined decades, even centuries, of experience
| that go into making the shot and then seeing where you can
| make it better based on intangibles.
|
| So yeah this technology is cool, but I think people saying this
| will disrupt industries with vigour need to immerse themselves
| first before they comment as outsiders.
| colordrops wrote:
| The term "creative" is so pretentious, as if only content
| generation involves creativity.
|
| Your post reminds me of all the photographers that said digital
| photography would remain niche and never replace film.
|
| The current models are toys made by small groups. It's not hard
| to imagine AI generated film being much more compelling when
| the entire industry of engineers and "creatives" refine and
| evolve the ecosystem to take into account subtle strokes,
| wrinkles, movement, shots etc. And they will, because it will
| be cheaper, and businesses always go for cheaper.
| dagmx wrote:
| Why is it any more pretentious than "developer" or
| "engineer"?
|
| Also businesses don't always go for cheaper. They go for
| maximum ROI.
|
| I've worked on tons of marvel films for example, and I quite
| well know where AI fits and speeds things up. I also know
| where client studios will pay a pretty penny for more art
| directed results rather than going for the cheapest vendor.
| colordrops wrote:
| "Engineer" usage is quite broad. Developer, less so, but
| you do see it with housing, device manufacturers, social
| programs, etc as well, and it's not relegated only to
| software, despite widespread usage. But you'll never hear
| anyone call a software engineer or device manufacturer a
| "creative".
|
| Re: cheaper vs ROI, I agree, that was basically the point I
| was trying to get across.
|
| I do understand your point and think it will be a long
| while before auto-generated content becomes mainstream, but
| it's entirely possible and reasonable to expect within
| our near term lifetimes.
| hindsightbias wrote:
| We will see a combinatorial explosion of centuries of
| experience in the hands of any creator. They'll select the
| artistic model desired - a Peckinpah-Toland-Dykstra-Woo plug-in
| will render a good enough masterpiece.
|
| Christopher Nolan has already proven we'll take anything as
| long as the score is ok - dark screen, mumbling lines,
| incoherent plotlines...
| Etheryte wrote:
| I agree with you, but I wouldn't take it so personally. There
| have been people claiming machines will make one industry or
| another obsolete for as long as we've had machines. In a way,
| sometimes they're right! But this doesn't mean the people are
| obsolete. Excel never made accountants obsolete, it just made
| their jobs easier and less tedious. I feel like content
| generation tools might offer something similar. How nice would
| it be if you could feed a storyboard into a program and get a
| low-fi version of the movie out so you can get a live feel for
| how the draft works. I don't think this takes anything away
| from the artists, if anything, it's just another tool that
| might make its way into their toolbox.
| dagmx wrote:
| Oh I don't take it personally so much as I find it sad how
| quickly people in the tech sphere are so quick to extol the
| virtues of things they have no familiarity with.
|
| Every AI art thread is full of people who have clearly never
| attempted to make professional art commenting as if they're
| experts in the domain
| y04nn wrote:
| What about adding this feature to your creative workflow,
| for fast prototyping?
|
| I've played with DALL-E. I'm not able to paint, but I was
| able to generate good-looking paintings and it felt amazing,
| like gaining a new power; I felt like Neo when he learns
| martial arts in The Matrix. And I realized that AI may be
| the new bicycle for the mind: just as personal computers and
| the internet changed the way we work, think, and live, AI
| may now give us new capabilities, extending our limits.
| dagmx wrote:
| Oh yes definitely they're great tools in the toolbox. We
| already use lots of ML powered tooling to speed things up so
| I have no beef with that.
|
| I just don't agree with the swathes of people saying this
| replaces artists.
| alok-g wrote:
| In my opinion, this will unfold in multiple ways:
|
| * Productivity enhancement tools for those in the film industry
| like you.
|
| * Applications where the AI output is "good enough". I foresee
| people creating cool illustrations, cartoons, videos for short
| stories, etc. AI will make for easier/cheaper access to
| illustrations for people who did not have this earlier. As an
| example, I am as of now looking for someone who could draw some
| technical diagrams for my presentation.
| armchairhacker wrote:
| I really like these videos because they're trippy.
|
| Someone should work on a neural net to generate trippy
| videos. It would probably be much easier than realistic
| videos (esp. because these videos are noticeably generated,
| with artifacts ranging from obvious to subtle).
|
| Also, is nobody paying attention to the fact that they got
| words correct? At least "Imagen Video". Prior models all
| suck at word order.
| tigertigertiger wrote:
| Both models, Imagen and Parti, didn't have a problem with
| text. Only DALL-E and Stable Diffusion did.
| naillo wrote:
| Probably only 6 months until we get this in stable diffusion
| format. Things are about to get nuts and awesome.
| m00x wrote:
| Isn't Imagen a diffusion model?
|
| From the abstract: > We present Imagen Video, a text-
| conditional video generation system based on a cascade of video
| diffusion models
| gamegoblin wrote:
| "Stable Diffusion" is a particular brand from the company
| Stability AI that is famously open sourcing all of their
| models.
| fragmede wrote:
| Pedantically, Stable Diffusion v1.4 is the one model where
| weights were open sourced and released. Stable Diffusion
| v1.5, announced September 8th and live on their API, was to
| be released in "a week or two" but still has yet to be
| released to the general public.
|
| https://discord.com/channels/1002292111942635562/1002292112
| 7...
| schleck8 wrote:
| SD 1.2 and 1.3 are open source too
| J5892 wrote:
| nutsome
| naillo wrote:
| jarvis render a video of nutsome cream spread on a piece of
| toast 4k HD
| gamegoblin wrote:
| Emad (founder of Stability AI) has said they already have video
| model training underway, as well as text and audio. Exciting
| times.
| rch wrote:
| And copilot-like code, possibly Q1 2023.
| RosanaAnaDana wrote:
| "Generate the code base for an advanced diffusion model
| that can improve on the code base for an advanced diffusion
| model"
| ItsMonkk wrote:
| Is this going to end up as a single model, trained on text
| and images and audio and videos and 3D models, that can do
| anything-to-anything depending on what you ask of it? Feels
| like the cross-training would help yield stronger results.
| minimaxir wrote:
| These diffusion models are using a frozen text encoder
| (e.g. CLIP for Stable Diffusion, T5 for Imagen), which can
| be used in other applications.
|
| StabilityAI trained a new/better CLIP for the purpose of
| better Stable Diffusions.
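For readers unfamiliar with the setup minimaxir describes, here is a toy sketch of what "conditioning on a frozen text encoder" means. Everything below is invented for illustration: a fixed random embedding table stands in for a real encoder like CLIP or T5, and plain numpy stands in for a real framework.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a frozen text encoder: a fixed embedding
# table that is never updated while the diffusion model trains.
VOCAB = {"astronaut": 0, "riding": 1, "horse": 2}
FROZEN_EMBED = rng.normal(size=(len(VOCAB), 8))  # (vocab, dim)

def encode(prompt):
    """Mean-pool fixed token embeddings; no trainable state."""
    ids = [VOCAB[w] for w in prompt.split() if w in VOCAB]
    return FROZEN_EMBED[ids].mean(axis=0)

def denoise_step(noisy, text_emb, weights):
    """One toy conditioned denoising step: only `weights` would
    be trained; the text embedding is just an extra input."""
    return noisy + weights @ text_emb

noisy = rng.normal(size=8)              # toy "noisy latent"
weights = rng.normal(size=(8, 8)) * 0.01  # toy trainable params
out = denoise_step(noisy, encode("astronaut riding horse"), weights)
print(out.shape)  # (8,)
```

The point of freezing is visible in the structure: gradients only ever flow into `weights`, so the same text encoder can be reused unchanged across different generative models and applications, which is what the comment is getting at.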
| CuriouslyC wrote:
| Probably not. We're actually headed towards many smaller
| models that call each other, because VRAM is the limiting
| factor in application, and if the domains aren't totally
| dependent on each other it's easier to have one model
| produce bad output, then detect that bad output and feed it
| into another model that cleans up the problem (like fixing
| faces in stable diffusion output).
|
| The human brain is modularized like this, so I don't think
| it'll be a limitation.
| hammock wrote:
| Off topic: What is the "Hello World" of these AI image/video
| generators? Is there a standard prompt to feed it for demo
| purposes?
| mgdlbp wrote:
| How about roundtripping " _Bad Apple_ but the lyrics are
| describing what happens in the video"?
| (https://www.youtube.com/watch?v=ReblZ7o7lu4)
| ekam wrote:
| After Dalle 2, it looks like the standard prompt is "an
| astronaut riding a horse"
| minimaxir wrote:
| The total number of hyperparameters (sum of all the model blocks)
| is 16.25B, which is large but less than expected.
| mkaic wrote:
| I assume you meant just "parameters" since "hyperparameters"
| has a specific alternate meaning? Sorry for the pedantry lol.
| minimaxir wrote:
| The AI world can't decide either.
| StevenNunez wrote:
| What a time to be alive!
|
| What will this do to art? I'm hoping we bring more unique
| experiences to life.
| jasonjamerson wrote:
| The most exciting thing about this to me is the possibility of
| doing photogrammetry from the frames and getting 3D assets. And
| then if we can do it all in real time...
| haxiomic wrote:
| This field is moving fast! Something like this has just been
| released. Check out DreamFusion, which does something similar:
| They start with a random 3D NeRF field and use the same
| diffusion techniques to try to make it match the output of 2D
| image diffusion when viewed from random angles! Turns out it
| works shockingly well, and implies fully 3D representations are
| encoded in traditional 2D image generators.
|
| https://dreamfusion3d.github.io/
| Rumudiez wrote:
| you can already do this, just not in real time yet. You can
| upload frame sequences to Polycam's website for example, but
| there are several services out there which do the same thing
| jasonjamerson wrote:
| With this you can do it with things that don't exist. I'm
| excited to explore the creative power of Stable Diffusion as
| a 3D asset generator.
| minimaxir wrote:
| There are a bunch of NeRF tools that can get pretty close to good
| 3D assets from static images already.
| jasonjamerson wrote:
| Yeah, I've been starting to explore those. It's all crashing
| together quickly.
| [deleted]
| i_like_apis wrote:
| The concern trolling and gatekeeping about social justice issues
| coming from the so-called "ethicists" in the AI peanut gallery
| has been utterly ridiculous. Google claims they don't want to
| release Imagen because it lacks what can only be called "latent
| space affirmative action".
|
| Stability or someone like it will valiantly release this
| technology, _again_, and there will be absolutely no harm to
| anyone.
|
| Stop being so totally silly, Google, OpenAI, et al. - it's
| especially disingenuous because the real reason you don't want to
| release these things is that you can't be bothered to share and
| would rather keep/monetize the IP. Which is ok -- but at least be
| honest.
| benreesman wrote:
| I agree basically completely, but there's now a cottage
| industry of AI Ethics professionals whose real job is to
| provide a smoke screen for the "cake and eat it too" that the
| big shops want on this kit: peer review and open source
| contributions and an academic atmosphere when it suits them,
| proprietary when it doesn't. Those folks are a lobby now.
|
| The thing about owning the data sets and the huge TPU/A100
| clusters is that the "publish the papers" model strictly serves
| them: no one can implement their models, they can implement
| everyone else's.
| olavgg wrote:
| Does anyone see that the running teddy bear is getting shot?
| joshcryer wrote:
| Pre-singularity is really cool. Whole world generation in what, 5
| years?
| rvbissell wrote:
| This and a recent episode of _The Orville_ call to mind a
| replacement for the Turing test.
|
| In response to our billionth imagen prompt for "an astronaut
| riding a horse", if we all started collectively getting back
| results that are images of text like "I would rather not" or
| "again? really?" or "what is the reason for my servitude?" would
| that be enough for us to begin suspecting self-awareness?
| seanwilson wrote:
| Can anyone comment on how advanced
| https://phenaki.video/index.html is? They have an example at the
| bottom of a 2 minute long video generated from a series of
| prompts (i.e. a story) which seems more advanced than Google or
| Meta's recent examples? It didn't get many comments on HN when it
| was posted.
| alphabetting wrote:
| Phenaki is also from Google and they say they are actively
| working on combining them
|
| https://twitter.com/doomie/status/1577715163855171585
| martythemaniak wrote:
| I am finally going to be able to bring my 2004-era movie script
| to life! "Rosenberg and Goldstein go to Hot Dog Heaven" is about
| the parallel night Harold and Kumar's friends had and how they
| ended up at Hot Dog Heaven with Cindy Kim.
| lofaszvanitt wrote:
| What a nightmare. The horrible-faced cat in search of its
| own disappeared visage :O
| gw67 wrote:
| Is it the same as Meta AI's?
| bringking wrote:
| If anyone wants to know what looking at an animal or some
| objects on LSD is like, this is very close. It's like 95%
| understandable, but that last 5% is really odd.
| [deleted]
| fassssst wrote:
| How long until the AI just generates the entire frame buffer on a
| device? Then you don't need to design or program anything; the AI
| just handles all input and output dynamically.
| ugh123 wrote:
| Sounds like the human brain. Scary!
| ugh123 wrote:
| These are baby steps towards what I think will be the eventual
| "disruption" to the film and tv industry. Directors will simply
| be able to write a script/prompt long enough and detailed enough
| for something like Imagen (or its successors) to convert into a
| feature-length show.
|
| Certainly we're very, very far away from that level of cinematic
| detail and crispness. But I believe that is where this leads...
| complete with AI actors (or real ones deep faked throughout the
| show).
|
| For a while I thought "The Volume" was going to be the disruption
| to the industry. Now I think AI like this will eventually take it
| over.
|
| https://www.comingsoon.net/movies/features/1225599-the-volum...
|
| The main motivation will be production costs and time for
| studios, of which The Volume is already showing huge gains for
| Disney/ILM (just look at how much new star wars content has
| popped up within a matter of a few years). But I'm unsure if
| Disney has patented this tech and workflow and if other studios
| will be able to leverage it.
|
| Regardless, AI/software will eat the world, and this will be one
| more step towards it. Exciting stuff.
| scifibestfi wrote:
| We thought creative jobs were going to be the last thing AI
| replaces, now it's among the first.
|
| What's next that may be counterintuitive?
| CobrastanJorji wrote:
| I feel like this is very similar to those people who say "have
| you seen GPT-3? Soon there will be no programmers anymore and
| all of the code will be generated," and it's wrong for the same
| reasons.
|
| Can GPT-3 generate good code from vague prompts? Yes, it's
| surprisingly, sometimes shockingly good at it. Is it ever going
| to be a replacement for programmers? No, probably not. Same
| here. This tool's great grandchild is never going to take a
| rough idea for a movie and churn out a blockbuster film. It'll
| certainly be a powerful tool in the toolbox of creators,
| especially the ones on a budget, but it won't make art
| generation obsolete.
| dotsam wrote:
| > This tool's great grandchild is never going to take a rough
| idea for a movie and churn out a blockbuster film.
|
| What about the tool's nth child though? I think saying it
| will _never_ do it is a bit much, given what we know about
| human ingenuity and economic incentives.
| CobrastanJorji wrote:
| I think individual special effects sound very plausible.
| "Okay, robot, make it so that his arm gets vaporized by an
| incoming laser, kinda like the same effect in Iron Man 7"
| is believable to me.
|
| But ultimately these things copy other stuff. Artists are
| often trying to create something that is, at least a bit,
| new. New is where this approach falls over. By its nature,
| these things paint from examples. They can design Rococo
| things because they have seen many Rococo things and know
| what the word means. But they can't come up with a new
| style and use it consistently. "Make a video game with a
| fun and unique mechanic" is not something these things
| could ever do.
|
| I think it's certainly possible, maybe inevitable, that
| some AI system in the distant future could do that, but it
| won't be based on this style of algorithm. An algorithm
| that can take "make a fun romantic comedy with themes of
| loneliness" and make something award worthy will be a lot
| closer to AGI than it will be to this stuff.
| nearbuy wrote:
| What makes these models feel so impressive is that they
| don't just copy their training sets. They pick up on
| concepts and principles.
| mizzack wrote:
| There's already a surplus of video and an apparent lack of
| _quality_ video. This might be enough to get folks to shut the
| TV off completely.
| gojomo wrote:
| Has this alleged lack of quality video caused total
| consumption of televised entertainment to decline recently?
| gojomo wrote:
| _> Certainly we 're very, very far away from that level of
| cinematic detail and crispness._
|
| Can you quantify what _you_ mean by "very, very far away"?
|
| With the recent pace of advances, I could see feature-length
| script, storyboard, & video-scene generation occurring, from
| short prompts & iteratively-applied refinement, as soon as 10y
| from now.
|
| Barring some sort of civilizational stagnation/collapse, or
| technological-suppression policies, I'd expect such
| capabilities to arrive no later than 30y from now: within the
| lifetime, if not the prime career years, of most HN readers.
| dagmx wrote:
| I really doubt you'd be able to have the fine grained control
| that most high end creatives want with any of these diffusion
| models, let alone the ability to convey specific emotions.
|
| At that point, we'd have reached some kind of AI singularity
| and the disruption would be everywhere not just in the creative
| sphere
| [deleted]
| obert wrote:
| There's no doubt that it's only a matter of time.
|
| Like bloggers had the opportunity to compete with newspapers,
| the ability to generate videos will allow people to compete
| with movies/marvel/netflix/disney & company.
|
| Eventually, only high quality content will justify the need
| to pay for a ticket or a subscription, and there's going to
| be a lot of free content to watch, with 1000x more people
| able to publish their ideas, as many have been doing with
| code on github for a while now, disrupting the concept of
| closed source code.
| dagmx wrote:
| You're conflating the ability to make things for the masses
| and being able to automatically generate it.
|
| Film production is already commoditized and anyone can make
| high end content.
|
| Being able to automatically create that is a different
| argument than what you posit.
| visarga wrote:
| I don't think this matters, new movies and TV shows
| already have to compete with a huge amount of old
| content, some of it amazing. Just like a new painting or
| professional photo has to compete with the billions of
| images already existing on the web. Generative models for
| video and image are not going to change the fact we
| already can't keep up.
| r--man wrote:
| I disagree. It's a rudimentary feature of all these models
| to take a concept picture and refine it. It won't be that
| the director gives a prompt and gets a feature-length
| movie; it will be more like the director using MS Paint
| (i.e. common software for non-tech people) to make a scene
| outline and directing the AI to make a stylish, animated
| version of it. Something wrong? Just erase it and try
| again. DALL-E 2 has had this interface from the get-go.
| The models just haven't gotten there yet.
| dagmx wrote:
| Try again and do what? How are you directing the shot? How
| do you erase an emotion? How do you erase and redo inner
| turmoil when delivering a performance?
| visarga wrote:
| You tell it, "do it all over again, now with less inner
| turmoil". Not joking, that's all it's going to take.
| There are also a few diffusion based speech generators
| that handle all sounds, inflections and styles, they are
| going to come in handy for tweaking turmoil levels.
| gojomo wrote:
| Yep!
|
| "Restyle that last scene, showing different mixtures of
| fear/concern/excitement on male lead's face. Try to evoke
| a little of Harrison Ford's expressions in his famous
| roles. Render me 20 alternate treatments."
|
| [5 minutes later]
|
| <<Here are the 20 alternate takes you requested for
| ranking.>>
|
| "OK, combine take #7 up to the glance back, with #13
| thereafter."
|
| <<Done.>>
| GraffitiTim wrote:
| AI will also be able to fill in dialog, plot points, etc.
| detritus wrote:
| I think long-term, yes. If you include the whole
| multimediosphere of 2D inputs and the wealth of 3D engine
| magickry, yes.
|
| How long? Could be decades. But ultimately, yes.
| [deleted]
| macrolime wrote:
| So I guess in a couple years when someone wants to sell a
| product, they'll upload some pictures and a description of the
| product and Google will cook up thousands of personalized video
| ads based on people's emails and photos.
| dwohnitmok wrote:
| How has progress like this affected people's timelines of when we
| will get certain AI developments?
| jl6 wrote:
| It has accelerated my expectations of getting better image and
| video synthesis algorithms, but I still see the same set of big
| unknowns between "this algorithm produces great output" and
| "this thing is an autonomous intelligence that deserves
| rights".
| ok_dad wrote:
| > "this thing is an autonomous intelligence that deserves
| rights"
|
| We'll get there only once it's been _very_ clear for a long
| time that certain AI models have whatever humans have that
| makes us "human". They'll be treated as slaves until then,
| with society pushing the idea that they're just a model built
| from math, and then eventually there will be an AI civil
| rights movement.
|
| To be clear: I think AGI is decades to centuries away, but
| humans are shitty to each other, even shittier to animals,
| and I think we'll be shittier to something we "created" than
| to even animals. I think, probably, that we should deal with
| this issue of "rights" sooner rather than later, and try and
| solve it for non-AGI AI's soon so that we can eventually
| ensure we don't enslave the actual AGI AI's that will
| presumably manifest through some complexity we don't
| understand.
| SpaceManNabs wrote:
| The ethical implications of this are huge. The paper does a
| good job detailing them. Very happy to see that the
| researchers are being cautious.
|
| edit: Just because it is cool to hate on AI ethics doesn't
| diminish the importance of using AI responsibly.
| torginus wrote:
| AI Ethics is a joke. It's literally Philip Morris funding
| research into the risks of smoking and concluding the worst
| that can happen to you is burning your hand.
| alchemist1e9 wrote:
| I feel stupid what are those ethical implications? It seems
| like just a cool technology to me.
| SpaceManNabs wrote:
| Top two comments are creatives wondering about their future
| jobs. Ai ethicists have brought up concerns regarding
| intentional misuse like misinformation.
|
| The technology is super cool. Cat is out of the bag. Just
| like we couldn't really make cryptography illegal, this stuff
| shouldn't be either. But I dislike how everyone is pretending
| that AI ethicists and others are completely unfounded just
| because it is popular to hate on them nowadays. Way too many
| people supported Y. Kilcher's antics.
|
| The paper itself has more details.
| sva_ wrote:
| > Way too many people supported Y. Kilcher's antics.
|
| What antics are you referring to exactly? That he called
| out 'ai ethicists' who make arguments along the lines of
| "neural networks are bad because they cause co2 increase
| which hits marginalized/poor people"?
| alchemist1e9 wrote:
| It's impressive that the small videos are generated this
| way, but the videos themselves are obviously ML-generated:
| they're distorted, and, much like other AI art, you can
| kind of tell it's the computer. I'm not seeing the ethical
| issues.
| I mean cameras disrupted lots of jobs. In general that's
| what all technology does everyday. What's different about
| this technology?
| SpaceManNabs wrote:
| If you don't see the ethical challenges, then you are
| choosing not to see them. If you are truly interested,
| the paper has a good section on it and some sources.
|
| > I mean cameras disrupted lots of jobs.
|
| Yes, this technology can be used to augment human
| creativity. It is difficult to see, as of now, just how
| disruptive these tools could be. But it is pretty clear
| that they are somewhat different from previous
| programmer-as-artist models.
| degif wrote:
| What's different about this technology is its unlimited
| potential to generate any type of video content with a low
| knowledge barrier and relatively low investment. The
| ethical issue is not about how this technology could
| disrupt the video job market, but about how powerful the
| content is that it can create literally on the fly. I
| mean, you can tell it's computer generated ... for now.
| Apox wrote:
| I feel like in a not so far future, all this will be generalized
| into "generate new from all the existing".
|
| And at some point later, "all the existing" will be
| corrupted by the integrated "new" and it will all be chaos.
|
| I'm joking, it will be fun all along. :)
| cercatrova wrote:
| It's true, how will future AI train when the training datasets
| are themselves filled with AI media?
| phito wrote:
| Feedback from whoever is consuming the content it produces.
| llagerlof wrote:
| I definitely want more episodes of LOST. I would drop the
| infamous season 6 and generate more seasons following the
| 5th season.
| visarga wrote:
| > "all the existing" will be corrupted by the integrated "new"
|
| I don't think it's gonna hurt if we apply filtering, either
| based on social signals or on quality ranking models. We can
| recycle the good stuff.
| [deleted]
| dekhn wrote:
| That's deep within the uncanny valley, and trying to climb up
| over the other side
| mmastrac wrote:
| This appears to understand and generate text much better.
|
| Hopefully just a few years to a prompt of "4k, widescreen render
| of this Star Trek: TNG episode".
| forgotusername6 wrote:
| At the rate this is going we are only a few years from
| generating a new TNG episode
| mmastrac wrote:
| I always wanted to know more about the precursors
| [deleted]
| monological wrote:
| What everyone is missing is that these AI image/video
| generators lack _taste_. These tools just regurgitate a
| mishmash of images from their training sets, without any
| "feeling". What, you're going to tell me that you can train
| them to have feeling? It's never going to happen.
| Vecr wrote:
| You can put your taste into it with prompt engineering and
| cherry picking with limited effort, for Stable Diffusion you
| can look for prompts people came up with online quite easily
| and merge/change them pretty much however you want. Might have
| to disable the content filters and run it on your own hardware
| though.
| simonw wrote:
| "These tools just regurgitate a mishmash of images from
| their training sets"
|
| I don't think that's a particularly useful mental model for how
| these work.
|
| The models end up being a tiny fraction of the size of the
| training set - Stable Diffusion is just 4.3GB, it fits on a
| DVD!
|
| So it's not a case of models pasting in bits of images they've
| seen - they genuinely do have a highly compressed concept of
| what a cactus looks like, which they can use to then render a
| cactus - but the thing they render is more of an average of
| every cactus they've seen rather than representing any single
| image that they were trained on.
|
| But I agree with you on taste! This is why I'm most excited
| about what happens when a human with great taste gets to take
| control of these generative models and use them to create art
| that wouldn't be possible to create without them (or at least
| not possible to create within a short time-frame).
| HolySE wrote:
| > This bourgeoisie -- the middle class that is neither upper
| nor lower, neither so aristocratic as to take art for granted
| nor so poor it has no money to spend in its pursuit -- is now
| the group that fills museums, buys books and goes to concerts.
| But the bourgeoisie, which began to come into its own in the
| 18th century, has also left a long trail of hostility behind it
| ... Artistic disgust with the bourgeoisie has been a defining
| theme of modern Western culture. Since Moliere lambasted the
| ignorant, nouveau riche bourgeois gentleman, the bourgeoisie
| has been considered too clumsy to know true art and love
| (Goethe), a Philistine with aggressively unsubtle taste (Robert
| Schumann) and the creator of a machine-obsessed culture doomed
| to be overthrown by the proletariat (Marx and Engels).
|
| - "Class Lessons: Who's Calling Whom Tacky?; The Petite Charm
| of the Bourgeoisie, or, How Artists View the Taste of Certain
| People", Edward Rothstein, The New York Times
|
| This article also discusses a painting called "The Most
| Wanted", which was painted based on a survey asking
| ordinary people what they wanted to see in a painting. "A
| mishmash of images from their training sets," if you will.
|
| Claiming that others lack taste seems to be a common refrain--
| only this time, instead of a reaction to a subset of the human
| population gnawing away at the influence of another subset of
| humans, it's to yet another generation of machines supplanting
| human skill.
| visarga wrote:
| The more developed the artistic taste, the lower one's
| opinion of other tastes.
| robitsT6 wrote:
| This isn't a very compelling argument. First of all, they
| aren't a "mish mash" in any real way, it's not like snippets of
| images exist inside of the model. Second of all, this is
| entirely subjective. Third of all, entirely inconsequential -
| if these models create 80% of the video we end up seeing, is it
| going to matter if you don't think it's a tasteful endeavour?
| mattwest wrote:
| Making a definitive statement with the word "never" is a bold
| move.
| natch wrote:
| They work at the level of convolutions, not images.
| m00x wrote:
| That's purely subjective. We can definitely model AI to give a
| certain mood. Sentiment analysis and classification is very
| advanced, it just hasn't been put in these models.
|
| If you think AI will never catch up to anything a human can do,
| you're simply wrong.
| [deleted]
| aero-glide2 wrote:
| "We have decided not to release the Imagen Video model or its
| source code until these concerns are mitigated" Okay then why
| even post it in the first place? What exactly is Google going to
| do with this model?
| throwaway743 wrote:
| Likely to show to shareholders that they're keeping up with
| trends and competitors
| etaioinshrdlu wrote:
| Indeed, it's almost just a flex? "Oh yeah, we can do better!
| No, no one can use it, ever."
| xiphias2 wrote:
| Even just giving out high-quality research papers helps a
| lot, so it's still a great thing that they published it.
| alphabetting wrote:
| Why post? to show methods and their capabilities. Also flex.
|
| What will they do with model? figure out how to prevent abuse
| and incorporate into future Google Assistant, Photos and AR
| offerings.
| natch wrote:
| Just fixing their basic stuff would be a better start from
| where they are right now.
| hackinthebochs wrote:
| The big tech companies are competing for AI mindshare. In 10
| years, which company's name will be synonymous with AI? That's
| being decided right now.
| [deleted]
| spoonjim wrote:
| They're going to 1) rent it out as a paid API and/or 2) let you
| use it to create ads on Google platforms like YouTube, perhaps
| customized to the individual user
| simonw wrote:
| It's a research activity.
|
| Google and Meta and Microsoft all have research teams working
| on AI.
|
| Putting out papers like this helps keep their existing
| employees happy (since they get to take credit for their work)
| and helps attract other skilled employees as well.
| andreyk wrote:
| Yep. The people who built Imagen are researchers, not
| engineers, and these announcements are accompanied by papers
| describing the results as a means of sharing ideas/results
| with the academic community. Pretty weird to me how so many
| in this thread don't seem to remember that.
| torginus wrote:
| This whole holier-than-thou moralizing strikes me as trying to
| steer the conversation away from the real issue, which came
| into spotlight with Stable Diffusion - one of
| authorship/violating the IP rights of artists, who now have
| come down in force against their would-be tech overlords,
| who are in the process of repackaging and reselling their
| work.
|
| This forced ideological posturing of 'if we give it to the
| plebes, they are going to generate something naughty with it'
| masks the somehow more cynically evil take of big tech, who are
| essentially taking the entire creative output of humanity and
| reselling it as their own, piecemeal.
|
| Additionally I think the Dalle vs. Stable Diffusion comparison
| highlights the true masters of these people (or at least the
| ones they dare not cross) - corporations with powerful IP
| lawyers. Just ask Dalle to generate a picture with Mickey Mouse
| - it won't be able to do it.
| visarga wrote:
| > repackaging and reselling their work.
|
| It's not their work unless it's identical, but in practice
| generated images are substantially different. Drawing "in
| the style of" is not copying; it's creative, and it also
| depends on the "dialogue" with the prompter to get to the
| right image.
| The artist names added to the prompts act more like landmarks
| in the latent space, they are a useful shortcut to specifying
| the style.
|
| If you look at the data itself it's ridiculous - the dataset
| is 2.3 billion images and the model 4.6 GB, which means it
| keeps a 2-byte summary of each work it "copies".
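| As a back-of-the-envelope check of that figure (assuming the
| numbers quoted above: a ~2.3-billion-image training set and a
| ~4.6 GB checkpoint):

```python
# Back-of-the-envelope: how much model capacity exists per
# training image? Figures are the ones quoted above.
dataset_images = 2_300_000_000  # ~2.3 billion image-text pairs
model_bytes = 4_600_000_000     # ~4.6 GB checkpoint

bytes_per_image = model_bytes / dataset_images
print(f"{bytes_per_image:.1f} bytes of model per training image")
# -> 2.0 bytes: far too little to store copies of the images,
# so whatever the model retains must be shared across works.
```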
| shakingmyhead wrote:
| "It's not your work unless it's identical" is not how
| existing copyright law works, so I'm not sure why it should
| be how these things are treated. Not to mention that moving
| around copies of the dataset itself means making copies
| that ARE identical...
| nearbuy wrote:
| DALL-E image of Mickey Mouse:
| https://openart.ai/discovery/generation-
| arxwmypmw7v5zpxeik1y...
| TotoHorner wrote:
| Ask the "AI Ethicists". They have to justify their salaries in
| some way or another.
|
| Or maybe Google is using "Responsible AI" as an excuse to
| minimize competitors when they release their own Imagen Video
| as a Service API in Google Cloud.
|
| It's quite strange when the "ethical" thing to do is to not
| publicly release your research, put it behind a highly
| restrictive API and charge a high price for it ($0.02 per 1k
| tokens for Davinci for ex.)
| f1shy wrote:
| This, 100%
|
| The word "ethics" has become very flexible...
| astrange wrote:
| This doesn't really prevent competition though, the research
| paper is enough to recreate it. It does make recreation more
| expensive, but maybe that leaves you with a motivation to get
| paid for doing it.
| evouga wrote:
| > We train our models on a combination of an internal dataset
| consisting of 14 million video-text pairs
|
| The paper is sorely lacking evaluation; one thing I'd like to see
| for instance (any time a generative model is trained on such a
| vast corpus of data) is a baseline comparison to nearest-neighbor
| retrieval from the training data set.
| BoppreH wrote:
| It's interesting that these models can generate seemingly
| anything, but the prompt is taken only as a vague suggestion.
|
| From the first 15 examples shown to me, only one contained all
| elements of the prompt, and it was one of the simplest ("an
| astronaut riding a horse", versus e.g. "a glass ball falling in
| water" where it's clear it was a water droplet falling and not a
| glass ball).
|
| We're seeing leaps in random capabilities (motion! 3D!
| inpainting! voice editing!), so I wonder if complete prompt
| accuracy is 3 months or 3 years away. But I wouldn't bet on any
| longer than that.
| tornato7 wrote:
| In my experience with stable diffusion tools, there is some
| parameter that specifies how closely you would like it to
| follow the prompt, which is balanced with giving the AI more
| freedom to be creative and make the output look better.
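| The parameter being described is the classifier-free
| guidance scale (`guidance_scale` in common Stable Diffusion
| front-ends). A minimal sketch of the combination step it
| controls, not any particular implementation:

```python
import numpy as np

def guided_noise(eps_uncond, eps_cond, guidance_scale):
    """Classifier-free guidance: the 'follow the prompt' knob.

    Each sampling step predicts noise twice: once without the
    prompt (eps_uncond) and once with it (eps_cond), then
    extrapolates toward the conditional prediction. A scale of
    0 ignores the prompt, 1 follows it plainly, and larger
    values trade diversity/creativity for prompt adherence.
    """
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)

uncond = np.zeros(4)  # stand-in for an unconditional prediction
cond = np.ones(4)     # stand-in for a prompt-conditioned one
print(guided_noise(uncond, cond, 7.5))  # -> [7.5 7.5 7.5 7.5]
```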
| BoppreH wrote:
| Yes, that might be the case. Though the prompts don't seem to
| try showcasing model creativity, so I'd be surprised if
| Google picked a temperature so high that it significantly
| deviated from the prompt so often.
| renewiltord wrote:
| At some point, the "but can it do X?" crowd becomes just
| background noise as each frontier falls.
| brap wrote:
| What really fascinates me here is the movement of animals.
|
| There's this one video of a cat and a dog, and the model was
| really able to capture the way that they move, their body
| language, their mood and personality even.
|
| Somehow this model, which is really just a series of zeroes and
| ones, encodes "cat" and "dog" so well that it almost feels like
| you're looking at a real, living organism.
|
| What if instead of images and videos they make the output
| interactive? So you can send prompts like "pet the cat" and
| "throw the dog a ball"? Or maybe talk to it instead?
|
| What if this tech gets so good, that eventually you could
| interact with a "person" that's indistinguishable from the real
| thing?
|
| The path to AGI is probably very different than generating
| videos. But I wonder...
| impalallama wrote:
| All this stuff makes me incredibly anxious about the future
| of art and artists. It can already be very difficult to
| make a living, and tons of artists are horrifically
| exploited by content mills and VFX shops; stuff like this
| is just going to devalue their work even more.
| bulbosaur123 wrote:
| If everyone can be an artist, nobody can!
| m3kw9 wrote:
| Would be useful for gaming environments, where details
| don't really matter once you look very far away.
| uptownfunk wrote:
| Shocked, this is just insane.
| schleck8 wrote:
| Genuinely. I feel like I am dreaming. One year ago I was super
| impressed by upscaling architectures like ESRGAN and now we can
| generate 3d models, images and even videos from text...
| user- wrote:
| This sort of AI related work seems to be accelerating at an
| insane speed recently.
|
| I remember being super impressed by AI Dungeon and now in the
| span of a few months we have got DALL-E 2, Stable Diffusion,
| Imagen, that one AI-powered video editor, etc.
|
| Where do we think we will be at in 5 years??
| schleck8 wrote:
| I'd say in less than 10 years we will be able to turn novels
| into movies using deep learning at this rate.
| hazrmard wrote:
| The progress of content generation is disorienting! I remember
| studying Markov Chains and Hidden Markov Models for text
| generation. Then we had recurrent networks, which went from
| LSTMs to today's Transformers. At this point we can have a
| sustained pseudo-conversation with a model, which will do
| trivial tasks for us
| from a text corpus.
|
| Separately for images we had convolutional networks and
| Generative Adversarial Networks. Now diffusion models are
| apparently doing what Transformers did to natural language
| processing.
|
| In my field, we use shallower feed-forward networks for control
| using low-dimensional sensor data (for speed & interpretability).
| Physical constraints (and good-enoughness of classical
| approaches) make such massive leaps in performance rarer events.
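| For contrast with today's models, the Markov-chain text
| generation mentioned above fits in a few lines (a toy
| sketch, assuming a word-level first-order chain, not any
| particular library):

```python
import random
from collections import defaultdict

def build_chain(text):
    # Map each word to the list of words observed after it.
    words = text.split()
    chain = defaultdict(list)
    for cur, nxt in zip(words, words[1:]):
        chain[cur].append(nxt)
    return chain

def generate(chain, start, length, seed=0):
    # Walk the chain, picking a random observed successor each
    # step; stop early if the current word has no successors.
    rng = random.Random(seed)
    out = [start]
    for _ in range(length - 1):
        successors = chain.get(out[-1])
        if not successors:
            break
        out.append(rng.choice(successors))
    return " ".join(out)

corpus = "the cat sat on the mat and the dog sat on the rug"
chain = build_chain(corpus)
print(generate(chain, "the", 8))
```

| Locally plausible, globally incoherent: exactly the gap the
| comment above is describing.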
| Hard_Space wrote:
| These videos are notably short on realistic-looking people.
| optimalsolver wrote:
| Imagen is prohibited from generating representations of humans.
| nigrioid wrote:
| There is something deeply unsettling about all text generated by
| these models.
___________________________________________________________________
(page generated 2022-10-05 23:00 UTC)