[HN Gopher] How Imagen Works
___________________________________________________________________
How Imagen Works
Author : SleekEagle
Score : 92 points
Date : 2022-06-23 14:45 UTC (8 hours ago)
(HTM) web link (www.assemblyai.com)
(TXT) w3m dump (www.assemblyai.com)
| varispeed wrote:
| > is trained on hundreds of millions of images and their
| associated captions
|
| So how do you get access to hundreds of millions of images and
| use them to create derivative works? Did they get consent from
| millions of authors?
|
| Or is something like that only available to the rich with access
| to lawyers on tap?
|
| I mean I can imagine if a nobody wanted to do something like
| this, they'd get bankrupted by having to deal with all the
| photographers / artists spotting a tiny sliver of their art in
| the image produced by the model.
|
| Furthermore, would something like this work with music? For
| instance, train the model on all Spotify songs and then generate
| songs based on "Get me a Bach symphony played on sticks with
| someone rapping like Dr Dre with a lisp." Or does the music industry
| have enough money to bully anyone into not doing that?
| SleekEagle wrote:
| Presumably Google's terms of service or fair use laws. The real
| restriction is that, even if you had the dataset, training
| costs tens of thousands of dollars. Only corporations can
| really afford to train these things.
|
| Regarding music - audio generation with Diffusion Models (the
| main component of Imagen and DALL-E 2) has been done, but not
| sure about music specifically. We will definitely reach the
| point where most e.g. pop beats will be able to be made by AI
| relatively soon.
|
| All a producer has to do is generate 100 beats and select the
| one s/he likes, potentially interpolate between 2 or finetune
| it.
| davikr wrote:
| I've seen an image generated by AI contain an "Alamy" watermark
| before.
| dubswithus wrote:
| If Google has something similar or better it definitely makes it
| look like OpenAI is wasting its time. None of this relates to
| AGI.
| SleekEagle wrote:
| I don't think anyone is saying that humanity is close to AGI,
| but check out DeepMind's Gato work for a more well-rounded
| agent:
|
| https://www.deepmind.com/publications/a-generalist-agent
| visarga wrote:
| I think we're past a certain threshold, maybe not AGI but
| some definite qualitative change is happening.
| SleekEagle wrote:
| I mean DALL-E 2 was the first time my jaw really hit the
| floor, although in fairness GPT-3 probably should've done
| that, but it's easier to do with images.
|
| And then for this to drop just a month later? Insane. It
| makes you wonder if they're actually releasing cutting
| edge, or Google decided to write this paper just because of
| the publication of DALL-E 2. Maybe they've had this model
| in the bag for a year.
| alphabetting wrote:
| Google also released this different text to image model
| yesterday
|
| https://parti.research.google/
|
| I think they've just got a lot of projects going on under
| the hood and timing was coincidence.
| SleekEagle wrote:
| Looks cool although not as good as Imagen. Autoregressive
| vs Diffusion, I guess
| Workaccount2 wrote:
| I have shown imagen (and dalle2) to a number of people now (non-
| tech, just everyday friends, family, co-workers) and I have been
| pretty stunned by the response I get from most people:
|
| "Meh, that's kinda cool? I guess?" or "What am I looking
| at?"..."Ok? So a computer made it? That seems neat"
|
| To me I am still trying to get my jaw off the floor from 2 months
| ago. But the responses have been so muted and shoulder shrugging
| that I think either I am missing something or they are missing
| something. Even really drilling in, practically shaking them "DO
| YOU NOT UNDERSTAND THAT THIS IS AN ORIGINAL IMAGE CONSTRUCTED
| ENTIRELY BY AN AI?!?!" and people just seem to see it as a party
| trick at best.
| Genbox wrote:
| I find that most people are primarily driven by a need. You
| need food? Pick some berries. You need warmth? Start a fire.
|
| When it comes to technology - especially advanced technology
| like Imagen - people don't see the value because they don't
| have a need associated with it.
| jazzyjackson wrote:
| I think if you've been paying attention to the space, this
| generation of image diffusion is shocking in how quickly it has
| improved on what we had a year ago.
|
| But if you've never considered that a computer can produce an
| original image, this is just a new thing computers can do. OTOH
| I think it's also a lack of imagination in how useful this is,
| so far the output has been kind of random, so it seems a little
| gimmicky. Already "Parti" has gotten much closer to allowing a
| user to describe exactly what they want in the image, and as
| people start to see the use cases for them personally, it will
| hit them that they no longer have to hire someone, they can
| just type a request into a box.
| SleekEagle wrote:
| I'm not sure there has been a period of more rapid
| development in DL than Diffusion Models (maybe
| transformers?). The next few years will be really
| interesting.
| joshcryer wrote:
| I've made perhaps overly absolutist statements like "don't you
| see! this kills artists jobs!" and it was shrugged off as if I
| was insane. I probably could've phrased it differently, but to
| me this is game changing in several fields. Granted, it will
| open up a new field of "generative artists" but, having played
| with these things, this is a pretty trivial job, and their
| training nets are only going to get _better_.
| danielvaughn wrote:
| To me, it paves the way for creative prototyping. I don't see
| this as a zero-sum game between artists and AI. Instead, I
| could see artists using this for some serious time saving,
| and leveraging that extra time and energy for creating better
| results.
| Uehreka wrote:
| I've had a lot of fun playing with Disco Diffusion prompts,
| but I agree that the people excited about "a generation of
| prompt artists" are a bit misguided. Soon an AI will emerge
| that can come up with "better" prompts than you, and the
| "art" of creating prompts will have a lower skill ceiling.
| russdill wrote:
| The GPT algorithms are actually pretty good at making
| detailed image generation prompts if you ask them to describe
| in detail the general idea you want.
| SleekEagle wrote:
| Do you have a link to any papers about this? Would love
| to check them out
| tsol wrote:
| Like a neural network just for making prompts that result
| in aesthetically pleasing Imagen images? And then maybe we
| can come up with a neural net that can decide which
| pictures are good and which aren't. Then we can just have
| robots making art for the sake of consumption solely by
| robots.
| SleekEagle wrote:
| It could also be used for more nefarious reasons like
| disinformation campaigns though... it will be interesting to
| see what the next few years have in store
| dougmwne wrote:
| I think I can explain this: for most people the whole world
| is basically magic anyway. They don't understand any of the
| details about how any digital tech works so to them they have
| no framework for which things are impressive and which things
| are not. They just know that computers can do a great many
| things that they know nothing about. "Oh I can bank online?
| Ok." "Oh, I can have the computer write my book report for me?
| Ok." "Oh, this McDonalds is fully staffed by sentient robots?
| Ok."
| endymi0n wrote:
| I think that hits home.
|
| A lot of people would just answer something to the likes of
| "Well, they made The Matrix with a computer 20 years ago",
| and technically that's just as true.
|
| From their remote viewpoint on what's happening in IT, the
| rest is an implementation detail to them.
| GrabbinD33ze69 wrote:
| A pretty common pattern I've witnessed among non-technical
| people (even people who are tech savvy but have no CS
| background do this) is assuming that a feature that is in
| reality quite difficult to implement won't take much effort,
| and vice versa.
| mortenjorck wrote:
| This is the other side of the classic XKCD "Tasks"
| (https://xkcd.com/1425/).
|
| A non-technical person in 2014 (when the above was originally
| published) would likely have the same conception of the
| difficulty of recognizing a bird from an image as they would
| in 2022, even though the task itself has gone from near-
| insurmountable to off-the-shelf-library in eight years.
|
| Even as Imagen and Dall-E 2 amaze us today, these feats will
| likely be commonplace in a few years. The non-technical may
| have only a vague sense that their new TikTok filter is doing
| something that was impossible only a few years prior.
| dougmwne wrote:
| Exactly and I was thinking of that XKCD. Very much case in
| point, I have the Merlin Bird ID app which can determine
| species from ridiculously blurry photos and can also
| identify hundreds of birds from their calls alone in noisy
| environments. In 2014 I would have sworn this would be
| impossible.
| SleekEagle wrote:
| I've gotten a lot of "wow, that's cool!"s, which is a pretty
| fair response for a non-technical person if you ask me!
| thruuavay wrote:
| Well, I'm still in awe that I have a bunch of walls around me
| and can cover my body with clothes, or that I'm still alive
| after all this time, and that I can even rest most of the day
| and not spend body energy running after or from animals.
| Amazing stuff.
|
| A program that transforms text to an image? Huh.
| Wistar wrote:
| I haven't gotten such dismissive responses, but probably only
| because those I'm inclined to share such things with are the
| exact kinds of people who'd be blown away by them, and
| immediately grasp the significance.
| ja3k wrote:
| I couldn't convince my mother in law it was more impressive
| than photoshop.
| trention wrote:
| It's just an illustration of the fact that the average person
| doesn't give a sh*t about AI "art" and that it will have ~zero
| cost and ~zero value.
| bergenty wrote:
| With the amount of context awareness this AI has, there's
| nothing all that special about human "art" to be honest.
| trention wrote:
| I am willing to bet that the revenue from AI-generated
| "art" will be smaller than the revenue from human-generated
| art in 5 years (or even 10 years) despite the former
| probably being at least 2 orders of magnitude higher in
| volume. This is basic supply and demand + acknowledging the
| fact that humans don't care about AI "achievements".
| bergenty wrote:
| AI achievements will be indistinguishable from human
| achievements. Humans will try to pass off AI achievements
| as their own. The line will become so blurred that it
| will be impossible to tell the difference.
| trention wrote:
| If that happens, all art will simply have no value and
| art as % of GDP will plummet.
|
| Incidentally, this hasn't happened in areas where AI
| already dominates like chess and go. Magnus Carlsen alone
| probably generates more "revenue" than all chess AIs
| combined.
| phailhaus wrote:
| Treating Imagen as just an "AI art generator" is extremely
| short sighted. Sure, you could just try to sell the outputs
| directly. But the real value is using it to supplement larger
| works. No need for a stock photo subscription service if you
| can just generate them automatically. Don't need artists to
| create textures for your simple games. I can spin up a merch
| shop powered entirely by AI art and nobody would know. The
| marginal cost of creation is approaching zero.
| SleekEagle wrote:
| And perhaps even more interestingly these things not only
| exist but there is competition in this space! Essentially
| unregulated competition as well (and likely for the next 10
| years). The cost will be driven into the ground.
| Miraste wrote:
| The apocryphal Henry Ford quote about the average person
| wanting better horses comes to mind. People off the street
| have no concept of the impact this tech and the methods
| behind it will have. Sure, no one is going to be printing
| these and hanging them in museums. Very few artists support
| themselves that way, though. The people diffusion models are
| coming for are the graphic designers, the concept artists,
| the marketers, and everyone else with a copy of Photoshop and
| a Getty subscription. GPT-3 is amazing, but it's also not
| good enough to be useful. Imagen is industry-destroying.
| trention wrote:
| Although I agree that a somewhat less extreme version of
| that will happen in the course of this decade bar a legal
| decision to prohibit using those models, that won't
| translate to comparable revenues. The companies providing
| those services will struggle to make even 10% of the
| salaries of the displaced workers in revenue. In fact, this
| will probably be a GDP-destroying (though not value-
| destroying) application of technology.
| SleekEagle wrote:
| It's not about generating more revenue, it's about
| cutting costs. Any company that employs graphic designers
| etc. will be able to cut 90% of the staff.
|
| Video game companies that need concept art? How about 1
| guy/gal with Imagen to generate baselines and then
| curating/tailoring as necessary instead of a team of 5
| trention wrote:
| That has nothing to do with anything I wrote. And doesn't
| contradict it actually.
|
| Saved costs will not translate to higher margins for
| those that cut them because all competitors will be able
| to slash them as well, resulting in lower prices across
| the board.
| monkeybutton wrote:
| Perhaps it's the combination of AI being so overhyped in the
| general public plus media that's already inundated with CGI,
| that it just doesn't blow them away?
| clircle wrote:
| People don't care because all their text to image needs are well
| covered by Google Images.
| skinner_ wrote:
| > The central intuition in using T5 is that extremely large
| language models, by virtue of their sheer size alone, may still
| learn useful representations despite the fact that they are not
| explicitly trained with any text/image task in mind. [...]
| Therefore, the central question being addressed by this choice is
| whether or not a massive language model trained on a massive
| dataset independent of the task of image generation is a
| worthwhile trade-off for a non-specialized text encoder. The
| Imagen authors bet on the side of the large language model, and
| it is a bet that seems to pay off well.
|
| The way out of this dilemma is to fine-tune T5 on the caption
| dataset instead of keeping it frozen. The paper notes that they
| don't do fine-tuning, but does not provide any ablation or other
| justification. I wonder if it would help or not.
| DonHopkins wrote:
| Wait, this isn't about the line of intelligent xeroxographic
| laser printers developed by Imagen Corporation in 1981,
| supporting the Impress printer language?
|
| https://tug.org/TUGboat/tb02-2/tb03imagen.pdf
|
| https://www.openprinting.org/driver/imagen
| SleekEagle wrote:
| How do you think it prints the images!
| coding123 wrote:
| Is this by a person that knows or is guessing?
| watmough wrote:
| The important part seems to be the diffusion model.
|
| Explanation linked from same page:
| https://www.assemblyai.com/blog/diffusion-models-for-machine...
| nestorD wrote:
| The paper is very well explained and, reading this post, they
| seem to mostly make its content accessible to non-domain
| experts.
| tiborsaas wrote:
| I guess he read the research paper.
| thunderbird120 wrote:
| Google published these implementation details
| natch wrote:
| > Imagen, released just last month, can generate high-quality,
| high-resolution images given only a description of a scene
|
| "Released"? What? Papers are published. Websites are published.
| Tools are "released."
|
| Where has Imagen been released?
| bpiche wrote:
| This implementation popped up on hacker news not too long ago.
| I got it working on Colab first, and then my own GPU at home.
| But just barely. Need more memory :)
|
| https://github.com/lucidrains/imagen-pytorch
| Voloskaya wrote:
| The value is in the data and the trained weights, the
| implementation is not where the bottleneck is in terms of
| reproducing those models.
|
| Still great work from the author though, but we most
| definitely cannot say that imagen is released.
| stavros wrote:
| Wait, so I can try this on Colab right now?
| refulgentis wrote:
| No, something that's been causing a lotta confusion in AI
| art is people stand up quick implementations generally
| matching the general description in the paper, but, they're
| not really investing in training them. Then people see
| "imagen-pytorch" on GitHub and get confused, either think
| it's Imagen itself or a suitable replica of it.
|
| There's like 3 projects named DallE, and then the 2 real
| DallEs...frustrating.
| joshcryer wrote:
| People are really thirsty to play with this tech, you
| can't blame them. Just search for dataset creators on
| Hugging Face. I'd link directly to several of them
| running but it would just overwhelm the creators. If you
| want to be in early you'll find them. The beautiful thing
| is open source is going to make this stuff available for
| _everyone_ and in a very short timeframe. It's crazy how
| fast it moves.
| spullara wrote:
| It is a suitable replica of it. Just isn't trained.
| natch wrote:
| But the training is the thing that would make it
| suitable.
| bpiche wrote:
| I mean, you try training this thing without a warehouse
| full of GPUs... to me, the algorithm is just as
| interesting as the model. Perhaps more so.
| echelon wrote:
| Are there any large publicly available models, ready to fine
| tune and deploy, that were trained on massive data sets?
|
| I really want to build services with these.
| alexccccc wrote:
| Super interesting
___________________________________________________________________
(page generated 2022-06-23 23:01 UTC)