[HN Gopher] DALL·E: Creating Images from Text
___________________________________________________________________
DALL·E: Creating Images from Text
Author : todsacerdoti
Score : 1158 points
Date : 2021-01-05 19:08 UTC (1 day ago)
(HTM) web link (openai.com)
(TXT) w3m dump (openai.com)
| savant_penguin wrote:
| I hope GPT-3 Dungeon has a great update incoming
| keyle wrote:
| This is incredible. Such technology with an RPG adventure game
| would open up a new genre of exploration games!
| anthk wrote:
| There was some programming language akin to POV-Ray (though not
| for raytracing) for describing a scene with commands like
| "place a solid here" and so on.
|
| I can't remember its name.
|
| EDIT:
|
| https://www.contextfreeart.org/
| scribu wrote:
| There are several projects like this, but they can only
| generate abstract shapes, i.e. they're much lower-level than a
| natural language caption.
| superkuh wrote:
| Context Free is incredibly fun but it's not machine learning or
| even AI. And it's very easy to understand how things go from
| definitions of shapes, rotations, translations, etc., to the
| finished image.
| gfody wrote:
| This is incredible, but I can't help but feel like we're skipping
| some important steps by working with plain text and bitmaps - "a
| collection of glasses sitting on a table" is sometimes
| eyeglasses, sometimes drinking glasses, sometimes a weird
| amalgamation. And as long as we're OK with ambiguity in every
| layer, are we really ever going to be able to meaningfully
| interrogate the reasons behind a particular output?
| reubens wrote:
| "For other captions, such as "a snail made of harp," the results
| are less good, with images that combine snails and harps in odd
| ways." [0]
|
| You try drawing a snail made of harp! Seriously! DALL-E did an
| incredible job.
|
| [0]
| https://www.technologyreview.com/2021/01/05/1015754/avocado-...
| anonytrary wrote:
| I think those examples are great. Another way to judge this is
| to consider what a class of 8th graders might come up with if
| you ask them to draw "a snail made of harp". The request is
| nonsensical in itself, so I imagine the results from DALL-E are
| actually pretty good.
| inakarmacoma wrote:
| Really? I found those examples to be the most compelling
| and interesting overall.
| JacobiX wrote:
| Results are spectacular, but as always and especially with OpenAI
| announcements one should be very cautious (lessons learned from
| GPT3). I hope that the model is not doing advanced
| patchwork/stitching of training images. I think that this kind of
| test should be included when measuring the performance of any
| generative model. Unfortunately the paper is not yet released,
| the model is not available and no online demo has been published.
| Recently a team of researchers discovered that in some cases
| advanced generative text models copy large chunks of their
| training-set text ...
| ralfd wrote:
| > I hope that the model is not doing advanced
| patchwork/stitching of training images
|
| It would still be impressive that it knows where to include
| hands, a Christmas sweater or a unicycle.
| ludwigschubert wrote:
| That's a delightful result you all; and beautifully explored,
| too!
| saberience wrote:
| Maybe I'm missing something but does it say what library of
| images was used to train this model? I couldn't quite understand
| the process of building DALL-E. Did they have a large database of
| labeled images that they combined with GPT-3?
| worldsayshi wrote:
| There's something that really creeps me out about errors in AI
| generated images. More than uncanny valley creepiness. Like
| trypophobia creepy.
| Nition wrote:
| Same for me. It's like the feeling you get in a dream where
| things seem normal and you think you're awake, then suddenly
| you notice something _wrong_ about the room, something
| impossible.
| jamcohen wrote:
| I get the same feeling, to the point that I occasionally let
| out a brief scream when browsing GAN images.
| dfischer wrote:
| Just wait until you can't tell the difference and then
| contemplate if it matters, and then if that's what reality
| already is.
| ravi-delia wrote:
| I know exactly what you mean. Like if you had to see it in real
| life you'd see something horrible just out of shot. For some
| reason that's amplified with the furniture.
| jcims wrote:
| They are going to train on YouTube/PornHub before long and it's
| going to get weird.
| MrBuddyCasino wrote:
| Also, someone will make a version for furries. They pay well.
| dfischer wrote:
| Maybe already has...
| irrational wrote:
| Combine this with deep fakes.
|
| Donald Trump is Nancy Pelosi's and AOC's step-brother in a
| three-way in the Lincoln Bedroom.
| wj wrote:
| Can't unsee!
| jcims wrote:
| At least we're spared the smell...for now.
| SoSoRoCoCo wrote:
| I'm not sure how to feel, because I had this exact same
| thought. The evolution of porn from 320x200 EGA on a BBS, to
| usenet (alt.binaries.pictures.erotica, etc.) on XVGA (on an AIX
| Term), to the huge pool of categories on today's porn sites,
| which eventually became video and bespoke cam performers... Is
| this going to be some new weird kind of porn that Gen Alpha
| normalizes?
| dj_mc_merlin wrote:
| This is real? A computer can take "an armchair in the shape of an
| avocado" as input and make a picture of one?
|
| I can't believe it. How does it put the baby daikon radish in the
| tutu?
| numpad0 wrote:
| Because the Internet has plenty of well-captioned drawings of
| daikon radish drawn as a bipedal humanoid. D'oh!
| crdrost wrote:
| If we knew how it did it, it wouldn't be machine learning.
|
| The defining feature of machine learning, in other words, is
| that the machine constructs a hypersurface in a very-high-
| dimensional space based on the samples that it sees, and then
| extrapolates along the surface for new queries. Whereas you can
| explain features of why the hypersurface is shaped the way it
| is, the machine learning algorithm essentially just tries to
| match its shape well, and intentionally does not try to extract
| reasons "why" that shape "has to be" the way it is. It is a
| correlator, not a causalator.
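|
| A toy sketch of that "fit a surface, then extrapolate" view, in
| Python (a 1-D stand-in for the hypersurface; the numbers are
| purely illustrative):
|
|     import numpy as np
|
|     # "Training": fit a curve to noisy samples of an unknown
|     # function -- the machine matches the shape, nothing more.
|     rng = np.random.default_rng(0)
|     x = np.linspace(0, 1, 50)
|     y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(50)
|     model = np.poly1d(np.polyfit(x, y, deg=9))
|
|     # "Inference": queries near the data track it well; queries
|     # off the data can be wildly wrong, with no "why" on offer.
|     print(model(0.5))   # inside the data: close to sin(pi) = 0
|     print(model(1.5))   # outside the data: unconstrained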
|
| If you had something bigger you'd call it "artificial
| intelligence research" or something. Machine learning is
| precisely the subset right now that is focused on "this whole
| semantic mapping thing that characterized historical AI
| research programs--figure out amazing strategy so that you need
| very little compute--did not bear fruit fast enough compared to
| the exponential increases in computing budget so let us instead
| see what we can do with tons of compute and vastly less
| strategy." It is a deliberate reorientation with some good and
| some bad parts to it. (Practical! Real results now! Who cares
| whether it "really knows" what it's doing? But also, you peer
| inside the black box and the numbers are quite inscrutable; and
| also, adversarial approaches routinely can train another
| network to find regions of the hypersurface where an obvious
| photograph of a lion is mischaracterized as a leprechaun or
| whatever.)
| dj_mc_merlin wrote:
| Machine learning is not a concept that is fundamentally
| incompatible with interpretability, and indeed research is
| being done in this area.
|
| One method for example is occlusion, removing pieces of input
| to assemble statistical representations of which parts your
| model cares about.
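|
| A minimal sketch of the occlusion idea (illustrative only; the
| 'predict' stub below stands in for a real model's class
| probability):
|
|     import numpy as np
|
|     def occlusion_map(image, predict, patch=8):
|         # Slide a gray patch over the image and record how much
|         # the score drops: a big drop = an important region.
|         h, w = image.shape
|         base = predict(image)
|         heat = np.zeros((h // patch, w // patch))
|         for i in range(0, h, patch):
|             for j in range(0, w, patch):
|                 x = image.copy()
|                 x[i:i+patch, j:j+patch] = 0.5  # gray out one patch
|                 heat[i // patch, j // patch] = base - predict(x)
|         return heat
|
|     # Stub "model" so the sketch runs: score = mean brightness.
|     img = np.random.rand(32, 32)
|     print(occlusion_map(img, predict=lambda x: x.mean()))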
|
| It's all still baby steps, but with time the theory will
| catch up.
|
| > If you had something bigger you'd call it "artificial
| intelligence research"
|
| Usually called just Data Science, and it does deal with that (we
| had lectures on interpretability of models at university).
| Igelau wrote:
| Switch it over to pikachu in a helmet wielding a blue
| lightsaber. Some hilariously bad results there.
| ralfd wrote:
| "pikachu in a xxx staring at its reflection in a mirror" is
| interesting. Both hilariously bad and impressive if the
| helmet is mirrored.
| m_rpn wrote:
| Nice result, but their claims are a little misleading, as always.
| They have come full circle: having a very powerful language model,
| they can attach it to other inputs (video, audio, ...), creating
| an ensemble system that gives the illusion that it has "knowledge"
| about domains and can reason in context, as they often state in
| all their works. But we should know better: what they are doing is
| very different, e.g. throwing a truckload of data into
| billion-parameter neural networks, spending millions, if not
| billions, of dollars, just to minimize a loss function in an
| adversarial environment (a GAN), in this case.
| ve55 wrote:
| I really do think AI is going to replace millions of workers very
| quickly, but just not in the order that we used to think of. We
| will replace jobs that require creativity and talent before we
| will replace most manual factory workers, as hardware is
| significantly more difficult to scale up and invent than
| software.
|
| At this point I have replaced a significant amount of creative
| workers with AI for personal usage, for example:
|
| - I use desktop backgrounds generated by VAEs (VD-VAE)
|
| - I use avatars generated by GANs (StyleGAN, BigGAN)
|
| - I use and have fun with written content generated by
| transformers (GPT3)
|
| - I listen to and enjoy music and audio generated by autoencoders
| (Jukebox, Magenta project, many others)
|
| - I don't purchase stock images or commission artists for many
| things I previously would have, when a GAN already exists that
| makes the class of image I want
|
| All of this has happened in the last year or so for me, and I
| expect that within a few more years this will be the case for
| vastly more people and in a growing number of domains.
| ErikAugust wrote:
| Isn't training data effectively a form of sampling?
|
| Couldn't any creator of images that a model was trained on sue
| for copyright infringement?
|
| Or do great artists really just steal (just at a massive
| scale)?
| ve55 wrote:
| Currently that is not the case:
|
| >Models in general are generally considered "transformative
| works" and the copyright owners of whatever data the model
| was trained on have no copyright on the model. (The fact that
| the datasets or inputs are copyrighted is irrelevant, as
| training on them is universally considered fair use and
| transformative, similar to artists or search engines; see the
| further reading.) The model is copyrighted to whomever
| created it.
|
| Source (scroll up slightly past where it takes you):
| https://www.gwern.net/Faces#copyright
| ErikAugust wrote:
| Thank you, this is the part I find most relevant:
|
| "Models in general are generally considered "transformative
| works" and the copyright owners of whatever data the model
| was trained on have no copyright on the model. (The fact
| that the datasets or inputs are copyrighted is irrelevant,
| as training on them is universally considered fair use and
| transformative, similar to artists or search engines; see
| the further reading.) The model is copyrighted to whomever
| created it. Hence, Nvidia has copyright on the models it
| created but I have copyright under the models I trained
| (which I release under CC-0)."
| visarga wrote:
| I bet they can claim copyright up to the gradients
| generated on their media, but in the end the gradients get
| summed up, so their contribution is lost in the cocktail.
|
| If I write a copyrighted text on a book, then I print a
| million other texts on top of it, in both white and black,
| mixing it all up to be like white noise, would the original
| authors have a claim?
| astrange wrote:
| Models can unpredictably memorize sensitive input data,
| so there can be a real copyright issue here, I think.
|
| https://arxiv.org/abs/1802.08232
|
| Worse, sometimes the input data is illegal to distribute
| for other reasons than copyright.
| plasticchris wrote:
| But does that still hold when the model memorized a chunk
| of the training data? Or can a network plagiarize output
| while being a transformative work itself?
| yowlingcat wrote:
| > We will replace jobs that require creativity
|
| Frankly, I think the "AI will replace jobs that require X"
| angle of automation is borderline apocalyptic conspiracy porn.
| It's always phrased as if the automation simply stops at making
| certain jobs redundant. It's never phrased as if the automation
| lowers the bar to entry from X to Y for /everyone/, which
| floods the market with crap and makes people crave the good
| stuff made by the top 20%. Why isn't it considered as likely
| that this kind of technology will simply make the best 20% of
| creators exponentially more creatively prolific in quantity and
| quality?
| ve55 wrote:
| > Why isn't it considered as likely that this kind of
| technology will simply make the best 20% of creators
| exponentially more creatively prolific in quantity and
| quality?
|
| I think that's well within the space of reasonable
| conclusions. For as much as we are getting good at generating
| content/art, we are also therefore getting good at assisting
| humans at generating it, so it's possible that pathway ends
| up becoming much more common.
| rich_sasha wrote:
| Not to undermine this development, but so far, no surprise, AI
| depends on vast quantities of human-generated data. This leads
| us to a loop: if AI replaces human creativity, who will create
| novel content for new generation of AI? Will AI also learn to
| break through conventions, to shock and rewrite the rules of
| the game?
|
| It's like efficient market hypothesis: markets are efficient
| because arbitrage, which is highly profitable, makes them so.
| But if they are efficient, how can arbitrageurs afford to stay
| in business? In practice, we are stuck in a half-way house,
| where markets are very, but not perfectly, efficient.
|
| I guess in practice, the pie for humans will keep on shrinking,
| but won't disappear too soon. Same as horse maintenance
| industry, farming and manufacturing, domestic work etc. Humans
| are still needed there, just a lot less of them.
| p1esk wrote:
| _if AI replaces human creativity, who will create novel
| content for new generation of AI?_
|
| The vast majority of human-generated content is not very novel
| or creative. I'm guessing less than 1% of professional human
| writers or composers create something original. Those people
| are in no danger of being replaced by AI, and will probably
| be earning more money as a result of more value being placed
| on originality of content. Humans will strive (or be forced)
| to be more creative, because all non-original content
| creation will be automated. It's a win-win situation.
| ve55 wrote:
| > Will AI also learn to break through conventions, to shock
| and rewrite the rules of the game?
|
| I think AlphaGo was a great in-domain example of this. I
| definitely see things I'd colloquially refer to as
| 'creativity' in this DALL-E post - you can decide for
| yourself - but that still isn't to claim it matches what some
| humans can do.
| rich_sasha wrote:
| True, but AlphaGo exists in a world where everything is
| absolute. There are new ways of playing Go, but the same
| rules.
|
| If I train an AI on classical paintings, can it ever invent
| Impressionism, Cubism, Surrealism? Can it do irony? Can it
| come up with something altogether new? Can it do meta?
| "AlphaPaint, a recursive self-portrait"?
|
| Maybe. I'm just not sure we have seen anything in this
| dimension yet.
| ve55 wrote:
| >If I train an AI on classical paintings, can it ever
| invent Impressionism, Cubism, Surrealism?
|
| I see your point, but it's an unfair comparison: if you
| put a human in a room and never showed them anything
| except classical paintings, it's unlikely they would
| quickly invent cubism either. The humans that invented
| new art styles had seen so many things throughout their
| life that they had a lot of data to go off of.
| Regardless, I think we can already do enough neural style
| transfer to invent new styles of art.
| xpl wrote:
| > how can arbitrageurs afford to stay in business
|
| Most arbitrageurs cannot stay in the business; it's the law
| of diminishing returns. Economies of scale eventually prevent
| small individual players from profiting from the market; only
| a few big-ass hedge funds can stay, because thanks to their
| investments they can get preferential treatment from exchanges
| (significantly lower / zero / negative fees, co-located
| hardware, etc.) which makes the operation reasonable for them.
| With enough money you can even build your own physical cables
| between exchanges to outperform the competitors in latency
| games. I'm a former arbitrageur, by the way :)
|
| Same with AI-generated content. You would have to be
| absolutely brilliant to compete with AI. Only a few select
| individuals would be "allowed" to enter the market. Not even
| sure that it has something to do with the quality of the
| content, maybe it's more about prestige.
|
| You see, there already are gazillions of decent human
| artists, but only a few of them are really popular. So the
| top-tier artists would probably remain human, because we need
| someone real to worship. Their producers would surely use
| AI as a production tool, depicting it as a human work. But
| all the low-tier artists would be totally pushed out of the
| market. There will be simply no job for a session musician or
| a freelance designer.
| ryan93 wrote:
| Those don't seem in any way similar to, say, writing a TV show
| or animating a Pixar movie.
| ve55 wrote:
| I agree, and due to the amount of compute that is required
| for those types of works I think those are still quite a while
| away.
|
| But the profession for creative individuals consists of much
| more than highly-paid well-credentialed individuals working
| at well-known US corporations. There are millions of artists
| that just do quick illustrations, logos, sketches, and so on,
| on a variety of services, and they will be replaced far
| before Pixar is.
| [deleted]
| ignoranceprior wrote:
| Do you think investing in MSFT/GOOGL is the best way to profit
| off this revolution?
| ve55 wrote:
| It's too hard to say I think. Big players will definitely
| benefit a lot, so it probably isn't a _bad_ idea, but if you
| could find the right startups or funds, you might be able to
| get significantly more of a return.
| karmasimida wrote:
| I think this is actually not a bad thing.
|
| I won't say many of those things are creativity-driven. They
| are more like automated asset generation.
|
| One use case of such a model would be in the gaming industry:
| generating large amounts of assets quickly. This process alone
| takes years, and gets more and more expensive as gamers demand
| higher and higher resolutions.
|
| AI can make this process much more tenable, bringing down the
| overall cost.
| sushisource wrote:
| > - I use and have fun with written content generated by
| transformers (GPT3)
|
| > - I listen to and enjoy music and audio generated by
| autoencoders (Jukebox, Magenta project, many others)
|
| Really, you've "replaced" normal music and books with these?
| Somehow I doubt that.
| notJim wrote:
| What are you talking about, this is my favorite album:
| https://www.youtube.com/watch?v=K0t6ecmMbjQ
| ve55 wrote:
| Not entirely, no - I hope I didn't imply that. I listen to
| human-created music every day. I just mean to say that I've
| also listened to AI-created music that I've enjoyed, so it's
| gone from being 0% of what I listen to to 5%, and presumably
| may increase much more later.
| p1esk wrote:
| You should try Aiva (http://aiva.ai). At some point I was
| mostly listening to compositions I generated through that
| platform. Now I'm back to Spotify, but AI music is
| definitely on my radar.
| ve55 wrote:
| Looks great, thanks for the suggestion
| [deleted]
| Impossible wrote:
| I believe that AI will accelerate creativity. This will have a
| side effect of devaluing some people's work (like you
| mentioned), but it will also increase the value of some types
| of art and, more importantly, make it possible to do things
| that were impossible before, or allow small teams and
| individuals to produce content that was prohibitively
| expensive.
| [deleted]
| minimaxir wrote:
| There still needs to be some sort of human curation, lest
| bad/rogue output sink the entire AI-generated
| industry. (in the case of DALL-E, OpenAI's new CLIP system is
| intended to mitigate the need for cherry-picking, although from
| the final demo it's still qualitative)
|
| The demo inputs here for DALL-E are curated and utilize a few
| GPT-3 prompt engineering tricks. I suspect that for typical
| unoptimized human requests, DALL-E will go off the rails.
| andybak wrote:
| Personally speaking I don't want curation. What is
| fascinating about generative AI is the failure modes.
|
| I want the stuff that no human being could have made - not
| the things that could pass for genuine works by real people.
| minimaxir wrote:
| Failure modes are fun when they get 80-90% of the way there
| and hit the uncanny valley.
|
| Unfortunately many generations fail to hit that.
| ve55 wrote:
| Yes, but there's no reason we can't partially solve this by
| throwing more data at the models, since we have vast amounts
| of data we can use for that (ratings, reviews, comments,
| etc), and we can always generate more en masse whenever we
| need it.
| minimaxir wrote:
| This isn't a problem that can be solved with more data.
| It's a function of model architecture, and as OpenAI has
| demonstrated, larger models generally perform better even
| if normal people can't run them on consumer hardware.
|
| But there is still a _lot_ of room for more clever
| architectures to get around that limitation. (e.g.
| Shortformer)
| ve55 wrote:
| I think it's both - we have a lot of architectural
| improvements that we can try now and in the future, but I
| don't see why you can't take the output of generative art
| models, have humans rate them, and then use those ratings
| to improve the model such that its future art is likely
| to get a higher rating.
| A4ET8a8uTh0 wrote:
| You are probably right. Still, there is hope that this is just a
| prelude to getting closer to a Transmetropolitan box (assuming
| we can ever figure out how to make an AI box that can make
| physical items based purely on information given by the user).
| commonturtle wrote:
| This is simultaneously amazing and depressing, like watching
| someone set off a hydrogen bomb for the first time and marveling
| at the mushroom cloud it creates.
|
| I really find it hard to understand why people are optimistic
| about the impact AI will have on our future.
|
| The pace of improvement in AI has been really fast over the last
| two decades, and I don't feel like it's a good thing. Compare the
| best text generator models from 10 years ago with GPT-3. Now do
| the same for image generators. Now project these improvements 20
| years into the future. The amount of investment this work is
| getting grows with every such breakthrough. It seems likely to me
| we will figure out general-purpose human-level AI in a few
| decades.
|
| And what then? There are so many ways this could turn into a
| dystopian future.
|
| Imagine for example huge mostly-ML operated drone armies, tens of
| millions strong, that only need a small number of humans to
| supervise them. Terrified yet? What happens to democracy when
| power doesn't need to flow through a large number of people? When
| a dozen people and a few million armed drones can oppress a
| hundred million people?
|
| If there's even a 5% chance of such an outcome (personally I
| think it's higher), then we should be taking it seriously.
| desideratum wrote:
| Nick Bostrom's "Superintelligence" is a sober perspective on
| this issue and a very worthwhile read.
| commonturtle wrote:
| Yup that's a good recommendation. I've read it and some of
| the AI Safety work that a small portion of the AI community
| is working on. At the moment there seems no reason to believe
| that we can solve this.
| stevofolife wrote:
| Regulations. That's what the government is for. You think any
| country is going to let someone operate millions of drones at
| their will? Yeah ok.
| ajnin wrote:
| I think the points you make are very important. Not only the
| "Terminator" scenario but also the "hyper-capitalism" scenario.
| But the solution is not to stop working on such research, it is
| political.
| kertoip_1 wrote:
| You are assuming that AI will magically appear in one set of
| hands only. We can prevent that: as developers we can make AI
| research open and provide AI tools to the masses in order to
| keep "balance". If everyone had the same power, then it wouldn't
| be such a big advantage anymore.
| dash2 wrote:
| That's not obvious. What if _everyone_ has the tools to
| create their own army of nuclear-tipped killer drones?
| m12k wrote:
| The scary thing about automation isn't the technology itself.
| It's that it breaks the tenuous balance of power between those
| who own and those who work - if the former can just own robots
| instead of hiring the latter, what will become of the latter?
| The truth is, what's scary about that imbalance of power is
| already true, it's just that until now, technological
| limitations made that imbalance incomplete - workers still had
| some bargaining power. That is about to go away, and what will
| be left is the realization that the solution to this isn't
| ludditism, the solution is political. As it always was.
| CuriouslyC wrote:
| That's not exactly true. A lot of (low-level) human labor will
| be made irrelevant, but AI tools will allow people to easily
| work productively at a higher level. Musicians will be able
| to hum out templates of music, then iteratively refine the
| result using natural language and gestures. Writers will be
| able to describe a plot, and iteratively refine the prose and
| writing style. Movie producers will be able to describe
| scenes then iteratively refine the angles, lighting, acting,
| cuts, etc. It will be a golden age for creativity, where
| there's an abundance of any sort of art or entertainment
| you'd like to consume, and the only problem is locating it in
| the sea of abundance.
|
| The only issue I see here is that government will need to
| take a hand in mitigating capitalistic wealth inequality, and
| access to creative tools will need to be subsidized for low
| income individuals (assuming we can't bring the compute cost
| down a few orders of magnitude).
| Simon321 wrote:
| How typically cynical of human beings: a wondrous technology
| comes along that can free mankind of tedious work and
| massively improve our lives, maybe even eliminate scarcity
| eventually, and all people can think about is how it could be
| bad for us.
| CuriouslyC wrote:
| Armies of high powered smart drones aren't going to be a thing
| until we figure out security, and I'm not sure that's ever
| going to happen. Having people in the loop is affordable and
| much more expensive/time consuming to subvert.
| [deleted]
| lbrito wrote:
| >It seems likely to me we will figure out general-purpose
| human-level AI in a few decades.
|
| "The singularity is _always near_". We've been here before
| (1950s-1970s); people hoping/fearing that general AI was just
| around the corner.
|
| I might be severely outdated on this, but the way I see it AI
| is just rehashing already existent knowledge/information in
| (very and increasingly) smart ways. There is absolutely no
| spark of creativity coming from the AI itself. Any "new"
| information generated by AI is really just refined noise.
|
| Don't get me wrong, I'm not trying to take a leak on the field.
| Like everyone else I'm impressed by all the recent
| breakthroughs, and of course something like GPT is infinitely
| more advanced than a simple `rand` function. But the ontology
| remains unchanged; we're just doing an extremely opinionated,
| advanced and clever `rand` function.
| 3pt14159 wrote:
| No we're not.
|
| About a decade ago I trained a model on Wikipedia which was
| tuned to classify documents into what branch of knowledge the
| document could be part of. Then I fed in one of my own blog
| posts. The second highest ranking concept that came back to
| me was "mereology" a term I had never even heard of and one
| that was quite apt for the topic I was discussing in the blog
| post.
|
| My own software, running on the contents of millions of
| authors' work, ingesting my own blog post, taught _me_, the
| orchestrator of the process, something about my own work. This
| feedback loop is accelerating, and just because it takes decades
| for the irrefutable to come, it doesn't mean that it never will.
| People in the early '40s said atomic weapons would never
| happen because it would be too difficult. For some people
| nothing short of seeing is believing, but those with
| predictive minds know that this truly is just around the
| corner.
| captainmuon wrote:
| Maybe I'm cynical, but I'm really skeptical. What if this is just
| some poor image generation code and a few hundred minimum wage
| workers manually assembling the examples? Unless I can feed it
| arbitrary sentences we can never know.
|
| I would be disappointed, but not surprised, if OpenAI turns out to
| be the Theranos of AI...
| apatap wrote:
| At least GPT-3 can generate texts much faster than a worker
| would need to create them manually.
| captainmuon wrote:
| Right, but I bet the images shown here were preselected.
| sanxiyn wrote:
| They swear they aren't preselected. I am willing to bet
| you, but I am not sure how to settle the bet.
| nojvek wrote:
| Wow. This is amazing. Although I wish they documented how much
| compute and data was used to get these results.
|
| I absolutely believe we'll crack the fundamental principles of
| intelligence in our lifetimes. We now have the capability to
| process all public data available on the internet (of which
| Wikipedia is a huge chunk). We have so many cameras and
| microphones (one in each pocket).
|
| It's also scary to think if it goes wrong (the great filter for
| fermi paradox). However I'm optimistic.
|
| The brain only uses 20 watts of power to do all its magic. The
| entire human body is built from 700MB of data in the DNA. The
| fundamental principles of intelligence is within reach if we look
| from that perspective.
|
| Right now GPT3 and DALL-E seem to be using an insane amount of
| computation to achieve what they are doing. My prediction is that
| in 2050, we'll have pretty good intelligence in our phones that
| has deep understanding (language and visual) of the world around
| us.
| SoSoRoCoCo wrote:
| > Although I wish they documented how much compute and data was
| used to get these results.
|
| I'm hearing astonishing numbers, in the tens of megawatts range
| for training these billion-parameter models.
|
| And I wish they showed us all the rejected images. If those
| images (like the snail harp) were the FIRST pass of the release
| candidate model.... wow... but how much curating did they do?
|
| EDIT: Units. Derp.
| ajnin wrote:
| > in the tens of megawatts
|
| But for how long? 1 second, 1 hour, 1 month? The energy
| matters more than the power.
| nullc wrote:
| > tens of mega-joule range
|
| do you mean tera-joules?
|
| A hundred megajoules is about three bucks at 10 cents per
| kwh.
|
| I routinely do giga-joule level computations using just a
| rack of computers in my garage, they're no big deal.
| eutectic wrote:
| metawatts?
| nullc wrote:
| Joules is a unit of total computation, watts is a unit of
| computation rate. :P
|
| Metawatt is a unit for rates of speculation, uninformed
| by multiplication, about AI energy usage.
| eutectic wrote:
| Oops, *mega, obviously! But I did mean watts.
| StavrosK wrote:
| MWh is what you'd want.
| eutectic wrote:
| I was thinking power consumption. Probably a mis-
| correction anyway.
| [deleted]
| jackric wrote:
| > The entire human body is built from 700MB of data in the DNA.
|
| I think this notion is misleading. It doesn't relate to the
| ease of simulating it on our current computers. You'll need a
| quantum computer to emulate this ROM in anything like realtime.
|
| The DNA program was optimised to execute in an environment
| offering quantum tunnelling, multi component chemical reactions
| etc.
| nojvek wrote:
| My point wasn't to emulate the DNA in a virtual machine. The
| point was that between humans and chimps (our closest DNA
| relatives) we're 99% the same. So the high-order intelligence
| that gives rise to written language and tool building is
| somewhere in that 700MB of DNA code. And 1% of that (just
| 7MB) is responsible for creating the smartest intelligence we
| know (humans).
|
| In that sense the architecture of intelligence isn't very
| complicated. The uniformity of the isocortex, of which we have
| the most relative to brain size of any animal, says we ought
| to be able to replicate its behavior in a machine.
|
| The isocortex/neocortex is where the gold is. It's very
| uniform when seen under microscopes. Brain cells from one
| region can be put in another region and they work just fine.
| All of the above says intelligence is some recursive architecture
| of information processing. That's why I'm optimistic we'll crack
| it.
| ZeikJT wrote:
| I think what the parent was saying is important though,
| that 700MB of "data" isn't complete. It's basically just
| really really good compression that requires the runtime of
| our universe to work properly. The way proteins form and
| interact, the way physics works, etc are all requirements
| for that DNA to be able to realize itself as complex
| thinking human beings.
| thom wrote:
| Do you think we'll crack the principles, or do you think we'll
| just have very powerful models without really knowing what
| makes them clever?
| nojvek wrote:
| My bet is understanding the fundamental principles. Like
| building an airbus plane or starship requires fundamental
| understanding of aerodynamic principles, chemistry, materials
| and physics.
|
| DNNs will definitely not get us there in their current form.
| WitCanStain wrote:
| I am very curious to see if concepts from cognitive science
| and theoretical linguistics (like the Language of Thought
| paradigm as a framework of cognition or the Merge function
| as a fundamental cognitive operation) will be applied to
| machine learning. They seem to be some of the best
| candidates for the fundamental principles of cognition.
| pontus wrote:
| Maybe a nitpick, but there's a difference between energy
| consumption during training and inference. If you want to talk
| about the energy necessary to train a human brain, it involves
| years of that 20W power consumption. What is the power
| consumption for inference time for Dall-E?
| littlestymaar wrote:
| I did some quick napkin math for Lee Sedol vs AlphaGo a few
| weeks ago: https://news.ycombinator.com/item?id=25493358
| nojvek wrote:
| Wow. Good to know that entire lifetime energy consumption
| is 50 MWh. At $0.1/KWh, you're looking at $5000 in
| equivalent electric energy consumption over entire lifetime
| of a human being.
|
| The brain uses 20W of power. For a life time of ~80 years,
| that is 14MWh of energy usage. Suppose we say the brain
| trains for the first 25 years then that is 4.38 MWh.
| Equivalent electric energy consumption is only at $438.
|
| So yeah, the brain is quite efficient both in hardware and
| software.
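|
| The arithmetic, for anyone who wants to check it:
|
|     watts = 20
|     kwh_per_year = watts * 24 * 365 / 1000   # ~175 kWh/year
|
|     print(kwh_per_year * 80)         # ~14,000 kWh ~= 14 MWh
|     print(kwh_per_year * 25)         # ~4,380 kWh ~= 4.38 MWh
|     print(kwh_per_year * 25 * 0.10)  # ~$438 at $0.10/kWh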
| pontus wrote:
| That's still really surprising I think. I would have
| imagined that it would have been off by many orders of
| magnitude. The fact that these models are within a factor
| of 3-10 of the human consumption is pretty impressive.
|
| That being said, these models are training only for very
| specific tasks whereas obviously the human brain is far
| more sophisticated in terms of its capabilities.
| littlestymaar wrote:
| Bear in mind that only a tiny fraction of the energy
| spent by Sedol's brain during his whole life was
| dedicated to learning Go. Even while playing Go, a human
| neural system spends a big part of its energy doing
| _mundane_ stuff like moving your hand, standing up,
| deciphering the inputs coming from the eyes and every
| other sensory body part and subconsciously processing
| them (my back hurts, I need to change posture; my opponent
| smells good; the light in the room is too bright; etc.).
| Interestingly enough, doing most of these things is also
| a big challenge for AI today.
| burrows wrote:
| Not to mention the billions of years spent in the
| evolutionary pipeline.
| littlestymaar wrote:
| Well, if you want to go that route, you'd need to count all
| the energy spent to build computers of all kind since the
| 50s, and also all the energy spent to sustain the lives of
| people working on AI. And, well, all the millions of years
| spent in the evolutionary pipeline before these people
| were born ;)
| martamorena9 wrote:
| > I absolutely believe we'll crack the fundamental principles
| of intelligence in our lifetimes.
|
| I tend to agree. However this looks a lot like the beginning of
| the end for the human race as well. Perhaps we are really just
| a statistical approximation device.
| nojvek wrote:
| Yeah, that's why I mentioned the great filter of the Fermi
| paradox.
|
| I also believe humans in our current species form won't
| become a space-faring species. We're pretty awful as space
| travelers.
|
| It is very likely that we'll have robots with human like
| intelligence, sensor and motor capabilities sent as probes to
| other planets to explore and carry on the human story.
|
| But future is hard to predict. I do know that if the
| intelligence algorithm is only in the hands of Google and
| Facebook, we are doomed. This is a thing that ought to be
| open source and equally beneficial to everyone.
| dfischer wrote:
| What if reality is basically a slightly more advanced form of
| This? Thoughts are generative finite infinite potentials and
| reality is the render. Interesting.
| drdeca wrote:
| I don't know what you mean by "finite infinite potentials". It
| seems a little word-salad-y?
| dfischer wrote:
| Seriously downvotes? Lol. Terrible configuration and what a way
| to silence. At this point I'll take my thoughts elsewhere.
| Enjoy your echo chamber.
| sircastor wrote:
| In various episodes of Star Trek: The Next Generation, the crew
| asks the computer to generate some environment or object with
| relatively little description. It's a story telling tool of
| course, but looking at this, I can begin to imagine how we might
| get there from here.
| dfischer wrote:
| Almost as if thoughts and reality are of the same thing.
| inferense wrote:
| In spite of the close architectural resemblance to VQ-VAE-2,
| it definitely pushes the text-to-image synthesis domain forward.
| I'd be curious to see how well it can perform on a multi-object
| image setting which currently presents the main challenge in the
| field. Also, I wouldn't be surprised if these results were
| limited to OpenAI's scale of computing resources. All in all,
| great progress in the field. The pace of development here is
| simply staggering, considering that a few years back we could
| hardly generate any image in high fidelity.
| minimaxir wrote:
| The way this model operates is the equivalent of machine learning
| shitposting.
|
| Broke: Use a text encoder to feed text data to an image
| generator, like a GAN.
|
| Woke: Use a text and image encoder _as the same input_ to decode
| text and images _as the same output_
|
| And yet, due to the magic of Transformers, it works.
|
| From the technical description, this seems feasible to clone
| given a sufficiently robust dataset of images, although the scope
| of the demo output implies a much more robust dataset than the
| ones Microsoft has offered publicly.
| dfischer wrote:
| Shows you where the role of a meme and a shit poster may exist
| in a cosmological technological hierarchy. Humans are just
| rendering nodes replicating memes, man. /s in the dude voice
| from big Lebowski.
| thunderbird120 wrote:
| It's not really surprising given what we now know about
| autoregressive modeling with transformers. It's essentially a
| game of predict hidden information given visible information.
| As long as the relationship between the visible and hidden
| information is non-random you can train the model to understand
| an amazing amount about the world by literally just predicting
| the next token in a sequence given all the previous ones.
|
| I'm curious if they do a backward pass here; it would probably
| have value. They seem to describe placing the text tokens
| first, meaning that once you start generating image tokens all
| the text tokens are visible. That would have the model learning
| to generate an image with respect to a prompt but you could
| also literally just reverse the order of the sequence to have
| the model also learn to generate prompts with respect to the
| image. It's not clear if this is happening.
| lukeplato wrote:
| Is this sort of what's happening with the CLIP classifier [1]
| used to rank the generated images?
|
| > Similar to the rejection sampling used in VQVAE-2, we use
| CLIP to rerank the top 32 of 512 samples for each caption in
| all of the interactive visuals. This procedure can also be
| seen as a kind of language-guided search, and can have a
| dramatic impact on sample quality.
|
| > CLIP pre-trains an image encoder and a text encoder to
| predict which images were paired with which texts in our
| dataset. We then use this behavior to turn CLIP into a zero-
| shot classifier. We convert all of a dataset's classes into
| captions such as "a photo of a dog" and predict the class of
| the caption CLIP estimates best pairs with a given image.
|
| [1] https://openai.com/blog/clip/
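|
| The reranking step itself is simple. A sketch, with a stub
| scorer standing in for CLIP (a real one would embed caption and
| image and return their cosine similarity):
|
|     import random
|
|     def clip_score(caption, sample):
|         return random.random()   # stub; see comment above
|
|     caption = "an armchair in the shape of an avocado"
|     samples = [f"image_{i}" for i in range(512)]  # 512 draws
|
|     top32 = sorted(samples, reverse=True,
|                    key=lambda s: clip_score(caption, s))[:32]
|     print(top32[:5])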
| minimaxir wrote:
| That approach wouldn't work out of the box; it sees text for
| the first 256 tokens and images for the following 1024
| tokens, and tries to predict the same. It likely would not
| have much to go on if you gave it the 1024 tokens for the
| image first and then the 256 for the text afterward.
|
| A network optimizing for both use cases (e.g. the training
| set is half 256 + 1024, half 1024 + 256) would _likely_ be
| worse than a model optimizing for one of the use cases, but
| then again models like T5 argue against it.
| thomasahle wrote:
| It's actually a bit more complicated, since DALL-E uses CLIP
| for reranking, and CLIP is itself trained using separate text
| and image encoders: https://openai.com/blog/clip/
|
| At some point we'll have so many models based on so many other
| models that it will no longer be possible to tell which
| techniques are really involved.
| [deleted]
| nabla9 wrote:
| Shitposts are more creative. What I would like to see is more
| extrapolation and complex mixing:
|
| "A photo of a iPhone from the stone age."
|
| "Adolf Hitler pissing against the wind and enjoying it."
|
| "Painting: Captain Jean-Luc Picard crossing of the Delaware
| River in a Porsche 911".
| ricardobeat wrote:
| You can get "a computer from the 1900s" in the examples in
| the post.
| numpad0 wrote:
| Repeatable, measurable, automated image meme shitposting is
| absolutely a destructive device though.
| throwaway2245 wrote:
| You're describing a Jim'll Paint It AI bot
| https://jimllpaintit.tumblr.com/
| Tycho wrote:
| Recently heard a resident machine learning expert describe GPT-3
| as 'not revolutionary in the slightest' or something like that.
| minimaxir wrote:
| It's not revolutionary, just a typical-but-notable iterative
| step in NLP. Which is fine!
|
| I wrote a blog post on that a few months ago after playing a
| bit with GPT-3, and it holds up.
| https://news.ycombinator.com/item?id=23891226
| jokethrowaway wrote:
| It's not, but it showed that we can get an order of magnitude
| better results by adding an order of magnitude more data.
|
| To be honest, it's not where I'd like to see efforts in the
| field go.
|
| Not because I'm afraid of AI taking over, but because I'd
| rather have humans recreate something comparable to a human
| brain (functionality wise).
| visarga wrote:
| Who knows, maybe in a few years you will be amazed at the new
| universal transformer chip that runs on 20W of power and can
| do almost any task. No need for retraining, just speak to it,
| show it what you need. Even prompt engineering has been
| automated (https://arxiv.org/abs/2101.00121) so no more
| mystery. So much for the new hot job of GPT prompt engineer
| that would replace software dev.
| asbund wrote:
| I was skeptical before, but now I'm open to this idea
| jokethrowaway wrote:
| we've been making these since the beginning of time, we
| call them humans
| commonturtle wrote:
| Humans want health insurance and 40-hour work-weeks.
| The super-smart AGI that will exist 20 years from now
| won't.
| visarga wrote:
| It's revolutionary in costs, and delivers for every dollar
| spent.
| FL33TW00D wrote:
| I think that they're correct saying that GPT-3 isn't
| revolutionary, since it just demonstrates the power of scaling.
| However I would argue that the underlying architecture, the
| Transformer (GP(T)), is/was/will be revolutionary.
| imhoguy wrote:
| I just got a creepy thought about what genetic engineering
| "GEN-E" could bring in a couple of decades :(
|
| IN: "give me living giraffe turtle"
|
| OUT: a few weeks later a chimera crawls out of the AI lab box
| wwarner wrote:
| Similar to Wordseye https://www.wordseye.com/
| thepace wrote:
| WordsEye seems to be about scene generation out of pre-existing
| building blocks, whereas DALL-E is about creating those
| building blocks themselves.
| deeplstm wrote:
| This is super impressive!! Those generated images are quite
| accurate and realistic. Here are some of my thoughts and an
| explanation of how they use a discrete vocabulary to describe
| an image: https://youtu.be/UfAE-1vdj_E
| visarga wrote:
| I'm wondering why the image comes out non-blocky, since
| transformers would take slices of the image as input. They say
| they have about 1024 tokens for the image and that would mean
| 32x32 patches. How is it possible that these patches align along
| the edges so well and don't have JPEG-like artifacts?
| minimaxir wrote:
| If you read footnote #2, the source images are 256x256 but
| downsampled using a VAE, and presumably upsampled using a VAE
| for publishing (IIRC they are less prone to the infamous GAN
| artifacts).
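|
| A rough sketch of that round trip (nearest-codebook quantization
| only; the 8192-entry vocabulary and 32x32 grid are per the post,
| while the 64-dim vectors and random data here are made up, and
| the real model uses a learned convolutional encoder/decoder):
|
|     import numpy as np
|
|     rng = np.random.default_rng(0)
|     codebook = rng.standard_normal((8192, 64))  # 8192 visual codes
|
|     # Encoder output: one vector per 8x8 region of a 256x256
|     # image, i.e. a 32x32 grid; quantize each to nearest code.
|     grid = rng.standard_normal((32 * 32, 64))
|     d = ((grid**2).sum(1, keepdims=True)
|          - 2 * grid @ codebook.T
|          + (codebook**2).sum(1))
|     tokens = d.argmin(1).reshape(32, 32)
|
|     print(tokens.shape)  # the 1024 image tokens the transformer
|     # predicts; a learned decoder maps them back to smooth pixels,
|     # which is why the output isn't blocky at patch boundaries.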
| hnthrowopen wrote:
| Is there a link to the git repo or is OpenAI not really open?
| jokethrowaway wrote:
| the only open thing is the name
| wccrawford wrote:
| I suspect you meant for Dall-E specifically, but this is their
| repo. Found on their about page.
|
| https://github.com/openai/
| CyberRabbi wrote:
| Seems like we're getting closer to AI driven software
| engineering.
|
| Prompt: a Windows GUI executable that implements a scientific
| calculator.
| jle17 wrote:
| There are some attempts to get AI Dungeon (GPT-2/3 based game)
| to generate code. This scenario for example (you need to create
| an account to launch it):
| https://play.aidungeon.io/main/scenarioView?publicId=af4a05f....
| hooande wrote:
| What you'll get is the same thing as GPT-3: the equivalent of
| googling the prompt. You can google "implement a scientific
| calculator" and get multiple tutorials right now.
|
| You'll still need humans to make anything novel or interesting,
| and companies will still need to hire engineers to work on
| valuable problems that are unique to their business.
|
| All of these transformers are essentially trained on "what's
| visible to google", which also defines the upper bound of their
| utility
| commonturtle wrote:
| > You'll still need humans to make anything novel or
| interesting, and companies will still need to hire engineers
| to work on valuable problems that are unique to their
| business.
|
| Give it 10 years :) GPT-10 will probably be able to replace a
| sizeable proportion of today's programmers. What will GPT-20
| be able to do?
| [deleted]
| adamredwoods wrote:
| Possibly, but in software the realm of errors is wider and more
| detrimental. With imagery, the human mind will fill in the gaps
| and allow interpretation. With software, not so much.
| visarga wrote:
| True, but the human mind needs an expensive, singleton body
| in the real world, while a code writing GPT-3 only needs a
| compiler and a CPU to run its creations. Of course they would
| put a million cores to work at 'learning to code' so it would
| go much faster. Compare that with robotics, where it's so
| expensive to run your agents. I think this direction really
| has a shot.
| social_quotient wrote:
| Would someone like GitHub be the right person to solve
| this? They have ALL of the code.
| ravi-delia wrote:
| To be fair, anyone else has almost all the code, since
| it's all public.
| ricardobeat wrote:
| Someone did this simply by giving GPT-3 some code samples last
| year:
| https://twitter.com/sharifshameem/status/1282676454690451457...
|
| Strongly recommend watching the whole video!
| dustinls wrote:
| Prompt: One self-improving self-optimizing misanthropic quine
| please
| Triv888 wrote:
| Without seeing the source, they could be mainly using Google
| image search for that (or Yandex which is much better for image
| search and reverse image search).
| flanbiscuit wrote:
| Just yesterday I was joking with my coworker that I would like a
| tool where I could create meme images from a text description and
| today I open HN and here we are. This looks amazing!
| an4rchy wrote:
| Very impressive results.
|
| This seems like it could be a great replacement for
| searching/creating your own stock photo/images.
|
| Hopefully all output is copyright friendly.
| unnouinceput wrote:
| Does it allow refining the result over iterations? Meaning,
| after getting version one, can you apply more words to refine
| the image toward a closer description? Because if it does, this
| can become a very good tool for getting a reliable picture from
| a witness asked to describe a suspect. Combine this with
| China's existing massive surveillance face recognition and you
| can locate the suspect (or the political dissident) as fast as
| you can get your witness in front of a laptop running this
| software.
|
| It's a tool, and like any other existing tool it will be used for
| both bad and good.
| brian_herman wrote:
| Now we just have to wait for huggingface to create an open source
| implementation. So much for openness, I guess - if you go on
| Microsoft Azure you can use ClosedAI.
| TravisLS wrote:
| If I put text into this tool and generate an original and unique
| image, who owns that image? If it's OpenAI, do they license it?
| durpkingOP wrote:
| I predict that one day you'll be able to create animations/videos
| with a variation of this: you could define characters/scripts/etc.,
| insert a story, and it generates a movie.
| PIKAL wrote:
| When will people wake up and realize that ML and AI are out of
| control?
| natch wrote:
| I tried a working demo of a system like this in Kristian
| Hammond's lab at Northwestern University 20 years ago. Actually
| his system was generating MTV style videos from web images and
| music with just some keywords as input. He had some other amazing
| (at the time) stuff. The GPT 3 part of this gives it a lot more
| potential of course, so I don't want to take away from that. Just
| saying though, since they didn't reference Hammond's work, that
| this party has been going on for a while.
|
| https://www.mccormick.northwestern.edu/research-faculty/dire...
| ramsrigouthamg wrote:
| OpenAI did with DALL.E what I envisioned to do with AiArtist :)
| (https://www.aiartist.io/)
|
| An AI to provide illustrations to your written content.
|
| https://www.linkedin.com/in/ramsrig/
| https://twitter.com/ramsri_goutham
| fumblebee wrote:
| Does anyone have any insight on how much it would cost for OpenAI
| to host an online, interactive demo of a model like this? I'd
| expect a lot - even just for inference - based on the size of the
| model and the expected virality of the demo, but have no
| reference points for quantifying.
| littlestymaar wrote:
| You can edit the prompt to play a bit with it. (The results are
| far less good than what's featured in the blog post, though...)
| nl wrote:
| The "collection of glasses sitting on a table" example is
| excellent.
|
| Some pics are of drinking glasses and some are of eye glasses,
| and one has both.
| adamredwoods wrote:
| I also like the telephones from different eras, including the
| future.
| [deleted]
| letsburnthis wrote:
| If I decide to make one of those exact chairs in the shape of
| an avocado, can I be sued for copyright infringement?
| renjimen wrote:
| Depends on who is suing you: OpenAI for using their model
| results, or the owner(s) of the data their model was trained
| on? Either way, it's a grey zone that copyright law hasn't come
| to grips with yet. See
| https://scholarlykitchen.sspnet.org/2020/02/12/if-my-ai-wrot...
| xdeals1209 wrote:
| https://couponsxdeals.com/ has been doing this for a little while
| now and I've been pretty happy with the performance.
| TedDoesntTalk wrote:
| > an illustration of a baby daikon radish in a tutu walking a dog
|
| Wow!
| aaron695 wrote:
| The real power here is a Google destroyer.
|
| These AIs can't yet produce things of value to humans, but I
| doubt Google's AI could know that.
|
| Pump out billions of pages of text and pictures and it should
| swamp Google.
| dinkleberg wrote:
| RIP to all the fiverr artists out there.
|
| This is impressive.
| victor9000 wrote:
| Given ClosedAI's recent moves, I doubt the public will ever
| have access to this. So I think those artists will be just
| fine.
| the8472 wrote:
| You must be using a very short definition of "ever". These
| kinds of works will get replicated if they're not published.
| totetsu wrote:
| This would make the process of creating elaborately encoded
| visual mnemonics faster.
| asbund wrote:
| This is amazing
| desideratum wrote:
| Some truly impressive results. I'll raise my usual point for when
| a fancy new (generative) model comes out, and I'm sure some of
| the other commenters have alluded to this. The examples shown are
| likely from a set of well-defined (read: lots of data, high bias)
| input classes for the model. What would be really interesting is
| how the model generalizes to /object concepts/ that have yet to
| be seen, and which have abstract relationships to the examples it
| has seen. Another commenter here mentioned "red square on green
| square" working, but "large cube on small cube", not working.
| Humans are able to infer and understand such abstract concepts
| with very few examples, and this is something AI isn't as close
| to as it might seem.
| adsche wrote:
| Yes, I don't really see impressive language (i.e. GPT3) results
| here? It seems to morph the images of the nouns in the prompt
| in an aesthetically-pleasing and almost artifact-free way (very
| cool!).
|
| But it does not seem to 'understand' anything, like some other
| commenters have said. Try '4 glasses on a table' and you will
| rarely see 4 glasses, even though that is a very well-defined
| input. I would be more impressed with the language model if it
| had a working prompt like: "A teapot that does _not_ look like
| the image prompt."
|
| I think some of these examples trigger some kind of bias, where
| we think: "Oh wow, that armchair _does_ look like an avocado!"
| But morphing an armchair and an avocado will almost always
| look like both, because they have similar shapes. And it does
| not 'understand' what you called 'object concepts'; otherwise
| it would not produce armchairs you clearly cannot sit in
| due to the avocado stone (or stem, in the flower-related
| 'armchairs').
| ralfd wrote:
| > I would be slightly more impressed about the language model
| if it had a working prompt like: "A teapot that does not look
| like the image prompt."
|
| _Slightly?_ Jesus, you guys are hard to please.
| adsche wrote:
| Right, that was unnecessary and I edited it out.
|
| What I meant is that 'not' is in principle an easy keyword
| to implement 'conservatively'. But yes, having this in a
| language model has proven to be very hard.
|
| Edit: Can I ask, what do you find impressive about the
| language model?
| dash2 wrote:
| Perhaps the rest of the world is less blase - rightly or
| wrongly. I do get reminded of this:
| https://www.youtube.com/watch?v=oTcAWN5R5-I when I read
| some comments. I mean... we are telling the computer
| "draw me a picture of XXX" and it's actually doing it. To
| me that's utterly incredible.
| adsche wrote:
| > "draw me a picture of XXX" and it's actually doing it.
| To me that's utterly incredible.
|
| Sure, would be, but this is not happening here.
|
| And yes, rest assured, the rest of the world is probably
| less 'blasé' than I am :) Very evident from the hype around
| GPT3.
| viggity wrote:
| I'm in the open ai beta for GPT-3, and I don't see how to
| play with DALL-E. Did you actually try "4 glasses on a
| table"? If so, how? Is there a separate beta? Do you work for
| open ai?
| nicholast wrote:
| In the demonstrations, click on the underlined keywords and
| you can select alternates from the dropdown menu.
| spyder wrote:
| Yeah, with these kinds of generative examples, they should
| always include the closest matches from the training set, to
| see how much was just "copied".
| [deleted]
| jonesn11 wrote:
| This is a spot-on point. My prediction is that it wouldn't be
| able to. Given its difficulty generating correct counts of
| glasses, it seems it still struggles with systematic
| generalization and compositionality. As a point of reference,
| cherry-picking aside, it could model the obscure but probably
| well-defined baby daikon radish in a tutu walking a dog, but
| couldn't model red-on-green-on-blue cubes. Maybe more
| sequential perception, action, and video data, or a System
| 2-like paradigm, would help, but it remains to be seen.
| sendtown_expwy wrote:
| It seems unlikely the model has seen "baby daikon radishes in
| tutus walking dogs," or cubes made out of porcupine textures,
| or any of the other examples the post gives.
| Alex3917 wrote:
| If you type in different plants and animals into GIS, you
| don't even get the right species half the time. If GPT-3 has
| solved this problem, that would be substantially more
| impressive than drawing the images.
| agravier wrote:
| What is GIS? I only know Geographical Information System.
| the8472 wrote:
| probably Google Image Search
| m3at wrote:
| It might not have seen that specific combination, but finding
| an anthropomorphized radish sure is easier than I thought:
| type "Da Gen anime" in your search engine and you'll find
| plenty of results
| ronsor wrote:
| At least for certain types of art, sites such as pixiv and
| danbooru are useful for training ML models: all the images
| on them are tagged and classified already.
| numpad0 wrote:
| Image search "Da Gen Ni Ren Hua " do return similar
| results to the AI-generated pictures, e.g. 3rd from top[0]
| in my environment, but sparse. "Da Gen anime" in text
| search actually gives me results about an old hobbyist
| anime production group[1], some TV anime[2] with the word
| in title...hmm
|
| Then I found these[3][4] in Videos tab. Apparently there's
| a 10-20 year old manga/merch/anime franchise of walking and
| talking daikon radish characters.
|
| So the daikon part is already figured in the dataset. The
| AI picked up the prior art and combined it with the dog
| part, which is still tremendous but maybe not "figuring out
| the daikon walking part on its own" tremendous.
|
| (btw, does anyone know how best to refer to the _anime_ art
| style in Japanese? It's a bit of a mystery to me)
|
| 0: https://images.app.goo.gl/LPwveUJPWHr6oK8Y8
|
| 1: https://ja.wikipedia.org/wiki/DAICON_FILM
|
| 2: https://ja.wikipedia.org/wiki/%E7%B7%B4%E9%A6%AC%E5%A4%A7%E6...
|
| 3: https://youtube.com/watch?v=J1vvut5DvSY
|
| 4: https://youtu.be/1Gzu2lJuVDQ?t=42
| tkgally wrote:
| > does anyone know how best to refer to the anime art style
| in Japanese?
|
| The term _mangachikku_ (漫画チック, "manga-tic") is sometimes
| used to refer to the art style typical of manga and anime; it
| can also refer to exaggerated, caricatured depictions in
| general. Perhaps _anime fu irasuto_ (アニメ風イラスト,
| anime-style illustration), while a less colorful expression,
| would be closer to what you're looking for.
| blueblisters wrote:
| I can't find a source for the dataset but going by the hints
| peppered throughout the article, they likely used <img> `alt`
| tags for supervision? Fascinating that an accessibility
| feature is being repurposed to train machine learning models.
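|
| (Purely as an illustrative sketch - the post doesn't confirm
| the dataset, and the helper name and URL below are made up -
| harvesting (image, alt-text) pairs might look roughly like
| this in Python:)
|
|     # Hypothetical: collect (image URL, alt text) pairs from a page.
|     # Assumes the `requests` and `beautifulsoup4` packages.
|     import requests
|     from bs4 import BeautifulSoup
|
|     def image_caption_pairs(page_url):
|         html = requests.get(page_url, timeout=10).text
|         soup = BeautifulSoup(html, "html.parser")
|         # Keep only images that actually carry a caption-like alt.
|         return [(img["src"], img["alt"])
|                 for img in soup.find_all("img")
|                 if img.get("src") and img.get("alt")]
|
|     pairs = image_caption_pairs("https://example.com")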
| ignoranceprior wrote:
| Does this address NLP skeptics' concerns that Transformer models
| don't "understand" language?
|
| If the AI can actually draw an image of a green block on a red
| block, and vice versa, then it clearly understands something
| about the concepts "red", "green", "block", and "on".
| karmasimida wrote:
| I think it is safe to say that learning a joint distribution
| of vision + language is fully possible at this stage, as
| demonstrated by this work.
|
| But 'understanding' itself needs to be further specified, in
| order to be tested even.
|
| What strikes me most is the fidelity of those generated images,
| matching the SOTA from GAN literature with much more variety,
| without using the GAN objective.
|
| It seems the Transformer might be the best neural construct
| we have right now for learning any distribution, given more
| than enough data.
| tikwidd wrote:
|     if (adj == 'red') drawBlock(RED);
|
| According to your definition of understanding, this program
| understands something about the concept RED. But the code is
| just dealing with arbitrary values in memory (e.g. RED =
| 0xFF0000)
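|
| (A runnable version of the same toy, as a sketch in Python -
| the names are made up: the "understanding" here is nothing
| more than a dictionary lookup over arbitrary values.)
|
|     # The word-to-color mapping is arbitrary; nothing in this
|     # program "knows" what red is beyond the table entry.
|     COLORS = {"red": 0xFF0000, "green": 0x00FF00, "blue": 0x0000FF}
|
|     def draw_block(adj):
|         # Return the raw value a renderer would use for the block.
|         return COLORS[adj]
|
|     print(hex(draw_block("red")))  # 0xff0000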
| TigeriusKirk wrote:
| There are examples on twitter showing it doesn't really
| understand spatial relations very well. Stuff like "red block
| on top of blue block on top of green block" will generate red,
| green, and blue blocks, but not in the desired order.
|
| https://twitter.com/peabody124/status/1346565268538089483
| [deleted]
| tralarpa wrote:
| Try a large block on a small block. As the authors have also
| noted in their comments, the success rate is nearly zero. One
| may wonder why; maybe because that's something you rarely see
| in photos? In the end, it doesn't "understand" the meaning of
| the words.
| bra-ket wrote:
| nah, it's still a big, dumb model with no idea what it's
| doing - deepfake 2.0.
|
| It looks like a variation on a plain old image search engine,
| and an unreliable one at that, compared to exact matching.
|
| But it has obvious application in design as it can create these
| interesting combinations of objects & styles. And I loved the
| snail-harp.
| dcolkitt wrote:
| The root cause of the skepticism has always been that while
| Transformers do exceptionally well on finite-sized tasks, they
| lack any fully recursive understanding of the concepts.[0]
|
| A human can learn basic arithmetic, then generalize those
| principles to bigger number arithmetic, then go from there to
| algebra, then calculus, and so on, successively building on
| previously learned concepts in a fully recursive manner.
| Transformers are limited by the exponential size of their
| network. So GPT-3 does very well with 2-digit addition and okay
| with 2-digit multiplication, but can't abstract to 6-digit
| arithmetic.
|
| DALL-E is an incredible achievement, but doesn't really do
| anything to change this fact. GPT-3 can have an excellent
| understanding of a finite-sized concept space, yet it's still
| architecturally limited at building recursive abstractions. So
| maybe it can understand "green block on a red block". But try
| to give it something like "a 32x16 checkerboard of green and
| red blocks surrounded by a gold border frame studded with blue
| triangles". I guarantee the architecture can't get that exactly
| correct.
|
| The point is that, in some sense, GPT-3 is a technical dead-
| end. We've had to exponentially scale up the size of the
| network (12B parameters) to make the same complexity gains that
| humans make with linear training. The fact that we've managed
| to push it this far is an incredible technical achievement, but
| it's pretty clear that we're still missing something
| fundamental.
|
| [0] https://arxiv.org/pdf/1906.06755.pdf
| Veedrac wrote:
| > So GPT-3 does very well with 2-digit addition and okay with
| 2-digit multiplication, but can't abstract to 6-digit
| arithmetic.
|
| This is false: GPT-3 can do 10-digit addition with ~60%
| accuracy, with comma separators. Without BPEs it would
| doubtless manage much better.
| dcolkitt wrote:
| The accuracy largely comes from the fact that addition
| rarely requires carrying more than a single digit. So it's
| easy to pattern match from single digit problems that it
| was previously trained on.
|
| With multiplication, which requires much more extensive
| cross-column interaction, accuracy falls off a cliff with
| anything more than a few digits.
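|
| (One way to sanity-check that intuition, as a rough sketch:
| sample random 10-digit additions and measure how long the
| chains of consecutive carries actually get.)
|
|     import random
|     from collections import Counter
|
|     def max_carry_run(a, b):
|         # Longest chain of consecutive carries when adding a and b.
|         carry = run = longest = 0
|         while a or b or carry:
|             digit_sum = a % 10 + b % 10 + carry
|             carry = digit_sum // 10
|             run = run + 1 if carry else 0
|             longest = max(longest, run)
|             a //= 10
|             b //= 10
|         return longest
|
|     runs = Counter(
|         max_carry_run(random.randrange(10**10), random.randrange(10**10))
|         for _ in range(10_000))
|     print(sorted(runs.items()))  # distribution of longest carry chains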
| codetrotter wrote:
| > So GPT-3 does very well with 2-digit addition and okay with
| 2-digit multiplication, but can't abstract to 6-digit
| arithmetic.
|
| That sounds disappointing, but what if, instead of trying to
| teach it to do addition, one taught it to write source code
| for addition and other maths operations?
|
| Then you can ask it to solve a problem but instead of it
| giving you the answer it would give you source code for
| finding the answer.
|
| So for example you ask it "what is the square root of five?"
| then it responds:
|
|     fn main() {
|         println!("{}", 5f64.sqrt());
|     }
| camdenlock wrote:
| Amazing. Would love to play with this.
|
| Is OpenAI going to offer this as a closed paywalled service? Once
| again wondering how the "open" comes into play.
| Jerry2 wrote:
| After their new CEO, the former president of YC, came in,
| they closed everything off and took a lot of investment. The
| only thing that's open about them is the name.
| [deleted]
| kbos87 wrote:
| First: this strikes me as truly amazing - but my mind
| immediately goes to the economic impact of something like
| this. Personally I try not to be an alarmist about the
| potential for jobs to be automated away, but how strikingly
| good this is makes me wonder whether, until now, we simply
| hadn't seen AI good enough to automate away large parts of
| the workforce.
|
| Seeing the "lovestruck cup of boba" reminded me of an
| illustration a friend of mine did for a startup a few years back.
| It would be a lot easier and less time-consuming for someone to
| simply request such an image from an AI assistant. If I were a
| graphic artist or photographer, this would scare me.
|
| I don't know what the right answer is here. I have little to no
| faith in regulators to help society deal with the sweeping
| negative effects even one new AI-based product looks like it
| could have on a large swath of the economy. Short of regulation
| and social safety nets, I wonder if society will eventually step
| up and hold founders and companies accountable when they cause
| broad negative economic impacts for their own enrichment.
| cflyingdutchman wrote:
| I believe founders/companies can and do "cause broad negative
| economic impacts for their own enrichment", but creating a
| lower-cost path to the same good/result is a good thing
| fundamentally. Yes, this can cause greater income/life-
| experience inequality, and we should adjust for that, but in
| ways that do not punish innovation. In short, we should
| optimize for human happiness by better sharing the wealth
| rather than by limiting it.
| jmmcd wrote:
| One perspective is: anything that can be automated (thus
| lowered in cost) should be. For drudge-work, of course that's
| good. For some examples, showing that it can be automated
| shows that it IS drudge-work. But replacing a creative
| illustrator? That is not drudge-work, it is a fulfilling and
| enjoyable profession. I don't think it's clear that changing
| it to become a hobby (because it's no longer viable as a
| profession) is "a good thing fundamentally". I would need to
| hear further arguments on this.
| cflyingdutchman wrote:
| This very quickly gets into "what's the point of it all?"
| and I'll admit that I don't have the answer. :)
| bryanrasmussen wrote:
| I wonder what it makes of "green ideas sleep furiously".
| Marinlemaignan wrote:
| I want to see it go into an infinite loop with an "image
| recognition" model (one where you feed in an image and you
| get back a written description of it)
| vnjxk wrote:
| I believe it will end up stabilizing on one image, or a
| sequence of images, whose text descriptions return themselves
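|
| (A sketch of that loop in Python, with stub models standing
| in for the real generator and captioner - `text_to_image` and
| `image_to_text` are made-up names, not real APIs:)
|
|     def text_to_image(caption):
|         # Stub: a real DALL-E-style generator would go here.
|         return "<image of %s>" % caption
|
|     def image_to_text(image):
|         # Stub: a real image-captioning model would go here.
|         return image[len("<image of "):-1]
|
|     def find_fixed_point(caption, steps=100):
|         seen = set()
|         for _ in range(steps):
|             caption = image_to_text(text_to_image(caption))
|             if caption in seen:  # a cycle: the caption reproduces itself
|                 break
|             seen.add(caption)
|         return caption
|
|     print(find_fixed_point("a snail made of harp"))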
| steventey wrote:
| That name though -
|
| DALL*E = Dali + WALL*E
|
| Freaking brilliant.
|
| Was that generated by an AI as well?
|
| I'm actually building a name generator that is as intelligent and
| creative as that for my senior year thesis (and also for
| https://www.oneword.domains/)
|
| I already have an MVP that I'm testing out locally but I'd
| appreciate any ideas on how to make it as smart as possible!
| TecoAndJix wrote:
| I learned the word portmanteau from this!
| ArtWomb wrote:
| "Teapot in the shape of brain coral" yields the opposite. The
| topology is teapot-esque. The texture composed of coral-like
| appendages. Sorry if this is overly semantic, I just happen to be
| in a deep dive in Shape Analysis at the moment ;)
|
| >>> DALL*E appears to relate the shape of a half avocado to the
| back of the chair, and the pit of the avocado to the cushion.
|
| That could be human bias recognizing features the generator
| yields implicitly. Most of the images appear as "masking" or
| "decal" operations. Rather than a full style transfer. In other
| words the expected outcome of "soap dispenser in the shape of
| hibiscus" would resemble a true hybridized design. Like an haute
| couture bottle of eau de toilette made to resemble rose petals.
|
| The name DALL-E is terrific though!
| dj_mc_merlin wrote:
| I find its ability to give different interpretations of the
| same thing amazing. This kind of fuzziness is also present in
| human art.
|
| Another good example is the "collection of glasses" on the
| table. It makes both glassware and eyeglasses!
| [deleted]
| [deleted]
| dane-pgp wrote:
| > a living room with two white armchairs and a painting of the
| colosseum. the painting is mounted above a modern fireplace.
|
| With the ability to construct complex 3D scenes, surely the next
| step would be for it to ingest YouTube videos or TV/movies and be
| able to render entire scenes based on a written narration and
| dialogue.
|
| The results would likely be uncanny or absurd without careful
| human editorial control, but it could lead to some interesting
| short films, or fan-recreations of existing films.
| dfischer wrote:
| How do we know this isn't already happening with state actors?
| IfOnlyYouKnew wrote:
| Because state actors are all busy acting smart on the
| internet by using terms such as "state actors".
| ilaksh wrote:
| I think in a way that's the next step but they may have to wait
| a little bit before they have the processing power.
|
| If you are talking about 24 frames per second, then
| theoretically one second of video could require 24 times as
| much processing power as a single image, and 100 seconds of
| video 2,400 times as much. Obviously that's just a rough
| guess, but surely it is much more than for individual images.
|
| But I'm sure we'll get there.
| alpaca128 wrote:
| I'd love to see what this does with item/person/artwork/monster
| descriptions from Dwarf Fortress. Considering the game has
| creatures like were-zebras, beasts in random shapes and
| materials, undead hair, procedurally generated instruments and
| all kinds of artefacts menacing with spikes I imagine it could
| make the whole thing even more entertaining.
| jandrese wrote:
| How long until the Rule 34 perverts get their hands on this and
| start inputting stuff like "Bobby Fischer Minotaur fucking a lime
| green Toyota Echo"?
|
| The shipping community will go apeshit if this thing works as
| advertised.
| speedgoose wrote:
| I remember looking at porn pictures generated by an old model
| that didn't take text inputs, and some pictures were very
| disturbing because the bodies were impossible or very
| unhealthy.
|
| There is a reason the examples are cartoon animals or objects.
| It's not disturbing that the pig teapot is not realistic, or
| that the dragon cat is missing a leg. That kind of problem is
| very disturbing in realistic pictures of human bodies.
|
| Eventually it will get there; I guess you could make an AI to
| filter out the disturbing pictures.
| jandrese wrote:
| On the other hand, no matter how misshapen or deformed the
| body comes out that will be someone's kink.
| tikwidd wrote:
| I wonder what happens when you ask it for an impossible object,
| e.g. a square triangle?
| kome wrote:
| black magic
| lacker wrote:
| I wish this was available as a tool for people to use! It's neat
| to see their list of pregenerated examples, but it would be more
| interesting to be able to try things out. Personally, I get a
| better sense of the powers and limitations of a technology when I
| can brainstorm some functionality I might want, and then see how
| close I can come to creating it. Perhaps at some point someone
| will make an open source version.
| [deleted]
| industriousthou wrote:
| I have no idea what kind of compute power something like this
| relies on. Would this be able to run on a consumer desktop?
| p1esk wrote:
| Depends on how fast you want it to generate results, but yes,
| it can run on a desktop provided there's enough RAM.
| m3at wrote:
| They note that the model has 12B parameters, which in terms
| of order of magnitude makes it sit right between GPT-2 and
| GPT-3 (1.5B and 175B respectively). With some tricks you can
| run GPT-2 on good personal hardware, so this might be
| reachable as well with the latest hardware.
|
| EDIT: I'm assuming you mean for inference; for training it
| would be another kind of challenge and the answer would be a
| clear no
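|
| (Back-of-envelope, as a sketch: weight memory alone for 12B
| parameters at common precisions - activations and overhead
| not included.)
|
|     params = 12e9
|     for name, bytes_per_param in [("fp32", 4), ("fp16", 2), ("int8", 1)]:
|         print("%s: ~%.0f GiB" % (name, params * bytes_per_param / 2**30))
|     # fp16 weights alone are ~22 GiB, hence "provided there's
|     # enough RAM".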
| ricardobeat wrote:
| In the linked CLIP paper they say it is trained on 256 GPUs
| for 2 weeks. No mention of the size of the trained model.
| [deleted]
| m3at wrote:
| I wish so too! I don't expect them to release the code (they
| rarely do) and they wield their usual "it might have societal
| impact, let us decide what's good for the world":
|
| > We recognize that work involving generative models has the
| potential for significant, broad societal impacts
|
| The community did rise to the challenge of re-implementing it
| (sometimes better) in the past, so I'm hopeful.
| rkagerer wrote:
| Anywhere this can be tried out interactively? I'd like to type
| some phrases and see how it does.
| keyle wrote:
| Imagine the amount of body parts pictures on the server... /s
| inakarmacoma wrote:
| I came hoping for the same. While these are amazing to read
| about, one really needs to try things out. Would love to get
| my hands dirty. Still wait-listed for GPT-3 access, with no
| hope in sight...
| chishaku wrote:
| ok how can we use this?
| jaimex2 wrote:
| we can't, it's just a post of what they created.
| littlestymaar wrote:
| The results highlighted in the blog post are incredible;
| unfortunately, they are also cherry-picked: I've played with the
| prompt a bit, and every result (not involving a drawing) was
| disappointing...
|
| I might not have been so disappointed if they had not
| highlighted such incredible results in the first place.
| Managing expectations
| is tough.
| [deleted]
___________________________________________________________________
(page generated 2021-01-06 23:02 UTC)