[HN Gopher] DALL·E: Creating Images from Text
       ___________________________________________________________________
        
       DALL·E: Creating Images from Text
        
       Author : todsacerdoti
       Score  : 1158 points
       Date   : 2021-01-05 19:08 UTC (1 day ago)
        
 (HTM) web link (openai.com)
 (TXT) w3m dump (openai.com)
        
       | savant_penguin wrote:
        | I hope gpt3 dungeon has a great update incoming
        
       | keyle wrote:
        | This is incredible. Such technology with an RPG adventure game
       | would open up a new genre of exploration games!
        
       | anthk wrote:
        | There was some programming language akin to POV-Ray (though not
        | for raytracing) for describing a scene with commands like
        | "place a solid here" and so on.
       | 
       | I can't remember its name.
       | 
       | EDIT:
       | 
       | https://www.contextfreeart.org/
        
         | scribu wrote:
         | There are several projects like this, but they can only
         | generate abstract shapes, i.e. they're much lower-level than a
         | natural language caption.
        
         | superkuh wrote:
         | Context Free is incredibly fun but it's not machine learning or
         | even AI. And it's very easy to understand how things go from
          | definitions of shapes, rotations, translations, etc., to the
         | finished image.
        
       | gfody wrote:
        | This is incredible, but I can't help but feel like we're
        | skipping some important steps by working with plain text and
        | bitmaps - "a collection of glasses sitting on a table" is
        | sometimes eyeglasses, sometimes drinking glasses, sometimes a
        | weird amalgamation. And as long as we're OK with ambiguity in
        | every layer, are we really ever going to be able to
        | meaningfully interrogate the reasons behind a particular
        | output?
        
       | reubens wrote:
       | "For other captions, such as "a snail made of harp," the results
       | are less good, with images that combine snails and harps in odd
       | ways." [0]
       | 
       | You try drawing a snail made of harp! Seriously! DALL-E did an
        | incredible job.
       | 
       | [0]
       | https://www.technologyreview.com/2021/01/05/1015754/avocado-...
        
         | anonytrary wrote:
         | I think those examples are great. Another way to judge this is
         | to consider what a class of 8th graders might come up with if
          | you ask them to draw "a snail made of harp". The request
          | itself is nonsensical, so I imagine the results from DALL-E
          | are actually pretty good.
        
         | inakarmacoma wrote:
          | Really? I found those examples to be the most compelling and
          | interesting overall.
        
       | JacobiX wrote:
        | Results are spectacular, but as always, and especially with
        | OpenAI announcements, one should be very cautious (lessons
        | learned from GPT-3). I hope that the model is not doing advanced
        | patchwork/stitching of training images. I think that this kind
        | of test should be included when measuring the performance of any
        | generative model. Unfortunately the paper is not yet released,
        | the model is not available, and no online demo has been
        | published. Recently a team of researchers discovered that in
        | some cases advanced generative text models copy large chunks of
        | training-set text ...
        
         | ralfd wrote:
         | > I hope that the model is not doing advanced
         | patchwork/stitching of training images
         | 
          | It would still be impressive that it knows where to include
          | hands, a Christmas sweater, or a unicycle.
        
       | ludwigschubert wrote:
       | That's a delightful result you all; and beautifully explored,
       | too!
        
       | saberience wrote:
       | Maybe I'm missing something but does it say what library of
       | images was used to train this model? I couldn't quite understand
       | the process of building DALL-E. Did they have a large database of
       | labeled images and they combined this database with GPT-3?
        
       | worldsayshi wrote:
       | There's something that really creeps me out about errors in AI
       | generated images. More than uncanny valley creepiness. Like
       | trypophobia creepy.
        
         | Nition wrote:
         | Same for me. It's like the feeling you get in a dream where
         | things seem normal and you think you're awake, then suddenly
         | you notice something _wrong_ about the room, something
         | impossible.
        
         | jamcohen wrote:
         | I get the same feeling, to the point that I occasionally let
         | out a brief scream when browsing GAN images.
        
         | dfischer wrote:
         | Just wait until you can't tell the difference and then
         | contemplate if it matters, and then if that's what reality
         | already is.
        
         | ravi-delia wrote:
         | I know exactly what you mean. Like if you had to see it in real
         | life you'd see something horrible just out of shot. For some
         | reason that's amplified with the furniture.
        
       | jcims wrote:
       | They are going to train on YouTube/PornHub before long and it's
       | going to get weird.
        
         | MrBuddyCasino wrote:
         | Also, someone will make a version for furries. They pay well.
        
         | dfischer wrote:
         | Maybe already has...
        
         | irrational wrote:
         | Combine this with deep fakes.
         | 
         | Donald Trump is Nancy Pelosi's and AOC's step-brother in a
         | three-way in the Lincoln Bedroom.
        
           | wj wrote:
           | Can't unsee!
        
             | jcims wrote:
             | At least we're spared the smell...for now.
        
         | SoSoRoCoCo wrote:
         | I'm not sure how to feel, because I had this exact same
         | thought. The evolution of porn from 320x200 EGA on a BBS, to
          | usenet (alt.binaries.pictures.erotica, etc.) on XVGA (on an AIX
         | Term), to the huge pool of categories on today's porn sites,
         | which eventually became video and bespoke cam performers... Is
         | this going to be some new weird kind of porn that Gen Alpha
         | normalizes?
        
       | dj_mc_merlin wrote:
       | This is real? A computer can take "an armchair in the shape of an
       | avocado" as input and make a picture of one?
       | 
       | I can't believe it. How does it put the baby daikon radish in the
       | tutu?
        
         | numpad0 wrote:
          | Because the Internet has plenty of well-captioned drawings of
         | daikon radish drawn as a bipedal humanoid. D'oh!
        
         | crdrost wrote:
         | If we knew how it did it, it wouldn't be machine learning.
         | 
         | The defining feature of machine learning in other words is that
         | the machine constructs a hypersurface in a very-high-
         | dimensional space based on the samples that it sees, and then
         | extrapolates along the surface for new queries. Whereas you can
         | explain features of why the hypersurface is shaped the way it
         | is, the machine learning algorithm essentially just tries to
         | match its shape well, and intentionally does not try to extract
         | reasons "why" that shape "has to be" the way it is. It is a
         | correlator, not a causalator.
         | 
         | If you had something bigger you'd call it "artificial
         | intelligence research" or something. Machine learning is
         | precisely the subset right now that is focused on "this whole
         | semantic mapping thing that characterized historical AI
         | research programs--figure out amazing strategy so that you need
         | very little compute--did not bear fruit fast enough compared to
         | the exponential increases in computing budget so let us instead
         | see what we can do with tons of compute and vastly less
         | strategy." It is a deliberate reorientation with some good and
         | some bad parts to it. (Practical! Real results now! Who cares
         | whether it "really knows" what it's doing? But also, you peer
         | inside the black box and the numbers are quite inscrutable; and
         | also, adversarial approaches routinely can train another
         | network to find regions of the hyperplane where an obvious
         | photograph of a lion is mischaracterized as a leprechaun or
         | whatever.)
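The "fit a surface, then extrapolate along it" picture above can be made concrete with a toy example (the data and model here are made up for illustration): fit a low-degree polynomial to samples, then query it outside the sampled range. The fit reproduces the shape of the data without any notion of why the data has that shape.

```python
import numpy as np

# Toy "training set": samples that happen to lie on y = x^2.
xs = np.linspace(0.0, 1.0, 20)
ys = xs ** 2

# The "hypersurface" here is just a 1-D curve fit by least squares.
coeffs = np.polyfit(xs, ys, deg=2)

# A query outside the training range: the model extrapolates along
# the fitted shape, with no concept of *why* the data follows x^2.
pred = np.polyval(coeffs, 2.0)   # close to 4.0
```

The fit matches the shape almost exactly here only because the true curve happens to be a polynomial; a real network's fitted surface can bend unpredictably off the training distribution, which is the adversarial-example point above.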
        
           | dj_mc_merlin wrote:
           | Machine learning is not a concept that is fundamentally
           | incompatible with interpretability, and indeed research is
           | being done in this area.
           | 
           | One method for example is occlusion, removing pieces of input
           | to assemble statistical representations of which parts your
           | model cares about.
           | 
           | It's all still baby steps, but with time the theory will
           | catch up.
           | 
           | > If you had something bigger you'd call it "artificial
           | intelligence research"
           | 
            | That's usually just called data science, and it does deal
            | with this (we had lectures on interpretability of models at
            | university).
        
         | Igelau wrote:
         | Switch it over to pikachu in a helmet wielding a blue
         | lightsaber. Some hilariously bad results there.
        
           | ralfd wrote:
           | "pikachu in a xxx staring at its reflection in a mirror" is
           | interesting. Both hilariously bad and impressive if the
           | helmet is mirrored.
        
       | m_rpn wrote:
        | Nice result, but their claims are a little misleading, as
        | always. They have come full circle: having a very powerful
        | language model, they can attach it to other inputs (video,
        | audio, ...), creating an ensemble system that gives the
        | illusion that it has "knowledge" about domains and can reason
        | in context, as they often state in all their works. But we
        | should know better: what they are doing is very different, e.g.
        | throwing a truckload of data into billion-parameter neural
        | networks, spending millions, if not billions, of dollars, just
        | to minimize a loss function in an adversarial environment (a
        | GAN), in this case.
        
       | ve55 wrote:
       | I really do think AI is going to replace millions of workers very
       | quickly, but just not in the order that we used to think of. We
       | will replace jobs that require creativity and talent before we
        | will replace most manual factory workers, as hardware is
       | significantly more difficult to scale up and invent than
       | software.
       | 
       | At this point I have replaced a significant amount of creative
       | workers with AI for personal usage, for example:
       | 
       | - I use desktop backgrounds generated by VAEs (VD-VAE)
       | 
       | - I use avatars generated by GANs (StyleGAN, BigGAN)
       | 
       | - I use and have fun with written content generated by
       | transformers (GPT3)
       | 
       | - I listen to and enjoy music and audio generated by autoencoders
       | (Jukebox, Magenta project, many others)
       | 
        | - I don't purchase stock images or commission artists for many
        | things I previously would have, when a GAN exists that already
        | makes the class of image I want
       | 
        | All of this has happened in the last year or so for me, and I
       | expect that within a few more years this will be the case for
       | vastly more people and in a growing number of domains.
        
         | ErikAugust wrote:
         | Isn't training data effectively a form of sampling?
         | 
         | Couldn't any creator of images that a model was trained on sue
         | for copyright infringement?
         | 
         | Or do great artists really just steal (just at a massive
         | scale)?
        
           | ve55 wrote:
           | Currently that is not the case:
           | 
           | >Models in general are generally considered "transformative
           | works" and the copyright owners of whatever data the model
           | was trained on have no copyright on the model. (The fact that
           | the datasets or inputs are copyrighted is irrelevant, as
           | training on them is universally considered fair use and
           | transformative, similar to artists or search engines; see the
           | further reading.) The model is copyrighted to whomever
           | created it.
           | 
           | Source (scroll up slightly past where it takes you):
           | https://www.gwern.net/Faces#copyright
        
             | ErikAugust wrote:
             | Thank you, this is the part I find most relevant:
             | 
             | "Models in general are generally considered "transformative
             | works" and the copyright owners of whatever data the model
             | was trained on have no copyright on the model. (The fact
             | that the datasets or inputs are copyrighted is irrelevant,
             | as training on them is universally considered fair use and
             | transformative, similar to artists or search engines; see
             | the further reading.) The model is copyrighted to whomever
             | created it. Hence, Nvidia has copyright on the models it
             | created but I have copyright under the models I trained
             | (which I release under CC-0)."
        
             | visarga wrote:
             | I bet they can claim copyright up to the gradients
             | generated on their media, but in the end the gradients get
             | summed up, so their contribution is lost in the cocktail.
             | 
             | If I write a copyrighted text on a book, then I print a
              | million other texts on top of it, in both white and black,
             | mixing it all up to be like white noise, would the original
             | authors have a claim?
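The "lost in the cocktail" point can be illustrated with a toy calculation, assuming plain minibatch SGD where the step applied to a weight is the mean of per-example gradients (all numbers made up): one example's pull is diluted by a factor on the order of the dataset size.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Per-example gradients for a single weight: one example from "my"
# copyrighted media, plus a million others pulling in random directions.
my_grad = 2.5
others = rng.normal(0.0, 1.0, size=n)

# The SGD step applied to the weight is the mean gradient...
update = (my_grad + others.sum()) / (n + 1)

# ...so one example's share of the total pull is on the order of 1e-6.
share = abs(my_grad) / (abs(my_grad) + np.abs(others).sum())
```

This is only about magnitude of influence, not the memorization cases raised below, where a single example can still be recovered verbatim.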
        
               | astrange wrote:
               | Models can unpredictably memorize sensitive input data,
               | so there can be a real copyright issue here, I think.
               | 
               | https://arxiv.org/abs/1802.08232
               | 
               | Worse, sometimes the input data is illegal to distribute
               | for other reasons than copyright.
        
             | plasticchris wrote:
             | But does that still hold when the model memorized a chunk
             | of the training data? Or can a network plagiarize output
             | while being a transformative work itself?
        
         | yowlingcat wrote:
         | > We will replace jobs that require creativity
         | 
         | Frankly, I think the "AI will replace jobs that require X"
         | angle of automation is borderline apocalyptic conspiracy porn.
         | It's always phrased as if the automation simply stops at making
         | certain jobs redundant. It's never phrased as if the automation
         | lowers the bar to entry from X to Y for /everyone/, which
         | floods the market with crap and makes people crave the good
         | stuff made by the top 20%. Why isn't it considered as likely
         | that this kind of technology will simply make the best 20% of
         | creators exponentially more creatively prolific in quantity and
         | quality?
        
           | ve55 wrote:
           | > Why isn't it considered as likely that this kind of
           | technology will simply make the best 20% of creators
           | exponentially more creatively prolific in quantity and
           | quality?
           | 
           | I think that's well within the space of reasonable
           | conclusions. For as much as we are getting good at generating
           | content/art, we are also therefore getting good at assisting
           | humans at generating it, so it's possible that pathway ends
           | up becoming much more common.
        
         | rich_sasha wrote:
         | Not to undermine this development, but so far, no surprise, AI
         | depends on vast quantities of human-generated data. This leads
         | us to a loop: if AI replaces human creativity, who will create
         | novel content for new generation of AI? Will AI also learn to
         | break through conventions, to shock and rewrite the rules of
         | the game?
         | 
         | It's like efficient market hypothesis: markets are efficient
         | because arbitrage, which is highly profitable, makes them so.
         | But if they are efficient, how can arbitrageurs afford to stay
         | in business? In practice, we are stuck in a half-way house,
         | where markets are very, but not perfectly, efficient.
         | 
         | I guess in practice, the pie for humans will keep on shrinking,
         | but won't disappear too soon. Same as horse maintenance
         | industry, farming and manufacturing, domestic work etc. Humans
         | are still needed there, just a lot less of them.
        
           | p1esk wrote:
           | _if AI replaces human creativity, who will create novel
           | content for new generation of AI?_
           | 
           | Vast majority of human generated content is not very novel or
           | creative. I'm guessing less than 1% of professional human
           | writers or composers create something original. Those people
            | are not in any danger of being replaced by AI, and will probably
           | be earning more money as a result of more value being placed
           | on originality of content. Humans will strive (or be forced)
           | to be more creative, because all non-original content
           | creation will be automated. It's a win-win situation.
        
           | ve55 wrote:
           | > Will AI also learn to break through conventions, to shock
           | and rewrite the rules of the game?
           | 
            | I think AlphaGo was a great in-domain example of this. I
            | definitely see things I'd refer to colloquially as
            | 'creativity' in this DALL-E post - you can decide for
            | yourself - though that still isn't to claim it matches what
            | some humans can do.
        
             | rich_sasha wrote:
             | True, but AlphaGo exists in a world where everything is
             | absolute. There are new ways of playing Go, but the same
             | rules.
             | 
             | If I train an AI on classical paintings, can it ever invent
             | Impressionism, Cubism, Surrealism? Can it do irony? Can it
             | come up with something altogether new? Can it do meta?
             | "AlphaPaint, a recursive self-portrait"?
             | 
             | Maybe. I'm just not sure we have seen anything in this
             | dimension yet.
        
               | ve55 wrote:
               | >If I train an AI on classical paintings, can it ever
               | invent Impressionism, Cubism, Surrealism?
               | 
                | I see your point, but it's an unfair comparison: if you
                | put a human in a room and never showed them anything
                | except classical paintings, it's unlikely they would
                | quickly invent Cubism either. The humans who invented
                | new art styles had seen so many things throughout their
                | lives that they had a lot of data to go off of.
                | Regardless, I think we can already do enough neural
                | style transfer to invent new styles of art.
        
           | xpl wrote:
           | > how can arbitrageurs afford to stay in business
           | 
            | Most arbitrageurs cannot stay in the business; it's the law
            | of diminishing returns. Economies of scale eventually
            | prevent small individual players from profiting from the
            | market; only a few big-ass hedge funds can stay, because
            | due to their investments they can get preference from
            | exchanges (significantly lower / zero / negative fees,
            | co-located hardware, etc.), which makes the operation
            | reasonable for them. With enough money you can even build
            | your own physical cables between exchanges to outperform
            | the competitors in latency games. I'm a former arbitrageur,
            | by the way :)
           | 
           | Same with AI-generated content. You would have to be
           | absolutely brilliant to compete with AI. Only a few select
           | individuals would be "allowed" to enter the market. Not even
           | sure that it has something to do with the quality of the
           | content, maybe it's more about prestige.
           | 
           | You see, there already are gazillions of decent human
           | artists, but only a few of them are really popular. So the
           | top-tier artists would probably remain human, because we need
            | someone real to worship. Their producers would surely use
           | AI as a production tool, depicting it as a human work. But
           | all the low-tier artists would be totally pushed out of the
           | market. There will be simply no job for a session musician or
           | a freelance designer.
        
         | ryan93 wrote:
          | Those don't seem in any way similar to writing a TV show or
          | animating a Pixar movie.
        
           | ve55 wrote:
            | I agree, and due to the amount of compute that is required
            | for those types of works, I think those are still quite a
            | while away.
           | 
           | But the profession for creative individuals consists of much
           | more than highly-paid well-credentialed individuals working
           | at well-known US corporations. There are millions of artists
           | that just do quick illustrations, logos, sketches, and so on,
           | on a variety of services, and they will be replaced far
           | before Pixar is.
        
             | [deleted]
        
         | ignoranceprior wrote:
         | Do you think investing in MSFT/GOOGL is the best way to profit
         | off this revolution?
        
           | ve55 wrote:
           | It's too hard to say I think. Big players will definitely
           | benefit a lot, so it probably isn't a _bad_ idea, but if you
           | could find the right startups or funds, you might be able to
           | get significantly more of a return.
        
         | karmasimida wrote:
          | I think this is actually not a bad thing.
          | 
          | I wouldn't say many of those things are creativity-driven.
          | They are more like automatic asset generation.
          | 
          | One use case of such a model would be in the gaming industry:
          | generating large amounts of assets quickly. This process
          | alone takes years, and gets more and more expensive as gamers
          | demand higher and higher resolutions.
          | 
          | AI can make this process much more tenable, bringing down the
          | overall cost.
        
         | sushisource wrote:
         | > - I use and have fun with written content generated by
         | transformers (GPT3)
         | 
         | > - I listen to and enjoy music and audio generated by
         | autoencoders (Jukebox, Magenta project, many others)
         | 
         | Really, you've "replaced" normal music and books with these?
         | Somehow I doubt that.
        
           | notJim wrote:
           | What are you talking about, this is my favorite album:
           | https://www.youtube.com/watch?v=K0t6ecmMbjQ
        
           | ve55 wrote:
            | Not entirely, no - I hope I didn't imply that. I listen to
            | human-created music every day. I just mean to say that I've
            | also listened to AI-created music that I've enjoyed, so it's
            | gone from 0% of what I listen to to 5%, and presumably may
            | increase much more later.
        
             | p1esk wrote:
             | You should try Aiva (http://aiva.ai). At some point I was
             | mostly listening to compositions I generated through that
             | platform. Now I'm back to Spotify, but AI music is
             | definitely on my radar.
        
               | ve55 wrote:
               | Looks great, thanks for the suggestion
        
           | [deleted]
        
         | Impossible wrote:
         | I believe that AI will accelerate creativity. This will have a
         | side effect of devaluing some people's work (like you
         | mentioned), but it will also increase the value of some types
          | of art and, more importantly, make it possible to do things
          | that were impossible before, or allow small teams and
          | individuals to produce content that was prohibitively
          | expensive.
        
         | [deleted]
        
         | minimaxir wrote:
          | There still needs to be some sort of human curation, lest
          | bad/rogue output risk sinking the entire AI-generated
         | industry. (in the case of DALL-E, OpenAI's new CLIP system is
         | intended to mitigate the need for cherry-picking, although from
         | the final demo it's still qualitative)
         | 
         | The demo inputs here for DALL-E are curated and utilize a few
         | GPT-3 prompt engineering tricks. I suspect that for typical
         | unoptimized human requests, DALL-E will go off the rails.
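The curation step CLIP automates can be sketched without the model itself: embed the caption and each candidate image into a shared space, then keep the candidates whose embeddings best match the caption. The 3-d vectors below are made up; real CLIP embeddings come from its trained text and image encoders.

```python
import numpy as np

def rerank(text_vec, image_vecs, keep=2):
    """Rank candidate image embeddings by cosine similarity to a
    text embedding and return the indices of the top `keep`."""
    t = text_vec / np.linalg.norm(text_vec)
    sims = []
    for i, v in enumerate(image_vecs):
        sims.append((float(v @ t / np.linalg.norm(v)), i))
    sims.sort(reverse=True)
    return [i for _, i in sims[:keep]]

# Made-up 3-d embeddings: candidate 2 points most nearly the same
# way as the caption, candidate 0 is orthogonal to it.
caption = np.array([1.0, 0.0, 0.0])
candidates = [np.array([0.0, 1.0, 0.0]),
              np.array([0.5, 0.5, 0.0]),
              np.array([0.9, 0.1, 0.0])]
best = rerank(caption, candidates, keep=2)   # → [2, 1]
```

In the DALL-E demo this kind of scoring is what replaces manual cherry-picking: generate many samples, keep the ones the scorer ranks highest.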
        
           | andybak wrote:
           | Personally speaking I don't want curation. What is
           | fascinating about generative AI is the failure modes.
           | 
           | I want the stuff that no human being could have made - not
           | the things that could pass for genuine works by real people.
        
             | minimaxir wrote:
             | Failure modes are fun when they get 80-90% of the way there
             | and hit the uncanny valley.
             | 
             | Unfortunately many generations fail to hit that.
        
           | ve55 wrote:
           | Yes, but there's no reason we can't partially solve this by
           | throwing more data at the models, since we have vast amounts
           | of data we can use for that (ratings, reviews, comments,
           | etc), and we can always generate more en masse whenever we
           | need it.
        
             | minimaxir wrote:
             | This isn't a problem that can be solved with more data.
             | It's a function of model architecture, and as OpenAI has
             | demonstrated, larger models generally perform better even
             | if normal people can't run them on consumer hardware.
             | 
             | But there is still a _lot_ of room for more clever
             | architectures to get around that limitation. (e.g.
             | Shortformer)
        
               | ve55 wrote:
               | I think it's both - we have a lot of architectural
               | improvements that we can try now and in the future, but I
               | don't see why you can't take the output of generative art
               | models, have humans rate them, and then use those ratings
               | to improve the model such that its future art is likely
               | to get a higher rating.
        
         | A4ET8a8uTh0 wrote:
          | You are probably right. Still, there is hope that this is just
          | a prelude to getting closer to a Transmetropolitan box
          | (assuming we can ever figure out how to make an AI box that
          | can make physical items based purely on information given by
          | the user).
        
       | commonturtle wrote:
       | This is simultaneously amazing and depressing, like watching
       | someone set off a hydrogen bomb for the first time and marveling
       | at the mushroom cloud it creates.
       | 
       | I really find it hard to understand why people are optimistic
       | about the impact AI will have on our future.
       | 
       | The pace of improvement in AI has been really fast over the last
       | two decades, and I don't feel like it's a good thing. Compare the
       | best text generator models from 10 years ago with GPT-3. Now do
       | the same for image generators. Now project these improvements 20
       | years into the future. The amount of investment this work is
       | getting grows with every such breakthrough. It seems likely to me
       | we will figure out general-purpose human-level AI in a few
       | decades.
       | 
       | And what then? There are so many ways this could turn into a
       | dystopian future.
       | 
       | Imagine for example huge mostly-ML operated drone armies, tens of
       | millions strong, that only need a small number of humans to
       | supervise them. Terrified yet? What happens to democracy when
       | power doesn't need to flow through a large number of people? When
       | a dozen people and a few million armed drones can oppress a
       | hundred million people?
       | 
       | If there's even a 5% chance of such an outcome (personally I
       | think it's higher), then we should be taking it seriously.
        
         | desideratum wrote:
         | Nick Bostrom's "Superintelligence" is a sober perspective on
         | this issue and a very worthwhile read.
        
           | commonturtle wrote:
           | Yup that's a good recommendation. I've read it and some of
           | the AI Safety work that a small portion of the AI community
           | is working on. At the moment there seems no reason to believe
           | that we can solve this.
        
         | stevofolife wrote:
         | Regulations. That's what the government is for. You think any
          | country is going to let someone operate millions of drones at
          | will? Yeah, ok.
        
         | ajnin wrote:
         | I think the points you make are very important. Not only the
         | "Terminator" scenario but also the "hyper-capitalism" scenario.
         | But the solution is not to stop working on such research, it is
         | political.
        
         | kertoip_1 wrote:
          | You are assuming that AI will magically end up in one set of
          | hands only. We can prevent that: as developers we can make AI
          | research open and provide AI tools to the masses in order to
          | keep "balance". If everyone had the same power, then it
          | wouldn't be such a big advantage anymore.
        
           | dash2 wrote:
           | That's not obvious. What if _everyone_ has the tools to
           | create their own army of nuclear-tipped killer drones?
        
         | m12k wrote:
         | The scary thing about automation isn't the technology itself.
         | It's that it breaks the tenuous balance of power between those
         | who own and those who work - if the former can just own robots
         | instead of hiring the latter, what will become of the latter?
         | The truth is, what's scary about that imbalance of power is
         | already true, it's just that until now, technological
         | limitations made that imbalance incomplete - workers still had
         | some bargaining power. That is about to go away, and what will
         | be left is the realization that the solution to this isn't
         | Luddism; the solution is political. As it always was.
        
           | CuriouslyC wrote:
           | That's not exactly true. A lot of (low-level) human labor will
           | be made irrelevant, but AI tools will allow people to easily
           | work productively at a higher level. Musicians will be able
           | to hum out templates of music, then iteratively refine the
           | result using natural language and gestures. Writers will be
           | able to describe a plot, and iteratively refine the prose and
           | writing style. Movie producers will be able to describe
           | scenes then iteratively refine the angles, lighting, acting,
           | cuts, etc. It will be a golden age for creativity, where
           | there's an abundance of any sort of art or entertainment
           | you'd like to consume, and the only problem is locating it in
           | the sea of abundance.
           | 
           | The only issue I see here is that government will need to
           | take a hand in mitigating capitalistic wealth inequality, and
           | access to creative tools will need to be subsidized for low
           | income individuals (assuming we can't bring the compute cost
           | down a few orders of magnitude).
        
         | Simon321 wrote:
         | How typically cynical of human beings: a wondrous technology
         | comes along that can free mankind of tedious work, massively
         | improve our lives, and maybe even eliminate scarcity
         | eventually, and all people can think about is how it could be
         | bad for us.
        
         | CuriouslyC wrote:
         | Armies of high-powered smart drones aren't going to be a thing
         | until we figure out security, and I'm not sure that's ever
         | going to happen. Having people in the loop is affordable, and
         | people are much more expensive and time-consuming to subvert.
        
         | [deleted]
        
         | lbrito wrote:
         | >It seems likely to me we will figure out general-purpose
         | human-level AI in a few decades.
         | 
         | "The singularity is _always near_". We've been here before
         | (1950s-1970s); people hoping/fearing that general AI was just
         | around the corner.
         | 
         | I might be severely outdated on this, but the way I see it AI
         | is just rehashing already existent knowledge/information in
         | (very and increasingly) smart ways. There is absolutely no
         | spark of creativity coming from the AI itself. Any "new"
         | information generated by AI is really just refined noise.
         | 
         | Don't get me wrong, I'm not trying to take a leak on the field.
         | Like everyone else I'm impressed by all the recent
         | breakthroughs, and of course something like GPT is infinitely
         | more advanced than a simple `rand` function. But the ontology
         | remains unchanged; we're just doing an extremely opinionated,
         | advanced and clever `rand` function.
        
           | 3pt14159 wrote:
           | No we're not.
           | 
           | About a decade ago I trained a model on Wikipedia which was
           | tuned to classify documents into what branch of knowledge the
           | document could be part of. Then I fed in one of my own blog
           | posts. The second highest ranking concept that came back to
           | me was "mereology" a term I had never even heard of and one
           | that was quite apt for the topic I was discussing in the blog
           | post.
           | 
           | My own software, running on the contents of millions of
           | authors' work, ingesting my own blog post, taught _me_, the
           | orchestrator of the process, about my own work. This feedback
           | loop is accelerating, and just because it takes decades for
           | the irrefutable to arrive, it doesn't mean that it never
           | will. People in the early '40s said atomic weapons would never
           | happen because it would be too difficult. For some people
           | nothing short of seeing is believing, but those with
           | predictive minds know that this truly is just around the
           | corner.
        
       | captainmuon wrote:
       | Maybe I'm cynical, but I'm really skeptical. What if this is just
       | some poor image generation code and a few hundred minimum wage
       | workers manually assembling the examples? Unless I can feed it
       | arbitrary sentences we can never know.
       | 
       | I would be disappointed, but not surprised, if OpenAI turns out
       | to be the Theranos of AI...
        
         | apatap wrote:
         | At least GPT-3 can generate texts much faster than a worker
         | would need to create them manually.
        
           | captainmuon wrote:
           | Right, but I bet the images shown here were preselected.
        
             | sanxiyn wrote:
             | They swear they aren't preselected. I am willing to bet
             | you, but I am not sure how to settle the bet.
        
       | nojvek wrote:
       | Wow. This is amazing. Although I wish they documented how much
       | compute and data was used to get these results.
       | 
       | I absolutely believe we'll crack the fundamental principles of
       | intelligence in our lifetimes. We now have the capability to
       | process all the public data available on the internet (of which
       | Wikipedia is a huge chunk). We have so many cameras and
       | microphones (one in each pocket).
       | 
       | It's also scary to think about it going wrong (the great filter
       | of the Fermi paradox). However, I'm optimistic.
       | 
       | The brain only uses 20 watts of power to do all its magic. The
       | entire human body is built from 700MB of data in the DNA. The
       | fundamental principles of intelligence are within reach if we
       | look at it from that perspective.
       | 
       | Right now GPT3 and DALL-E seem to be using an insane amount of
       | computation to achieve what they are doing. My prediction is that
       | in 2050, we'll have pretty good intelligence in our phones that
       | has deep understanding (language and visual) of the world around
       | us.
        
         | SoSoRoCoCo wrote:
         | > Although I wish they documented how much compute and data was
         | used to get these results.
         | 
         | I'm hearing astonishing numbers, in the tens of megawatts range
         | for training these billion-parameter models.
         | 
         | And I wish they showed us all the rejected images. If those
         | images (like the snail harp) were the FIRST pass of the release
         | candidate model.... wow... but how much curating did they do?
         | 
         | EDIT: Units. Derp.
        
           | ajnin wrote:
           | > in the tens of megawatts
           | 
           | But for how long? 1 second, 1 hour, 1 month? The energy
           | matters more than the power.
        
           | nullc wrote:
           | > tens of mega-joule range
           | 
           | do you mean tera-joules?
           | 
           | A hundred megajoules is about three bucks at 10 cents per
           | kwh.
           | 
           | I routinely do giga-joule level computations using just a
           | rack of computers in my garage, they're no big deal.
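For reference, the unit conversion behind those figures is straightforward; a minimal sketch, assuming the same $0.10/kWh rate quoted in the comment:

```python
# Sanity-checking the electricity-cost arithmetic above.
# Assumed rate (from the comment): $0.10 per kWh; 1 kWh = 3.6 MJ.
JOULES_PER_KWH = 3.6e6

def energy_cost_usd(joules: float, usd_per_kwh: float = 0.10) -> float:
    """Cost of an amount of energy at a flat electricity rate."""
    return joules / JOULES_PER_KWH * usd_per_kwh

print(round(energy_cost_usd(100e6), 2))  # 100 MJ -> about $2.78
print(round(energy_cost_usd(1e9), 2))    # 1 GJ   -> about $27.78
```

So a hundred megajoules really is "about three bucks", and even gigajoule-scale computation stays in hobbyist territory.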
        
             | eutectic wrote:
             | metawatts?
        
               | nullc wrote:
               | Joules is a unit of total computation, watts is a unit of
               | computation rate. :P
               | 
               | Metawatt is a unit for rates of speculation, uninformed
               | by multiplication, about AI energy usage.
        
               | eutectic wrote:
               | Oops, *mega, obviously! But I did mean watts.
        
               | StavrosK wrote:
               | MWh is what you'd want.
        
               | eutectic wrote:
               | I was thinking power consumption. Probably a mis-
               | correction anyway.
        
             | [deleted]
        
         | jackric wrote:
         | > The entire human body is built from 700MB of data in the DNA.
         | 
         | I think this notion is misleading. It doesn't relate to the
         | ease of simulating it on our current computers. You'd need a
         | quantum computer to emulate this ROM in anything like
         | realtime.
         | 
         | The DNA program was optimised to execute in an environment
         | offering quantum tunnelling, multi-component chemical
         | reactions, etc.
        
           | nojvek wrote:
           | My point wasn't to emulate the DNA in a virtual machine. The
           | point was that humans and chimps (our closest DNA relatives)
           | are 99% the same. So the high-order intelligence that gives
           | rise to written language and tool building is somewhere in
           | that 700MB of DNA code, and 1% of that (just 7MB) is
           | responsible for creating the smartest intelligence we know
           | (humans).
           | 
           | In that sense the architecture of intelligence isn't very
           | complicated. The uniformity of the isocortex, of which we
           | have the most relative to brain size of any animal, suggests
           | we ought to be able to replicate its behavior in a machine.
           | 
           | The isocortex/neocortex is where the gold is. It's very
           | uniform when seen under microscopes. Brain cells from one
           | region can be put in another region and they work just fine.
           | All of ^ says intelligence is some recursive architecture of
           | information processing. That's why I'm optimistic we'll crack
           | it.
        
             | ZeikJT wrote:
             | I think what the parent was saying is important though,
             | that 700MB of "data" isn't complete. It's basically just
             | really really good compression that requires the runtime of
             | our universe to work properly. The way proteins form and
             | interact, the way physics works, etc are all requirements
             | for that DNA to be able to realize itself as complex
             | thinking human beings.
        
         | thom wrote:
         | Do you think we'll crack the principles, or do you think we'll
         | just have very powerful models without really knowing what
         | makes them clever?
        
           | nojvek wrote:
           | My bet is understanding the fundamental principles. Like
           | building an airbus plane or starship requires fundamental
           | understanding of aerodynamic principles, chemistry, materials
           | and physics.
           | 
           | DNNs will definitely not get us there in their current form.
        
             | WitCanStain wrote:
             | I am very curious to see if concepts from cognitive science
             | and theoretical linguistics (like the Language of Thought
             | paradigm as a framework of cognition or the Merge function
             | as a fundamental cognitive operation) will be applied to
             | machine learning. They seem to be some of the best
             | candidates for the fundamental principles of cognition.
        
         | pontus wrote:
         | Maybe a nitpick, but there's a difference between energy
         | consumption during training and inference. If you want to talk
         | about the energy necessary to train a human brain, it involves
         | years of that 20W power consumption. What is the power
         | consumption for inference time for Dall-E?
        
           | littlestymaar wrote:
           | I did some quick napkin math for Lee Sedol vs AlphaGo a few
           | weeks ago: https://news.ycombinator.com/item?id=25493358
        
             | nojvek wrote:
             | Wow. Good to know that entire lifetime energy consumption
             | is 50 MWh. At $0.1/KWh, you're looking at $5000 in
             | equivalent electric energy consumption over entire lifetime
             | of a human being.
             | 
             | The brain uses 20W of power. For a life time of ~80 years,
             | that is 14MWh of energy usage. Suppose we say the brain
             | trains for the first 25 years then that is 4.38 MWh.
             | Equivalent electric energy consumption is only at $438.
             | 
             | So yeah, the brain is quite efficient both in hardware and
             | software.
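The arithmetic above checks out; a small sketch, using the 20 W continuous draw and $0.10/kWh assumed in the comment:

```python
# The brain-energy arithmetic from the comment above, spelled out.
# Assumptions (from the comment): a constant 20 W draw, $0.10/kWh.
HOURS_PER_YEAR = 365.25 * 24

def constant_load_mwh(watts: float, years: float) -> float:
    """Energy used by a constant load over a number of years, in MWh."""
    return watts * years * HOURS_PER_YEAR / 1e6

lifetime = constant_load_mwh(20, 80)  # ~14 MWh over an 80-year life
training = constant_load_mwh(20, 25)  # ~4.4 MWh for 25 "training" years
print(round(lifetime, 1), round(training, 2))
print(round(training * 1000 * 0.10))  # ~$438 of equivalent electricity
```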
        
               | pontus wrote:
               | That's still really surprising I think. I would have
               | imagined that it would have been off by many orders of
               | magnitude. The fact that these models are within a factor
               | of 3-10 of the human consumption is pretty impressive.
               | 
               | That being said, these models are training only for very
               | specific tasks whereas obviously the human brain is far
               | more sophisticated in terms of its capabilities.
        
               | littlestymaar wrote:
               | Bear in mind that only a tiny fraction of the energy
               | spent by Sedol's brain during his whole life was
               | dedicated to learning Go. Even while playing Go, a human
               | neural system spends a big part of its energy doing
               | _mundane_ stuff like moving your hand, standing up,
                | deciphering the inputs coming from the eyes and every
                | other sensory body part, and subconsciously processing
                | it (my back hurts so I need to change posture, the way
                | my opponent smells, the light in the room is too
                | bright, etc.). Interestingly enough, doing most of
                | these things is also a big challenge for AI today.
        
           | burrows wrote:
           | Not to mention the billions of years spent in the
           | evolutionary pipeline.
        
             | littlestymaar wrote:
             | Well, if you want to go that route, you'd need to count all
             | the energy spent to build computers of all kind since the
             | 50s, and also all the energy spent to sustain the lives of
             | people working on AI. And, well, all the millions of years
             | spent in the evolutionary pipeline before these people
             | where born ;)
        
         | martamorena9 wrote:
         | > I absolutely believe we'll crack the fundamental principles
         | of intelligence in our lifetimes.
         | 
         | I tend to agree. However this looks a lot like the beginning of
         | the end for the human race as well. Perhaps we are really just
         | a statistical approximation device.
        
           | nojvek wrote:
           | Yeah, that's why I mentioned the great filter of the Fermi
           | paradox.
           | 
           | I also believe humans in our current form won't become a
           | spacefaring species. We're pretty awful as space travelers.
           | 
           | It is very likely that we'll have robots with human like
           | intelligence, sensor and motor capabilities sent as probes to
           | other planets to explore and carry on the human story.
           | 
           | But future is hard to predict. I do know that if the
           | intelligence algorithm is only in the hands of Google and
           | Facebook, we are doomed. This is a thing that ought to be
           | open source and equally beneficial to everyone.
        
       | dfischer wrote:
       | What if reality is basically a slightly more advanced form of
       | This? Thoughts are generative finite infinite potentials and
       | reality is the render. Interesting.
        
         | drdeca wrote:
         | I don't know what you mean by "finite infinite potentials". It
         | seems a little word-salad-y?
        
         | dfischer wrote:
         | Seriously, downvotes? Lol. Terrible configuration and what a
         | way to silence. At this point I'll take my thoughts elsewhere.
         | Enjoy your echo chamber.
        
       | sircastor wrote:
       | In various episodes of Star Trek: The Next Generation, the crew
       | asks the computer to generate some environment or object with
       | relatively little description. It's a story telling tool of
       | course, but looking at this, I can begin to imagine how we might
       | get there from here.
        
         | dfischer wrote:
         | Almost as if thoughts and reality are of the same thing.
        
       | inferense wrote:
       | In spite of the close architectural resemblance to VQ-VAE-2, it
       | definitely pushes the text-to-image synthesis domain forward.
       | I'd be curious to see how well it can perform in a multi-object
       | image setting, which currently presents the main challenge in
       | the field. Also, I wouldn't be surprised if these results were
       | limited to OpenAI's scale of computing resources. All in all,
       | great progress in the field. The pace of development here is
       | simply staggering, considering that a few years back we could
       | hardly generate any image in high fidelity.
        
       | minimaxir wrote:
       | The way this model operates is the equivalent of machine learning
       | shitposting.
       | 
       | Broke: Use a text encoder to feed text data to an image
       | generator, like a GAN.
       | 
       | Woke: Use a text and image encoder _as the same input_ to decode
       | text and images _as the same output_
       | 
       | And yet, due to the magic of Transformers, it works.
       | 
       | From the technical description, this seems feasible to clone
       | given a sufficiently robust dataset of images, although the scope
       | of the demo output implies a much more robust dataset than the
       | ones Microsoft has offered publicly.
        
         | dfischer wrote:
         | Shows you where the role of a meme and a shit poster may exist
         | in a cosmological technological hierarchy. Humans are just
         | rendering notes replicating memes, man. /s in the dude voice
         | from big Lebowski.
        
         | thunderbird120 wrote:
         | It's not really surprising given what we now know about
         | autoregressive modeling with transformers. It's essentially a
         | game of predict hidden information given visible information.
         | As long as the relationship between the visible and hidden
         | information is non-random you can train the model to understand
         | an amazing amount about the world by literally just predicting
         | the next token in a sequence given all the previous ones.
         | 
         | I'm curious whether they do a backward pass here; it would
         | probably have value. They seem to describe putting the text
         | tokens first, meaning that once you start generating image
         | tokens all the text tokens are visible. That has the model
         | learning to generate an image with respect to a prompt, but
         | you could also literally just reverse the order of the
         | sequence to have the model also learn to generate prompts with
         | respect to the image. It's not clear if this is happening.
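The setup described above can be sketched in a few lines. The token counts (256 text tokens followed by 1024 image tokens) come from the post's technical description; everything else is a toy illustration of next-token prediction over the concatenated stream, not OpenAI's code:

```python
# Toy illustration of autoregressive training over a text+image stream.
# Assumed layout (from the post): 256 text tokens, then 1024 image tokens.
TEXT_LEN, IMAGE_LEN = 256, 1024

def build_sequence(text_tokens, image_tokens):
    """Concatenate caption tokens and image tokens into one stream."""
    assert len(text_tokens) == TEXT_LEN and len(image_tokens) == IMAGE_LEN
    return list(text_tokens) + list(image_tokens)

def training_pairs(sequence):
    """Next-token prediction: every prefix predicts the following token."""
    return [(sequence[:i], sequence[i]) for i in range(1, len(sequence))]

seq = build_sequence(range(TEXT_LEN), range(IMAGE_LEN))
pairs = training_pairs(seq)
print(len(seq), len(pairs))  # 1280 tokens, 1279 prediction targets
```

Note that the context for the first image token already contains the whole caption, which is exactly the "all text tokens are visible" property above; reversing the sequence would instead train caption generation conditioned on the image.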
        
           | lukeplato wrote:
           | Is this kind of happening with the CLIP classifier [1] to
           | rank the generated images?
           | 
           | > Similar to the rejection sampling used in VQVAE-2, we use
           | CLIP to rerank the top 32 of 512 samples for each caption in
           | all of the interactive visuals. This procedure can also be
           | seen as a kind of language-guided search16, and can have a
           | dramatic impact on sample quality.
           | 
           | > CLIP pre-trains an image encoder and a text encoder to
           | predict which images were paired with which texts in our
           | dataset. We then use this behavior to turn CLIP into a zero-
           | shot classifier. We convert all of a dataset's classes into
           | captions such as "a photo of a dog" and predict the class of
           | the caption CLIP estimates best pairs with a given image.
           | 
           | [1] https://openai.com/blog/clip/
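The reranking step quoted above is simple to sketch: generate many samples, score each against the caption, keep the best. In this sketch `clip_score` is a stand-in for the real model, which embeds both caption and image and compares the embeddings; the 512/32 numbers come from the post.

```python
# Sketch of language-guided reranking: keep the candidates that a
# caption-image similarity function scores highest.
def rerank(caption, candidates, clip_score, keep=32):
    """Return the `keep` candidates scored most similar to the caption."""
    ranked = sorted(candidates, key=lambda img: clip_score(caption, img),
                    reverse=True)
    return ranked[:keep]

# Toy usage: candidates are ints and the "similarity" is the value itself.
best = rerank("a snail made of harp", range(512),
              clip_score=lambda cap, img: img)
print(len(best), best[0])  # 32 candidates kept, highest-scoring first
```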
        
           | minimaxir wrote:
           | That approach wouldn't work out of the box; the model sees
           | text for the first 256 tokens and images for the following
           | 1024 tokens, and learns to predict in that order. It likely
           | would not have much to go on if you gave it the 1024 image
           | tokens first and the 256 text tokens afterward.
           | 
           | A network optimizing for both use cases (e.g. the training
           | set is half 256 + 1024, half 1024 + 256) would _likely_ be
           | worse than a model optimizing for one of the use cases, but
           | then again models like T5 argue against it.
        
         | thomasahle wrote:
         | It's actually a bit more complicated, since DALL-E uses CLIP
         | to rerank its outputs, and CLIP is itself trained using
         | separate text and image encoders: https://openai.com/blog/clip/
         | 
         | At some point we'll have so many models based on so many other
         | models that it will no longer be possible to tell which
         | techniques are really involved.
        
         | [deleted]
        
         | nabla9 wrote:
         | Shitposts are more creative. What I would like to see is more
         | extrapolation and complex mixing:
         | 
         | "A photo of an iPhone from the stone age."
         | 
         | "Adolf Hitler pissing against the wind and enjoying it."
         | 
         | "Painting: Captain Jean-Luc Picard crossing the Delaware
         | River in a Porsche 911."
        
           | ricardobeat wrote:
           | You can get "a computer from the 1900s" in the examples in
           | the post.
        
           | numpad0 wrote:
           | Repeatable, measurable, automated image meme shitposting is
           | absolutely a destructive device though.
        
           | throwaway2245 wrote:
           | You're describing a Jim'll Paint It AI bot
           | https://jimllpaintit.tumblr.com/
        
       | Tycho wrote:
       | Recently heard a resident machine learning expert describe GPT-3
       | as 'not revolutionary in the slightest' or something like that.
        
         | minimaxir wrote:
         | It's not revolutionary, just a typical-but-notable iterative
         | step in NLP. Which is fine!
         | 
         | I wrote a blog post on that a few months ago after playing a
         | bit with GPT-3, and it holds up.
         | https://news.ycombinator.com/item?id=23891226
        
         | jokethrowaway wrote:
         | It's not, but it showed that we can get an order of magnitude
         | better results by adding an order of magnitude more data.
         | 
         | To be honest, it's not where I'd like to see efforts in the
         | field go.
         | 
         | Not because I'm afraid of AI taking over, but because I'd
         | rather have humans recreate something comparable to a human
         | brain (functionality wise).
        
           | visarga wrote:
           | Who knows, maybe in a few years you will be amazed at the new
           | universal transformer chip that runs on 20W of power and can
           | do almost any task. No need for retraining, just speak to it,
           | show it what you need. Even prompt engineering has been
           | automated (https://arxiv.org/abs/2101.00121) so no more
           | mystery. So much for the new hot job of GPT prompt engineer
           | that would replace software dev.
        
             | asbund wrote:
             | I was skeptical before, but now I'm open to this idea.
        
             | jokethrowaway wrote:
             | we've been making these since the beginning of time, we
             | call them humans
        
               | commonturtle wrote:
               | Humans want health insurance and 40-hour work weeks.
               | The super-smart AGI that will exist 20 years from now
               | won't.
        
         | visarga wrote:
         | It's revolutionary in costs, and delivers for every dollar
         | spent.
        
         | FL33TW00D wrote:
         | I think that they're correct saying that GPT-3 isn't
         | revolutionary, since it just demonstrates the power of scaling.
         | However I would argue that the underlying architecture, the
         | Transformer (GP(T)), is/was/will be revolutionary.
        
       | imhoguy wrote:
       | I just got a creepy thought about what a genetic engineering
       | "GEN-E" could bring in a couple of decades :(
       | 
       | IN: "give me a living giraffe turtle"
       | 
       | OUT: a few weeks later a chimera crawls out of the AI lab box
        
       | wwarner wrote:
       | Similar to Wordseye https://www.wordseye.com/
        
         | thepace wrote:
         | WordsEye seems to be about scene generation out of pre-
         | existing building blocks, whereas DALL-E is about creating
         | those building blocks themselves.
        
       | deeplstm wrote:
       | This is super impressive!! The generated images are quite
       | accurate and realistic. Here are some of my thoughts and an
       | explanation of how they use a discrete vocabulary to describe
       | an image: https://youtu.be/UfAE-1vdj_E
        
       | visarga wrote:
       | I'm wondering why the image comes out non-blocky because
       | transformers would take slices of the image as input. They say
       | they have about 1024 tokens for the image and that would mean
       | 32x32 patches. How is it possible that these patches align along
       | the edges so well and not have JPEG like artifacts?
        
         | minimaxir wrote:
         | If you read footnote #2, the source images are 256x256 but
         | downsampled using a VAE, and presumably upsampled using a VAE
         | for publishing (IIRC they are less prone to the infamous GAN
         | artifacts).
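The token arithmetic behind the two comments above is easy to spell out: a 256x256 image compressed to a 32x32 grid of discrete codes yields the ~1024 image tokens the transformer operates on. (The 8x8 pixels-per-token figure is just the implied compression factor, not something the post states directly.)

```python
# Assumed numbers: 256x256 source images (per the footnote) and a 32x32
# latent grid, matching the "about 1024 tokens" in the comment above.
IMAGE_SIZE = 256  # source image side length, in pixels
GRID_SIZE = 32    # latent grid side length, in tokens

pixels_per_token = IMAGE_SIZE // GRID_SIZE
num_image_tokens = GRID_SIZE * GRID_SIZE
print(pixels_per_token, num_image_tokens)  # 8 1024
```

Because the VAE decoder reconstructs the whole image from the code grid rather than pasting patches side by side, the output has no hard block boundaries, which is why the edges align without JPEG-like artifacts.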
        
       | hnthrowopen wrote:
       | Is there a link to the git repo or is OpenAI not really open?
        
         | jokethrowaway wrote:
         | the only open thing is the name
        
         | wccrawford wrote:
         | I suspect you meant for Dall-E specifically, but this is their
         | repo. Found on their about page.
         | 
         | https://github.com/openai/
        
       | CyberRabbi wrote:
       | Seems like we're getting closer to AI driven software
       | engineering.
       | 
       | Prompt: a Windows GUI executable that implements a scientific
       | calculator.
        
         | jle17 wrote:
         | There are some attempts to get AI Dungeon (a GPT-2/3-based
         | game) to generate code. This scenario, for example (you need
         | to create an account to launch it):
         | https://play.aidungeon.io/main/scenarioView?publicId=af4a05f....
        
         | hooande wrote:
         | What you'll get is the same thing as GPT-3: the equivalent of
         | googling the prompt. You can google "implement a scientific
         | calculator" and get multiple tutorials right now.
         | 
         | You'll still need humans to make anything novel or interesting,
         | and companies will still need to hire engineers to work on
         | valuable problems that are unique to their business.
         | 
         | All of these transformers are essentially trained on "what's
         | visible to google", which also defines the upper bound of their
         | utility
        
           | commonturtle wrote:
           | > You'll still need humans to make anything novel or
           | interesting, and companies will still need to hire engineers
           | to work on valuable problems that are unique to their
           | business.
           | 
           | Give it 10 years :) GPT-10 will probably be able to replace a
           | sizeable proportion of today's programmers. What will GPT-20
           | be able to do?
        
         | [deleted]
        
         | adamredwoods wrote:
         | Possibly, but in software the realm of errors is wider and
         | more detrimental. With imagery, the human mind will fill in
         | the gaps and allow interpretation. With software, not so much.
        
           | visarga wrote:
           | True, but the human mind needs an expensive, singleton body
           | in the real world, while a code writing GPT-3 only needs a
           | compiler and a CPU to run its creations. Of course they would
           | put a million cores to work at 'learning to code' so it would
           | go much faster. Compare that with robotics, where it's so
           | expensive to run your agents. I think this direction really
           | has a shot.
        
             | social_quotient wrote:
             | Would someone like GitHub be the right person to solve
             | this? They have ALL of the code.
        
               | ravi-delia wrote:
               | To be fair, anyone else has almost all the code, since
               | it's all public.
        
         | ricardobeat wrote:
         | Someone did this simply by giving GPT-3 some code samples last
         | year:
         | https://twitter.com/sharifshameem/status/1282676454690451457...
         | 
         | Strongly recommend watching the whole video!
        
         | dustinls wrote:
         | Prompt: One self-improving self-optimizing misanthropic quine
         | please
        
       | Triv888 wrote:
       | Without seeing the source, they could be mainly using Google
       | image search for that (or Yandex which is much better for image
       | search and reverse image search).
        
       | flanbiscuit wrote:
       | Just yesterday I was joking with my coworker that I would like a
       | tool where I could create meme images from a text description and
       | today I open HN and here we are. This looks amazing!
        
       | an4rchy wrote:
       | Very impressive results.
       | 
       | This seems like it could be a great replacement for
       | searching/creating your own stock photo/images.
       | 
       | Hopefully all output is copyright friendly.
        
       | unnouinceput wrote:
       | Does it allow refining the result over iterations? Meaning,
       | after getting version one, can you apply more words to refine
       | the image toward a closer description? Because if it does, this
       | could become a very good tool for getting a reliable picture
       | from a witness asked to describe a suspect. Combine this with
       | China's existing massive surveillance face recognition and you
       | can locate the suspect (or the political dissident) as fast as
       | you can get your witness in front of a laptop running this
       | software.
       | 
       | It's a tool, and like any other existing tool it will be used for
       | both bad and good.
        
       | brian_herman wrote:
       | Now we just have to wait for Hugging Face to create an open
       | source implementation. So much for openness, I guess; if you go
       | on Microsoft Azure you can use closed AI.
        
       | TravisLS wrote:
       | If I put text into this tool and generate an original and unique
       | image, who owns that image? If it's OpenAI, do they license it?
        
       | durpkingOP wrote:
        | I predict that one day you'll be able to create
        | animations/videos with a variation of this: define
        | characters/scripts/etc., insert a story, and it generates a
        | movie.
        
       | PIKAL wrote:
       | When will people wake up and realize that ML and AI are out of
       | control?
        
       | natch wrote:
       | I tried a working demo of a system like this in Kristian
       | Hammond's lab at Northwestern University 20 years ago. Actually
       | his system was generating MTV style videos from web images and
       | music with just some keywords as input. He had some other amazing
       | (at the time) stuff. The GPT 3 part of this gives it a lot more
       | potential of course, so I don't want to take away from that. Just
       | saying though, since they didn't reference Hammond's work, that
       | this party has been going on for a while.
       | 
       | https://www.mccormick.northwestern.edu/research-faculty/dire...
        
       | ramsrigouthamg wrote:
       | OpenAI did with DALL.E what I envisioned to do with AiArtist :)
       | (https://www.aiartist.io/)
       | 
       | An AI to provide illustrations to your written content.
       | 
       | https://www.linkedin.com/in/ramsrig/
       | https://twitter.com/ramsri_goutham
        
       | fumblebee wrote:
       | Does anyone have any insight on how much it would cost for OpenAI
       | to host an online, interactive demo of a model like this? I'd
       | expect a lot - even just for inference - based on the size of the
       | model and the expected virality of the demo, but have no
       | reference points for quantifying.
        
         | littlestymaar wrote:
          | You can edit the prompt to play a bit with it. (The results
          | are far less good than what's featured in the blog post,
          | though...)
        
       | nl wrote:
       | The "collection of glasses sitting on a table" example is
       | excellent.
       | 
       | Some pics are of drinking glasses and some are of eye glasses,
       | and one has both.
        
         | adamredwoods wrote:
         | I also like the telephones from different eras, including the
         | future.
        
       | [deleted]
        
       | letsburnthis wrote:
        | If I decide to make one of those exact chairs in the shape of
        | an avocado, can I be sued for copyright infringement?
        
         | renjimen wrote:
         | Depends on who is suing you: OpenAI for using their model
         | results, or the owner(s) of the data their model was trained
         | on? Either way, it's a grey zone that copyright law hasn't come
         | to grips with yet. See
         | https://scholarlykitchen.sspnet.org/2020/02/12/if-my-ai-wrot...
        
       | TedDoesntTalk wrote:
       | > an illustration of a baby daikon radish in a tutu walking a dog
       | 
       | Wow!
        
       | aaron695 wrote:
       | The real power here is a Google destroyer.
       | 
        | These AIs can't yet produce things of value to humans, but I
        | doubt Google's AI could know that.
       | 
       | Pump out billions of pages of text and pictures and it should
       | swamp Google.
        
       | dinkleberg wrote:
       | RIP to all the fiverr artists out there.
       | 
       | This is impressive.
        
         | victor9000 wrote:
         | Given ClosedAI's recent moves, I doubt the public will ever
         | have access to this. So I think those artists will be just
         | fine.
        
           | the8472 wrote:
           | You must be using a very short definition of "ever". These
           | kinds of works will get replicated if they're not published.
        
       | totetsu wrote:
       | This would make the process of creating elaborative encoded
       | visual mnemonics faster.
        
       | asbund wrote:
       | This is amazing
        
       | desideratum wrote:
       | Some truly impressive results. I'll pick my usual point here when
       | a fancy new (generative) model comes out, and I'm sure some of
       | the other commenters have alluded to this. The examples shown are
       | likely from a set of well-defined (read: lots of data, high bias)
       | input classes for the model. What would be really interesting is
       | how the model generalizes to /object concepts/ that have yet to
       | be seen, and which have abstract relationships to the examples it
       | has seen. Another commenter here mentioned "red square on green
       | square" working, but "large cube on small cube", not working.
       | Humans are able to infer and understand such abstract concepts
       | with very few examples, and this is something AI isn't as close
       | to as it might seem.
        
         | adsche wrote:
         | Yes, I don't really see impressive language (i.e. GPT3) results
         | here? It seems to morph the images of the nouns in the prompt
         | in an aesthetically-pleasing and almost artifact-free way (very
         | cool!).
         | 
          | But it does not seem to 'understand' anything, as some other
          | commenters have said. Try '4 glasses on a table' and you will
         | rarely see 4 glasses, even though that is a very well-defined
         | input. I would be more impressed about the language model if it
          | had a working prompt like: "A teapot that does _not_ look
          | like the image prompt."
         | 
         | I think some of these examples trigger some kind of bias, where
         | we think: "Oh wow, that armchair _does_ look like an avocado! "
         | - But morphing an armchair and an avocado will almost always
         | look like both because they have similar shapes. And it does
          | not 'understand' what you called 'object concepts'; otherwise
          | it would not produce armchairs you clearly cannot sit in due
          | to the avocado stone (or the stem in the flower-related
          | 'armchairs').
        
           | ralfd wrote:
           | > I would be slightly more impressed about the language model
           | if it had a working prompt like: "A teapot that does not look
           | like the image prompt."
           | 
           |  _Slightly?_ Jesus, you guys are hard to please.
        
             | adsche wrote:
             | Right, that was unnecessary and I edited it out.
             | 
              | What I meant is that 'not' is in principle an easy
              | keyword to implement 'conservatively'. But yes, having
              | this in a
             | language model has proven to be very hard.
             | 
             | Edit: Can I ask, what do you find impressive about the
             | language model?
        
               | dash2 wrote:
               | Perhaps the rest of the world is less blase - rightly or
               | wrongly. I do get reminded of this:
               | https://www.youtube.com/watch?v=oTcAWN5R5-I when I read
               | some comments. I mean... we are telling the computer
               | "draw me a picture of XXX" and it's actually doing it. To
               | me that's utterly incredible.
        
               | adsche wrote:
               | > "draw me a picture of XXX" and it's actually doing it.
               | To me that's utterly incredible.
               | 
               | Sure, would be, but this is not happening here.
               | 
               | And yes, rest assured, the rest of the world is probably
               | less 'blase' than I am :) Very evident by the hype around
               | GPT3.
        
           | viggity wrote:
           | I'm in the open ai beta for GPT-3, and I don't see how to
           | play with DALL-E. Did you actually try "4 glasses on a
           | table"? If so, how? Is there a separate beta? Do you work for
           | open ai?
        
             | nicholast wrote:
              | In the demonstrations, click on the underlined keywords
              | and you can select alternates from a dropdown menu.
        
         | spyder wrote:
          | Yeah, with these kinds of generative examples, they should
          | always include the closest matches from the training set to
          | see how much was just "copied".
        
         | [deleted]
        
         | jonesn11 wrote:
          | This is a spot-on point. My prediction is that it wouldn't be
          | able to. Given its difficulty generating the correct number
          | of glasses, it seems it still struggles with systematic
          | generalization and compositionality. As a point of reference,
          | cherry-picking aside, it could model the obscure but probably
          | well-defined baby daikon radish in a tutu walking a dog, but
          | couldn't model red-on-green-on-blue cubes. Maybe more
          | sequential perception, action, or video data, or a System
          | 2-like paradigm, would help, but it remains to be seen.
        
         | sendtown_expwy wrote:
         | It seems unlikely the model has seen "baby daikon radishes in
         | tutus walking dogs," or cubes made out of porcupine textures,
         | or any other number of examples the post gives.
        
           | Alex3917 wrote:
           | If you type in different plants and animals into GIS, you
           | don't even get the right species half the time. If GPT-3 has
           | solved this problem, that would be substantially more
           | impressive than drawing the images.
        
             | agravier wrote:
             | What is GIS? I only know Geographical Information System.
        
               | the8472 wrote:
               | probably Google Image Search
        
           | m3at wrote:
           | It might not have seen that specific combination, but finding
           | an anthropomorphized radish sure is easier than I thought:
            | type "大根 anime" (daikon) in your search engine and you'll
            | find plenty of results
        
             | ronsor wrote:
             | At least for certain types of art, sites such as pixiv and
             | danbooru are useful for training ML models: all the images
             | on them are tagged and classified already.
        
             | numpad0 wrote:
              | An image search for "大根 擬人化" does return results
              | similar to the AI-generated pictures, e.g. 3rd from
              | top[0] in my environment, but sparse. "大根 anime" in text
             | search actually gives me results about an old hobbyist
             | anime production group[1], some TV anime[2] with the word
             | in title...hmm
             | 
             | Then I found these[3][4] in Videos tab. Apparently there's
             | a 10-20 year old manga/merch/anime franchise of walking and
             | talking daikon radish characters.
             | 
             | So the daikon part is already figured in the dataset. The
             | AI picked up the prior art and combined it with the dog
             | part, which is still tremendous but maybe not "figuring out
             | the daikon walking part on its own" tremendous.
             | 
              | (btw, does anyone know how best to refer to the _anime_
              | art style in Japanese? It's a bit of a mystery to me)
             | 
             | 0: https://images.app.goo.gl/LPwveUJPWHr6oK8Y8
             | 
             | 1: https://ja.wikipedia.org/wiki/DAICON_FILM
             | 
             | 2: https://ja.wikipedia.org/wiki/%E7%B7%B4%E9%A6%AC%E5%A4%A
             | 7%E6...
             | 
             | 3: https://youtube.com/watch?v=J1vvut5DvSY
             | 
             | 4: https://youtu.be/1Gzu2lJuVDQ?t=42
        
               | tkgally wrote:
               | > anyone knows how best to refer to anime art style in
               | Japanese?
               | 
                | The term _mangachikku_ (漫画チック, "manga-tic") is
                | sometimes used to refer to the art style typical of
                | manga and anime; it can also refer to exaggerated,
                | caricatured depictions in general. Perhaps _anime-fu
                | irasuto_ (アニメ風イラスト, anime-style illustration),
                | while a less colorful expression, would be closer to
                | what you're looking for.
        
       | blueblisters wrote:
       | I can't find a source for the dataset but going by the hints
       | peppered throughout the article, they likely used <img> `alt`
       | tags for supervision? Fascinating that an accessibility tool is
       | being repurposed to train machine learning models.
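
A minimal sketch of how (image URL, alt text) pairs could be mined from HTML for this kind of supervision, using only the Python standard library. This is an illustration of the idea, not OpenAI's actual data pipeline:

```python
from html.parser import HTMLParser

class AltTextCollector(HTMLParser):
    """Collect (image URL, alt text) pairs from an HTML page."""

    def __init__(self):
        super().__init__()
        self.pairs = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            d = dict(attrs)
            alt = (d.get("alt") or "").strip()
            src = d.get("src")
            if src and alt:  # skip decorative images with empty alt
                self.pairs.append((src, alt))

html = '<p><img src="cat.jpg" alt="a cat on a sofa"><img src="x.png" alt=""></p>'
c = AltTextCollector()
c.feed(html)
print(c.pairs)  # -> [('cat.jpg', 'a cat on a sofa')]
```

In practice one would also have to filter boilerplate alt text ("logo", "image", filenames), which is exactly the kind of noise a web-scale caption dataset has to contend with.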
        
       | ignoranceprior wrote:
       | Does this address NLP skeptics' concerns that Transformer models
       | don't "understand" language?
       | 
       | If the AI can actually draw an image of a green block on a red
       | block, and vice versa, then it clearly understands something
       | about the concepts "red", "green", "block", and "on".
        
         | karmasimida wrote:
          | I think it is safe to say that learning a joint distribution
          | of vision + language is fully possible at this stage, as
          | demonstrated by this work.
         | 
         | But 'understanding' itself needs to be further specified, in
         | order to be tested even.
         | 
         | What strikes me most is the fidelity of those generated images,
         | matching the SOTA from GAN literature with much more variety,
         | without using the GAN objective.
         | 
          | It seems the Transformer might be the best neural construct
          | we have right now for learning any distribution, given enough
          | data.
        
         | tikwidd wrote:
         | if(adj == 'red') drawBlock(RED)
         | 
         | According to your definition of understanding, this program
         | understands something about the concept RED. But the code is
         | just dealing with arbitrary values in memory (e.g. RED =
         | 0xFF0000)
        
         | TigeriusKirk wrote:
         | There are examples on twitter showing it doesn't really
         | understand spatial relations very well. Stuff like "red block
         | on top of blue block on top of green block" will generate red,
         | green, and blue blocks, but not in the desired order.
         | 
         | https://twitter.com/peabody124/status/1346565268538089483
        
         | [deleted]
        
         | tralarpa wrote:
          | Try a large block on a small block. As the authors have also
          | noted in their comments, the success rate is nearly zero. One
          | may wonder why. Maybe because that's something you rarely see
          | in photos? In the end, it doesn't "understand" the meaning of
          | the words.
        
         | bra-ket wrote:
          | Nah, it's still a big, dumb model with no idea what it's
          | doing; deepfake 2.0.
          | 
          | It looks like a variation on a plain old image search engine,
          | and an unreliable one at that, compared to exact matching.
         | 
         | But it has obvious application in design as it can create these
         | interesting combinations of objects & styles. And I loved the
         | snail-harp.
        
         | dcolkitt wrote:
          | The root cause of skepticism has always been that while
          | Transformers do exceptionally well on finite-sized tasks,
          | they lack any fully recursive understanding of the
          | concepts.[0]
          | 
          | A human can learn basic arithmetic, then generalize those
          | principles to bigger-number arithmetic, then go from there to
          | algebra, then calculus, and so on, successively building on
          | previously learned concepts in a fully recursive manner.
          | Transformers are limited by the exponential size of their
          | network. So GPT-3 does very well with 2-digit addition and
          | okay with 2-digit multiplication, but can't abstract to
          | 6-digit arithmetic.
         | 
         | DALL-E is an incredible achievement, but doesn't really do
         | anything to change this fact. GPT-3 can have an excellent
         | understanding of a finite sized concept space, yet it's still
         | architecturally limited at building recursive abstractions. So
         | maybe it can understand "green block on a red block". But try
         | to give it something like "a 32x16 checkerboard of green and
         | red blocks surrounded by a gold border frame studded with blue
         | triangles". I guarantee the architecture can't get that exactly
         | correct.
         | 
         | The point is that, in some sense, GPT-3 is a technical dead-
         | end. We've had to exponentially scale up the size of the
         | network (12B parameters) to make the same complexity gains that
         | humans make with linear training. The fact that we've managed
         | to push it this far is an incredible technical achievement, but
         | it's pretty clear that we're still missing something
         | fundamental.
         | 
         | [0] https://arxiv.org/pdf/1906.06755.pdf
        
           | Veedrac wrote:
           | > So GPT-3 does very well with 2-digit addition and okay with
           | 2-digit multiplication, but can't abstract to 6-digit
           | arithmetic.
           | 
           | This is false, GPT-3 can do 10-digit addition with ~60%
           | accuracy, with comma separators. Without BPEs it would
           | doubtlessly manage much better.
        
             | dcolkitt wrote:
             | The accuracy largely comes from the fact that addition
             | rarely requires carrying more than a single digit. So it's
             | easy to pattern match from single digit problems that it
             | was previously trained on.
             | 
             | With multiplication, which requires much more extensive
             | cross-column interaction, accuracy falls off a cliff with
             | anything more than a few digits.
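
The claim about carrying is easy to probe empirically. A small sketch (my own, not from the thread) that measures how often random 10-digit additions involve carry chains longer than one digit:

```python
import random

def longest_carry_chain(a: int, b: int) -> int:
    """Longest run of consecutive carries when adding a and b digit by digit."""
    carry = run = longest = 0
    while a or b:
        carry = 1 if (a % 10 + b % 10 + carry) >= 10 else 0
        run = run + 1 if carry else 0
        longest = max(longest, run)
        a //= 10
        b //= 10
    return longest

random.seed(0)
pairs = [(random.randrange(10**9, 10**10), random.randrange(10**9, 10**10))
         for _ in range(10_000)]
chains = [longest_carry_chain(a, b) for a, b in pairs]
# Fraction of random 10-digit additions whose longest carry chain is <= 1
short = sum(c <= 1 for c in chains) / len(chains)
print(f"fraction with no multi-digit carry chain: {short:.2f}")
```

Whatever the exact fraction turns out to be, the relevant point is that a model can score well on random addition prompts while only ever handling short carry chains correctly.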
        
           | codetrotter wrote:
           | > So GPT-3 does very well with 2-digit addition and okay with
           | 2-digit multiplication, but can't abstract to 6-digit
           | arithmetic.
           | 
           | That sounds disappointing but what if instead of trying to
           | teach it to do addition one would teach it to write source
           | code for making addition and other maths operations instead?
           | 
           | Then you can ask it to solve a problem but instead of it
           | giving you the answer it would give you source code for
           | finding the answer.
           | 
           | So for example you ask it "what is the square root of five?"
            | then it responds:
            | 
            |     fn main() {
            |         println!("{}", 5f64.sqrt());
            |     }
        
       | camdenlock wrote:
       | Amazing. Would love to play with this.
       | 
       | Is OpenAI going to offer this as a closed paywalled service? Once
       | again wondering how the "open" comes into play.
        
         | Jerry2 wrote:
         | After their new CEO came in, former president of YC, they
         | closed off everything and took a lot of investment. Only thing
         | that's open about them is the name.
        
         | [deleted]
        
       | kbos87 wrote:
       | First: This strikes me as truly amazing - but my mind immediately
       | goes to the economic impact of something like this. Personally I
       | try not to be an alarmist about the potential for jobs to be
        | automated away, but how strikingly good this is makes me wonder
        | whether, until now, we simply hadn't seen AI good enough to
        | automate away large parts of the workforce.
       | 
       | Seeing the "lovestruck cup of boba" reminded me of an
       | illustration a friend of mine did for a startup a few years back.
       | It would be a lot easier and less time consuming for someone to
       | simply request such an image from an AI assistant. If I were a
       | graphic artist or photographer, this would scare me.
       | 
       | I don't know what the right answer is here. I have little to no
       | faith in regulators to help society deal with the sweeping
       | negative effects even one new AI-based product looks like it
       | could have on a large swath of the economy. Short of regulation
       | and social safety nets, I wonder if society will eventually step
       | up and hold founders and companies accountable when they cause
       | broad negative economic impacts for their own enrichment.
        
         | cflyingdutchman wrote:
         | I believe founders/companies can and do "cause broad negative
         | economic impacts for their own enrichment", but creating a
         | lower-cost path to the same good/result is a good thing
         | fundamentally. Yes, this can cause greater income/life-
         | experience inequality, and we should adjust for that, but in
         | ways that do not punish innovation. In short, we should
         | optimize for human happiness by better sharing the wealth
         | rather than by limiting it.
        
           | jmmcd wrote:
           | One perspective is: anything that can be automated (thus
           | lowered in cost) should be. For drudge-work, of course that's
           | good. For some examples, showing that it can be automated
           | shows that it IS drudge-work. But replacing a creative
           | illustrator? That is not drudge-work, it is a fulfilling and
           | enjoyable profession. I don't think it's clear that changing
           | it to become a hobby (because it's no longer viable as a
           | profession) is "a good thing fundamentally". I would need to
           | hear further arguments on this.
        
             | cflyingdutchman wrote:
             | This very quickly gets into "what's the point of it all?"
             | and I'll admit that I don't have the answer. :)
        
       | bryanrasmussen wrote:
       | I wonder what it makes out of green ideas sleep furiously.
        
       | Marinlemaignan wrote:
        | I want to see it go into an infinite loop with an "image
        | recognition" system (one where you feed in an image and get a
        | written description of it back)
        
         | vnjxk wrote:
          | I believe it will end up stabilizing on one image, or on a
          | sequence of images whose captions map back to themselves
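
The proposed loop is a fixed-point iteration. A toy sketch with trivial deterministic stand-ins for the two models (the real calls are hypothetical), showing the caption stabilizing once it maps back to itself:

```python
def text_to_image(caption: str) -> str:
    # stand-in for a text-to-image model: "render" the caption as an id
    return f"image<{caption.lower()}>"

def image_to_text(image: str) -> str:
    # stand-in for an image captioner: recover a description from the id
    return image[6:-1]

caption = "A Cat"
seen = set()
while caption not in seen:          # stop when a caption repeats
    seen.add(caption)
    caption = image_to_text(text_to_image(caption))
print(caption)  # -> "a cat", a fixed point of the round trip
```

With real stochastic models the loop could instead wander or cycle, which is what makes the experiment interesting.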
        
       | steventey wrote:
       | That name though -
       | 
       | DALL*E = Dali + WALL*E
       | 
       | Freaking brilliant.
       | 
       | Was that generated by an AI as well?
       | 
       | I'm actually building a name generator that is as intelligent and
       | creative as that for my senior year thesis (and also for
       | https://www.oneword.domains/)
       | 
       | I already have an MVP that I'm testing out locally but I'd
       | appreciate any ideas on how to make it as smart as possible!
        
       | TecoAndJix wrote:
       | I learned the word portmanteau from this!
        
       | ArtWomb wrote:
       | "Teapot in the shape of brain coral" yields the opposite. The
        | topology is teapot-esque; the texture is composed of coral-like
        | appendages. Sorry if this is overly semantic; I just happen to
        | be in a deep dive into Shape Analysis at the moment ;)
       | 
       | >>> DALL*E appears to relate the shape of a half avocado to the
       | back of the chair, and the pit of the avocado to the cushion.
       | 
       | That could be human bias recognizing features the generator
       | yields implicitly. Most of the images appear as "masking" or
       | "decal" operations. Rather than a full style transfer. In other
       | words the expected outcome of "soap dispenser in the shape of
       | hibiscus" would resemble a true hybridized design. Like an haute
        | couture bottle of eau de toilette made to resemble rose petals.
       | 
       | The name DALL-E is terrific though!
        
         | dj_mc_merlin wrote:
          | I find its ability to give different interpretations of the
          | same thing amazing. This kind of fuzziness is also present in
         | human art.
         | 
         | Another good example is the "collection of glasses" on the
         | table. It makes both glassware and eyeglasses!
        
       | [deleted]
        
       | [deleted]
        
       | dane-pgp wrote:
       | > a living room with two white armchairs and a painting of the
       | colosseum. the painting is mounted above a modern fireplace.
       | 
       | With the ability to construct complex 3D scenes, surely the next
       | step would be for it to ingest YouTube videos or TV/movies and be
       | able to render entire scenes based on a written narration and
       | dialogue.
       | 
       | The results would likely be uncanny or absurd without careful
       | human editorial control, but it could lead to some interesting
       | short films, or fan-recreations of existing films.
        
         | dfischer wrote:
         | How do we know this isn't already happening with state actors?
        
           | IfOnlyYouKnew wrote:
           | Because state actors are all busy acting smart on the
           | internet by using terms such as "state actors".
        
         | ilaksh wrote:
         | I think in a way that's the next step but they may have to wait
         | a little bit before they have the processing power.
         | 
          | If you are talking about 24 frames per second, then
          | theoretically one second of video could require 24 times as
          | much processing power, and 100 seconds, 2400x. Obviously
          | that's just a rough guess, but surely it is much more than
          | for individual images.
         | 
         | But I'm sure we'll get there.
        
         | alpaca128 wrote:
         | I'd love to see what this does with item/person/artwork/monster
         | descriptions from Dwarf Fortress. Considering the game has
         | creatures like were-zebras, beasts in random shapes and
         | materials, undead hair, procedurally generated instruments and
         | all kinds of artefacts menacing with spikes I imagine it could
         | make the whole thing even more entertaining.
        
       | jandrese wrote:
       | How long until the Rule 34 perverts get their hands on this and
       | start inputting stuff like "Bobby Fischer Minotaur fucking a lime
       | green Toyota Echo"?
       | 
       | The shipping community will go apeshit if this thing works as
       | advertised.
        
         | speedgoose wrote:
          | I remember looking at porn pictures generated by an old model
          | that didn't take text input, and some pictures were very
          | disturbing because the bodies were impossible or looked very
          | unhealthy.
          | 
          | There is a reason the examples are cartoon animals or
          | objects. It's not disturbing that the pig teapot is not
          | realistic, or that the dragon cat is missing a leg. This kind
          | of problem is very disturbing in realistic pictures of human
          | bodies.
          | 
          | Eventually it will get there; I guess you could make an AI to
          | filter out the pictures that are disturbing.
        
           | jandrese wrote:
           | On the other hand, no matter how misshapen or deformed the
           | body comes out that will be someone's kink.
        
       | tikwidd wrote:
       | I wonder what happens when you ask it for an impossible object,
       | e.g. a square triangle?
        
       | kome wrote:
       | black magic
        
       | lacker wrote:
       | I wish this was available as a tool for people to use! It's neat
       | to see their list of pregenerated examples, but it would be more
       | interesting to be able to try things out. Personally, I get a
       | better sense of the powers and limitations of a technology when I
       | can brainstorm some functionality I might want, and then see how
       | close I can come to creating it. Perhaps at some point someone
       | will make an open source version.
        
         | [deleted]
        
         | industriousthou wrote:
         | I have no idea what kind of compute power something like this
         | relies on. Would this be able to run on a consumer desktop?
        
           | p1esk wrote:
           | Depends on how fast you want it to generate results, but yes,
           | it can run on a desktop provided there's enough RAM.
        
           | m3at wrote:
            | They note that the model has 12B parameters, which in terms
            | of order of magnitude makes it sit right between GPT-2 and
            | GPT-3 (1.5B and 175B respectively). With some tricks, you
            | can run GPT-2 on good personal hardware, so this might be
            | reachable as well with the latest hardware.
           | 
            | EDIT: I'm assuming you mean for inference; for training it
            | would be another kind of challenge and the answer would be
            | a clear no.
        
           | ricardobeat wrote:
           | In the linked CLIP paper they say it is trained on 256 GPUs
           | for 2 weeks. No mention of the size of the trained output.
        
         | [deleted]
        
         | m3at wrote:
         | I wish so too! I don't expect them to release the code (they
         | rarely do) and they wield their usual "it might have societal
         | impact, let us decide what's good for the world":
         | 
         | > We recognize that work involving generative models has the
         | potential for significant, broad societal impacts
         | 
          | The community did rise to the challenge of re-implementing it
          | (sometimes better) in the past, so I'm hopeful.
        
       | rkagerer wrote:
       | Anywhere this can be tried out interactively? I'd like to type
       | some phrases and see how it does.
        
         | keyle wrote:
         | Imagine the amount of body parts pictures on the server... /s
        
         | inakarmacoma wrote:
          | I came hoping for the same. While these are amazing to read
          | about, one really needs to try it out. Would love to get my
          | hands dirty. Still wait-listed for GPT-3 access, but no hope
          | in sight...
        
       | chishaku wrote:
       | ok how can we use this?
        
         | jaimex2 wrote:
         | we can't, it's just a post of what they created.
        
       | littlestymaar wrote:
       | The results highlighted in the blog post are incredible,
       | unfortunately, they are also cherry-picked: I've played with the
       | prompt a bit, and every result (not involving a drawing) was
       | disappointing...
       | 
        | I might not have been so disappointed if they hadn't
        | highlighted such incredible results in the first place.
        | Managing expectations is tough.
        
         | [deleted]
        
       ___________________________________________________________________
       (page generated 2021-01-06 23:02 UTC)