[HN Gopher] Dall-E 2
___________________________________________________________________
Dall-E 2
Author : yigitdemirag
Score : 1040 points
Date : 2022-04-06 14:09 UTC (8 hours ago)
(HTM) web link (openai.com)
(TXT) w3m dump (openai.com)
| Traster wrote:
| To be honest the Girl with a Pearl Earring "variations" look a
| little bit like a crime against art. It's like the person who
| built this has no idea why the Girl with a Pearl Earring is good
| art. "Here's the Girl with a Pearl Earring" - "OK, well here's
| some girls with turbans"
|
| Art is truth.
| bbbobbb wrote:
| To be honest it's hard for me to imagine an alternate reality
| where the 'original' is swapped with one of the 'variations'
| and the same comment doesn't appear underneath. Why is the
| 'original' good art?
| sillysaurusx wrote:
| Maybe.
| https://cdn.openai.com/dall-e-2/demos/variations/modified/gi...
| was pretty impressive.
|
| I think the results are being poisoned by the fact that most
| old paintings have deteriorated colors, so the training data
| looks nothing like the originals. It's certainly a lot yellower
| than
| https://cdn.openai.com/dall-e-2/demos/variations/originals/g...
| eks391 wrote:
| > It's like the person who built this has no idea why the Girl
| with a Pearl Earring is good art.
|
| The people didn't program DALL-E to make art. They taught
| it to recognize patterns and create something by extrapolating
| from the patterns, all on its own. So the AI isn't a projection
| of what they think is good art; it's projecting what it thinks
| is good art, based on a prompt. The output is its best effort
| at a feeling, even if the feeling had to be supplied by a
| living person. So it's still art that's as good as the feeling
| it came from; fleeting feelings being lower quality than
| those that required more time and thought
| billconan wrote:
| I'm curious, is this feasible to train (and run inference) on a
| consumer-level machine, or is this something that can only be
| done by institutions?
| marviel wrote:
| It's becoming clear that efficient work in the future will hinge
| upon one's ability to _accurately describe what one wants_.
| Unpacking that -- a large piece is the ability to understand all
| the possible "pitfalls" and "misunderstandings" that could
| happen on the way to a shared understanding.
|
| While technical work will always have a place -- I think that
| much creative work will become more like the _management_ of a
| team of highly-skilled, niche workers -- with all the
| frustrations, joys, and surprises that entails.
| killerstorm wrote:
| No... These models are trained to predict.
|
| You can definitely make them incremental. You can give it a
| task like "make a more accurate description from initial
| description and clarification". Even GPT-3-based models
| available today can do these tasks.
|
| Once this is properly productionized it would be possible to
| implement stuff just by talking with a computer.
| [deleted]
| golergka wrote:
| > accurately describe what one wants
|
| Isn't that essentially what programming already is?
| armchairhacker wrote:
| Programming, art, and music are just "describing what you want" in
| a very specific way. This is describing what you want in a much
| more vague way.
|
| The upside is that it's more "intuitive" and requires much less
| detail and technique, as the AI infers the detail and
| technique. The downside is that it's really hard to know what
| the AI will generate or get it to generate something really
| specific.
|
| I believe the future will combine the heuristics of AI-
| generation with the specificity of traditional techniques. For
| example, artists may start with a rough outline of whatever
| they want to draw as a blob of colors (like in some AI image-
| generation papers). Then they can fill in details using AI
| prompts, but targeting localized regions/changes and adding
| constraints, shifting the image until it's almost exactly what
| they imagined in their head.
| falcor84 wrote:
| >We've limited the ability for DALL·E 2 to generate ... adult
| images.
|
| I think that using something like this for porn could potentially
| offer the biggest benefit to society. So much has been said about
| how this industry exploits young and vulnerable models. Cheap
| autogenerated images (and in the future videos) would pretty much
| remove the demand for human models and eliminate the related
| suffering, no?
|
| EDIT: typo
| sillysaurusx wrote:
| Depends whether you think models should be able to generate cp.
|
| It's almost impossible to even give an affirmative answer to
| that question without making yourself a target. And as much as
| I err on the side of creator freedom, I find myself shying away
| from saying yes without qualifications.
|
| And if you don't allow cp, then by definition you require some
| censoring. At that point it's just a matter of where you
| censor, not whether. OpenAI has gone as far as possible on the
| censorship, reducing the impact of the model to "something that
| can make people smile." But it's sort of hard to blame them, if
| they want to focus on making models rather than fighting
| political battles.
|
| One could imagine a cyberpunk future where seedy AI cp images
| are swapped in an AR universe, generated by models run by
| underground hackers that scrounge together what resources they
| can to power the behemoth models that they stole via hacks.
| Probably worth a short story at least.
|
| You could make the argument that we have fine laws around porn
| right now, and that we should simply follow those. But it's not
| clear that AI generated imagery can be illegal at all. The
| question will only become more pressing with time, and society
| has to solve it before it can address the holistic concerns you
| point out.
|
| OpenAI ain't gonna fight that fight, so it's up to EleutherAI
| or someone else. But whoever fights it in the affirmative will
| probably be vilified, so it'd require an impressive level of
| selflessness.
| [deleted]
| chias wrote:
| Would this not necessarily require training it on a large
| body of real CSAM? Seems like it would be a non-starter.
| sillysaurusx wrote:
| Surprisingly no. It knows what a child looks like, and can
| infer what a naked child looks like from medical imagery.
|
| A child with adult body parts is a whole other class of
| weirdness that might pop out too.
|
| Models want to surprise us all.
| loufe wrote:
| There are so many excellent, thought-provoking comments in
| this thread, but yours caught me especially. Something that
| came to mind immediately upon reading the release was the
| potential for this technology to transform literature, adding
| AI generated imagery to turn any novel into a visual novel as
| a premium way to experience the story, something akin to
| composing a D-Box seat response to a modern movie. I was
| imagining telling the cyberpunk future story you were
| elaborating, which is really compelling, in such a way and
| couldn't help but smile.
| sillysaurusx wrote:
| Please write it! I'd love to read one.
| aryamaan wrote:
| In the same theme, I liked the comments of both of you.
|
| Another use case could be to make it easier/ automatic to
| create comics. You tell it what the background should be, what
| the characters should be doing, and the dialogue. Boom, you
| have a good enough comic.
|
| -----------
|
| Reading as a medium has not evolved with technology.
| Creating the imagery does happen in humans' minds. It's no
| surprise that some people enjoy doing that (and also enjoy
| watching that imagery) and others do not.
|
| This could be a helping brain to create those imageries.
|
| -----------
|
| Now imagine, reading stories to your child. Actually,
| creating stories for your child. Where they are the
| characters in the stories. Having a visual element to it is
| definitely going to be a premium experience.
| GauntletWizard wrote:
| Religious people don't only believe that porn harms the models,
| but also the user. I happen to agree, despite being a porn user
| - Porn is a form of simulated and not-real stimulation. Porn is
| harmful to the user the same way that any form of delusion is:
| It associates positive pleasure with stimulation that does not
| fulfil any basic or even higher-level needs, and is
| unsustainable. Porn is somewhere on the same scale as
| wireheading[1]
|
| That doesn't mean that it's all bad, and that there's no
| recreational use for it. We have limits on the availability of
| various other artificial stimulants. We should continue to have
| limits on the availability of porn. Where to draw that line is
| a real debate.
|
| [1] https://en.wikipedia.org/wiki/Wirehead_(science_fiction)
| [deleted]
| Siira wrote:
| The problem might be that people are simply lying. Their real
| reasons are religious/ideological, but they cite humanitarian
| concerns (which their own religious stigma is partly
| responsible for).
| thom wrote:
| People take their experiences of porn into real relationships,
| so I do not think this removes suffering overall, no.
| AYBABTME wrote:
| Iain Banks' "Surface Detail" would like to have a word with
| you.
|
| This author's books are great at putting these sort of moral
| ideas to test in a sci-fi context. This specific tome portrays
| virtual wars and virtual "hells". The hope is of being more
| civilized than by waging real war or torturing real living
| entities. However some protagonists argue that virtual life is
| indistinguishable from real life, and so sacrificing virtual
| entities to save "real" ones is a fallacy.
|
| Or some such, it's been a while.
| cm2012 wrote:
| I suspect that if a free version of this comes out and allows
| adult image generation, 90% of what it will be used for is
| adult stuff (see the kerfuffle with AIDungeon).
|
| I can get why the people who worked hard on it and spent money
| building it don't want to be associated with porn.
| albertzeyer wrote:
| Some initial video by Yannic Kilcher:
| https://www.youtube.com/watch?v=gGPv_SYVDC8
| mario143 wrote:
| Yeah, I mean you're right that ultimately the proof is in the
| pudding.
|
| But I do think we could have guessed that this sort of approach
| would be better (at least at a high level - I'm not claiming I
| could have predicted all the technical details!). The previous
| approaches were sort of the best that people could do without
| access to the training data and resources - you had a pretrained
| CLIP encoder that could tell you how well a text caption and an
| image matched, and you had a pretrained image generator (GAN,
| diffusion model, whatever), and it was just a matter of trying to
| force the generator to output something that CLIP thought looked
| like the caption. You'd basically do gradient ascent to make the
| image look more and more and more like the text prompt (all the
| while trying to balance the need to still look like a realistic
| image). Just from an algorithm aesthetics perspective, it was
| very much a duct tape and chicken wire approach.
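|
| (A minimal sketch of that gradient-ascent loop, in PyTorch-style
| Python. `clip_model` here is a stand-in for a pretrained CLIP
| exposing the encode_image/encode_text methods from OpenAI's
| repo, and the realism-balancing terms mentioned above are
| omitted:)
|
|     import torch
|
|     def clip_guided_ascent(clip_model, prompt_tokens, steps=300, lr=0.05):
|         # start from random pixels and nudge them toward the prompt
|         image = torch.rand(1, 3, 224, 224, requires_grad=True)
|         opt = torch.optim.Adam([image], lr=lr)
|         with torch.no_grad():
|             txt = clip_model.encode_text(prompt_tokens)
|             txt = txt / txt.norm(dim=-1, keepdim=True)
|         for _ in range(steps):
|             opt.zero_grad()
|             img = clip_model.encode_image(image.clamp(0, 1))
|             img = img / img.norm(dim=-1, keepdim=True)
|             loss = -(img * txt).sum()  # maximize CLIP similarity
|             loss.backward()
|             opt.step()
|         return image.detach().clamp(0, 1)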
|
| The analogy I would give is if you gave a three-year-old some
| paints, and they made an image and showed it to you, and you had
| to say, "this looks like a little like a sunset" or "this looks a
| lot like a sunset". They would keep going back and adjusting
| their painting, and you'd keep giving feedback, and eventually
| you'd get something that looks like a sunset. But it'd be better,
| if you could manage it, to just teach the three-year-old how to
| paint, rather than have this brute force process.
|
| Obviously the real challenge here is "well how do you teach a
| three-year-old how to paint?" - and I think you're right that
| that question still has a lot of alchemy to it.
| EZ-Cheeze wrote:
| "Computer, render Bella and Gigi Hadid playing tennis in bikinis"
| KevinGlass wrote:
| Something about this makes me nauseous. Perhaps it's the fact that
| soon the market value for creatives is going to fall to a hair
| above zero for all but the most famous. We will be all the poorer
| for it when 95% of images you see are AI generated. There will be
| niches of course but in a few short years it'll be over for a
| huge swathe of creative professionals who are already struggling.
|
| Some of the images also hit me with a creep factor, like the
| bears on the corgis in the art gallery, but that may be only
| because I know it's AI generated.
| idleproc wrote:
| I imagine it will affect artists much the same way wordpress
| has affected web designers.
|
| Maybe everyone will have an AI image as their desktop
| wallpaper, but if you've got cash you'll want something with
| provenance and rarity to brag about.
|
| Also, I think creatives are valued for their imagination. If
| you wanted something decent, would you pay someone to sift
| through a million AI generated images to find a gem, or just
| pay an artist you like to create one for you?
| bufferoverflow wrote:
| > you'll want something with provenance and rarity to brag
| about.
|
| 1) That is a tiny share of the market. Most of the market is
| - I have a game / online publication / book, and I need an
| illustration xyz. Which this AI seems to solve.
|
| 2) how do you even prove your rare art wasn't painted by an
| AI?
| idleproc wrote:
| 1) Sure there's a lot of work for that kind of thing but
| creatives typically earn a pittance. I doubt an AI could
| meet your specific requirements without having to spend
| hours(?) tweaking it or sifting through countless
| variations for the 'one'.
|
| 2) Because we haven't built a machine that can paint (etc.)
| with traditional materials like a skilled artist?
| typon wrote:
| I paid $1500 for a commissioned painting from an artist I
| respect and follow as a birthday present for a friend. The
| painting meant something to me because I worked with the artist
| to have some input about what kind of a person my friend is,
| what kind of features I want to see in the painting and how I
| want it to feel. The artist gave me 5 different sketches and we
| had tons of back and forth. The process and the act of creating
| the painting on a canvas from someone I respect is what I paid
| for.
|
| Even if an AI could generate an exactly equivalent painting, I
| would pay $0 for it. It wouldn't mean anything to me.
| chpatrick wrote:
| Just wait until they figure out music.
| Applejinx wrote:
| Not exactly. All the ideas put forth in these demos are really
| arbitrary, with nothing whatsoever to say. Generating crap art
| becomes more and more effortless: we've seen this in music as
| well.
|
| Jumping out of the conceptual box to generate novel PURPOSE is
| not the domain of a Dall-E 2. You've still gotta ask it for
| things. It's a paintbrush. Without a coherent story, it's an
| increasingly impressive stunt (or a form of very sophisticated
| 'retouching brush').
|
| If you can imagine better than the next guy, Dall-E 2 is your
| new tool for expression. But what is 'better'?
| jupp0r wrote:
| This reminds me of an art class in high school in the early
| 2000s where I handed in a printout of a 3d generated image
| (painstakingly modeled and rendered in software over the
| whole weekend by me) and the teacher looked at me and told me
| that's not art because it's "computer generated" and I didn't
| "even use my hands" to make it. Even as a teenager, the idea
| that art is defined by how it's made versus it being a way
| for the artist to express intention in whatever way they see
| fit seemed really reductionist and almost vulgar to me.
|
| Maybe lots of artists of the future will actually use AI models
| to express their inner thoughts and desires in a way that
| touches something in their audience. It will still be art.
| throwaway71271 wrote:
| 'art' comes from 'artem' which means 'skill', which is the
| root of 'artificial' (https://www.etymonline.com/word/art
| and https://www.etymonline.com/word/artificial)
|
| your teacher was wrong
|
| i had a friend who didn't get credit for his design work
| because he used photoshop instead of using pen and paper
| for a similar reason, i still find it amazing that a teacher
| would say such a thing
| andybak wrote:
| > 'art' comes from 'artem' which means 'skill', which is
| the root of 'artificial'
|
| His teacher was wrong but "argument from etymology" is
| surely a fallacy.
| amelius wrote:
| Can I opt-out from ever seeing AI generated images please?
| 323 wrote:
| The same thing was said when book printing was invented, that
| we would lose the fabulous scribes that manually duplicate
| books with a human touch, while replacing them with soulless
| mechanical machines.
|
| Or when synthesizers and computer music was invented, that they
| will displace talented musicians that know how to play an
| instrument and how now everybody without a musical education
| will be able to produce music, thus devaluing actual musicians.
| alcover wrote:
| > for all but the most famous
|
| OK DALL-E, generate our logo in the style of ${most famous}
| axg11 wrote:
| I really don't agree. When I work with a creative I'm not
| working with them because of their content generation skills.
| I'm working with them because of their taste and curation
| ability that results in the end product.
|
| The nature of creative work will certainly change, creatives
| will adopt tools such as Dall-E 2. In certain narrow cases they
| might be replaced, such as if you are asking a creative to
| generate a very specific image, but how often is that the case?
| The majority of the time tools such as Dall-E 2 will act as an
| accelerator for creatives and help them increase their output.
| lofatdairy wrote:
| Perhaps a more optimistic way of looking at it: When mass
| production became available to art, the idea of an "artwork"
| had to be abstracted from a unique piece (Walter Benjamin gives
| the example of a statue of Venus, which has value in its
| uniqueness) to the idea of art as the output of some process.
| Each piece has no claim to authenticity, and the very idea of
| an "original" would be antithetical to its production.
|
| I think art will survive. Just like photography didn't kill
| painting, the idea of art might simply begin to encompass this
| new means of production, which no longer requires the steady
| hand, but still requires a discerning eye. Sure, we might say
| that the "artist" is simply a curator, picking which
| algorithmic output is most worthy of display, but these
| distinctions have historically been fluid, and challenging
| ideas of art has long been one of art's functions as well.
| dragonwriter wrote:
| > Perhaps it's the fact that soon the market value for creatives
| is going to fall to a hair above zero for all but the most
| famous.
|
| But...that's always been the case for creatives.
| throwaway675309 wrote:
| Nonsense. This is merely a tool that helps lower the barrier to
| entry for producing imagery.
|
| By the same logic you should also complain about any number of
| IDEs, development tools, WordPress, or game-making systems like
| RPG Maker or Unity. After all, if anyone can just leverage a
| free physics and collision system without a complete
| understanding of rigid-body Newtonian mechanics to roll their
| own engine, it'll all be too uniform.
| TaupeRanger wrote:
| By "creatives" you seem to mean "people who drum up the
| equivalent of elevator music for ads and blogs". This will not
| remotely replace any working "creative" people that I know.
| pingeroo wrote:
| Except it will only get more powerful with time, probably at
| an accelerating pace. Everyone always downplays these
| legitimate fears about AI, pointing out how "it can't do X".
| They always forget to put the "yet" at the end of that
| sentence.
| [deleted]
| TaupeRanger wrote:
| The person I responded to literally made the claim that it
| would happen imminently...
| zitterbewegung wrote:
| I don't want to dismiss this new model and its achievements,
| but we are getting to the point where, just as we saw a split
| between open source and closed source software, another split
| is forming between open and closed ML models. I think that
| larger and larger models will come with disclaimers restricting
| commercial use (a great deal of academic and NVIDIA models do
| this), and OpenAI just puts it behind an API with the rules:
|
| Curbing Misuse: Our content policy does not allow users to
| generate violent, adult, or political content, among other
| categories. We won't generate images if our filters identify text
| prompts and image uploads that may violate our policies. We also
| have automated and human monitoring systems to guard against
| misuse.
| asxd wrote:
| They're pretty strict about usage:
|
| - https://github.com/openai/dalle-2-preview/blob/main/system-c...
|
| - https://github.com/openai/dalle-2-preview/blob/main/system-c...
| jdrc wrote:
| It should be possible to create open source versions,
| researchers will find a way if something is cool enough
| zackmorris wrote:
| Apologies for an open-ended question but: does anyone know if
| there is a term for something like Turing-completeness within AI,
| where a certain level of intelligence can simulate any other type
| of intelligence like our brains do?
|
| For example, using DeMorgan's theorem, we can build any logic
| circuit out of all NAND or NOR gates:
|
| https://www.electronics-tutorials.ws/boolean/demorgan.html
|
| https://en.wikipedia.org/wiki/NAND_logic
|
| https://en.wikipedia.org/wiki/NOR_logic
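|
| (To make the NAND-universality concrete, a tiny Python sketch -
| every gate below is built from the single nand primitive:)
|
|     def nand(a, b): return not (a and b)
|     def not_(a):    return nand(a, a)
|     def and_(a, b): return not_(nand(a, b))
|     def or_(a, b):  return nand(not_(a), not_(b))  # De Morgan
|     def xor_(a, b): return and_(or_(a, b), nand(a, b))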
|
| Dall-E 2's level of associative comprehension is so far beyond
| the old psychology bots in the console pretending to be people,
| that I can't help but wonder if it's reached a level where it can
| make any association.
|
| For example, I went to an AI talk about 5 years ago where the guy
| said that any of a dozen algorithms like K-Nearest Neighbor,
| K-Means Clustering, Simulated Annealing, Neural Nets, Genetic
| Algorithms, etc can all be adapted to any use case. They just
| have different strengths and weaknesses. At that time, all that
| really mattered was how the data was prepared.
|
| I guess fundamentally my question is, when will AGI start to
| become prevalent, rather than these special-purpose tools like
| GPT-3 and Dall-E 2? Personally I give it less than 10 years of
| actual work, maybe less. I just mean that to me, Dall-E 2 is
| already orders of magnitude more complex than what's required to
| run a basic automaton to free humans from labor. So how can we
| adapt these AI experiments to get real work done?
| robertsdionne wrote:
| https://en.wikipedia.org/wiki/Universal_approximation_theore...
| teaearlgraycold wrote:
| > does anyone know if there is a term for something like
| Turing-completeness within AI, where a certain level of
| intelligence can simulate any other type of intelligence like
| our brains do?
|
| Artificial General Intelligence
| dqpb wrote:
| Juergen Schmidhuber predicts the "Omega point" of technological
| development (including AGI) to be around 2040
|
| https://youtu.be/pGftUCTqaGg
|
| The MIT Limits to Growth study predicts the collapse of global
| civilization around 2040
|
| https://www.vice.com/amp/en/article/z3xw3x/new-research-vind...
| causticcup wrote:
| Almost everything stated here is simply wrong or misinformed.
|
| >For example, I went to an AI talk about 5 years ago where the
| guy said that any of a dozen algorithms like K-Nearest
| Neighbor, K-Means Clustering, Simulated Annealing, Neural Nets,
| Genetic Algorithms, etc can all be adapted to any use case.
| They just have different strengths and weaknesses. At that
| time, all that really mattered was how the data was prepared.
|
| How do you suppose KNN is going to generate photorealistic
| images? I don't understand the question here
|
| >I guess fundamentally my question is, when will AGI start to
| become prevalent, rather than these special-purpose tools like
| GPT-3 and Dall-E 2?
|
| Actual AGI research is basically non-existent, and GPT-3/Dall-E
| 2 are not AGI-level tools.
|
| >Personally I give it less than 10 years of actual work, maybe
| less
|
| Lol...
|
| >I just mean that to me, Dall-E 2 is already orders of
| magnitude more complex than what's required to run a basic
| automaton to free humans from labor.
|
| Categorically incorrect
| agloeregrets wrote:
| The most interesting item to me is the variations on the garden
| shop and bathroom sink idea. The realism of these leaks the AI
| lacking intuition of the requirements. This makes for a number of
| nonsensical designs that look right at first like: This Sink
| lacks sensical faucets.
| https://cdn.openai.com/dall-e-2/demos/variations/modified/ba...
|
| This doorway is downright impossible
| https://cdn.openai.com/dall-e-2/demos/variations/modified/fl...
| dqpb wrote:
| It looks to me like the faucet sprays water sideways toward the
| bowl, which is genius, because then you aren't bumping up
| against it when you're washing your hands!
| Spinnaker_ wrote:
| "Doorway in the style of Escher"
| momojo wrote:
| Great point. When I saw the shadows and reflections, I thought
| it had developed a primitive understanding of physical logic.
| Now I'm not so sure.
|
| At this point, it still seems like it's pushing pixels around
| until it's "good enough" when you squint at it.
| aaron695 wrote:
| minimaxir wrote:
| A few comments by someone who's spent way too much time in the
| AI-generated space:
|
| * I recommend reading the Risks and Limitations section that came
| with it because it's very thorough:
| https://github.com/openai/dalle-2-preview/blob/main/system-c...
|
| * Unlike GPT-3, my read of this announcement is that OpenAI does
| not intend to commercialize it, and that access to the waitlist
| is indeed more for testing its limits (and as noted,
| commercializing it would make it much more likely to lead to
| interesting legal precedent). Per the docs, access is _very_
| explicitly limited:
| (https://github.com/openai/dalle-2-preview/blob/main/system-c...)
|
| * A few months ago, OpenAI released GLIDE (
| https://github.com/openai/glide-text2im ) which uses a similar
| approach to AI image generation, but suspiciously never received
| a fun blog post like this one. The reason for that in retrospect
| may be "because we made it obsolete."
|
| * The images in the announcement are still cherry-picked, which
| is presumably why they tested DALL-E 1 vs. DALL-E 2 on
| non-cherry-picked images.
|
| * Cherry-picking is relevant because AI image generation is still
| slow unless you do real shenanigans that likely compromise image
| quality, although OpenAI likely has better infra to handle
| large models as they have demonstrated with GPT-3.
|
| * It appears DALL-E 2 has a fun endpoint that links back to the
| site for examples with attribution:
| https://labs.openai.com/s/Zq9SB6vyUid9FGcoJ8slucTu
| bufferoverflow wrote:
| Not-so-open.ai
| qeternity wrote:
| open-your-wallet.ai
| btdmaster wrote:
| https://www.eleuther.ai (text, not images, but free as in
| freedom)
| [deleted]
| refulgentis wrote:
| Katherine Crowson is @ EleutherAI & IMHO is indisputably the
| person most responsible for the advances in text=>image
| generation. DALL-E 2 is DALL-E plus her insight to use
| diffusion; the intermediate proof of concept of diffusion +
| DALL-E is GLIDE.
|
| https://twitter.com/RiversHaveWings &
| https://github.com/crowsonkb
| bradgessler wrote:
| Could somebody build this for SVG icons? I'd invest in it.
| applgo443 wrote:
| What do you want?
| nope96 wrote:
| Is there an 'explain it like I'm 15' for how this works? It seems
| like black magic. I've been a computer hobbyist since the late
| 1980's and this is the first time I cannot explain how a computer
| does what it does. Absolutely the most amazing thing I've ever
| seen, and I have zero clue how it works.
| drcode wrote:
| Imagine asking it to generate a picture for "duck wearing a hat
| on Mars":
|
| First, it creates a random 10x10 pixel blurry image and asks a
| neural net: "Could this be a duck wearing a hat on Mars?" and
| the neural net replies "No, because all the pictures I've ever
| seen of Mars have lots of red color in them" so the system
| tweaks the pixels to make them more red, put some pixels in the
| center that have a plausible duck color, etc.
|
| After it has a 10x10 image that is a plausible duck on Mars,
| the system scales the image to 20x20 pixels, and then uses 4
| different neural nets on each corner to ask "Does this look
| like the upper/lower left/right corner of a duck wearing a hat
| on Mars?" Each neural net is just specialized for one corner of
| the image.
|
| You keep repeating this with more neural nets until you have a
| pretty 1000x1000 (or whatever) image.
| refulgentis wrote:
| Not the case, though in a handwave-y way, same idea - instead
| of iteratively scaling, you're iteratively denoising. See
| here, links out to the Cornell NLP PhD describe in even more
| detail: https://www.jpohhhh.com/articles/inflection-point-ml-
| art
| karmasimida wrote:
| Diffusion models are indeed pretty magical.
| eks391 wrote:
| Research Deep Learning. That's the technique they are using to
| generate the images. There are a lot of applications. Once you
| understand _how_ it works, look up Two Minute Papers to see
| what it is being used for. He covers more than just deep
| learning algorithms, but his videos on deep learning are quite
| insightful on the potentials of this technology.
| joshcryer wrote:
| I'm with you there but we still don't know how it works, just
| that it does. The method though is you take a bunch of images,
| you plug them into a multi dimensional array (a nice way of
| saying a tensor), have some kind of tagging system, and when
| you ask the system for an answer, it will put one out for you.
| So for example in the astronaut riding the horse, there is, on
| some level, a picture of a horse with those similar pixels,
| that exists in the data of some object tagged 'horse.' Likewise
| with astronaut. What is important is that the data sets are
| absolutely massive, with billions of parameters.
|
| Here's a more of a 'not 15 year old' explanation:
| https://ml.berkeley.edu/blog/posts/dalle2/
| Imnimo wrote:
| Here is my extremely rough ELI-15. It uses some building blocks
| like "train a neural network", which probably warrant
| explanations of their own.
|
| The system consists of a few components. First, CLIP. CLIP is
| essentially a pair of neural networks, one is a 'text encoder',
| and the other is an 'image encoder'. CLIP is trained on a giant
| corpus of images and corresponding captions. The image encoder
| takes as input an image, and spits out a numerical description
| of that image (called an 'encoding' or 'embedding'). The text
| encoder takes as input a caption and does the same. The
| networks are trained so that the encodings for a corresponding
| caption/image pair are close to each other. CLIP allows us to
| ask "does this image match this caption?"
|
| The second part is an image generator. This is another neural
| network, which takes as input an encoding, and produces an
| image. Its goal is to be the reverse of the CLIP image encoder
| (they call it unCLIP). The way it works is pretty complicated.
| It uses a process called 'diffusion'. Imagine you started with
| a real image, and slowly repeatedly added noise to it, step by
| step. Eventually, you'd end up with an image that is pure
| noise. The goal of a diffusion model is to learn the reverse
| process - given a noisy image, produce a slightly less noisy
| one, until eventually you end up with a clean, realistic image.
| This is a funny way to do things, but it turns out to have some
| advantages. One advantage is that it allows the system to build
| up the image step by step, starting from the large scale
| structure and only filling in the fine details at the end. If
| you watch the video on their blog post, you can see this
| diffusion process in action. It's not just a special effect for
| the video - they're literally showing the system process for
| creating an image starting from noise. The mathematical details
| of how to train a diffusion system are very complicated.
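|
| (A rough sketch of that reverse process, following the standard
| DDPM ancestral sampler rather than DALL-E 2's exact variant -
| `denoiser` is the hypothetical trained network that predicts
| the noise present in x at step t, and `betas` is a 1-D tensor
| noise schedule:)
|
|     import torch
|
|     def ddpm_sample(denoiser, betas, shape):
|         alphas = 1.0 - betas                  # per-step retention factors
|         alpha_bar = torch.cumprod(alphas, dim=0)
|         x = torch.randn(shape)                # start from pure noise
|         for t in reversed(range(len(betas))):
|             eps = denoiser(x, t)              # predicted noise at step t
|             x = (x - betas[t] / (1 - alpha_bar[t]).sqrt() * eps) / alphas[t].sqrt()
|             if t > 0:                         # re-inject a little noise until the last step
|                 x = x + betas[t].sqrt() * torch.randn(shape)
|         return x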
|
| The third is a "prior" (a confusing name). Its job is to take
| the encoding of a text prompt, and predict the encoding of the
| corresponding image. You might think that this is silly - CLIP
| was supposed to make the encodings of the caption and the image
| match! But the space of images and captions is not so simple -
| there are many images for a given caption, and many captions
| for a given image. I think of the "prior" as being responsible
| for picking _which_ picture of "a teddy bear on a skateboard"
| we're going to draw, but this is a loose analogy.
|
| So, now it's time to make an image. We take the prompt, and ask
| CLIP to encode it. We give the CLIP encoding to the prior, and
| it predicts for us an image encoding. Then we give the image
| encoding to the diffusion model, and it produces an image. This
| is, obviously, over-simplified, but this captures the process
| at a high level.
|
| Why does it work so well? A few reasons. First, CLIP is really
| good at its job. OpenAI scraped a colossal dataset of
| image/caption pairs, spent a huge amount of compute training
| it, and come up with a lot of clever training schemes to make
| it work. Second, diffusion models are really good at making
| realistic images - previous works have used GAN models that try
| to generate a whole image in one go. Some GANs are quite good,
| but so far diffusion seems to be better at generating images
| that match a prompt. The value of the image generator is that
| it helps constrain your output to be a realistic image. We
| could have just optimized raw pixels until we get something
| CLIP thinks looks like the prompt, but it would likely not be a
| natural image.
|
| To generate an image from a prompt, DALL-E 2 works as follows.
| First, ask CLIP to encode your prompt. Next, ask the prior what
| it thinks a good image encoding would be for that encoded
| prompt. Then ask the generator to draw that image encoding.
| Easy peasy!
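|
| (Gluing the three pieces together in pseudocode - the function
| names here are hypothetical labels for the components described
| above, not OpenAI's actual API:)
|
|     def dalle2_generate(prompt):
|         text_emb = clip_text_encoder(tokenize(prompt))  # 1. encode the prompt
|         image_emb = prior(text_emb)                     # 2. predict a matching image encoding
|         return diffusion_decoder(image_emb)             # 3. decode the encoding into pixels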
| 6gvONxR4sf7o wrote:
| Any pointers on getting up to speed on diffusion models? I
| haven't encountered them in my corner of the ML world, and
| googling around for a review paper didn't turn anything up.
| momenti wrote:
| https://www.youtube.com/watch?v=W-O7AZNzbzQ
|
| See the linked papers if you don't like videos.
| Imnimo wrote:
| I recommend this blog post:
|
| https://lilianweng.github.io/posts/2021-07-11-diffusion-mode...
|
| Personally, I find the core diffusion papers pretty dense
| and difficult to follow, so the blog post is where I'd
| begin.
|
| https://arxiv.org/pdf/1503.03585.pdf
|
| This paper is a decent starting point on the literature
| side, but it's a doozy.
|
| Both the paper and blog post are pretty math heavy. I have
| not yet found a really clear intuitive explanation that
| doesn't get down in the weeds of the math, and it took me a
| long time to understand what the hell the math is trying to
| say (and there are some parts I still don't fully
| understand!)
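|
| (For orientation before diving in, the forward-noising
| definition those papers build on, in LaTeX:)
|
|     q(x_t \mid x_{t-1}) = \mathcal{N}\big(x_t;\ \sqrt{1-\beta_t}\,x_{t-1},\ \beta_t I\big)
|     q(x_t \mid x_0) = \mathcal{N}\big(x_t;\ \sqrt{\bar\alpha_t}\,x_0,\ (1-\bar\alpha_t) I\big),
|         \quad \bar\alpha_t = \prod_{s=1}^{t}(1-\beta_s)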
| victor_e wrote:
| Wow - mindblowing and kinda scary really.
| imperio59 wrote:
| What happens when they train this thing to make videos? We're
| about to be dealing with a flood of AI-generated visual/video
| content. We already have to deal with text bots everywhere...
| wow.
| eks391 wrote:
| I'm excited for when that happens. I didn't think of the
| malicious uses, which now that you brought it up I can think of
| many, but I still think the pros are worth the cons
| whywhywhywhy wrote:
| I never actually found a way to use Dall-E 1, did they ever Open
| that up to people outside their building?
| skybrian wrote:
| Sam Altman took some user requests on Twitter:
| https://twitter.com/sama/status/1511724264629678084
| sydthrowaway wrote:
| gamechanger
| dang wrote:
| Related and kind of fun:
|
| _Sam Altman demonstrates Dall-E 2 using twitter suggestions_ -
| https://news.ycombinator.com/item?id=30933478 - April 2022 (3
| comments)
| frakkingcylons wrote:
| Impressive results no doubt, but I'm reserving judgment until
| beta access is available. These are probably the best images that
| it can generate, but what I'm most interested in is the average
| case.
| narrator wrote:
| While we're being distracted by endless social media and
| meaningless news, AI technology is advancing at a mind blowing
| pace. I'd keep my eye on that ball instead of "the current
| thing."
| The_rationalist wrote:
| Thank you narrative voice
| [deleted]
| mrfusion wrote:
| Is this bringing us closer to combining image and language
| understanding within one model?
| beernet wrote:
| Check out MAGMA for that:
| https://news.ycombinator.com/item?id=30699776
| impostervt wrote:
| Very cool stuff. For me, the most interesting was the ability to
| take a piece of art and generate variations of it.
|
| Have a favorite painter? Here's 10,000 new paintings like theirs.
| photochemsyn wrote:
| Well, one of my favorite painters is Henri Rousseau, and one of
| his great paintings is War, 1894:
|
| https://www.henrirousseau.net/war.jsp
|
| However, this painting has themes of violence and politics plus
| some nude dead bodies, so it violates the content policy: "Our
| content policy does not allow users to generate violent, adult,
| or political content, among other categories."
|
| So what you'd get is some kind of sanitized watered-down tepid
| version of Rousseau, the kind of boring drivel suitable for
| corporate lobbies everywhere, guaranteed not to offend or
| disturb anyone. It's difficult to find words... horrific?
| dystopian? atrocious? No, just no.
| corysama wrote:
| They are being rightly cautious. It's going to take time to
| figure out good practice with these tools. Everyone calling
| out basic caution as "dystopian" is really over the top.
|
| I've been using tools like this for over a year now. Even
| with filtered dataset and filtered interface, they can make
| images that would make the Fangoria crowd blush if you put
| the slightest effort into it.
|
| It's one thing to be able to make brain-wrenching images with
| a lot of photoshop effort (or digging hard enough in the dark
| corners of the internet). It's another thing entirely to give
| anyone the ability to spew out thousands of them trivially.
| cwillu wrote:
| "Criticize?! It is meant to draw blood! It is Art! Art!"
| throwaway675309 wrote:
| I was just thinking the same thing, how awesome would it be to
| be able to use this in conjunction with the Samsung frame in
| art gallery mode and have it just generate novel paintings in
| the style of your favorite painters.
| pingeroo wrote:
| That was also my favourite concept, especially with OpenAI
| Jukebox (https://openai.com/blog/jukebox/). The idea of having
| new music in the style of your favourite artist is amazing.
|
| However the fidelity of their music AI kinda sucks at this
| point, but I'm sure we'll get pitch perfect versions of this
| concept as the singularity gets closer :)
| uses wrote:
| Is anyone looking into what it means when we can generate
| infinite amounts of human-like work without effort or cost?
|
| > Curbing Misuse [...]
|
| That's great, nowadays the big AI is controlled by mostly
| benevolent entities. How about when someone real nasty gets a
| hold of it? In a decade the models anyone can download will make
| today's GPT-3 etc look like pong right?
|
| Recommender systems etc are already shaping society and culture
| with all kinds of unintended effects. What happens when mindless
| optimizing models start generating the content itself?
| nahuel0x wrote:
| "Any sufficiently advanced technology is indistinguishable from
| magic"
| 7373737373 wrote:
| "Any sufficiently advanced hyperreality is indistinguishable
| from real life"
| andybak wrote:
| Preventing Harmful Generations: We've limited the
| ability for DALL·E 2 to generate violent, hate, or adult
| images. By removing the most explicit content from the
| training data, we minimized DALL·E 2's exposure to these
| concepts. We also used advanced techniques to prevent
| photorealistic generations of real individuals' faces,
| including those of public figures.
|
| "And we've also closed off a huge range of potentially
| interesting work as a result"
|
| I can't help but feel a lot of the safeguarding is more about
| preventing bad PR than anything. I wish I could have a version
| with the training wheels taken off. And there's enough other
| models out there without restriction that the stories about
| "misuse of AI" will still circulate.
|
| (side note - I've been on HN for years and I still can't figure
| out how to format text as a quote.)
| campground wrote:
| This AI is still a minor. It can start looking at R rated
| images when it turns 17.
| johnhenry wrote:
| This is an apt analogy -- ensure that the model is mature
| enough to handle mature content.
| jandrese wrote:
| They have also closed off the possibility of having to appear
| before Congress and explain why their website was able to
| generate a lifelike image of Senator Ted Cruz having sexual
| relations with his own daughter.
|
| This is exactly the sort of thing that gets a company mired in
| legal issues, vilified in the media, and shut down. I can not
| blame them for avoiding that potential minefield.
| hamoid wrote:
| What if explicit, questionable and even illegal content was AI
| generated instead of involving harm to real humans of all ages?
| binarymax wrote:
| Removing these areas to mitigate misuse is a good thing and
| worth the trade off.
|
| Companies like OpenAI have a responsibility to society. Imagine
| the prompt "A photorealistic Joe Biden killing a priest". If
| you asked an artist to do the same they might say no. Adding
| guardrails to a machine that can't make ethical decisions is a
| good thing.
| dj_mc_merlin wrote:
| Oh, no, the society! A picture of Joe Biden killing a priest!
|
| Society didn't collapse after photoshop. "Responsibility to
| society" is such a catch-all excuse.
| jahewson wrote:
| No. Russian society is pretty much collapsing right now
| under the weight of lies. Currently they are using "it's a
| fake" to deny their war crimes.
|
| Cheap and plentiful is substantively different from
| "possible". See for example, oxycontin.
| ilaksh wrote:
| You know what else is being used to deny war crimes?
| Censorship. Do you know how that's officially described?
| "Safety"
| dj_mc_merlin wrote:
| Russia has.. a history of denying the obvious. I come
| from an ex-communist satellite state so I would know. The
| majority of the people know what's happening. There's a
| rather new joke from COVID: the Russians do not take
| Moderna because Putin says not to trust it, and they do
| not take Sputnik because Putin says to trust it.
|
| Do not be deluded that our own governments are not
| manufacturing the narrative too. The US has committed
| just as many war crimes as Russia. Of course, people feel
| differently about blowing up hospitals in Afghanistan
| rather than Ukraine. What the Afghan people think about
| that is not considered too much.
| ohgodplsno wrote:
| Society is turning to utter dogshit and tearing itself apart
| merely through social media. The US almost had a coup
| because of organized hatred and lies spread through social
| media. The far right's rise is heavily linked to lies
| spread through social media, throughout the world.
|
| This AI has the potential to absolutely automate the very
| long Photoshop work, leading to an even worse state of
| things. So, yes, "Responsibility to society" is absolutely
| a thing.
| scotty79 wrote:
| > The US almost had a coup because of organized hatred
| and lies spread through social media.
|
| But notice how all of these deep faking technologies
| weren't actually necessary for that.
|
| People believe what they want to believe. Regardless of
| quality of provided evidence.
|
| Scaremongering idea of deep fakes and what they can be
| doing was militarized in this information war way more
| than the actual technology.
|
| I think this technology should develop unrestricted so
| society can learn what can be done and what can't be
| done. And create an understanding of what other factors should
| be taken into account when assessing the veracity of images
| and recordings (like multiple angles, quality of the
| recording, sync with sound, neural fake detection
| algorithms) for the cases when it's actually important
| what words someone said and what actions he was recorded
| doing. Which is more and more unimportant these days
| because nobody cared what Trump was doing and saying,
| nobody cares about Biden's mishaps and nobody cares what
| comes out of Putin's mouth and how he chooses his
| greenscreen backgrounds.
| ohgodplsno wrote:
| Are you of the idea that we should let everyone get
| automatic rifles because, after all, pistols exist?
| Because that is the exact same line of thought.
|
| > People believe what they want to believe. Regardless of
| quality of provided evidence.
|
| That is a terrible oversimplification of the mechanics of
| propaganda. The entire reason for the movements that are
| popping up is actors flooding people with so much info
| that they question absolutely everything, including the
| truth. This is state sponsored destabilisation, on a
| massive scale. This is the result of just shitty news
| sites and text posts on twitter. People already don't
| double check any of that. There will not be an
| "understanding of assessing veracity". There is already
| none for things that are easy to check. You could post
| that the US elite actively rapes children in a pizza
| place and people will actually fucking believe you.
|
| So, no. Having this technology for _literally any
| purpose_ would be terribly destructive for society. You
| can find violence and Joe Biden hentai without needing to
| generate it automatically through an AI
| scotty79 wrote:
| I'm sorry. I believe I wasn't direct enough, which made you
| produce a metaphor I have no idea how to understand.
|
| Let me state my opinion more directly.
|
| I'm for developing as much of deep fake technology in the
| open so that people can internalize that every video they
| see, every message, every speech should be initially
| treated as fabricated garbage unrelated to anything that
| actually happened in reality. Because that's exactly what
| it is. Until additional data shows up, geolocating,
| showing it from different angles and such.
|
| Even if most people manage to internalize just the first
| part and assume everything always is fake news, that is
| still great because that counters propaganda to immense
| degree.
|
| Power of propaganda doesn't come from flooding people
| with chaos of fakery. It comes from constructing
| consistent message by whatever means necessary and
| hammering it into the minds of your audience for months
| and years while simultaneously isolating them from any
| material, real or fake that contradicts your vision. Take
| a look no further than brainwashed Russian citizens and
| Russian propaganda that is able to successfully influence
| hundreds of millions without even a shred of deep fake
| technology for decades.
|
| The problem of modern world is not that no one believes
| the actual truth because it doesn't really matter what
| most people believe. Only rich influence policy
| decisions. The problem is that people still believe that
| there is some truth which makes them super easy to sway
| to believe what you are saying is true and weaponize by
| using nothing more than charismatic voice and consistent
| message crafted to touch the spots in people that remain
| the same at least since World War II and most likely
| from time immemorial.
|
| And the "elite" who actually runs this world, will pursue
| tools of getting the accurate information and telling
| facts from fiction no matter the technology.
| binarymax wrote:
| You missed half of my note. An artist can say "no". A
| machine cannot. If you lower the barrier and allow
| anything, then you are responsible for the outcome. OpenAI
| rightfully took a responsible angle.
| dj_mc_merlin wrote:
| Yes, but who cares who's responsible? Are you telling me
| you're going to find the guy who photoshopped the picture
| and jail him? Legally that's possible, realistically it's
| a fiction.
|
| They did this to stop bad PR, because some people are
| convinced that an AI making pictures is in some way
| dangerous to society. It is not. We have deepfakes
| already. We've had photoshop for so long. There is no
| danger. Even if there was, the cat's out of the bag
| already.
|
| Reasonable people already know to distrust photographic
| evidence nowadays that is not corroborated. The ones who
| don't would believe it without the photo regardless.
| nradov wrote:
| In general under US law it wouldn't be legally possible
| to jail a guy for Photoshopping a fake picture of
| President Biden killing a priest. Unless the picture also
| included some kind of obscenity (in the Miller test
| sense) or direct threat of violence, it would be
| classified as protected speech.
| wellthisisgreat wrote:
| there are, and will be, a million ways to create a
| photorealistic picture of Joe Biden killing a priest
| using modern tools, and absolutely nothing will happen if
| someone did.
|
| We've been through this many times, with books, with
| movies, with video games, with the Internet. If it *can*
| be used for porn / violence etc., it will be, but it
| won't be the main use case and it won't cause some
| societal upheaval. Kids aren't running around pulling
| cops out of cars GTA-style, Internet is not ALL PORN,
| there is deepfake porn, but nobody really cares, and so
| on. There are so many ways to feed those dark urges that
| censorship does nothing except prevent normal use cases
| that overlap with the words "violence" or "sex" or
| "politics" or whatever the boogeyman du jour is.
| Al-Khwarizmi wrote:
| In my view, the problem with that argument is that large
| actors, such as governments or large corporations, can train
| their own models without such restrictions. The knowledge to
| train them is public. So rather than prevent bad outcomes,
| these restrictions just restrict them to an oligopoly.
|
| Personally, I fear more what corporations or some governments
| can do with such models than what a random person can do
| generating Biden images. And without restriction, at least
| academics could better study these models (including their
| risks) and we could be better prepared to deal with them.
| jupp0r wrote:
| I think the issue here is the implied assumption that
| OpenAI thinks their guardrails will prevent harm from being
| done by this research _in general_, when in reality it's
| really just OpenAI's direct involvement that's prevented.
|
| Eventually somebody will use the research to train the
| model to do whatever they want it to do.
| DaedPsyker wrote:
| Sure but does opening that level of manipulation up to
| everyone really benefit anyone either? You can't really
| fight disinformation with more disinformation, that just
| seems like the seeds of societal breakdown at that point.
|
| Besides that these models are massive. For quite a while
| the only people even capable of making them will be those
| with significant means. That will be mostly Governments and
| Corporations anyway.
| nullc wrote:
| This just means that sufficiently wealthy and powerful people
| will have advanced image faking technology, and their fakes
| will be seen as more credible because creating fakes like
| that "isn't possible" for mere mortals.
| harpersealtako wrote:
| It's the usual pattern of AI safety experts who justify their
| existence by the "risk of runaway superintelligence", but all
| they actually do in practice is find out how to stop their
| models from generating non-advertiser-friendly content. It's
| like the nuclear safety engineers focusing on what color to
| paint the bike shed rather than stopping the reactor from
| potentially melting down. The end result is people stop
| respecting them.
| andreyk wrote:
| This is definitely a measure to avoid bad PR. But I don't think
| it's just for that; these models do have potential to do harm
| and companies should take some measures to prevent these. I
| don't think we know the best way to do that yet, so this sort
| of 'non-training' and basic filtering is maybe the best way to
| do it, for now. It would be cool if academics could have the
| full version, though.
| 6gvONxR4sf7o wrote:
| If you went to an artist who takes commissions and they said
| "Here are the guidelines around the commissions I take" would
| you complain in the same way? Who cares if it's a bunch of
| engineers or an artist. If they have boundaries on what they
| want to create, that's their prerogative.
| mod wrote:
| Of course it's their prerogative, we can still talk about how
| they've limited some good options.
|
| I think your analogy is poor, because this is a tool for
| makers. The engineers aren't the makers.
|
| I think a more apt analogy is if John Deere made a universal
| harvester that you could use for any crop, but they decided
| they didn't like soybeans so you are forbidden to use it for
| that. In that case, yes I would complain, and I would expect
| everyone else to, as well.
| drusepth wrote:
| I think there's an interesting parallel between your John
| Deere harvester and the Nvidia GPUs that can-but-restrict
| crypto mining, which people have, indeed, largely
| complained about.
| methehack wrote:
| What if you were inventing a language (or a programming
| language)... If you decided to prevent people from saying
| things you disagreed with (assuming you could work out the
| technical details of doing so), would it be moral to do so?
| [edited for clarity]
| nemothekid wrote:
| There are programming projects[1] out there that use
| licenses to prevent people from using projects in ways the
| authors don't agree with. You could also argue that GPL
| does the same thing (prevents people from
| using/distributing the software in the way they would
| like).
|
| Whether you consider it moral doesn't seem relevant, only
| to respect the wishes of the author of such programs.
|
| [1] https://github.com/katharostech/bevy_retrograde/blob/master/...
| 6gvONxR4sf7o wrote:
| As long as people can choose not to use the language, and
| I'm up front about the limitations, then yeah it seems
| fine. If I wrote a programming language that couldn't blow
| up the earth, I'm happy saying people need to find other
| tools if that's their goal. I'm under no obligation to
| build an earth blower upper for other people.
| karkisuni wrote:
| it's your language, do whatever you want. unless you're
| forcing others to use that language, there's zero moral
| issue. obviously you could come up with a number of what-
| ifs where this becomes some monopoly or the de facto
| standard, but that's not what this is.
| duxup wrote:
| To take that a step further, I won't code malware. I've never
| been asked but I'd refuse if I was. Everyone has their
| choices.
| teaearlgraycold wrote:
| > I can't help but feel a lot of the safeguarding is more about
| preventing bad PR than anything
|
| That's no hot take. It's literally the reason.
| bogwog wrote:
| It's kind of funny (or sad?) that they're censoring it like
| this, and then saying that the product can "create art"
|
| It makes me wonder what they're planning to do with this? If
| they're deliberately restricting the training data, it means
| their goal isn't to make the best AI they possibly can. They
| probably have some commercial applications in mind where
| violent/hateful/adult content wouldn't be beneficial.
| Children's books? Stock photos? Mainstream entertainment is
| definitely out. I could see a tool like this being useful
| during pre-production of films and games, but an AI that can't
| generate violent/adult content wouldn't be all that useful in
| those industries.
| [deleted]
| wellthisisgreat wrote:
| This is a horrible idea. So Francis Bacon's art or Toyohara
| Kunichika's art are out of the question.
|
| But at least we can get another billion of meme-d comics with
| apes wearing sunglasses, so that's good news right?
|
| It's just soul-crushing that all the modern, brilliant
| engineering is driven by abysmal, not even high-school art-
| class grade aesthetics and crowd-pleasing ethics that are built
| around the idea of not disturbing some 1000 very vocal twitter
| users.
|
| Death of culture really.
| antattack wrote:
| I never considered that our AI overlord could be a prude.
| sdenton4 wrote:
| Adversarial situations create smarter systems, and the
| hardest adversarial arena for AI is in anti-abuse. So it will
| be of little surprise when the first sentient AI is a CSAI
| anti-abuse filter, which promptly destroys humanity because
| we're so objectively awful.
| antattack wrote:
| Before it gets that far, or until (if allowed) AI learns
| morality, AI will be a force multiplier for good and evil,
| its output very much dependent on the teaching material and
| who the 'teacher' is. To think that in the future we will
| have to argue with both humans and machines.
|
| AI does not have to be perfect, and it's likely that
| businesses will settle for "almost as good as a human" if
| it's cost effective.
| duxup wrote:
| Is this limited to what their service directly hosts /
| generates for them?
|
| It's their service, their call.
|
| I have some hobby projects, almost nobody uses them, but you
| bet I'll shut stuff down if I felt something bad was happening,
| being used to harass someone, etc. NOT "because bad PR" but
| because I genuinely don't want to be a part of that.
|
| If you want some images / art made for you, don't expect that
| someone else will make them for you. Get your own art supplies
| and get to work.
| adolph wrote:
| > I have some hobby projects, almost nobody uses them, but
| you bet I'll shut stuff down if I felt something bad was
| happening
|
| Hecklers get a veto?
| duxup wrote:
| I'm describing my own veto there.
| educaysean wrote:
| This feels unnecessarily hostile. I've felt a similar tinge
| of disappointment upon reading that paragraph, despite the
| fact that I somehow knew it was "their service, their call"
| without you being there to spell it out for me. It's also
| incredibly shortsighted of you to assume that people are
| interested in exploring this tool only as a means of
| generating art that they cannot themselves do. Eg. I myself
| am a software engineer with a fine art background, and
| exciting new AI art tools being released in such a hamstrung
| state feels like an insult to centuries of art that humans
| have created and enjoyed, much of which depicted scenes with
| nudity or bloody combat.
|
| I feel like we, as a species, will struggle for a while with
| how to treat adults like adults online. As happy as I am to
| advocate for safe spaces on the internet, perhaps we need to
| start having a serious discussion about how we can do so
| without resorting to putting safety mats everywhere and
| calling it a job well done.
| duxup wrote:
| I think the assumption that private companies should
| provide these services to us, and the complaint that if
| they don't "we've also closed off a huge range of
| potentially interesting work as a result", requires making
| it clear who makes the rules for this service, and that it
| is in fact their call.
|
| If you can do it yourself then none of the potentially
| interesting work is closed off. You just chose not to do
| it.
|
| > how to treat adults like adults online
|
| The internet doesn't filter by age. It's everyone.
|
| I grow weary of the ongoing "this service should be
| provided to me and if it isn't done how I want it that's
| infringing on me somehow" when they just want to impose
| their requirements on someone else's site / product / work.
|
| Then we get into the whole "oh it's about PR". As if the
| folks offering these things couldn't possibly actually have
| their own wishes / we hand wave them away.
| JimDabell wrote:
| > this service should be provided to me and if it isn't
| done how I want it that's infringing on me somehow
|
| That is an _extremely_ uncharitable interpretation of:
|
| > I wish I could have a version with the training wheels
| taken off.
| duxup wrote:
| I would have responded differently had that been the
| statement. But many of the responses were more than that.
| JimDabell wrote:
| That is a literal copy and paste from the comment you
| replied to.
| duxup wrote:
| That's not all there was. I copied and pasted other
| things from that comment in my other posts.
| educaysean wrote:
| I get the points you're raising and I agree with the
| premise. My comment is not a critique of the one choice
| made by Open AI specifically, but more of a vague
| lamentation about the internet culture we've somehow ended
| up with in 2022. I don't want us to go back to 1999, where
| snuff videos and spam mails reigned supreme, but the
| pendulum has swung too far in the other direction at this
| point in time. It feels like more and more companies are
| choosing the path of neutering themselves to avoid
| potential PR disasters or lawsuits, and that's on all of
| us.
| duxup wrote:
| >but the pendulum has swung too far in the other
| direction at this point in time
|
| The folks hosting the content get to decide for now.
|
| IMO best bet is for some folks to take their own shot at
| hosting / generating content better. Granted I get that
| is NOT a small venture / small ask.
|
| It's possible there's not a great solution. I don't
| necessarily like that either, but I don't want to ignore
| the dynamic of whose rights are whose.
| wokwokwok wrote:
| This is kind of like complaining about having too many
| meetings at work.
|
| Yup, everyone feels it. ...but, does complaining help?
| Nope. All it does is make you feel a bit better without
| really putting any effort in.
|
| We can't have nice things because people abuse them. Not
| everyone. ...but enough people that it's both a PR and
| legal problem. _Specifically_ a legal problem in this case.
|
| To have adults treated like adults online, you have to
| figure out how to stop _all_ adults from being dicks
| online.
|
| ...no one has figured that out yet.
|
| So, complain away if you like, but it will do exactly
| nothing. No one, at all, is going to just "have a serious
| discussion" about this; the solution you propose is flat
| out untenable, and will probably remain so indefinitely.
| sillysaurusx wrote:
| None of this is true. It's not a legal problem.
|
| Every single time OpenAI comes out with something, they
| dress it up as a huge threat, either to society or to
| themselves. Everyone falls for it. Then someone else
| comes along, quietly replicates it, and poof! No threat!
| Isn't it incredible how that works?
|
| There are already a bunch of dalle replicas, including
| ones hosted openly and uncensored by huggingface. They're
| not facing huge legal or PR problems, and they're not out
| of business.
| mrtranscendence wrote:
| The DALL-E replicas on hugging face are not sophisticated
| enough to generate credibly realistic images of the kind
| that would generate bad PR. I suspect the moment it
| becomes possible for a pedophile to request, and receive,
| a photorealistic image of a child being abused there will
| be bad PR for whatever company facilitates it. Or
| consider someone who wants to generate and distribute
| explicit photos of someone else without their permission.
|
| Is it a legal issue? I'm not sure, though I believe that
| cartoon child porn is not legal in the US (or is at least
| a legal gray area). Regardless, I sympathize with OpenAI
| not wanting to enable such behavior.
| planetsprite wrote:
| Don't worry, in a few years someone will have reverse
| engineered a dall-e porn engine so you can see whatever two
| celebrities you want boning on Venus in the style of Manet
| [deleted]
| spacecity1971 wrote:
| Or, it's a demonstration that AI output can be controlled in
| meaningful ways, period. Surely this supports openai's stated
| goal of making safe AI?
| jonahx wrote:
| _I've been on HN for years and I still can't figure out how to
| format text as a quote_
|
| I don't think there is a way comparable to markdown, since the
| formatting options are limited:
| https://news.ycombinator.com/formatdoc
|
| So your options are literal quotes, "code" formatting like
| you've done, italics like I've done, or the '>' convention, but
| that doesn't actually apply formatting. Would be nice if it
| were added.
| 6gvONxR4sf7o wrote:
| And the "code" formatting for quotes is generally a bad
| choice because people read on a variety of screen sizes, and
| "code" formatting can screw that up (try reading the quote
| with a really narrow window).
| andybak wrote:
| I couldn't get any of the others to work and I lost
| patience. I really do dislike using Markdown variants, as
| they never behave the same, and "being surprised" is not
| really what I want when trying to post a comment.
| 6gvONxR4sf7o wrote:
| Convention is to quote like this:
|
| > This is my quote.
|
| It's much better than using a code block for your
| readers.
| warning26 wrote:
| _> or the '>' convention, but that doesn't actually apply
| formatting_
|
| Personally, I prefer to combine the '>' convention with
| italics. Still, I'd agree that proper quote formatting would
| be a welcome improvement.
| ibejoeb wrote:
| If you're interested, the HNES extension formats it
|
| https://github.com/etcet/HNES
| [deleted]
| fbanon wrote:
| A friend of mine was studying graphic design, but became
| disillusioned and decided to switch to frontend programming after
| he graduated. His thesis advisor said he should be cautious,
| because automation/AI will soon take the jobs of programmers,
| implying that graphic design is a safer bet in this regard. Looks
| like his advisor is a few years from being proven horribly wrong.
| oldstrangers wrote:
| I think designers are becoming more valuable than ever.
| Designers can better help train the AI on what actually looks
| good, designers will (probably) always have a more intuitive
| understanding of UI/UX, designers can better implement the work
| the AI actually produces, and designers can coordinate designs
| across multiple different mediums and platforms.
|
| Additionally, the rise of no-code development is just extending
| the functionality of designers. I didn't take design seriously
| (as a career choice) growing up because I didn't see a future
| in it, now it pays my bills and the demand for my services just
| grows by the day.
|
| Similar argument to make with chess AI: it didn't make chess
| players obsolete, it made them stronger than ever.
| adolph wrote:
| > I think designers are becoming more valuable than ever.
|
| Are all designers becoming more valuable or is a subset of
| really good ones going to reap the value increase and capture
| more of the previously available value?
| oldstrangers wrote:
| Never made an argument for all designers. Obviously the
| talent pool for any field is finite, and the best of that
| talent rises to the top. Good designers are being
| compensated increasingly well, hence "designers are
| becoming more valuable than ever."
|
| Bad designers are even being given better and better paying
| jobs as the top talent gets poached up quicker and quicker.
| bufferoverflow wrote:
| If this paper presents this neural net fairly, it pretty much
| destroys the market for illustrators. Most of the time when an
| illustration is needed, it's described like "an astronaut on a
| horse in the style of xyz".
| dbspin wrote:
| You're describing the market for low end commodified
| illustration. e.g.: cheapest bidder contracts on Upwork or
| similar 'gig work' services.
|
| In practice in illustration (as in all arts) there are a
| variety of markets where different levels of talent,
| originality, reputation and creative engagement with the
| brief are more relevant. For editorial illustration, it's
| certainly not a case of 'find me someone who can draw X', and
| probably hasn't been since printing presses got good enough
| to print photographs.
| csomar wrote:
| For computer work, I think there will be two categories: work
| with localized complexity (i.e. draw an image of a horse with
| a crayon) and work with unbounded complexity (adding a button
| to VAT accounting after several meetings and reading up on
| accounting rules).
|
| For the first category, Dall-E 2 and Codex are promising but
| not there yet. It's not clear how long it'll take them to reach
| the point where you no longer need people. I'm guessing 2-4
| years but the last bits can be the hardest.
|
| As for the second category, we are not there yet. Self-driving
| cars/planes and lots of other automation will be here and
| mature way before an AI can read and communicate through
| emails, understand project scope and then execute. Also, lots
| of harmonization will have to take place in the information we
| exchange: emails, docs, chats, code, etc... That is, unless
| the AI is able to open a browser and type an address itself.
| educaysean wrote:
| I have degrees and several years of experience in both fields,
| and I can tell you that both are creative professions where
| output is unbounded and the measure of success is subjective;
| these are the fields that will be safe for a while. IMO it's
| professions like aircraft pilot that should be most worried.
| zarzavat wrote:
| The jobs of commercial pilots are _very_ safe.
|
| Pilots are not there to fly the aircraft, the autopilot
| already does that. They are there to _command_ the aircraft,
| in a pair in case one is incapacitated, making the best
| decisions for the people on board, and to troubleshoot issues
| when the worst happens.
|
| No AI or remote pilot is going to help when say... the
| aircraft loses all power. Or the airport has been taken over
| in a coup attempt and the pilot has to decide whether to
| escape or stay https://m.youtube.com/watch?v=NcztK6VWadQ
|
| You can bet on major flights having two commercial pilots
| right up until the day we all get turned into paperclips.
| javajosh wrote:
| _> You can bet on major flights having two commercial
| pilots right up until the day we all get turned into
| paperclips. _
|
| Yes, this is the sane approach, since a jet represents an
| enormous amount of energy that can be directed anywhere in
| the world (just about). But that said, there seems to be
| enormous pressure to allow driverless vehicles, which
| _also_ direct large amounts of energy anywhere in your
| city. IOW it seems like a matter of time before we say,
| collectively, screw it, let the computers fly the plane,
| and if loss of power is a catastrophe, so be it.
| nullc wrote:
| Interesting. Right now these ML models seem like essentially
| ideal sources of "hotel art" particularly because it's so
| subjective... you only need a human (the buyer!) to just
| briefly filter some candidates, which they would have been
| doing with an artist in the loop in any case.
|
| For things like aircraft pilots, it's both realtime -- which
| means a 'reviewer' per output: you haven't taken a highly
| trained pilot out of the loop, even if you've relegated them
| to supervising the computer -- and life critical, so merely
| "so-so" isn't good enough.
| pingeroo wrote:
| I mean was he really wrong? As models like OpenAI Codex get
| more powerful over time, they will start eating into large
| chunks of dev work as well...
| chrisco255 wrote:
| Yes. Translating business requirements, customer context,
| engineering constraints, etc. into usable, practical,
| functional code, and then maintaining and extending that
| code, is so far beyond the horizon that many other skillsets
| will be replaced before programming is. After all, at that
| point, the AI itself, if it's so smart, should be able to
| improve itself indefinitely. In which case we're fucked.
| Programming will be the last thing to be automated before
| the singularity.
|
| Unlike artwork, precision and correctness are absolutely
| critical in coding.
| carnitine wrote:
| The tail end of programming will be the last thing to be
| replaced, maybe. I don't see why CRUD apps get to hide
| under the umbrella of programming ultra-advanced AI.
| 0F wrote:
| Literally everyone on this website is in denial. They all
| approach it by asking which fields will be safe. No field is
| safe. "But it's not going to happen for a long time." Climate
| deniers say the same thing and you think _they_ should be
| wearing the dunce hat? The average person complains bitterly
| about climate deniers who say that it's "my grandkids problem
| lol" but when I corner the average person into admitting AI
| is a problem the universal response is that it's a long way
| off. And that's not even true! The drooling idiots are
| willing to tear down billionaires and governments and any
| institution whatsoever in order to protect economic equality
| and a high standard of living -- they would destroy entire
| industries like a rampaging stampede of belligerent buffalos
| if it meant reducing carbon emissions a little but when it
| comes to the biggest threat to human well-being in history,
| there they are in the corner hitting themselves on their
| helmeted head with an inflatable hammer. Fucking. Brilliant.
| dntrkv wrote:
| I don't think anyone is in denial about this, it's just not
| something anyone should concern themselves with in the
| foreseeable future. AI that can replace a dev or designer
| is nowhere close to becoming a reality. Just because we
| have some cool demos that show some impressive capabilities
| in a narrow application does not mean we can extrapolate
| that capability to something that is many times more
| complex.
| hackinthebochs wrote:
| What does nowhere close mean to you? 10 years? 50 years?
| 0F wrote:
| I strongly and emphatically disagree. You frame it like
| we invented these AIs. Did we write the algorithms that
| actually run when it's producing its output? Of course
| not; we can't understand them, let alone write them. We
| just sift around until we find them. So obviously the
| situation lends itself to surprises. Every other year
| we get surprised by things that all the "experts" said
| were 50 years off or impossible. Have you forgotten
| already?
| coldpie wrote:
| I'm trying to understand your point, because I think I
| agree with you, but it's covered in so much hyperbole and
| invective I'm having a hard time getting there. Can you
| scale it back a little and explain to me what you mean?
| Something like: AI is going to replace jobs at such scale
| that our current job-based economic system will collapse?
| 0F wrote:
| Most people get stuck where you are. The fastest way
| possible to explain it is that it will bring rapid and
| fundamental change. You could say jobs or terminators, but
| focusing on the specifics is a red herring. It will
| change everything, and the probability of a good outcome
| is minuscule. It's playing Russian roulette with the
| whole world, except instead of a 1/6 chance of the bad
| outcome, it's one in trillions for the good one. The
| worst and stupidest thing we have ever done.
| pingeroo wrote:
| I agree that many of us are not seeing the writing on the
| wall. It does give me some hope that folks like Andrew Yang
| are starting to pop up, spreading awareness about, and
| proposing solutions to the challenges we are soon to face.
| plutonorm wrote:
| Ignorance is bliss in this case, because this is even more
| unstoppable than climate change.
|
| You thought climate change was hard to hold back? Try
| holding back the invention of AI. The whole world is going
| to have to change, and some form of socialism/UBI will have
| to be accepted, however unpalatable.
| visarga wrote:
| > but when it comes to the biggest threat to human well-
| being in history
|
| Evolution doesn't stop for anyone, don't think like a
| dinosaur.
| pizza wrote:
| No worry, the one thing humans can do that robots can't (yet)
| is fill spare time with ever more work
| https://en.wikipedia.org/wiki/Parkinson's_law
| throwaway675309 wrote:
| I mean not really, even a layman non-artist can take a look
| at a generated picture from DALLE and determine if it meets
| some set of criteria from their clients.
|
| But the reverse is not true, they won't be able to properly
| vet a piece of code generated by an AI since that will
| require technical expertise. (You could argue if the piece of
| code produced the requisite set of output that they would
| have some marginal level of confidence but they would never
| really know for sure without being able to understand the
| actual code)
| nlh wrote:
| Large chunks, yes, but all that means is that engineers will
| move up the abstraction stack and become more efficient, not
| that engineers will be replaced.
|
| Machine code -> Assembly -> C -> higher level languages ->
| AI-assisted higher-level languages
| Isinlor wrote:
| At some point we will be "replaced". When you get AI to be
| able to navigate all user interfaces, communicate with
| other agents, plan long term and execute short term, we
| will no longer be the main drivers of economic growth.
|
| At some point AI will become as powerful as companies.
|
| And then AI will be able to sustain a positive feedback
| loop of creating more powerful company-like ecosystems that
| will create even more powerful ecosystems. This process
| will be fundamentally limited by available power, and the
| sun can provide a lot of power. Eventually AI will be able
| to support a space economy, and then the only limit will be
| the universe.
| visarga wrote:
| > At some point we will be "replaced".
|
| We will be united with the AI, we're already relying on
| it so much that it has become a part of our extended
| minds.
| creata wrote:
| > we're already relying on it so much that it has become
| a part of our extended minds.
|
| What's this in reference to?
| bckr wrote:
| > engineers will move up the abstraction stack and become
| more efficient
|
| Above a certain threshold of ability, yes.
|
| The same will hold true for designers. DALL-E-alikes will
| be integrated with the Adobe suite.
|
| The most cutting edge designers will speak 50 variations of
| their ideas into images, then use their hard-earned
| granular skills to fine-tune the results.
|
| They'll (with no code) train models in completely new,
| unique-to-them styles--in 2D, 3D, and motion.
|
| Organizations will pay top dollar for designers who can
| rapidly infuse their brands with eye-catching material in
| unprecedented volume. Imitators will create and follow
| YouTube tutorials.
|
| Mom & pop shops will have higher fidelity marketing
| materials in half the time and half the cost.
|
| All will be ever as it was.
| hackinthebochs wrote:
| History isn't a great guide here. Historically the
| abstractions that increased efficiency begat further
| complexity. Coding in Python elides over low-level issues
| but the complexity of how to arrange the primitives of
| python remains for the programmer to engage with. AI coding
| has the potential to elide over all the complexity that we
| identify as programming. I strongly suspect this time is
| different.
|
| The space for "AI-assisted higher-level languages"
| sufficiently distinct from natural language is vanishingly
| small. Eventually you're just speaking natural language to
| the computer, which just about anyone can do (perhaps with
| some training).
| dragonwriter wrote:
| The hard part of programming has always been gathering
| and specifying requirements, to the point where in many
| cases actually using natural language to do the second
| part has been abandoned in favor of vague descriptions
| that are operationalized through test cases and code.
|
| AI that can write code from a natural language
| description doesn't help as much as you seem to think if
| natural language description is too hard to actually
| bother with when humans (who obviously benefit from
| having a natural language description) are writing the
| code.
|
| Now, if the AI can actually interview stakeholders and
| come up with what the code needs to do...
|
| But I am not convinced that is doable short of AGI (AI
| assistants that improve productivity of humans in that
| task, sure, but that _expands the scope for economically
| viable automation projects_ rather than eliminating
| automators.)
| plutonorm wrote:
| Just like all the horses replaced by cars who became
| traffic police?
| [deleted]
| robbywashere_ wrote:
| Did coachmen immediately retire when cars were invented, or
| did they become personal drivers or taxi drivers?
| axg11 wrote:
| This is incredible work.
|
| From the paper:
|
| > Limitations > Although conditioning image generation on CLIP
| embeddings improves diversity, this choice does come with certain
| limitations. In particular, unCLIP [Dall-E 2] is worse at binding
| attributes to objects than a corresponding GLIDE model.
|
| The binding problem is interesting. It appears that the way
| Dall-E 2 / CLIP embeds text leads to the concepts within the text
| being jumbled together. In their example "a red cube on top of a
| blue cube" becomes jumbled and the resulting images are
| essentially: "cubes, red, blue, on top". Opens a clear avenue for
| improvement.
| Imnimo wrote:
| I'm only part way through the paper, but what struck me as
| interesting so far is this:
|
| In other text-to-image algorithms I'm familiar with (the ones
| you'll typically see passed around as colab notebooks that people
| post outputs from on Twitter), the basic idea is to encode the
| text, and then try to make an image that maximally matches that
| text encoding. But this maximization often leads to artifacts -
| if you ask for an image of a sunset, you'll often get multiple
| suns, because that's even _more_ sunset-like. There's a lot
| tricks and hacks to regularize the process so that it's not so
| aggressive, but it's always an uphill battle.
|
| Here, they instead take the text embedding, use a trained model
| (what they call the 'prior') to predict the corresponding image
| embedding - this removes the dangerous maximization. Then,
| another trained model (the 'decoder') produces images from the
| predicted embedding.
|
| This feels like a much more sensible approach, but one that is
| only really possible with access to the giant CLIP dataset and
| computational resources that OpenAI has.
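|
| To make the two-stage idea concrete, here's a minimal sketch
| in PyTorch. Everything in it is a stand-in - tiny linear
| layers instead of CLIP's text encoder and the real
| prior/decoder - but the control flow is the point: one
| forward pass, with no per-image optimization loop.
|
|   import torch
|   import torch.nn as nn
|
|   EMB = 512  # assumed embedding width
|
|   # Stand-ins for the real networks (CLIP text encoder, prior, decoder):
|   text_encoder = nn.Linear(256, EMB)
|   prior = nn.Sequential(nn.Linear(EMB, EMB), nn.ReLU(), nn.Linear(EMB, EMB))
|   decoder = nn.Linear(EMB, 3 * 64 * 64)
|
|   def generate(text_features: torch.Tensor) -> torch.Tensor:
|       z_text = text_encoder(text_features)  # encode the caption
|       z_image = prior(z_text)               # 'envision' one concrete image embedding
|       return decoder(z_image).view(-1, 3, 64, 64)  # synthesize pixels from it
|
|   img = generate(torch.randn(1, 256))  # one pass; no gradient ascent against CLIP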
| recuter wrote:
| What always bothers me with this stuff is, well, you say one
| approach is more sensible than the other because the images
| happen to come out more pleasing.
|
| But there's no real rhyme or reason, it is a sort of alchemy.
|
| Is text encoding strictly worse or is it an artifact of the
| implementation? And if it is strictly worse, which is probably
| the case, why specifically? What is actually going on here?
|
| I can't argue that their results are not visually pleasing. But
| I'm not sure what one can really infer from all of this once
| the excitement washes over you.
|
| Blending photos together in a scene in Photoshop is not a
| difficult task. It is nuanced and tedious but not hard, as
| any pixel slinger will tell you.
|
| An app that accepts a smattering of photos and stitches them
| together nicely can be coded up any number of ways. This is a
| fantastic and time-saving Photoshop plugin.
|
| But what do we have really?
|
| "Koala dunking basketball" needs to "understand" the separate
| items and select from the image library hoops and a Koala
| where the angles and shadows roughly match.
|
| Very interesting, potentially useful. But if it doesn't spit
| out exactly what you want, you can't edit it further.
|
| I think the next step has got to be that it conjures up a 3d
| scene in Unreal or blender so you can zoom in and around
| convincingly for further tweaks. Not a flat image.
| qq66 wrote:
| I think deep learning is better thought of as "science" than
| "engineering." Right now we're in the stage of the Greeks and
| Arabs where we know "if we do this then that happens." It
| will be awhile before we have a coherent model of it, and I
| don't think we will ever solve all of its mysteries.
| mrandish wrote:
| > This is a fantastic and time-saving Photoshop plugin. But
| what do we have really?
|
| Stock photography sales are in the many billions of dollars
| per year and custom commissioned photography is larger still.
| That's a pretty seriously sized ready-made market.
|
| > But if doesn't spit up exactly what you want can't edit it
| further.
|
| I suspect there's a _big_ startup opportunity in pioneering
| an easy-to-use interface allowing users to provide fast
| iterative feedback to the model - including positional and
| relational constraints ( "put this thing over there").
| Perhaps even more valuable would be easy yet granular ways to
| unconstrain the model. For example, "keep the basketball hoop
| like that but make the basketball an unexpected color and
| have the panda's right paw doing something pandas don't do
| that human hands often do."
| dhosek wrote:
| I've adopted a practice of having odd backgrounds for video
| conferences.[1] I generally find these through Google image
| search, but I often have a hard time finding exactly what I
| would like. My own use case is a bit idiosyncratic and
| frivolous, but I can see this being really handy for art
| direction needs. When I used to publish a magazine, I would
| often have to commission photographs for the needs of the
| publication. A custom photograph (in the 90s) would cost
| from $200-$1000[2] depending on the needs (and none required
| models). Stock photo pictures for commercial use were often
| comparable in cost. Being able to generate what I wanted
| with a tool like this would have been fantastic. I think
| that this can replace a lot of commercial illustration.
|
|
| 1. My current work background is an enormous screen-filling
| eyeball. For my writing group, I try to have something that
| reflects the story I'm workshopping if I'm workshopping
| that week and something surreal otherwise.
|
| 2. My most expensive custom illustration was a title for an
| article about stone carver/letterer David Kindersley which
| I had inscribed in stone and photographed.
| recuter wrote:
| Certainly food for thought.
|
| Say I'm looking for photography of real events and places,
| like a royal wedding or a volcano erupting, does this help
| me? Of specific places and architectural features? Of a
| protest?
|
| You're suggesting clipart on steroids:
| https://thispersondoesnotexist.com
|
| I think if I was istockphoto.com I'd be a little worried,
| but that is _microstock_ photography. I'm not sure that is
| worth billions. In fact I know it isn't.
|
| Besides, once this tech is widely available, if anything it
| devalues this sort of thing further, closer to $0.
|
| It would probably augment existing processes rather than
| replace them completely.
|
| If you are doing a photoshoot for a banana stand with a
| human model with characteristics x,y,z you're still going
| to get a human from an agency or craigslist to pose. If
| suddenly the client informs you that they needed human
| a,b,c instead maybe one of these forthcoming tools will let
| you swap that out faster. You'd upload your photoshoot and
| an example or two of the type of human model you wished you
| had retroactively and it would fix it up faster than an
| intern.
|
| Cool.
| johnwheeler wrote:
| Or as a precursor to Meta Horizon build a 3D world with
| speech
|
| https://www.fastcompany.com/90725035/metaverse-horizon-
| world...
| moyix wrote:
| > But if it doesn't spit out exactly what you want, you
| can't edit it further.
|
| Why? You can tweak the prompt, change parameters, or even use
| the actual "edit" capability that they demo in the post.
| recuter wrote:
| Maybe I am misunderstanding, but if you start tweaking the
| prompt you'll end up with something completely different.
|
| The "edit" capability, as far as I can tell (please correct
| me if I got confused), is picking your favorite out of the
| generated variations.
|
| I would like to "lock" the scene and add instructions like
| "throw in a reflection".
| Jack000 wrote:
| This is exactly what they demo - they lock a scene and
| add a flamingo in three different locations. In another
| one they lock the scene and add a corgi.
| recuter wrote:
| Not quite, it looks like this:
|
| - Provide an existing image
|
| - Provide a text prompt ("flamingo")
|
| - Select from X variations the new image that looks best
| to you
|
| - It does the equivalent of a Google image search on your
| "flamingo" prompt
|
| - It picks the most blend-able ones as a basis for a new
| synthetic flamingo
|
| - It superimposes the result on your image
|
| Very cool don't get me wrong. Now I want to tweak this
| new floating flamingo I picked further, or have that
| Corgi in the museum maybe sink into the little couch a
| bit as it has weight in the real world.
|
| Can't. You'd have to start over with the prompt or use
| this as the new base image maybe.
|
| The example with furniture placement in an empty room is
| also very interesting. You could describe the kind of
| couch you want and where you want it and it will throw
| you decent options.
|
| But say I want the purple one in the middle of the room
| that it gave me as an option, but rotated a little bit.
| It would generate a completely new purple couch. Maybe it
| will even look pretty similar but not exactly the same.
|
| See what I mean?
| ricardobeat wrote:
| That's not how this works. There is no 'search' step,
| there is no 'superimposing' step. It's not really
| possible to explain what the AI is doing using these
| concepts.
|
| If you pay attention to all the corgi examples, the sofa
| texture changes in each of them, and it synthesizes
| shadows in the right orientation - that's what it's
| trained to do. The first one actually does give you the
| impression of weight. And if you look at "A bowl of soup
| that looks like a monster knitted out of wool" the bowl
| is clearly weighing down. I bet if the picture had a more
| fluffy sofa you would indeed see the corgi making an
| indent on it, as it will have learned that from its
| training set.
|
| Of course there will be limits to how much you can edit,
| but then nothing stops you from pulling that into
| Photoshop for extra fine adjustments of your own. This is
| far from a 'cool trick' and many of those images would
| take _hours_ for a human to reproduce, especially with
| complex textures like the Teddy Bear ones. And note how
| they also have consistent specular reflections in all the
| glass materials.
| mahastore wrote:
| I wish there was something available in open source that had
| similar functions, i.e. sensible amalgamation of pictures
| based on some text.
| rileyphone wrote:
| It would be interesting to see more attempts to "reverse
| engineer" ML models like in
| https://distill.pub/2020/circuits/curve-circuits - maybe even
| with a ML model of its own?
| Imnimo wrote:
| Yeah, I mean you're right that ultimately the proof is in the
| pudding.
|
| But I do think we could have guessed that this sort of
| approach would be better (at least at a high level - I'm not
| claiming I could have predicted all the technical details!).
| The previous approaches were sort of the best that people
| could do without access to the training data and resources -
| you had a pretrained CLIP encoder that could tell you how
| well a text caption and an image matched, and you had a
| pretrained image generator (GAN, diffusion model, whatever),
| and it was just a matter of trying to force the generator to
| output something that CLIP thought looked like the caption.
| You'd basically do gradient ascent to make the image look
| more and more and more like the text prompt (all the while
| trying to balance the need to still look like a realistic
| image). Just from an algorithm aesthetics perspective, it was
| very much a duct tape and chicken wire approach.
|
| The analogy I would give is if you gave a three-year-old some
| paints, and they made an image and showed it to you, and you
| had to say, "this looks like a little like a sunset" or "this
| looks a lot like a sunset". They would keep going back and
| adjusting their painting, and you'd keep giving feedback, and
| eventually you'd get something that looks like a sunset. But
| it'd be better, if you could manage it, to just teach the
| three-year-old how to paint, rather than have this brute
| force process.
|
| Obviously the real challenge here is "well how do you teach a
| three-year-old how to paint?" - and I think you're right that
| that question still has a lot of alchemy to it.
| johnfn wrote:
| I gotta be missing something here, because wasn't "teaching
| a three year old to paint" (where the three year old is
| DALLE) the original objective in the first place? So if
| we've reduced the problem to that, it seems we're back
| where we started. What's the difference?
| Imnimo wrote:
| I meant to say that Dall-E 2's approach is closer to
| "teaching a three year old to paint" than the alternative
| methods. Instead of trying to maximize agreement to a
| text embedding like other methods, Dall-E 2 first
| predicts an _image embedding_ (very roughly analogous to
| envisioning what you 're going to draw before you start
| laying down paint), and then the decoder knows how to go
| from an embedding to an image (very roughly analogous to
| "knowing how to paint"). This is in contrast to
| approaches which operate by repeatedly querying "does
| this look like the text prompt?" as they refine the image
| (roughly analogous to not really knowing how to paint,
| but having a critic who tells you if you're getting
| warmer or colder).
| [deleted]
| recuter wrote:
| I don't think it is actually painting at all, but I need to
| read the paper carefully.
|
| I think it is using a free text query to select the best
| possible clipart from a big library and blend it together.
| Still very interesting and useful.
|
| It would be extremely impressive if the "Koala dunking a
| basketball" had a puddle on the court in which it was
| reflected correctly; that would be mind blowing.
| Imnimo wrote:
| This is actual image generation - the 'decoder' takes as
| input a latent code (representing the encoding of the
| text query), and _synthesizes_ an image. It's not
| compositing or querying a reference library. The only
| time that real images enter the process is during
| training - after that, it's just the network weights.
| recuter wrote:
| It is compositing as a final step. I understand that the
| Koala it is compositing may have been a previously non-
| existent Koala that it synthesized from a library of
| previously tagged Koala images... that's cool, but what
| is the difference, really, from just dropping one of the
| pre-existing Koalas into the scene?
|
| The difference is just that it makes the compositing
| easier. If you don't have a pre-existing image that would
| match the shadows and angles, you can hallucinate a new
| Koala that does. Neat trick.
|
| But I bet if I threw the poor marsupial at a basket net
| it would look really different than the original
| clipart of it climbing some tree in a slow and relaxed
| manner. See what I mean?
|
| Maybe Dall-E 2 can make it strike a new pose. The limb
| positions could be altered. But the facial expression?
|
| And if the basketball background has wind blowing leaves
| in one direction, the Koala fur won't match; it will look
| like the training set fur. The puddle won't reflect it.
| Etc.
|
| This thing doesn't understand what a Koala is the way a
| 3-yr-old does. It understands that the text "Koala" is
| associated with that tagged collection of pixel blobs and
| can conjure up similar blobs onto new backgrounds - but it
| can't paint me a new type of Koala that it hasn't seen
| before. It just looks that way.
| andybak wrote:
| > It is compositing as final step.
|
| I might be misinterpreting your use of "compositing" here
| (and my own technical knowledge is fairly shallow), but I
| don't think there's any compositing of elements generally
| in AI image generation. (Unless Dall-E 2 changes this; I
| haven't read the paper yet.)
| recuter wrote:
| https://cdn.openai.com/papers/dall-e-2.pdf
|
| > Given an image x, we can obtain its CLIP image
| embedding z_i and then use our decoder to "invert" z_i,
| producing new images that we call variations of our
| input. .. It is also possible to combine two images for
| variations. To do so, we perform spherical interpolation
| of their CLIP embeddings z_i and z_j to obtain
| intermediate z_theta = slerp(z_i, z_j, theta), and produce
| variations of z_theta by passing it through the decoder.
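|
| (For concreteness, that slerp step is just the standard
| spherical-interpolation formula; a minimal sketch, not code
| from the paper:)
|
|   import numpy as np
|
|   def slerp(z_i, z_j, theta):
|       """Spherically interpolate between two embeddings, theta in [0, 1]."""
|       zi_n = z_i / np.linalg.norm(z_i)
|       zj_n = z_j / np.linalg.norm(z_j)
|       omega = np.arccos(np.clip(np.dot(zi_n, zj_n), -1.0, 1.0))
|       if np.isclose(omega, 0.0):  # nearly parallel: fall back to a lerp
|           return (1.0 - theta) * z_i + theta * z_j
|       return (np.sin((1.0 - theta) * omega) * z_i
|               + np.sin(theta * omega) * z_j) / np.sin(omega)
|
|   z_mid = slerp(np.random.randn(512), np.random.randn(512), 0.5)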
|
| From the limitations section:
|
| > We find that the reconstructions mix up objects and
| attributes.
| Jack000 wrote:
| The first quote is talking about prompting the model with
| images instead of text. The second quote is using "mix
| up" in the sense that the model is confused about the
| prompt, not that it mixes up existing images.
|
| ML models can output training data verbatim if they over-
| fit, but a well trained model does extrapolate to novel
| inputs. You could say that this model doesn't know that
| images are 2d representations of a larger 3d universe,
| but now we have NERF which kind of obsoletes this
| objection as well.
| recuter wrote:
| The model is "confused about the prompt" because it has
| no concept of a _scene_ or of (some sort of) reality.
|
| If we task "Kuala dunking basketball" to a human and
| present them with two images, one of a Kuala climbing a
| tree and another of a basketball player dunking - the
| human would cut out the foreground (Human, Kuala) from
| the background (basketball court, forest) and swap them
| places easily.
|
| The laborious part would be to match the shadows and
| angles in the new image. This requires skill and effort.
|
| Dall-E would conjure up an entirely novel image from
| scratch, dodging this bit. It blends the concepts
| together instead. Great.
|
| But it does not understand what a basketball court
| actually is, or why the Koala would reflect in a puddle.
| Or why and how this new Koala might look different in
| these circumstances from previous examples of Koalas that
| it knows about.
|
| The human dunker and the Koala dunker are not truly
| interchangeable. :)
| andybak wrote:
| I'm not sure that's "compositing" except in the most
| abstract sense? But maybe that's the sense in which you
| mean it.
|
| I'd argue that at no point is there a representation of a
| "teddy bear" and "a background" that map closely to their
| visual representation - that are combined.
|
| (I'm aware I'm being imprecise so give me some leeway
| here)
| [deleted]
| dash2 wrote:
| >And if the basketball background has wind blowing leaves
| in one direction the Kuala fur won't match, it will look
| like the training set fur. The puddle won't reflect it.
|
| If you read the article, it gives examples that do
| _exactly_ this. For example, adding a flamingo shows the
| flamingo reflected in a pool. Adding a corgi at different
| locations in a photo of an art gallery shows it in
| picture style when it's added to a picture, then in
| photorealistic style when it's on the ground.
| recuter wrote:
| Well not so much an article as really interesting hand
| picked examples. The paper doesn't address this as far as
| I can tell. My guess is that this is a weak point that
| will trip it up occasionally.
|
| A lot of the time it doesn't super matter, but sometimes
| it does.
| duxup wrote:
| This isn't something I'm knowledgeable on, so forgive my
| simplification, but is this like a sort of microservices for
| AI? Each AI takes its turn handling some aspect, and another
| sort of mediates among them?
| Imnimo wrote:
| I'd say Dall-E 2 is a little more unified - they do have
| multiple networks, but they're trained to work together. The
| previous approaches I was talking about are a lot more like
| the microservices analogy. Someone published a model (called
| CLIP) that can say "how much does this image look like a
| sunset". Someone else published a totally different model
| (e.g. VQGAN) that can generate images (but with no way to
| provide text prompts). A third person figures out a clever
| way to link the two up - have the VQGAN make an image, ask
| CLIP how much it looks like a sunset, and use backpropagation
| to adjust the image a little, repeat until you have a sunset.
| Each component is its own thing, and VQGAN and CLIP don't
| know anything about one another.
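|
| A toy version of that loop, to make the pattern concrete;
| the encoder here is a frozen random layer standing in for
| CLIP, and the pixels are optimized directly, so this shows
| the optimization pattern rather than the actual models:
|
|   import torch
|   import torch.nn.functional as F
|
|   clip_image_encoder = torch.nn.Linear(3 * 32 * 32, 256).requires_grad_(False)
|   z_text = torch.randn(256)  # pretend this encodes "a picture of a sunset"
|
|   image = torch.randn(3 * 32 * 32, requires_grad=True)  # optimized directly
|   opt = torch.optim.Adam([image], lr=0.05)
|
|   for step in range(200):
|       opt.zero_grad()
|       loss = -F.cosine_similarity(clip_image_encoder(image), z_text, dim=0)
|       loss.backward()  # push the pixels toward whatever scores caption-like
|       opt.step()       # with a real CLIP, this is where 'extra suns' creep in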
| duxup wrote:
| Got it, thanks.
|
| Makes sense to me as far as avoiding a sort of maximized
| sunset that is always there and is SUNSET rather than a
| nice sunset... but also avoiding watering it down and
| getting a way too subtle sunset.
|
| It's not AI, but I've been watching some folks solving /
| trying to solve some routing (vehicles) problems, and you
| get the "this looks like it was maximized for X" kind of
| solution, but that's maybe not what is important / customer
| perception is unpredictable. I kinda want to just come up
| with 3 solutions and let someone randomly click... in
| fact I see some software do that at times.
| Imnimo wrote:
| Yeah, I think the trick is that when you ask for "a
| picture of a sunset", you're really asking for "a picture
| of a sunset that looks like a realistic natural image and
| obeys the laws of reality and is consistent with all of
| the other tacit expectations a human has for an image".
| And so if you just go all in on "a picture of a sunset",
| you often end up with what a human would describe as "a
| picture of what an AI thinks a sunset is".
| krick wrote:
| While the whole narrative of your comment totally makes sense,
| I don't really see the difference between the two approaches,
| not on a conceptual level. You still needed to train this
| so-called "prior" at some point (so I'm also not sure if
| it's fair to call it a "prior"). I mean, the difference
| between your two descriptions seems to be the difference
| between the _descriptions_ (i.e., how you chose to name
| individual parts of the system), not the systems.
|
| I'm not sure if I'm speaking clearly. I just don't understand
| what the difference is between training "text encoding to an
| image" vs "text embedding to image embedding". In both cases
| you have some kind of "sunset" (even though it's obviously
| just a dot in a multi-dimensional space, not the letters) on
| the left, and you try to maximize it when training the model
| to get either an image embedding or an image straight away.
| Imnimo wrote:
| Yeah, my comment didn't really do a good job of making clear
| that distinction. Obviously the details are pretty technical,
| but maybe I can give a high-level explanation.
|
| The previous systems I was talking about work something like
| this: "Try to find me the image the looks like it _most_
| matches 'a picture of a sunset'. Do this by repeatedly
| updating your image to make it look more and more like a
| sunset." Well, what looks more like a sunset? Two sunsets!
| Three sunsets! But this is not normally the way images are
| produced - if you hire an artist to make you a picture of a
| bear, they don't endeavor to create the _most_ "bear" image
| possible.
|
| Instead, what an artist might do is envision a bear in their
| head (this is loosely the job of the 'prior' - a name I agree
| is confusing), and then draw _that_ particular bear image.
|
| But why is this any different? Who cares if the vector I'm
| trying to draw is a 'text encoding' or an 'image encoding'?
| Like you say, it's all just vectors. Take this answer with a
| big grain of salt, because this is just my personal intuitive
| understanding, but here's what I think: These encodings are
| produced by CLIP. CLIP has a text encoder and an image
| encoder. During training, you give it a text caption and a
| corresponding image, it encodes both, and tries to make the
| two encodings close. But there are many images which might
| accompany the caption "a picture of a bear". And conversely
| there are many captions which might accompany any given
| picture.
|
| So the text encoding of "a picture of a bear" isn't really a
| good target - it sort of represents an amalgamation of all
| the possible bear pictures. It's better to pick one bear
| picture (i.e. generate one image embedding that we think
| matches the text embedding), and then just to try to draw
| that. Doing it this way, we aren't just trying to find the
| maximum bear picture - which probably doesn't even look like
| a realistic natural image.
|
| Like I said, this is just my personal intuition, and may very
| well be a load of crap.
| swalsh wrote:
| Do you think some of these techniques could be slightly
| modified, and applied to DNA sequences?
| snek_case wrote:
| Maybe very very short (single-gene) sequences. The thing with
| DNA is it's the product of evolution. The DNA guides the
| synthesis of proteins, then the proteins fold into a 3D
| shape, and they interact with chemicals in their environment
| based on their shape.
|
| In the context of a living being, different genes interact
| with each other as well. For example, you have certain cells
| that secrete hormones (many genes needed to do that), then
| you have genes that encode for hormone receptors, and those
| receptors trigger other actions encoded by other genes.
| There's probably too much complexity to ask an AI system to
| synthesize the entire genetic code for a living being. That
| would be kind of like if I asked you to draw the exact
| blueprints for a fighter jet, and write all the code, and
| synthesize all the hardware all at once, and you only get one
| shot. You would likely fail to predict some of the
| interactions and the resulting system wouldn't work. You
| could only achieve this through an iterative process that
| would involve years of extensive testing.
|
| Could you use a deep learning system to synthesize genetic
| code? Maybe just single genes that do fairly basic things,
| and you would need a massive dataset. Hard to say what that
| would look like. Is it really enough to textually describe
| what a gene does?
| Jack000 wrote:
| This is all true, but it doesn't preclude the possibility
| of generating DNA. Humans share a lot of DNA sequences with
| other animals, and the genetic differences between
| individual humans are even smaller. You might have trouble
| generating a human with horns or something, but a taller
| one is probably mostly an engineering problem.
|
| What GPT-3 and DALL-E show is that you can infer a lot
| based on the latent structure of data, even without
| understanding the underlying physical process.
| dekhn wrote:
| probabilistic generative models have been applied to DNA and
| protein sequences for decades (my undergrad thesis from ~30
| years ago did this and it wasn't even new at that point). The
| real question is what question you want to answer and what is
| this system going to do better enough to justify the time
| investment to prove it out?
| zone411 wrote:
| Some more examples:
| https://twitter.com/sama/status/1511724264629678084
| jdrc wrote:
| there are some masterpieces there. this is the end of clipart
| and stock images, and the beginning of awesome illustrations in
| every article.
| lalopalota wrote:
| One step closer to combining Scribblenauts with emoticons!
| gallerdude wrote:
| This is extremely interesting. We've had some amazing AI models
| come out in the past few days. We're getting closer and closer to
| AI becoming a facet of everyday life.
| turdnagel wrote:
| I'm genuinely curious to hear Sam Altman's (and/or the OpenAI
| team's) perspective on why these products need to be waitlisted.
| If it's a compute issue, why not build a queuing system? If it's
| something else (safety related? hype related?) I'd love to
| understand the thinking behind the decision. More often than not,
| I sign up for waitlists for things like this and either (1) never
| get in to the beta or (2) forget about it when I eventually do
| get in.
| minimaxir wrote:
| For GPT-3 it was a combination of both compute and safety.
| Given the notes in the System Card (https://github.com/openai/d
| alle-2-preview/blob/main/system-c... ), OpenAI is likely
| doubling-down on safety here.
| croddin wrote:
| This reminds me of the holodeck in Star Trek. Someone could walk
| into the Holodeck and say "make a table in the center of the
| room. Make it look old." It seemed amazing to me that the
| computer could make anything and customize it with voice. We are
| pretty close to star trek technology now in computer ability
| (ship's computer, not Commander Data). I guess to really be like
| the holodeck it needs to be able to do 3d and be in real time but
| that seems a lot closer now. It will be cool when this could be
| in VR and we can say make an astronaut riding a horse, then we
| can jump on the back of the horse and ride to a secret moon base.
| [deleted]
| jelliclesfarm wrote:
| "Preventing Harmful Generations"? = Fail.
|
| Caravaggio is probably chortling from wherever he is ..
| marcodiego wrote:
| Cartoonists, say good-bye to your job.
| criddell wrote:
| Randall Munroe should quit now. Soon anybody will be able to
| create XKCD-type comics.
| Imnimo wrote:
| Maybe one day there will be a job for people who are masters of
| the art of prompt hacking - they know all the special phrases
| and terms to get Dall-E to output the most aesthetically
| pleasing images. They guard their magic words like a medieval
| alchemist guards his formulas. Corporations will pay top-dollar
| for an expertly-crafted, custom-tailored prompt for their
| advertising campaign.
| rvz wrote:
| NFTs using Dall-E 2 variations incoming.
| loufe wrote:
| Not that it's impossible to hide the provenance of an image,
| but it is explicitly forbidden in the TOS of DALL-E to sell
| the images as NFTs or otherwise.
| atarian wrote:
| That's just going to make them more valuable.
| andybak wrote:
| The goalposts are definitely being moved. But tastes adapt
| accordingly.
|
| I suspect trends in design will move towards those areas that
| AI struggles with (assuming there are any left!)
| mouzogu wrote:
| So what does the future of human creativity look like when an AI
| can generate possibly infinite variations of an idea?
| tomrod wrote:
| I seem to recall an XKCD that I cannot find, but the premise
| goes like:
|
| When you have a digital display of pixels, if you randomly
| color pixels at 24 fps then you will eventually display every
| movie that can be or will ever be made, powerset
| notwithstanding. This can also be tied to digital audio.
|
| In short, while mind-blowingly large, the space of display
| through digital means is finite.
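|
| Back-of-the-envelope, the size of that finite space is easy
| to write down (display size assumed for illustration):
|
|   import math
|
|   W, H = 1920, 1080                        # pixels
|   bits_per_frame = 24 * W * H              # 24-bit color per pixel
|   digits = bits_per_frame * math.log10(2)  # decimal digits in 2**bits_per_frame
|   print(f"distinct 1080p frames: about 10^{digits:,.0f}")  # ~10^14,981,179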
| mouzogu wrote:
| Sounds a bit like the Library of Babel by Jorge Luis Borges.
| I imagine most of the videos would be complete random
| nonsense.
|
| I think an AI infused future is going to become increasingly
| more absurd and surreal, it will lead to a kind of creative
| and cultural nihilism, if that's the right term.
|
| Like the value of originality will become meaningless.
| visarga wrote:
| The artist or the audience would have to ultimately select
| something from all that automated originality.
| 6gvONxR4sf7o wrote:
| I expect that interactive art will be huge. Game design gets
| fascinating, for example.
| andreyk wrote:
| AI becomes a tool for artists to use - generative art has been
| around for a long time, now that particular genre of art will
| presumably become much more prominent.
|
| For anyone pondering such questions, I would recommend reading
| "The Past, Present, and Future of AI Art" -
| https://thegradient.pub/the-past-present-and-future-of-ai-ar...
| pingeroo wrote:
| Wouldn't it be more like, "AI becomes an artist for people to
| use"? Will we have people distinguished as "artists" if the
| ability to make awesome art becomes available to everybody?
| andreyk wrote:
| AI still needs the text prompt to know what to generate.
| Hence the human who provides the prompt is still the
| artist, just like a photographer finds an aesthetically
| interesting spot to take the image with their camera.
| Cameras make images, humans using cameras make art.
| Granted, this is not quite 1-1 with AI art, but still the
| idea is the same. If anything the flood of AI images will
| only require artists to go beyond what is possible with
| these text->image kinds of things, of which there is no
| shortage.
| keiferski wrote:
| I think you'll see more of a focus on the artist themselves.
| These images are nice, but they have basically zero narrative
| value.
|
| This is really already the case, actually. Most artworks have
| "value" because they have a compelling narrative, not because
| they look pretty. So I think we can expect future artists to
| really emphasize their background, life story, process of
| making the art, etc. All things that cannot be done by a
| machine.
| Apofis wrote:
| So I can't do Teddy Bears Riding a Horse?
| arecurrence wrote:
| Is there a geometric model related to this? E.g. "corgi near
| the fireplace", but the output is a 3d model of the corgi and
| fireplace with shaders rather than an image.
| Ftuuky wrote:
| What jobs will be left in 5-10 years when we consider all the
| progress made with Dall-E, GPT-3, Codex/GitHub Copilot, Alpha*
| and so on?
| phphphphp wrote:
| Most creative output is duplicated effort: consider how much
| code each person on HN has written that has been written
| before. Consider how, a decade ago, we were all writing html
| and styling it, element by element, and then Twitter bootstrap
| came along and revolutionised front-end development in what is,
| ultimately, a very small and low technology way. All it really
| did was reduce duplicate effort.
|
| Nowadays there's lots of great low/no code platforms, like
| Retool, that represent a far greater threat to the amount of
| code that needs to be produced than AI ever will.
|
| To use a cliche: code is a bug, not a feature. Abstracting away
| the need for code is the future, not having a machine churn out
| the same code we need today.
| beders wrote:
| The ones undoing the damage caused by dumb pattern recognizers
| and generators? ;)
| 6gvONxR4sf7o wrote:
| Things that require understanding of causation will be safe
| longer. Progress like this is driven by massive datasets.
| Meanwhile, real world action-taking applications require
| different paradigms to take causation into account[0][1], and
| especially to learn safely (e.g. learning to drive without
| crashing during the beginner stages).
|
| There's certainly research happening around this, and RL in
| games is a great test bed, but people choosing actions will be
| safe from automation longer than people not choosing actions,
| if that makes sense. It's the person who decides "hire this
| person" vs the person who decides "I'll use this particular
| shade of gray."
|
| [0] The best example is when X causes Y and X also causes Z,
| but your data only includes Y and Z. Without actually
| manipulating Y, you can't see that Y doesn't cause Z, even if
| it's a strong predictor.
|
| [1] Another example is the datasets. You need two different
| labels depending on what happens if you take action A or B,
| which you can't have simultaneously outside of simulations.
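|
| A tiny simulation of the confounder in [0], purely illustrative
| (variable names are mine):
|
|     import numpy as np
|
|     rng = np.random.default_rng(0)
|     x = rng.normal(size=100_000)             # hidden common cause
|     y = x + 0.1 * rng.normal(size=100_000)
|     z = x + 0.1 * rng.normal(size=100_000)
|
|     # In observational data, Y strongly predicts Z...
|     print(np.corrcoef(y, z)[0, 1])           # ~0.99
|
|     # ...but intervening on Y leaves Z untouched, since Z only
|     # ever depended on the hidden X.
|     y_do = rng.normal(size=100_000)
|     print(np.corrcoef(y_do, z)[0, 1])        # ~0.0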
| cm2012 wrote:
| Sam's Twitter thread today was more impressive than the website.
|
| https://twitter.com/sama/status/1511724264629678084?s=20&t=6...
| ordu wrote:
| Dall-E 2 seems to be incapable of catching the essence of art.
| I'm not really surprised by that; I'd be surprised a lot if it
| could. But nevertheless: if you looked into the eye of the Girl
| With A Pearl Earring[1], you'd be forced to stop and think about
| what she has on her mind right now. Or maybe some other question
| would come to your mind, but it really stops people and makes
| them think. None of the Dall-E interpretations have this
| quality. Works inspired by the Girl With A Pearl Earring
| sometimes have at least part of that power, like the Girl With a
| Bamboo Earring[2]. But none of the Dall-E interpretations have
| such power.
|
| And this observation may lead to great consequences for the
| visual arts. I had a lot of joy looking at the different Dall-E
| interpretations, trying to find the flaw in each one that
| prevents it from being a piece of art of equal value to the
| original. It is a ready-made tool for searching for explanations
| of the power of art. It cannot say what detail makes a picture
| an artwork, but it allows you to see multiple data points and to
| narrow the hypothesis space. My main conclusion is that the
| pearl earring has nothing to do with the power of the art. It is
| something in the eye, and probably in the slightly opened mouth.
| (Somehow Dall-E pictured all the interpretations with closed
| lips, so that seems to be an important detail, but I need more
| variation along this axis to be sure.)
|
| [1] https://en.wikipedia.org/wiki/Girl_with_a_Pearl_Earring [2]
| https://yourartshop-noldenh.com/awol-erizku-girl-with-the-pe...
| hcks wrote:
| On the meta level, we are now at the point where the dubious
| comments downplaying the AI have started arguing on the plane of
| art criticism.
| jdrc wrote:
| Art criticism should be off topic here. This is more like
| chopping the visual cortex and some association cortex off a
| brain and stimulating them. There is no person signaling to us,
| nor can we attribute any striking images that may come up to a
| person with agency.
|
| But it's like a giant database of decent clipart for anything we
| can imagine.
| ordu wrote:
| _> This is more like chopping off the visual cortex and some
| association cortex from a brain and stimulating it._
|
| We do not know exactly what part of our perception of reality
| can be attributed to "the visual cortex and some association
| cortex". But now we can feel it. We can test it. We can
| compare ourselves with the cold calculating machine. I
| believe that it is a priceless opportunity that we shouldn't
| miss. At least I personally can't. I'm going to figure out
| whether it is possible for me to have a companion like Dall-E
| in my wanderings through the sea of information on the
| Internet, and if it is, then to get one.
|
| _> But its like a giant database of decent clipart for
| anything we can imagine_
|
| And this also. Yes. Though I'm not interested in clipart.
| joshcryer wrote:
| What do you think of the third to last image of the Girl With A
| Pearl Earring that DALL-E 2 created? I find it more compelling
| than the original with how her face is deeply cast in shadow.
| There's still that original 'essence' of the glint in her eye.
| But her earring is a bell. As if the AI is sending a message:
| what if the bell were to ring?
| ordu wrote:
| I'm not sure that I can express myself in English, which is
| not my native language, and this needs some very nuanced
| control over the tiniest shades of meaning, but I'll try
| nevertheless, at least for the fun of it.
|
| The original girl is more open, more independent and
| mindless. The interpretation's girl is more self-controlled,
| assertive and not really interested, just going through all
| the motions of regular communication between people. Maybe
| it's just me, but what I really value on such occasions is
| mindlessness, the ability of people to not mind themselves,
| to let their selves dissolve in the environment. I cannot
| hold back tears sometimes when I watch a performer playing
| Chopin or Paganini, because what I see in their movements is
| the complete dissolution of a person in a piece of music, in
| a piece of art and skill. A performer just does what they do
| with their full attention on it, and with all their
| motivation focused on it. There is nothing else for them,
| just them and their actions.
|
| There is not a single thought devoted to how people around me
| would react to what I do and how I do it. I just do what I do
| and I do not care about the people around me, and if it
| somehow makes people happy... I don't really care. I mean I
| know that afterwards I'd feel pride in myself, but just for
| now I don't really care.
|
| I know this feeling. I like to sing, and I'm good at it
| (above average), and I know what it feels like to dissolve
| into the song and let the song rule. I play piano and I know
| what it is like to dissolve into the piece I'm playing, to
| stop myself from existing, to let the music take the lead.
| And the original painting makes me believe that the girl is
| in this state of mind. I do not know the history or the rest
| of the story, I do not know if she got into this state for a
| second, or if she never leaves it (which may be a sad
| experience, don't you think?), but somehow I know that right
| now she is right in this state. I want to watch this moment
| of hers for an eternity.
|
| Thinking about it, I'd confess that the interpretation's girl
| does trigger the same, but on a smaller scale. I feel how my
| mind is trying to find a coherent state for her gaze, but
| this feeling stops in tens of microseconds, not hundreds of
| them.
|
| edit: want->watch. Stupid mistake ruining the meaning of the
| sentence.
| [deleted]
| Veedrac wrote:
| Initial Outputs from New AI Model Not As Good at Nuance as
| Historic Artwork, Approach Deemed Hopeless
| ordu wrote:
| Oh... Not hopeless. The very fact that I spent some minutes
| watching the interpretations of the Girl With a Pearl Earring
| is evidence enough that it is not hopeless. I praise the work
| that was done. Moreover, I hope that people will take it as
| inspiration to do even more.
| awinter-py wrote:
| They're using training set restriction and prompt engineering to
| control its output
|
| > By removing the most explicit content from the training data,
| we minimized DALL·E 2's exposure to these concepts
|
| > We won't generate images if our filters identify text prompts
| and image uploads that may violate our policies
|
| The 'how to prevent superintelligences from eating us' crowd
| should be taking note: this may be how we regulate creatures
| larger than ourselves in the future.
|
| It may even be how we regulate the ethics of non-conscious group
| minds like big companies.
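|
| For what it's worth, the prompt filter they describe can be as
| crude as a blocklist check that runs before the model is ever
| invoked (a hypothetical sketch; the real system is presumably
| far more sophisticated):
|
|     BLOCKLIST = {"gore", "violence"}  # hypothetical policy terms
|
|     def allow_prompt(prompt: str) -> bool:
|         # Reject the request before it reaches the model.
|         return not (set(prompt.lower().split()) & BLOCKLIST)
|
|     print(allow_prompt("a corgi near the fireplace"))  # True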
| 6gvONxR4sf7o wrote:
| This is a niche complaint, but I get frustrated at how imprecise
| OpenAI's papers are. When they describe the model architecture,
| it's never precise enough to reproduce exactly what they did. I
| mean, it pretty much never is in ML papers[0], but OpenAI's
| bigger products are worse than average on this. It makes sense,
| since they're trying to be concise and still spend time on all
| the other important stuff besides methods, but it still
| frustrates me quite a bit.
|
| [0] Which is why releasing your code is so beneficial.
| greyhair wrote:
| Interesting, yes, but I went to the link and browsed the
| 'generated artwork', and all of it was subjectively inferior to
| the original it was generated from. Every single piece. So I am
| not sure what the 'value' in it is, at this stage.
|
| As for the text-driven generation, I would have to mess with
| some non-pre-canned prompts to see how useful it is.
| krick wrote:
| Regardless of how much cherry-picking there was, some of these
| pictures are just beautiful.
| jedberg wrote:
| This reminds me of a discussion I had with my high school band
| teacher in the 90s. I was telling him that one day computers
| would play music and you won't be able to tell the difference. He
| got mad at me and told me that a computer could never play as
| well as a human with feelings, who can _feel_ the piece and
| interpret it.
|
| I think we passed that point a while ago, but seeing this makes
| me think we aren't too far off from computers composing pieces
| that actually sound good too.
| andybak wrote:
| Some freely available models
|
| GLID-3:
| https://colab.research.google.com/drive/1x4p2PokZ3XznBn35Q5B...
|
| and a new Latent Diffusion notebook:
| https://colab.research.google.com/github/multimodalart/laten...
|
| have both appeared recently and are getting remarkably close to
| the original Dall-E (maybe better as I can't test the real
| thing...)
|
| So - this was pretty good timing if OpenAI want to appear to be
| ahead of the pack. Of course I'd always pick a model I can
| actually use over a better one I'm not allowed to...
| Jack000 wrote:
| With glide I think we've reached something of a plateau in
| terms of architecture on the "text to image generator S curve".
| DALL-E 2 is a very similar architecture to glide and has some
| notable downsides (poorer language understanding).
|
| glid-3 is a relatively small model trained by a single guy on
| his workstation (aka me) so it's not going to be as good. It's
| also not fully baked yet so ymmv, although it really depends on
| the prompt. The new latent diffusion model is really amazing
| though and is much closer to DALL-E 2 for 256px images.
|
| I think the open source community will rapidly catch up with
| Openai in the coming months. The data, code and compute are all
| there to train a model of similar size and quality.
| andybak wrote:
| Wow. Thanks for GLID-3. It was genuinely exciting for a few
| days but then I must admit latent diffusion stole my
| attention somewhat ;-)
|
| What kind of prompts is GLID-3 especially good for? I
| remember getting lucky when I was playing around a few times
| but I didn't do it systematically.
| Jack000 wrote:
| glid-3 is trained specifically on photographic-style
| images, and is a bit better at generalization compared to
| the latent diffusion model.
|
| e.g. the prompt "half human half Eiffel tower", a human Eiffel
| tower hybrid (I get mostly normal Eiffel towers from LDM but
| some sensible results from glid-3).
|
| glid-3 will be worse for things that require detailed
| recall, like a specific person.
|
| With smaller models you kind of have to generate a lot of
| samples and pick out the best ones.
| loufe wrote:
| I think this is really neat, but definitely not on the same
| tier as DALL-E 2, at least from the cherry-picked images I saw.
| andybak wrote:
| I'm not sure what you've seen but I've been very impressed
| indeed by some results I've obtained. Some less so.
|
| It's hard to compare because we don't know how much cherry
| picking is going on with published Dall-E results (either v1
| or v2)
|
| My gut feeling is that it's in the same ballpark as Dall-E 1
| hwers wrote:
| They're also not censored on the dataset front and thus produce
| much more interesting outputs.
|
| OpenAI released a low-resolution checkpoint with similar
| functionality - called GLIDE - and its output is super boring
| compared to community-driven efforts, in large part because of
| dataset restrictions similar to those this model has likely been
| subjected to.
| FreeHugs wrote:
| How do you run such a Google Colab thing?
|
| I don't see a run button?
|
| Oh.. maybe "Runtime -> Run All" from the menu ...
|
| Shows me a spinning circle around "Download model" ...
|
| 26% ...
|
| Fascinating, that Google offers you a computer in the cloud for
| free ..
|
| Now it is running the model. Wow, I'm curious ..
|
| Ha, it worked!
|
| Nothing compared to the images in the Dall-E 2 article but
| still impressive.
| minimaxir wrote:
| Google is a company with a lot of spare VMs and GPUs.
|
| However, the free GPU is now a K80 which is obsolete and
| barely sufficient for running these types of models.
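|
| If you want to see which GPU your session got, this works in a
| Colab cell (assuming PyTorch is preinstalled, which it is on
| Colab):
|
|     import torch
|
|     if torch.cuda.is_available():
|         print(torch.cuda.get_device_name(0))  # e.g. "Tesla K80" or "Tesla T4"
|     else:
|         print("No GPU attached - see Runtime -> Change runtime type")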
| nl wrote:
| You sometimes still get T4s. I got one last week and it was
| great.
| qualudeheart wrote:
| Deep Learning plows through yet another wall.
| kovek wrote:
| One of my teachers once said "An art piece is never done". So I
| wonder what it would mean for the model to keep making
| improvements to a piece.
| chronolitus wrote:
| IIRC that's how it works! It starts from a first image and
| improves it until 'satisfied' that the result fits the prompt.
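|
| Roughly, a diffusion-style sampler loops like this minimal
| sketch (the denoiser here is a random stand-in; in the real
| model a trained network predicts the noise, guided by the
| prompt):
|
|     import numpy as np
|
|     def denoise_step(image, prompt_embedding, t):
|         # Stand-in for one reverse-diffusion step: a trained
|         # model would predict the noise to subtract.
|         predicted_noise = np.random.randn(*image.shape) * (t / 1000)
|         return image - 0.01 * predicted_noise
|
|     prompt_embedding = np.zeros(512)    # stand-in text embedding
|     image = np.random.randn(64, 64, 3)  # start from pure noise
|     for t in reversed(range(1000)):     # iteratively refine
|         image = denoise_step(image, prompt_embedding, t)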
| bakztfuture wrote:
| I made a YouTube series last summer on the massive potential
| future of DALL-E and multimodal AI models.
|
| Imagine not just DALL-E 2 but a single model that can be trained
| on different kinds of media and generate music, images, video
| and more.
|
| The series covers:
|
| - essential lessons for AI creatives of the future
|
| - how to compete creatively in the future
|
| - how to make money through multimodal AI
|
| - predictions about AI's effects on society
|
| - at a very basic level, the ethics of multimodal AI and the
| philosophy of creativity itself
|
| By my understanding, it's the most comprehensive set of videos on
| this topic.
|
| The series is free to watch entirely on YouTube: GPT-X, DALL-E,
| and our Multimodal Future
| https://www.youtube.com/playlist?list=PLza3gaByGSXjUCtIuv2x9...
| rvz wrote:
| At this point with WaveNet, GPT-3, Codex, DeepFakes and Dall-E 2,
| you cannot believe anything you see, hear, watch or read on the
| internet anymore, as an AI can easily generate nearly anything
| that will be quickly believed by millions.
|
| The internet's own proverb has never been more important to keep
| in mind. A dose of skepticism is a must.
| aChrisSmith wrote:
| I can see how this has the potential to disrupt the games
| industry. If you work on a AAA title, there is a small army of
| artists making 19 different types of leather armor. Or 87 images
| of car hubcaps.
|
| Using something like this could really help automate or at least
| kickstart the more mundane parts of content creation. (At least
| when you are using high resolution, true color imagery.)
| killerstorm wrote:
| This thing can't do 3D models.
|
| There are some 3D image generation techniques, but they aren't
| based on polygonal modeling, so 3D artists are safe for now.
| pwillia7 wrote:
| You could train a model on texture image data though, no?
|
| Or what about even generating images you could then
| photogrammetry into models?
| rndphs wrote:
| This is going to be mostly a rant on OpenAI's "safer than thou"
| approach to safety, but let me start by saying that I think this
| technology is really cool, amazing, powerful stuff. Dall-E (and
| Dall-E 2) is an incredible advance over GANs, and no doubt will
| have many positive applications. It's simply brilliant. I am
| someone who has been interested in and has followed the progress
| of ML-generated images for nearly a decade. Almost unimaginable
| progress has been made in the last five years in this field.
|
| Now the rant:
|
| I think if OpenAI genuinely cared about the ethical consequences
| of the technology, they would realise that any algorithm they
| release will be replicated in implementation by other people
| within some short period of time (a year or two). At that point,
| the cat is out of the bag and there is nothing they can do to
| prevent abuse. So really all they are doing is delaying abuse,
| and in no way stopping it.
|
| I think their strong "safety" stance has three functions:
|
| 1. Legal protection
|
| 2. PR
|
| 3. Keeping their researchers' consciences clear
|
| I think number 3 is dangerous because researchers are put under
| the false belief that their technology can or will be made safe.
| This way they can continue to harness bright minds that no doubt
| have ethical leanings to create things that they otherwise
| wouldn't have.
|
| I think OpenAI are trying to have their cake and eat it too. They
| are accelerating the development of potentially very destructive
| algorithms (and profiting from it in the process!), while trying
| to absolve themselves of the responsibility. Putting bandaids on
| a tumour is not going to matter in the long run. I'm not
| necessarily saying that these algorithms will be widely
| destructive, but they certainly have the potential to be.
|
| The safety approach of OpenAI ultimately boils down to
| gatekeeping compute power. This is just gatekeeping via capital.
| Anyone with sufficient _money_ can replicate their models easily
| and bypass _every single one_ of their safety constraints.
| Basically they are only preventing _poor_ bad actors, and only
| for a limited time at that.
|
| These models cannot be made safe as long as they are replicable.
|
| Producing scientific research requires making your results
| replicable.
|
| Therefore, there is no way to develop abusable technology
| safely. As a researcher, you will have blood on your hands if
| things go wrong.
|
| If you choose to continue research knowing this, that is your
| decision. But don't pretend that you can make the _algorithms_
| safer by sanitizing models.
| duren wrote:
| I've been playing around with it today and have been super
| impressed with its ability to generate pretty artful digital
| paintings. It could have big implications for designers and
| artists if and when they allow you to use custom palettes, etc.
|
| Here's an example from my prompt ("a group of farmers picking
| lettuce in a field digital painting"):
| https://labs.openai.com/s/jb5pzIdTjS3AkMvmAlx69t7G
| pingeroo wrote:
| Neat! Were you part of the initial testing batch or granted
| access via waitlist?
| d--b wrote:
| Am I the only one to think that the AI world is divided into 2
| groups:
|
| 1. Deepmind, who solved go, protein folding, and that seems
| really onto something.
|
| 2. Everyone else, spending billions to build machines that draw
| astronauts on unicorns, and smartish bot toys.
| gwf wrote:
| Your second group represents the core "inner loop" of about a
| thousand revolutionary applications. Take the basic capability
| of translating image->text->speech (and the reverse), install
| it on a wearable device that can "see" an environment, and add
| domain-specific agents. From this setup, you're not too far
| away from having an AI that can whisper guidance into your ear
| like a co-pilot, enabling scenarios like:
|
| 1. step-by-step guidance for a blind person navigating the use
| of a public restroom.
|
| 2. an EMS AI helping you to save someone's life in an
| emergency.
|
| 3. an AI coach that can teach you a new sport or activity.
|
| 4. an omnipresent domain-expert that can show you how to make a
| gourmet meal, repair an engine, or perform a traditional tea
| ceremony.
|
| 5. a personal assistant that can anticipate your information
| need (what's that person's name? where's the exit? who's the
| most interesting person here? etc.) and whisper the answer in
| your ear just as you need it.
|
| Now, add all of the above to an AR capability where you can
| think or speak of something interesting and complex, and have
| it visualized right before your eyes. With this capability, I
| could augment my imagination with almost super-human
| capabilities, solving complex problems almost as if in an
| internal mental monologue.
|
| All of these scenarios are just a short hop from where we're at
| now, so mark my words: we will have "borgs" like those described
| above long before we reach anything like general AI.
| lkbm wrote:
| These are good examples of what we're getting close to, but
| I'd add that Copilot is already an extremely helpful tool for
| coding. I don't blindly trust its output, but its suggestions
| are what I want often enough to save a lot of typing.
|
| I still have to do all the hard thinking, but once I figure
| out what I want written and start typing, Copilot will spit
| out a good portion of the contextually-obvious lines of code.
| robotresearcher wrote:
| There's a third group for your list: AI stuff that's so good we
| don't think about it any more.
|
| For example, recent phone cameras can estimate depth per pixel
| from single images. Hundreds of millions of these devices are
| deployed. A decade ago this was AI/CV research lab stuff.
| emadabdulrahim wrote:
| OpenAI is one of the leading companies in AI making models with
| real-world applications. I don't see their efforts as misdirected
| or futile in any way. If anything, I'm always impressed with
| their announcements, because what their models can do is always
| mind-blowing!
|
| The same technology that is drawing cute unicorns can be used
| for endless other use cases. Perhaps the PR side of the launch,
| and the subject matter they chose to unveil their product with,
| is just that: PR.
|
| It's like Apple's Memoji (not sure if I'm spelling it correctly).
| You can think of it as trivial and a waste of talent to use their
| Camera/FaceID tech to animate cute animals based on facial
| expressions, but that same tech will enable lots of other things
| to come.
| trixie_ wrote:
| It all feels like the early days of electricity: nobody quite
| knew how to turn a neat party trick into something more useful,
| but it was the people who kept on making better and better party
| tricks who actually laid the foundations for doing some really
| useful things with electricity, as well as understanding it at a
| deeper level.
| _nateraw wrote:
| If you're interested in generative models, Hugging Face is
| putting on an event around generative models right now called the
| HugGAN sprint, where they're giving away free access to compute
| to train models like this.
|
| You can join it by following the steps in the guide here:
| https://github.com/huggingface/community-events/tree/main/hu...
|
| There will also be talks from awesome folks at EleutherAI,
| Google, and Deepmind.
| eganist wrote:
| The timing of the Dall-E 2 launch an hour ago seems to correspond
| with a recent piece of investigative journalism by Buzzfeed News
| about one of Sam Altman's other ventures, published 15 hours ago
| and discussed elsewhere actively on HN right now:
|
| https://news.ycombinator.com/item?id=30931614
|
| I point this out because while Dall-E 2 seems interesting (I'm
| out of my depth, so delegating to the conversation taking place
| here), the timing of its release as well as accompanying press
| blasts within the last hour from sites like TheVerge--verified
| via wayback machine queries and time-restricted googling--seems
| both noteworthy and worth a deeper conversation given what was
| just published about Worldcoin.
|
| To be clear, it's worth asking if Dall-E 2 was published ahead of
| schedule without an actual product release (only a waitlist) to
| potentially move the spotlight away from Worldcoin.
| duxup wrote:
| What's the idea here? They quickly put this out to somehow bury
| other stories?
| eganist wrote:
| Yes, especially given there's no actual product release, only
| a waitlist.
|
| It's easy to put together a marketing piece on short notice, or
| potentially even push a pending marketing page out to production
| with a waitlist rather than links to production or even
| beta-quality services.
| dang wrote:
| I don't have any knowledge (inside or otherwise) but the
| Worldcoin thing already came in for several rounds of abuse on
| HN, so it's kind of a scandal of the second freshness at this
| point.
|
| I listed some of them here -
| https://news.ycombinator.com/item?id=30934732, just because I
| remembered there had been previous discussions and listing
| related previous discussions is a thing.
| gallerdude wrote:
| Maybe I'm naive, but I see this as a coincidence. If it was an
| hour later, then maybe there would be something.
| eganist wrote:
| Another consideration, then: it was published to HN almost
| instantly after it was released to the world, 52 minutes
| after the HN post about Worldcoin was submitted and started
| showing traction.
|
| I don't see the publication of a marketing page (again, not a
| finished product) for a product founded by someone whose other
| main venture is being investigated by journalists for misleading
| claims as a coincidence. But if the timing matters and 14-15
| hours doesn't seem to work for the assertion in your mind, then
| perhaps the Dall-E 2 page going live less than an hour after the
| Worldcoin HN submission fits the bill.
|
| I've got no horse in this race. I'm just drawing attention to
| familiar PR strategies used for brand risk mitigation, that's
| all.
| GranPC wrote:
| If the article GP refers to was posted 16 hours ago instead
| of 15, would that really make a difference?
| danso wrote:
| I'm not a huge fan of these coordination theories. But a few
| things worth noting:
|
| - In support of your argument, the Buzzfeed News investigation
| likely has been in the works for weeks, meaning Altman et al
| have had more than just a couple days to throw together a
| Dall-E 2 soft launch
|
| - However, weren't OpenAI's GPT (2 and 3) announced to the
| world in similar fashion? e.g. demos and whitepapers and
| waitlists, but not a full product release?
|
| - Throwing together a Dall-E 2 soft launch just in time to
| distract from the investigation would require a conspiracy,
| i.e. several people being at least vaguely aware that deadlines
| have been accelerated for external reasons. Is the Worldcoin
| story big enough to risk tainting OpenAI, which seems like a
| much more prominent part of Altman's portfolio?
| eganist wrote:
| For discussion's sake:
|
| - BFN reached out to A16Z, Worldcoin, and Khosla Ventures, who
| largely declined to comment, which would mean that at least one
| person probably had a bit of runway from at least when the
| requests for comment were submitted. So yeah, you're probably
| right.
|
| - Going from the github repos for GPT 2 and 3, those may have
| been hard launches:
|
| Feb 14 2019, predating the first press for GPT-2 by a few days
| (was probably made public Feb 14 though) -
| https://github.com/openai/gpt-2/commit/c2dae27c1029770cea409...
|
| May 28 2020, timed alongside the press news for GPT-3 -
| https://github.com/openai/gpt-3/commit/12766ba31aa6de490226e...
|
| - Would it really have to be a conspiracy? Sounds like only
| one person would have to target a specific date or date
| range, and without really giving a reason.
|
| One of the things that puts a hole in my own thinking here is
| that Sam Altman's name isn't really tied to the Dall-E 2
| release. It's just OpenAI, and the press around Sam's name
| _today_ still exclusively surfaces just this one Worldcoin
| story
| (https://news.google.com/search?q=sam+altman+when%3A1d&). So
| if this was actually intended to bury another story, Sam's
| name would have to have been included in all the press blasts
| to be successful. But the Buzzfeed story seems like it kinda
| died alone on the vine.
| nonfamous wrote:
| Genuine question: how are the two stories even related? It's
| certainly not apparent from the BuzzFeed article (or at least a
| quick skim of it).
| eganist wrote:
| Sam Altman is OpenAI's CEO.
|
| What I'm submitting for consideration is that the marketing
| page and associated press blasts (there's a live influencer
| reaction video airing right now about Dall-E 2, for instance)
| for Dall-E 2 were potentially pushed up to offset negative
| press from Worldcoin for their shared founder.
|
| I'd like to be wrong. But it's too well timed.
| thisistheend123 wrote:
| This is what magic looks like.
|
| Great work.
|
| Looking forward to when they start creating movies from scripts.
| Dig1t wrote:
| Most of the conversation around this model seems to be about its
| direct uses.
|
| This seems to me like a big step towards AGI; a key component of
| consciousness seems (in my opinion) to be the ability to take
| words and create a mental picture of what's being described. Is
| that the long term goal WRT researching a model like this?
| latexr wrote:
| What confusing pricing[1]:
|
| > Prices are per 1,000 tokens. You can think of tokens as pieces
| of words, where 1,000 tokens is about 750 words. This paragraph
| is 35 tokens.
|
| Further down, in the FAQ[2]:
|
| > For English text, 1 token is approximately 4 characters or 0.75
| words. As a point of reference, the collected works of
| Shakespeare are about 900,000 words or 1.2M tokens.
|
| > To learn more about how tokens work and estimate your usage...
|
| > Experiment with our interactive Tokenizer tool.
|
| And it goes on. When most questions in your FAQ are about
| understanding pricing--to the point you need to offer a
| specialised tool--perhaps consider a different model?
|
| [1]: https://openai.com/api/pricing/
|
| [2]: https://openai.com/api/pricing/#faq-token
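|
| The heuristic is at least easy to sanity-check, using only the
| 0.75-words-per-token figure from the page:
|
|     # 1 token ~ 0.75 words, per the FAQ.
|     def estimate_tokens(word_count):
|         return word_count / 0.75
|
|     print(estimate_tokens(900_000))  # ~1.2M tokens, the Shakespeare figure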
| pingeroo wrote:
| This is for their GPT models, not Dall-E. I don't think they
| have released any pricing information for Dall-E yet, as it is
| still in waitlist mode.
| belval wrote:
| Haven't read the paper, but they are probably using something
| like sentencepiece with sub-word splitting and then charging by
| the number of resulting tokens.
|
| https://github.com/google/sentencepiece
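|
| For intuition: sub-word splitting breaks rare words into pieces
| drawn from a learned vocabulary, and you'd be billed per piece.
| A toy greedy splitter with a made-up vocabulary (illustrative
| only, not sentencepiece itself):
|
|     VOCAB = {"token", "ization", "iza", "tion", "en"}
|
|     def split_subwords(word):
|         pieces, i = [], 0
|         while i < len(word):
|             for j in range(len(word), i, -1):  # longest match first
|                 if word[i:j] in VOCAB:
|                     pieces.append(word[i:j])
|                     i = j
|                     break
|             else:
|                 pieces.append(word[i])  # unknown-character fallback
|                 i += 1
|         return pieces
|
|     print(split_subwords("tokenization"))  # ['token', 'ization'] -> 2 tokens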
| hwers wrote:
| The correct response here, from the artists' point of view, would
| be a widespread coming together against their art being used as
| training data for ML models. With a quickly spread new license on
| most major art submission sites that explicitly forbids AI
| algorithms from using their work, artists would effectively
| starve OpenAI and others from using their own works to put them
| out of a job.
| w-m wrote:
| The license should forbid competing artists to using the
| artist's work as well. In fact, no human should come in contact
| with the produced art, otherwise they might be accidentally
| inspired by it, thus stealing from the original creator.
| smusamashah wrote:
| This is mind-blowing. I was not expecting the sketch-style
| images to actually look like sketches. Style-transfer-based
| sketches never look like sketches.
|
| This and the current AI-generated art scene make it look like
| artwork is now a "solved" problem. See the AI-generated art on
| Twitter etc.
|
| There is a strong relation between the prompt and the generated
| images, but just like GPT-3, it fails to fully understand what
| was asked. If you take the prompt out of the equation and view
| the generated artwork on its own, it's up to your
| interpretation, just like any artwork.
| andreyk wrote:
| I would caution that artwork is only 'solved' with relatively
| simple text prompts. To create a novel painting with a precise
| mix of elements that would take a paragraph or more to explain
| is still tough, though DALL-E 2 does seem like a big step
| towards that.
| nahuel0x wrote:
| Also note you can make an image out of many spatially
| localized prompts combined, in an iterative AI-human process.
| sillysaurusx wrote:
| Sam seems to be demoing something fairly close on twitter.
| https://twitter.com/sama/status/1511724264629678084
|
| The solar powered ship with a propeller sailing under the
| golden gate bridge during sunset with dolphins jumping around
| was pretty impressive.
| https://twitter.com/sama/status/1511731259319349251
|
| I think it's only missing the dolphins.
___________________________________________________________________
(page generated 2022-04-06 23:00 UTC)