[HN Gopher] DALL-E Paper and Code
___________________________________________________________________
DALL-E Paper and Code
Author : david2016
Score : 15 points
Date : 2021-02-24 20:26 UTC (2 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| MrUssek wrote:
| So, uhh, where's the paper? The link in the readme isn't active.
| campac wrote:
| Has anyone tried this out?
| CasperDern wrote:
| The linked repository is just one part of the entire model, so
| it can't be used as is.
|
| That said, there is a complete implementation by
| lucidrains[1] with some results; the only missing component
| now is the dataset.
|
| [1]: https://github.com/lucidrains/DALLE-pytorch
| minimaxir wrote:
| A thread of examples from the provided notebook:
| https://twitter.com/ak92501/status/1364666124919447558
|
| Note that these just demonstrate that arbitrary input images
| survive an encode/decode round trip, which is what you would
| expect from a VAE.
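|
| For anyone curious, that round trip is easy to reproduce. A
| minimal sketch along the lines of the repo's usage notebook
| (the dall_e helpers load_model, map_pixels and unmap_pixels
| and the CDN weight URLs come from there; `img` is assumed to
| be a 3x256x256 RGB tensor in [0, 1]):
|
|     import torch
|     import torch.nn.functional as F
|     from dall_e import load_model, map_pixels, unmap_pixels
|
|     dev = torch.device("cpu")
|     enc = load_model("https://cdn.openai.com/dall-e/encoder.pkl", dev)
|     dec = load_model("https://cdn.openai.com/dall-e/decoder.pkl", dev)
|
|     x = map_pixels(img.unsqueeze(0))   # shift pixels into the dVAE's range
|     z = enc(x).argmax(dim=1)           # 32x32 grid of discrete code indices
|     z = F.one_hot(z, num_classes=enc.vocab_size)
|     z = z.permute(0, 3, 1, 2).float()  # one-hot codes, channels first
|     x_rec = unmap_pixels(torch.sigmoid(dec(z)[:, :3]))  # reconstruction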
| minimaxir wrote:
| Note that this is just the VAE component, used to help train
| the model and generate images; it will not let you create
| crazy images from natural language as shown in the blog post
| (https://openai.com/blog/dall-e/).
|
| More specifically from that link:
|
| > [...] the image is represented using 1024 tokens with a
| vocabulary size of 8192.
|
| > The images are preprocessed to 256x256 resolution during
| training. Similar to VQVAE, each image is compressed to a 32x32
| grid of discrete latent codes using a discrete VAE that we
| pretrained using a continuous relaxation.
|
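| Those numbers line up: a 32x32 grid is 1024 code positions,
| each taking one of 8192 possible values. A toy sketch of the
| discrete bottleneck and its continuous relaxation, using
| PyTorch's gumbel_softmax as a stand-in for the paper's
| relaxation (shapes illustrative):
|
|     import torch
|     import torch.nn.functional as F
|
|     # encoder output: logits over the 8192-entry codebook per grid cell
|     logits = torch.randn(1, 8192, 32, 32)
|
|     # training: differentiable "soft" one-hot codes via the relaxation
|     soft_codes = F.gumbel_softmax(logits, tau=1.0, dim=1)
|
|     # inference: hard argmax gives 32*32 = 1024 tokens, vocab 8192
|     tokens = logits.argmax(dim=1)      # shape (1, 32, 32)
|     assert tokens.numel() == 1024
|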
| OpenAI also provides the encoder and decoder models and their
| weights.
|
| However, with the decoder model released, it's now possible
| to, say, train a text model that predicts image tokens and
| feed its output into that decoder (training on an annotated
| image dataset) to get something close to the DALL-E demo
| OpenAI posted. Or something even better!
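|
| A very rough sketch of that idea. Everything here is
| hypothetical (the tiny transformer, the vocabulary sizes and
| the wiring are illustrative, not OpenAI's actual model):
|
|     import torch
|     import torch.nn as nn
|
|     class TextToImageTokens(nn.Module):
|         """Autoregressively predict the 1024 image tokens from text."""
|         def __init__(self, text_vocab=50000, image_vocab=8192, d=512):
|             super().__init__()
|             self.embed = nn.Embedding(text_vocab + image_vocab, d)
|             layer = nn.TransformerEncoderLayer(d, nhead=8, batch_first=True)
|             self.body = nn.TransformerEncoder(layer, num_layers=6)
|             self.head = nn.Linear(d, image_vocab)
|
|         def forward(self, tokens):  # (batch, seq) of text then image tokens
|             mask = nn.Transformer.generate_square_subsequent_mask(
|                 tokens.size(1))     # causal mask: no peeking ahead
|             h = self.body(self.embed(tokens), mask=mask)
|             return self.head(h)     # logits over the 8192 image codes
|
| Sampled image tokens would then be one-hot encoded and fed to
| the released dVAE decoder, exactly as in the round-trip sketch
| above.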
| indiv0 wrote:
| Yeah, unfortunately OpenAI has only released the weaker
| ResNets and vision transformers they trained.
|
| Some brilliant folks (Ryan Murdock [@advadnoun], Phil Wang
| [@lucidrains]) have tried to replicate their results with
| projects like big-sleep [0], with decent output, but even
| with this improved VAE we're still a ways from DALL-E-quality
| results.
|
| If anyone would like to play with the model, check out either
| the Google Colab [1] (if you wanna run it on Google's cloud)
| or my site [2] (if you want a simplified UI).
|
| [0]: https://github.com/lucidrains/big-sleep/
|
| [1]: https://colab.research.google.com/drive/1MEWKbm-
| driRNF8PrU7o...
|
| [2]: https://dank.xyz
| make3 wrote:
| the title should be updated: this doesn't have the paper, and
| it's not the code for DALL-E but only for its VAE component
___________________________________________________________________
(page generated 2021-02-24 23:01 UTC)