welcome to multimodal.art

A comprehensive guide to understanding the multimodal AI art scene and creating your own text-to-image (and other) pieces.

twitter, instagram, email me: [email protected]

This page is an ever-evolving meta-curation of multimodal AI art content.

A few examples of AI-generated images from a text prompt, by me (more here, and check out our portfolio for my participation in real-world AI art exhibitions):

- A ritual to bring a Tamagotchi back to life (CLIP Guided Diffusion)
- Neuromancer in Ukiyo-e style (VQGAN+CLIP)
- The inauguration of a wormhole between Shanghai and New York (VQGAN+CLIP)
- A mecha-robot in a favela by James Gurney (CLIP Guided Diffusion)

Index

- What is it (non technical, not in depth)
- What is it (in depth, curated content)
- I want to see examples of what it can do
- I want to play with it myself

What is it (non technical, not in depth)

Essentially, these are models that can generate an image from a text prompt. The most famous open-source ones are VQGAN+CLIP, CLIP Guided Diffusion, and Dall-E Mini.

The main ingredient for all the text-to-image AI models is a dataset of hundreds of millions of "image-and-text pairs": images with labels describing what they are.

[Example of a text-image pair]

Those hundreds of millions of image-and-text pairs are then used to train neural networks (such as CLIP or DALL-E) that "learn" features and the connections between the text and the images, without a human telling them what is what. For example, a network can learn that there's a dog, or a grass field, or a red ball, or even a dog's mouth, just by having enough examples of different things in the dataset and "learning" what those are.

Once these models are trained, they can be used - either directly (as in Dall-E-like models) or indirectly (as in CLIP-guided models) - as guidance to generate new images from a text. The idea is that the model is intertwined with other models trained to be good at image generation (such as VQ-VAE, VQGAN or Guided Diffusion): its learned measure of how well a text matches an image (called the loss function, or error function) then guides those generator models toward an image that satisfies the text. A minimal sketch of this scoring mechanism appears after the short history below.

Short history of how we got here

Check out the excellent article The Weird and Wonderful World of AI Art, which goes into way more detail, but in summary: OpenAI's Dall-E started the current trend in January 2021. They haven't released their pre-trained models, but they did release CLIP (a model that can say how well an image matches a text, as described above). With that, Ryan Murdock (@advadnoun) started the trend of hooking CLIP up to image generation models with The Big Sleep (BigGAN + CLIP). After that, Katherine Crowson (@rivershavewings) hooked CLIP up to the VQGAN image generation network, starting a text-to-image Cambrian explosion. A few months later she did the same with CLIP + a model called Guided Diffusion, which dramatically increased the quality of the generations.

In 2022 the process only accelerated, and the number of released models grew so fast that I created a newsletter to keep up with it. The highlights of 2022 so far are definitely the release of Latent Diffusion models, as well as Dall-E 2.
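To make the "how well does an image match a text" idea concrete, here is a minimal sketch that uses CLIP to score one image against two candidate captions. It assumes PyTorch, Pillow and OpenAI's clip package are installed; the file name dog.jpg and the prompts are placeholders, not anything from this site.

```python
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Placeholder file name and prompts -- swap in your own.
image = preprocess(Image.open("dog.jpg")).unsqueeze(0).to(device)
texts = clip.tokenize(["a dog with a red ball", "a cat on a sofa"]).to(device)

with torch.no_grad():
    logits_per_image, _ = model(image, texts)
    probs = logits_per_image.softmax(dim=-1)

print(probs)  # e.g. tensor([[0.98, 0.02]]) -- the first caption matches best
```

This similarity score is exactly what the guided methods above turn into a loss function: instead of merely reading it, they push the image to increase it.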
Vice/Motherboard featured this scene in a July 2021 piece called "AI-Generated Art Scene Explodes as Hackers Create Groundbreaking New Tools".

Other prominent people in this field: @jbusted1, @nshepperd1, @BoneAmputee, @dribnet, @danielrussruss, @bakztfuture. I also recommend the following Discords: our own Multimodal Art, EleutherAI and LAION.

What is it (in depth, curated content)

- Understanding Multimodal Art: The AI art open source course by Jonathan Whitaker
- Understanding Dall-E: Two Minute Papers video, Yannic Kilcher video, original blog post
- Understanding CLIP: Yannic Kilcher video, original blog post
- Understanding VQGAN+CLIP: Adafruit blog post, Bestiario del Hypogripho
- Understanding Diffusion models: Ari Seff video
- Understanding DALL-E 2: Two Minute Papers
- The next 10 years of Multimodal art by Bakz T. Future

I want to see examples of what it can do

A few examples created by me. For more, follow me on Twitter or Instagram:

- A giant insect protecting the city of Lagos (CLIP Guided Diffusion)
- A cute monster bathing in an acai bowl (CLIP Guided Diffusion)
- A pao de queijo food cart with a Japanese castle in the background by James Gurney (CLIP Guided Diffusion)
- A mecha robot celebrating Diwali by James Gurney (CLIP Guided Diffusion)
- A cute monster taking a shower in a bathtub trending on artstation (CLIP Guided Diffusion)
- Prison Shrimp Night Fight trending on Artstation (CLIP Guided Diffusion)
- A renaissance painting of eyeballs (CLIP Guided Diffusion)
- Two people's silhouettes looking at artificial intelligence art in a gallery (CLIP Guided Diffusion)
- A cute seahorse amigurumi (CLIP Guided Diffusion)
- A landscape resembling the Black Lotus Magic: The Gathering card (CLIP Guided Diffusion)
- A shakira chicken dancing (CLIP Guided Diffusion)
- A giant chicken in an Austrian supermarket by James Gurney (CLIP Guided Diffusion)
- A surrealist sculpture of a GameBoy (CLIP Guided Diffusion)
- The biggest baile funk party in Times Square (VQGAN+CLIP)
- Do not rinse raw chicken before cooking says the FDA (VQGAN+CLIP)
- Mark Zuckerberg regretting having created Facebook, oil on canvas (VQGAN+CLIP)
- Elon Musk saying his final words before his exile on a Jupiter moon, oil on canvas (VQGAN+CLIP)
- Jeff Bezos apologizes to former employees before going to jail, oil on canvas (VQGAN+CLIP)
- Drinking the Milky Way galaxy from a milk bottle
- A couple spending their first Universal Basic Income payment at a fully automated lab-grown meat restaurant
- The online advertisement bubble burst crisis; oil on canvas

Some prominent AI art/model creators: @rivershavewings, @advadnoun, @images_ai, @jbusted1, @nshepperd1, @BoneAmputee, @dribnet, @danielrussruss

I want to play with it myself

We just released MindsEye beta, a GUI for running multiple multimodal art models. Check it out here.

Besides that, check out these resources based on your use case (and this list of tools and other resources you can run on your own):

I have a powerful GPU, I know a bit of coding, and I want to run these models on my local machine
- Run VQGAN+CLIP locally
- Run CLIP Guided Diffusion locally
- Run Dall-E mini locally
- Run many models at once with Visions of Chaos (Windows only)

I know how to use a Google Colab (or I am willing to learn)
- VQGAN+CLIP: original notebook, with pooling trick, MSE regularized
- Guided Diffusion: 512x512px original, Disco Diffusion
- Latent Diffusion: notebook by us

I'm not willing to learn how to use a Colab; I just want a website where I can type the text and get the image out
- Check out MindsEye!

If you're curious what those notebooks and local scripts actually do, the sketch below shows the core CLIP-guidance loop in a few lines.
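The following is a minimal sketch of CLIP guidance in its rawest form: treat the image itself as a trainable tensor and nudge its pixels to raise the CLIP similarity with a prompt. The real tools above (VQGAN+CLIP, CLIP Guided Diffusion) steer a generator's latent codes or a diffusion process instead of raw pixels, and add CLIP's input normalization plus random augmentations; this sketch skips all of that. It assumes PyTorch and OpenAI's clip package; the prompt, learning rate and step count are arbitrary choices of mine.

```python
import torch
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/32", device=device)
model = model.float()  # keep everything in fp32 so gradients flow cleanly

# Start from noise; 224x224 is the input resolution of ViT-B/32.
image = torch.rand(1, 3, 224, 224, device=device, requires_grad=True)

text = clip.tokenize(["a mecha-robot in a favela"]).to(device)
with torch.no_grad():
    text_features = model.encode_text(text)
    text_features = text_features / text_features.norm(dim=-1, keepdim=True)

optimizer = torch.optim.Adam([image], lr=0.05)
for step in range(200):
    optimizer.zero_grad()
    # (A real pipeline would also apply CLIP's mean/std normalization
    # and random crops/augmentations here.)
    image_features = model.encode_image(image.clamp(0, 1))
    image_features = image_features / image_features.norm(dim=-1, keepdim=True)
    # The loss: negative cosine similarity between image and text embeddings.
    loss = -(image_features * text_features).sum()
    loss.backward()
    optimizer.step()
```

Run as-is, this produces something closer to CLIP-flavored noise than art, which is precisely why the generator models matter: VQGAN or a diffusion model constrains the search to plausible images, while CLIP's loss steers it toward the prompt.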