[HN Gopher] Exploring 12M of the 2.3B images used to train Stabl...
       ___________________________________________________________________
        
       Exploring 12M of the 2.3B images used to train Stable Diffusion
        
       Author : detaro
       Score  : 427 points
       Date   : 2022-08-30 21:39 UTC (1 days ago)
        
 (HTM) web link (waxy.org)
 (TXT) w3m dump (waxy.org)
        
       | wongarsu wrote:
        | This is quite interesting. It really makes me wonder how much
        | of the difference between Stable Diffusion, DALL-E 2 and
        | MidJourney is due to different architectures and training
        | intensity, and how much to different datasets.
        | 
        | For example, Stable Diffusion knows much better than MidJourney
        | what a cat looks like; MidJourney knows what a Hacker Cat looks
        | like, while Stable Diffusion doesn't (you can tell it to make a
        | cat in a hoodie in front of a laptop, but it won't come up with
        | that on its own). Meanwhile, for landscapes Stable Diffusion
        | seems to have no problem with imagination. How much of that is
        | simply due to blind spots in the training data?
        
       | lovelearning wrote:
       | > Unsurprisingly, a large number came from stock image
       | sites...Adobe Stock's...iStockPhoto...Shutterstock
       | 
       | Are they ok with their stock photos being used to train a service
       | that's likely to bite into their stock photo business?
        
         | PeterisP wrote:
          | The general understanding is that no one cares whether they're
          | OK with it, because training a model (even for a competing
          | service) is not among the exclusive rights of copyright owners
          | that require their permission. If you have a legally obtained
          | copy, you can use it as you want (except for the specific,
          | explicitly enumerated use cases that copyright law reserves for
          | the copyright holder), whether the copyright holder likes it or
          | not.
        
       | NIL8 wrote:
       | It feels like a tsunami is coming and we have no idea how big it
       | will be.
        
         | [deleted]
        
         | [deleted]
        
         | wil421 wrote:
         | Like the self driving revolution? Or the bitcoin/blockchain
         | revolution?
         | 
         | Personally, I'm not even getting out my popcorn yet.
        
           | CuriouslyC wrote:
           | Real people are creating real things with this tech right
           | now. Beyond that, people are enthusiastically building on
            | this technology to create higher-level tools. This will only
            | be able to go so far with the Stable Diffusion model, but the
            | ceiling is still very high with what we already have. Given
            | the pace of model progress, we can realistically expect the
            | next 10 years or so to be absolutely transformative for art,
            | and probably after that for writing and music.
        
           | bottled_poe wrote:
           | Fair position given the failure of crypto to live up to the
           | revolutionary hype.
           | 
           | This is clearly different. The value has been demonstrated -
           | and it has clear implications for a lot of jobs.
        
           | simonw wrote:
           | Very different. This stuff is genuinely useful already, and
           | is getting more effective every day.
        
           | marktolson wrote:
            | Bitcoin / blockchain doesn't have any intrinsic value other
            | than to those who believe in it. Self-driving cars (Level
            | 4-5) are not available to the public and are still in
            | development. This stuff is real, produces some incredible
            | results, is available to the public, and is advancing at a
            | rapid rate.
        
             | cdata wrote:
             | The output of these models seems really impressive, but for
             | my money the notion that it has _value_ is undermined by
              | the way its trainers keep "proprietary" data that is
             | likely to be in violation of image usage rights at a large
             | scale. What is the true value of something that can only be
             | had at the other end of misbegotten
             | extraction/exploitation? It seems like a similar trade-off
             | to the one that web3 proponents are asking us to make. The
             | apparent end-game is that we'll kill off all the true value
             | creators - the working artists responsible for the source
             | data - and all we'll be left with is an artifact of their
             | works.
        
               | nprateem wrote:
                | No, real artists will just become artisans, like any
                | producers of handmade goods.
        
           | axg11 wrote:
           | Self driving cars are on the road. Any two people on Earth
           | are able to trustlessly transmit value between them using
           | Bitcoin. You seem cynical.
        
             | astrange wrote:
             | Any two wallet addresses are able to. That doesn't mean the
             | people are. By abstracting the actual process of getting
             | and using the Bitcoin on both ends you've lost all actual
             | real world detail.
             | 
             | ...and they can still lose it all to typos or fees
             | permanently.
        
             | hotpotamus wrote:
             | Google still can't reliably figure out that an email from
             | jpm0r4ncha$e telling me how much money I've won is spam.
             | Once they nail that down, then maybe I'll step inside one
             | of their self driving cars. Until then, I'll laugh at the
             | video where the Tesla flattens a child-sized mannequin.
        
       | simonw wrote:
       | If anyone is interested in the technical details, the database
       | itself is a 4GB SQLite file which we are hosting with Datasette
       | running on Fly.
       | 
        | More details in our repo:
        | https://github.com/simonw/laion-aesthetic-datasette
       | 
       | Search is provided by SQLite FTS5.
        
         | sdwr wrote:
         | The search speed is amazing!! Do you have to do a lot of pre-
         | indexing to get it so fast?
        
           | simonw wrote:
            | It's SQLite's built-in FTS index, nothing special on top of
            | it. I built the index by running:
            | 
            |     sqlite-utils enable-fts data.db images text
            | 
            | https://sqlite-utils.datasette.io/en/stable/cli.html#configu...
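            | 
            | (For anyone curious, a minimal sketch of what querying that
            | index from Python might look like - the images_fts table name
            | is what sqlite-utils creates by default, and the url/text
            | columns are assumptions about this particular database:)
            | 
            |     import sqlite3
            | 
            |     conn = sqlite3.connect("data.db")
            |     # FTS5 MATCH against the index table, joined back to the
            |     # source table by rowid; "rank" orders by relevance.
            |     rows = conn.execute(
            |         """
            |         SELECT images.url, images.text
            |         FROM images_fts
            |         JOIN images ON images.rowid = images_fts.rowid
            |         WHERE images_fts MATCH ?
            |         ORDER BY rank
            |         LIMIT 10
            |         """,
            |         ("thomas kinkade",),
            |     ).fetchall()
            |     for url, text in rows:
            |         print(url, "-", text)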
        
             | sdwr wrote:
              | Are you running on anything special compute-wise? I have a
              | budget node running Mongo, and it takes almost a second to
              | fetch a single 1MB document.
              | 
              | Writing it out, I realize it's not indexed by the attribute
              | I'm retrieving by...
        
               | simonw wrote:
               | I started this on a Fly instance with 256MB of RAM and a
               | shared CPU. This worked great when it was just a couple
               | of people testing it.
               | 
               | Once it started getting traffic it started running a bit
               | slow, so I bumped it up to a 2 CPU instance with 4GB of
               | RAM and it's been fine since then.
               | 
               | The database file is nearly 4GB and almost all memory is
               | being used, so I guess it all got loaded into RAM by
               | SQLite.
               | 
               | I'll scale it back down again in a few days time, once
               | interest in it wanes a bit.
        
               | isoprophlex wrote:
               | Use the index, Luke!
        
               | snovv_crash wrote:
               | But MongoDB is webscale!
        
         | yojo wrote:
          | I notice a surprising number of duplicates. E.g. if I sort by
          | aesthetic, the same 500x500 Tuscan village painting appears
          | multiple times on the first page of results.
          | 
          | Presumably it wouldn't be so hard to hash the images and filter
          | out repeats. Is the idea to keep the duplicates to preserve the
          | description mappings?
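          | 
          | (For what it's worth, a minimal sketch of that kind of
          | exact-duplicate filter, hashing the downloaded image bytes;
          | re-encoded or resized copies would still slip through:)
          | 
          |     import hashlib
          | 
          |     def dedupe_exact(records):
          |         """records: iterable of (image_bytes, caption) pairs.
          |         Keeps the first caption seen per distinct image."""
          |         seen = set()
          |         unique = []
          |         for image_bytes, caption in records:
          |             digest = hashlib.sha256(image_bytes).hexdigest()
          |             if digest not in seen:
          |                 seen.add(digest)
          |                 unique.append((image_bytes, caption))
          |         return unique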
        
           | bambax wrote:
            | I noticed this too, with the same description every time. How
            | does this work in the model; does this give repeated images a
            | bigger weight?
            | 
            | It's surprising that these weren't filtered out, and it would
            | be interesting to know the number of unique images. (When it
            | is mentioned that a model was trained on 10 billion images,
            | for example, obviously if each image is repeated 5 times then
            | the actual number of images is 2 billion, not 10.)
        
         | jadbox wrote:
          | Are there plans to expand the model to be even larger?
        
       | rektide wrote:
        | So excellent. Flipping the story we see all the time on its
        | head. AI's quasi-mystical powers are endless spectacle; taking a
        | look through the other side of the looking glass is vastly
        | overdue. Amazing work.
        | 
        | This is just starting to scratch the surface: 2% of the data
        | gathered, sources identified. These are a couple of sites we can
        | now point to as the primary sources powering AI, barely reviewed
        | or dived into in terms of the content itself. We have so little
        | sense & appreciation for what lurks beneath, but this is a start.
        
         | benreesman wrote:
         | It is an unambiguous social necessity to demystify these
         | things.
         | 
          | In a world where the lay public didn't really know about
          | Photoshop, Photoshop would be a terrifying weapon.
         | 
         | Likewise modern ML is for the most part mysterious and/or
         | menacing because it's opaque and arcane and mostly controlled
         | by big corporate R&D labs.
         | 
          | Get some charismatic science popularizers out there teaching
          | people how it works, and all of a sudden it's not such a big
          | scary thing.
        
       | revskill wrote:
        | It's hard for users to produce the results they expect without
        | clear guidance on correct grammar, contextual information, etc.
        
       | withinboredom wrote:
       | NSFW is entertaining. It tends to think "knobs" in the prompt
       | mean "female breasts" which is annoying.
        
       | cakeface wrote:
       | In which I run `order by punsafe desc` and immediately regret it.
        
         | nprateem wrote:
         | Although it did show me that my masseuse isn't as skilled as I
         | thought
        
       | rom1504 wrote:
       | Hi, laion5b author here,
       | 
       | Nice tool!
       | 
       | You can also explore the dataset there
       | https://rom1504.github.io/clip-retrieval/
       | 
        | Thanks to approximate knn, it's possible to query and explore
        | the 5B-sample dataset with only 2TB of local storage; anyone can
        | download the knn index and metadata to run it locally too.
       | 
       | Regarding duplicates, indeed it's an interesting topic!
       | 
       | Laion5b deduplicated samples by url+text, but not by image.
       | 
        | To deduplicate by image you need an efficient way to compute
        | whether images a and b are the same.
        | 
        | One idea is to compute a hash based on CLIP embeddings. A further
        | idea would be to train a network actually good at dedup, and not
        | only similarity, by training on positive and negative pairs, e.g.
        | with a triplet loss.
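        | 
        | (A minimal sketch of the embedding idea, assuming you already
        | have L2-normalized CLIP image embeddings in a NumPy array; the
        | 0.96 threshold is an arbitrary placeholder:)
        | 
        |     import numpy as np
        | 
        |     def near_duplicate_pairs(embeddings, threshold=0.96):
        |         """embeddings: (n, d) array with L2-normalized rows, so
        |         the dot product is cosine similarity. O(n^2) - fine for a
        |         small sample; at billions of rows you would bucket
        |         candidates with an approximate-knn index first."""
        |         sims = embeddings @ embeddings.T
        |         dup_i, dup_j = np.where(np.triu(sims, k=1) > threshold)
        |         return list(zip(dup_i.tolist(), dup_j.tolist()))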
       | 
       | Here's my plan on the topic
       | https://docs.google.com/document/d/1AryWpV0dD_r9x82I_quUzBuR...
       | 
        | If anyone is interested in participating, I'd be happy to guide
        | them through it. This is an open effort; just join the LAION
        | Discord server and let's talk.
        
         | yreg wrote:
         | I have been using the rom1504 clip retrieval tool[0] up until
         | now, but the Datasette browser[1] seems much better for Stable
         | Diffusion users.
         | 
         | When my prompt isn't working, I often want to check whether the
         | concepts I use are even present in the dataset.
         | 
         | For example, inputting `Jony Ive` returns pictures of Jony Ive
         | in Datasette and pictures of apples and dolls in clip
         | retrieval.
         | 
         | (I know laion 5B is not the same as laion aesthetic 6+, but
         | that's a lesser issue.)
         | 
         | [0] - https://rom1504.github.io/clip-retrieval/
         | 
         | [1] - https://laion-aesthetic.datasette.io/laion-
         | aesthetic-6pls/im...
        
           | rom1504 wrote:
           | This is due to the aesthetic scoring in the UI. Simply
           | disable it if you want precise results rather than aesthetic
           | ones.
           | 
           | It works for your example
           | 
           | I guess I'll disable it by default since it seems to confuse
           | people
        
             | rom1504 wrote:
             | Done https://github.com/rom1504/clip-
             | retrieval/commit/53e3383f58b...
             | 
              | Using CLIP for searching is better than direct text
              | indexing for a variety of reasons - here, for example,
              | because it better matches what Stable Diffusion sees.
             | 
             | Still interesting to have a different view over the
             | dataset!
             | 
              | If you want to scale this out, you could use Elasticsearch.
        
             | yreg wrote:
              | I see, thanks! I didn't realize that; I thought I'd want to
              | keep aesthetic scoring enabled, since Stable Diffusion was
              | trained on LAION-Aesthetics.
             | 
             | ---
             | 
             | Also: There is a joke to be made at Jony's expense
             | regarding the need to turn off aesthetic scoring to see his
             | face.
        
         | ijk wrote:
          | You are probably very aware of it, but just to highlight the
          | importance of this for people who aren't: data duplication
          | degrades training and makes memorization (and therefore
          | plagiarism, in the technical sense) more likely. For language
          | models this includes near-duplicates, and I'd guess the same
          | extends to images.
         | 
         | Quantifying Memorization Across Neural Language Models
         | https://arxiv.org/abs/2202.07646
         | 
         | Deduplicating Training Data Makes Language Models Better
         | https://arxiv.org/abs/2107.06499
         | https://twitter.com/arankomatsuzaki/status/14154721921003397...
         | https://twitter.com/katherine1ee/status/1415496898241339400
        
       | fareesh wrote:
        | How would one go about adding more data to the dataset?
        | 
        | Would one need to retrain on the entire dataset? Or is there
        | typically a way to just add an incremental batch?
        
       | nbzso wrote:
        | Thinking over this "magical" tech: a distinctive painting style
        | is one of the biggest goals for an artist. Producing a unique
        | stroke and clear expression takes years of hard work and
        | experimentation.
        | 
        | An artist decides to sell prints of an expensive artwork, so he
        | or she publishes a photo on their website. AI scrapers pick up
        | the image in a dataset update. Game over for the artist.
        | 
        | I hope for a class action over these training datasets. I get
        | that kids have fun with the new Photoshop filters. I get that
        | software is eating "the world", but someone must wake up and push
        | the kill switch. It is possible.
        
         | sennight wrote:
         | Sounds like a hopeless protectionist endeavor, reminiscent of
         | cartoonish Keynesian economics busywork: "We can't permit
         | development of adding machines, what about all the hard work
         | people have put into memorizing multiplication tables?!".
        
           | nbzso wrote:
            | No. Sounds like common sense. Not popular in the tech
            | community nowadays. Data is the new "petrol". Since when is
            | petrol free of charge? People en masse are clueless. If you
            | use my "human" accomplishments as an energy source, you must
            | pay me. Period.
        
             | sennight wrote:
              | > Since when is petrol free of charge?
             | 
             | At around the same time that it actually resembles "the new
             | data", or maybe even shares a single quality with it? Being
             | a physical object, it is bound by physical properties...
             | like scarcity. Data, being an abstract concept, suffers no
             | such constraint. Same story for whatever artistic technique
             | you've imagined to be not only valuable, but novel to all
             | of human experience and wholly owned by you and you alone.
             | Your valuation of your worth and that of your labor is
             | laughably overinflated and the market is telling you so.
             | Period.
        
       | ks2048 wrote:
       | They need a feature that for a given generation, shows the
       | nearest image in the training set. It is clearly doing more than
       | "memorizing", but for some "normal" queries, how do you know the
       | output isn't very similar to some training image? That could have
       | legal implications.
        
         | mkl wrote:
         | I don't think anyone has a definition of "nearest" that could
         | accomplish that in general. Comparing pixel data is easy, but
         | comparing the subjects portrayed and how is much harder to pin
         | down.
        
         | speedgoose wrote:
          | You could try a reverse search in an image search engine.
          | Google and Bing support that, for example.
        
       | epups wrote:
       | As expected, very few NSFW images were included in the training
       | set, according to this. They are more afraid of showing a penis
       | than showing Mickey Mouse.
        
       | karpathy wrote:
       | Data is an excellent place to look at to get a sense of where the
       | model is likely to work or not (what kinds of images), and for
       | prompt design ideas because, roughly speaking, the probability of
       | something working well is proportional to its frequency (or of
       | things very similar to it) in the data.
       | 
       | The story is more complex though because the data can often be
       | quite far away from actual neural net training due to
       | preprocessing steps, data augmentations, oversampling settings
       | (it's not uncommon to not sample data uniformly during training),
       | etc. So my favorite place to scrutinize is to build a "batch
       | explorer": During training of the network one dumps batches into
       | pickles immediately before the forward pass of the neural net,
       | then writes a separate explorer that loads the pickles and
       | visualizes them to "see exactly what the neural net sees" during
       | training. Ideally one then spends some quality time (~hours)
       | looking through batches to get a qualitiative sense of what is
       | likely to work or not work and how well. Of course this is also
       | very useful for debugging, as many bugs can be present in the
       | data preprocessing pipeline. But a batch explorer is harder to
       | obtain here because you'd need the full training
       | data/code/settings.
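        | 
        | (A minimal sketch of that pattern, assuming batches are dicts
        | with an "images" entry holding an NHWC float array in [0, 1];
        | the dump call goes in the training loop, the explorer is a
        | separate script:)
        | 
        |     import glob
        |     import os
        |     import pickle
        | 
        |     import matplotlib.pyplot as plt
        |     import numpy as np
        | 
        |     def dump_batch(batch, step, out_dir="batch_dumps"):
        |         """Call immediately before the forward pass."""
        |         os.makedirs(out_dir, exist_ok=True)
        |         path = os.path.join(out_dir, f"batch_{step:07d}.pkl")
        |         with open(path, "wb") as f:
        |             pickle.dump(batch, f)
        | 
        |     def explore(out_dir="batch_dumps", n=16):
        |         """Separate batch explorer: load dumps, eyeball them."""
        |         for path in sorted(glob.glob(os.path.join(out_dir, "*.pkl"))):
        |             with open(path, "rb") as f:
        |                 batch = pickle.load(f)
        |             images = np.asarray(batch["images"])[:n]
        |             fig, axes = plt.subplots(4, 4, figsize=(8, 8))
        |             for ax, img in zip(axes.flat, images):
        |                 ax.imshow(img)
        |                 ax.axis("off")
        |             fig.suptitle(path)
        |             plt.show()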
        
         | jimsimmons wrote:
         | With big generative models, seeing data even once is more than
         | sufficient to memorize it. So your claim that performance
         | relates to frequency is not exactly correct.
         | 
         | The whole point of this model class is that one can learn one
         | word from one sample, another pixel from another one and so on
         | to master the domain. The emergent, non-trivial generalization
         | is what makes them so fascinating. There is no simple,
         | linear/first order relationship with data and behaviour. Case
         | in point: GPT3 can do few-shot learning despite not having used
         | any explicit few-shot formatted data during training.
         | 
          | Not saying you are wrong, but the story is not as simple as it
          | is with plain supervised learning on small datasets.
        
           | galangalalgol wrote:
           | What does that say about how these models will behave as an
           | increasingly large portion of their training data is outputs
           | from similar models? Our curation of the outputs will
           | hopefully help. And if one image really is enough, perhaps
           | the smaller number of human created images will be sufficient
           | to inject new stuff rather than stagnating?
        
       | TaylorAlexander wrote:
       | "The most frequent artist in the dataset? The Painter of Light
       | himself, Thomas Kinkade, with 9,268 images."
       | 
       | Oh that's why it is so good at generating Thomas Kinkade style
       | paintings! I ran a bunch of those and they looked pretty good.
       | Some kind of garden cottage prompt with Thomas Kinkade style
       | works very well. Good image consistency with a high success rate,
       | few weird artifacts.
        
         | Quiark wrote:
          | Can you come up with a prompt that will reproduce one of his
          | paintings almost exactly?
        
           | TaylorAlexander wrote:
           | Not sure.
        
             | dr_dshiv wrote:
              | I'm curious about this. Can it plagiarize? Can GPT-3? Why
              | or why not?
        
         | z3c0 wrote:
          | I've noticed an affinity towards "The Greats" of modern
          | painting. I've gotten incredible results from using Dali,
          | Picasso, Bacon, Lichtenstein, etc. I haven't had as much luck
          | with slightly less-known artists of similar styles (e.g.
          | Braque, Guayasamin, or Gris, as opposed to Picasso).
        
       | ryandv wrote:
        
         | [deleted]
        
       | [deleted]
        
       | r3trohack3r wrote:
        | This has been an incredibly helpful tool today for exploring
        | Stable Diffusion.
       | 
       | I'm starting to realize Stable Diffusion doesn't understand many
       | words, but it's hard to tell which words are causing it problems
       | when engineering a prompt. Searching this dataset for a term is a
       | great way to tell whether Stable Diffusion is likely to
       | "understand" what I mean when I say that term; if there are few
       | results, or if the results aren't really representative of what I
       | mean, Stable Diffusion is likely to produce garbage outputs for
       | those terms.
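        | 
        | (A minimal sketch of automating that check against the public
        | Datasette instance, standard library only; the exact URL, table
        | name, and _search/_size parameters are assumptions based on how
        | Datasette exposes FTS-enabled tables as JSON:)
        | 
        |     import json
        |     import urllib.parse
        |     import urllib.request
        | 
        |     # Assumed endpoint for the images table, exposed as JSON.
        |     BASE = ("https://laion-aesthetic.datasette.io"
        |             "/laion-aesthetic-6pls/images.json")
        | 
        |     def caption_hits(term, size=100):
        |         """Up to `size` caption matches; zero or very few hits
        |         suggests the concept is thin in the training data."""
        |         qs = urllib.parse.urlencode({"_search": term, "_size": size})
        |         with urllib.request.urlopen(f"{BASE}?{qs}") as resp:
        |             data = json.load(resp)
        |         return len(data.get("rows", []))
        | 
        |     print(caption_hits("hacker cat"))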
        
       | Imnimo wrote:
       | It is interesting that there are at least a few images in the
       | dataset that were generated by previous diffusion methods:
       | 
       | https://laion-aesthetic.datasette.io/laion-aesthetic-6pls/im...
       | 
       | (Surely there are many more that don't have this specific label).
        
         | htrp wrote:
         | Turtles all the way down
        
       | thrdbndndn wrote:
        | Why not just add a shortcut to search for the related images
        | from these artist-count tables, etc.?
       | 
       | Also, a UI issue: the sorting arrow feels wrong:
       | 
       | https://i.imgur.com/uyUXAXy.png
       | 
        | The norm is that when the arrow is pointing down, the data is
        | currently sorted in descending order. (I'm aware you can
        | interpret it as "what will happen if you click it", but the norm
        | is to show the current state.)
        
         | simonw wrote:
         | That's the exact problem: should the button show what will
         | happen, or what is currently happening?
         | 
         | When I designed that feature I looked at a bunch of systems and
         | found examples of arrows in both directions.
        
           | thrdbndndn wrote:
            | At least Excel and Google Sheets both display it the way I
            | described. Adding Apple Numbers (which I never use,
            | unfortunately), I think that should cover like 90% of actual
            | use cases, enough to be considered the "convention".
        
       | lmarcos wrote:
       | I always had the crazy idea of "infinite entertainment": somehow
       | we manage to "tap" into the multiverse and are able to watch TV
        | from countless planets/universes (I think Rick and Morty did
       | something similar). So, in some channel at some time you may be
       | able to see Brad Pitt fighting against Godzilla while the monster
       | is hacking into the pentagon using ssh. Highly improbable, but in
       | the multiverse TV everything is possible.
       | 
       | Now I think we don't need the multiverse for that. Give this AI
       | technology a few years and you'll have streaming services a la
       | Netflix where you provide the prompt to create your own movie.
       | What the hell, people will vote "best movie" among the millions
        | submitted by other people. We'll be movie producers the way we
        | are YouTubers nowadays. An overabundance of high-quality material
        | and so little time to watch it all. Same goes for books, music and
       | everything else that is digital (even software?).
        
         | thatfrenchguy wrote:
         | I'm curious how much this prevents new styles from emerging
         | though, rather than rehashing of things that already exist.
        
         | StrictDabbler wrote:
         | "Highly improbable, but in the multiverse TV everything is
         | possible."
         | 
         | Quick reminder, there are infinitely many even numbers and none
         | of them are odd.
         | 
         | A given infinite (or transfinite) set does not necessarily
         | contain all imaginable elements.
        
           | Invictus0 wrote:
           | Ok... I don't think anyone would expect an infinite supply of
           | water to also include vodka and wine. He specifically said
           | "infinite TV".
        
             | StrictDabbler wrote:
             | Does any universe have a "just a completely dark room"
             | channel? How many dark room channels are there?
             | 
             | Is there a universe with channels focusing on these
             | subjects:
             | 
             | "Video of the last tears of terminal cancer patients as
             | Jerry Lewis tells jokes about his dick"
             | 
             | "This guy doesn't like ice cream but he eats it to reassure
             | his girlfriend that he isn't vegan"
             | 
             | "A single hair grows on a scalp"
             | 
             | "Infant children read two-century-old stories about
             | shopping for goose-grease and comment on the prosody"
             | 
             | "Gameshows where an entire country's left-handed population
             | guesses how you'll die"
             | 
             | This is the whole point of the Rick and Morty cable bit.
             | There are things that would not be on TV in any universe
             | that invents TV. It's hilarious to pretend they would be.
        
               | Invictus0 wrote:
               | It's clear you have a misunderstanding of infinity.
               | 
               | In a show, there are a certain number of frames. In one
               | frame, there are a certain number of pixels. Each pixel
               | can be one of some number of colors. An infinite TV would
               | be able to show every combination of every color of
                | pixels, followed by every combination of frames,
               | simultaneously and forever. All those shows are in there.
               | Not only that, but all of this is also countably
               | infinite.
        
               | crabmusket wrote:
               | Don't forget to posit an infinite curator who watches
               | every channel on the infinite TV and responds to queries
               | for channels with something sensible, not a random garble
               | of pixels.
        
               | StrictDabbler wrote:
               | That's not a multiverse cable situation and there are
               | _way_ more kinds of infinity than just countable and
               | uncountable.
               | 
               | In a multiverse situation you're watching actual content
               | from an infinity of universes where beings exist who have
               | produced and selected that content.
               | 
               | You're not watching an infinite amount of static and
               | magically selecting the parts of static that are
               | coincidentally equivalent to specific content.
               | 
               | All multiverse content must be watchable and creatable by
               | the kind of creature that creates a television. So
               | content that is unwatchable or uncreatable by any
               | conceivable creature will not be in that infinite set.
               | 
               | It is very easy to describe impossible content and any
               | impossible content will not be on the multiverse TV.
               | 
               | Trivial counterexamples that are describable but
               | uncreatable, in any universe similar enough to ours that
               | it has television:
               | 
               | -a channel that reruns actual filmed footage of a given
               | universe's big-bang will not exist.
               | 
               | -a channel that shows *only* accurate, continuous footage
               | of the future of that universe.
               | 
               | -a channel that shows the result of dividing by zero.
               | 
               | Channels that may or may not be uncreatable:
               | 
               | -a channel that induces in the viewer the sensation of
               | smelling their mother's hair.
               | 
               | -a channel that causes any viewer to eat their own feet.
               | 
               | -a channel that cures human retinal cancer.
               | 
               | -a channel that shows you what you are doing, right now,
               | in your own home in this universe, like a mirror. Note
               | that this requires some connection between our universe
               | and the other universe and there's no guarantee, in a
               | multiverse situation, that connections between the
               | universes are also a complete graph.
               | 
               | These examples are more important. We know they are
               | either possible or impossible but we do not know which.
               | Just saying "multiverses are infinite" doesn't answer the
               | question.
               | 
               | For further reading, review
               | https://en.wikipedia.org/wiki/Absolute_Infinite and
               | remember that a channel is a defined set.
        
         | temp_account_32 wrote:
         | Isn't everything representable in a digital form? I think we're
         | in the very early era of entertainment becoming commoditized to
         | an even higher degree than it is now.
         | 
         | I envision exactly the future as you describe: Feed a song to
         | the AI, it spits out a completely new, whole discography from
         | the artist complete with lyrics and album art that you can
         | listen to infinitely.
         | 
         | "Hey Siri, play me a series about chickens from outer space
         | invading Earth": No problem, here's a 12 hour marathon,
         | complete with a coherent storyline, plot twists, good acting
         | and voice lines.
         | 
         | The only thing that is currently limiting us is computing
         | power, and given enough time, the barrier will be overcome.
         | 
         | A human brain is just a series of inputs, a function that
         | transforms them, and a series of outputs.
        
         | gfody wrote:
         | > So, in some channel at some time you may be able to see Brad
         | Pitt fighting against Godzilla while the monster is hacking
         | into the pentagon using ssh.
         | 
          | That's the movie GAN; more interesting would be the zero-shot
          | translations of epic foreign/historic films into your native
          | culture.
        
         | [deleted]
        
         | yojo wrote:
         | Anyone have an idea of how many orders of magnitude advancement
         | we are away from this? Like, it takes a high end GPU a non-
         | trivial amount of time to make one low resolution image. A
         | modern film is at least 30 of those a second, at far higher
         | pixel density. It seems like you get a 10x improvement in GPU
         | perf every 6-8 years[1], and it might take more than a couple
         | of those.
         | 
         | Plus you need to encode the ideas of plots/characters/scenes,
         | and have through-lines that go multiple hours. It seems like
         | with the current kit it's hard to even make a consistent
         | looking children's book with hand picked illustrations.
         | 
         | My gut is we are more than a few years off, but maybe I'm
         | underestimating the low hanging fruit?
         | 
         | 1: https://epochai.org/blog/trends-in-gpu-price-performance
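          | 
          | (A rough back-of-envelope with made-up but plausible numbers -
          | say ~5 s per 512x512 frame on a high-end GPU today, 24 fps,
          | and 1080p output:)
          | 
          |     seconds_per_frame = 5  # assumed: one 512x512 image today
          |     fps = 24
          |     pixel_factor = (1920 * 1080) / (512 * 512)  # ~7.9x pixels
          | 
          |     speedup = seconds_per_frame * fps * pixel_factor
          |     print(f"~{speedup:.0f}x for real-time 1080p")  # ~950x
          | 
          |     # At ~10x GPU perf every 6-8 years, ~950x is roughly three
          |     # such jumps, i.e. on the order of 20 years from hardware
          |     # alone - ignoring (probably larger) gains from better
          |     # models and samplers.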
        
         | sircastor wrote:
          | There are a handful of TV series canceled too soon that I'd love
          | to see new episodes of. There are tremendous issues of consent
         | involved, but it's a nice dream to have. It's a slippery slope
         | to be sure, but imagine taking your well-honed fanfic script,
         | loading it in a system and getting a ready-to-watch episode
         | never before seen.
        
         | cerol wrote:
         | _> Overabundance of high quality material and so little time to
         | watch them all._
         | 
          | If a tree falls in a forest and there is no one there to hear
          | it, does it make a sound?
         | 
         | If you can generate infinite material, how do you judge
         | quality?
         | 
          | You're extrapolating an idea based on what movies _are_,
          | fundamentally. But you don't take into consideration what movies
         | _are not_. Watching a movie is also a social experience. Going
         | to the movie theater, waiting years for a big blockbuster
         | title, watching something with friends. Word of mouth
         | recommendation is a very big thing. If a close friend
          | recommends me something (be it a movie or a book), I'm much
         | more inclined to like it just for the human connection it
         | provides (reading or watching something other people enjoyed is
         | a means of accessing someone else's psyche).
         | 
         | If every time you watch a movie you have the knowledge that
         | there is a movie that is slightly better a prompt away, why
         | bother finishing this one? If you know you probably won't
         | finish the movie you generated, why bother starting one? So
         | what do you do? You end up rewatching The Office.
         | 
          | Sure, if you tell me this will be _possible_ in a couple of
          | years, I won't object. The point is: will you pay for it on a
          | _recurring basis_? Because if you don't, this will be no more
         | than a very cool tech project.
         | 
         | ----
         | 
         | I've recently had this idea for a sci-fi book: in a future not
         | so distant, society is divided between tech and non-tech
         | people. Tech people created pretty much everything they said
         | they would create. AGI, smarter-than-human robots, you name it.
         | But, it didn't change society at all. Companies still employ
         | regular humans, people still watch regular made-by-human movies
         | and eat handmade pizzas and drink their human-made lattes in
          | hipster coffee shops. So tech people are naturally very
          | frustrated at non-tech people, because they're not optimizing
          | their lives and businesses enough. And then you have
         | this awkward situation where you have all these robots with
         | brains of size of a galaxy lying around, doing nothing. And
         | then some of them start developing depression, for spending too
         | much time idle. And then the tech people have to rush to
         | develop psychiatric robots. And then some robots decide to
         | unionize, and others start writing books about how humans are
         | taking jobs that were supposed to be automated.
        
         | c3534l wrote:
         | It does feel like we could at least get procedurally generated
         | streaming music. Music is limited enough that it feels
         | possible, and people are more than willing to rate and save
         | music they like. The social element of raiding another person's
         | curated playlist could take over the romantic notion of an
         | artist's personal expression. Curating such lists of
         | procedurally generated music could make everyone a musician.
        
           | noitpmeder wrote:
            | See this recent thread for more on this:
            | https://news.ycombinator.com/item?id=32559119
        
         | Alupis wrote:
         | I think a more interesting thought is how entertainment boils
         | down to a sequence of 1's and 0's... and if we could somehow
         | get enough of that sequence right, we could unveil
         | video/audio/images of real people doing things they've never
         | done - such as Brad Pitt fighting Godzilla.
         | 
         | Imagine uncovering a movie that was never made but featured
         | actors you know. If Steven Spielberg can make the movie, then
         | there is an undiscovered sequence of 1's and 0's that already
         | is that movie, a sequence that could be discovered without
         | actually making the movie. Imagine "mining for movies"...
         | 
         | Of course that sequence is likely impossible to ever predict
         | enough of to actually discover something real... but it's a fun
         | thought experiment.
        
           | kilovoltaire wrote:
           | If you like thinking about that sort of thing, and haven't
           | read it yet, check out the short story "The Library of Babel"
           | by Jorge Luis Borges
        
         | CamperBob2 wrote:
         | The Library of Babel, now only $19.95 per month!
        
         | wpietri wrote:
         | I think much more likely, at least this side of the
         | singularity, what we'll have is infinite dreck. And for some
         | people, that will be enough.
         | 
         | It's extremely hard to make good content. Teams of extremely
         | skilled, well-paid people, even ones who have succeeded before,
         | fail regularly. And that's with complicated filtering
         | mechanisms and review cycles to limit who has access and keep
         | the worst of it from getting out.
         | 
         | But not everybody needs everything to be actually good. My
         | partner will sometimes unwind on a Friday by watching bad
         | action movies; the bad ones are in some ways better, as they
         | require less work to understand and are amusing in their own
         | way. Or there's a mobile game I play when I want to not think,
         | where you have to conquer a graph of nodes. The levels are
         | clearly auto-generated, and it's fine.
         | 
         | I think that kind of serviceable junk is where we might see AI
         | get to in a couple of decades, made for an audience that will
         | get a weed gummy and a six pack and ask for a "sci-fi action
         | adventure with lots of explosions" and get something with a
         | half-assed plot, forgettable stereotypical characters and
         | visuals that don't totally make sense, but that's fine. You
         | won't learn anything, you won't be particularly moved, and you
         | won't ever watch it again, but it will be a perfectly cromulent
         | distraction between clocking out and going to bed.
        
           | jrvarela56 wrote:
           | Whenever I think of AI and its implications, I find it useful
           | to think of our current version of the AI: The Market. Its
           | profit-maximizing function is the canonical paperclip
           | maximizer.
           | 
           | We are going to absolutely drown in crap. Just as we do now,
           | it's just going to flood the internet at an unimaginable
            | pace. We'll probably train AIs to help us find stuff and tell
            | us what's real/true/etc.
            | 
            | It's going to be one hell of an arms race.
        
         | peoplefromibiza wrote:
         | > So, in some channel at some time you may be able to see Brad
         | Pitt fighting against Godzilla while the monster is hacking
         | into the pentagon using ssh.
         | 
          | Don't know if this is sarcasm; if it is, ignore the rest of the
          | comment.
         | 
         | Honestly, it sounds terrible.
         | 
         | Good shows are well written, coherent and, most of all, narrow
         | in scope.
         | 
         | If an AI can write the next Better Call Saul, great.
         | 
          | Randomly patching together tropes sounds more like kids'
          | drawings, which are maybe interesting from an artistic point of
          | view, given their limited knowledge of reality and narrative,
          | but terribly boring and confusing as a form of entertainment.
          | 
          | Unless the audience is kids - they love that stuff, for reasons
          | we don't understand anymore as we grow up.
        
           | mrighele wrote:
            | > not well written, not coherent, and broad in scope.
           | 
           | > randomly patched together tropes
           | 
           | Sounds like a dream, not as in what I wish for, but what I
           | experience at night.
           | 
           | So, if you think about it like a tool to enable a form of
           | lucid dreaming, it may be something interesting.
           | 
            | Of course you have to find a way to get from your brain what
            | you want to see in "real time", but I think we will get
            | there.
        
             | peoplefromibiza wrote:
             | > So, if you think about it like a tool to enable a form of
             | lucid dreaming, it may be something interesting.
             | 
              | We usually call that tool psychedelic drugs.
              | 
              | There are devices being developed for that purpose; I don't
              | think they will ever be reliable, and AI is not necessary
              | for that.
             | 
             | On the philosophical implications of lucid dreams
             | 
             | https://en.m.wikipedia.org/wiki/Waking_Life
        
           | bigyikes wrote:
           | The point of the quoted remark is to illustrate the exotic
           | possibilities of "Multiverse TV". It's not an example of
           | quality content you /would/ watch, it's merely some content
           | you /could/ watch. Multiverse TV has everything, from Better
           | Call Saul to zany strings of tropes.
        
             | peoplefromibiza wrote:
             | The point is that nonsense "multiverse TV" has been a thing
             | since I can remember watching TV.
             | 
              | Endless entertainment is already there; you can't watch
              | it simply because most of the content you're talking about
              | is not on air and streaming platforms don't buy it, because
              | it's shit.
             | 
              | Not that I don't like shit - I've watched more
              | Troma/SyFy/random low-budget Asian movies (I am a huge fan
              | of ninja movies) than necessary - but if we stop to think
              | that there are already 6 or 7 Sharknado movies (which are
              | exactly the kind of endless entertainment you talk about;
              | they are probably generated in some way), maybe it's not
              | the volume of content that's missing, but content that's
              | worth watching.
        
           | mtkhaos wrote:
           | I think you are forgetting that a good portion of social
           | media users are used to short term content.
        
             | peoplefromibiza wrote:
              | I believe there's already a lot more content on social
              | media than time to watch it in 100 lifetimes.
        
               | visarga wrote:
               | But if you want to see something specific, it's often not
               | there unless you generate it.
        
               | peoplefromibiza wrote:
               | > But if you want to see something specific, it's often
               | not there unless you generate it.
               | 
               | Example?
               | 
               | I don't think I've ever wanted to watch something that
               | only a computer could generate.
        
           | namrog84 wrote:
           | I think a key piece here is missing.
           | 
            | txt2img is quite limited; img2img is really where the power
            | is, with a little intermittent guiding hand from a human.
            | 
            | What took 100s or 1000s of people to write, act, record and
            | post-process - Better Call Saul - might be doable by a team
            | 1/100th the size, possibly even by a single individual. Which
            | means that while it might not just instantly spit it out,
            | just like YouTube there will be an incredible amount of great
            | content to watch, far more than anyone could ever
            | realistically watch.
            | 
            | And of course there will be lots of utter trash as well.
            | 
            | But if it took 100 people to make Better Call Saul, now 100
            | individuals can make 100x different "Better Call Sauls".
        
           | xgkickt wrote:
           | Sturgeon's law will probably end up at 99%
        
           | idiotsecant wrote:
           | Did you read the whole post? Parent post was talking about
           | assisted movie generation where a human is making a movie and
           | using the AI as a tool to make the content. This will
           | _absolutely_ be an enormous thing in the next 10, 20 years
           | and it will lead to a creative revolution in the same way
           | that youtube did - entire genres that do not currently exist
           | will come to fruition by lowering the barriers to entry to a
           | huge number of creators.
           | 
           | I don't have any trouble finding youtube channels that I like
           | to watch and ignoring the rest, and I suspect I won't have
           | any trouble finding movies generated using AI as a production
           | tool that I want to watch either.
        
             | peoplefromibiza wrote:
             | > Did you read the whole post
             | 
              | As per the HN guidelines, don't ask this question.
             | 
             | > Parent post was talking about assisted movie generation
             | where a human is making a movie and using the AI as a tool
             | to make the content
             | 
             | which is exactly the problem we don't have.
             | 
              | There are thousands of scripts written every day that never
              | see the green light.
             | 
             | > and it will lead to a creative revolution
             | 
              | It won't.
              | 
              | The main reason content is not produced is money.
              | 
              | Unless you find a way to create an infinite supply of money
              | and an infinitely large paying audience for that content,
              | more content is a problem, not a solution.
             | 
             | > I don't have any trouble finding youtube channels that I
             | like to watch and ignoring the rest,
             | 
             | so what's the problem?
             | 
              | There's already infinite content out there; what does "AI"
              | bring to the table that will make any difference,
             | other than marketing, like 3D movies?
             | 
             | Have you watched any 3D movie recently?
        
               | motoxpro wrote:
               | It sounds like you are arguing FOR the GP's idea.
               | 
               | "there are thousands of scripts written everyday that
               | never see the green light." "main reason why content is
               | not produced is money."
               | 
               | So if there are plenty of ideas and not enough money, and
               | you could put those ideas into a box and spit out a movie
               | that would normally cost millions, that's good right?
        
               | bawolff wrote:
                | There is a lot more that goes into it than an idea. Like
               | that dude who has a great app "idea" and just needs
               | someone to implement it, but is surprised nobody takes
               | them up on it.
        
               | peoplefromibiza wrote:
               | > So if there are plenty of ideas and not enough money,
               | 
                | A big chunk of the budget is spent on marketing.
               | 
                | If you produce something that nobody watches, it's like
                | the sound of a tree falling where nobody can hear it.
               | 
               | if you know how to use AI to cut that cost, I'm all ears.
               | 
               | Also: Al Pacino will want his money if you use his name,
               | even if he is not actually acting in the movie.
               | 
                | The reality is that there are plenty of ideas, true, but
                | they would not make any money.
               | 
               | Studios don't like to work at loss.
               | 
               | Rick and Morty costs 1.5 million dollars per episode and
                | _from what we've heard from director Erica Hayes, a
               | single episode takes anywhere between 9 and 12 months to
               | create from ideation to completion_
        
               | MaxikCZ wrote:
               | >if you know how to use AI to cut that cost, I'm all
               | ears.
               | 
                | Cutting costs seems to be the main reason AI is being
                | explored. If you go to a studio asking for a budget to
                | create a movie and predict "10 000 people will watch it",
                | they will laugh in your face. If one person with the help
                | of AI can make the movie and 10 000 people will watch it,
                | it's a win for everyone involved.
                | 
                | I don't see YouTube channels having enormous budgets for
                | marketing, yet they find sizeable audiences and still make
                | a profit. Once you lower the cost of production, you don't
                | need huge marketing budgets to secure profits.
        
               | peoplefromibiza wrote:
                | > I don't see YouTube channels having enormous budgets
                | for marketing,
               | 
               | Because they mainly support one person.
               | 
               | You don't need a big budget to sell lemonade on the
               | street, you can make a salary out of it, doesn't mean you
               | have become a tycoon or have revolutionized the lemonade
               | stand industry.
               | 
                | > I don't see YouTube channels having enormous budgets
                | for marketing
               | 
                | Have you seen those ads every 15 seconds?
                | 
                | That's the marketing budget; the whole YouTube ad revenue
                | is the marketing budget.
        
             | bawolff wrote:
             | > I don't have any trouble finding youtube channels that I
             | like to watch and ignoring the rest, and I suspect I won't
             | have any trouble finding movies generated using AI as a
             | production tool that I want to watch either.
             | 
             | Not really the same - there is a range from good to bad on
             | youtube, because real people are adding the creative spark.
             | There is no reason to suspect AI will generate such a
              | range, and it's unclear we will ever get to the point where
             | AI can do "creativity" by itself.
        
               | wpietri wrote:
               | Exactly. I'm sure we'll see a lot of AI-assisted
               | production. But AI-originated and high quality? I don't
               | think I'll see it in my lifetime. (I do expect though,
               | that we'll see people claiming works as AI-created, as
               | the controversy will be stellar marketing.)
        
           | w-ll wrote:
            | Way back in 2019 someone used the deepfake tech of the time
            | to change what the new Lion King could look like. This is
            | what I think some of us are thinking of.
           | 
           | http://geekdommovies.com/heres-what-the-live-action-lion-
           | kin...
           | 
            | I have a decently old LG 3D TV that can actually turn 2D into
            | 3D, and it's actually a lot of fun to watch certain stuff in
            | 3D mode.
        
         | CobrastanJorji wrote:
          | Reminds me just a bit of the Culture series, where in the
          | distant future computing power is essentially infinite. In the
         | series, many of the great AIs of unfathomable intellect spend
         | their free time on the "Infinite Fun Space," which is
         | simulating universes with slightly different starting
         | conditions and physical laws.
        
         | jacobn wrote:
         | The model was trained on 2.3B images, but how many TV shows are
         | there to train it on?
         | 
         | There are quite a few books written, so maybe transfer learning
         | from that?
        
           | judge2020 wrote:
           | > how many TV shows are there to train it on?
           | 
           | None, according to the MPAA.
        
             | ronsor wrote:
             | ML training already involves scraping copyrighted content.
             | I'm sure big tech megacorps would fight any lawsuits they
             | receive.
        
         | ortusdux wrote:
         | Sounds like the "interdimensional cable" bits on Rick and
          | Morty. Reportedly the co-creators would get black-out drunk and
          | improvise the shows. My favorite is House Hunters
          | International, where ambulatory houses are being hunted by guys
         | with shotguns.
         | 
         | https://screenrant.com/rick-morty-interdimensional-cable-epi...
        
       | egypturnash wrote:
       | Aaaand the estimated percentage of images released under a CC
       | license, or public domain, iiiiis...?
        
         | [deleted]
        
       | brohee wrote:
       | > Strangely, enormously popular internet personalities like David
       | Dobrik, Addison Rae, Charli D'Amelio, Dixie D'Amelio, and MrBeast
       | don't appear in the captions from the dataset at all
       | 
        | Self-awareness here would have led to the removal of "enormously
       | popular".
        
       | ISL wrote:
       | That inclusion of Mickey in the model is waving a red flag in
       | front of an impressive bull.
        
         | tough wrote:
          | I was just thinking yesterday whether Mockey The Rat would fly
          | as homage/derivative - Mickey has a good 70 years on him
          | already, no? Copyright will die at the hands of ML, I'm afraid.
        
       | avocado2 wrote:
       | web demo for stable diffusion:
       | https://huggingface.co/spaces/stabilityai/stable-diffusion
       | 
        | github (includes GFPGAN, Real-ESRGAN, and a lot of other features):
       | https://github.com/hlky/stable-diffusion
       | 
       | colab repo (new): https://github.com/altryne/sd-webui-colab
       | 
       | demo made with gradio: https://github.com/gradio-app/gradio
        
       | supernova87a wrote:
       | My fundamental and maybe dumb question is: when is artificial
       | intelligence / ML going to get smarter than needing a billion
       | images to train?
       | 
       | Sure, the achievements of ML models lately are impressive, but
        | it's _so slow_ at learning. We are brute-forcing the DNNs, it
        | feels to me, which is not something that smacks of great
        | achievement.
       | 
       | You and I have never seen even 100,000 photos in our lives. Well,
       | maybe the video stream from our eyes is a little different. But
       | it's not a billion fundamentally different images.
       | 
       | Is there anything I can read about why it is so slow to learn?
       | How will it ever get faster? What next jump will fix this, or
       | what am I missing as a lay person?
        
         | xipho wrote:
         | > You and I have never seen even 100,000 photos in our lives.
         | Well, maybe the video stream from our eyes is a little
         | different. But it's not a billion fundamentally different
         | images.
         | 
          | I would argue precisely the opposite (as you allude to): it's
          | more than 100s of billions of fundamentally (what does this
          | even mean?) different images. Calculate the frequency at which
          | your eyes sample, think of the times the angle changes (new
          | images), multiply by your age, multiply by 2 for two eyes
          | looking in slightly different directions, factor in the noise
          | your brain has in forming the image "in your head" because you
          | drank too much... you can continue adding factors (hours of
          | "TV" watched on average) ad nauseam.
         | 
          | It seems that "slow to learn" has a "real" target/bound: what
          | humans are capable of. If it takes Bob Ross decades to paint
          | all the "images" in his head, then maybe we should go easy on
          | the algorithms?
        
         | PeterisP wrote:
         | Some experiments have demonstrated that being involved in the
         | 'generation' of the visual data (i.e. choosing where to move,
         | where to look, how to alter reality) gets significantly better
         | learning than passively receiving the exact same visual data.
         | 
         | Active learning is a good way to improve sample efficiency -
         | however, as others note, don't underestimate the quantity of
         | learning data that a human baby needs for certain skills even
         | with good, evolution-optimized priors.
        
         | namose wrote:
         | I think the thing that's missing is that the AI can't train
         | itself. If you were asked to draw a realistic x ray of a
         | horse's ribcage, you'd probably google image search, do some
         | research about horse anatomy, etc, before putting pen to paper.
         | This thing is being trained exactly once, and can't learn
         | dynamically. That'll be the next step I think.
        
           | andreyk wrote:
            | What you are describing is pretty much reinforcement
           | learning (or learning with access to a query-able knowledge
           | engine, or active learning, or all of these combined). There
           | is work on a bunch of variations of this, but it's true that
           | it's early days for combining it with generative systems.
        
         | 12ian34 wrote:
         | Most of us have our eyes open, looking at things ~16 hours a
         | day, with the first 18 years of our lives heavily focused on
         | learning about what those things are, plus we have the extra
         | brain capacity to remember those things, and to think about
         | them in an abstract manner. My entire photo library alone is
         | over 100,000 photos - and since I took all of them I will have
         | "seen" them.
        
         | andreyk wrote:
         | A couple of things:
         | 
         | * We have seen more than 100,000 "photos" in the sense that
         | photos are just images - if photos are just images, we have a
         | constant feed of "photos" every single moment our eyes are
         | open. Of course, that's not the same as these training
         | datasets, but it is still worth keeping in mind.
         | 
         | * All of these things trained on massive datasets with self-
         | supervised learning are in a sense addressing the "slowness" of
         | learning you mention, since self-supervised (aka no annotations
         | are needed beyond the data itself) "pre-training" on the
         | massive datasets can then enable training for downstream tasks
         | with way less data.
         | 
         | * Arguably requiring massive datasets for pre-training is still
         | a bit lame, but then again the 4-5 years of life it takes to
         | reach pretty advanced intelligence in humans represents a
         | whoooole lot of data. And as with self-supervised learning on
         | these massive models, a lot of intelligence seems to come down
         | to learning to predict the future from sensory input.
         | 
         | * Humans also come with a lot of pre-wiring done by evolution,
         | whereas these models are trained from scratch. Evolutionary
         | wiring represents its own sort of "pre-training", of course.
         | 
         | So basically, it is not so slow to learn as it seems. Arguably
         | it could get faster once we train multimodal models and
         | concepts from text can reinforce learning to understand images
          | and so on, and people are working on it (e.g. Gato). There may
         | also need to be a separation between low level 'instinct'
         | intelligence and high-level 'reasoning' intelligence; AI still
         | sucks at the second one.
        
           | supernova87a wrote:
           | I guess what I find really interesting is, how come we can
           | start to self-label data we encounter in the wild, yet the
           | DNN needs data to constantly be labeled at the same intensity
           | per image, into the billions?
        
             | simonw wrote:
             | If you look at the text labels for the data used by Stable
             | Diffusion you'll find that they are very low quality. Take
             | a look at some here:
             | 
             | https://laion-aesthetic.datasette.io/laion-
             | aesthetic-6pls/im...
             | 
             | Clearly quality of labeling isn't nearly as important once
             | you are training on billions of images.
        
             | fragmede wrote:
             | Textual Inversion is about to flip that around (if I'm
             | understanding the paper correctly).
             | 
             | https://textual-inversion.github.io/
        
             | andreyk wrote:
              | This is only true for multimodal learning, but yeah, in
             | that case we need text and image pairs. More than likely
             | it's possible to pretrain image and language separately,
             | and then use a vastly smaller number of pairs. But that's
             | hypothetical.
        
           | chse_cake wrote:
            | If we consider a frame rate of 60 FPS, then a 5-year-old
            | would have seen about 6.3 billion images [60 (frames) * 60
            | (seconds) * 60 (minutes) * 16 (waking hours) * 365 (days) * 5
            | (years)]. Even at 30 FPS you can halve that and it's still a
            | huge number.
           | 
            | A cool fact is that this model fits ~5B images into a 900M-
            | parameter model, which is tiny compared to the size of the
            | data.
        
           | fragmede wrote:
           | Yeah. Humans take a _long_ time to train. We spend years and
           | years, starting at birth, just absorbing everything around us
            | before we get to a point where we're considered adults.
        
             | erichocean wrote:
             | Human brains are also pre-trained at birth, on faces and a
             | whole bunch of other things.
        
             | supernova87a wrote:
              | Yet how do we do it with the "CPUs" in our heads, which
              | consume even less power than ARM chips?
        
               | noobermin wrote:
               | Because believe it or not matrices are not brains.
               | 
                | People need to get over the metaphors. If you spend your
                | time learning about the mathematics under the hood, there
                | will be fewer "mysteries" then.
        
         | Ukv wrote:
         | > You and I have never seen even 100,000 photos in our lives
         | 
         | If someone's 30, that'd only require seeing 10 images a day.
         | For most people that quota is probably fulfilled within a
         | couple of minutes of watching TV or browsing social media, even
         | if the video stream from our eyes otherwise counts as nothing.
         | 
         | We've also had about 4 billion years of evolution, slowly
         | adjusting our genome with an unfathomable amount of data.
         | Gradient descent is blazing fast by comparison.
        
           | omegalulw wrote:
            | Add to that that video nowadays is at least 24 fps. So a
            | two-hour movie suffices haha :)
        
         | simonw wrote:
         | I recommend looking into "transfer learning".
         | 
         | That's where you start with an existing large model, and train
         | a new model on top of it by feeding in new images.
         | 
         | What's fascinating about transfer learning is that you don't
         | need to give it a lot of new images, at all. Just a few hundred
         | extras can create a model that's frighteningly accurate for
         | tasks like image labeling.
         | 
         | This is pretty much how all AI models work today. Take a look
         | at the Stable Diffusion model card:
         | https://github.com/CompVis/stable-diffusion/blob/main/Stable...
         | 
          | They ran multiple training sessions with progressively smaller
          | (and higher quality) image sets to get the final result.
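         | 
         | A minimal sketch of that recipe with PyTorch/torchvision (the
         | folder name, class count and hyperparameters below are made-up
         | placeholders):
         | 
         |     import torch
         |     from torch import nn
         |     from torchvision import datasets, models, transforms
         | 
         |     # Start from a model pretrained on ImageNet, freeze it,
         |     # and only train a new final layer for our own labels.
         |     model = models.resnet50(weights="IMAGENET1K_V2")
         |     for p in model.parameters():
         |         p.requires_grad = False
         |     model.fc = nn.Linear(model.fc.in_features, 5)
         | 
         |     data = datasets.ImageFolder(
         |         "my_few_hundred_images/",
         |         transform=transforms.Compose([
         |             transforms.Resize(256),
         |             transforms.CenterCrop(224),
         |             transforms.ToTensor()]))
         |     loader = torch.utils.data.DataLoader(
         |         data, batch_size=32, shuffle=True)
         | 
         |     opt = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
         |     loss_fn = nn.CrossEntropyLoss()
         |     for epoch in range(3):  # a few passes is often enough
         |         for x, y in loader:
         |             opt.zero_grad()
         |             loss_fn(model(x), y).backward()
         |             opt.step()
         | 
         | Only the small final layer is learned from scratch; everything
         | else reuses what the big pretrained model already knows.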
        
       | guelo wrote:
       | The next internet crawl is going to have thousands of (properly
       | labeled) AI generated images. I wonder if that could throw these
       | algorithms into a bad feedback spiral. Though I guess there are
       | already classifiers that can be used to exclude AI generated
       | images. The goal is to profit off of free human labor after all.
        
         | practice9 wrote:
         | > The goal is to profit off of free human labor after all.
         | 
          | This is too inflammatory, in my opinion.
         | 
         | - some works are in public domain
         | 
         | - tech companies have profited from creators for a long time.
         | I'm sure some arrangement could be made for profit sharing for
         | artists who care about money, but it's too early for that (no
         | profits, I'm sure most companies are losing money on AI art)
         | 
         | - some artists care about art or fame more than money. Their
          | art will not be devalued by AI; if anything, constant usage of
         | their names in prompts is going to make them massively popular
         | and direct people to source material or merch, which they may
         | buy.
         | 
          | - some artists are dead and don't care anymore. Their "estate"
          | is vulnerable to takeovers by "long lost but recently found"
          | relatives, who don't care about art itself, only about money.
          | Many such stories.
         | 
          | One example, albeit in music rather than painting, is the Jimi
          | Hendrix Estate. They used to do copyright strikes on YouTube in
          | order to remove fan-made compilations of rare material (cleaned-
          | up sound of live concerts, multiple sources mixed into one,
          | etc.), without any intention to ever release an alternative.
        
       | thrdbndndn wrote:
       | > but often impossible with DALL-E 2, as you can see in this
       | Mickey Mouse example from my previous post
       | 
       | > "realistic 3d rendering of mickey mouse working on a vintage
        | computer doing his taxes" on DALL-E 2 (left) vs. Stable Diffusion
       | (right)
       | 
        | Well, but the Mickey Mouse on the right isn't "realistic", or
        | even 3D. It's straight up just a 2D Mickey image pasted there.
        
       | dmitriid wrote:
       | > Nearly half of the images, about 47%, were sourced from only
       | 100 domains, with the largest number of images coming from
       | Pinterest
       | 
       | This makes me vaguely uneasy. All these models and tools are
       | almost exclusively "western".
        
         | tough wrote:
         | > https://github.com/gradio-app/gradio
         | 
          | Plenty of Asian artists / styles in the datasets, no?
        
       | gpm wrote:
       | Huh, there's a ton of duplicates in the data set... I would have
       | expected that it would be worthwhile to remove those. Maybe
       | multiple descriptions of the same thing helps, but some of the
       | duplicates have duplicated descriptions as well. Maybe
       | deduplication happens after this step?
       | 
       | http://laion-aesthetic.datasette.io/laion-aesthetic-6pls/ima...
        
         | minimaxir wrote:
         | Per the project page: https://laion.ai/blog/laion-400-open-
         | dataset/
         | 
         | > There is a certain degree of duplication because we used
         | URL+text as deduplication criteria. The same image with the
         | same caption may sit at different URLs, causing duplicates. The
         | same image with other captions is not, however, considered
         | duplicated.
         | 
         | I am surprised that image-to-image dupes aren't removed,
         | though, as the cosine similarity trick the page mentions would
         | work for that too.
        
           | kaibee wrote:
           | I assume having multiple captions for the same image is very
           | helpful actually.
        
             | minimaxir wrote:
             | Scrolling through the sorted link from the GP, there are a
             | few dupes with identical images and captions, so that
             | doesn't always work either.
        
           | djoldman wrote:
           | At a minimum a hash should be computed for each image and
           | dupes removed. I haven't read the paper so they might have
           | already done so.
        
           | gchamonlive wrote:
           | Isn't it really expensive to dedupe images based on content?
           | As you have to compare every image to every other image in
           | the dataset?
           | 
           | How could one go about deduping images? Maybe using something
           | similar to rsync protocol? Cheap hash method, then a more
            | expensive one, then a full comparison, maybe. Even so, 2B+
            | images... and you are talking about saving mostly on storage
            | costs, which are quite cheap these days.
        
             | gpm wrote:
              | I don't have experience with image deduplication, but if
              | you can make a decent hash, a 2.3-billion-item hashtable is
              | really cheap.
             | 
             | If you need to do something closer to pairwise (for
             | instance, because you can't make a cheap hash of images
             | which papers over differences in compression), make the
             | hash table for the text descriptions, then compare the
              | images within buckets. Of the 5 or 6 text fields I just
              | spot-checked (not even close to a random selection), the
              | worst false positive I found (in the 12M data set) was 3
              | pairs of duplicates with the same description. On the other
              | hand, I found one set of 76 identical images with the same
              | description.
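             | 
             | As a rough sketch of the bucketing idea ('rows' and
             | 'images_match' below are placeholders for however you load
             | caption/image pairs and do the expensive comparison):
             | 
             |     from collections import defaultdict
             | 
             |     buckets = defaultdict(list)   # caption -> [image, ...]
             |     for caption, image in rows:
             |         buckets[caption].append(image)
             | 
             |     # The expensive pairwise check only runs inside a
             |     # bucket, so the cost is sum(len(b)**2), not n**2.
             |     for caption, images in buckets.items():
             |         for i in range(len(images)):
             |             for j in range(i + 1, len(images)):
             |                 if images_match(images[i], images[j]):
             |                     print("duplicate under:", caption)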
        
             | wongarsu wrote:
             | There are hash algorithms for image similarity. For a toy
             | example, imagine scaling the image to 8x8px, making it
             | grayscale, and using those 64 bytes as hash. That way you
             | only have to hash each picture once, and can find
             | duplicates by searching for hashes with a low hamming
             | distance (number of bit flips) to each other, which is very
             | fast.
             | 
              | Of course actual hash algorithms are a bit cleverer; there
              | are a number to choose from depending on what you want to
              | consider a duplicate (cropping, flips, rotations, etc.).
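             | 
             | The toy version really is only a few lines with Pillow
             | (thresholding down to 64 bits rather than keeping all 64
             | bytes; the file names and distance cutoff here are
             | arbitrary):
             | 
             |     from PIL import Image
             | 
             |     def average_hash(path):
             |         # 8x8 grayscale; bit = pixel brighter than mean
             |         img = Image.open(path).convert("L").resize((8, 8))
             |         px = list(img.getdata())
             |         mean = sum(px) / 64
             |         bits = (1 << i for i, p in enumerate(px) if p > mean)
             |         return sum(bits)
             | 
             |     def hamming(a, b):
             |         return bin(a ^ b).count("1")
             | 
             |     h1 = average_hash("cat.jpg")
             |     h2 = average_hash("cat_recompressed.jpg")
             |     if hamming(h1, h2) <= 5:   # only a few bits differ
             |         print("probably the same picture")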
        
               | ClassyJacket wrote:
                | But even a single pixel being one shade brighter would
                | make the hash completely different - that's the point of
                | hashes.
        
               | fjkdlsjflkds wrote:
               | You're probably confusing "cryptographic hash functions"
               | with "perceptual hashing" (or other forms of "locality-
               | sensitive hashing"). In the case of the latter, what you
               | say is almost always not true (that's the point of using
               | "perceptual hashing" after all: similar objects get
               | mapped to similar/same hash).
               | 
               | See: https://en.wikipedia.org/wiki/Perceptual_hashing
        
             | visarga wrote:
             | No, you embed all images with CLIP and use an approximate
             | nearest neighbour library (like faiss) to get the most
             | similar ones to the query in logarithmic time. Embedding
             | will also be invariant to small variations.
             | 
             | You can try this on images.yandex.com - they do similarity
             | search with embeddings. Upload any photo and you'll get
             | millions of similar photos, unlike Google that has only
             | exact duplicate search. It's diverse like Pinterest but
             | without the logins.
             | 
             | Query image: https://cdn.discordapp.com/attachments/1005626
             | 182869467157/1...
             | 
             | Yandex similarity search results: https://yandex.com/images
             | /search?rpt=imageview&url=https%3A%...
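             | 
             | A sketch of that pipeline with faiss (the embeddings file
             | and the 0.97 cutoff are made up; at billions of images you
             | would use one of faiss's approximate indexes rather than a
             | flat one):
             | 
             |     import faiss
             |     import numpy as np
             | 
             |     # CLIP embeddings computed beforehand, one row per image
             |     emb = np.load("clip_embeddings.npy").astype("float32")
             |     faiss.normalize_L2(emb)   # cosine via inner product
             | 
             |     index = faiss.IndexFlatIP(emb.shape[1])
             |     index.add(emb)
             | 
             |     # k=2: the image itself plus its closest other image
             |     scores, ids = index.search(emb, 2)
             |     for i in range(len(emb)):
             |         if scores[i, 1] > 0.97:   # near-duplicate candidate
             |             print(i, "looks like", int(ids[i, 1]))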
        
             | ALittleLight wrote:
             | You don't have to compare all images to one another and
             | doing so wouldn't reliably dedupe - what if one image is
             | slightly different resolution, different image type,
              | different metadata, etc.? They would have different hashes
              | but still be basically the same data.
             | 
             | I think the way you do it is to train a model to represent
             | images as vectors. Then you put those vectors into a BTree
             | which will allow you to efficiently query for the "nearest
             | neighbor" to an image on log(n) time. You calibrate to find
             | a distance that picks up duplicates without getting too
             | many non-duplicates and then it's n log(n) time rather than
             | n^2.
             | 
             | If that's still too slow there is also a thing called ANNOY
             | which lets you do approximate nearest neighbor faster.
        
             | minimaxir wrote:
             | Convert the images to embeddings, and perform an
             | approximate nearest neighbor search on them and identify
             | images that are very close together (e.g. with faiss, which
             | the page alludes to using).
             | 
             | It's performant enough even at scale.
        
             | [deleted]
        
             | acdha wrote:
             | It depends on exactly what problem you're trying to solve.
             | If the goal is to find the same image with slight
             | differences caused by re-encoding, downsampling, scaling,
             | etc. you can use something like phash.org pretty
             | efficiently to build a database of image hashes, review the
             | most similar ones, and use it to decide whether you've
             | already "seen" new images.
             | 
             | That approach works well when the images are basically the
             | same. It doesn't work so well when you're trying to find
             | images which are either different photos of the same
             | subject or where one of them is a crop of a larger image or
             | has been modified more heavily. A number of years back I
             | used OpenCV for that task[1] to identify the source of a
             | given thumbnail image in a larger master file and used
             | phash to validate that a new higher resolution thumbnail
             | was highly similar to the original low-res thumbnail after
             | trying to match the original crop & rotation. I imagine
             | there are far more sophisticated tools for that now but at
             | the time phash felt basically free in comparison the amount
             | of computation which OpenCV required.
             | 
             | 1. https://blogs.loc.gov/thesignal/2014/08/upgrading-image-
             | thum...
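             | 
             | With the Python "imagehash" library (which implements a
             | pHash-style perceptual hash), that "have I seen this
             | already" check is roughly the following; the linear scan is
             | only to show the idea, a real system would index the hashes:
             | 
             |     from PIL import Image
             |     import imagehash   # pip install imagehash
             | 
             |     seen = {}   # hash -> path of the first copy we saw
             | 
             |     def already_seen(path, max_distance=4):
             |         h = imagehash.phash(Image.open(path))
             |         for known, first in seen.items():
             |             if h - known <= max_distance:  # Hamming distance
             |                 return first
             |         seen[h] = path
             |         return None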
        
       | [deleted]
        
       ___________________________________________________________________
       (page generated 2022-08-31 23:02 UTC)