[HN Gopher] Exploring 12M of the 2.3B images used to train Stabl...
___________________________________________________________________
Exploring 12M of the 2.3B images used to train Stable Diffusion
Author : detaro
Score : 427 points
Date : 2022-08-30 21:39 UTC (1 day ago)
(HTM) web link (waxy.org)
(TXT) w3m dump (waxy.org)
| wongarsu wrote:
| This is quite interesting. It really makes me wonder how much
| of the difference between Stable Diffusion, DALL-E 2 and
| MidJourney is due to different architectures and training
| intensity, and how much is due to different datasets.
|
| For example Stable Diffusion knows much better than MidJourney
| what a cat looks like, MidJourney knows what a Hacker Cat looks
| like, while Stable Diffusion doesn't (you can tell it to make a
| cat in a hoodie in front of a laptop, but it won't come up with
| that on its own). Meanwhile for landscapes Stable Diffusion seems
| to have no problem with imagination. How much of that is simply
| due to blindspots in the training data?
| lovelearning wrote:
| > Unsurprisingly, a large number came from stock image
| sites...Adobe Stock's...iStockPhoto...Shutterstock
|
| Are they ok with their stock photos being used to train a service
| that's likely to bite into their stock photo business?
| PeterisP wrote:
| The general understanding is that no one cares whether they're
| OK with that, as training a model (even for a competing
| service) is not among the exclusive rights of copyright owners
| where their permission is required. If you have a legally
| obtained copy, you can use it as you want (except for the
| specific, explicitly enumerated use cases that copyright law
| awards exclusively to the copyright holder), whether the
| copyright holder likes it or not.
| NIL8 wrote:
| It feels like a tsunami is coming and we have no idea how big it
| will be.
| [deleted]
| [deleted]
| wil421 wrote:
| Like the self driving revolution? Or the bitcoin/blockchain
| revolution?
|
| Personally, I'm not even getting out my popcorn yet.
| CuriouslyC wrote:
| Real people are creating real things with this tech right
| now. Beyond that, people are enthusiastically building on
| this technology to create higher level tools. This will only
| be able to go so far with the stable diffusion model, but the
| ceiling is still very high with what we already have, and
| given the pace of model progress we can realistically expect
| the next 10 years or so to be absolutely transformative for
| art, and probably after that writing and music.
| bottled_poe wrote:
| Fair position given the failure of crypto to live up to the
| revolutionary hype.
|
| This is clearly different. The value has been demonstrated -
| and it has clear implications for a lot of jobs.
| simonw wrote:
| Very different. This stuff is genuinely useful already, and
| is getting more effective every day.
| marktolson wrote:
| Bitcoin/blockchain doesn't have any intrinsic value other
| than to those who believe in it. Self-driving cars (Level
| 4-5) are not available to the public and are still in
| development. This stuff is real, produces some incredible
| results, is available to the public, and is advancing at a
| rapid rate.
| cdata wrote:
| The output of these models seems really impressive, but for
| my money the notion that it has _value_ is undermined by
| the way its trainers keep "proprietary" data that is
| likely to be in violation of image usage rights at a large
| scale. What is the true value of something that can only be
| had at the other end of misbegotten
| extraction/exploitation? It seems like a similar trade-off
| to the one that web3 proponents are asking us to make. The
| apparent end-game is that we'll kill off all the true value
| creators - the working artists responsible for the source
| data - and all we'll be left with is an artifact of their
| works.
| nprateem wrote:
| No, real artists will just become artisans, like any
| producers of handmade goods.
| axg11 wrote:
| Self driving cars are on the road. Any two people on Earth
| are able to trustlessly transmit value between them using
| Bitcoin. You seem cynical.
| astrange wrote:
| Any two wallet addresses are able to. That doesn't mean the
| people are. By abstracting away the actual process of getting
| and using the Bitcoin on both ends, you've lost all real-world
| detail.
|
| ...and they can still permanently lose it all to typos or
| fees.
| hotpotamus wrote:
| Google still can't reliably figure out that an email from
| jpm0r4ncha$e telling me how much money I've won is spam.
| Once they nail that down, then maybe I'll step inside one
| of their self driving cars. Until then, I'll laugh at the
| video where the Tesla flattens a child-sized mannequin.
| simonw wrote:
| If anyone is interested in the technical details, the database
| itself is a 4GB SQLite file which we are hosting with Datasette
| running on Fly.
|
| More details in our repo:
| https://github.com/simonw/laion-aesthetic-datasette
|
| Search is provided by SQLite FTS5.
| sdwr wrote:
| The search speed is amazing!! Do you have to do a lot of pre-
| indexing to get it so fast?
| simonw wrote:
| It's SQLite's built-in FTS index, nothing special on top of
| it. I built the index by running:
|
|     sqlite-utils enable-fts data.db images text
|
| https://sqlite-utils.datasette.io/en/stable/cli.html#configu...
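|
| For example, a minimal Python sketch of querying that index
| (this assumes the sqlite-utils default of a virtual FTS table
| named "images_fts" shadowing the "text" column of "images";
| the search term is just an illustration):
|
|     import sqlite3
|
|     conn = sqlite3.connect("data.db")
|     rows = conn.execute(
|         """
|         SELECT images.text FROM images
|         JOIN images_fts ON images.rowid = images_fts.rowid
|         WHERE images_fts MATCH ?
|         ORDER BY images_fts.rank LIMIT 10
|         """,
|         ("thomas kinkade",),
|     ).fetchall()
|     for (caption,) in rows:
|         print(caption)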
| sdwr wrote:
| Are you running on anything special compute-wise? I have a
| budget node running Mongo, it takes almost a second to
| fetch a single 1MB document.
|
| Writing it out I realize it's not indexed by the attribute
| I'm retrieving by...
| simonw wrote:
| I started this on a Fly instance with 256MB of RAM and a
| shared CPU. This worked great when it was just a couple
| of people testing it.
|
| Once it started getting traffic it started running a bit
| slow, so I bumped it up to a 2 CPU instance with 4GB of
| RAM and it's been fine since then.
|
| The database file is nearly 4GB and almost all memory is
| being used, so I guess it all got loaded into RAM by
| SQLite.
|
| I'll scale it back down again in a few days' time, once
| interest in it wanes a bit.
| isoprophlex wrote:
| Use the index, Luke!
| snovv_crash wrote:
| But MongoDB is webscale!
| yojo wrote:
| I notice a surprising number of duplicates. E.g. if I sort by
| aesthetic, there's the same 500x500 Tuscan village painting
| multiple times on the first page of results.
|
| Presumably it wouldn't be so hard to hash the images and filter
| out repeats. Is the idea to keep the duplicates to preserve the
| description mappings?
| bambax wrote:
| I noticed this too, with the same description every time. How
| does this work in the model? Does this give repeated images a
| bigger weight?
|
| It's surprising that these weren't filtered out, and it would
| be interesting to know the number of unique images. (When it
| is mentioned that a model was trained on 10 billion images,
| for example, if each image is repeated 5 times then the
| actual number of unique images is obviously 2 billion, not
| 10.)
| jadbox wrote:
| Are there plans to expand the model to be even larger?
| rektide wrote:
| So excellent. Flipping the story we see all the time on its
| head. AI's quasi-mystical powers are endless spectacle; a look
| through the other side of the looking glass is vastly overdue.
| Amazing work.
|
| We're just starting to scratch the surface: 2% of the data
| gathered, and the sources identified. These are a couple of
| sites we can now name as the primary powerers of AI, yet the
| content itself has barely been reviewed or dived into. We have
| so little sense of and appreciation for what lurks beneath,
| but this is a start.
| benreesman wrote:
| It is an unambiguous social necessity to demystify these
| things.
|
| In a world where the lay public didn't really know about
| photoshop, photoshop would be a terrifying weapon.
|
| Likewise modern ML is for the most part mysterious and/or
| menacing because it's opaque and arcane and mostly controlled
| by big corporate R&D labs.
|
| Get some charismatic science popularizers out there teaching
| people how it works, and all of a sudden it's not such a big
| scary thing.
| revskill wrote:
| It's hard for users to produce the expected results without
| clear guidance on correct grammar, contextual information,
| etc.
| withinboredom wrote:
| NSFW is entertaining. It tends to think "knobs" in the prompt
| mean "female breasts" which is annoying.
| cakeface wrote:
| In which I run `order by punsafe desc` and immediately regret it.
| nprateem wrote:
| Although it did show me that my masseuse isn't as skilled as I
| thought
| rom1504 wrote:
| Hi, laion5b author here,
|
| Nice tool!
|
| You can also explore the dataset here:
| https://rom1504.github.io/clip-retrieval/
|
| Thanks to approximate knn, it's possible to query and explore
| that 5B dataset with only 2TB of local storage; anyone can
| download the knn index and metadata to run it locally too.
|
| Regarding duplicates, indeed it's an interesting topic!
|
| Laion5b deduplicated samples by url+text, but not by image.
|
| To deduplicate by image you need an efficient way to compute
| whether images a and b are the same.
|
| One idea is to compute a hash based on CLIP embeddings. A
| further idea would be to train a network actually good at
| dedup, and not only similarity, by training on positive and
| negative pairs, e.g. with a triplet loss.
|
| Here's my plan on the topic
| https://docs.google.com/document/d/1AryWpV0dD_r9x82I_quUzBuR...
|
| If anyone is interested in participating, I'd be happy to
| guide them. This is an open effort: just join the laion
| discord server and let's talk.
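|
| To make the embedding idea concrete, here's a rough sketch in
| Python. It assumes you already have an (N, d) numpy array of
| CLIP image embeddings, and the 0.97 threshold is a made-up
| starting point you'd need to tune:
|
|     import numpy as np
|
|     def near_duplicate_pairs(emb, threshold=0.97):
|         # Normalize rows so a dot product is cosine similarity.
|         emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)
|         sims = emb @ emb.T  # (N, N) pairwise cosine similarity
|         # Upper triangle only, so each pair is reported once.
|         return np.argwhere(np.triu(sims, k=1) > threshold)
|
| The O(N^2) similarity matrix is only feasible for small N; at
| 5B scale you'd go through the approximate knn index instead.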
| yreg wrote:
| I have been using the rom1504 clip retrieval tool[0] up until
| now, but the Datasette browser[1] seems much better for Stable
| Diffusion users.
|
| When my prompt isn't working, I often want to check whether the
| concepts I use are even present in the dataset.
|
| For example, inputting `Jony Ive` returns pictures of Jony Ive
| in Datasette and pictures of apples and dolls in clip
| retrieval.
|
| (I know laion 5B is not the same as laion aesthetic 6+, but
| that's a lesser issue.)
|
| [0] - https://rom1504.github.io/clip-retrieval/
|
| [1] - https://laion-aesthetic.datasette.io/laion-aesthetic-6pls/im...
| rom1504 wrote:
| This is due to the aesthetic scoring in the UI. Simply
| disable it if you want precise results rather than aesthetic
| ones.
|
| It works for your example.
|
| I guess I'll disable it by default, since it seems to confuse
| people.
| rom1504 wrote:
| Done: https://github.com/rom1504/clip-retrieval/commit/53e3383f58b...
|
| Using CLIP for searching is better than direct text
| indexing for a variety of reasons; here, for example,
| because it better matches what Stable Diffusion sees.
|
| Still, it's interesting to have a different view over the
| dataset!
|
| If you want to scale this out, you could use Elasticsearch.
| yreg wrote:
| I see, thanks! I didn't realize that; I thought I wanted to
| keep aesthetic scoring enabled, since Stable Diffusion was
| trained on LAION-Aesthetics.
|
| ---
|
| Also: There is a joke to be made at Jony's expense
| regarding the need to turn off aesthetic scoring to see his
| face.
| ijk wrote:
| You are probably very aware of it, but just to highlight the
| importance of this for people who aren't aware: data
| duplication degrades the training and makes memorization (and
| therefore plagiarism, in the technical sense) more likely. For
| language models, this includes near-similarities, which I'd
| guess would extend to images.
|
| Quantifying Memorization Across Neural Language Models
| https://arxiv.org/abs/2202.07646
|
| Deduplicating Training Data Makes Language Models Better
| https://arxiv.org/abs/2107.06499
| https://twitter.com/arankomatsuzaki/status/14154721921003397...
| https://twitter.com/katherine1ee/status/1415496898241339400
| fareesh wrote:
| How would one go about adding more data to the dataset?
|
| Would one need to retrain on the entire dataset? Or is there
| typically a way to just add an incremental batch?
| nbzso wrote:
| Thinking over this "magical" tech: a distinctive painting
| style is one of the biggest goals for an artist. Producing a
| unique stroke and clear expression takes years of hard work
| and experimentation.
|
| An artist decides to sell prints of an expensive artwork, and
| publishes a photo on their website. AI scrapers pick up the
| image in a dataset update. Game over for the artist.
|
| I hope for a class action over these training data sets. I
| get that kids have fun with the new photoshop filters. I get
| that software is eating "the world", but someone must wake up
| and push the kill switch. It is possible.
| sennight wrote:
| Sounds like a hopeless protectionist endeavor, reminiscent of
| cartoonish Keynesian economics busywork: "We can't permit
| development of adding machines, what about all the hard work
| people have put into memorizing multiplication tables?!".
| nbzso wrote:
| No. Sounds like common sense, which is not popular in the
| tech community nowadays. Data is the new "petrol". Since when
| is petrol free of charge? People en masse are clueless. If
| you use my "human" accomplishments as an energy source, you
| must pay me. Period.
| sennight wrote:
| > Since when is petrol free of charge?
|
| At around the same time that it actually resembles "the new
| data", or maybe even shares a single quality with it? Being
| a physical object, it is bound by physical properties...
| like scarcity. Data, being an abstract concept, suffers no
| such constraint. Same story for whatever artistic technique
| you've imagined to be not only valuable, but novel to all
| of human experience and wholly owned by you and you alone.
| Your valuation of your worth and that of your labor is
| laughably overinflated and the market is telling you so.
| Period.
| ks2048 wrote:
| They need a feature that, for a given generation, shows the
| nearest image in the training set. It is clearly doing more than
| "memorizing", but for some "normal" queries, how do you know the
| output isn't very similar to some training image? That could have
| legal implications.
| mkl wrote:
| I don't think anyone has a definition of "nearest" that could
| accomplish that in general. Comparing pixel data is easy, but
| comparing the subjects portrayed and how is much harder to pin
| down.
| speedgoose wrote:
| You could try the reverse search in a images search engine.
| Google or Bing support that for example.
| epups wrote:
| As expected, very few NSFW images were included in the training
| set, according to this. They are more afraid of showing a penis
| than showing Mickey Mouse.
| karpathy wrote:
| Data is an excellent place to look to get a sense of where the
| model is likely to work or not (what kinds of images), and for
| prompt design ideas, because, roughly speaking, the probability
| of something working well is proportional to its frequency (or
| that of things very similar to it) in the data.
|
| The story is more complex though, because the data can often
| be quite far away from the actual neural net training due to
| preprocessing steps, data augmentations, oversampling settings
| (it's not uncommon to not sample data uniformly during
| training), etc. So my favorite way to scrutinize this is to
| build a "batch explorer": during training one dumps batches
| into pickles immediately before the forward pass of the neural
| net, then writes a separate explorer that loads the pickles
| and visualizes them, to "see exactly what the neural net sees"
| during training. Ideally one then spends some quality time
| (~hours) looking through batches to get a qualitative sense of
| what is likely to work or not work, and how well. Of course
| this is also very useful for debugging, as many bugs can be
| present in the data preprocessing pipeline. But a batch
| explorer is harder to obtain here because you'd need the full
| training data/code/settings.
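|
| A minimal sketch of the idea in Python (the loop structure and
| field names here are placeholders, not from any real training
| code):
|
|     import glob
|     import pickle
|
|     def train(model, loader, optimizer, dump_n=50):
|         for step, (images, captions) in enumerate(loader):
|             if step < dump_n:
|                 # Dump the batch exactly as the forward pass
|                 # will see it, augmentations and all.
|                 with open(f"batch_{step:04d}.pkl", "wb") as f:
|                     pickle.dump({"images": images,
|                                  "captions": captions}, f)
|             loss = model(images, captions)
|             optimizer.zero_grad()
|             loss.backward()
|             optimizer.step()
|
|     def explore_batches():
|         # The separate "batch explorer": load the pickles
|         # offline and eyeball what the net actually saw.
|         for path in sorted(glob.glob("batch_*.pkl")):
|             with open(path, "rb") as f:
|                 batch = pickle.load(f)
|             # ...visualize batch["images"] alongside
|             # batch["captions"] here.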
| jimsimmons wrote:
| With big generative models, seeing data even once is more than
| sufficient to memorize it. So the claim that performance
| relates to frequency is not exactly correct.
|
| The whole point of this model class is that one can learn one
| word from one sample, another pixel from another, and so on,
| to master the domain. The emergent, non-trivial generalization
| is what makes them so fascinating. There is no simple,
| linear/first-order relationship between data and behaviour.
| Case in point: GPT-3 can do few-shot learning despite not
| having seen any explicitly few-shot-formatted data during
| training.
|
| Not saying you are wrong, but the story is not as simple as in
| classical supervised learning with small datasets.
| galangalalgol wrote:
| What does that say about how these models will behave as an
| increasingly large portion of their training data is outputs
| from similar models? Our curation of the outputs will
| hopefully help. And if one image really is enough, perhaps
| the smaller number of human created images will be sufficient
| to inject new stuff rather than stagnating?
| TaylorAlexander wrote:
| "The most frequent artist in the dataset? The Painter of Light
| himself, Thomas Kinkade, with 9,268 images."
|
| Oh that's why it is so good at generating Thomas Kinkade style
| paintings! I ran a bunch of those and they looked pretty good.
| Some kind of garden cottage prompt with Thomas Kinkade style
| works very well. Good image consistency with a high success rate,
| few weird artifacts.
| Quiark wrote:
| Can you come up with a prompt that will reproduce one of his
| paintings almost exactly?
| TaylorAlexander wrote:
| Not sure.
| dr_dshiv wrote:
| I'm curious about this. Can it plagiarize? Can GPT-3? Why or
| why not?
| z3c0 wrote:
| I've noticed an affinity towards "The Greats" of modern
| painting. I've gotten incredible results from using Dali,
| Picasso, Bacon, Lichtenstein, etc. My luck with slightly-less-
| known artists of similar styles doesn't work as well (eg
| Braque, Guayasamin, or Gris, as opposed to Picasso)
| ryandv wrote:
| [deleted]
| [deleted]
| r3trohack3r wrote:
| This has been an incredibly helpful tool today for exploring
| Stable Diffusion.
|
| I'm starting to realize Stable Diffusion doesn't understand many
| words, but it's hard to tell which words are causing it problems
| when engineering a prompt. Searching this dataset for a term is a
| great way to tell whether Stable Diffusion is likely to
| "understand" what I mean when I say that term; if there are few
| results, or if the results aren't really representative of what I
| mean, Stable Diffusion is likely to produce garbage outputs for
| those terms.
| Imnimo wrote:
| It is interesting that there are at least a few images in the
| dataset that were generated by previous diffusion methods:
|
| https://laion-aesthetic.datasette.io/laion-aesthetic-6pls/im...
|
| (Surely there are many more that don't have this specific label).
| htrp wrote:
| Turtles all the way down
| thrdbndndn wrote:
| Why not just add a shortcut to search for the related images
| from these artist-count tables, etc.?
|
| Also, a UI issue: the sorting arrow feels wrong:
|
| https://i.imgur.com/uyUXAXy.png
|
| The norm is that when the arrow is pointing down, the data is
| currently sorted descending. (I'm aware you can interpret it
| as "what will happen if you click it", but the norm is to show
| the current state.)
| simonw wrote:
| That's the exact problem: should the button show what will
| happen, or what is currently happening?
|
| When I designed that feature I looked at a bunch of systems and
| found examples of arrows in both directions.
| thrdbndndn wrote:
| At least Excel and Google Sheets both display it the way I
| described. Add Apple Numbers (which, unfortunately, I never
| use) and I think that covers something like 90% of actual use
| cases, enough to be considered the "convention".
| lmarcos wrote:
| I always had the crazy idea of "infinite entertainment": somehow
| we manage to "tap" into the multiverse and are able to watch TV
| from countless planets/universes (I think Rick and Morty did
| something similar). So, in some channel at some time you may be
| able to see Brad Pitt fighting against Godzilla while the monster
| is hacking into the pentagon using ssh. Highly improbable, but in
| the multiverse TV everything is possible.
|
| Now I think we don't need the multiverse for that. Give this AI
| technology a few years and you'll have streaming services a la
| Netflix where you provide the prompt to create your own movie.
| What the hell, people will vote "best movie" among the millions
| submitted by other people. We'll be movie producers like we are
| nowadays YouTubers. Overabundance of high quality material and so
| little time to watch them all. Same goes for books, music and
| everything else that is digital (even software?).
| thatfrenchguy wrote:
| I'm curious how much this prevents new styles from emerging
| though, rather than rehashing of things that already exist.
| StrictDabbler wrote:
| "Highly improbable, but in the multiverse TV everything is
| possible."
|
| Quick reminder, there are infinitely many even numbers and none
| of them are odd.
|
| A given infinite (or transfinite) set does not necessarily
| contain all imaginable elements.
| Invictus0 wrote:
| Ok... I don't think anyone would expect an infinite supply of
| water to also include vodka and wine. He specifically said
| "infinite TV".
| StrictDabbler wrote:
| Does any universe have a "just a completely dark room"
| channel? How many dark room channels are there?
|
| Is there a universe with channels focusing on these
| subjects:
|
| "Video of the last tears of terminal cancer patients as
| Jerry Lewis tells jokes about his dick"
|
| "This guy doesn't like ice cream but he eats it to reassure
| his girlfriend that he isn't vegan"
|
| "A single hair grows on a scalp"
|
| "Infant children read two-century-old stories about
| shopping for goose-grease and comment on the prosody"
|
| "Gameshows where an entire country's left-handed population
| guesses how you'll die"
|
| This is the whole point of the Rick and Morty cable bit.
| There are things that would not be on TV in any universe
| that invents TV. It's hilarious to pretend they would be.
| Invictus0 wrote:
| It's clear you have a misunderstanding of infinity.
|
| In a show, there are a certain number of frames. In one
| frame, there are a certain number of pixels. Each pixel
| can be one of some number of colors. An infinite TV would
| be able to show every combination of every color of
| pixels, followed by every combination of frames,
| simultaneously and forever. All those shows are in there.
| Not only that, but all of this is also countably
| infinite.
| crabmusket wrote:
| Don't forget to posit an infinite curator who watches
| every channel on the infinite TV and responds to queries
| for channels with something sensible, not a random garble
| of pixels.
| StrictDabbler wrote:
| That's not a multiverse cable situation and there are
| _way_ more kinds of infinity than just countable and
| uncountable.
|
| In a multiverse situation you're watching actual content
| from an infinity of universes where beings exist who have
| produced and selected that content.
|
| You're not watching an infinite amount of static and
| magically selecting the parts of static that are
| coincidentally equivalent to specific content.
|
| All multiverse content must be watchable and creatable by
| the kind of creature that creates a television. So
| content that is unwatchable or uncreatable by any
| conceivable creature will not be in that infinite set.
|
| It is very easy to describe impossible content and any
| impossible content will not be on the multiverse TV.
|
| Trivial counterexamples that are describable but
| uncreatable, in any universe similar enough to ours that
| it has television:
|
| -a channel that reruns actual filmed footage of a given
| universe's big-bang will not exist.
|
| -a channel that shows *only* accurate, continuous footage
| of the future of that universe.
|
| -a channel that shows the result of dividing by zero.
|
| Channels that may or may not be uncreatable:
|
| -a channel that induces in the viewer the sensation of
| smelling their mother's hair.
|
| -a channel that causes any viewer to eat their own feet.
|
| -a channel that cures human retinal cancer.
|
| -a channel that shows you what you are doing, right now,
| in your own home in this universe, like a mirror. Note
| that this requires some connection between our universe
| and the other universe and there's no guarantee, in a
| multiverse situation, that connections between the
| universes are also a complete graph.
|
| These examples are more important. We know they are
| either possible or impossible but we do not know which.
| Just saying "multiverses are infinite" doesn't answer the
| question.
|
| For further reading, review
| https://en.wikipedia.org/wiki/Absolute_Infinite and
| remember that a channel is a defined set.
| temp_account_32 wrote:
| Isn't everything representable in a digital form? I think we're
| in the very early era of entertainment becoming commoditized to
| an even higher degree than it is now.
|
| I envision exactly the future as you describe: Feed a song to
| the AI, it spits out a completely new, whole discography from
| the artist complete with lyrics and album art that you can
| listen to infinitely.
|
| "Hey Siri, play me a series about chickens from outer space
| invading Earth": No problem, here's a 12 hour marathon,
| complete with a coherent storyline, plot twists, good acting
| and voice lines.
|
| The only thing that is currently limiting us is computing
| power, and given enough time, the barrier will be overcome.
|
| A human brain is just a series of inputs, a function that
| transforms them, and a series of outputs.
| gfody wrote:
| > So, in some channel at some time you may be able to see Brad
| Pitt fighting against Godzilla while the monster is hacking
| into the pentagon using ssh.
|
| that's the movie GAN; more interesting would be the zero-shot
| translations of epic foreign/historic films into your native
| culture.
| [deleted]
| yojo wrote:
| Anyone have an idea of how many orders of magnitude advancement
| we are away from this? Like, it takes a high end GPU a non-
| trivial amount of time to make one low resolution image. A
| modern film is at least 30 of those a second, at far higher
| pixel density. It seems like you get a 10x improvement in GPU
| perf every 6-8 years[1], and it might take more than a couple
| of those.
|
| Plus you need to encode the ideas of plots/characters/scenes,
| and have through-lines that go multiple hours. It seems like
| with the current kit it's hard to even make a consistent
| looking children's book with hand picked illustrations.
|
| My gut is we are more than a few years off, but maybe I'm
| underestimating the low hanging fruit?
|
| 1: https://epochai.org/blog/trends-in-gpu-price-performance
| sircastor wrote:
| There are a handful of TV series canceled too soon that I'd
| love to see new episodes of. There are tremendous issues of
| consent involved, but it's a nice dream to have. It's a
| slippery slope to be sure, but imagine taking your well-honed
| fanfic script, loading it into a system, and getting a
| ready-to-watch episode never before seen.
| cerol wrote:
| _> Overabundance of high quality material and so little time to
| watch them all._
|
| If a tree falls in a forest and there is no one there to
| hear it, does it make a sound?
|
| If you can generate infinite material, how do you judge
| quality?
|
| You're extrapolating an idea based on what movies _are_,
| fundamentally. But you don't take into consideration what
| movies _are not_. Watching a movie is also a social
| experience: going to the movie theater, waiting years for a
| big blockbuster title, watching something with friends.
| Word-of-mouth recommendation is a very big thing. If a close
| friend recommends me something (be it a movie or a book), I'm
| much more inclined to like it just for the human connection
| it provides (reading or watching something other people
| enjoyed is a means of accessing someone else's psyche).
|
| If every time you watch a movie you know that there is a
| movie that is slightly better just a prompt away, why bother
| finishing this one? If you know you probably won't finish the
| movie you generated, why bother starting one? So what do you
| do? You end up rewatching The Office.
|
| Sure, if you tell me this will be _possible_ in a couple of
| years, I won't object. The point is: will you pay for it on a
| _recurring basis_? Because if you don't, this will be no more
| than a very cool tech project.
|
| ----
|
| I've recently had this idea for a sci-fi book: in a not-so-
| distant future, society is divided between tech and non-tech
| people. Tech people created pretty much everything they said
| they would create: AGI, smarter-than-human robots, you name
| it. But it didn't change society at all. Companies still
| employ regular humans, people still watch regular
| made-by-human movies and eat handmade pizzas and drink their
| human-made lattes in hipster coffeeshops. So tech people are
| naturally very frustrated at non-tech people, because they're
| not optimizing their lives and businesses enough. And then you
| have this awkward situation where you have all these robots
| with brains the size of a galaxy lying around, doing nothing.
| And then some of them start developing depression, from
| spending too much time idle. And then the tech people have to
| rush to develop psychiatric robots. And then some robots
| decide to unionize, and others start writing books about how
| humans are taking jobs that were supposed to be automated.
| c3534l wrote:
| It does feel like we could at least get procedurally generated
| streaming music. Music is limited enough that it feels
| possible, and people are more than willing to rate and save
| music they like. The social element of raiding another person's
| curated playlist could take over the romantic notion of an
| artist's personal expression. Curating such lists of
| procedurally generated music could make everyone a musician.
| noitpmeder wrote:
| See this recent thread for more on this:
| https://news.ycombinator.com/item?id=32559119
| Alupis wrote:
| I think a more interesting thought is how entertainment boils
| down to a sequence of 1's and 0's... and if we could somehow
| get enough of that sequence right, we could unveil
| video/audio/images of real people doing things they've never
| done - such as Brad Pitt fighting Godzilla.
|
| Imagine uncovering a movie that was never made but featured
| actors you know. If Steven Spielberg can make the movie, then
| there is an undiscovered sequence of 1's and 0's that already
| is that movie, a sequence that could be discovered without
| actually making the movie. Imagine "mining for movies"...
|
| Of course that sequence is likely impossible to ever predict
| enough of to actually discover something real... but it's a fun
| thought experiment.
| kilovoltaire wrote:
| If you like thinking about that sort of thing, and haven't
| read it yet, check out the short story "The Library of Babel"
| by Jorge Luis Borges
| CamperBob2 wrote:
| The Library of Babel, now only $19.95 per month!
| wpietri wrote:
| I think much more likely, at least this side of the
| singularity, what we'll have is infinite dreck. And for some
| people, that will be enough.
|
| It's extremely hard to make good content. Teams of extremely
| skilled, well-paid people, even ones who have succeeded before,
| fail regularly. And that's with complicated filtering
| mechanisms and review cycles to limit who has access and keep
| the worst of it from getting out.
|
| But not everybody needs everything to be actually good. My
| partner will sometimes unwind on a Friday by watching bad
| action movies; the bad ones are in some ways better, as they
| require less work to understand and are amusing in their own
| way. Or there's a mobile game I play when I want to not think,
| where you have to conquer a graph of nodes. The levels are
| clearly auto-generated, and it's fine.
|
| I think that kind of serviceable junk is where we might see AI
| get to in a couple of decades, made for an audience that will
| get a weed gummy and a six pack and ask for a "sci-fi action
| adventure with lots of explosions" and get something with a
| half-assed plot, forgettable stereotypical characters and
| visuals that don't totally make sense, but that's fine. You
| won't learn anything, you won't be particularly moved, and you
| won't ever watch it again, but it will be a perfectly cromulent
| distraction between clocking out and going to bed.
| jrvarela56 wrote:
| Whenever I think of AI and its implications, I find it useful
| to think of our current version of the AI: The Market. Its
| profit-maximizing function is the canonical paperclip
| maximizer.
|
| We are going to absolutely drown in crap, just as we do now;
| it's just going to flood the internet at an unimaginable
| pace. We'll probably train AIs to help us find stuff and tell
| us what's real/true/etc.
|
| It's going to be one hell of an arms race.
| peoplefromibiza wrote:
| > So, in some channel at some time you may be able to see Brad
| Pitt fighting against Godzilla while the monster is hacking
| into the pentagon using ssh.
|
| Don't know if this is sarcasm, if it is, ignore the rest of the
| comment.
|
| Honestly, it sounds terrible.
|
| Good shows are well written, coherent and, most of all, narrow
| in scope.
|
| If an AI can write the next Better Call Saul, great.
|
| Randomly patching together tropes sounds more like kids'
| drawings, which are perhaps interesting from an artistic point
| of view, given their limited knowledge of reality and
| narrative, but terribly boring and confusing as a form of
| entertainment.
|
| Unless the audience is kids; they love that stuff, for reasons
| we no longer understand as we grow up.
| mrighele wrote:
| > not well written, not coherent, and broad in scope.
|
| > randomly patched together tropes
|
| Sounds like a dream, not as in what I wish for, but what I
| experience at night.
|
| So, if you think about it like a tool to enable a form of
| lucid dreaming, it may be something interesting.
|
| Of course you have to find a way to get for your brain what
| you want to see in "real time", but I think we will get
| there.
| peoplefromibiza wrote:
| > So, if you think about it like a tool to enable a form of
| lucid dreaming, it may be something interesting.
|
| we usually call that tool psychedelic drugs.
|
| There are devices being developed for that purpose; I don't
| think they will ever be reliable, and AI is not necessary for
| that.
|
| On the philosophical implications of lucid dreams
|
| https://en.m.wikipedia.org/wiki/Waking_Life
| bigyikes wrote:
| The point of the quoted remark is to illustrate the exotic
| possibilities of "Multiverse TV". It's not an example of
| quality content you /would/ watch, it's merely some content
| you /could/ watch. Multiverse TV has everything, from Better
| Call Saul to zany strings of tropes.
| peoplefromibiza wrote:
| The point is that nonsense "multiverse TV" has been a thing
| since I can remember watching TV.
|
| Endless entertainment is already there; you can't watch it
| simply because most of the content you're talking about is
| not on air, and streaming platforms don't buy it, because
| it's shit.
|
| Not that I don't like shit: I've watched more
| Troma/SyFy/random low-budget Asian movies (I am a huge fan
| of ninja movies) than necessary. But if we stop to think
| that there are already 6 or 7 Sharknado movies (which are
| exactly the kind of endless entertainment you talk about;
| they are probably generated in some way), maybe it's not
| the volume of content that's missing, but content that's
| worth watching.
| mtkhaos wrote:
| I think you are forgetting that a good portion of social
| media users are used to short-form content.
| peoplefromibiza wrote:
| I believe there's already a lot more content on social
| media than time to watch it in 100 lives.
| visarga wrote:
| But if you want to see something specific, it's often not
| there unless you generate it.
| peoplefromibiza wrote:
| > But if you want to see something specific, it's often
| not there unless you generate it.
|
| Example?
|
| I don't think I've ever wanted to watch something that
| only a computer could generate.
| namrog84 wrote:
| I think a key piece here is missing.
|
| txt2img is quite limited; img2img is really where the power
| is, with a little intermittent guidance from a human hand.
|
| What took 100s or 1000s of people to write, act, record, and
| post-process (Better Call Saul) might be doable by a team
| 1/100th the size, possibly even a single individual. Which
| means, while it might not just instantly spit one out, then
| just like youtube, there will be an incredible amount of
| great content to watch, far more than anyone could ever
| realistically watch.
|
| And of course there will be lots of utter trash as well.
|
| But if it took 100 people to make Better Call Saul, now 100
| individuals can make 100 different "Better Call Sauls".
| xgkickt wrote:
| Sturgeon's law will probably end up at 99%
| idiotsecant wrote:
| Did you read the whole post? Parent post was talking about
| assisted movie generation where a human is making a movie and
| using the AI as a tool to make the content. This will
| _absolutely_ be an enormous thing in the next 10, 20 years
| and it will lead to a creative revolution in the same way
| that youtube did - entire genres that do not currently exist
| will come to fruition by lowering the barriers to entry to a
| huge number of creators.
|
| I don't have any trouble finding youtube channels that I like
| to watch and ignoring the rest, and I suspect I won't have
| any trouble finding movies generated using AI as a production
| tool that I want to watch either.
| peoplefromibiza wrote:
| > Did you read the whole post
|
| Per the HN guidelines, don't ask this question.
|
| > Parent post was talking about assisted movie generation
| where a human is making a movie and using the AI as a tool
| to make the content
|
| Which is exactly the problem we don't have: there are
| thousands of scripts written every day that never see the
| green light.
|
| > and it will lead to a creative revolution
|
| It won't.
|
| The main reason content is not produced is money. Unless you
| find a way to create an infinite supply of money and an
| infinite paying audience for that content, more content is a
| problem, not a solution.
|
| > I don't have any trouble finding youtube channels that I
| like to watch and ignoring the rest,
|
| So what's the problem? There's already infinite content out
| there. What does "AI" bring to the table that will make any
| difference, other than marketing, like 3D movies?
|
| Have you watched any 3D movie recently?
| motoxpro wrote:
| It sounds like you are arguing FOR the GP's idea.
|
| "there are thousands of scripts written everyday that
| never see the green light." "main reason why content is
| not produced is money."
|
| So if there are plenty of ideas and not enough money, and
| you could put those ideas into a box and spit out a movie
| that would normally cost millions, that's good right?
| bawolff wrote:
| There is a lot more that goes into it than an idea. Like
| that dude who has a great app "idea" and just needs
| someone to implement it, but is surprised nobody takes
| them up on it.
| peoplefromibiza wrote:
| > So if there are plenty of ideas and not enough money,
|
| A big chunk of the budget is spent on marketing.
|
| If you produce something that nobody watches, it's like the
| sound of the tree falling where nobody can hear it.
|
| If you know how to use AI to cut that cost, I'm all ears.
|
| Also: Al Pacino will want his money if you use his name,
| even if he is not actually acting in the movie.
|
| The reality is that there are plenty of ideas, true, but
| they would not make any money.
|
| Studios don't like to work at a loss.
|
| Rick and Morty costs 1.5 million dollars per episode, and
| _from what we've heard from director Erica Hayes, a single
| episode takes anywhere between 9 and 12 months to create,
| from ideation to completion_.
| MaxikCZ wrote:
| > if you know how to use AI to cut that cost, I'm all
| ears.
|
| Cutting costs seems to be the main reason AI is being
| explored. If you go to a studio asking for a budget to
| create a movie and predict "10,000 people will watch it",
| they will laugh in your face. If one person with the help of
| AI can make the movie and 10,000 people will watch it,
| it's a win for everyone involved.
|
| I don't see youtube channels having enormous budgets for
| marketing, yet they find sizeable audiences and still make
| a profit. Once you lower the cost of production, you
| don't need huge marketing budgets to secure profits.
| peoplefromibiza wrote:
| > I don't see youtube channels having enormous budgets for
| marketing,
|
| Because they mainly support one person.
|
| You don't need a big budget to sell lemonade on the
| street, and you can make a salary out of it, but that
| doesn't mean you have become a tycoon or have revolutionized
| the lemonade-stand industry.
|
| > I don't see youtube channels having enormous budgets for
| marketing
|
| Have you seen those ads every 15 seconds?
|
| That's the marketing budget; the whole YouTube ad
| revenue is the marketing budget.
| bawolff wrote:
| > I don't have any trouble finding youtube channels that I
| like to watch and ignoring the rest, and I suspect I won't
| have any trouble finding movies generated using AI as a
| production tool that I want to watch either.
|
| Not really the same - there is a range from good to bad on
| youtube, because real people are adding the creative spark.
| There is no reason to suspect AI will generate such a
| range, and it's unclear we will ever get to the point where
| AI can do "creativity" by itself.
| wpietri wrote:
| Exactly. I'm sure we'll see a lot of AI-assisted
| production. But AI-originated and high quality? I don't
| think I'll see it in my lifetime. (I do expect though,
| that we'll see people claiming works as AI-created, as
| the controversy will be stellar marketing.)
| w-ll wrote:
| Way back in 2019, someone used the deepfake tech of the time
| to show what the new Lion King could look like. This is what
| I think some of us are imagining.
|
| http://geekdommovies.com/heres-what-the-live-action-lion-kin...
|
| I have a decently old LG 3D TV that can actually turn 2D into
| 3D, and it's actually a lot of fun to watch certain stuff in
| 3D mode.
| CobrastanJorji wrote:
| Reminds me just a bit of the Culture series, where in the
| distant future computing power is essentially infinite. In
| the series, many of the great AIs of unfathomable intellect
| spend their free time in the "Infinite Fun Space", simulating
| universes with slightly different starting conditions and
| physical laws.
| jacobn wrote:
| The model was trained on 2.3B images, but how many TV shows are
| there to train it on?
|
| There are quite a few books written, so maybe transfer learning
| from that?
| judge2020 wrote:
| > how many TV shows are there to train it on?
|
| None, according to the MPAA.
| ronsor wrote:
| ML training already involves scraping copyrighted content.
| I'm sure big tech megacorps would fight any lawsuits they
| receive.
| ortusdux wrote:
| Sounds like the "interdimensional cable" bits on Rick and
| Morty. Reportedly the co-creators would get black-out drunk
| and improvise the shows. My favorite is House Hunters
| International, where ambulatory houses are being hunted by
| guys with shotguns.
|
| https://screenrant.com/rick-morty-interdimensional-cable-epi...
| egypturnash wrote:
| Aaaand the estimated percentage of images released under a CC
| license, or public domain, iiiiis...?
| [deleted]
| brohee wrote:
| > Strangely, enormously popular internet personalities like David
| Dobrik, Addison Rae, Charli D'Amelio, Dixie D'Amelio, and MrBeast
| don't appear in the captions from the dataset at all
|
| Self-awareness here would have led to the removal of
| "enormously popular".
| ISL wrote:
| That inclusion of Mickey in the model is waving a red flag in
| front of an impressive bull.
| tough wrote:
| I was just thinking yesterday whether "Mockey the Rat" would
| fly as an homage/derivative; Mickey has a good 70 years on
| him already, no? Copyright will die at the hands of ML, I'm
| afraid.
| avocado2 wrote:
| web demo for stable diffusion:
| https://huggingface.co/spaces/stabilityai/stable-diffusion
|
| github (includes GFPGAN, Real-ESRGAN, and a lot of other features):
| https://github.com/hlky/stable-diffusion
|
| colab repo (new): https://github.com/altryne/sd-webui-colab
|
| demo made with gradio: https://github.com/gradio-app/gradio
| supernova87a wrote:
| My fundamental and maybe dumb question is: when is artificial
| intelligence / ML going to get smarter than needing a billion
| images to train?
|
| Sure, the achievements of ML models lately are impressive,
| but the learning is _so slow_. It feels to me like we are
| brute-forcing the DNNs, which is not something that smacks of
| great achievement.
|
| You and I have never seen even 100,000 photos in our lives. Well,
| maybe the video stream from our eyes is a little different. But
| it's not a billion fundamentally different images.
|
| Is there anything I can read about why it is so slow to learn?
| How will it ever get faster? What next jump will fix this, or
| what am I missing as a lay person?
| xipho wrote:
| > You and I have never seen even 100,000 photos in our lives.
| Well, maybe the video stream from our eyes is a little
| different. But it's not a billion fundamentally different
| images.
|
| I would argue precisely the opposite (as you allude to): it's
| more than 100s of billions of fundamentally (what does this
| even mean?) different images. Calculate the frequency at which
| your eyes sample, think of the times the angle changes (new
| images), multiply by your age, multiply by 2 for two eyes
| looking in slightly different directions, factor in the noise
| your brain has in forming the image "in your head" because you
| drank too much... you can continue adding factors (hours of
| "TV" watched on average) ad nauseam.
|
| It seems that "slow to learn" has a "real" target/bound: what
| humans are capable of. If it takes Bob Ross decades to paint
| all the "images" in his head, then maybe we should go easy on
| the algorithms?
| PeterisP wrote:
| Some experiments have demonstrated that being involved in the
| 'generation' of the visual data (i.e. choosing where to move,
| where to look, how to alter reality) gets significantly better
| learning than passively receiving the exact same visual data.
|
| Active learning is a good way to improve sample efficiency -
| however, as others note, don't underestimate the quantity of
| learning data that a human baby needs for certain skills even
| with good, evolution-optimized priors.
| namose wrote:
| I think the thing that's missing is that the AI can't train
| itself. If you were asked to draw a realistic X-ray of a
| horse's ribcage, you'd probably do a Google image search, do
| some research about horse anatomy, etc., before putting pen
| to paper. This thing is trained exactly once, and can't learn
| dynamically. That'll be the next step, I think.
| andreyk wrote:
| What you are describing is pretty much reinforcement
| learning (or learning with access to a queryable knowledge
| engine, or active learning, or all of these combined). There
| is work on a bunch of variations of this, but it's true that
| it's early days for combining it with generative systems.
| 12ian34 wrote:
| Most of us have our eyes open, looking at things ~16 hours a
| day, with the first 18 years of our lives heavily focused on
| learning about what those things are, plus we have the extra
| brain capacity to remember those things, and to think about
| them in an abstract manner. My entire photo library alone is
| over 100,000 photos - and since I took all of them I will have
| "seen" them.
| andreyk wrote:
| A couple of things:
|
| * We have seen more than 100,000 "photos" in the sense that
| photos are just images: we have a constant feed of "photos"
| every single moment our eyes are open. Of course, that's not
| the same as these training datasets, but it is still worth
| keeping in mind.
|
| * All of these things trained on massive datasets with self-
| supervised learning are in a sense addressing the "slowness" of
| learning you mention, since self-supervised (aka no annotations
| are needed beyond the data itself) "pre-training" on the
| massive datasets can then enable training for downstream tasks
| with way less data.
|
| * Arguably requiring massive datasets for pre-training is still
| a bit lame, but then again the 4-5 years of life it takes to
| reach pretty advanced intelligence in humans represents a
| whoooole lot of data. And as with self-supervised learning on
| these massive models, a lot of intelligence seems to come down
| to learning to predict the future from sensory input.
|
| * Humans also come with a lot of pre-wiring done by evolution,
| whereas these models are trained from scratch. Evolutionary
| wiring represents its own sort of "pre-training", of course.
|
| So basically, it is not so slow to learn as it seems. Arguably
| it could get faster once we train multimodal models and
| concepts from text can reinforce learning to understand images
| and so on, and people are working on it (eg GATO). There may
| also need to be a separation between low level 'instinct'
| intelligence and high-level 'reasoning' intelligence; AI still
| sucks at the second one.
| supernova87a wrote:
| I guess what I find really interesting is, how come we can
| start to self-label data we encounter in the wild, yet the
| DNN needs data to constantly be labeled at the same intensity
| per image, into the billions?
| simonw wrote:
| If you look at the text labels for the data used by Stable
| Diffusion you'll find that they are very low quality. Take
| a look at some here:
|
| https://laion-aesthetic.datasette.io/laion-aesthetic-6pls/im...
|
| Clearly quality of labeling isn't nearly as important once
| you are training on billions of images.
| fragmede wrote:
| Textual Inversion is about to flip that around (if I'm
| understanding the paper correctly).
|
| https://textual-inversion.github.io/
| andreyk wrote:
| This is only true for multimodal learning, but yeah, in
| that case we need text and image pairs. More than likely
| it's possible to pretrain image and language models separately,
| and then use a vastly smaller number of pairs. But that's
| hypothetical.
| chse_cake wrote:
| If we consider a frame rate of 60 FPS then a 5 year old would
| have seen about ~ 6.3 billion images [60 (frames) * 60
| (seconds) * 60 (minutes) * 16 (waking hours) * 365 (days) * 5
| (years)]. Even with 30 FPS you can halve the number and it's
| still a huge number.
|
| A cool fact is that this model fit ~5B images into a
| 900M-parameter model, which is tiny compared to the size of
| the data.
| fragmede wrote:
| Yeah. Humans take a _long_ time to train. We spend years and
| years, starting at birth, just absorbing everything around us
| before we get to a point where we're considered adults.
| erichocean wrote:
| Human brains are also pre-trained at birth, on faces and a
| whole bunch of other things.
| supernova87a wrote:
| Yet how do we do it, in our heads, with CPUs that consume
| even less power than ARM chips?
| noobermin wrote:
| Because, believe it or not, matrices are not brains.
|
| People need to get over the metaphors. If you spend your
| time learning about the mathematics under the hood, there
| will be fewer "mysteries".
| Ukv wrote:
| > You and I have never seen even 100,000 photos in our lives
|
| If someone's 30, that'd only require seeing 10 images a day.
| For most people that quota is probably fulfilled within a
| couple of minutes of watching TV or browsing social media, even
| if the video stream from our eyes otherwise counts as nothing.
|
| We've also had about 4 billion years of evolution, slowly
| adjusting our genome with an unfathomable amount of data.
| Gradient descent is blazing fast by comparison.
| omegalulw wrote:
| Add to that that video nowadays is at least 24fps, so a
| two-hour movie suffices haha :)
| simonw wrote:
| I recommend looking into "transfer learning".
|
| That's where you start with an existing large model, and train
| a new model on top of it by feeding in new images.
|
| What's fascinating about transfer learning is that you don't
| need to give it a lot of new images, at all. Just a few hundred
| extras can create a model that's frighteningly accurate for
| tasks like image labeling.
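|
| A minimal sketch of the mechanics in PyTorch (an image
| classifier rather than a diffusion model, and the label count
| is made up, but it shows the transfer-learning idea):
|
|     import torch
|     import torch.nn as nn
|     from torchvision import models
|
|     # Start from a model pretrained on ImageNet.
|     model = models.resnet18(pretrained=True)
|     # Freeze the existing weights to keep the learned features.
|     for param in model.parameters():
|         param.requires_grad = False
|     # Replace the final layer with a fresh head for, say, 5 labels.
|     model.fc = nn.Linear(model.fc.in_features, 5)
|     # Only the new head is trained, so a few hundred labeled
|     # images can be enough.
|     optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)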
|
| This is pretty much how all AI models work today. Take a look
| at the Stable Diffusion model card:
| https://github.com/CompVis/stable-diffusion/blob/main/Stable...
|
| They ran multiple training sessions with progressively smaller
| (and higher quality) images to get the final result.
| guelo wrote:
| The next internet crawl is going to have thousands of (properly
| labeled) AI generated images. I wonder if that could throw these
| algorithms into a bad feedback spiral. Though I guess there are
| already classifiers that can be used to exclude AI generated
| images. The goal is to profit off of free human labor after all.
| practice9 wrote:
| > The goal is to profit off of free human labor after all.
|
| this is too inflammatory in my opinion.
|
| - some works are in public domain
|
| - tech companies have profited from creators for a long time.
| I'm sure some arrangement could be made for profit sharing for
| artists who care about money, but it's too early for that (no
| profits, I'm sure most companies are losing money on AI art)
|
| - some artists care about art or fame more than money. Their
| art will not be devalued by AI; if anything, constant usage of
| their names in prompts is going to make them massively popular
| and direct people to source material or merch, which they may
| buy.
|
| - some artists are dead and don't care anymore. Their "estate"
| is vulnerable to takeovers by "long lost but recently found"
| relatives, who don't care about the art itself, only about
| the money. Many such stories.
|
| One example, albeit not in paintings but in music, is the Jimi
| Hendrix Estate. They used to do copyright strikes on YouTube
| in order to remove fan-made compilations of rare material
| (cleaned-up sound of live concerts, multiple sources mixed
| into one, etc.), without any intention of ever releasing an
| alternative.
| thrdbndndn wrote:
| > but often impossible with DALL-E 2, as you can see in this
| Mickey Mouse example from my previous post
|
| > "realistic 3d rendering of mickey mouse working on a vintage
| computer doing his taxes" on DALL*E 2 (left) vs. Stable Diffusion
| (right)
|
| Well, but the Mickey Mouse on the right isn't "realistic", or
| even 3D. It's straight up just a 2D Mickey image pasted there.
| dmitriid wrote:
| > Nearly half of the images, about 47%, were sourced from only
| 100 domains, with the largest number of images coming from
| Pinterest
|
| This makes me vaguely uneasy. All these models and tools are
| almost exclusively "western".
| tough wrote:
| > https://github.com/gradio-app/gradio
|
| Plenty of Asian artists/styles in the datasets, no?
| gpm wrote:
| Huh, there's a ton of duplicates in the data set... I would have
| expected that it would be worthwhile to remove those. Maybe
| multiple descriptions of the same thing helps, but some of the
| duplicates have duplicated descriptions as well. Maybe
| deduplication happens after this step?
|
| http://laion-aesthetic.datasette.io/laion-aesthetic-6pls/ima...
| minimaxir wrote:
| Per the project page:
| https://laion.ai/blog/laion-400-open-dataset/
|
| > There is a certain degree of duplication because we used
| URL+text as deduplication criteria. The same image with the
| same caption may sit at different URLs, causing duplicates. The
| same image with other captions is not, however, considered
| duplicated.
|
| I am surprised that image-to-image dupes aren't removed,
| though, as the cosine similarity trick the page mentions would
| work for that too.
| kaibee wrote:
| I assume having multiple captions for the same image is very
| helpful actually.
| minimaxir wrote:
| Scrolling through the sorted link from the GP, there are a
| few dupes with identical images and captions, so that
| doesn't always work either.
| djoldman wrote:
| At a minimum a hash should be computed for each image and
| dupes removed. I haven't read the paper so they might have
| already done so.
| gchamonlive wrote:
| Isn't it really expensive to dedupe images based on content,
| as you have to compare every image to every other image in
| the dataset?
|
| How could one go about deduping images? Maybe using something
| similar to the rsync protocol: a cheap hash method, then a
| more expensive one, then a full comparison. Even so, 2B+
| images... and you are mostly talking about saving on storage
| costs, which are quite cheap these days.
| gpm wrote:
| I don't have experience with image duplication, but if you
| can make a decent hash, a 2.3-billion-item hashtable is
| really cheap.
|
| If you need to do something closer to pairwise (for
| instance, because you can't make a cheap hash of images
| which papers over differences in compression), make the
| hash table for the text descriptions, then compare the
| images within buckets. Of the 5 or 6 text fields I just
| spot-checked (not even close to a random selection), the worst
| false positive I found (in the 12M data set) was 3 pairs of
| two duplicates with the same description. On the other hand
| I found one set of 76 identical images with the same
| description.
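|
| A sketch of that bucketing in Python (the hash choice and
| record shape are arbitrary, for illustration):
|
|     import hashlib
|     from collections import defaultdict
|
|     def dupe_candidates(records):
|         """records: iterable of (caption, image_id) pairs."""
|         buckets = defaultdict(list)
|         for caption, image_id in records:
|             key = hashlib.sha1(caption.encode("utf-8")).hexdigest()
|             buckets[key].append(image_id)
|         for ids in buckets.values():
|             if len(ids) > 1:
|                 # Pairwise image comparison happens only inside
|                 # a bucket, which stays tiny relative to the
|                 # full corpus.
|                 yield ids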
| wongarsu wrote:
| There are hash algorithms for image similarity. For a toy
| example, imagine scaling the image to 8x8px, making it
| grayscale, and using those 64 bytes as the hash. That way you
| only have to hash each picture once, and can find
| duplicates by searching for hashes with a low hamming
| distance (number of bit flips) to each other, which is very
| fast.
|
| Of course actual hash algorithms are a bit cleverer; there
| are a number to choose from, depending on what you want to
| consider a duplicate (cropping, flips, rotations, etc.).
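|
| The toy scheme above, more or less literally, in Python. The
| usual refinement is thresholding each pixel against the mean,
| so the hash is 64 bits rather than 64 bytes ("average hash");
| the distance cutoff is a tunable guess:
|
|     from PIL import Image
|
|     def ahash(path):
|         # Scale to 8x8, grayscale, then one bit per pixel.
|         img = Image.open(path).convert("L").resize((8, 8))
|         pixels = list(img.getdata())
|         mean = sum(pixels) / 64
|         bits = 0
|         for p in pixels:
|             bits = (bits << 1) | (p > mean)
|         return bits  # a 64-bit integer hash
|
|     def hamming(a, b):
|         return bin(a ^ b).count("1")
|
|     # hamming(ahash(x), ahash(y)) <= ~5 suggests near-dupes.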
| ClassyJacket wrote:
| But even a single pixel being one shade brighter would
| make the hash completely different; that's the point of
| hashes.
| fjkdlsjflkds wrote:
| You're probably confusing "cryptographic hash functions"
| with "perceptual hashing" (or other forms of "locality-
| sensitive hashing"). In the case of the latter, what you
| say is almost always not true (that's the point of using
| "perceptual hashing" after all: similar objects get
| mapped to similar/same hash).
|
| See: https://en.wikipedia.org/wiki/Perceptual_hashing
| visarga wrote:
| No, you embed all images with CLIP and use an approximate
| nearest neighbour library (like faiss) to get the most
| similar ones to the query in logarithmic time. Embedding
| will also be invariant to small variations.
|
| You can try this on images.yandex.com - they do similarity
| search with embeddings. Upload any photo and you'll get
| millions of similar photos, unlike Google that has only
| exact duplicate search. It's diverse like Pinterest but
| without the logins.
|
| Query image:
| https://cdn.discordapp.com/attachments/1005626182869467157/1...
|
| Yandex similarity search results:
| https://yandex.com/images/search?rpt=imageview&url=https%3A%...
| ALittleLight wrote:
| You don't have to compare all images to one another, and
| doing so wouldn't reliably dedupe: what if one image has a
| slightly different resolution, a different image type,
| different metadata, etc.? They would have different hashes
| but still be basically the same data.
|
| I think the way you do it is to train a model to represent
| images as vectors. Then you put those vectors into a BTree,
| which allows you to efficiently query for the "nearest
| neighbor" of an image in log(n) time. You calibrate to find
| a distance that picks up duplicates without getting too
| many non-duplicates, and then it's n log(n) time rather than
| n^2.
|
| If that's still too slow there is also a thing called ANNOY
| which lets you do approximate nearest neighbor faster.
| minimaxir wrote:
| Convert the images to embeddings, perform an approximate
| nearest neighbor search on them, and identify images that
| are very close together (e.g. with faiss, which the page
| alludes to using).
|
| It's performant enough even at scale.
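|
| A small sketch with faiss (an exact inner-product index for
| clarity; the embeddings file and the 0.97 cutoff are
| hypothetical):
|
|     import faiss
|     import numpy as np
|
|     emb = np.load("embeddings.npy").astype("float32")  # (N, d)
|     # Cosine similarity == inner product on L2-normalized rows.
|     faiss.normalize_L2(emb)
|     index = faiss.IndexFlatIP(emb.shape[1])
|     index.add(emb)
|     # Each vector's 2 nearest neighbors; the first is itself,
|     # so sims[:, 1] is the closest *other* image.
|     sims, ids = index.search(emb, 2)
|     dupes = np.nonzero(sims[:, 1] > 0.97)[0]
|
| At billion scale you'd swap IndexFlatIP for an approximate
| index (e.g. the IVF or HNSW variants).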
| [deleted]
| acdha wrote:
| It depends on exactly what problem you're trying to solve.
| If the goal is to find the same image with slight
| differences caused by re-encoding, downsampling, scaling,
| etc. you can use something like phash.org pretty
| efficiently to build a database of image hashes, review the
| most similar ones, and use it to decide whether you've
| already "seen" new images.
|
| That approach works well when the images are basically the
| same. It doesn't work so well when you're trying to find
| images which are either different photos of the same
| subject or where one of them is a crop of a larger image or
| has been modified more heavily. A number of years back I
| used OpenCV for that task[1] to identify the source of a
| given thumbnail image in a larger master file and used
| phash to validate that a new higher resolution thumbnail
| was highly similar to the original low-res thumbnail after
| trying to match the original crop & rotation. I imagine
| there are far more sophisticated tools for that now, but at
| the time phash felt basically free in comparison to the
| amount of computation which OpenCV required.
|
| 1. https://blogs.loc.gov/thesignal/2014/08/upgrading-image-thum...
| [deleted]
___________________________________________________________________
(page generated 2022-08-31 23:02 UTC)