[HN Gopher] Visual Anagrams: Generating optical illusions with d...
       ___________________________________________________________________
        
       Visual Anagrams: Generating optical illusions with diffusion models
        
       Author : beefman
       Score  : 292 points
       Date   : 2023-11-30 18:39 UTC (4 hours ago)
        
 (HTM) web link (dangeng.github.io)
 (TXT) w3m dump (dangeng.github.io)
        
       | minimaxir wrote:
       | Note that this technique and its results are unrelated to the
       | infamous "spiral" ControlNet images a couple months back:
       | https://arstechnica.com/information-technology/2023/09/dream...
       | 
       | Per the code, the technique is based off of DeepFloyd-IF, which
       | is not as easy to run as a Stable Diffusion variant.
        
         | SamBam wrote:
         | I missed it, what was infamous about it?
        
           | minimaxir wrote:
            | It created a backlash because a) it became too popular,
            | with AI people hyping "THIS CHANGES EVERYTHING!" and
            | posting low-effort transformations to the point of
            | saturation, and b) non-AI people were "tricked" into
            | thinking it was a clever trick done with real art, since
            | ControlNet is not well known outside the AI sphere, and
            | they got mad.
        
             | andybak wrote:
             | I rather liked it and actually didn't get to see as many
             | examples as I wanted to.
             | 
             | Is there a good repository anywhere or is it just "wade
             | through twitter"?
        
         | Der_Einzige wrote:
         | I always thought it was weird that this idea took off with that
         | particular controlnet model. Many other controlnet models when
         | combined with those same images produce excellent and striking
         | results.
         | 
         | The ecosystem around Stable Diffusion in general is so massive.
        
           | minimaxir wrote:
            | Other ControlNet adapters either don't preserve the
            | high-level shape enough or preserve it _too_ well, IMO.
            | Canny/Depth ControlNet generations are less of an
            | illusion.
        
         | ShamelessC wrote:
         | > Per the code, the technique is based off of DeepFloyd-IF,
         | which is not as easy to run as a Stable Diffusion variant.
         | 
         | I haven't dug in yet, but it _should_ be possible to use their
         | ideas in other diffusion networks? It may be a non-trivial
         | change to the code provided though. Happy to be corrected of
         | course.
        
           | minimaxir wrote:
           | I suspect the trick only works because DeepFloyd-IF operates
           | in pixel space while other diffusion models operate in the
           | latent space.
           | 
           | > Our method uses DeepFloyd IF, a pixel-based diffusion
           | model. We do not use Stable Diffusion because latent
           | diffusion models cause artifacts in illusions (see our paper
           | for more details).
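
The multi-view averaging idea behind the paper (estimate noise for each transformed "view" of the image, map the estimates back to the base orientation, and average) can be sketched roughly as follows. Everything here is illustrative: `predict_noise` is a stand-in for a real pixel-space diffusion model such as DeepFloyd IF, not an actual API, and the step shown is only the combination logic, not a full sampler.

```python
import numpy as np

def predict_noise(image, prompt):
    # Placeholder for a real pixel-space diffusion model's noise
    # estimate; returns zeros so the sketch is runnable.
    return np.zeros_like(image)

def anagram_step(image, prompts, views, inverse_views):
    """One denoising step combining several (view, prompt) pairs.

    views[i] maps the base image into view i; inverse_views[i] maps
    a noise estimate in view i back to the base orientation.
    """
    estimates = []
    for prompt, view, inv in zip(prompts, views, inverse_views):
        eps = predict_noise(view(image), prompt)
        estimates.append(inv(eps))
    return np.mean(estimates, axis=0)

# Example views: identity and a 180-degree rotation, both of which
# are orthogonal pixel permutations (what the method requires).
identity = lambda x: x
rot180 = lambda x: np.rot90(x, 2)

image = np.random.rand(64, 64, 3)
avg_eps = anagram_step(
    image,
    prompts=["an oil painting of a duck", "an oil painting of a rabbit"],
    views=[identity, rot180],
    inverse_views=[identity, rot180],  # rot180 is its own inverse
)
```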
        
       | mg wrote:
       | I had a similar idea early last year and also dabbled with a
       | checkerboard approach.
       | 
       | Here a cat is made from 9 paintings of cats in the style of
       | popular painters:
       | 
       | https://twitter.com/marekgibney/status/1521500594577584141
       | 
       | You might have to squint your eyes to see it.
       | 
       | I made a few of them and then somehow lost interest.
        
         | hammock wrote:
         | That's really cool. Can you do 3x3x3? As in, 9x9 with 81 1-cell
         | cats, 9 9-cell cats and 1 81-cell cat?
        
       | jamilton wrote:
       | I really like the man/woman inversion.
       | 
       | I wonder how many permutations could legibly be generated in a
       | single image with an extended version of the same technique. I
       | don't understand the math, but would two orthogonal
       | transformations in sequence still be an orthogonal transformation
       | and thus work?
        
         | xanderlewis wrote:
          | I'm not sure whether 'orthogonal transformations' in this
          | context refers to the usual orthogonal _linear_
          | transformations (matrices), but if so then yes.
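
That answer follows from (AB)^T (AB) = B^T A^T A B = I: the product of two orthogonal matrices is orthogonal. A minimal NumPy check (variable names are illustrative, not from the thread):

```python
import numpy as np

rng = np.random.default_rng(0)

def random_orthogonal(n):
    # QR decomposition of a random square matrix yields an
    # orthogonal factor Q.
    q, _ = np.linalg.qr(rng.normal(size=(n, n)))
    return q

A = random_orthogonal(4)
B = random_orthogonal(4)
C = A @ B
print(np.allclose(C.T @ C, np.eye(4)))  # True: C is orthogonal too
```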
        
       | moritzwarhier wrote:
       | This is wonderful.
        
       | willsmith72 wrote:
       | > This colab notebook requires a high RAM and V100 GPU runtime,
       | available through Colab Pro.
       | 
       | That's sad, I would've loved to try it.
        
         | andybak wrote:
          | Well, chuck $10 at it and spend the rest of your month
          | trying other things.
         | 
         | (Back in Disco Diffusion days I was happy to spend money on
         | Colab Pro. It was fun)
        
         | nomel wrote:
         | I completely disagree. It's _fantastic_ that we can get access
         | to this hardware for so cheap. A used V100 is $1300. You could
         | pay for Colab Pro for _10 years_ with that, which will get you
         | faster and faster hardware through the years. Where I am, a
         | month is the cost of two bags of chips.
        
       | hammock wrote:
        | Do real-life jigsaw puzzles like the ones shown here exist
        | for purchase?
        
       | downboots wrote:
        | Could the idea of puzzle piece rearrangement also extend to
        | something similar to self-tiling tile sets?
        
       | cloudyporpoise wrote:
        | This may be one of the cooler things I've ever seen.
        
       | Nition wrote:
       | The duck/rabbit that rearranges would be really cool to use on
       | one of those sliding puzzles. Two valid solutions!
        
       ___________________________________________________________________
       (page generated 2023-11-30 23:00 UTC)