[HN Gopher] Identifying Stable Diffusion XL 1.0 images from VAE ...
       ___________________________________________________________________
        
       Identifying Stable Diffusion XL 1.0 images from VAE artifacts
       (2023)
        
       Author : rcarmo
       Score  : 47 points
       Date   : 2024-04-05 16:38 UTC (6 hours ago)
        
 (HTM) web link (hforsten.com)
 (TXT) w3m dump (hforsten.com)
        
       | blt wrote:
       | (2023)
        
         | dang wrote:
         | Added. Thanks!
        
       | GaggiX wrote:
       | 8 months ago StabilityAI changed the default VAE from 1.0 to 0.9,
       | so no one realistically uses VAE 1.0.
        
         | wut42 wrote:
         | And this article is nine months old.
        
          | Dwedit wrote:
          | I can see why; that VAE looks pretty bad.
        
        | kken wrote:
        | Generally, the VAE maps from a small latent space to a much
        | larger image space. This means that there must be a large
        | number of images for which no reverse mapping exists.
        | 
        | It should be possible to identify images that have not been
        | generated by the VAE, since they are not part of the set of
        | images that the VAE can generate. The other way round is a bit
        | more difficult, as there may be images that can be mapped to
        | the latent space and back without loss but were generated in
        | another way
        | 
        | -> there may be false positives.
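
        A minimal sketch of that round-trip test, assuming the diffusers
        AutoencoderKL API and the public stabilityai/sdxl-vae weights
        (the preprocessing and the file name here are illustrative, not
        the article's method):

        import torch
        from PIL import Image
        from torchvision import transforms
        from diffusers import AutoencoderKL

        # Encode an image with the SDXL VAE and decode it again. Images
        # the VAE can represent should survive the round trip with low
        # reconstruction error; out-of-manifold images should not.
        vae = AutoencoderKL.from_pretrained("stabilityai/sdxl-vae").eval()
        to_tensor = transforms.ToTensor()

        def roundtrip_error(path: str) -> float:
            img = Image.open(path).convert("RGB").resize((1024, 1024))
            x = to_tensor(img).unsqueeze(0) * 2 - 1  # scale to [-1, 1]
            with torch.no_grad():
                latents = vae.encode(x).latent_dist.mode()
                recon = vae.decode(latents).sample
            return (recon - x).abs().mean().item()

        # Low error hints the image lies in the VAE's range (a possible
        # SDXL output); high error means the VAE cannot reproduce it.
        # As the comment notes, low error alone cannot rule out a real
        # photo, so false positives remain.
        print(roundtrip_error("photo.png"))  # "photo.png" is hypothetical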
        
          | chacham15 wrote:
          | This logic has a key flaw: the fact that the two spaces
          | differ in size doesn't mean that every representable thing
          | in the larger space is a thing we care about. E.g. a person
          | with three hands may not have a representation in the
          | smaller space, but we would never care about that. The
          | actual question is: how does the difference in information
          | between a large image and the small latent space compare to
          | the difference in information between a large image and a
          | small image? If those two differences are close enough,
          | reliably telling SD-generated images from real ones becomes
          | near impossible.
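
        For concreteness, a back-of-the-envelope version of that
        comparison using SDXL's dimensions (raw value counts only;
        actual information content depends on the distributions):

        import math

        # A 1024x1024 RGB image vs. the 4x128x128 latent that the SDXL
        # VAE decodes from.
        image_values = 1024 * 1024 * 3   # 3,145,728 values
        latent_values = 4 * 128 * 128    # 65,536 values, ~48x fewer

        # An RGB image holding the same number of raw values would be
        # roughly 148x148, so the latent is comparable in size to a
        # small thumbnail, though it encodes very different information.
        side = math.sqrt(latent_values / 3)
        print(image_values, latent_values, round(side))  # 3145728 65536 148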
        
            | Onavo wrote:
            | Yes, otherwise cryptographic hashes wouldn't work (they
            | are not bijective).
        
       | TrueDuality wrote:
       | Very interesting! Well broken down and explained!
        
        | tsycho wrote:
        | Has anyone tried training a neural net to distinguish between
        | photographed and AI-generated images?
        | 
        | Of course, you would need to remove EXIF data and other
        | metadata first, but this sounds like the kind of domain that
        | NNs are good at.
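
        A minimal sketch of such a classifier, assuming a hypothetical
        data/real and data/generated folder layout and fine-tuning a
        torchvision ResNet; metadata never reaches the network because
        only decoded pixels are passed in:

        import torch
        import torch.nn as nn
        from torchvision import datasets, models, transforms

        # ImageFolder decodes pixels via PIL, so EXIF and other
        # metadata are dropped before tensors reach the model.
        tfm = transforms.Compose([
            transforms.Resize((224, 224)),
            transforms.ToTensor(),
        ])
        ds = datasets.ImageFolder("data", transform=tfm)  # data/real, data/generated
        loader = torch.utils.data.DataLoader(ds, batch_size=32, shuffle=True)

        model = models.resnet18(weights="IMAGENET1K_V1")
        model.fc = nn.Linear(model.fc.in_features, 2)  # photo vs. generated

        opt = torch.optim.Adam(model.parameters(), lr=1e-4)
        loss_fn = nn.CrossEntropyLoss()

        model.train()
        for x, y in loader:  # one epoch, toy training loop
            opt.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()
            opt.step()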
        
          | moofight wrote:
          | Yes, and it's quite challenging. For instance:
          | https://sightengine.com/detect-ai-generated-images
        
          | HPsquared wrote:
          | Isn't that basically how the image generation models
          | themselves work? By refining the image until it can't be
          | distinguished from a "real image".
        
           | jncfhnb wrote:
           | That was trendy for a while but is no longer the primary
           | method
        
          | ok123456 wrote:
          | Yes. Trying to sell an "AI detector" is a fool's errand,
          | since this is how adversarial networks (the hot model from
          | two hype cycles ago) are trained. The only use of the "AI
          | detector" is to tune the generator until the detector's
          | output is uniformly random.
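
        The adversarial setup that comment describes, as a toy PyTorch
        sketch (the tiny networks are stand-ins, not a real image
        model): the generator is updated specifically to push the
        detector's output toward 50/50, which is why publishing a
        detector mostly hands the other side a training signal.

        import torch
        import torch.nn as nn

        # Toy stand-ins for a generator and an "AI detector".
        G = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 784))
        D = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 1))

        opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
        opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
        bce = nn.BCEWithLogitsLoss()

        def train_step(real: torch.Tensor) -> None:
            n = real.size(0)
            fake = G(torch.randn(n, 64))

            # Detector step: label real images 1, generated images 0.
            opt_d.zero_grad()
            d_loss = (bce(D(real), torch.ones(n, 1)) +
                      bce(D(fake.detach()), torch.zeros(n, 1)))
            d_loss.backward()
            opt_d.step()

            # Generator step: drive the detector's prediction on
            # generated images toward "real". At equilibrium the
            # detector is reduced to a coin flip, exactly the failure
            # mode described above.
            opt_g.zero_grad()
            g_loss = bce(D(fake), torch.ones(n, 1))
            g_loss.backward()
            opt_g.step()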
        
       ___________________________________________________________________
       (page generated 2024-04-05 23:01 UTC)