[HN Gopher] Want to spot a deepfake? Look for the stars in their...
       ___________________________________________________________________
        
       Want to spot a deepfake? Look for the stars in their eyes
        
       Author : jonbaer
       Score  : 191 points
       Date   : 2024-07-18 14:34 UTC (1 day ago)
        
 (HTM) web link (ras.ac.uk)
 (TXT) w3m dump (ras.ac.uk)
        
       | brabel wrote:
       | Well, nice find, but now all the fakes have to do is add a new
       | layer of AI that knows how to fix the eyes.
        
         | chbint wrote:
         | Indeed, but useful nonetheless. Solving it may be a challenge
         | for a while, and deep fakes generated before a solution becomes
         | broadly available will remain detectable with this technique.
        
         | spywaregorilla wrote:
          | The commercial use case for generative art is not to make
          | images that experts cannot discern as fake. It would be very
          | expensive to annotate training images for physically
          | correct reflections, and the value would be negligible.
         | Realistically, if you want to produce something that is
         | impossible to prove fake, you would have a vastly easier time
         | doing such edits manually. We are very, very, very far from
          | being able to push-button churn out indiscernible fakes. Even
         | making generically good outputs for art is a careful process
         | with lots of iteration.
        
         | kvemkon wrote:
          | Or just use a conventional algorithm, since the fix concerns
          | well-defined physics. Although it will not be a true 100%
          | fix, it could be good enough to make this test rather
          | useless, because even now:
         | 
         | > "There are false positives and false negatives..."
        
       | bqmjjx0kac wrote:
       | I wouldn't be shocked if phone cameras accidentally produced
       | weird effects like this. Case in point:
       | https://www.theverge.com/2023/12/2/23985299/iphone-bridal-ph...
        
         | Mashimo wrote:
         | Or https://www.theverge.com/2023/3/13/23637401/samsung-fake-
         | moo...
         | 
          | And there is also software that fixes your eyes for selfies
          | and video calls.
        
       | leidenfrost wrote:
        | AFAIK deepfakes can't mimic strong gesticulations very well,
        | nor correctly mimic a head facing sideways.
       | 
       | Or was that corrected?
        
         | BugsJustFindMe wrote:
          | > _deepfakes can't_
         | 
         | There is a big difference between can't and don't. Every next
         | generation will do more than what the previous generation did.
        
         | SXX wrote:
          | While I'm no expert on the state of the art, you should keep
          | in mind there is a huge difference between deepfakes created
          | 100% from scratch and those created via face swap and style
          | transfer.
         | 
          | Basically it's easier to create believable gesticulations when
          | there is footage of an actual person as raw material.
        
         | spywaregorilla wrote:
         | You think we can't generate a picture of a head facing
         | sideways? Obviously incorrect.
        
           | fourthark wrote:
           | The argument being that there is not very much video footage
           | of people turning their heads, therefore not enough data to
           | train deep fake videos / filters.
        
             | spywaregorilla wrote:
              | Video is still very much in its infancy. There are way,
              | way easier tells when a video has been faked. We're
              | talking about images, and turned-head images are very
              | much in scope.
        
       | nottorp wrote:
        | I can't read TFA because it's probably HNed. However, an artist
        | friend of mine said generated images are easy to spot because
        | every pixel is "perfect". Not only the eyes.
       | 
        | That explains pretty well why I thought even non-realistic ones
        | felt... uncanny.
        
         | KineticLensman wrote:
          | > However, an artist friend of mine said generated images are
         | easy to spot because every pixel is "perfect".
         | 
         | It depends on the art. There was a discussion here a while ago
         | that mentioned the use of Gen AI to create images of knitted /
         | crocheted dolls. The images looked okay at a quick glance but
         | were clearly fake because some of the colour changes weren't
         | aligned with the position of the stitches. E.g. strands of
         | supposed hair overlaid on the underlying stitched texture.
         | 
         | I'm sure there are lots of similar examples in other forms of
         | art where the appearance is physically impossible.
        
         | krisoft wrote:
          | > However, an artist friend of mine said generated images are
         | easy to spot because every pixel is "perfect".
         | 
          | What does perfect mean? Do the pixels drawing 15 fingers
          | count as "perfect"?
         | 
         | I think this heuristic is liable to fail in both directions.
         | You will find images made by humans where "every pixel is
         | perfect" (whatever that means) and you will also find AI which
         | does mimic whatever imperfection you are looking for.
        
           | nottorp wrote:
           | > What does perfect mean?
           | 
           | Nothing out of focus or less detailed than other parts of the
           | image...
        
             | cthalupa wrote:
             | Even the most cursory browsing of civitai or genai image
             | generation subreddits shows this to not be true. Focus,
             | bokeh, etc. are all things that can be generated by these
             | models.
        
         | Suppafly wrote:
         | Your artist friend has deluded themselves with wishful
         | thinking.
        
       | jmmcd wrote:
       | They love saying things like "generative AI doesn't know
       | physics". But the constraint that both eyes should have
       | consistent reflection patterns is just another statistical
       | regularity that appears in real photographs. Better training,
        | larger models, and larger datasets will lead to models that
       | capture this statistical regularity. So this "one weird trick"
       | will disappear without any special measures.
        
         | actionfromafar wrote:
          | Wouldn't the adversarial model training also have to take
          | "physics correctness" into account? As long as the image
          | detects as "<insert celebrity> in blue dress", why would it
          | care about correct details in the eyes if nothing in the
          | "checker" cares about that?
        
           | Filligree wrote:
           | Current image generators don't use an adversarial model.
           | Though the ones that do would have eventually encoded that as
           | well; the details to look for aren't hard-coded.
        
             | actionfromafar wrote:
             | Interesting. Apparently, I have _much_ to learn.
        
               | themoonisachees wrote:
               | GP told you how they don't work, but not how they do:
               | 
                | Current image generators work by training models to
                | remove artificial noise added to the training set. Take
                | an image, add some amount of noise, and feed it with its
                | description as inputs to your model. The closer the
                | output is to the original image, the higher the reward.
               | 
               | Using some tricks (a big one is training simultaneously
               | on large and small amounts of noise), you ultimately get
               | a model that can remove 99% noise based only on the
               | description you feed it, and that means you can just swap
               | out the description for what you want the model to
               | generate and feed it pure noise, and it'll do a good job.
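                | 
                | A minimal sketch of one such training step
                | (illustrative PyTorch; the model, the linear blend
                | schedule, and the predict-the-original objective are
                | all simplifications, not any particular system's
                | code):
                | 
                |     import torch
                |     import torch.nn.functional as F
                | 
                |     def train_step(model, opt, image, caption_emb):
                |         # random noise level per image; training
                |         # mixes small and large amounts of noise
                |         t = torch.rand(image.shape[0],
                |                        device=image.device)
                |         tt = t.view(-1, 1, 1, 1)
                |         noisy = (1 - tt) * image \
                |                 + tt * torch.randn_like(image)
                |         # score the model on how well it recovers
                |         # the original image, given the caption
                |         pred = model(noisy, t, caption_emb)
                |         loss = F.mse_loss(pred, image)
                |         opt.zero_grad()
                |         loss.backward()
                |         opt.step()
                |         return loss.item()
                | 
                | At sampling time you swap in the caption you
                | actually want, feed pure noise, and apply the model
                | (iteratively in practice).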
        
               | 101008 wrote:
               | I read this description of the algorithm a few times and
               | I find it fascinating because it's so simple to follow. I
                | have a lot of questions, though, like "why does it
                | work?", "why did nobody think of this before?", and
                | "where is the extra magical step that moves this from
                | 'silly idea' to 'wonder work'?"
        
               | ikari_pl wrote:
                | The answer to the 2nd and 3rd questions is mostly
                | "vastly more computing power available", especially the
                | kind that CUDA introduced a few years back.
        
         | aitchnyu wrote:
         | Did anybody prompt a GenAI to get this output?
        
           | spywaregorilla wrote:
            | It wouldn't work. The models could put stuff in the eyes,
            | but they wouldn't be able to do so realistically or
            | consistently, even a fraction of the time. The text
            | describing the images does not typically annotate tiny
            | details like correct reflections in the eyes, so prompting
            | for them is useless.
        
         | krembo wrote:
         | Just like the 20 fingers disappeared
        
           | zamadatix wrote:
           | Those early diffusion generators sure managed to make the
           | flesh monster in The Witcher look sane sometimes.
        
         | sholladay wrote:
         | Agreed, but the tricks are still useful.
         | 
         | When there are no more tricks remaining, I think we must be
         | pretty close to AGI.
        
         | layer8 wrote:
         | That still won't make them understand physics.
         | 
         | This all reminds me of "fixing" mis-architected software by
         | adding extra conditional code for every special case that is
         | discovered to work incorrectly, instead of fixing the
         | architecture (because no one understands it).
        
           | tossandthrow wrote:
            | This is more a comment on the word "understand" than on
            | "physics".
            | 
            | Yes, the models' output will converge to being congruent
            | with the laws of physics by virtue of deriving that as a
            | latent variable.
        
           | falcor84 wrote:
           | >That still won't make them understand physics
           | 
           | I would assume that larger models working with additional
           | training data will eventually allow them to understand
           | physics to the same extent as humans inspecting the world -
           | i.e. to capture what we call Naive Physics [0]. But the limit
           | isn't there; the next generation of GenAI could model the
           | whole scene and then render it with ray tracing (no special
           | casing needed).
           | 
           | [0] https://en.wikipedia.org/wiki/Na%C3%AFve_physics
        
             | layer8 wrote:
             | There seems to be little basis for this assumption, as
             | current models don't exhibit understanding. Understanding
              | would allow applying it to situations that don't match
             | existing patterns in the training data.
        
             | aj7 wrote:
              | That's not large models "understanding physics." Rather,
              | it's giving output "statistically consistent" with real
              | physical measurements. And no one, to my knowledge, has yet
             | succeeded in a general AI app that reverts to a
             | deterministic calculation in response to a prompt.
        
               | paulmd wrote:
               | chatgpt has had the ability to generate and call out to
               | deterministic python scripts for a year now
        
               | astrange wrote:
               | > And no one, to my knowledge, has yet succeeded in a
               | general AI app that reverts to a deterministic
               | calculation in response to a prompt.
               | 
               | They will all do this with a fixed seed. They just don't
               | do that because nobody wants it.
        
           | energy123 wrote:
            | Maybe it will. It really depends on whether it's "easier" for
           | the network to learn an intuitive physics, versus a laundry
           | list of superficial hacks that let it minimise loss all the
           | same. If the list of hacks grows so long that gradient
           | descent finds it easier to learn the actual physics, then
           | it'll learn the physics.
           | 
           | Hinton argues that the easiest way to minimise loss in next
           | token prediction is to actually understand meaning. An
           | analogous thing may hold true in vision modelling wrt
           | physics.
        
             | quonn wrote:
             | > that gradient descent finds it easier to learn the actual
             | physics, then it'll learn the physics.
             | 
              | I guess it really depends on what "gradient descent
              | learning the physics" means.
             | 
             | Maybe you define it to mean that the actually correct
             | equations appear encoded in the computation of the net. But
             | this would still be tacit knowledge. It would be kind of
              | like math software being aware of physics, at best.
        
             | Frieren wrote:
             | > It really depends whether it's "easier" for the network
             | to learn an intuitive physics, versus a laundry list of
             | superficial hacks that let it minimise loss all the same.
             | 
             | Human innate understanding of physics is a laundry list of
              | superficial hacks. People need education and mental effort
             | to go beyond that innate but limited understanding.
        
               | godelski wrote:
               | When it is said that humans innately understand physics,
               | no one means that people innately understand the
               | equations and can solve physics problems. I think we all
                | know how laughable such a claim would be, given how much
                | people struggle when learning physics and how few
               | people even get to a moderate level (not even Goldstein,
               | but at least calculus based physics with partial
               | derivatives).
               | 
               | What people mean when saying people innately understand
               | physics is that they have a working knowledge of many of
               | the implications. Things like that gravity is uniformly
               | applied from a single direction and that is the direction
               | towards ground. That objects move in arcs or "ballistic
               | trajectories", that straight lines are uncommon, that
               | wires hang with hyperbolic function shapes even if they
               | don't know that word, that snow is created from cold,
               | that the sun creates heat, many lighting effects (which
               | is how we also form many illusions), and so on.
               | 
               | Essentially, humans know that things do not fall up. One
               | could argue that this is based on a "laundry list of
               | superficial hacks" and they wouldn't be wrong, but they
               | also wouldn't be right. Even when wrong, the human
               | formulations are (more often than not) causally
               | formulated. That is, explainable _and_ rational (rational
               | does not mean correct, but that it follows some logic.
               | The logic doesn't need to be right. In fact, no logic is,
               | just some are less wrong than others).
        
             | abeppu wrote:
             | If your entire existence was constrained to seeing 2d
             | images, not of your choosing, _could_ a perplexity-
             | optimizing process  "learn the physics"?
             | 
             | Basic things that are not accessible to such a learning
             | process:
             | 
             | - moving around to get a better view of a 3d object
             | 
             | - see actual motion
             | 
             | - measure the mass of an object participating in an
             | interaction
             | 
             | - set up an experiment and measure its outcomes
             | 
             | - choose to look at a particular sample at a closer
             | resolution (e.g. microscopy)
             | 
             | - see what's out of frame from a given image
             | 
             | I think we have at this point a lot of evidence that
             | optimizing models to understand distributions of images is
             | not the same thing as understanding the things in those
             | images. In 2013 that was 'DeepDream' dog worms, in 2018
             | that was "this person does not exist" portraits where
             | people's garments or hair or jewelry fused together or
             | merged with their background. In 2022 it was diffusion
             | images of people with too many fingers, or whose hands
             | melted together if you asked for people shaking hands. In
             | the Sora announcement earlier this year it was a woman's
             | jacket morphing while the shot zoomed into her face.
             | 
             | I think in the same way that LLMs do better at some
             | reasoning tasks by generating a program to produce the
             | answer, I suspect models which are trained to generate 3D
             | geometry and scenes, and run a simulation -> renderer ->
             | style transfer process may end up being the better way to
             | get to image models that "know" about physics.
        
             | godelski wrote:
             | > It really depends whether it's "easier" for the network
             | to learn an intuitive physics, versus a laundry list of
             | superficial hacks that let it minimise loss all the same
             | 
             | The latter is always easier. Not to mention that the
             | architectures are fundamentally curve fitters. There are
             | many curves that can fit data, but not all curves are
              | causally related to data. The history of physics itself is
             | a history of becoming less wrong and many of the early
             | attempts at problems (which you probably never learned
             | about fwiw) were pretty hacky approximations.
             | 
             | > Hinton argues
             | 
             | Hinton is only partially correct. It entirely depends on
             | the conditions of your optimization. If you're trying to
             | generalize and understand causality, then yes, this is
             | without a doubt true. But models don't train like this and
             | most research is not pursuing these (still unknown)
             | directions. So if we aren't conditioning our model on those
             | aspects, then consider how many parameters they have (and
             | aspects like superposition). Without a doubt the
             | "superficial hacks" are a lot easier and will very likely
             | lead to better predictions on the training data (and likely
             | test data).
        
           | noduerme wrote:
           | Isn't that just what neural networks do? The way light falls
           | on an object is physically deterministic, but the neural
           | network in the brain of a human painter doesn't actually
           | calculate rays to determine where highlights should be. A
           | center fielder knows where to run to catch a fly ball without
           | having to understand the physics acting on it. Similarly, we
            | can spot things that look wrong, not because we're referring
            | to physical math but because we have endless kludged-together
            | rules that supersede other rules. Like: Heavy objects don't
           | float. Except for boats which do float. Except for boats that
           | are leaking, which don't. To then explain why something is
           | happening we refer to specialized models, and these image
           | generation models are too general for that, but there's no
           | reason they couldn't refer to separate physical models to
           | assist their output in the future.
        
             | Sardtok wrote:
             | Boats are mostly air by volume, which isn't heavy at all
             | compared to water.
        
           | bawolff wrote:
           | > This all reminds me of "fixing" mis-architected software by
           | adding extra conditional code for every special case that is
           | discovered to work incorrectly...
           | 
           | Isn't that what AI training is in general? It has worked
           | pretty well so far.
           | 
            | I don't think img-gen AI is ever going to "understand
            | physics", but that isn't the task at hand. I don't think it
            | is necessary to understand physics to make good fake
            | pictures. For that matter, I don't think understanding physics
            | would even be a good approach to the fake picture problem.
        
           | midtake wrote:
           | Most "humans" don't understand physics to a Platonic level
           | and act in much the same way as a model, finding best fits
           | among a set of parameters that produce a result that fits
           | some correctness check.
        
           | wvenable wrote:
           | > That still won't make them understand physics.
           | 
           | They don't have to. They just have to understand what makes a
           | realistic picture. The author of the article isn't really
           | employing physics either; he's comparing the eyes to each
           | other.
        
         | Someone wrote:
         | But we don't know how much larger the models will have to be,
          | how large the data sets, or how much training is needed, do we?
         | They could have to be inconceivably large.
         | 
         | If you want to correct for this particular problem you might be
         | better off training a face detector, an eye detector and a
         | model that takes two eyes as input and corrects for this
         | problem. Process then would be:
         | 
         | - generate image
         | 
         | - detect faces
         | 
         | - detect eyes in each face
         | 
         | - correct reflections in eyes
         | 
         | That is convoluted, though, and would get very convoluted when
         | you want to correct for multiple such issues. It also might be
         | problematic in handling faces with glass eyes, but you could
         | try to 'detect' those with a model that is trained on the
         | prompt.
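          | 
          | A very rough sketch of that pipeline, where every
          | argument is a stand-in for a separately trained model or
          | compositing routine (all names hypothetical):
          | 
          |     def fix_reflections(image, detect_faces, detect_eyes,
          |                         correct_eyes, paste):
          |         for face in detect_faces(image):
          |             eyes = detect_eyes(face)
          |             if len(eyes) != 2:
          |                 # sideways pose, occlusion, glass eye...
          |                 # better to skip than to guess
          |                 continue
          |             # rewrite both eye regions so their
          |             # reflections are mutually consistent
          |             fixed = correct_eyes(eyes[0], eyes[1])
          |             image = paste(image, face, fixed)
          |         return image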
        
           | rocqua wrote:
           | I feel like a GAN method might work better, building a
           | detector, and training the model to defeat the detector.
        
           | bastawhiz wrote:
           | > They could have to be inconceivably large.
           | 
           | The opposite might also be true. Just having better, well
           | curated data goes a long way. LAION worked for a long time
           | because it's huge, but what if all the garbage images were
           | filtered out and the annotations were better?
           | 
           | The early generations of image and video models used middling
           | data because it was the only data. Since then, literally
           | everyone with data has been working their butts off to get it
           | cleaned up to make the next generation better.
           | 
           | Better data, more intricate models, and improvements to the
           | underlying infrastructure could mean these sorts of
           | "improvements" come mostly "for free".
        
         | amelius wrote:
         | Shouldn't a GAN be able to use this fact immediately in its
         | adversarial network?
        
           | godelski wrote:
            | Unfortunately no. The discriminator always needs to be in
            | balance and contention with the generator. You can swap out
            | the discriminator later, but you also have to make sure your
            | discriminator is able to identify these errors. And ML models
            | aren't the best at noticing small details. And since they too
            | don't understand physics, there is no reason to believe that
            | they will encode such information, despite every image in
            | real life requiring consistency. Also remember that there is
            | a learning trajectory, and most certainly these small details
            | are not learned early on in networks. The problem is that
            | these errors are trivial to identify post hoc, but not a
            | priori. It is also easy for you because you know physics
            | innately and can formulate causal explanations.
        
         | johnsutor wrote:
         | I know there are murmurs that synthetic data (i.e. using
         | rendering software with 3D models) was used to train some
         | generative models, including OpenAI Sora; seems like it's the
         | only plausible way right now to get the insane amounts of data
         | needed to capture such statistical regularities.
        
         | sangnoir wrote:
        | > Better training, larger models, and larger datasets will
         | lead to models that
         | 
         | Hypothetically, with enough information, one could predict the
         | future (barring truly random events like radioactive decay).
         | Generative AI is also constrained by economic forces - how much
         | are GenAI companies willing to invest to get eyeball
         | reflections right? Would they earn adequate revenue to cover
         | the increase in costs to justify that feature? There are plenty
         | of things that humanity can technically achieve, that don't get
        | done because the incentives are not aligned - for instance,
         | there is enough food grown to feed every human on earth and the
         | technology to transport it, and yet we have hunger,
         | malnutrition and famines.
        
           | stevenwalton wrote:
           | > how much are GenAI companies willing to invest to get
           | eyeball reflections right?
           | 
           | Willing to? Probably not much. Should? A WHOLE LOT. It is the
           | whole enchilada.
           | 
           | While this might not seem like a big issue and truthfully
           | most people don't notice, getting this right (consistently)
           | requires getting a lot more right. It doesn't require the
           | model knowing physics (because every training sample face
            | will have realistic lighting). But what underlies this issue
           | is the model understanding subtleties. No model to date
           | accomplishes this. From image generators to language
           | generators (LLMs). There is a pareto efficiency issue here
           | too. Remember that it is magnitudes easier to get a model to
           | be "80% correct" than to be "90% correct".
           | 
           | But recall that the devil is in the details. We live in a
           | complex world, and what that means is that the subtleties
           | matter. The world is (mathematically) chaotic, so small
            | things have big effects. You can start solving problems
            | without worrying about these, but eventually you need to move
            | into tackling these problems. If you don't, you'll just
            | generate enshittification. In fact, I'd argue that the
           | difference between an amateur and an expert is knowledge of
           | subtleties and nuance. This is both why amateurs can trick
           | themselves into thinking they're more expert than they are
           | and why experts can recognize when talking to other experts
           | (I remember a thread a while ago where many people were
           | shocked about how most industries don't give tests or
           | whiteboard problems when interviewing candidates and how
           | hiring managers can identify good hires from bad ones).
        
           | dwaltrip wrote:
           | Getting the eyeballs correct will correlate with other very
           | useful improvements.
           | 
           | They won't train a better model just for that reason. It will
           | just happen along the way as they seek to broadly improve
           | performance and usefulness.
        
           | rowanG077 wrote:
            | Yeah, every person is constantly predicting the future, often
            | even scarily accurately. I don't see how this is a hot take
           | at all.
        
           | bastawhiz wrote:
           | > how much are GenAI companies willing to invest to get
           | eyeball reflections right
           | 
           | This isn't how it works. As the models are improved, they
           | learn more about reality largely on their own. Except for
           | glaringly obvious problems (like hands, deformed limbs, etc)
           | the improvements are really just giving the models techniques
            | for more accurately replicating features from training data.
           | There's nobody that's like "today we're working on
           | fingernails" or "today we're making hair physics work
           | better": it's about making the model understand and replicate
           | the features already present in the training dataset.
        
             | sangnoir wrote:
             | > This isn't how it works. As the models are improved, they
             | learn more about reality largely on their own.
             | 
             | AI models aren't complete blackboxes to the people who
             | develop them: there is careful thought behind the
             | architecture, dataset selection and model evaluation.
             | Assuming that you can take an existing model and simply
             | throw more compute at it will automatically result in
             | higher fidelity illumination modeling takes almost
             | religious levels of faith. If moar hardware is all you
             | need, Nvidia would have the best models in every category
             | right now. Perhaps someone ought to write the sequel to
              | Fred Brooks' book and name it "The Mythical GPU-Cluster-
             | Month".
             | 
             | FWIW, Google has AI-based illumination adjustable in Google
             | Photos where one can add virtual lights - specialized
             | models already exist, but I'm very cynical about a generic
             | mixed model incidentally gaining those capabilities without
             | specific training for it. When dealing with exponential
             | requirements (training data, training time, GPUs, model
             | weight size), you'll run out of resources in short order.
        
           | kaba0 wrote:
           | I'm far from an expert on this, but these are often trained
           | in conjunction with a model that recognizes deep fakes.
           | Improving one will improve the other, and it's an infinite
           | recursion.
        
         | stevenwalton wrote:
         | > But the constraint that both eyes should have consistent
         | reflection patterns is just another statistical regularity that
         | appears in real photographs
         | 
          | Hi, author here of a model that does really well on this[0]. My
         | model is SOTA and has undergone a third party user study that
         | shows it generates convincing images of faces[1]. AND my
         | undergrad is in physics. I'm not saying this to brag, I'm
         | giving my credentials. That I have deep knowledge in both
         | generating realistic human faces and in physics. I've seen
         | hundreds of thousands of generated faces from many different
         | models and architectures.
         | 
         | I can assure you, these models don't know physics. What you're
         | seeing is the result of attention. Go ahead and skip the front
         | matter in my paper and go look at the appendix where I show
         | attention maps and go through artifacts.
         | 
         | Yes, the work is GANs, but the same principles apply to
          | diffusion models. It's just that diffusion models are typically
          | MUCH bigger and have way more training data (sure, I had access
          | to an A100 node at the time, but even one node makes you GPU
          | poor these days. So best to explore on GANs ):
         | 
         | I'll point out flaws in images in my paper, but remember that
         | these fool people and you're now primed to see errors, and if
         | you continue reading you'll be even further informed. In
         | Figures 8-10 you can see the "stars" that the article talks
         | about. You'll see mine does a lot better. But the artifact
         | exists in all images. You can also see these errors in all of
         | the images in the header, but they are much harder to see. But
         | I did embed the images as large as I could into the paper, so
         | you can zoom in quite a bit.
         | 
         | Now there are ways to detect deep fakes pretty readily, but it
         | does take an expert eye. These aren't the days of StyleGAN-2
         | where monsters are common (well... at least on GANs and
         | diffusion is getting there). Each model and architecture has a
         | different unique signature but there are key things that you
          | can look for if you want to get better at this. Here are things
         | that I look for, and I've used these to identify real world
         | fake profiles and you will see them across Twitter and
         | elsewhere:
         | 
         | - Eyes: Eyes are complex in humans with lots of texture. Look
         | for "stars" (inconsistent lighting), pupil dilation, pupil
          | shape, heterochromia (can be subtle; see Figure 2, last row,
          | column 2 for example), and the texture of the iris. Also make
          | sure to look at the edges of the eyes (Figs 8-10).
         | 
         | - Glasses: look for aberrations, inconsistent
         | lighting/reflections, and pay very close attention to the edges
         | where new textures can be created
         | 
         | - Necks: These are just never right. The skin wrinkles, shape,
          | angles, etc.
         | 
         | - Ears: These always lose detail (as seen in TFA and my paper),
         | lose symmetry in shape, are often not lit correctly, if there
         | are earrings then watch for the same things too (see TFA).
         | 
         | - Hair: Dear fucking god, it is always the hair. But I think
         | most people might not notice this at first. If you're having
         | trouble, start by looking at the strands. Start with Figure 8.
         | Patches are weird, color changes, texture, direction, and more.
         | Then try Fig 9 and TFA.
         | 
         | - Backgrounds: I make a joke that the best indicator to
         | determine if you have a good quality image is how much it looks
         | like a LinkedIn headshot. I have yet to see a generated photo
         | that has things happening in the background that do not have
         | errors. Both long-range and local. Look at my header image with
         | care and look at the bottom image in row 2 (which is pretty
         | good but has errors), row 2 column 4, and even row 1 in column
         | 4's shadow doesn't make sense.
         | 
         | - Phase Artifacts: This one is discussed back in StyleGAN2
         | paper (Fig 6). These are still common today.
         | 
         | - Skin texture: Without fail, unrealistic textures are created
         | on faces. These are hard to use in the wild though because
         | you're typically seeing a compressed image and that creates
         | artifacts too and you frequently need to zoom to see. They can
         | be more apparent with post processing though.
         | 
         | There's more, but all of these are a result of models not
         | knowing physics. If you are just scrolling through Twitter you
         | won't notice many of these issues. But if you slow down and
         | study an image, they become apparent. If you practice looking,
         | you'll quickly learn to find the errors with little effort. I
         | can be more specific about model differences but this comment
         | is already too long. I can also go into detail about how we
         | can't determine these errors from our metrics, but that's a
         | whole other lengthy comment.
         | 
         | [0] https://arxiv.org/abs/2211.05770
         | 
         | [1] https://arxiv.org/abs/2306.04675
        
         | ken47 wrote:
         | > So this "one weird trick" will disappear without any special
         | measures.
         | 
         | > Better training, larger models, and larger datasets
         | 
         | But "better training" here is a special measure. It would take
         | a lot of training effort to defeat this check. For example,
         | you'd need a program or group of people who would be able to
         | label training data as realistic/not based on the laws of
         | physics as reflected in subjects' eyeballs.
        
         | dheera wrote:
         | Exactly. Notably, in my experiments, diffusion models based on
         | U-Nets (e.g. SD1.4, SD2) are worse at capturing "correlations
         | at a distance" like this in comparison to newer, DiT-based
         | methods (e.g. SD3, PixArt).
        
       | gyosko wrote:
        | It seems that even discussion about AI is getting really
        | polarized, like everything else these days.
        | 
        | Comments are always one of these two types:
        | 
        | 1 -> AI is awesome and perfect; if it isn't, another AI will
        | make it perfect
        | 
        | 2 -> AI is just garbage and will always be garbage
        
         | EGreg wrote:
         | 1 also says "anything bad that AI does was already bad before
          | AI and you just didn't care; scale is irrelevant".
        
         | ben_w wrote:
          | I have seen those comments; but I do wonder to what extent
          | that is because the comments' authors intended such positions,
          | vs. subtlety and nuance being hard to write and easy to
          | overlook when reading. (Ironically, humans are more boolean
          | than LLMs; the word "nuance" itself seems a bit like ChatGPT's
          | voice.)
         | 
         | I'm sure people place me closer to #1 than I actually feel,
         | simply because I'm _more often_ responding to people who seem
         | to be too far in the #2 direction than vice versa.
        
           | digging wrote:
           | Your comment seems pretty accurate because, from my
           | perspective, I've _never_ seen comments of type #1. And so,
           | despite me explicitly saying otherwise, people like the GP
           | commenter may be reading my comments as #1.
        
             | LegionMammal978 wrote:
             | Even within this thread,
             | https://news.ycombinator.com/item?id=41005386,
             | https://news.ycombinator.com/item?id=41005633,
             | https://news.ycombinator.com/item?id=41010124, and to a
             | lesser extent https://news.ycombinator.com/item?id=41005240
             | seem like #1 to my eyes, with the sentiment of "It is
             | detectable, therefore it will be easily corrected by near-
             | future AI." Do you read these differently?
        
               | ben_w wrote:
               | Of these four:
               | 
               | The first ('''So this "one weird trick" will disappear
               | without any special measures''' etc.) does not seem so to
               | me, I do not read that as a claim of perfection, merely a
               | projection of the trends already seen.
               | 
               | The second ('''If the computer can see it we have a
                | discriminator that we can use in a GAN-like fashion to
               | train the network not to make that mistake again.''') I
               | agree with you, that's overstating what GANs can do.
               | They're good, they're not _that_ good.
               | 
               | The third ('''Once you highlight any inconsistency in AI-
               | generated content, IMHO, it will take a nothingth of a
               | second to "fix" that.''') I'd lean towards agreeing with
               | you, that seems to understate the challenges involved.
               | 
               | The fourth ('''Well, nice find, but now all the fakes
               | have to do is add a new layer of AI that knows how to fix
               | the eyes.''') is technically correct, but contrary to the
               | meme this is not the best kind of correct, and again it's
               | downplaying the challenge same as the previous (but it is
               | unclear to me if this is because nuance is hard to write
               | and to read or the genuine position). Also, once you're
               | primed to look for people who underestimate the
                | difficulties, I can easily see why you would read it as
                | such an example, as it's close enough to be ambiguous.
        
         | tivert wrote:
         | > 1 -> AI is awesome and perfect, if it isn't, another AI will
         | make it perfect 2 -> AI is just garbage and will always be
         | garbage
         | 
         | 3 -> An awesome AI will actually predictably be a deep negative
         | for nearly all people (for much more mundane reasons than the
         | Terminator-genocide-cliche), so the progress is to be dreaded
         | and the garbage-ness hoped for.
         | 
         | Your 1 is warmed over techno-optimism, which is far past its
         | sell-by date but foundational to the tech entrepreneurship
         | space. Your 2 greatly underestimates what tech people can
         | deliver.
        
         | keybored wrote:
         | Your comment is polarized.
         | 
         | Plenty of people think AI is useful (and equally as dangerous).
         | Only useful, not redefines-everything. "I use AI as an
         | assistant" is a common sentiment.
        
         | ectospheno wrote:
         | I'm in the AI is very useful but horribly named camp. It is all
         | A and no I.
        
         | rolph wrote:
         | 3 -> AI is still a technical concept, and does not yet exist.
        
         | bawolff wrote:
        | I think it's because at this point there is nothing else
         | interesting to say. We've all seen AI generated images that
         | look impressively real. We've also all seen artifacts proving
         | they aren't perfect. None of this is really new at this point.
        
         | surfingdino wrote:
         | Nobody has given me a good reason to use it or proof that what
         | it does is more than recombining what it hoovers up, so... I'm
         | in the second camp.
        
           | wvenable wrote:
           | You could just... try it. It's very impressive what it can
           | do. It's not some catch-all solution to everything but it
           | saves me hours of time every week. Some of the things it can
           | do are really quite amazing; my real-life example:
           | 
           | I took a picture of my son's grade 9 math homework worksheet
           | and asked ChatGPT to tell me which questions he got wrong. It
           | did that perfectly.
           | 
            | But I use it for the more mundane stuff, like "From this long
            | class definition, can you create a list of assignments for
            | each property that look like this: object1.propertyName =
           | object2.propertyName" and poof.
        
       | Y_Y wrote:
        | If you can see the difference then so can the computer. If the
        | computer can see it, we have a discriminator that we can use in a
        | GAN-like fashion to train the network not to make that mistake
        | again.
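        | 
        | Roughly like this, with the standard GAN objective (`gen`
        | and `disc` are placeholder PyTorch modules; a sketch, not a
        | recipe):
        | 
        |     import torch
        |     import torch.nn.functional as F
        | 
        |     def bce(logits, target):
        |         return F.binary_cross_entropy_with_logits(
        |             logits, torch.full_like(logits, target))
        | 
        |     def gan_step(gen, disc, g_opt, d_opt, real, z):
        |         # discriminator: separate real from generated
        |         fake = gen(z).detach()
        |         d_loss = bce(disc(real), 1.0) + bce(disc(fake), 0.0)
        |         d_opt.zero_grad(); d_loss.backward(); d_opt.step()
        |         # generator: fool the discriminator. If it ever
        |         # keys on eye reflections, gradients push the
        |         # generator to fix them; nothing is hard-coded.
        |         g_loss = bce(disc(gen(z)), 1.0)
        |         g_opt.zero_grad(); g_loss.backward(); g_opt.step()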
        
       | raisedbyninjas wrote:
       | The sample images don't show a large difference between the real
       | and generated photo. The light sources in the real photo must
       | have been pretty close to the subject.
        
       | HumblyTossed wrote:
       | Isn't it easier to simply look for all the 6 fingered hands?
        
         | bitwize wrote:
         | Won't work on a deepfake of Count Rugen, for instance.
        
         | Suppafly wrote:
          | They've already mostly fixed the extra fingers issue, and
          | weird hands in general.
        
       | olivierduval wrote:
        | Warning: photoshopped portraits (and most pro portraits ARE
        | photoshopped, at least slightly) may add "catch lights" in the
        | eyes to make the portrait more "alive".
        | 
        | So that kind of "clue" only shows that the picture has been
        | processed, not that the person in the picture doesn't exist or
        | is a deepfake.
        
         | acomjean wrote:
         | When I shot events years ago, I always used a flash for fill,
         | even outdoors. People like the glint in the eyes that it added.
         | 
          | Before the Photoshop era you could suss out lighting setups
         | based on the reflections.
        
         | radicality wrote:
          | And the non-professional pictures, like the everyday
          | smartphone pictures everyone takes, pass through so many layers
          | of computational photography that the result is sometimes
          | pretty far from reality.
        
       | SXX wrote:
        | I wonder how true this is for face swap, since actual scammers
        | likely wouldn't generate deepfakes completely from scratch or
        | from a static image.
        
       | symisc_devel wrote:
       | Well, they are relatively easy to spot with the current AI
        | software used to generate them, especially if you are dealing on a
       | daily basis with presentation attacks aka deepfakes for facial
       | recognition. FACEIO has already deployed a very powerful model to
       | deter such attacks for the purpose of facial authentication:
       | https://faceio.net/security-best-practice#faceSpoof
        
       | neom wrote:
        | Random thought: GCHQ and IDF specifically seek out dyslexic
        | employees to put on spotting "things out of place", be it an
        | issue in a large amount of data, something that seems wrong on a
        | map, or a picture that contains something physically impossible.
        | Something about dyslexic processing provides an advantage here
        | (not sure if I'd take this or reading at 1 word per hour). Given
        | GPTs are just NNs, I wonder if there is any "dyslexic specific"
        | neurology you could build a NN around and apply to problems
        | neurodivergent minds are good at? Not sure what I'm really saying
        | here as I only have armchair knowledge.
        
       | singingwolfboy wrote:
       | https://archive.ph/pDf1x
        
       | keybored wrote:
       | > In an era when the creation of artificial intelligence (AI)
       | images is at the fingertips of the masses, the ability to detect
       | fake pictures - particularly deepfakes of people - is becoming
       | increasingly important.
       | 
       | The masses having access to things wasn't a cutoff point for me.
        
         | Nullinker wrote:
          | I would actually argue that once the masses are aware that
          | certain technology exists and is in widespread use, it
          | becomes much easier to convince someone that a particular
          | piece of data is not trustworthy, so the ability to
          | detect it through technological means becomes less important.
         | 
         | In the stage before widespread use people are much more easily
         | tricked because they are unaware that others have certain
         | capabilities which they never experienced first hand.
        
           | nick238 wrote:
           | You're missing the flip side: falsely believing something is
           | forged.
           | 
           | Now that the technology is so accessible and widespread,
           | someone could deny truth by saying whatever audio/visual
           | evidence was deepfaked, and people will believe it.
        
       | throw4847285 wrote:
       | Also be on the look out for high flyin' clouds and people dancin'
       | on a string.
        
       | RobotToaster wrote:
        | Interesting, some portrait photographers use cross-polarised
        | light to eliminate reflections from glasses, but it has the side
        | effect of eliminating reflections from the eyes.
        
       | adwi wrote:
       | > The Gini coefficient is normally used to measure how the light
       | in an image of a galaxy is distributed among its pixels. This
       | measurement is made by ordering the pixels that make up the image
       | of a galaxy in ascending order by flux and then comparing the
       | result to what would be expected from a perfectly even flux
       | distribution.
       | 
       | Interesting, I'd only heard of the Gini coefficient as an
       | econometric measure of income inequality.
       | 
       | https://en.m.wikipedia.org/wiki/Gini_coefficient
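        | 
        | For the curious, a small sketch of the measurement as the
        | quote describes it (NumPy, illustrative only): sort the
        | pixel fluxes, then compare against a perfectly even
        | distribution.
        | 
        |     import numpy as np
        | 
        |     def gini(pixels):
        |         # 0 for a perfectly even flux distribution;
        |         # approaches 1 when a few pixels hold nearly
        |         # all the light
        |         flux = np.sort(np.ravel(pixels).astype(float))
        |         n = flux.size
        |         i = np.arange(1, n + 1)
        |         return ((2 * i - n - 1) * flux).sum() / \
        |                (n * flux.sum())
        | 
        | The article's detection idea would then be something like
        | abs(gini(left_eye) - gini(right_eye)): real photos should
        | give similar values for the two eyes, generated ones less
        | so.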
        
         | ploika wrote:
         | Some decision tree algorithms use it to decide what variable to
         | split on when creating new branches.
        
         | buildsjets wrote:
          | Also found it interesting, but for its technical merits, as I
         | recently had to glue some code together to analyze/compare
         | droplet size from still frames of a high speed video of a
         | pressurized nozzle spraying a flammable fluid. (into a fire!
         | neat! fire! FIRE!)
         | 
         | This approach might have been useful to try. I ended up finding
         | a way to use ImageJ, an open source tool published by the NIH
         | that biologists use to automatically count bacterial colony-
         | forming units growing on petri dishes, but it was very slow and
         | hacky. It was not perfect, but it gave an objective way to
         | quantify information from a large body of existing test data
         | with zero budget. https://en.wikipedia.org/wiki/ImageJ
        
       | GaggiX wrote:
       | Did they try using this method on something that is not StyleGAN?
        
       | butlike wrote:
       | I don't understand the "galaxy" terminology in the sentence: "To
       | measure the shapes of galaxies, we analyse whether they're
       | centrally compact, whether they're symmetric, and how smooth they
       | are"
       | 
       | Can someone explain?
        
         | meatmanek wrote:
         | Given that this is from the Royal Astronomical Society, I think
         | they're literally talking about galaxies. They're then using
         | these same scoring functions to characterize the reflections on
         | the subjects' eyes, and comparing the result for the two eyes
         | -- real photos should have similar values, generated images
         | have more variation between the two eyes.
        
       | AlbertCory wrote:
       | I took a film lighting class a long, long time ago at a community
       | college. Even then, you could look at a closeup and tell where
       | the lights were by the reflections in the eyes.
        
       | crazygringo wrote:
       | I don't know, the example photos of deepfakes here seem... pretty
       | good. If that's the worst they could find, then this doesn't seem
       | useful at all.
       | 
       | Even in the real photos, you can see that the reflections are
       | different in both position and shape, because the two eyeballs
       | aren't perfectly aligned and reflections are going to be
       | genuinely different.
       | 
       | And then when you look at the actual "reflections" their software
       | is supposedly detecting (highlighted in green and blue) and you
       | compare with the actual photo, their software is doing a
       | _terrible_ job detecting reflections in the first place --
        | missing some, and spuriously adding others that don't exist.
       | 
       | Maybe this is a valuable tool for spotting deepfakes, but this
       | webpage is doing a _terrible_ job at convincing me of that.
       | 
       | (Not to mention that reflections like these are often added in
       | Photoshop for professional photography, which might have similar
       | subtle positioning errors, and training on those photos
       | reproduces them. So then this wouldn't tell you at all that it's
       | an AI photo -- it might just be a real photo that someone
       | photoshopped reflections into.)
        
       | ch33zer wrote:
        | I suspect that detecting AI-generated content will become an
        | arms race, just like spam filtering and SEO. Businesses will be
        | built on secret ML models detecting smaller and smaller
        | irregularities in images and text. It'll be interesting to see
        | who wins.
        
       | notorandit wrote:
       | Once you highlight any inconsistency in AI-generated content,
       | IMHO, it will take a nothingth of a second to "fix" that.
        
       | grvbck wrote:
       | Am I missing something here, or are the authors incorrectly using
       | the term "deepfake" where "AI-generated" would have been more
       | appropriate?
       | 
       | There's a lot of comments here discussing how generative AI will
       | deal with this, which is really interesting.
       | 
       | But if somebody's actual goal was to pass off a doctored/AI-
       | generated image as authentic, it would be very easy to just
       | correct the eye reflection (and other flaws) manually, no?
        
       | threatripper wrote:
       | I really wonder where the limit is for AI. Reality has an
       | incredible amount of detail that you can't just simulate or
       | emulate entirely. However, our perception is limited, and we
       | can't process all those details. AI only has to be good enough to
       | fool our perception, and I'm confident that every human-
       | understandable method for identifying fakes can be fooled by
       | generative AI. It will probably be up to AI to identify AI-
       | generated content. Even then, noise and limited resolution will
       | mask the flaws. For many forms of content, there will simply be
       | no way to determine what's real.
        
       ___________________________________________________________________
       (page generated 2024-07-19 23:05 UTC)