[HN Gopher] Want to spot a deepfake? Look for the stars in their eyes
___________________________________________________________________
Want to spot a deepfake? Look for the stars in their eyes
Author : jonbaer
Score : 191 points
Date : 2024-07-18 14:34 UTC (1 day ago)
(HTM) web link (ras.ac.uk)
(TXT) w3m dump (ras.ac.uk)
| brabel wrote:
| Well, nice find, but now all the fakers have to do is add a new
| layer of AI that knows how to fix the eyes.
| chbint wrote:
| Indeed, but useful nonetheless. Solving it may be a challenge
| for a while, and deep fakes generated before a solution becomes
| broadly available will remain detectable with this technique.
| spywaregorilla wrote:
| The commercial use case for generative art is not to make
| images that experts cannot discern as fake. It would be very
| expensive to annotate training images for physically
| correct reflections, and the value would be negligible.
| Realistically, if you want to produce something that is
| impossible to prove fake, you would have a vastly easier time
| doing such edits manually. We are very, very, very far from
| being able to churn out indiscernible fakes at the push of a
| button. Even making generically good outputs for art is a
| careful process with lots of iteration.
| kvemkon wrote:
| Or just use a conventional algorithm, since the fix is about
| formal physics. Although it will not be a true 100% fix, it
| could be good enough to make this test rather useless, because
| even now:
|
| > "There are false positives and false negatives..."
| bqmjjx0kac wrote:
| I wouldn't be shocked if phone cameras accidentally produced
| weird effects like this. Case in point:
| https://www.theverge.com/2023/12/2/23985299/iphone-bridal-ph...
| Mashimo wrote:
| Or https://www.theverge.com/2023/3/13/23637401/samsung-fake-
| moo...
|
| And there is also software that fixes your eyes for selfies and
| video calls.
| leidenfrost wrote:
| AFAIK deepfakes can't mimic strong gesticulations very well,
| nor correctly mimic a head facing sideways.
|
| Or was that corrected?
| BugsJustFindMe wrote:
| > _deepfakes can't_
|
| There is a big difference between can't and don't. Each new
| generation will do more than the previous generation did.
| SXX wrote:
| While I'm no expert on the state of the art, keep in mind
| there is a huge difference between deepfakes created 100% from
| scratch and those created via face swap and style transfer.
|
| Basically, it's easier to create believable gesticulations when
| there is footage of an actual person as raw material.
| spywaregorilla wrote:
| You think we can't generate a picture of a head facing
| sideways? Obviously incorrect.
| fourthark wrote:
| The argument being that there is not very much video footage
| of people turning their heads, therefore not enough data to
| train deep fake videos / filters.
| spywaregorilla wrote:
| Videos are still very much in the baby phase. There are
| way, way easier tells when a video has been faked. We're
| talking about images. Turned head images are very much in
| scope.
| nottorp wrote:
| I can't read TFA because it's probably been HNed. However, an
| artist friend of mine said generated images are easy to spot
| because every pixel is "perfect". Not just the eyes.
|
| That explained pretty well why I thought even the non-realistic
| ones felt ... uncanny.
| KineticLensman wrote:
| > However an artist friend of mine said generated images are
| easy to spot because every pixel is "perfect".
|
| It depends on the art. There was a discussion here a while ago
| that mentioned the use of Gen AI to create images of knitted /
| crocheted dolls. The images looked okay at a quick glance but
| were clearly fake because some of the colour changes weren't
| aligned with the position of the stitches. E.g. strands of
| supposed hair overlaid on the underlying stitched texture.
|
| I'm sure there are lots of similar examples in other forms of
| art where the appearance is physically impossible.
| krisoft wrote:
| > However an artist friend of mine said generated images are
| easy to spot because every pixel is "perfect".
|
| What does perfect mean? Do the pixels drawing 15 fingers
| count as "perfect"?
|
| I think this heuristic is liable to fail in both directions.
| You will find images made by humans where "every pixel is
| perfect" (whatever that means) and you will also find AI which
| does mimic whatever imperfection you are looking for.
| nottorp wrote:
| > What does perfect mean?
|
| Nothing out of focus or less detailed than other parts of the
| image...
| cthalupa wrote:
| Even the most cursory browsing of civitai or genai image
| generation subreddits shows this to not be true. Focus,
| bokeh, etc. are all things that can be generated by these
| models.
| Suppafly wrote:
| Your artist friend has deluded themselves with wishful
| thinking.
| jmmcd wrote:
| They love saying things like "generative AI doesn't know
| physics". But the constraint that both eyes should have
| consistent reflection patterns is just another statistical
| regularity that appears in real photographs. Better training,
| larger models, and larger datasets will lead to models that
| capture this statistical regularity. So this "one weird trick"
| will disappear without any special measures.
| actionfromafar wrote:
| Wouldn't the adversarial model training also have to take
| "physics correctness" into account? As long as the image
| detects as "<insert celebrity> in blue dress", why would it
| care about correct details in the eyes if nothing in the
| "checker" cares about that?
| Filligree wrote:
| Current image generators don't use an adversarial model.
| Though the ones that do would have eventually encoded that as
| well; the details to look for aren't hard-coded.
| actionfromafar wrote:
| Interesting. Apparently, I have _much_ to learn.
| themoonisachees wrote:
| GP told you how they don't work, but not how they do:
|
| Current image generators work by training models to
| remove artificial noise added to the training set. Take
| an image, add some amount of noise, and feed it along with
| its description as inputs to your model. The closer the
| output is to the original image, the higher the reward.
|
| Using some tricks (a big one is training simultaneously
| on large and small amounts of noise), you ultimately get
| a model that can remove 99% noise based only on the
| description you feed it. That means you can just swap
| out the description for whatever you want the model to
| generate and feed it pure noise, and it'll do a good job.
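|
| A minimal sketch of that training step (PyTorch; the noise
| schedule and the model signature here are illustrative
| assumptions, not any particular paper's):
|
|     import torch
|     import torch.nn.functional as F
|
|     def train_step(model, image, description_emb, T=1000):
|         # Pick a random noise level per image; mixing small and
|         # large amounts of noise is the "trick" mentioned above.
|         t = torch.randint(0, T, (image.shape[0],))
|         # Fraction of the original signal that survives at step t.
|         keep = torch.cos(t.float() / T * torch.pi / 2)
|         keep = keep.view(-1, 1, 1, 1)
|         noise = torch.randn_like(image)
|
|         # Corrupt the image: part signal, part noise.
|         noisy = keep * image + (1 - keep**2).sqrt() * noise
|
|         # The model sees the noisy image, the noise level, and
|         # the description, and predicts the noise that was added
|         # (subtracting it recovers the original).
|         predicted = model(noisy, t, description_emb)
|
|         # "The closer the output is to the original, the higher
|         # the reward" -- equivalently, minimise error on the noise.
|         return F.mse_loss(predicted, noise)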
| 101008 wrote:
| I read this description of the algorithm a few times and
| I find it fascinating because it's so simple to follow. I
| have a lot of questions, though, like "why does it
| work?", "why nobody thought of this before", and "where
| is the extra magical step that moves this from 'silly
| idea' to 'wonder work'"?
| ikari_pl wrote:
| The answer to the 2nd and 3rd questions is mostly "vastly more
| computing power available", especially the kind that CUDA
| introduced a few years back.
| aitchnyu wrote:
| Did anybody prompt a GenAI to get this output?
| spywaregorilla wrote:
| It wouldn't work. The models could put stuff in the eyes, but
| they wouldn't be able to do so realistically or consistently,
| even a fraction of the time. The text describing the images
| does not typically annotate tiny details like correct
| reflections in the eyes, so prompting for it is useless.
| krembo wrote:
| Just like the 20 fingers disappeared
| zamadatix wrote:
| Those early diffusion generators sure managed to make the
| flesh monster in The Witcher look sane sometimes.
| sholladay wrote:
| Agreed, but the tricks are still useful.
|
| When there are no more tricks remaining, I think we must be
| pretty close to AGI.
| layer8 wrote:
| That still won't make them understand physics.
|
| This all reminds me of "fixing" mis-architected software by
| adding extra conditional code for every special case that is
| discovered to work incorrectly, instead of fixing the
| architecture (because no one understands it).
| tossandthrow wrote:
| This is more a comment on the word "understand" than on
| "physics".
|
| Yes, the models' output will converge to being congruent with
| the laws of physics, by virtue of deriving that as a latent
| variable.
| falcor84 wrote:
| >That still won't make them understand physics
|
| I would assume that larger models working with additional
| training data will eventually allow them to understand
| physics to the same extent as humans inspecting the world -
| i.e. to capture what we call Naive Physics [0]. But the limit
| isn't there; the next generation of GenAI could model the
| whole scene and then render it with ray tracing (no special
| casing needed).
|
| [0] https://en.wikipedia.org/wiki/Na%C3%AFve_physics
| layer8 wrote:
| There seems to be little basis for this assumption, as
| current models don't exhibit understanding. Understanding
| would allow applying it to situations that don't match
| existing patterns in the training data.
| aj7 wrote:
| That's not large models "understanding physics." Rather, it's
| giving output "statistically consistent" with real physical
| measurements. And no one, to my knowledge, has yet
| succeeded in a general AI app that reverts to a
| deterministic calculation in response to a prompt.
| paulmd wrote:
| chatgpt has had the ability to generate and call out to
| deterministic python scripts for a year now
| astrange wrote:
| > And no one, to my knowledge, has yet succeeded in a
| general AI app that reverts to a deterministic
| calculation in response to a prompt.
|
| They will all do this with a fixed seed. They just don't
| do that because nobody wants it.
| energy123 wrote:
| Maybe it will. It really depends whether it's "easier" for
| the network to learn an intuitive physics, versus a laundry
| list of superficial hacks that let it minimise loss all the
| same. If the list of hacks grows so long that gradient
| descent finds it easier to learn the actual physics, then
| it'll learn the physics.
|
| Hinton argues that the easiest way to minimise loss in next
| token prediction is to actually understand meaning. An
| analogous thing may hold true in vision modelling wrt
| physics.
| quonn wrote:
| > that gradient descent finds it easier to learn the actual
| physics, then it'll learn the physics.
|
| I guess it really depends on what it means for gradient
| descent to learn the physics.
|
| Maybe you define it to mean that the actually correct
| equations appear encoded in the computation of the net. But
| this would still be tacit knowledge. It would be kind of
| like a piece of math software being aware of physics at best.
| Frieren wrote:
| > It really depends whether it's "easier" for the network
| to learn an intuitive physics, versus a laundry list of
| superficial hacks that let it minimise loss all the same.
|
| Human innate understanding of physics is a laundry list of
| superficial hacks. People need education and mental effort
| to go beyond that innate but limited understanding.
| godelski wrote:
| When it is said that humans innately understand physics,
| no one means that people innately understand the
| equations and can solve physics problems. I think we all
| know how laughable such a claim would be, because how
| much people struggle when learning physics and how few
| people even get to a moderate level (not even Goldstein,
| but at least calculus based physics with partial
| derivatives).
|
| What people mean when saying people innately understand
| physics is that they have a working knowledge of many of
| the implications. Things like that gravity is uniformly
| applied from a single direction and that is the direction
| towards ground. That objects move in arcs or "ballistic
| trajectories", that straight lines are uncommon, that
| wires hang with hyperbolic function shapes even if they
| don't know that word, that snow is created from cold,
| that the sun creates heat, many lighting effects (which
| is how we also form many illusions), and so on.
|
| Essentially, humans know that things do not fall up. One
| could argue that this is based on a "laundry list of
| superficial hacks" and they wouldn't be wrong, but they
| also wouldn't be right. Even when wrong, the human
| formulations are (more often than not) causally
| formulated. That is, explainable _and_ rational (rational
| does not mean correct, but that it follows some logic.
| The logic doesn't need to be right. In fact, no logic is,
| just some are less wrong than others).
| abeppu wrote:
| If your entire existence was constrained to seeing 2d
| images, not of your choosing, _could_ a perplexity-
| optimizing process "learn the physics"?
|
| Basic things that are not accessible to such a learning
| process:
|
| - moving around to get a better view of a 3d object
|
| - see actual motion
|
| - measure the mass of an object participating in an
| interaction
|
| - set up an experiment and measure its outcomes
|
| - choose to look at a particular sample at a closer
| resolution (e.g. microscopy)
|
| - see what's out of frame from a given image
|
| I think we have at this point a lot of evidence that
| optimizing models to understand distributions of images is
| not the same thing as understanding the things in those
| images. In 2013 that was 'DeepDream' dog worms, in 2018
| that was "this person does not exist" portraits where
| people's garments or hair or jewelry fused together or
| merged with their background. In 2022 it was diffusion
| images of people with too many fingers, or whose hands
| melted together if you asked for people shaking hands. In
| the Sora announcement earlier this year it was a woman's
| jacket morphing while the shot zoomed into her face.
|
| I think in the same way that LLMs do better at some
| reasoning tasks by generating a program to produce the
| answer, I suspect models which are trained to generate 3D
| geometry and scenes, and run a simulation -> renderer ->
| style transfer process may end up being the better way to
| get to image models that "know" about physics.
| godelski wrote:
| > It really depends whether it's "easier" for the network
| to learn an intuitive physics, versus a laundry list of
| superficial hacks that let it minimise loss all the same
|
| The latter is always easier. Not to mention that the
| architectures are fundamentally curve fitters. There are
| many curves that can fit data, but not all curves are
| causally related to the data. The history of physics itself is
| a history of becoming less wrong and many of the early
| attempts at problems (which you probably never learned
| about fwiw) were pretty hacky approximations.
|
| > Hinton argues
|
| Hinton is only partially correct. It entirely depends on
| the conditions of your optimization. If you're trying to
| generalize and understand causality, then yes, this is
| without a doubt true. But models don't train like this and
| most research is not pursuing these (still unknown)
| directions. So if we aren't conditioning our model on those
| aspects, then consider how many parameters they have (and
| aspects like superposition). Without a doubt the
| "superficial hacks" are a lot easier and will very likely
| lead to better predictions on the training data (and likely
| test data).
| noduerme wrote:
| Isn't that just what neural networks do? The way light falls
| on an object is physically deterministic, but the neural
| network in the brain of a human painter doesn't actually
| calculate rays to determine where highlights should be. A
| center fielder knows where to run to catch a fly ball without
| having to understand the physics acting on it. Similarly, we
| can spot things that look wrong, not because we're referring
| to physical math but because we have endless kludged-together
| rules that supersede other rules. Like: Heavy objects don't
| float. Except for boats which do float. Except for boats that
| are leaking, which don't. To then explain why something is
| happening we refer to specialized models, and these image
| generation models are too general for that, but there's no
| reason they couldn't refer to separate physical models to
| assist their output in the future.
| Sardtok wrote:
| Boats are mostly air by volume, which isn't heavy at all
| compared to water.
| bawolff wrote:
| > This all reminds me of "fixing" mis-architected software by
| adding extra conditional code for every special case that is
| discovered to work incorrectly...
|
| Isn't that what AI training is in general? It has worked
| pretty well so far.
|
| I don't think img-gen AI is ever going to "understand
| physics", but that isn't the task at hand. I don't think it
| is necessary to understand physics to make good fake
| pictures. For that matter, I don't think understanding physics
| would even be a good approach to the fake picture problem.
| midtake wrote:
| Most "humans" don't understand physics to a Platonic level
| and act in much the same way as a model, finding best fits
| among a set of parameters that produce a result that fits
| some correctness check.
| wvenable wrote:
| > That still won't make them understand physics.
|
| They don't have to. They just have to understand what makes a
| realistic picture. The author of the article isn't really
| employing physics either; he's comparing the eyes to each
| other.
| Someone wrote:
| But we don't know how much larger the models will have to be,
| how large the data sets, or how much training is needed, do we?
| They could have to be inconceivably large.
|
| If you want to correct for this particular problem you might be
| better off training a face detector, an eye detector and a
| model that takes two eyes as input and corrects for this
| problem. Process then would be:
|
| - generate image
|
| - detect faces
|
| - detect eyes in each face
|
| - correct reflections in eyes
|
| That is convoluted, though, and would get very convoluted when
| you want to correct for multiple such issues. It also might be
| problematic in handling faces with glass eyes, but you could
| try to 'detect' those with a model that is trained on the
| prompt.
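|
| A sketch of that pipeline (every function name here is a
| hypothetical stand-in, not a real library call):
|
|     def generate_with_consistent_eyes(prompt):
|         image = generate_image(prompt)    # any text-to-image model
|         for face in detect_faces(image):  # off-the-shelf detector
|             eyes = detect_eyes(face)      # eye detector per face
|             if len(eyes) == 2:
|                 # Small specialised model: takes both eye crops
|                 # and rewrites the highlights so they agree.
|                 left, right = correct_reflections(*eyes)
|                 image = paste_patches(image, [left, right])
|         return image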
| rocqua wrote:
| I feel like a GAN method might work better, building a
| detector, and training the model to defeat the detector.
| bastawhiz wrote:
| > They could have to be inconceivably large.
|
| The opposite might also be true. Just having better, well
| curated data goes a long way. LAION worked for a long time
| because it's huge, but what if all the garbage images were
| filtered out and the annotations were better?
|
| The early generations of image and video models used middling
| data because it was the only data. Since then, literally
| everyone with data has been working their butts off to get it
| cleaned up to make the next generation better.
|
| Better data, more intricate models, and improvements to the
| underlying infrastructure could mean these sorts of
| "improvements" come mostly "for free".
| amelius wrote:
| Shouldn't a GAN be able to use this fact immediately in its
| adversarial network?
| godelski wrote:
| Unfortunately no. The discriminator always needs to be in
| balance and contention with the generator. You can swap out the
| discriminator later, but you've also got to make sure your
| discriminator is able to identify these errors, and ML models
| aren't the best at noticing small details. And since they too
| don't understand physics, there is no reason to believe that
| they will encode such information, despite every image in
| real life requiring consistency. Also remember that there is
| a learning trajectory, and these small details are most
| certainly not learned early on in networks. The problem is
| that these errors are trivial to identify post hoc, but not
| a priori. They are also easy for you to spot because you know
| physics innately and can formulate causal explanations.
| johnsutor wrote:
| I know there are murmurs that synthetic data (i.e. using
| rendering software with 3D models) was used to train some
| generative models, including OpenAI's Sora; it seems like the
| only plausible way right now to get the insane amounts of data
| needed to capture such statistical regularities.
| sangnoir wrote:
| > Better training, larger models, and larger datasets, will
| lead to models that
|
| Hypothetically, with enough information, one could predict the
| future (barring truly random events like radioactive decay).
| Generative AI is also constrained by economic forces - how much
| are GenAI companies willing to invest to get eyeball
| reflections right? Would they earn adequate revenue to cover
| the increase in costs to justify that feature? There are plenty
| of things that humanity can technically achieve, that don't get
| done because the incentives are not aligned - for instance,
| there is enough food grown to feed every human on earth and the
| technology to transport it, and yet we have hunger,
| malnutrition and famines.
| stevenwalton wrote:
| > how much are GenAI companies willing to invest to get
| eyeball reflections right?
|
| Willing to? Probably not much. Should? A WHOLE LOT. It is the
| whole enchilada.
|
| While this might not seem like a big issue and truthfully
| most people don't notice, getting this right (consistently)
| requires getting a lot more right. It doesn't require the
| model knowing physics (because every training sample face
| will have realistic lighting). But what underlies this issue
| is the model understanding subtleties. No model to date
| accomplishes this. From image generators to language
| generators (LLMs). There is a Pareto efficiency issue here
| too. Remember that it is magnitudes easier to get a model to
| be "80% correct" than to be "90% correct".
|
| But recall that the devil is in the details. We live in a
| complex world, and what that means is that the subtleties
| matter. The world is (mathematically) chaotic, so small
| things have big effects. You should start solving problems
| not worrying about these, but eventually you need to move
| into tackling these problems. If you don't, you'll just
| generate enshittification. In fact, I'd argue that the
| difference between an amateur and an expert is knowledge of
| subtleties and nuance. This is both why amateurs can trick
| themselves into thinking they're more expert than they are
| and why experts can recognize one another when talking
| (I remember a thread a while ago where many people were
| shocked about how most industries don't give tests or
| whiteboard problems when interviewing candidates and how
| hiring managers can identify good hires from bad ones).
| dwaltrip wrote:
| Getting the eyeballs correct will correlate with other very
| useful improvements.
|
| They won't train a better model just for that reason. It will
| just happen along the way as they seek to broadly improve
| performance and usefulness.
| rowanG077 wrote:
| Yeah, every person is constantly predicting the future, often
| even scarily accurately. I don't see how this is a hot take
| at all.
| bastawhiz wrote:
| > how much are GenAI companies willing to invest to get
| eyeball reflections right
|
| This isn't how it works. As the models are improved, they
| learn more about reality largely on their own. Except for
| glaringly obvious problems (like hands, deformed limbs, etc.),
| the improvements are really just giving the models techniques
| for more accurately replicating features from training data.
| There's nobody that's like "today we're working on
| fingernails" or "today we're making hair physics work
| better": it's about making the model understand and replicate
| the features already present in the training dataset.
| sangnoir wrote:
| > This isn't how it works. As the models are improved, they
| learn more about reality largely on their own.
|
| AI models aren't complete blackboxes to the people who
| develop them: there is careful thought behind the
| architecture, dataset selection and model evaluation.
| Assuming that you can take an existing model and simply
| throw more compute at it will automatically result in
| higher fidelity illumination modeling takes almost
| religious levels of faith. If moar hardware is all you
| need, Nvidia would have the best models in every category
| right now. Perhaps someone ought to write the sequel to
| Fred Brooks' book amd name it "The Mythical GPU-Cluster-
| Month".
|
| FWIW, Google has AI-based illumination adjustment in Google
| Photos where one can add virtual lights - specialized
| models already exist, but I'm very cynical about a generic
| mixed model incidentally gaining those capabilities without
| specific training for it. When dealing with exponential
| requirements (training data, training time, GPUs, model
| weight size), you'll run out of resources in short order.
| kaba0 wrote:
| I'm far from an expert on this, but these are often trained
| in conjunction with a model that recognizes deep fakes.
| Improving one will improve the other, and it's an infinite
| recursion.
| stevenwalton wrote:
| > But the constraint that both eyes should have consistent
| reflection patterns is just another statistical regularity that
| appears in real photographs
|
| Hi, author here of a model that does really well on this[0]. My
| model is SOTA and has undergone a third party user study that
| shows it generates convincing images of faces[1]. AND my
| undergrad is in physics. I'm not saying this to brag, I'm
| giving my credentials. That I have deep knowledge in both
| generating realistic human faces and in physics. I've seen
| hundreds of thousands of generated faces from many different
| models and architectures.
|
| I can assure you, these models don't know physics. What you're
| seeing is the result of attention. Go ahead and skip the front
| matter in my paper and go look at the appendix where I show
| attention maps and go through artifacts.
|
| Yes, the work is GANs, but the same principles apply to
| diffusion models; it's just that diffusion models are typically
| MUCH bigger and have way more training data. Sure, I had access
| to an A100 node at the time, but even one node makes you GPU
| poor these days. So best to explore on GANs ):
|
| I'll point out flaws in images in my paper, but remember that
| these fool people and you're now primed to see errors, and if
| you continue reading you'll be even further informed. In
| Figures 8-10 you can see the "stars" that the article talks
| about. You'll see mine does a lot better. But the artifact
| exists in all images. You can also see these errors in all of
| the images in the header, but they are much harder to see. But
| I did embed the images as large as I could into the paper, so
| you can zoom in quite a bit.
|
| Now there are ways to detect deep fakes pretty readily, but it
| does take an expert eye. These aren't the days of StyleGAN-2
| where monsters are common (well... at least on GANs and
| diffusion is getting there). Each model and architecture has a
| different unique signature but there are key things that you
| can look for if you want to get better at this. Here are things
| that I look for, and I've used these to identify real-world
| fake profiles and you will see them across Twitter and
| elsewhere:
|
| - Eyes: Eyes are complex in humans with lots of texture. Look
| for "stars" (inconsistent lighting), pupil dilation, pupil
| shape, heterochromia (can be subtle; see Figure 2, last row,
| column 2 for example), and the texture of the iris. Also
| make sure to look at the edges of the eyes (Figs 8-10).
|
| - Glasses: look for aberrations, inconsistent
| lighting/reflections, and pay very close attention to the edges
| where new textures can be created
|
| - Necks: These are just never right. The skin wrinkles, shape,
| angles, etc
|
| - Ears: These always lose detail (as seen in TFA and my paper),
| lose symmetry in shape, are often not lit correctly, if there
| are earrings then watch for the same things too (see TFA).
|
| - Hair: Dear fucking god, it is always the hair. But I think
| most people might not notice this at first. If you're having
| trouble, start by looking at the strands. Start with Figure 8.
| Patches are weird, color changes, texture, direction, and more.
| Then try Fig 9 and TFA.
|
| - Backgrounds: I make a joke that the best indicator to
| determine if you have a good quality image is how much it looks
| like a LinkedIn headshot. I have yet to see a generated photo
| that has things happening in the background that do not have
| errors. Both long-range and local. Look at my header image with
| care and look at the bottom image in row 2 (which is pretty
| good but has errors), row 2 column 4, and even row 1 in column
| 4's shadow doesn't make sense.
|
| - Phase Artifacts: This one is discussed back in StyleGAN2
| paper (Fig 6). These are still common today.
|
| - Skin texture: Without fail, unrealistic textures are created
| on faces. These are hard to use in the wild, though, because
| you're typically seeing a compressed image, which creates
| artifacts of its own, and you frequently need to zoom in to
| see them. They can be more apparent with post-processing
| though.
|
| There's more, but all of these are a result of models not
| knowing physics. If you are just scrolling through Twitter you
| won't notice many of these issues. But if you slow down and
| study an image, they become apparent. If you practice looking,
| you'll quickly learn to find the errors with little effort. I
| can be more specific about model differences but this comment
| is already too long. I can also go into detail about how we
| can't determine these errors from our metrics, but that's a
| whole other lengthy comment.
|
| [0] https://arxiv.org/abs/2211.05770
|
| [1] https://arxiv.org/abs/2306.04675
| ken47 wrote:
| > So this "one weird trick" will disappear without any special
| measures.
|
| > Better training, larger models, and larger datasets
|
| But "better training" here is a special measure. It would take
| a lot of training effort to defeat this check. For example,
| you'd need a program or group of people who would be able to
| label training data as realistic/not based on the laws of
| physics as reflected in subjects' eyeballs.
| dheera wrote:
| Exactly. Notably, in my experiments, diffusion models based on
| U-Nets (e.g. SD1.4, SD2) are worse at capturing "correlations
| at a distance" like this in comparison to newer, DiT-based
| methods (e.g. SD3, PixArt).
| gyosko wrote:
| It seems that even discussion about AI is getting really
| polarized like everything else these days.
|
| Comments are always one of these two types:
|
| 1 -> AI is awesome and perfect; if it isn't, another AI will
| make it perfect
|
| 2 -> AI is just garbage and will always be garbage
| EGreg wrote:
| 1 also says "anything bad that AI does was already bad before
| AI and you just didn't care; scale is irrelevant".
| ben_w wrote:
| I have seen those comments; but I do wonder to what extent
| that is because the comments' authors intended such positions,
| vs. because subtlety and nuance are hard to write and easy to
| overlook when reading. (Ironically, humans are more boolean
| than LLMs; the word "nuance" itself seems a bit like ChatGPT's
| voice.)
|
| I'm sure people place me closer to #1 than I actually feel,
| simply because I'm _more often_ responding to people who seem
| to be too far in the #2 direction than vice versa.
| digging wrote:
| Your comment seems pretty accurate because, from my
| perspective, I've _never_ seen comments of type #1. And so,
| despite me explicitly saying otherwise, people like the GP
| commenter may be reading my comments as #1.
| LegionMammal978 wrote:
| Even within this thread,
| https://news.ycombinator.com/item?id=41005386,
| https://news.ycombinator.com/item?id=41005633,
| https://news.ycombinator.com/item?id=41010124, and to a
| lesser extent https://news.ycombinator.com/item?id=41005240
| seem like #1 to my eyes, with the sentiment of "It is
| detectable, therefore it will be easily corrected by near-
| future AI." Do you read these differently?
| ben_w wrote:
| Of these four:
|
| The first ('''So this "one weird trick" will disappear
| without any special measures''' etc.) does not seem so to
| me, I do not read that as a claim of perfection, merely a
| projection of the trends already seen.
|
| The second ('''If the computer can see it we have a
| discriminator that we can use in a GAN-like fashion to
| train the network not to make that mistake again.''') I
| agree with you, that's overstating what GANs can do.
| They're good, they're not _that_ good.
|
| The third ('''Once you highlight any inconsistency in AI-
| generated content, IMHO, it will take a nothingth of a
| second to "fix" that.''') I'd lean towards agreeing with
| you, that seems to understate the challenges involved.
|
| The fourth ('''Well, nice find, but now all the fakes
| have to do is add a new layer of AI that knows how to fix
| the eyes.''') is technically correct, but contrary to the
| meme this is not the best kind of correct, and again it's
| downplaying the challenge same as the previous (but it is
| unclear to me if this is because nuance is hard to write
| and to read or the genuine position). Also, once you're
| primed to look for people who underestimate the
| difficulties I can easily see why you would see it as
| such an example as it's close enough to be ambiguous.
| tivert wrote:
| > 1 -> AI is awesome and perfect, if it isn't, another AI will
| make it perfect 2 -> AI is just garbage and will always be
| garbage
|
| 3 -> An awesome AI will actually predictably be a deep negative
| for nearly all people (for much more mundane reasons than the
| Terminator-genocide-cliche), so the progress is to be dreaded
| and the garbage-ness hoped for.
|
| Your 1 is warmed over techno-optimism, which is far past its
| sell-by date but foundational to the tech entrepreneurship
| space. Your 2 greatly underestimates what tech people can
| deliver.
| keybored wrote:
| Your comment is polarized.
|
| Plenty of people think AI is useful (and equally as dangerous).
| Only useful, not redefines-everything. "I use AI as an
| assistant" is a common sentiment.
| ectospheno wrote:
| I'm in the AI is very useful but horribly named camp. It is all
| A and no I.
| rolph wrote:
| 3 -> AI is still a technical concept, and does not yet exist.
| bawolff wrote:
| I think it's because at this point there is nothing else
| interesting to say. We've all seen AI-generated images that
| look impressively real. We've also all seen artifacts proving
| they aren't perfect. None of this is really new at this point.
| surfingdino wrote:
| Nobody has given me a good reason to use it or proof that what
| it does is more than recombining what it hoovers up, so... I'm
| in the second camp.
| wvenable wrote:
| You could just... try it. It's very impressive what it can
| do. It's not some catch-all solution to everything but it
| saves me hours of time every week. Some of the things it can
| do are really quite amazing; my real-life example:
|
| I took a picture of my son's grade 9 math homework worksheet
| and asked ChatGPT to tell me which questions he got wrong. It
| did that perfectly.
|
| But I use it for the more mundane stuff, like "From this long
| class definition, can you create a list of assignments for
| each property that look like this: object1.propertyName =
| object2.propertyName" and poof.
| Y_Y wrote:
| If you can see the difference, then so can the computer. If the
| computer can see it, we have a discriminator that we can use in
| a GAN-like fashion to train the network not to make that
| mistake again.
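|
| A sketch of that loop (with the caveat noted downthread that
| current diffusion models aren't trained adversarially; the
| generator and detector here are assumed stand-ins):
|
|     import torch
|
|     def generator_step(generator, detector, prompts, optimizer):
|         fakes = generator(prompts)  # batch of generated images
|         p_fake = detector(fakes)    # "this is fake" score in (0, 1)
|
|         # Reward the generator for fooling the detector: push the
|         # fake-score towards zero (non-saturating GAN-style loss).
|         loss = -torch.log(1.0 - p_fake + 1e-8).mean()
|
|         optimizer.zero_grad()
|         loss.backward()
|         optimizer.step()
|         return loss.item()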
| raisedbyninjas wrote:
| The sample images don't show a large difference between the real
| and generated photo. The light sources in the real photo must
| have been pretty close to the subject.
| HumblyTossed wrote:
| Isn't it easier to simply look for all the 6 fingered hands?
| bitwize wrote:
| Won't work on a deepfake of Count Rugen, for instance.
| Suppafly wrote:
| They've already mostly fixed the extra-fingers issue, and weird
| hands in general.
| olivierduval wrote:
| Warning: photoshopped portraits (and most pro portraits ARE
| photoshopped, even if only slightly) may have "catch lights"
| added to the eyes, to make the portrait more "alive".
|
| So that kind of "clue" only shows that the picture has been
| processed, not that the person in the picture doesn't exist or
| is a deepfake.
| acomjean wrote:
| When I shot events years ago, I always used a flash for fill,
| even outdoors. People like the glint in the eyes that it added.
|
| Before the photoshop times, you could suss out lighting setups
| based on the reflections.
| radicality wrote:
| And the non-professional pictures, like the everyday smartphone
| pictures everyone takes, pass through so many layers of
| computational photography that the result is sometimes pretty
| far from reality.
| SXX wrote:
| I wonder how true this is for face swap, since actual scammers
| likely wouldn't generate deepfakes completely from scratch or
| from a static image.
| symisc_devel wrote:
| Well, they are relatively easy to spot with the current AI
| software used to generate them, especially if you are dealing
| on a daily basis with presentation attacks, aka deepfakes,
| against facial recognition. FACEIO has already deployed a very
| powerful model to deter such attacks for the purpose of facial
| authentication:
| https://faceio.net/security-best-practice#faceSpoof
| neom wrote:
| Random thought: GCHQ and the IDF specifically seek out dyslexic
| employees to put on spotting "things out of place", be it an
| issue in a large amount of data, something that seems wrong on
| a map, or a picture that contains something impossible in
| physics. Something about dyslexic processing provides an
| advantage here (not sure if I'd take this or reading at 1 word
| per hour). Given GPTs are just NNs, I wonder if there is any
| "dyslexic specific" neurology you could build a NN around and
| apply to problems neurodivergent minds are good at? Not sure
| what I'm really saying here as I only have armchair knowledge.
| singingwolfboy wrote:
| https://archive.ph/pDf1x
| keybored wrote:
| > In an era when the creation of artificial intelligence (AI)
| images is at the fingertips of the masses, the ability to detect
| fake pictures - particularly deepfakes of people - is becoming
| increasingly important.
|
| The masses having access to things wasn't a cutoff point for me.
| Nullinker wrote:
| I would actually argue that once the masses are aware that a
| certain technology exists and is in widespread use, it becomes
| much easier to convince someone that a particular piece of
| data is not trustworthy, so the ability to detect it through
| technological means becomes less important.
|
| In the stage before widespread use, people are much more easily
| tricked, because they are unaware that others have capabilities
| which they have never experienced first hand.
| nick238 wrote:
| You're missing the flip side: falsely believing something is
| forged.
|
| Now that the technology is so accessible and widespread,
| someone could deny the truth by claiming that the audio/visual
| evidence was deepfaked, and people will believe it.
| throw4847285 wrote:
| Also be on the look out for high flyin' clouds and people dancin'
| on a string.
| RobotToaster wrote:
| Interesting, some portrait photographers use cross-polarised
| light to eliminate reflection from glasses, but it has the side
| effect of eliminating reflection from eyes.
| adwi wrote:
| > The Gini coefficient is normally used to measure how the light
| in an image of a galaxy is distributed among its pixels. This
| measurement is made by ordering the pixels that make up the image
| of a galaxy in ascending order by flux and then comparing the
| result to what would be expected from a perfectly even flux
| distribution.
|
| Interesting, I'd only heard of the Gini coefficient as an
| econometric measure of income inequality.
|
| https://en.m.wikipedia.org/wiki/Gini_coefficient
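|
| The measurement described in the quote fits in a few lines; a
| sketch (the paper may normalise or bin differently, and the
| eye-crop helper is hypothetical):
|
|     import numpy as np
|
|     def gini(pixels):
|         # Sort pixel fluxes ascending and compare the
|         # rank-weighted sum to a perfectly even distribution:
|         # 0 = uniform flux, 1 = one pixel holds all the flux.
|         flux = np.sort(np.asarray(pixels, dtype=float).ravel())
|         n = flux.size
|         ranks = np.arange(1, n + 1)
|         return ((2 * ranks - n - 1) * flux).sum() / (n * flux.sum())
|
|     # The deepfake test then compares the two eyes of one face:
|     # real photos give similar values, generated ones often don't.
|     left, right = extract_eye_crops(image)  # hypothetical helper
|     score = abs(gini(left) - gini(right))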
| ploika wrote:
| Some decision tree algorithms use it to decide what variable to
| split on when creating new branches.
| buildsjets wrote:
| Also found it interesting, but for its technical merits, as I
| recently had to glue some code together to analyze/compare
| droplet size from still frames of a high speed video of a
| pressurized nozzle spraying a flammable fluid. (into a fire!
| neat! fire! FIRE!)
|
| This approach might have been useful to try. I ended up finding
| a way to use ImageJ, an open source tool published by the NIH
| that biologists use to automatically count bacterial colony-
| forming units growing on petri dishes, but it was very slow and
| hacky. It was not perfect, but it gave an objective way to
| quantify information from a large body of existing test data
| with zero budget. https://en.wikipedia.org/wiki/ImageJ
| GaggiX wrote:
| Did they try using this method on something that is not StyleGAN?
| butlike wrote:
| I don't understand the "galaxy" terminology in the sentence: "To
| measure the shapes of galaxies, we analyse whether they're
| centrally compact, whether they're symmetric, and how smooth they
| are"
|
| Can someone explain?
| meatmanek wrote:
| Given that this is from the Royal Astronomical Society, I think
| they're literally talking about galaxies. They're then using
| these same scoring functions to characterize the reflections on
| the subjects' eyes, and comparing the result for the two eyes
| -- real photos should have similar values, generated images
| have more variation between the two eyes.
| AlbertCory wrote:
| I took a film lighting class a long, long time ago at a community
| college. Even then, you could look at a closeup and tell where
| the lights were by the reflections in the eyes.
| crazygringo wrote:
| I don't know, the example photos of deepfakes here seem... pretty
| good. If that's the worst they could find, then this doesn't seem
| useful at all.
|
| Even in the real photos, you can see that the reflections are
| different in both position and shape, because the two eyeballs
| aren't perfectly aligned and reflections are going to be
| genuinely different.
|
| And then when you look at the actual "reflections" their software
| is supposedly detecting (highlighted in green and blue) and you
| compare with the actual photo, their software is doing a
| _terrible_ job detecting reflections in the first place --
| missing some, and spuriously adding others that don't exist.
|
| Maybe this is a valuable tool for spotting deepfakes, but this
| webpage is doing a _terrible_ job at convincing me of that.
|
| (Not to mention that reflections like these are often added in
| Photoshop for professional photography, which might have similar
| subtle positioning errors, and training on those photos
| reproduces them. So then this wouldn't tell you at all that it's
| an AI photo -- it might just be a real photo that someone
| photoshopped reflections into.)
| ch33zer wrote:
| I suspect that detecting AI-generated content will become an
| arms race, just like spam filtering and SEO. Businesses will be
| built on using secret ML models to detect smaller and smaller
| irregularities in images and text. It'll be interesting to see
| who wins.
| notorandit wrote:
| Once you highlight any inconsistency in AI-generated content,
| IMHO, it will take a nothingth of a second to "fix" that.
| grvbck wrote:
| Am I missing something here, or are the authors incorrectly using
| the term "deepfake" where "AI-generated" would have been more
| appropriate?
|
| There's a lot of comments here discussing how generative AI will
| deal with this, which is really interesting.
|
| But if somebody's actual goal was to pass off a doctored/AI-
| generated image as authentic, it would be very easy to just
| correct the eye reflection (and other flaws) manually, no?
| threatripper wrote:
| I really wonder where the limit is for AI. Reality has an
| incredible amount of detail that you can't just simulate or
| emulate entirely. However, our perception is limited, and we
| can't process all those details. AI only has to be good enough to
| fool our perception, and I'm confident that every human-
| understandable method for identifying fakes can be fooled by
| generative AI. It will probably be up to AI to identify AI-
| generated content. Even then, noise and limited resolution will
| mask the flaws. For many forms of content, there will simply be
| no way to determine what's real.
___________________________________________________________________
(page generated 2024-07-19 23:05 UTC)