[HN Gopher] Drag Your GAN: Interactive Point-Based Manipulation ...
       ___________________________________________________________________
        
       Drag Your GAN: Interactive Point-Based Manipulation of Images
        
       Author : waqasy
       Score  : 104 points
       Date   : 2023-05-19 06:58 UTC (16 hours ago)
        
 (HTM) web link (vcai.mpi-inf.mpg.de)
 (TXT) w3m dump (vcai.mpi-inf.mpg.de)
        
       | Scene_Cast2 wrote:
       | Neat concept. I wonder if something like that can be applied to
       | diffusion models (the new kid on the block that is outshining
        | GANs right now) - especially since the technique doesn't seem
        | to be too dependent on the underlying generative model.
       | 
       | Also, it's interesting that they're submitting to SIGGRAPH - kind
       | of expected this to be in a more ML-ish conference.
        
         | cubefox wrote:
         | Probably goes to show where SIGGRAPH is headed.
        
       | mft_ wrote:
       | The main link you provided is either not loading, or loading with
       | missing video links, for me - maybe hugged to death at the
       | moment?
       | 
       | Github may be more resilient:
       | https://github.com/XingangPan/DragGAN
        
       | t3estabc wrote:
       | [dead]
        
       | nine_k wrote:
       | Hello, post-truth world!
       | 
       | More seriously, I think that digital photos, and especially low-
       | res surveillance camera coverage, will soon be inadmissible in
        | any reasonable court, because tools like this make it possible
        | to forge such evidence in very natural-looking ways.
        
         | grumbel wrote:
          | Being able to fake something really doesn't matter all that
          | much: you'd still need to get that fake video into the
          | surveillance camera system, do so in the window between
          | committing the crime and the police arriving, leave no
          | trace, and hope that whoever you're trying to incriminate
          | doesn't have an alibi.
         | 
         | Fakes will be relevant for Twitter, TikTok and Co., where
         | random videos are posted and distributed without sources,
         | heavily edited and compressed, such that it is impossible to
         | tell if that video ever started out as a real video or a fake.
         | But in court the whole thing starts to fall apart the moment
         | they ask where that video came from.
        
         | imranq wrote:
         | There are probably ways to embed cryptographic hashes within
         | images. Any device that creates images from the real world
         | could have secret keys that can be used to validate any image
         | created by said device.
         | 
          | We will still need a centralized party that holds the secret
          | keys for validation, though.
        
           | vhcr wrote:
           | There's no way this would work, either the master encryption
           | key would be leaked, or someone would reverse-engineer the
           | chip.
           | 
           | Also, what about someone putting a screen just in front of
           | the sensor?
        
             | nine_k wrote:
             | No need for that.
             | 
             | A private key is generated on device and never leaves it.
             | It sits inside a TPM or equivalent.
             | 
             | The public key is pushed to a well-known site, visible to
             | all.
             | 
              | Every shot is _signed_: the bits are hashed into a
              | reasonably short string (say, using SHA-512), and the
              | hash is then encrypted with the private key.
              | 
              | Anyone can now decrypt that hash with the public key and
              | compare it with the hash they computed from the bits.
             | 
             | The problem, of course, is that any transformation
             | whatsoever breaks the signature. You can't adjust levels
             | and contrast, you can't even crop. Maybe it's a good
             | property.
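The per-device scheme sketched in the comment above can be illustrated with a toy textbook-RSA example. Everything here (the tiny primes, the variable names, the sample bytes) is invented for illustration; a real camera would keep a 2048-bit-plus key inside a TPM or secure element and use a proper signature scheme:

```python
# Toy sketch of the per-device signing scheme described above.
# WARNING: textbook RSA with tiny primes, for illustration only.
import hashlib

# Hypothetical device keypair (real keys are 2048+ bits).
p, q = 61, 53
n = p * q                           # public modulus
e = 17                              # public exponent
d = pow(e, -1, (p - 1) * (q - 1))   # private exponent, never leaves device

def image_hash(image_bytes: bytes) -> int:
    # Hash the raw sensor bits down to a short digest; reduced mod n
    # only because our toy modulus is tiny.
    return int.from_bytes(hashlib.sha512(image_bytes).digest(), "big") % n

def sign(image_bytes: bytes) -> int:
    # "Signing" = applying the private exponent to the hash.
    return pow(image_hash(image_bytes), d, n)

def verify(image_bytes: bytes, signature: int) -> bool:
    # Anyone with the public key (n, e) can recover the hash and compare.
    return pow(signature, e, n) == image_hash(image_bytes)

shot = b"raw sensor bits"
sig = sign(shot)
print(verify(shot, sig))            # True: image matches its signature
print(verify(shot, (sig + 1) % n))  # False: a tampered signature fails
```

As the comment notes, any transformation of the bits whatsoever (levels, contrast, even a crop) changes the hash and breaks verification.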
        
           | nine_k wrote:
           | What shall we do with millions upon millions of existing
           | mobile phones, and also surveillance cameras and dashcams?
           | 
           | Some of them possibly could be updated, but this will take
           | time. Securing the keys within them is also going to be a
           | problem; not all of them have a TPM.
        
           | ChainReaktion wrote:
           | This is the right approach, but there's lots of complexity
           | around transcoding. In the courtroom that's less of an issue
           | if you can get the original unmodified outputs, but broader
           | applications need to think through what it means to be
           | "verified"
        
           | bick_nyers wrote:
           | What about the recording device itself? Refeed a
           | video/frame/hash back to a security camera and it tells you
           | if it was originally sourced from that specific camera or
           | not.
        
             | foota wrote:
             | The sci-fi dystopian answer would be entangled photon
             | lights and off site image recording that preserves
             | entanglement :-)
        
           | smrtinsert wrote:
            | You don't need a centralized database. You have three
            | companies with three different hashes. They catch Bob
            | stealing on three different cameras. Each piece of footage
            | can be independently verified as authentic and not
            | doctored. This would prevent anonymous found footage from
            | suggesting someone committed a crime.
           | 
           | You definitely don't want one single leakable entity.
        
         | politelemon wrote:
          | Seeing how slowly laws move, the cynic in me says: it
          | _should_ be inadmissible in any reasonable court, but it
          | will continue to be admissible, and it will take a major
          | set of incidents for changes to be enacted across many
          | countries.
        
           | nine_k wrote:
            | Yes. The story of admissible DNA evidence is instructive
            | and terrifying.
           | 
           | https://daily.jstor.org/forensic-dna-evidence-can-lead-
           | wrong...
        
       | krunck wrote:
       | I'm thinking that soon video and images are going to be just dead
       | weight in journalism, adding nothing other than decoration.
        
         | radarsat1 wrote:
         | It seems to be a pretty common thing now on news network
         | websites, that at the top of the story, or perhaps somewhere in
         | the middle, there is a video. But, you click on the video, and
         | it's something entirely unrelated to the article. I feel like
          | this has been going on for quite a while; it has nothing to
          | do with synthetic media, it's just an annoying pattern I've
          | noticed.
        
         | chpatrick wrote:
         | I think there has to be some kind of cryptographic signature
         | solution, like "The BBC verifies that this image is authentic".
        
           | mrshadowgoose wrote:
           | We already have most of the required cryptographic primitives
           | for this. PKI, trusted timestamping, secure hardware with
           | remote attestation are some of the necessary building blocks.
           | All that's really missing are camera sensors with built-in
           | cryptography.
           | 
           | And societal care. Our society seems to really like to whine
           | about "the danger of deepfake images", but our actions reveal
           | that we don't really give a crap, as we could solve this
           | problem today if we really wanted to.
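One minimal form of the publisher-side verification idea discussed above ("The BBC verifies that this image is authentic") can be sketched as a public log of hashes the publisher vouches for. This is a hypothetical illustration, not the actual C2PA/BBC mechanism, and a real deployment would sign the log and make it append-only:

```python
# Sketch of publisher-side attestation: the publisher keeps a public
# log of SHA-256 hashes of images it vouches for; anyone can re-hash
# a candidate image and check membership.
import hashlib

published_log = set()  # in reality: a signed, append-only transparency log

def publish(image_bytes: bytes) -> str:
    # Publisher adds the hash of an image it attests to.
    digest = hashlib.sha256(image_bytes).hexdigest()
    published_log.add(digest)
    return digest

def is_vouched_for(image_bytes: bytes) -> bool:
    # Verification is just hash-and-lookup.
    return hashlib.sha256(image_bytes).hexdigest() in published_log

original = b"newsroom photo bytes"
publish(original)
print(is_vouched_for(original))           # True
print(is_vouched_for(b"deepfaked copy"))  # False
```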
        
         | esafak wrote:
         | Stock photos are already like that. To me they're worse than
         | having no pictures at all. Generated images are better than
         | nothing because you can make them very specific.
        
       | loandbehold wrote:
       | What do we need human actors for at this point? Everything can be
       | generated by AI now.
        
         | the_af wrote:
         | > _What do we need human actors for at this point? Everything
         | can be generated by AI now._
         | 
         | For blockbusters? If the tech still isn't there, it may be
         | soon, and we won't need actors.
         | 
         | For cinema where we care it was made by humans, for humans?
         | Actors will always be needed. Also, theater still exists and
         | people enjoy it.
        
         | kleer001 wrote:
         | Not even close by several orders of magnitude across two dozen
         | disciplines. But yea, we're heading there.
        
         | u385639 wrote:
         | Please try to make an original movie that meets the standard
         | of, say, The Godfather, with AI.
        
           | yamazakiwi wrote:
           | I understand your point but I think it would be easier with
           | AI than without. Many movies are not made to the standard of
           | The Godfather because they don't sell like MCU Movie #53 and
           | if you include more humans in the creation you're more likely
           | to run into the current system's restrictions.
           | 
           | Making a movie as beloved as the Godfather would still be
           | challenging of course.
        
             | esafak wrote:
             | As long as they don't suck the air out of funding for real
             | movies I can live with it, but I'd still be sad that people
             | are being trained to like auto-generated junk. Like how
             | people are losing their ability to concentrate on long-form
             | content due to overexposure to addictive short-form
             | content.
        
             | u385639 wrote:
             | Of course it would be easier. I agree. I just take issue
             | with the "why humans" thing because if anything, the recent
             | advancements highlight just how big the human element
             | really is.
             | 
             | Can you imitate a Bach prelude? Sure. And only people who
             | aren't actually familiar with his music would be impressed.
             | 
              | Much of AI approaching "human performance" is it
              | approaching the lowest bar. There's a Wittgenstein thing
             | going on here. That an LLM can ace the LSAT or GMAT is
             | mostly an indictment of those tests.
             | 
             | A little off topic.
        
               | og_kalu wrote:
               | >That an LLM can ace the LSAT or GMAT is mostly an
               | indictment of those tests.
               | 
                | These kinds of comments are always the funniest. You
                | can just tell the person who makes them has never
                | looked at those tests, never mind attempted them.
        
               | u385639 wrote:
               | I scored 159 on the LSAT in 2014, so I am not claiming
               | the tests are easy. I am pointing out that when an AI
               | aces them, it says more about the test than anything
               | else.
        
           | pmoriarty wrote:
           | Please try to make an original movie that meets the standard
           | of The Godfather, with or without AI.
        
       | johndough wrote:
       | Project website mirror
       | https://web.archive.org/web/20230519060439/https://vcai.mpi-...
       | 
       | GitHub (no code yet, only demo GIF)
       | https://github.com/XingangPan/DragGAN
       | 
       | arXiv https://arxiv.org/abs/2305.10973
        
       | ArekDymalski wrote:
        | As a technology this is tremendously impressive, straight out
        | of an SF movie. However, I wonder how it will impact our
        | culture, fashion, standards of beauty, etc. as more and more
        | artists accept generated output rather than creating their
        | own. Just as in music, the invention of MIDI, synths and
        | sequencers brought new styles but also a boring, imagination-
        | numbing standardization.
        
       | ortusdux wrote:
       | Demo video:
       | https://twitter.com/_akhaliq/status/1659424744490377217
        
         | lt wrote:
         | Longer video with more examples from one of the paper authors:
         | 
         | https://twitter.com/XingangP/status/1659483374174584832
        
           | amelius wrote:
           | Looks like it can't keep the background stable, so I guess
           | this is not suitable for animations.
        
       | Zetobal wrote:
        | The only thing that's new is the interactive interface; the
        | rest is old tech... You can use it on artbreeder.com, and
        | Photoshop even has it in its face neural filter. GANs are not
        | feasible for a variety of reasons: you need models specific to
        | your subject (which is why they switch to an elephant model to
        | manipulate the elephant), and they are also not style
        | agnostic. But it's a great demo, released at the right time,
        | just before the summit of the hype curve. I bet one VC is dumb
        | enough to throw millions at them.
        
         | chatmasta wrote:
         | With ChatGPT, the only thing new was the chat interface. In
         | fact even Sam Altman mentioned this on Lex Fridman's podcast,
         | IIRC - he said what he was most surprised about was the
          | outsized effect the interface had on bringing LLMs to the
          | forefront of public consciousness, despite the existing
         | maturity of the underlying GPT models. At least in that case it
         | was OpenAI adding interactivity to its own existing models. But
         | similarly, from a more holistic viewpoint, OpenAI productized
         | existing research from Google. Transformer models were "old
         | tech" since Google published "Attention is all you need" in
         | 2017... and yet, when OpenAI managed to turn it into a usable
         | product, suddenly they became the first movers and the company
         | to beat. So I'm not convinced that only a "dumb" investor would
         | fund an effort with a proven ability to productize "old tech."
        
           | Zetobal wrote:
            | The "I don't understand the technology but will ramble
            | about stuff until they just give up reading the comment"
            | approach ¯\_(ツ)_/¯
        
             | chatmasta wrote:
             | Are you referring to my comment? I'm certainly no expert on
             | AI, and if I'm misunderstanding the technology I'd like to
             | know. What is wrong about what I wrote?
        
               | Zetobal wrote:
               | Yes, I am referring to your comment and I am not going to
               | explain why everyone and their mother jumped ship from
               | GANs and went all in on transformers. Well, there is
               | still the Alan Turing Institute in the UK but even they
               | gave up and are into NFTs now :D
        
         | npunt wrote:
         | This reads a lot like 'dropbox is trivial, rsync already
         | exists'
        
           | Zetobal wrote:
           | [flagged]
        
       | waqasy wrote:
        | DragGAN consists of two main components: 1) feature-based
        | motion supervision that drives the handle point to move
       | towards the target position, and 2) a new point tracking approach
       | that leverages the discriminative GAN features to keep localizing
       | the position of the handle points.
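The point-tracking half described above can be sketched as a nearest-neighbor search over feature vectors in a window around the point's last position, using the feature sampled at the initial handle point as the template. The grid size, window radius, and feature values below are invented for illustration; this is not the authors' code:

```python
# Minimal sketch of GAN-feature point tracking: after each motion-
# supervision update, re-localize a handle point by nearest-neighbor
# search in feature space around its previous position.

def track_point(features, template, last_pos, radius=2):
    """features: 2-D grid of feature vectors (lists of floats),
    template: feature vector sampled at the initial handle point,
    last_pos: (row, col) of the handle point before this step."""
    h, w = len(features), len(features[0])
    r0, c0 = last_pos
    best, best_dist = last_pos, float("inf")
    for r in range(max(0, r0 - radius), min(h, r0 + radius + 1)):
        for c in range(max(0, c0 - radius), min(w, c0 + radius + 1)):
            # squared L2 distance in feature space
            d = sum((a - b) ** 2 for a, b in zip(features[r][c], template))
            if d < best_dist:
                best, best_dist = (r, c), d
    return best

# Tiny synthetic feature map: the template feature [1.0, 0.0] has
# "moved" from (1, 1) to (1, 2) after a motion-supervision step.
feat = [[[0.0, 0.0] for _ in range(4)] for _ in range(3)]
feat[1][2] = [1.0, 0.0]
print(track_point(feat, [1.0, 0.0], (1, 1)))  # (1, 2)
```

The real method runs this search on intermediate StyleGAN feature maps, which is why the tracking stays locked to the same semantic part as the image deforms.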
        
       | sroussey wrote:
       | Would love to see this for architecture!
        
       | bbminner wrote:
       | There's also an older work called PuppetGAN
       | http://ai.bu.edu/puppetgan/
        
         | Qweiuu wrote:
         | It's similar but different.
         | 
         | Your paper makes an existing body a puppet.
         | 
         | The other one adjusts features.
        
       | vagabund wrote:
        | The semantic understanding feels much richer than diffusion-based
       | modeling, e.g. the trees on the shore growing to match the
       | manipulated reflection, the sun changing shape as it's moved up
       | on the horizon, the horse's leg following proper biomechanics as
       | its position is changed. I haven't gotten such a cohesive world
       | model when doing text-guided in-painting with stable diffusion
       | etc. This feels like it could very conceivably be guided by an
       | animation rig with temporally consistent results.
        
         | orbital-decay wrote:
         | Temporal consistency for a guided scene is a separate problem.
          | It was kind of solved a couple of years ago. [0] It can be used
         | with animation rigs and simplistic tagged geometries, and it
         | even works in near real-time. "Kind of" because training a
         | model from scratch from a large dataset is not something you
         | want to do for the actual job; what you want is a good style
         | transfer mechanism that can extract features from as few
         | references as possible.
         | 
         | [0] https://isl-org.github.io/PhotorealismEnhancement/
        
       | tikkun wrote:
       | Has anyone built the "online photoshop that incorporates all of
       | the latest AI image editing tools asap and sells access as a
       | premium subscription with lots of GPU access for smooth editing"
       | business yet? I'd be curious to know.
        
         | apodolny wrote:
         | Playground AI (https://playgroundai.com/) does a lot of this.
        
           | echelon wrote:
           | There are a million of these. It's a super crowded space.
           | 
           | https://civitai.com/
           | 
           | https://lexica.art/
           | 
           | https://openart.ai/
           | 
           | (Many more)
        
         | ftufek wrote:
         | I think less effort has gone into image editing compared to
         | image generation so far. That said, we're building some photo
         | realistic image editing tools at https://www.faceshape.com,
          | focused on face editing for now. Current models don't perform
          | as well, but the next generation, currently in training,
          | will.
         | 
         | I'm always curious to know what kind of AI image editing people
         | are interested in, can you share what kind of edits you'd like
         | to do? There's the usual edits like background removal or
         | object removal, but those are more general tools that are
         | getting incorporated into lots of apps natively (say Google
         | Photos).
        
         | jahewson wrote:
         | Adobe Firefly already did it
         | https://www.adobe.com/sensei/generative-ai/firefly.html
        
           | echelon wrote:
           | It's trained on their stock art and under-performs Stable
           | Diffusion and Midjourney.
           | 
           | It's really poor, comparatively.
        
             | cubefox wrote:
             | It makes way fewer visual mistakes (like wrong number of
              | limbs) than Stable Diffusion, or even Bing DALL-E ~3. The
              | latter is still the best at understanding your prompt,
              | though.
        
           | Giorgi wrote:
            | Adobe AI is crap compared to Midjourney
        
             | belter wrote:
              | Because one is trained on proper stock art and the other
              | on anything, without asking the creators for
              | authorization?
             | 
             | "AI art tools Stable Diffusion and Midjourney targeted with
             | copyright lawsuit" -
             | https://www.theverge.com/2023/1/16/23557098/generative-ai-
             | ar...
        
         | ultra_nick wrote:
         | Isn't that stability.ai's business model?
        
       | [deleted]
        
       ___________________________________________________________________
       (page generated 2023-05-19 23:02 UTC)