[HN Gopher] Drag Your GAN: Interactive Point-Based Manipulation ...
___________________________________________________________________
Drag Your GAN: Interactive Point-Based Manipulation of Images
Author : waqasy
Score : 104 points
Date : 2023-05-19 06:58 UTC (16 hours ago)
(HTM) web link (vcai.mpi-inf.mpg.de)
(TXT) w3m dump (vcai.mpi-inf.mpg.de)
| Scene_Cast2 wrote:
| Neat concept. I wonder if something like that can be applied to
| diffusion models (the new kid on the block that is outshining
| GANs right now) - especially since the technique doesn't seem to
| be too dependent on the generative image implementation.
|
| Also, it's interesting that they're submitting to SIGGRAPH - kind
| of expected this to be in a more ML-ish conference.
| cubefox wrote:
| Probably goes to show where SIGGRAPH is headed.
| mft_ wrote:
| The main link you provided is either not loading, or loading with
| missing video links, for me - maybe hugged to death at the
| moment?
|
| Github may be more resilient:
| https://github.com/XingangPan/DragGAN
| t3estabc wrote:
| [dead]
| nine_k wrote:
| Hello, post-truth world!
|
| More seriously, I think that digital photos, and especially low-
| res surveillance camera coverage, will soon be inadmissible in
| any reasonable court, because tools like this would make it easy
| to forge such evidence in very natural-looking ways.
| grumbel wrote:
| Being able to fake something really doesn't matter all that
| much: you'd still need to get that fake video into the
| surveillance camera system, do so in the time between
| committing the crime and the police arriving, leave no trace,
| and hope that whoever you're trying to incriminate doesn't
| have an alibi.
|
| Fakes will be relevant for Twitter, TikTok and Co., where
| random videos are posted and distributed without sources,
| heavily edited and compressed, such that it is impossible to
| tell if that video ever started out as a real video or a fake.
| But in court the whole thing starts to fall apart the moment
| they ask where that video came from.
| imranq wrote:
| There are probably ways to embed cryptographic hashes within
| images. Any device that creates images from the real world
| could have secret keys that can be used to validate any image
| created by said device.
|
| We will still need a centralized party that holds the secret
| keys for validation, though.
| vhcr wrote:
| There's no way this would work, either the master encryption
| key would be leaked, or someone would reverse-engineer the
| chip.
|
| Also, what about someone putting a screen just in front of
| the sensor?
| nine_k wrote:
| No need for that.
|
| A private key is generated on device and never leaves it.
| It sits inside a TPM or equivalent.
|
| The public key is pushed to a well-known site, visible to
| all.
|
| Every shot is _signed_ by hashing the bits into a
| reasonably short string (say, using sha512) and then
| encrypting that hash with the private key.
|
| Anyone can now decrypt the hash, and compare it with the
| hash they computed from the bits.
|
| The problem, of course, is that any transformation
| whatsoever breaks the signature. You can't adjust levels
| and contrast, you can't even crop. Maybe it's a good
| property.
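A minimal sketch of the hash-then-sign flow described above, using textbook RSA with toy parameters. This is purely illustrative: a real camera would use a standard signature scheme (e.g. ECDSA) with production key sizes, and the private key would live inside a TPM rather than sit in a script.

```python
import hashlib

# Toy key material (Mersenne primes; NOT a secure key size --
# real systems use 2048-bit+ keys held in a TPM/secure element)
p = 2**31 - 1
q = 2**61 - 1
n = p * q
e = 65537                          # public exponent, published openly
d = pow(e, -1, (p - 1) * (q - 1))  # private exponent, stays on device

def image_hash(image_bytes):
    # Hash the raw bits down to an integer below the modulus
    return int.from_bytes(hashlib.sha512(image_bytes).digest(), "big") % n

def sign(image_bytes):
    # Camera side: "encrypt" the hash with the private key
    return pow(image_hash(image_bytes), d, n)

def verify(image_bytes, signature):
    # Anyone: "decrypt" with the public key, compare to a fresh hash
    return pow(signature, e, n) == image_hash(image_bytes)

shot = b"raw sensor bits of one photo"
sig = sign(shot)
print(verify(shot, sig))            # untouched image -> True
print(verify(shot + b"crop", sig))  # any edit breaks the signature
```

Note that this illustrates the fragility the comment mentions: the check is over the exact bits, so even a lossless crop or a levels adjustment produces a different hash and fails verification.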
| nine_k wrote:
| What shall we do with millions upon millions of existing
| mobile phones, and also surveillance cameras and dashcams?
|
| Some of them possibly could be updated, but this will take
| time. Securing the keys within them is also going to be a
| problem; not all of them have a TPM.
| ChainReaktion wrote:
| This is the right approach, but there's lots of complexity
| around transcoding. In the courtroom that's less of an issue
| if you can get the original unmodified outputs, but broader
| applications need to think through what it means to be
| "verified".
| bick_nyers wrote:
| What about the recording device itself? Refeed a
| video/frame/hash back to a security camera and it tells you
| if it was originally sourced from that specific camera or
| not.
| foota wrote:
| The sci-fi dystopian answer would be entangled photon
| lights and off site image recording that preserves
| entanglement :-)
| smrtinsert wrote:
| You don't need a centralized database. You have 3 companies
| with 3 different hashes. They catch Bob stealing on 3
| different cameras. Each piece of footage can be independently
| verified as authentic and not doctored. This would prevent
| anonymous found footage suggesting someone committed a crime.
|
| You definitely don't want one single leakable entity.
| politelemon wrote:
| Seeing how slowly laws move, the cynic says: it _should_ be
| inadmissible in any reasonable court, but will continue to be
| admissible, and it will take a major set of incidents for
| changes to be enacted across many countries.
| nine_k wrote:
| Yes. The story of admissible DNA evidence is instructive and
| terrifying.
|
| https://daily.jstor.org/forensic-dna-evidence-can-lead-
| wrong...
| krunck wrote:
| I'm thinking that soon video and images are going to be just dead
| weight in journalism, adding nothing other than decoration.
| radarsat1 wrote:
| It seems to be a pretty common thing now on news network
| websites, that at the top of the story, or perhaps somewhere in
| the middle, there is a video. But, you click on the video, and
| it's something entirely unrelated to the article. I feel like
| this has been going on for quite a while; nothing to do with
| synthetic media, just an observation of an annoying pattern
| I've noticed.
| chpatrick wrote:
| I think there has to be some kind of cryptographic signature
| solution, like "The BBC verifies that this image is authentic".
| mrshadowgoose wrote:
| We already have most of the required cryptographic primitives
| for this. PKI, trusted timestamping, secure hardware with
| remote attestation are some of the necessary building blocks.
| All that's really missing are camera sensors with built-in
| cryptography.
|
| And societal care. Our society seems to really like to whine
| about "the danger of deepfake images", but our actions reveal
| that we don't really give a crap, as we could solve this
| problem today if we really wanted to.
| esafak wrote:
| Stock photos are already like that. To me they're worse than
| having no pictures at all. Generated images are better than
| nothing because you can make them very specific.
| loandbehold wrote:
| What do we need human actors for at this point? Everything can be
| generated by AI now.
| the_af wrote:
| > _What do we need human actors for at this point? Everything
| can be generated by AI now._
|
| For blockbusters? Even if the tech isn't there yet, it may be
| soon, and then we won't need actors.
|
| For cinema where we care that it was made by humans, for
| humans? Actors will always be needed. Also, theater still
| exists and people enjoy it.
| kleer001 wrote:
| Not even close by several orders of magnitude across two dozen
| disciplines. But yea, we're heading there.
| u385639 wrote:
| Please try to make an original movie that meets the standard
| of, say, The Godfather, with AI.
| yamazakiwi wrote:
| I understand your point, but I think it would be easier with
| AI than without. Many movies are not made to the standard of
| The Godfather because they don't sell like MCU Movie #53, and
| if you include more humans in the creation you're more likely
| to run into the current system's restrictions.
|
| Making a movie as beloved as the Godfather would still be
| challenging of course.
| esafak wrote:
| As long as they don't suck the air out of funding for real
| movies I can live with it, but I'd still be sad that people
| are being trained to like auto-generated junk. Like how
| people are losing their ability to concentrate on long-form
| content due to overexposure to addictive short-form
| content.
| u385639 wrote:
| Of course it would be easier. I agree. I just take issue
| with the "why humans" thing because if anything, the recent
| advancements highlight just how big the human element
| really is.
|
| Can you imitate a Bach prelude? Sure. And only people who
| aren't actually familiar with his music would be impressed.
|
| Much of AI approaching "human performance" is it approaching
| the lowest bar. There's a Wittgenstein thing
| going on here. That an LLM can ace the LSAT or GMAT is
| mostly an indictment of those tests.
|
| A little off topic.
| og_kalu wrote:
| >That an LLM can ace the LSAT or GMAT is mostly an
| indictment of those tests.
|
| These kinds of comments are always the funniest. You can
| just tell that the person who makes them has never looked at
| those tests, never mind attempted them.
| u385639 wrote:
| I scored 159 on the LSAT in 2014, so I am not claiming
| the tests are easy. I am pointing out that when an AI
| aces them, it says more about the test than anything
| else.
| pmoriarty wrote:
| Please try to make an original movie that meets the standard
| of The Godfather, with or without AI.
| johndough wrote:
| Project website mirror
| https://web.archive.org/web/20230519060439/https://vcai.mpi-...
|
| GitHub (no code yet, only demo GIF)
| https://github.com/XingangPan/DragGAN
|
| arXiv https://arxiv.org/abs/2305.10973
| ArekDymalski wrote:
| As a technology this is tremendously impressive, straight out
| of an SF movie. However, I wonder how it will impact our
| culture, fashion, standards of beauty, etc. as more and more
| artists accept generated output rather than creating their
| own. Just as in music, the invention of MIDI, synths and
| sequencers brought new styles but also boring, imagination-
| numbing standardization.
| ortusdux wrote:
| Demo video:
| https://twitter.com/_akhaliq/status/1659424744490377217
| lt wrote:
| Longer video with more examples from one of the paper authors:
|
| https://twitter.com/XingangP/status/1659483374174584832
| amelius wrote:
| Looks like it can't keep the background stable, so I guess
| this is not suitable for animations.
| Zetobal wrote:
| The only thing that's new is the interactive interface; the
| rest of it is old tech... You can use it on artbreeder.com.
| Photoshop even has it in their face neural filter. GANs are
| not feasible for a variety of reasons: you need models
| specific to your subject (i.e. why they switch to an elephant
| model to manipulate the elephant), and they are also not
| style-agnostic. But it's a great demo and the right time to
| release it, just before the summit of the hype curve. I bet
| one VC is dumb enough to throw millions at them.
| chatmasta wrote:
| With ChatGPT, the only thing new was the chat interface. In
| fact even Sam Altman mentioned this on Lex Fridman's podcast,
| IIRC - he said what he was most surprised about was the
| outsized effect the interface had on bringing LLMs to the
| forefront of public consciousness, despite the existing
| maturity of the underlying GPT models. At least in that case it
| was OpenAI adding interactivity to its own existing models. But
| similarly, from a more holistic viewpoint, OpenAI productized
| existing research from Google. Transformer models were "old
| tech" since Google published "Attention is all you need" in
| 2017... and yet, when OpenAI managed to turn it into a usable
| product, suddenly they became the first movers and the company
| to beat. So I'm not convinced that only a "dumb" investor would
| fund an effort with a proven ability to productize "old tech."
| Zetobal wrote:
| The "I don't understand the technology but will ramble about
| stuff until they just give up reading the comment" approach
| ¯\_(ツ)_/¯
| chatmasta wrote:
| Are you referring to my comment? I'm certainly no expert on
| AI, and if I'm misunderstanding the technology I'd like to
| know. What is wrong about what I wrote?
| Zetobal wrote:
| Yes, I am referring to your comment and I am not going to
| explain why everyone and their mother jumped ship from
| GANs and went all in on transformers. Well, there is
| still the Alan Turing Institute in the UK but even they
| gave up and are into NFTs now :D
| npunt wrote:
| This reads a lot like 'dropbox is trivial, rsync already
| exists'
| Zetobal wrote:
| [flagged]
| waqasy wrote:
| DragGAN consists of two main components: 1) feature-based
| motion supervision that drives the handle point to move
| towards the target position, and 2) a new point tracking
| approach that leverages the discriminative GAN features to
| keep localizing the position of the handle points.
| sroussey wrote:
| Would love to see this for architecture!
| bbminner wrote:
| There's also an older work called PuppetGAN
| http://ai.bu.edu/puppetgan/
| Qweiuu wrote:
| It's similar but different.
|
| Your paper makes an existing body a puppet.
|
| The other one adjusts features.
| vagabund wrote:
| The semantic understanding feels much richer than diffusion based
| modeling, e.g. the trees on the shore growing to match the
| manipulated reflection, the sun changing shape as it's moved up
| on the horizon, the horse's leg following proper biomechanics as
| its position is changed. I haven't gotten such a cohesive world
| model when doing text-guided in-painting with stable diffusion
| etc. This feels like it could very conceivably be guided by an
| animation rig with temporally consistent results.
| orbital-decay wrote:
| Temporal consistency for a guided scene is a separate problem.
| It was kind of solved a couple of years ago. [0] It can be
| used with animation rigs and simplistic tagged geometries,
| and it even works in near real-time. "Kind of" because
| training a model from scratch on a large dataset is not
| something you
| want to do for the actual job; what you want is a good style
| transfer mechanism that can extract features from as few
| references as possible.
|
| [0] https://isl-org.github.io/PhotorealismEnhancement/
| tikkun wrote:
| Has anyone built the "online photoshop that incorporates all of
| the latest AI image editing tools asap and sells access as a
| premium subscription with lots of GPU access for smooth editing"
| business yet? I'd be curious to know.
| apodolny wrote:
| Playground AI (https://playgroundai.com/) does a lot of this.
| echelon wrote:
| There are a million of these. It's a super crowded space.
|
| https://civitai.com/
|
| https://lexica.art/
|
| https://openart.ai/
|
| (Many more)
| ftufek wrote:
| I think less effort has gone into image editing than into
| image generation so far. That said, we're building some
| photorealistic image editing tools at
| https://www.faceshape.com, focused on face editing for now.
| Current models don't perform as well, but the next
| generation, currently in training, will.
|
| I'm always curious to know what kind of AI image editing people
| are interested in, can you share what kind of edits you'd like
| to do? There's the usual edits like background removal or
| object removal, but those are more general tools that are
| getting incorporated into lots of apps natively (say Google
| Photos).
| jahewson wrote:
| Adobe Firefly already did it
| https://www.adobe.com/sensei/generative-ai/firefly.html
| echelon wrote:
| It's trained on their stock art and under-performs Stable
| Diffusion and Midjourney.
|
| It's really poor, comparatively.
| cubefox wrote:
| It makes way fewer visual mistakes (like wrong number of
| limbs) than Stable Diffusion, or even Bing Dall-E ~3. The
| latter is still the best at understanding your prompt
| though.
| Giorgi wrote:
| Adobe AI is crap compared to Midjourney
| belter wrote:
| Because one is trained on proper stock art and the other on
| anything, without asking the creators for authorization?
|
| "AI art tools Stable Diffusion and Midjourney targeted with
| copyright lawsuit" -
| https://www.theverge.com/2023/1/16/23557098/generative-ai-
| ar...
| ultra_nick wrote:
| Isn't that stability.ai's business model?
| [deleted]
___________________________________________________________________
(page generated 2023-05-19 23:02 UTC)