[HN Gopher] RealFill: Image completion using diffusion models
___________________________________________________________________
RealFill: Image completion using diffusion models
Author : flavoredquark
Score : 286 points
Date : 2023-09-29 18:27 UTC (4 hours ago)
(HTM) web link (realfill.github.io)
(TXT) w3m dump (realfill.github.io)
| CrzyLngPwd wrote:
| Creating a fake life is going to be so easy soon.
|
| Everyone will be able to make all of the other fakes on social
| media jealous with ease.
| lopatin wrote:
| These github.io pages always go down once they hit the front
| page.
| londons_explore wrote:
| Works fine for me...
|
| some github.io pages are iframes to the developer's home machine
| or something for a tech demo that can't withstand many users.
|
| But regular github.io static pages ought to be able to
| withstand millions of users at once.
| lopatin wrote:
| I see, I think they're just broken for me in general.
| londons_explore wrote:
| Might be blocked on some corp networks because it's all
| anonymous user generated content.
| Squarex wrote:
| It appears to work ok and I've never witnessed problems with
| static content on github pages.
| markjpT wrote:
| [flagged]
| a1o wrote:
| Images aren't loading for me which is kind of a bummer for this
| specifically... :/
| EricMausler wrote:
| I feel like this will be great for wedding photographers
| dang wrote:
| [stub for offtopicness]
| aga98mtl wrote:
| I do not agree with their usage of the word "Authentic".
| emodendroket wrote:
| Perhaps verisimilar then.
| crazygringo wrote:
| The point is that it's not based on hallucination -- it's
| generated out of the authentic details provided from other
| images.
|
| There's definitely a middle ground here that we perhaps don't
| have a good word for. E.g. what do we call a painting made by
| an artist who sat in front of the scene they depicted, vs. a
| painting made by an artist from their imagination? There's
| certainly some sense in which the first one was an
| "authentic" scene.
| thomastjeffery wrote:
| Here are some better words off the top of my head:
|
| Intentional
|
| Contextual
|
| Everything about this project goes against the meaning of
| authenticity.
| CobrastanJorji wrote:
| Yeah, except it's still absolutely vulnerable to
| hallucination. Look at the last set of images on
| "Limitations" page. The algorithm knows that there's a sign
| with text there, and it uses the original image to get the
| right letters in there, but it randomly reorders the
| letters rather than using the source image. "Real" and
| "authentic" is extremely misleading here.
|
| That said, props to them for calling out the limitations so
| clearly. I really appreciate it when people are up front
| with the problems like that.
| bhaney wrote:
| Cool tech, but plastering "authentic" all over this kind of
| generated photography is really disingenuous and just rubs me
| the wrong way. I get that it's informed by real details from
| other photos, but that's not what authentic means.
|
| If I buy an "authentic Rolex" and receive a Chinese Rolex clone
| that's built similarly based on observations of a real Rolex,
| I'm going to feel scammed and very upset. And I'm much more
| protective of my memories than I would be of a watch.
| dang wrote:
| Ok, we've taken authenticity out of the title above.
| 101008 wrote:
| Yeah, I think the first example is bad. This shouldn't be
| used for the photos you took. What's the purpose of having a
| photo if it wasn't the real moment you captured? I could
| understand the usage in marketing or event photography, but
| for memories with your loved ones (as the first example tries
| to show it) it just doesn't make sense to me.
|
| Two anecdotes:
|
| 1. A friend of mine met his favourite author (traveled from
| one continent to another for a signing event). When he shook
| hands with the author, a friend took a photo. A lady (still
| hated by us!) stepped in the middle, and blocked the photo.
| Maybe an IA or a talented person could remove her, use a
| footage photo of the author and rebuild the photo... but why?
| What's the purpose of that?
|
| 2. A few months ago during the pandemic I scanned all the
| printed pictures of my grandparents with my phone. After
| scanning like 200, I checked one and I zoomed in: the stupid
| app applied some IA to make it better and it just was worse.
| I don't care if it looks better for the untrained eye: my
| grandparents didn't look like that. I now have stupid
| horrible versions of the scanned photos, where my grandparents
| appear with smooth skin and weird eyes.
| nuancebydefault wrote:
| Is IA French for AI? (Like UE and many other
| abbreviations)? I could look it up but might as well ask
| the question.
| nargek wrote:
| Yes it is !
|
| IA -> Intelligence Artificielle
| smcnally wrote:
| In Spanish, too -- and other subject-object-verb
| languages
| IanCal wrote:
| I totally agree with 2. I'm less sure on 1. Imagine it's
| _perfect_ - it would be an accurate representation of what
| was really there. The real photo is a snapshot of a very
| specific time that doesn't represent the broader context
| of what happened.
|
| A different angle, if a friend had painted the encounter
| instead, it wouldn't be exact but it would be a snapshot of
| a memory.
|
| I'm not hugely arguing in favour of it but I think there's
| different scales here, from cameras doing "merge pictures
| half a second apart so people have their eyes open" to
| "totally change their face".
| neilv wrote:
| They really need to not use the term "authentic" to name
| this.
|
| They also need to be very, very careful when introducing
| capability to falsify photographic images convincingly.
|
| Using the term "authentic" for this (and how do they even
| know what's an authentic memory?) doesn't sound like being
| very, very careful. It sounds like being gratuitously
| reckless.
| cmdli wrote:
| I would argue that authentic is a relative term, and actually
| helped me understand the product more easily. IMO, it is
| "authentic" because, compared to other image fills, it tries
| to fill in the data using real data from other photos.
| sdfghswe wrote:
| HN loves arguing about words.
| debugnik wrote:
| And recent HN posts love to twist them and reinterpret
| them just for promotion.
| marricks wrote:
| It's "authentic" in the same way that when you see
| something labeled authentic it makes you more likely to
| question if it's actually what it says it is because
| authentic things don't need such labels plastered on them.
|
| Regardless, I'm pretty sure "reconstructed" is the honest
| word to use.
| HaZeust wrote:
| I'd call that "contextual" rather than authentic.
| waynesonfire wrote:
| let me give you an example. when i draw a mustache on a
| face in ms paint in brown, that's contextual but not
| authentic.
| endisneigh wrote:
| IDK, when I think authentic, I think "genuine", and no
| image generation is genuine by definition. this is not a
| bad thing necessarily, but it's important to frame these
| things correctly.
|
| ultimately we oughta think about what we are referring to.
| if we are talking about a photograph taken by someone, the
| authenticity is ultimately coming from the combination of
| the photograph and camera used. so when you think of a
| genuine photo in this scenario you expect it to be
| fundamentally taken by the user by a particular camera to
| create a particular photograph. you can use devices to take
| a photo without pressing the button, such as a timer, but
| the photograph and camera are both fundamental to the
| authenticity of the image. if the camera is no longer
| entirely involved in the generation of the photograph I
| would say that it is no longer genuine.
|
| Reference driven as described in the article is more
| appropriate, but alas it is verbose. normally such pedantry
| bores me, but in this case it's pretty tantamount to what
| is being presented.
| tremon wrote:
| I think "composite" would be more accurate to describe
| this process. As in, "complete a picture using image
| composition".
| jameshart wrote:
| We need a new word: authentish.
| smcnally wrote:
| Like "truthy."
| tremon wrote:
| How do they know the data from the other photos is real?
| thomastjeffery wrote:
| This literally goes against the meaning of the word
| authenticity.
|
| Call it "realistic". Words matter.
| QuercusMax wrote:
| Seems like it's only a matter of degree, given that modern
| cell phone cameras take image bursts and combine them into a
| single output image. Filling in details in a scene from other
| photos taken at the same time doesn't really seem that
| different to me. And seeing that photography has never really
| been capturing real life exactly, is it really that big a
| deal? Look at Ansel Adams - he heavily edited his "real-life"
| photographs, and changed them over the years as he made
| subsequent prints.
|
| (Disclaimer: work for Google but have nothing to do with this
| project.)
| drewcoo wrote:
| > plastering "authentic" all over this kind of generated
| photography is really disingenuous
|
| No more so than "virtual," which used to mean "true." Or
| "literal" which used to be the opposite of "figurative." It's
| just another word being used auto-antonymically.
|
| Definitio fugit.
| thomastjeffery wrote:
| Virtual never meant real.
|
| Literally is often used _in a sarcastic context_. That
| sarcasm _depends on_ the word meaning what it means.
| esafak wrote:
| Come on, Google, push this to Google Photos.
| 01100011 wrote:
| Slightly off-topic: what's the best way right now to remove my
| ex-wife from an old family portrait and replace her with my
| current wife?
| tremon wrote:
| Scissors and a Pritt stick.
| Adverblessly wrote:
| Assuming you are asking about a generative AI way, you could
| use photos of your new wife to train a LoRA with kohya-ss, then
| with A1111 you could do an img2img repaint using the ControlNet
| extension to make sure you get a similar pose. With enough
| experimentation you could probably get at least one decent
| result.
|
| At least that's what comes to mind with the things I know you
| can run offline.
| [deleted]
| pcblues wrote:
| Give it five years for the tech. Right now? Probably easier to
| get back with the ex to make the portrait correction.
|
| /jk sorry
| markjpT wrote:
| [flagged]
| true_religion wrote:
| I know someone who did something similar. He remarried then
| went back and cropped or deleted ten years of Facebook
| photos to make it look like he never had a previous
| relationship and just ten years of boys nights.
|
| He even has a picture up of him from his wedding day...
| standing alone in a tux.
| solardev wrote:
| Ask everyone to pose for a new photo
| markjpT wrote:
| [flagged]
| bradleyjg wrote:
| The kind of stuff the op is doing---changing the composition to
| reflect a picture that could have been taken---is one thing.
| But what you are asking feels Stalin-esque to me. A picture is
| a record of a point in time and you can't change the past.
| joosters wrote:
| _A picture is a record of a point in time and you can't
| change the past._
|
| I don't think either of those things are true. Both can be
| changed, and are often changed. Much of what we 'know' of the
| past is wrong.
| solardev wrote:
| Sure you can, just as you can change people's memories and
| implant false ones. Hell, in this dystopia we're headed
| towards, it'll probably be a subscription service where you
| can rewrite 5 bad memories a month for $29/mo
| syntaxing wrote:
| Something...seems fishy? Like the example with the guy next to
| the robot figure. Their model happened to predict exactly the
| same type of figure?! Diffusion models are not omnipotent...
| foota wrote:
| The model gets the reference images as "context", so it can
| just copy the robot from one of the other images.
| syntaxing wrote:
| Ahh I see, this makes a lot more sense now!
| IshKebab wrote:
| That's the entire point. It didn't "happen" to predict exactly
| the same type of figure. It used the context photos to know
| what type of figure it should render.
|
| You might be getting a bit confused because here the training
| process has to happen every time you use it, whereas in most AI
| applications you only perform inference for actual use.
| bjornlouser wrote:
| I wonder if he is holding that umbrella to aid the model in
| recovering the 3d scene/scale from the reference images.
| endisneigh wrote:
| an interesting use case for this once the compute is there is to
| basically allow for ai powered digital zoom-out. it could work by
| instructing the user to take several pictures around the target,
| and then you take regular pictures of your subject.
|
| then, as you like, you can do an "ai zoom out" to get zoomed out
| pictures, no longer constrained by your lens or distance.
|
| I imagine this to be included relatively soon, just like how
| panoramas were once a niche thing that became much easier to do
| with some good ui/ux. pretty much any modern phone can do them
| without having to struggle with lining up photos and what not.
|
| one thing that does greatly concern me about the demo/site is
| that they have "authentic" and "recover" as terms. the result
| here is not authentic nor has anything been "recovered." it's an
| illusion at best. I personally don't like how they portray the
| new image as being equivalent as if the lens framed it in the
| original picture. it's not, as they show themselves in the later
| portion (near the end) with the text sign. seriously
| irresponsible framing (pun intended) to what's otherwise very
| cool tech.
| gs17 wrote:
| Agreed, the "Reference-Driven Generation" part is totally fine,
| but "authentic" is overselling it.
| richardw wrote:
| Nice idea. Might not need multiple pics given Google's image
| dataset and ability to recognise what you're looking at.
|
| Give that a couple generations. "You were at location X and
| didn't take a pic. We generated you some selfies, choose one
| that you like."
| markjpT wrote:
| [flagged]
| rasz wrote:
| Recover those precious memories of things that never happened,
| only with Google!
| nuancebydefault wrote:
| Somewhat covertly I deep down wish that humans' desire for pretty
| looking pictures will fade away over time, due to the ubiquity of
| pretty looking pictures produced by auto post processing. And
| ultimately their liking of pretty people and shiny new stuff in
| general. I don't want to sound negative or pedantic, I just would
| like people to prefer inner beauty in the broader sense.
| hansoolo wrote:
| This is a beautiful post! Thank you!
| debarshri wrote:
| The current advancement in Generative AI is a bit scary, in my
| opinion. May I be pessimistic?
|
| This and the new demos I saw from WhatsApp around
| persona-based AI can really alter someone's perception and
| memories. I don't think we are considering how it can really
| impact our understanding of our feelings, perception, memories
| and mindfulness.
|
| If you take a picture of reality and alter it with Gen AI to do
| something else and change the moment, what is the new reality?
| After a while, we might question if it was real or not, and then
| that might just become the new reality.
|
| In my opinion, GenAI is truly transformational as well as scary,
| as it can alter our perception. I wonder if anyone else feels
| this way.
| theultdev wrote:
| Nobody shows their true reality in public pictures anyway, it's
| all staged in some way.
|
| For private pictures, it didn't change your reality, you can
| lie to yourself, but you've always been able to do that.
| debarshri wrote:
| but when you take a picture that capture personal moment and
| some software without your consent alters it with some
| generative stuff, what would that lead to?
|
| I disagree with lying to yourself. For people who are not
| mindful and aware, this will severely impact their perception.
| theultdev wrote:
| > but when you take a picture that capture personal moment
| and some software without your consent alters it with some
| generative stuff, what would that lead to?
|
| I mean, do you not look at the photo after you take it?
| Even if you don't, you were there and saw the original
| scene. If your memory fails you, it's on you. If you didn't
| take an accurate picture, it's on you. Check next time.
|
| If anything meaningful is added, it'll be very noticeable,
| if it's not meaningful, then what does it matter?
|
| Cameras already do a lot of corrections that don't
| represent reality.
|
| Hell, our perceptions of colors is different than everyone
| else's.
| deckar01 wrote:
| I have been working on a holographic camera, but the ultra-cheap
| pinhole cameras I chose for the array have two issues: the
| exposure can't be controlled and the lenses are poorly aligned. I
| can calibrate away most lens aberrations with OpenCV, but some of
| the outliers have so much cropping that I am discarding 75% of my
| good pixels to get a coherent result. I was considering using
| NeRFs to reproject the ideal camera angles, but COLMAP is not
| very tolerant of brightness fluctuations and NeRF training is
| relatively slow (considering my goal is video). This would be a
| nice solution to my problem, because I have a comprehensive set
| of angles to pull context from.
| waynenilsen wrote:
| "Comparison with Baselines" is shocking
| uptown wrote:
| Is this similar to what GoPro cameras do to remove the selfie-
| stick? They use video content from adjacent frames to remove the
| pole and fill in with pixels. I get that the approach here can
| use imagery that's framed completely differently.
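To make the "fill in from adjacent frames" idea concrete, here is a minimal sketch using a per-pixel temporal median. This is an assumption chosen for illustration, not a description of what GoPro or RealFill actually do; the function name `temporal_median` and the tiny grayscale frames are hypothetical. It only works when the frames are already aligned and the occluder covers each pixel in fewer than half of them.

```python
from statistics import median

def temporal_median(frames):
    """Per-pixel median over a list of aligned grayscale frames.

    Each frame is a list of rows, each row a list of pixel values.
    A pixel briefly covered by an occluder is replaced by the value
    it has in the majority of the other frames.
    """
    height, width = len(frames[0]), len(frames[0][0])
    return [
        [median(frame[y][x] for frame in frames) for x in range(width)]
        for y in range(height)
    ]

# Hypothetical 1x3 frames: the background is [10, 20, 30] and a bright
# occluder (255) covers a different pixel in each frame.
frames = [
    [[255, 20, 30]],
    [[10, 255, 30]],
    [[10, 20, 255]],
]
clean = temporal_median(frames)
```

The approach discussed in the thread differs in that the reference images can be framed differently from the target, so simple per-pixel statistics like this would not apply without alignment.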
| Jorge1o1 wrote:
| Wow. The use case that comes to mind for me is when you take a
| big family photo (or 20) and someone inevitably ends up cut-off
| by accident.
|
| So then you just feed RealFill the 20 pictures you took and your
| uncle is magically painted in.
| xwdv wrote:
| You don't even need to take the photo, with enough images of
| each family member and images of a tourist destination you can
| just automatically construct a photo of everyone together at
| the location, saving the costs and carbon footprint of getting
| everyone together.
| cubefox wrote:
| And then why demand "photos" of family excursions at all,
| when it is just an AI imagining how things probably were
| happening at the time, or would have happened? We should just
| stick to our own imperfect memory.
| jetrink wrote:
| Also getting everyone smiling with their eyes open at the same
| time. Phone cameras could record a group photo for five or ten
| seconds and use the best expression from different times for
| each person.
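The per-person selection described above can be sketched as follows. It assumes some upstream step has already scored each person's expression in every frame (higher is better); the scores, names, and the function `pick_best_frames` are hypothetical, and the compositing step that would blend the chosen faces into one image is not shown.

```python
def pick_best_frames(scores):
    """Map each person to the index of the frame where they score highest.

    scores: dict of person -> list of per-frame expression scores.
    Ties go to the earliest frame.
    """
    return {
        person: max(range(len(per_frame)), key=per_frame.__getitem__)
        for person, per_frame in scores.items()
    }

# Hypothetical scores for a three-frame burst.
scores = {
    "alice": [0.2, 0.9, 0.4],
    "bob": [0.8, 0.1, 0.3],
}
best = pick_best_frames(scores)
```

Here each person would be taken from a different frame, which is why the result is a composite and not any single moment that was captured.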
| emodendroket wrote:
| Pixel phones already have some features kind of like this so
| it makes sense.
| lazycouchpotato wrote:
| I feel like this is already a thing with certain photo
| editing applications, if not built into phones themselves.
| patapong wrote:
| Or you take a single picture of a group in front of a
| monument, but cut it off. As I understand it you could find
| pictures of the monument online, run the model, and have a
| picture with the group and the entire monument.
|
| Probably google can even do this automatically - I would not
| be surprised if I get suggestions to fix images with cut off
| buildings via Google Photos in the future! Would be so cool.
| twism wrote:
| From the leaks this may be coming to the Pixel 8
| twism wrote:
| https://www.theverge.com/2023/9/23/23886765/google-
| pixel-8-p...
| ChrisClark wrote:
| Leaks? Wasn't this a launch feature in Google Photos, while
| it was still Google+ Photos?
|
| It was supposed to adjust eyes to open them if you took
| multiple photos.
| crazygringo wrote:
| Wow.
|
| This actually feels like it could be an _incredibly_ valuable
| post-production tool in film and TV, once they get it working
| consistently across multiple frames.
|
| Not only for more flexibility in "uncropping" after shooting
| (there was a tree/wall in the way), but this could basically be
| the holy grail solution for converting 4:3 to widescreen without
| cutting off content on the top and bottom.
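The trade-off between the two 4:3-to-16:9 options is simple arithmetic, sketched below. The 1440x1080 frame size and the function names are assumptions chosen for illustration.

```python
from fractions import Fraction

WIDESCREEN = Fraction(16, 9)

def extend_sides(width, height, target=WIDESCREEN):
    """Keep the full height; return (new_width, columns_to_generate)."""
    new_width = round(height * target)
    return new_width, new_width - width

def crop_top_bottom(width, height, target=WIDESCREEN):
    """Keep the full width; return (new_height, fraction_of_rows_lost)."""
    new_height = round(width / target)
    return new_height, 1 - Fraction(new_height, height)

# Assumed 4:3 source frame of 1440x1080 pixels.
extended = extend_sides(1440, 1080)    # 1920 wide, 480 new columns
cropped = crop_top_bottom(1440, 1080)  # 810 tall, a quarter of rows lost
```

So cropping discards a quarter of the original rows, while extending requires generating a quarter of the final frame (240 columns on each side for this size).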
| waynenilsen wrote:
| removing the cameraman from the shot is probably pretty close
| to the top of the list also
| markjpT wrote:
| [flagged]
| emodendroket wrote:
| I can see it working great for some stuff but wouldn't you
| ultimately face the issue with more artistic work that the
| framing might not be very good if just artificially extending.
| crazygringo wrote:
| It definitely needs to be applied judiciously on a shot-by-
| shot basis.
|
| There have been quite a few 4:3-to-widescreen conversions
| that were done using the original film that was actually shot
| in widescreen and cropped for TV.
|
| Sometimes, the wider shot makes perfect sense. Sometimes,
| they keep the original cropped one but cut off top/bottom.
| Sometimes it's a combination of the two. It all depends on
| what's being framed -- two people in a car usually benefits
| from cropping (nobody needs the bottom third of the frame
| occupied by the car's hood), while a close-up on someone's
| face usually benefits from extending the sides (otherwise
| it's an uncomfortable mega-close up that cuts off their
| mouth).
|
| But having the flexibility to extend horizontally gives you
| the artistic possibilities.
| qingcharles wrote:
| I already use Photoshop Generative Fill for uncropping videos,
| but it only works for fixed camera shots. Photoshop just added
| a feature where you can just drag the video file in and do the
| uncrop in one step.
|
| The problem I'm solving is converting videos from widescreen to
| vertical and sometimes you need some extra height.
| jiggawatts wrote:
| > widescreen to vertical
|
| You're a monster.
| anigbrowl wrote:
| Pro: a cool and useful looking technology
|
| Con: it's from Google so forget about trying it yourself any time
| soon
|
| I used to be a huge supporter of Google's products, now the name
| is an instant red flag.
| corndoge wrote:
| this page consistently crashes chrome on iOS
| cryptoz wrote:
| So is the weather just hallucinated then? We're just making up
| memories and calling them real? And advertising this blatantly,
| calling rainy days sunny and sunny days rainy? My god I hate this
| so much.
|
| Not even a discussion about if this might be harmful or what the
| risks are or anything, just plain old "THIS FAKE MOMENT WAS REAL
| AND YOU'LL BELIEVE IT"?!
|
| I really have a hard time with this. Wow I'm upset, more than I
| expected. The tech is fine yeah but the marketing is just deeply
| upsetting.
| [deleted]
| buildbot wrote:
| Cool tech as others have said, but of course, for thee but not
| for me with Google, unless I missed a link to a GitHub repo.
| (That's why OpenAI is called OpenAI - not open source, but at
| least open access!)
| sergiotapia wrote:
| This is what will make the Pixel compelling.
|
| My wife and I have been using the Pixel phones since Pixel 6 and
| we love the camera. Great pictures! But the best features are
| google photos, auto-tagging, recommending collages, walking down
| memory lane.
|
| Then you can magic erase tourists from pictures and pick a better
| shot from a picture you took on the fly....
|
| You add this "authentic image completion" to my kids' pics, and
| it's game over...
|
| I want this on my Pixel 8 asap!
| ehsankia wrote:
| The demo of the new upcoming Magic Editor they gave at I/O was
| quite magical.
|
| https://www.youtube.com/watch?v=-a583U3Sw44
|
| There are also leaks showing another feature where you can
| individually swap every person's face to get the perfect photo:
|
| https://www.ign.com/articles/google-pixel-8-leaked-video-ai-...
|
| I definitely agree, Pixel has been at the forefront of
| computational photography and editing since its inception. Things
| like night photography that we take for granted now, I remember
| when Pixel 2 first introduced it and it was honestly mind
| blowing.
| simoneau wrote:
| Me: Facebook AI, please post an entry about my vacation on Cape
| Cod and create a bunch of photos to go with it.
|
| Facebook: Great. I'd be happy to. Any more detail you'd like to
| add?
|
| Me: Make us look attractive. Show that we're having a great
| time. Also, we went to see the Chatham Lighthouse.
|
| Facebook: OK, done!
|
| ...
|
| Facebook: You've received 48 likes. Your mother would like to
| know if you had any salt water taffy.
|
| Me: Yes, and please create a picture of my oldest daughter having
| trouble chewing it.
|
| Facebook: Done.
| ShakataGaNai wrote:
| Sounds like the plot line to an episode of Black Mirror, but
| also something that is far too likely to happen.
| simoneau wrote:
| me: Facebook AI, please post a tender moment between me and
| my father when I was a boy. Include some photos.
|
| Facebook: I'd be happy to. Are there any more details you'd
| like to include?
|
| me: Please show how he didn't understand me at first, but
| then he looks at me and starts crying with love and regret.
|
| Facebook: Done. Your relationship with your father must have
| been deeply fulfilling.
| ormax3 wrote:
| https://petapixel.com/2022/12/14/man-fakes-an-entire-
| month-o...
| brap wrote:
| For the last 2-3 years, on an almost weekly basis, I am blown
| away by the progress made in AI. Huge steps forward. It actually
| happened twice in the last 24 hours alone.
|
| Where will we be 10 years from now? 50?
| Heidaradar wrote:
| What was the second time in the last 24 hours?
| brap wrote:
| https://youtu.be/MVYrJJNdrEg
| jawns wrote:
| There's definitely value in providing this functionality for
| photographs taken in the present.
|
| But I think the real value -- and this is definitely in Google's
| favor -- is providing this functionality for photos you have
| taken in the past.
|
| I have probably 30K+ photos in Google Photos that capture moments
| from the past 15 years. There are quite a lot of them where I've
| taken multiple shots of the same scene in quick succession, and
| it would be fairly straightforward for Google to detect such
| groupings and apply the technique to produce synthesized pictures
| that are better than the originals. It already does something
| similar for photo collages and "best in a series of rapid shots."
| They surface without my having to do anything.
| thesuavefactor wrote:
| Every picture is a picture from the past though
| jawns wrote:
| Philosophically, yes. But some photo-editing techniques rely
| on data that is not backfillable and must be recorded at
| capture time. And even in cases where there is no functional
| impediment to applying it against historical photos,
| sometimes there is product gatekeeping to contend with.
| royaltheartist wrote:
| Oh yeah, what about this old Kodak I found in my grandpa's
| attic that prints pictures showing how people are going to
| die?
| Workaccount2 wrote:
| Google might as well just be making up tech considering none of
| this stuff ever gets released.
| datameta wrote:
| I think using allusions to realism with AI is a dangerous road to
| start out on.
| henriquez wrote:
| Hasn't something like this been around for a year or so to
| "decensor" hentai pics?
| drcode wrote:
| When will they re-release all the old Star Trek TV shows in 1080p
| resolution and 16:9 aspect ratio?
| ShakataGaNai wrote:
| There are already applications like
| https://www.topazlabs.com/topaz-video-ai and
| https://tensorpix.ai/ -- So it doesn't seem unreasonable that
| some of these deep learning models could upscale all these old
| TV episodes to at least 4k.
|
| I'd love to see a combo of this Google tech and AI upscaling do
| the same for Babylon 5. They had shot the actors in widescreen
| format, but the CGI spaceships were only rendered in 4:3 and
| the files have been lost.
| dragonwriter wrote:
| This requires other pictures of the environment to use to infer
| what should fill in the gaps, which will not exist for every
| shot in those series. (TOS and TNG were already rereleased in
| 1080p, though.) I suppose you could use outpainting to
| _construct_ the rest of the scene in one frame, and use that as
| the reference for other frames in the same shot.
| andrewprock wrote:
| I suspect this will do a pretty good job at defeating watermarks.
___________________________________________________________________
(page generated 2023-09-29 23:00 UTC)