hngopher.com

       [HN Gopher] RealFill: Image completion using diffusion models
       ___________________________________________________________________
        
       RealFill: Image completion using diffusion models
        
       Author : flavoredquark
       Score  : 570 points
       Date   : 2023-09-29 18:27 UTC (1 days ago)
        
 (HTM) web link (realfill.github.io)
        
       | CrzyLngPwd wrote:
       | Creating a fake life is going to be so easy soon.
       | 
       | Everyone will be able to make all of the other fakes on social
       | media jealous with ease.
        
         | pbhjpbhj wrote:
         | Can we cryptographically sign a photo in a way that shows it
         | was generated in a particular place? I'm thinking of some sort
         | of beacon in a location that allows you to say this person was
         | here, at least. I'm not sure if it's possible to go beyond
         | presence and indicate anything else about the situation?
         | 
         | I hesitate to say it, but a blockchain is probably part of the
         | solution.
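
A minimal sketch of the signing idea raised above, assuming a hypothetical beacon that holds a secret key and attests to a digest of the photo together with its own ID and a timestamp. HMAC from the Python standard library stands in for the signature here; a real design would more likely use an asymmetric signature so that verifiers never hold the secret. The names `beacon_attest` and `verify_attestation` are illustrative only. As the comment suspects, this shows only that these exact bytes were presented to the beacon, not that the depicted scene is real.

```python
import hashlib
import hmac
import json


def beacon_attest(beacon_key: bytes, beacon_id: str, photo: bytes, timestamp: int) -> dict:
    """Hypothetical beacon: bind a photo digest to a beacon ID and a time."""
    claim = {
        "beacon_id": beacon_id,
        "photo_sha256": hashlib.sha256(photo).hexdigest(),
        "timestamp": timestamp,
    }
    # Serialize deterministically so signer and verifier hash the same bytes.
    message = json.dumps(claim, sort_keys=True).encode()
    tag = hmac.new(beacon_key, message, hashlib.sha256).hexdigest()
    return {"claim": claim, "tag": tag}


def verify_attestation(beacon_key: bytes, photo: bytes, attestation: dict) -> bool:
    """Check that the photo matches the claim and that the tag is genuine."""
    claim = attestation["claim"]
    if claim["photo_sha256"] != hashlib.sha256(photo).hexdigest():
        return False  # the photo was altered after attestation
    message = json.dumps(claim, sort_keys=True).encode()
    expected = hmac.new(beacon_key, message, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, attestation["tag"])
```

Any edit to the photo bytes changes the digest, so the attestation stops verifying; nothing in this sketch needs a blockchain.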
        
           | amelius wrote:
           | There is this trend going on where hardware vendors are
           | increasingly locking down their hardware, and this could be a
           | part of the solution you are looking for. Not everyone will
           | be happy about it, however.
        
       | lopatin wrote:
       | These github.io pages always go down once they hit the front
       | page.
        
         | londons_explore wrote:
         | Works fine for me...
         | 
         | some github.io pages are iframes to the developer's home machine
         | or something for a tech demo that can't withstand many users.
         | 
         | But regular github.io static pages ought to be able to
         | withstand millions of users at once.
        
           | lopatin wrote:
           | I see, I think they're just broken for me in general.
        
             | londons_explore wrote:
             | Might be blocked on some corp networks because it's all
             | anonymous user generated content.
        
           | callalex wrote:
           | Images are broken for me.
        
         | Squarex wrote:
         | It appears to work ok and I've never witnessed problems with
         | static content on github pages.
        
           | markjpT wrote:
           | [flagged]
        
         | a1o wrote:
         | Images aren't loading for me which is kind of a bummer for this
         | specifically... :/
        
       | EricMausler wrote:
       | I feel like this will be great for wedding photographers
        
       | dang wrote:
       | [stub for offtopicness]
        
         | aga98mtl wrote:
         | I do not agree with their usage of the word "Authentic".
        
           | emodendroket wrote:
           | Perhaps verisimilar then.
        
           | crazygringo wrote:
           | The point is that it's not based on hallucination -- it's
           | generated out of the authentic details provided from other
           | images.
           | 
           | There's definitely a middle ground here that we perhaps don't
           | have a good word for. E.g. what do we call a painting made by
           | an artist who sat in front of the scene they depicted, vs. a
           | painting made by an artist from their imagination? There's
           | certainly some sense in which the first one was an
           | "authentic" scene.
        
             | thomastjeffery wrote:
             | Here are some better words off the top of my head:
             | 
             | Intentional
             | 
             | Contextual
             | 
             | Everything about this project goes against the meaning of
             | authenticity.
        
             | CobrastanJorji wrote:
             | Yeah, except it's still absolutely vulnerable to
             | hallucination. Look at the last set of images on the
             | "Limitations" page. The algorithm knows that there's a sign
             | with text there, and it uses the original image to get the
             | right letters in there, but it randomly reorders the
             | letters rather than using the source image. "Real" and
             | "authentic" is extremely misleading here.
             | 
             | That said, props to them for calling out the limitations so
             | clearly. I really appreciate it when people are up front
             | with the problems like that.
        
         | bhaney wrote:
         | Cool tech, but plastering "authentic" all over this kind of
         | generated photography is really disingenuous and just rubs me
         | the wrong way. I get that it's informed by real details from
         | other photos, but that's not what authentic means.
         | 
         | If I buy an "authentic Rolex" and receive a Chinese Rolex clone
         | that's built similarly based on observations of a real Rolex,
         | I'm going to feel scammed and very upset. And I'm much more
         | protective of my memories than I would be of a watch.
        
           | dang wrote:
           | Ok, we've taken authenticity out of the title above.
        
           | 101008 wrote:
           | Yeah, I think the first example is bad. This shouldn't be
           | used for the photos you took. What's the purpose of having a
           | photo if it wasn't the real moment you captured? I could
           | understand the usage in marketing or event photographies, but
           | for memories with your loved ones (as the first example tries
           | to show it) it just doesn't make sense to me.
           | 
           | Two anecdotes:
           | 
           | 1. A friend of mine met his favourite author (traveled from
           | one continent to another for a signing event). When he shook
           | hands with the author, a friend took a photo. A lady (still
           | hated by us!) stepped in the middle, and blocked the photo.
           | Maybe an IA or a talented person could remove her, use a
           | footage photo of the author and rebuild the photo... but why?
           | What's the purpose of that?
           | 
           | 2. A few months ago during the pandemic I scanned all the
           | printed pictures of my grand parents with my phone. After
           | scanning like 200, I checked one and I zoomed in: the stupid
           | app applied some IA to make it better and it just was worse.
           | I don't care if it looks better for the untrained eye: my
           | grandparents didn't look like that. I now have stupid
           | horrible versions of the scanned photos, where my grand
           | parents appear with smooth skin and weird eyes.
        
             | nuancebydefault wrote:
             | Is IA French for AI? (Like UE and many other
             | abbreviations)? I could look it up but might as well ask
             | the question.
        
               | nargek wrote:
               | Yes it is !
               | 
               | IA -> Intelligence Artificielle
        
               | smcnally wrote:
               | In Spanish, too -- and other subject-object-verb
               | languages
        
             | IanCal wrote:
             | I totally agree with 2. I'm less sure on 1. Imagine it's
             | _perfect_ - it would be an accurate representation of what
             | was really there. The real photo is a snapshot of a very
             | specific time that doesn't represent the broader context
             | of what happened.
             | 
             | A different angle, if a friend had painted the encounter
             | instead, it wouldn't be exact but it would be a snapshot of
             | a memory.
             | 
             | I'm not hugely arguing in favour of it but I think there's
             | different scales here, from cameras doing "merge pictures
             | half a second apart so people have their eyes open" to
             | "totally change their face".
        
           | neilv wrote:
           | They really need to not use the term "authentic" to name
           | this.
           | 
           | They also need to be very, very careful when introducing
           | capability to falsify photographic images convincingly.
           | 
           | Using the term "authentic" for this (and how do they even
           | know what's an authentic memory?) doesn't sound like being
           | very, very careful. It sounds like being gratuitously
           | reckless.
        
           | cmdli wrote:
           | I would argue that authentic is a relative term, and actually
           | helped me understand the product more easily. IMO, it is
           | "authentic" because, compared to other image fills, it tries
           | to fill in the data using real data from other photos.
        
             | sdfghswe wrote:
             | HN loves arguing about words.
        
               | debugnik wrote:
               | And recent HN posts love to twist them and reinterpret
               | them just for promotion.
        
             | marricks wrote:
             | It's "authentic" in the same way that when you see
             | something labeled authentic it makes you more likely to
             | question if it's actually what it says it is because
             | authentic things don't need such labels plastered on them.
             | 
             | Regardless, I'm pretty sure "reconstructed" is the honest
             | word to use.
        
             | HaZeust wrote:
             | I'd call that "contextual" rather than authentic.
        
               | waynesonfire wrote:
               | let me give you an example. when i draw a mustache on a
               | face in ms paint in brown, that's contextual but not
               | authentic.
        
             | endisneigh wrote:
             | IDK, when I think authentic, I think "genuine", and no
             | image generation is genuine by definition. this is not a
             | bad thing necessarily, but it's important to frame these
             | things correctly.
             | 
             | ultimately we oughta think about what we are referring to.
             | if we are talking about a photograph taken by someone, the
             | authenticity is ultimately coming from the combination of
             | the photograph and camera used. so when you think of a
             | genuine photo in this scenario you expect it to be
             | fundamentally taken by the user by a particular camera to
             | create a particular photograph. you can use devices to take
             | a photo without pressing the button, such as a timer, but
             | the photograph and camera are both fundamental to the
             | authenticity of the image. if the camera is no longer
             | entirely involved in the generation of the photograph I
             | would say that it is no longer genuine.
             | 
             | Reference driven as described in the article is more
             | appropriate, but alas it is verbose. normally such pedantry
             | bores me, but in this case it's pretty central to what is
             | being presented.
        
               | tremon wrote:
               | I think "composite" would be more accurate to describe
               | this process. As in, "complete a picture using image
               | composition".
        
             | jameshart wrote:
             | We need a new word: authentish.
        
               | smcnally wrote:
               | Like "truthy."
        
             | tremon wrote:
             | How do they know the data from the other photos is real?
        
           | thomastjeffery wrote:
           | This literally goes against the meaning of the word
           | authenticity.
           | 
           | Call it "realistic". Words matter.
        
             | Timon3 wrote:
             | "Realistic" is the wrong word, since that's what infill
             | models are already doing, and the word is already used for
             | that. You'd have to find something that differentiates
             | between plausibly realistic and contextual realistic
             | infill.
        
           | QuercusMax wrote:
           | Seems like it's only a matter of degree, given that modern
           | cell phone cameras take image bursts and combine them into a
           | single output image. Filling in details in a scene from other
           | photos taken at the same time doesn't really seem that
           | different to me. And seeing that photography has never really
           | been capturing real life exactly, is it really that big a
           | deal? Look at Ansel Adams - he heavily edited his "real-life"
           | photographs, and changed them over the years as he made
           | subsequent prints.
           | 
           | (Disclaimer: work for Google but have nothing to do with this
           | project.)
        
           | drewcoo wrote:
           | > plastering "authentic" all over this kind of generated
           | photography is really disingenuous
           | 
           | No more so than "virtual," which used to mean "true." Or
           | "literal" which used to be the opposite of "figurative." It's
           | just another word being used auto-autonymically.
           | 
           | Definitio fugit.
        
             | thomastjeffery wrote:
             | Virtual never meant real.
             | 
             | Literally is often used _in a sarcastic context_. That
             | sarcasm _depends on_ the word meaning what it means.
        
       | esafak wrote:
       | Come on, Google, push this to Google Photos.
        
       | syntaxfree wrote:
       | There was a subreddit called something like r/bubbling where
       | people would edit pictures of women in bikinis and actually cover
       | more of the image, but in such a way that your brain was fooled
       | into completing the image and seeing a nude woman. I thought it
       | was a technical marvel, however creepy.
        
       | kaetemi wrote:
       | I have a ton of potato definition videos along with matching high
       | res photos from my childhood that were made by one of those cheap
       | CF-card cameras at the time. Would be cool if this could restore
       | those shitty video frames based on the reference photos as well.
        
       | 01100011 wrote:
       | Slightly off-topic: what's the best way right now to remove my
       | ex-wife from an old family portrait and replace her with my
       | current wife?
        
         | tremon wrote:
         | Scissors and a Pritt stick.
        
         | Adverblessly wrote:
         | Assuming you are asking about a generative AI way, you could
         | use photos of your new wife to train a LoRA with kohya-ss, then
         | with A1111 you could do an img2img repaint using the ControlNet
         | extension to make sure you get a similar pose. With enough
         | experimentation you could probably get at least one decent
         | result.
         | 
         | At least that's what comes to mind with the things I know you
         | can run offline.
        
         | [deleted]
        
         | pcblues wrote:
         | Give it five years for the tech. Right now? Probably easier to
         | get back with the ex to make the portrait correction.
         | 
         | /jk sorry
        
           | markjpT wrote:
           | [flagged]
        
         | true_religion wrote:
         | I know someone who did something similar. He remarried then
         | went back and cropped or deleted ten years of Facebook
         | photos to make it look like he never had a previous
         | relationship and just ten years of boys nights.
         | 
         | He even has a picture up of him from his wedding day...
         | standing alone in a tux.
        
         | solardev wrote:
         | Ask everyone to pose for a new photo
        
         | markjpT wrote:
         | [flagged]
        
         | bradleyjg wrote:
         | The kind of stuff the op is doing---changing the composition to
         | reflect a picture that could have been taken---is one thing.
         | But what you are asking feels Stalin-esque to me. A picture is
         | a record of a point in time and you can't change the past.
        
           | true_religion wrote:
           | This is why my family portrait is going to be a painting. In
           | the future, you can just paint in the new generations and we
           | can all be in the same frame together.
        
           | joosters wrote:
           | _A picture is a record of a point in time and you can't
           | change the past._
           | 
           | I don't think either of those things are true. Both can be
           | changed, and are often changed. Much of what we 'know' of the
           | past is wrong.
        
           | solardev wrote:
           | Sure you can, just as you can change people's memories and
           | implant false ones. Hell, in this dystopia we're headed
           | towards, it'll probably be a subscription service where you
           | can rewrite 5 bad memories a month for $29/mo
        
       | syntaxing wrote:
       | Something...seems fishy? Like the example with the guy next to
       | the robot figure. Their model happened to predict exactly the
       | same type of figure?! Diffusion models are not omnipotent...
        
         | foota wrote:
         | The model gets the reference images as "context", so it can
         | just copy the robot from one of the other images.
        
           | syntaxing wrote:
           | Ahh I see, this makes a lot more sense now!
        
         | IshKebab wrote:
         | That's the entire point. It didn't "happen" to predict exactly
         | the same type of figure. It used the context photos to know
         | what type of figure it should render.
         | 
         | You might be getting a bit confused because here the training
         | process has to happen every time you use it, whereas in most AI
         | applications you only perform inference for actual use.
        
         | bjornlouser wrote:
         | I wonder if he is holding that umbrella to aid the model in
         | recovering the 3d scene/scale from the reference images.
        
       | Dwedit wrote:
       | How much VRAM (or system RAM) would you need to run this, and how
       | much processing time does it take to process the reference images
       | and let it generate the fills?
        
       | derefr wrote:
       | Question: does this model only do outpainting, or does it also do
       | super-resolution? Could this model be fed all the frames of a
       | really awful security-camera video, in order to then synthesize a
       | high-resolution still image of a suspect?
        
         | pbhjpbhj wrote:
         | And you can tell it in advance which person you want it to
         | reveal as the suspect!
        
       | endisneigh wrote:
       | an interesting use case for this once the compute is there is to
       | basically allow for ai powered digital zoom-out. it could work by
       | instructing the user to take several pictures around the target,
       | and then you take regular pictures of your subject.
       | 
       | then, as you like, you can do an "ai zoom out" to get zoomed out
       | pictures, no longer constrained by your lens or distance.
       | 
       | I imagine this to be included relatively soon, just like how
       | panoramas were once a niche thing that became much easier to do
       | with some good ui/ux. pretty much any modern phone can do them
       | without having to struggle with lining up photos and what not.
       | 
       | one thing that does greatly concern me about the demo/site is
       | that they have "authentic" and "recover" as terms. the result
       | here is not authentic nor has anything been "recovered." it's an
       | illusion at best. I personally don't like how they portray the
       | new image as being equivalent as if the lens framed it in the
       | original picture. it's not, as they show themselves in the later
       | portion (near the end) with the text sign. seriously
       | irresponsible framing (pun intended) to what's otherwise very
       | cool tech.
        
         | gs17 wrote:
         | Agreed, the "Reference-Driven Generation" part is totally fine,
         | but "authentic" is overselling it.
        
         | richardw wrote:
         | Nice idea. Might not need multiple pics given Google's image
         | dataset and ability to recognise what you're looking at.
         | 
         | Give that a couple generations. "You were at location X and
         | didn't take a pic. We generated you some selfies, choose one
         | that you like."
        
           | cvwright wrote:
           | If they wanted to, they could find real pictures of you,
           | taken by other Google users who were there at the same time.
           | 
           | I don't know if that's more or less creepy than the AI
           | stuff...
        
         | satvikpendem wrote:
         | This already exists in Stable Diffusion and others, called
         | outpainting.
        
         | markjpT wrote:
         | [flagged]
        
       | Anya200 wrote:
       | [dead]
        
       | rasz wrote:
       | Recover those precious memories of things that never happened,
       | only with Google!
        
       | nuancebydefault wrote:
       | Somewhat covertly I deep down wish that humans' desire for pretty
       | looking pictures will fade away over time, due to the ubiquity of
       | pretty looking pictures produced by auto post processing. And
       | ultimately their liking of pretty people and shiny new stuff in
       | general. I don't want to sound negative or pedantic, I just would
       | like that people prefer inner beauty in the broader sense.
        
         | hansoolo wrote:
         | This is a beautiful post! Thank you!
        
       | pedalpete wrote:
       | Mix this with some NeRF/Gaussian splat, or other 3D rendering and
       | we have photos where you can re-frame after being shot. No more
       | selfie sticks, perhaps use both cameras at one time to capture
       | more of a scene for infill.
       | 
       | Some will say "but that isn't a real photo of what was there",
       | but our memories of what was in a photo or a scene aren't perfect
       | anyway.
        
         | drumttocs8 wrote:
         | I like the idea of a 3d generated scene we can explore with VR
        
       | debarshri wrote:
       | The current advancement in Generative AI is a bit scary, in my
       | opinion. May I be pessimistic?
       | 
       | This and the new demos I saw from WhatsApp around
       | persona-based AI can really alter someone's perception and
       | memories. I don't think we are considering how it can really
       | impact our understanding of our feelings, perception, memories
       | and mindfulness.
       | 
       | If you take a picture of reality and alter it with Gen AI to do
       | something else and change the moment, what is the new reality?
       | After a while, we might question if it was real or not, and then
       | that might just become the new reality.
       | 
       | In my opinion, GenAI is truly transformational as well as scary,
       | as it can alter our perception. I wonder if anyone else feels
       | this way.
        
         | theultdev wrote:
         | Nobody shows their true reality in public pictures anyway, it's
         | all staged in some way.
         | 
         | For private pictures, it didn't change your reality, you can
         | lie to yourself, but you've always been able to do that.
        
           | debarshri wrote:
           | but when you take a picture that captures a personal moment
           | and some software without your consent alters it with some
           | generative stuff, what would that lead to?
           | 
           | I disagree with lying to yourself. For people who are not
           | mindful and aware, this will severely impact their perception.
        
             | theultdev wrote:
             | > but when you take a picture that captures a personal
             | moment and some software without your consent alters it
             | with some generative stuff, what would that lead to?
             | 
             | I mean, do you not look at the photo after you take it?
             | Even if you don't, you were there and saw the original
             | scene. If your memory fails you, it's on you. If you didn't
             | take an accurate picture, it's on you. Check next time.
             | 
             | If anything meaningful is added, it'll be very noticeable,
             | if it's not meaningful, then what does it matter?
             | 
             | Cameras already do a lot of corrections that don't
             | represent reality.
             | 
             | Hell, each person's perception of colors is different from
             | everyone else's.
        
       | deckar01 wrote:
       | I have been working on a holographic camera, but the ultra-cheap
       | pinhole cameras I chose for the array have two issues: the
       | exposure can't be controlled and the lenses are poorly aligned. I
       | can calibrate away most lens aberrations with OpenCV, but some of
       | the outliers have so much cropping that I am discarding 75% of my
       | good pixels to get a coherent result. I was considering using
       | NeRFs to reproject the ideal camera angles, but COLMAP is not
       | very tolerant of brightness fluctuations and NeRF training is
       | relatively slow (considering my goal is video). This would be a
       | nice solution to my problem, because I have a comprehensive set
       | of angles to pull context from.
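
To make the cropping loss described above concrete, here is a small sketch under simplifying assumptions: after calibration, each camera's valid pixels are modeled as an axis-aligned rectangle in a shared coordinate frame, and the coherent output is the intersection of all of them. The function names and the example numbers are illustrative, not taken from the commenter's setup.

```python
from typing import Iterable, Tuple

# (left, top, right, bottom) in shared, rectified pixel coordinates
Rect = Tuple[float, float, float, float]


def common_region(rects: Iterable[Rect]) -> Rect:
    """Intersection of every camera's valid rectangle (empty if disjoint)."""
    rects = list(rects)
    left = max(r[0] for r in rects)
    top = max(r[1] for r in rects)
    right = min(r[2] for r in rects)
    bottom = min(r[3] for r in rects)
    if right <= left or bottom <= top:
        return (0.0, 0.0, 0.0, 0.0)
    return (left, top, right, bottom)


def retained_fraction(rects: Iterable[Rect], sensor_w: float, sensor_h: float) -> float:
    """Share of one sensor's pixels that survives the common crop."""
    left, top, right, bottom = common_region(rects)
    return ((right - left) * (bottom - top)) / (sensor_w * sensor_h)


# One well-aligned 640x480 camera plus one outlier shifted by half a frame
# in each axis: only a quarter of the pixels remain in the shared view.
views = [(0, 0, 640, 480), (320, 240, 960, 720)]
fraction = retained_fraction(views, 640, 480)
```

In this toy case a single outlier forces 75% of the pixels to be discarded, which is why filling the missing margins from neighbouring views is attractive compared with cropping every camera to the worst one.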
        
       | waynenilsen wrote:
       | "Comparison with Baselines" is shocking
        
       | uptown wrote:
       | Is this similar to what GoPro cameras do to remove the selfie-
       | stick? They use video content from adjacent frames to remove the
       | pole and fill in with pixels. I get that the approach here can
       | use imagery that's framed completely differently.
        
       | Jorge1o1 wrote:
       | Wow. The use case that comes to mind for me is when you take a
       | big family photo (or 20) and someone inevitably ends up cut-off
       | by accident.
       | 
       | So then you just feed RealFill the 20 pictures you took and your
       | uncle is magically painted in.
        
         | xwdv wrote:
         | You don't even need to take the photo, with enough images of
         | each family member and images of a tourist destination you can
         | just automatically construct a photo of everyone together at
         | the location, saving the costs and carbon footprint of getting
         | everyone together.
        
           | cubefox wrote:
           | And then why demand "photos" of family excursions at all,
           | when it is just an AI imagining how things probably were
           | happening at the time, or would have happened? We should just
           | stick to our own imperfect memory.
        
             | therein wrote:
             | I'd imagine in the future we could have services such as
             | this one:
             | 
             | > In exchange of a small fee and a 35 minutes suggestion
             | session, get you and your family implanted with memories of
             | a beautiful vacation that'll last you for a lifetime for
             | fraction of the cost of an actual one.
        
           | TheJoeMan wrote:
           | That reminds me of https://hackaday.com/2023/06/02/ai-camera-
           | imagines-a-photo-o...
           | 
           | A box that takes your gps location, weather, etc and
           | autogenerates a photo from your PoV.
        
         | jetrink wrote:
         | Also getting everyone smiling with their eyes open at the same
         | time. Phone cameras could record a group photo for five or ten
         | seconds and use the best expression from different times for
         | each person.
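
The selection step suggested above can be sketched as follows, assuming some face-quality model (not shown, and hypothetical here) has already produced a score for every person in every frame of the burst. The sketch only picks which frame to take each face from; compositing those faces into one target frame is a separate and harder problem.

```python
from typing import List, Sequence


def pick_best_frames(scores: Sequence[Sequence[float]]) -> List[int]:
    """scores[f][p] is the quality of person p in frame f.

    Returns, for each person, the index of the frame where they look best.
    Ties go to the earliest frame.
    """
    num_frames = len(scores)
    num_people = len(scores[0])
    return [
        max(range(num_frames), key=lambda f: scores[f][p])
        for p in range(num_people)
    ]


# Three frames, three people: each person happens to look best in a
# different frame, so no single frame would have satisfied everyone.
burst_scores = [
    [0.9, 0.2, 0.5],
    [0.4, 0.8, 0.5],
    [0.1, 0.3, 0.7],
]
best = pick_best_frames(burst_scores)
```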
        
           | emodendroket wrote:
           | Pixel phones already have some features kind of like this so
           | it makes sense.
        
           | lazycouchpotato wrote:
           | I feel like this is already a thing with certain photo
           | editing applications, if not built into phones themselves.
        
             | AuryGlenz wrote:
             | I'm a photographer that does families/weddings.
             | 
             | I've done this manually in Photoshop more times than I can
             | count.
             | 
             | Usually more automated solutions only hold up to light
             | scrutiny, but that's rapidly changed in the past year. I'm
             | sitting after this year and I'm a little miffed about it.
             | Oh well.
        
               | kennyadam wrote:
               | You're sitting?
        
           | patapong wrote:
           | Or you take a single picture of a group in front of a
           | monument, but cut it off. As I understand it you could find
           | pictures of the monument online, run the model, and have a
           | picture with the group and the entire monument.
           | 
           | Probably google can even do this automatically - I would not
           | be surprised if I get suggestions to fix images with cut off
           | buildings via Google Photos in the future! Would be so cool.
        
           | twism wrote:
           | From the leaks this may be coming to the Pixel 8
        
             | twism wrote:
             | https://www.theverge.com/2023/9/23/23886765/google-
             | pixel-8-p...
        
             | ChrisClark wrote:
             | Leaks? Wasn't this a launch feature in Google Photos, while
             | it was still Google+ Photos?
             | 
             | It was supposed to adjust eyes to open them if you took
             | multiple photos.
        
               | twism wrote:
               | That's "Top Shot" which is the entire frame. The feature
               | I'm referring to would adjust multiple faces in a frame
               | by selecting the same faces/sections from different
               | frames to a single target frame
        
               | ChrisClark wrote:
               | I swear that's what was announced, but I assume you're
               | right because you actually know the term Top Shot, and I
               | had no memory of that.
               | 
               | So your memory is probably better than mine. :)
               | 
               | I just remember some demo of a family shot and it
               | automatically opening a little boy's eyes by using another
               | photo. And another auto combining of images so that you
               | could take a lot of photos of a busy tourist place and
               | automatically remove all the people.
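
One classic way to get the "remove all the people" effect described above is a per-pixel median over several aligned shots: as long as each spot is unobstructed in more than half of the frames, the median recovers the background. This is a generic sketch of that technique on tiny grayscale frames, and there is no claim that it is what the demoed product actually did.

```python
from statistics import median
from typing import List

Frame = List[List[int]]  # grayscale image as rows of pixel values


def median_stack(frames: List[Frame]) -> Frame:
    """Per-pixel median of equally sized, already aligned frames."""
    height = len(frames[0])
    width = len(frames[0][0])
    return [
        [median(frame[y][x] for frame in frames) for x in range(width)]
        for y in range(height)
    ]


# Background value is 10; a bright "tourist" (255) stands at a different
# pixel in each shot, so the median never sees them in the majority.
shots = [
    [[255, 10], [10, 10]],
    [[10, 255], [10, 10]],
    [[10, 10], [255, 10]],
]
clean = median_stack(shots)
```

The approach fails wherever something stays put across most frames, which is the gap a reference-driven generative fill would have to cover.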
        
               | Grazester wrote:
               | Yeah there is a version of the smart fill available on
               | Pixel phones
        
       | crazygringo wrote:
       | Wow.
       | 
       | This actually feels like it could be an _incredibly_ valuable
       | post-production tool in film and TV, once they get it working
       | consistently across multiple frames.
       | 
       | Not only for more flexibility in "uncropping" after shooting
       | (there was a tree/wall in the way), but this could basically be
       | the holy grail solution for converting 4:3 to widescreen without
       | cutting off content on the top and bottom.
        
         | waynenilsen wrote:
         | removing the cameraman from the shot is probably pretty close
         | to the top of the list also
        
           | Gabrys1 wrote:
           | Initially I read the above as removing the cameramen from the
           | process of taking photos (which is also where this is going)
        
           | Wistar wrote:
           | ... especially on highly reflective subject surfaces such as
           | cars.
        
         | markjpT wrote:
         | [flagged]
        
         | emodendroket wrote:
         | I can see it working great for some stuff but wouldn't you
         | ultimately face the issue with more artistic work that the
          | framing might not be very good if just artificially extending?
        
           | crazygringo wrote:
           | It definitely needs to be applied judiciously on a shot-by-
           | shot basis.
           | 
           | There have been quite a few 4:3-to-widescreen conversions
           | that were done using the original film that was actually shot
           | in widescreen and cropped for TV.
           | 
           | Sometimes, the wider shot makes perfect sense. Sometimes,
           | they keep the original cropped one but cut off top/bottom.
           | Sometimes it's a combination of the two. It all depends on
           | what's being framed -- two people in a car usually benefits
           | from cropping (nobody needs the bottom third of the frame
           | occupied by the car's hood), while a close-up on someone's
           | face usually benefits from extending the sides (otherwise
           | it's an uncomfortable mega-close up that cuts off their
           | mouth).
           | 
           | But having the flexibility to extend horizontally gives you
           | the artistic possibilities.
        
         | miahwilde wrote:
          | wow x2. You're right, video is where this is really cool. Take
         | enough video of a scene and you can then create most any photo
         | from any angle in it.
        
         | qingcharles wrote:
         | I already use Photoshop Generative Fill for uncropping videos,
          | but it only works for fixed camera shots. Photoshop just added a
          | feature where you can just drag the video file in and do the
          | uncrop in one step.
         | 
         | The problem I'm solving is converting videos from widescreen to
         | vertical and sometimes you need some extra height.
        
           | kennyadam wrote:
            | Mind if I ask why you'd need to do that? It's a huge amount
            | of the frame being generated artificially, especially if
            | you're talking cinema aspect ratio wide-screen.
        
             | qingcharles wrote:
             | If you're trying to convert widescreen content so it looks
             | good on TikTok, Reels, Shorts, then for the most part you
             | can crop a vertical chunk from the centre of the frame, and
             | pan if necessary to keep the action in frame. Sometimes
             | though the shot is too wide and you can't crop that
             | vertical chunk out, so you have to crop it as vertical as
             | you can and then add something to the top and bottom to
             | fill out the frame, otherwise you have a whole shot that
             | isn't in vertical and it breaks the flow of the video.
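              | 
              | A rough sketch of that arithmetic, assuming pixel
              | dimensions and a hypothetical helper name (fill_needed).
              | It only reports how many pixels would have to be
              | generated to reach the target aspect ratio, not how to
              | generate them:

```python
import math
from fractions import Fraction


def fill_needed(crop_w: int, crop_h: int, ratio_w: int, ratio_h: int) -> tuple[int, int]:
    """Return (extra_width, extra_height) in pixels that must be added
    to a crop_w x crop_h frame so it reaches the ratio_w:ratio_h aspect."""
    target = Fraction(ratio_w, ratio_h)
    current = Fraction(crop_w, crop_h)
    if current > target:
        # Frame is too wide for the target: keep the width, extend the height.
        needed_h = math.ceil(crop_w / target)
        return 0, needed_h - crop_h
    # Frame is too narrow (or already matches): keep the height, extend the width.
    needed_w = math.ceil(crop_h * target)
    return needed_w - crop_w, 0


if __name__ == "__main__":
    # Widest usable crop is a 1080x1080 square, target is vertical 9:16.
    print(fill_needed(1080, 1080, 9, 16))
    # A 4:3 frame at 1440x1080 extended to 16:9 widescreen.
    print(fill_needed(1440, 1080, 16, 9))
```

              | For a 1080x1080 crop going to 9:16 that comes out to 840
              | extra rows, so 420 at the top and 420 at the bottom.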
        
           | jiggawatts wrote:
           | > widescreen to vertical
           | 
           | You're a monster.
        
             | qingcharles wrote:
             | I know. What would my parents think of what has become of
             | their son..? Sorry, Mum and Dad! o_O
        
       | anigbrowl wrote:
       | Pro: a cool and useful looking technology
       | 
       | Con: it's from Google so forget about trying it yourself any time
       | soon
       | 
       | I used to be a huge supporter of Google's products, now the name
       | is an instant red flag.
        
       | corndoge wrote:
       | this page consistently crashes chrome on iOS
        
       | cryptoz wrote:
       | So is the weather just hallucinated then? We're just making up
       | memories and calling them real? And advertising this blatantly,
       | calling rainy days sunny and sunny days rainy? My god I hate this
       | so much.
       | 
       | Not even a discussion about if this might be harmful or what the
       | risks are or anything, just plain old "THIS FAKE MOMENT WAS REAL
       | AND YOU'LL BELIEVE IT"?!
       | 
       | I really have a hard time with this. Wow I'm upset, more than I
       | expected. The tech is fine yeah but the marketing is just deeply
       | upsetting.
        
         | dymk wrote:
         | > We're just making up memories and calling them real?
         | 
         | This has always been the case, you just don't remember it, and
         | the (human) hallucinated details are usually just not important
         | enough to care about.
        
       | awruko wrote:
       | this is crashing (black screen) my firefox on iphone
        
       | [deleted]
        
       | dwallin wrote:
       | Seems like the real utility of this technique will be as a way to
       | vastly improve the temporal stability with a variety of
       | generative video techniques. For example, if you are trying to
       | use a video as a base for a new generative video: Take the first
       | frame of your video and run it through SD with the control net of
       | your choice. Then take that initial image and run it through this
       | process to generate a new base model and then use that to
       | generate your second frame. Now you can use that second frame to
       | feed back into your model and rinse and repeat, always using the
       | past few frames to inform the latest.
        
         | goodmachine wrote:
         | That makes sense. If I understand correctly, this 'loopback'
         | technique is being used below as you describe. Alarming video,
         | btw.
         | 
         | https://www.reddit.com/r/StableDiffusion/comments/16uqqrh/ho...
        
       | buildbot wrote:
       | Cool tech as others have said, but of course, for thee but not
       | for me with Google, unless I missed a link to a GitHub repo.
       | (That's why OpenAI is called OpenAI - not open source, but at
       | least open access!)
        
       | KETpXDDzR wrote:
       | Does anyone else notice that the example images they provided
       | look like they included their test data in the training set?
       | E.g., the picture with the couch where they cut out the dog in
       | it. How should the network know that there was a dog on the
       | couch? The only explanation is: It knows the reference image.
        
         | amelius wrote:
         | That's the idea, right?
         | 
         | You give it a bunch of reference images, then another image
         | with some rectangle removed, and it will fill in the rectangle
         | with information from the reference image.
        
       | sergiotapia wrote:
       | This is what will make the Pixel compelling.
       | 
       | My wife and I have been using the Pixel phones since Pixel 6 and
       | we love the camera. Great pictures! But the best features are
       | google photos, auto-tagging, recommending collages, walking down
       | memory lane.
       | 
       | Then you can magic erase tourists from pictures and pick a better
       | shot from a picture you took on the fly....
       | 
       | You add this "authentic image completion" to my kids' pics, and
       | it's game over...
       | 
       | I want this on my Pixel 8 asap!
        
         | ehsankia wrote:
         | The demo of the new upcoming Magic Editor they gave at I/O was
         | quite magical.
         | 
         | https://www.youtube.com/watch?v=-a583U3Sw44
         | 
         | There are also leaks showing another feature where you can
         | individually swap every person's face to get the perfect photo:
         | 
         | https://www.ign.com/articles/google-pixel-8-leaked-video-ai-...
         | 
         | I definitely agree, Pixel has been at the forefront of
         | computational photography and editing since its inception. Things
         | like night photography that we take for granted now, I remember
         | when Pixel 2 first introduced it and it was honestly mind
         | blowing.
        
           | tinyhouse wrote:
           | What's so magical about that I/O? I get the point of
           | improving the quality of a picture. But editing the picture
           | so that it includes things that didn't really happen... why
           | even care besides trying to impress others?
        
       | simoneau wrote:
       | Me: Facebook AI, please post an entry about my vacation on Cape
       | Cod and create a bunch of photos to go with it.
       | 
       | Facebook: Great. I'd be happy to. Any more detail you'd like to
       | add?
       | 
       | Me: Make us look attractive. Show that we're having a great
       | time. Also, we went to see the Chatham Lighthouse.
       | 
       | Facebook: OK, done!
       | 
       | ...
       | 
       | Facebook: You've received 48 likes. Your mother would like to
       | know if you had any salt water taffy.
       | 
       | Me: Yes, and please create a picture of my oldest daughter having
       | trouble chewing it.
       | 
       | Facebook: Done.
        
         | derefr wrote:
         | When you think about it, the only thing that's weird about this
         | hypothetical conversation is the context of it being about
         | (purported) photographs.
         | 
         | We expect images that _look like photographs_ -- at least when
         | taken by amateurs -- to be the result of a documentary process,
         | rather than an artistic one. They might be slightly filtered or
         | airbrushed, but they won't be put together from whole cloth.
         | 
         | But amateur photography is actually the outlier, in the history
         | of "capturing memories"!
         | 
         | If you imagine yourself before the invention of photography,
         | describing your vacation to an illustrator you're commissioning
         | to create some woodblock-print artwork for a set of christmas
         | cards you're having made up, the conversation you've laid out
         | here is exactly how things would go. They'd ask you to recount
         | what you saw, do a sketch, and then you'd give feedback and
         | iterate together with them, to get a final visual down that
         | reflects things _the way you remember them_, rather than _the
         | way they were_, per se.
        
           | jprete wrote:
           | This is an interesting point. Usually people claim technology
           | goes inexorably forward, yet here we are, merrily destroying
           | trust in the most objective method we have to record the
           | past!
        
             | pbhjpbhj wrote:
             | Photographs haven't been able to be trusted since almost
             | the beginning. Trusted as an image of a real scene that is.
             | 
             | Indeed, people viewing photographs have always been able to
             | be manipulated by presentation as fact something that is
             | not true -- you dress up smart, in borrowed clothes, when
             | you're really poor; you stand with a person you don't know
             | to indicate association; you get photographed with a dead
             | person as if they're alive; you use a back drop or set; _et
             | cetera_.
        
               | jprete wrote:
               | These aren't even remotely comparable to AI photo
               | manipulation.
        
           | jayunit wrote:
           | https://web.archive.org/web/20140222103103/http://subterrane.
           | ..
        
         | seydor wrote:
         | You guys are very unambitious.
         | 
         | FB AI, make a series of posts about me climbing mount everest,
         | meeting dalai lama, curing cancer, bringing peace to ukraine,
         | changing my name to Melon Tusk, announcing running for
         | president and adopting a dog named Molly
        
           | toyg wrote:
           | But see, that's the sort of thing that would give it away.
           | 
           | You got to shoot for something just attainable enough to
           | sound credible, while still being at the "enviable" end of
           | the spectrum.
           | 
           | "FB AI, make a series of pictures of my first 3 months at
           | Goldman Sachs in 2021. Include me shaking hands with the VP
           | of software as I receive a productivity award for making them
           | $1m in a week. Include a group photo of me and 12 other
           | people (all C execs and my VP must be there). Crosspost all
           | to LinkedIn, with notifications muted."
           | 
           | _"Ok done"_
           | 
           | "ChatGPT, take my existing CV and replace entries from 2021
           | onwards with a job as Head of Performance Monitoring at
           | Goldman Sachs, reporting to VP of software. Include several
           | projects with direct CEO and CFO involvement. Crosspost
           | changes to LinkedIn."
           | 
           | _"Ok done"_
           | 
           | ... and now I can go job-hunting.
        
             | seydor wrote:
             | I can see AI Consulting to be the next incarnation of
             | social media expert
        
         | y-curious wrote:
         | Incredible. Man, am I going to be telling my grandkids about a
         | time when you could believe your eyes and ears on the internet.
        
           | DaiPlusPlus wrote:
           | What if we're already living in the future, and everything
           | we're experiencing right-now is being AI generated?
           | 
           | ...that, and other thoughts I have while baked.
        
         | ShakataGaNai wrote:
         | Sounds like the plot line to an episode of Black Mirror, but
         | also something that is far too likely to happen.
        
           | simoneau wrote:
           | me: Facebook AI, please post a tender moment between me and
           | my father when I was a boy. Include some photos.
           | 
           | Facebook: I'd be happy to. Are there any more details you'd
           | like to include?
           | 
           | me: Please show how he didn't understand me at first, but
           | then he looks at me and starts crying with love and regret.
           | 
           | Facebook: Done. Your relationship with your father must have
           | been deeply fulfilling.
        
           | ormax3 wrote:
           | https://petapixel.com/2022/12/14/man-fakes-an-entire-
           | month-o...
        
             | anticristi wrote:
             | This is the weirdest video I ever watched. It's like Black
             | Mirror ... but in real life ... and a somewhat happy
             | ending.
        
       | brap wrote:
       | For the last 2-3 years, on an almost weekly basis, I am blown
       | away by the progress made in AI. Huge steps forward. It actually
       | happened twice in the last 24 hours alone.
       | 
       | Where will we be 10 years from now? 50?
        
         | Heidaradar wrote:
         | What was the second time in the last 24 hours?
        
           | brap wrote:
           | https://youtu.be/MVYrJJNdrEg
        
             | pbhjpbhj wrote:
             | Lex Fridman's 3D realistic avatar interviewing Mark
             | Zuckerberg in a generated space (two floating heads).
             | 
             | Interesting to me how it illustrates philosophical
             | questions on the nature of reality, the projection of
             | personality, the 'problem of other minds', and such.
        
       | js4ever wrote:
       | ohhh Another research paper from google that will lead nowhere
        
       | jawns wrote:
       | There's definitely value in providing this functionality for
       | photographs taken in the present.
       | 
       | But I think the real value -- and this is definitely in Google's
       | favor -- is providing this functionality for photos you have
       | taken in the past.
       | 
       | I have probably 30K+ photos in Google Photos that capture moments
       | from the past 15 years. There are quite a lot of them where I've
       | taken multiple shots of the same scene in quick succession, and
       | it would be fairly straightforward for Google to detect such
       | groupings and apply the technique to produce synthesized pictures
       | that are better than the originals. It already does something
       | similar for photo collages and "best in a series of rapid shots."
       | They surface without my having to do anything.
        
         | thesuavefactor wrote:
         | Every picture is a picture from the past though
        
           | ekianjo wrote:
           | Not the pictures where you age people artificially
        
           | jawns wrote:
           | Philosophically, yes. But some photo-editing techniques rely
           | on data that is not backfillable and must be recorded at
           | capture time. And even in cases where there is no functional
           | impediment to applying it against historical photos,
           | sometimes there is product gatekeeping to contend with.
        
           | royaltheartist wrote:
           | Oh yeah, what about this old Kodak I found in my grandpa's
           | attic that prints pictures showing how people are going to
           | die?
        
             | chii wrote:
             | but how did you know it wasn't a coincidence that the
             | picture depicted a similar scene in the past?
        
           | parineum wrote:
           | Here's a picture of me in the future.
        
             | miohtama wrote:
             | John Titor, is that you?
        
               | ortusdux wrote:
               | No, it's Mitch Hedberg.
        
               | thejazzman wrote:
               | I had an ant farm. They didn't grow shit!
        
             | drewbeck wrote:
             | Where you get that camera at??
        
           | amelius wrote:
           | Every state machine is bound to cycle at some point, even if
           | it has the size of the universe.
        
             | flanked-evergl wrote:
             | This is not true, it's very trivial to design a state
             | machine that won't cycle.
        
               | amelius wrote:
               | Sorry, forgot to add that it should be reversible, like
               | the laws of physics.
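               | 
               | A minimal sketch of the difference, assuming a finite,
               | deterministic machine described by a step function
               | (tail_and_cycle is a made-up helper name). A reversible
               | step is a bijection, so the start state itself is revisited
               | and the tail length is 0. A non-reversible step can leave
               | the start state forever and get stuck elsewhere:

```python
from typing import Callable, Hashable, TypeVar

S = TypeVar("S", bound=Hashable)


def tail_and_cycle(step: Callable[[S], S], start: S) -> tuple[int, int]:
    """Iterate `step` from `start` until a state repeats.

    Returns (tail, cycle): `tail` is the number of steps taken before
    entering the loop, `cycle` is the length of the loop. Only terminates
    if the set of reachable states is finite.
    """
    first_seen: dict[S, int] = {}
    state, t = start, 0
    while state not in first_seen:
        first_seen[state] = t
        state = step(state)
        t += 1
    tail = first_seen[state]
    return tail, t - tail


if __name__ == "__main__":
    # Reversible: adding 3 modulo 10 is a bijection on 0..9.
    print(tail_and_cycle(lambda s: (s + 3) % 10, 0))
    # Not reversible: a counter that saturates at 5 never returns to 0.
    print(tail_and_cycle(lambda s: min(s + 1, 5), 0))
```

               | A tail of 0 means the exact starting state comes back.
               | The saturating counter has a tail of 5, so it does end up
               | in a loop, just not one that contains where it started.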
        
           | makapuf wrote:
           | Every _existing_ picture is.
        
             | positus wrote:
             | If it hasn't been taken/made/captured yet, it isn't a
             | picture. It's just the potential for one.
        
         | BoppreH wrote:
         | That's exactly why I've been keeping all "duplicates" in my
         | photo collections.
         | 
         | They do take up a lot of space, and just today I asked in
         | photo.stackexchange for backup compression techniques that can
         | exploit inter-image similarities:
         | https://photo.stackexchange.com/questions/132609/backup-comp...
        
           | syntaxfree wrote:
           | Suggestion: stack the images vertically or horizontally.
           | Frequency spectrum compression schemes like JPG will see the
           | similarity in the fine details.
        
             | bayesianbot wrote:
             | I got really good compression using this technique with
             | JPEG XL, I'm sure there's even a good reason why it works
             | so well but it's been a long time and I don't seem to
             | remember why.
        
             | bondarchuk wrote:
             | > _in the fine details_
             | 
             | Could it be possible that jpg also exploits the repetition
             | at the wavelength of the width of a single picture, so to
             | say? E.g. 4 pictures side-by-side with the same black dot
             | in the center, can all 4 dots be encoded with a single sine
             | wave (simplifying a lot here..) that has peaks at each dot?
        
           | bick_nyers wrote:
           | Tiled/stacked approach as others mention is good, and
           | probably the best approach. Could also try doing an
           | uncompressed format (even just .png uncompressed) or
           | something simple like RLE then 7zip them together since 7zip
           | is the only archive format that does inter-file (as opposed
           | to intra-file) compression as far as I am aware.
           | 
           | Unfortunately lossless video compression won't help here as
           | it will compress frames individually for lossless.
        
             | adrianN wrote:
             | Inter file compression has been solved ever since tar|gz
        
               | beagle3 wrote:
               | Not so. Gzip's window is very small - 32K in the original
               | gzip iirc, which meant even identical copies of a 33KB
               | file would not help each other.
               | 
               | Iirc it was Bzip2 that bumped that up to 1MB, and there
               | are now compressors with larger windows - but files have
               | also grown, it's not a solved problem for compression
               | utilities.
               | 
               | It is solved for backup - bup, restic, and a few others
               | will do that across a backup set with no "window size"
               | limit.
               | 
               | .... And all of that is only true for lossless, which
               | does not include images or video.
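               | 
               | A quick way to see the window effect, as a sketch using
               | only Python's standard library. It assumes zlib stands in
               | for gzip (same DEFLATE format, 32 KiB window) and lzma
               | with default settings stands in for a large-window
               | compressor. The helper name is made up:

```python
import lzma
import os
import zlib


def compressed_sizes(block_size: int) -> tuple[int, int]:
    """Compress two back-to-back copies of one random block.

    Returns (deflate_size, xz_size) in bytes. Random bytes are
    incompressible on their own, so any saving has to come from the
    compressor noticing that the second copy repeats the first.
    """
    block = os.urandom(block_size)
    doubled = block + block
    deflate_size = len(zlib.compress(doubled, 9))
    xz_size = len(lzma.compress(doubled))
    return deflate_size, xz_size


if __name__ == "__main__":
    size = 100 * 1024  # each copy is well past the 32 KiB DEFLATE window
    deflate_size, xz_size = compressed_sizes(size)
    print(f"input:   {2 * size} bytes")
    print(f"deflate: {deflate_size} bytes")
    print(f"xz:      {xz_size} bytes")
```

               | With 100 KiB blocks, DEFLATE output should stay close to
               | the full 200 KiB input, while xz should land near the size
               | of a single copy.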
        
               | danielheath wrote:
               | Not even remotely an efficient scheme for images or
               | video.
        
               | tehsauce wrote:
               | That's for lossless compression, i think there's special
               | opportunities for multi image lossy
        
           | RockRobotRock wrote:
           | Stupid question. Would a block based deduplicating file
           | system solve this?
        
           | randyrand wrote:
           | most duplicates are from the same vantage point. these are
           | not. i.e. you don't need to keep them all.
        
             | beagle3 wrote:
             | Those have been used for denouncing and super resolution
             | for 30 years now - they are not useless. And storage is
             | cheap, just keep them all.
        
               | beagle3 wrote:
               | That was supposed to be denoising, not denouncing, DYAC.
               | Just noticed, too late to Edit Now.
        
         | fenomas wrote:
         | > ..fairly straightforward for Google to detect such groupings
         | and apply the technique to produce synthesized pictures that
         | are better than the originals.
         | 
         | Wouldn't an operation like this require some kind of fine-
         | tuning? Or do diffusion models have a way of using images as
         | context, the way one would provide context to an LLM?
        
           | sangnoir wrote:
           | I think simpler algorithms (e.g. image histograms) can get
           | you a long way. Regardless of the mechanism, Google Photos
           | already has the capability to detect similar images, which is
           | used to generate animated gifs.
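           | 
           | A toy version of the histogram idea, assuming grayscale
           | images given as flat lists of 0-255 values. The function
           | names and the 16-bin choice are illustrative, and this is
           | not a claim about what Google Photos actually runs:

```python
from typing import Sequence


def histogram(pixels: Sequence[int], bins: int = 16) -> list[float]:
    """Normalised brightness histogram of 0-255 grayscale pixel values."""
    if not pixels:
        raise ValueError("image has no pixels")
    counts = [0] * bins
    for value in pixels:
        counts[min(value * bins // 256, bins - 1)] += 1
    return [count / len(pixels) for count in counts]


def similarity(a: Sequence[int], b: Sequence[int], bins: int = 16) -> float:
    """Histogram intersection: 1.0 for identical distributions, 0.0 for disjoint."""
    hist_a, hist_b = histogram(a, bins), histogram(b, bins)
    return sum(min(x, y) for x, y in zip(hist_a, hist_b))


if __name__ == "__main__":
    shot_1 = [10, 12, 200, 210, 90, 95]
    shot_2 = [11, 13, 205, 215, 92, 94]   # same scene, slightly different exposure
    other = [250, 251, 252, 253, 254, 255]
    print(similarity(shot_1, shot_2))
    print(similarity(shot_1, other))
```

           | Histograms ignore where things are in the frame, so this
           | only works as a cheap first pass for grouping candidates.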
        
       | Workaccount2 wrote:
       | Google might as well just be making up tech considering none of
       | this stuff ever gets released.
        
         | Grazester wrote:
         | Ehh this stuff gets put to use on their pixel phones
        
         | js4ever wrote:
         | Agreed, I also suspect this. Since they don't release anything
         | most of their "fantastic" papers are probably just BS made to
         | let people think they are still relevant
        
       | datameta wrote:
       | I think using allusions to realism with AI is a dangerous road to
       | start out on.
        
       | henriquez wrote:
       | Hasn't something like this been around for a year or so to
       | "decensor" hentai pics?
        
       | drcode wrote:
       | When will they re-release all the old Star Trek TV shows in 1080p
       | resolution and 16:9 aspect ratio?
        
         | ShakataGaNai wrote:
         | There are already applications like
         | https://www.topazlabs.com/topaz-video-ai and
         | https://tensorpix.ai/ -- So it doesn't seem unreasonable that
         | some of these deep learning models could upscale all these old
         | TV episodes to at least 4k.
         | 
         | I'd love to see a combo of this Google tech and AI upscaling do
         | the same for Babylon 5. They had shot the actors in widescreen
         | format, but the CGI spaceships were only rendered in 4:3 and
         | the files have been lost.
        
         | dragonwriter wrote:
         | This requires other pictures of the environment to use to infer
         | what should fill in the gaps, which will not exist for every
         | shot in those series. (TOS and TNG were already rereleased in
         | 1080p, though.) I suppose you could use outpainting to
         | _construct_ the rest of the scene in one frame, and use that as
         | the reference for other frames in the same shot.
        
           | pbhjpbhj wrote:
           | A lot of the shots are on the same set, so you'd want the
           | system to use a whole series (season) as samples.
        
       | andrewprock wrote:
       | I suspect this will do a pretty good job at defeating watermarks.
        
       | gromneer wrote:
       | So "computer, enhance!" is now real?
        
       ___________________________________________________________________
       (page generated 2023-09-30 23:01 UTC)