[HN Gopher] Image Editing with Gaussian Splatting
       ___________________________________________________________________
        
       Image Editing with Gaussian Splatting
        
       Author : Hard_Space
       Score  : 208 points
       Date   : 2024-10-03 12:05 UTC (10 hours ago)
        
 (HTM) web link (www.unite.ai)
 (TXT) w3m dump (www.unite.ai)
        
       | carlosjobim wrote:
       | This is honestly genius. If I understand it correctly, instead of
        | manipulating pixels, you turn any 2D image into a 3D model and then
       | manipulate that model.
        
         | papamena wrote:
         | Yes! This really feels next-gen. After all, you're not actually
         | interested in editing the 2D image, that's just an array of
         | pixels, you want to edit what it represents. And this approach
         | allows exactly that. Will be very interesting to see where this
         | leads!
        
           | riggsdk wrote:
            | Or analogous to how you convert audio waveform data into
            | frequencies with the fast Fourier transform, modify it in
            | the frequency domain, and convert it back into a waveform
            | again.
            | 
            | Their examples do, however, look a bit like distorted
            | pixel data. The hands of the children seem to warp with the
            | cloth, something they could have easily prevented.
           | 
           | The cloth also looks very static despite it being animated,
           | mainly because the shading of it never changes. If they had
           | more information about the scene from multiple cameras (or
           | perhaps inferred from the color data), the Gaussian splat
           | would be more accurate and could even incorporate the altered
           | angle/surface-normal after modification to cleverly simulate
           | the changed specular highlights as it animates.
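        The waveform analogy above can be sketched in a few lines of
        NumPy -- a minimal round trip for illustration only (the signal
        and frequencies here are made up, and this is not the article's
        method): transform to the frequency domain, attenuate one
        component, transform back.

```python
import numpy as np

# Build a 1-second test signal: a 50 Hz tone plus a quieter 120 Hz tone.
sr = 1000                                  # sample rate in Hz
t = np.arange(sr) / sr
signal = np.sin(2 * np.pi * 50 * t) + 0.5 * np.sin(2 * np.pi * 120 * t)

# Waveform -> frequency domain.
spectrum = np.fft.rfft(signal)
freqs = np.fft.rfftfreq(len(signal), d=1 / sr)

# "Edit" in the frequency domain: attenuate the 120 Hz component.
spectrum[np.abs(freqs - 120) < 5] *= 0.1

# Frequency domain -> waveform again.
edited = np.fft.irfft(spectrum, n=len(signal))
```

        The edited waveform keeps the 50 Hz tone intact while the
        120 Hz tone is ten times quieter -- the same edit would be
        awkward to express directly on the raw samples.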
        
         | anamexis wrote:
         | The type of 3D model, Gaussian splatting, is also pretty neat
         | and has been getting a lot of attention lately.
         | 
         | There's been some good previous discussion on it here, like
         | this one:
         | 
         | Gaussian splatting is pretty cool
         | https://news.ycombinator.com/item?id=37415478
        
           | kranke155 wrote:
           | Gaussian splatting is clearly going to change a lot of things
           | in 3D assets, surprise to see it doing the same for 2D here.
        
       | aDyslecticCrow wrote:
       | Now THIS is the kind of shit I signed up for when AI started to
       | become able to understand images properly: no shitty prompt-based
       | generators that puke the most generalised version of every motif
        | while draining the life from the whole illustration industry.
       | 
       | It's just good-ass tooling for making cool-ass art. Hell yes!
       | Finally, there is some useful AI tooling that empowers artistic
       | creativity rather than drains it.
       | 
       | Pardon the French; I just think this is too awesome for normal
       | words.
        
         | squigz wrote:
         | Generative art hasn't been what you're describing for a long
         | time.
        
           | tobr wrote:
           | Hasn't been, or has been more than?
        
             | squigz wrote:
             | Has been more than.
        
           | aDyslecticCrow wrote:
           | You should say that to anyone within a creative field and be
           | ready to carry a riot shield.
           | 
           | Current generative models devalue artists' work without
           | giving much, if anything, in return to help them make better
           | work themselves. They're also trained on artists' work
           | without compensation and abuse copyright laws that are too
           | outdated for the technology.
           | 
            | Learning to draw or paint these days and expecting to get
            | paid for it is getting harder, as there will always be the
            | counter: "but I can just generate it with AI in a tenth of
            | the time".
           | 
           | And don't start throwing random papers at me about AI art
           | technology. I follow the field closely, and there is probably
           | nothing major you can find that I haven't already seen.
        
             | squigz wrote:
             | I know people within creative fields who use it.
        
         | visarga wrote:
          | Your fault for poor prompting. If you don't provide
          | distinctive prompts, you can expect generalised answers.
        
           | aDyslecticCrow wrote:
           | Let's say you want to rotate a cat's head in an existing
           | picture by 5 degrees, as in the most basic example suggested
           | here. No prompt will reliably do that.
           | 
           | A mesh-transform tool and some brush touchups could. Or this
           | tool could. Diffusion models are too uncontrollable, even in
           | the most basic examples, to be meaningfully useful for
           | artists.
        
             | 8n4vidtmkvmk wrote:
             | No, but you could rotate the head with traditional tools
             | and then inpaint the background and touch up the neckline.
             | It's not useless, just different.
        
         | jsheard wrote:
         | Yep, there's a similar refrain amongst 3D artists who are
         | begging for AI tools which can effectively speed up the tedious
         | parts of their current process like retopo and UV unwrapping,
         | but all AI researchers keep giving them are tools which take a
         | text prompt or image and try to automate their _entire_ process
         | from start to finish, with very little control and invariably
         | low quality results.
        
           | aDyslecticCrow wrote:
           | There have been some really nice AI tools to generate bump
           | and diffusion maps from photos. So you could photograph a
           | wall and get a detailed meshing texture with good light
           | scatter and depth.
           | 
           | That's the kind of awesome tech that got me into AI in the
           | first place. But then prompt generators took over everything.
        
             | jsheard wrote:
             | Denoising is another good practical application of AI in
             | 3D, you can save a lot of time without giving up any
             | control by rendering an _almost_ noise-free image and then
             | letting a neural network clean it up. Intel did some good
             | work there with their open source OIDN library, but then
             | genAI took over and now all the research focus is on trying
             | to completely replace precise 3D rendering workflows with
             | diffusion slot machines, rather than continuing to develop
             | smarter AI denoisers.
        
           | treyd wrote:
           | Because the investors funding development of those AI tools
           | don't want to try to empower artists and give them more
           | freedom, they want to try to replace them.
        
             | ChadNauseam wrote:
             | The investors want to make money, and if they make a tool
             | that is usable by more people than just experienced 3D
             | artists who are tired of retopologizing their models, that
             | both empowers many more people and potentially makes them
             | more money.
             | 
              | Aside from that, it's impossible for tools to replace artists.
             | Did cameras replace painting? I'm sure they reduced the
             | demand for paintings, but if you want to create art and
             | paint is your chosen medium it has never been easier. If
             | you want to create art and 3D models are your chosen
             | medium, the existence of AI tools for 3D model generation
             | from a prompt doesn't stop you. However, if you want to
             | create a game and you need a 3D model of a rock or
             | something, you're not trying to make "art" with that rock,
             | you're trying to make a game and a 3D model is just
             | something you need to do that.
        
         | doe_eyes wrote:
         | There's a ton of room for using today's ML techniques to
         | greatly simplify photo editing. The problem is, these are not
         | billion dollar ideas. You're not gonna raise a lot of money at
         | crazy valuations by proposing to build a tool for relighting
          | scenes or removing unwanted objects from a photo. Especially
         | since there is a good chance that Google, Apple, or Adobe are
         | going to just borrow your idea if it pans out.
         | 
         | On the other hand, you can raise a lot of money if you promise
         | to render an entire industry or an entire class of human labor
         | obsolete.
         | 
         | The end result is that far fewer people are working on ML-based
         | dust or noise removal than on tools that are generating made-up
         | images or videos from scratch.
        
         | CaptainFever wrote:
         | I share your excitement for this tool that assists artists.
         | However, I don't share the same disdain for prompt generators.
         | 
         | I find it enlightening to view it in the context of coding.
         | 
         | GitHub Copilot assists programmers, while ChatGPT replaces the
         | entire process. There are pros and cons though:
         | 
         | GitHub Copilot is hard to use for non-programmers, but can be
         | used to assist in the creation of complex programs.
         | 
         | ChatGPT is easy to use for non-programmers, but is usually
         | restricted to making simple scripts.
         | 
         | However, this doesn't mean that ChatGPT is useless for
         | professional programmers either, if you just need to make
         | something simple.
         | 
         | I think a similar dynamic happens in art. Both types of tools
         | are awesome, they're just for different demographics and have
         | different limitations.
         | 
         | For example, using the coding analogy: MidJourney is like
         | ChatGPT. Easy to use, but hard to control. Good for random
          | people. InvokeAI, Generative Fill and this new tool are like
         | Copilot. Hard to use for non-artists, but easier to control and
         | customise. Good for artists.
         | 
         | However, I _do_ find it frustrating how most of the funding in
         | AI art tools goes towards the easy-to-use side, instead of the
          | easy-to-control side (this doesn't seem to be shared by
         | coding, where Copilot is more well-developed than ChatGPT
         | coding). More funding and development to the easy-to-control
         | type would be very welcome indeed!
         | 
         | (Note, ControlNet is probably a good example as easy-to-
         | control. There's a very high skill ceiling in using Stable
         | Diffusion right now.)
        
           | aDyslecticCrow wrote:
           | Good analogy. Yes, controllability is severely lacking, which
           | is what makes diffusion models a very bad tool for artists.
           | The current tools, even Photoshop's best attempt to implement
           | them as a tool (smart infill), are situational at best.
           | Artists need controllable specialized tools that simplify
           | annoying operations, not prompt generators.
           | 
            | As a programmer, I find Copilot a pretty decent tool, thanks
            | to its good controllability. ChatGPT is less so, but it is
            | decent for finding the right keywords or libraries I can
            | look up later.
        
         | TheRealPomax wrote:
         | Except this is explicitly not AI, nor is it even tangentially
         | _related_ to AI. This is a normal graphics algorithm, the kind
         | you get from really smart people working on render-pipeline
         | maths.
        
           | aDyslecticCrow wrote:
           | > nor is it even tangentially related to AI
           | 
            | It's not a deep neural network, but it is a machine
            | learning model. In very simple terms, it minimizes a loss
            | while refining an estimated set of Gaussians -- about as
            | much machine learning as old-school KNN or SVM.
           | 
           | AI means nothing as a word; it is basically as descriptive as
           | "smart" or "complicated". But yes, it's a very clever
           | algorithm invented by clever people that is finding some nice
           | applications.
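        The "minimizes a loss" point can be made concrete with a toy
        sketch (illustrative only -- a real 3DGS pipeline optimizes
        millions of 3-D Gaussians against rendered images, and every
        number below is made up): fit the centre and width of a single
        1-D Gaussian to a target curve by plain gradient descent.

```python
import numpy as np

# Target "image": a 1-D Gaussian bump. We only get to measure
# reconstruction error against it, then refine our parameters.
x = np.linspace(-5, 5, 200)
target = np.exp(-0.5 * ((x - 1.5) / 0.8) ** 2)

def render(mu, sigma):
    """Render one 1-D Gaussian 'splat' with centre mu and width sigma."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2)

def loss(params):
    mu, sigma = params
    return np.mean((render(mu, sigma) - target) ** 2)

# Start from a deliberately wrong guess and refine by gradient descent,
# using finite-difference gradients to keep the sketch dependency-free.
params = np.array([0.0, 2.0])   # initial (mu, sigma)
lr, eps = 2.0, 1e-5
for _ in range(1000):
    grad = np.zeros_like(params)
    for i in range(len(params)):
        step = np.zeros_like(params)
        step[i] = eps
        grad[i] = (loss(params + step) - loss(params - step)) / (2 * eps)
    params -= lr * grad

mu_fit, sigma_fit = params      # should approach (1.5, 0.8)
```

        Swap the finite-difference loop for autodiff, and the 1-D bump
        for projected 3-D Gaussians, and you have the skeleton of the
        actual optimizer -- no neural network required.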
        
       | doctorpangloss wrote:
       | When a foreground object is moved, how are the newly visible
       | contents of the background filled?
        
         | aDyslecticCrow wrote:
         | It probably isn't.
         | 
         | The most logical use of this is to replace mesh-transform tools
         | in Photoshop or Adobe Illustrator. In this case, you probably
         | work with a transparent map anyway.
        
           | doctorpangloss wrote:
            | Why do Gaussian splats benefit you for mesh transform
           | applications? Name one, and think deeply about what is going
           | on. The applications are generally non-physical
           | transformations, so having a physical representation is
           | worse, not better; and then, the weaknesses are almost always
           | interacting with foreground versus background separation.
           | 
            | Another POV is: well, generative AI solves the issue I am
            | describing, which should make you question why these guys
            | are so emphatic about their thing not interacting with the
           | generative AI thing. If they are not interested in the best
           | technical solutions, what do they bring to the table besides
           | vibes, and how would they compete against even vibesier
           | vibes?
        
             | aDyslecticCrow wrote:
             | Mesh transform is extensively used to create animations and
             | warping perspectives. The most useful kind of warping is
             | emulating perspective and rotation. Gaussian splats allow
             | more intelligent warping in perspective without manually
             | moving every vertex by eye.
             | 
             | Foreground-background separation is entirely uninteresting.
             | Masking manually is relatively easy, and there are good
             | semi-intelligent tools that make it painless. Sure, it's a
             | major topic discussed within AI papers for some reason, but
             | from an artist's perspective, it doesn't matter much.
              | Masking out from the background is generally step one in any
             | image manipulation process, so why is that a weakness?
        
         | vessenes wrote:
         | The demos show either totally internal modifications (bouncing
         | blanket changing shape / statue cheeks changing) or isolated
         | with white background images that have been clipped out. Based
         | on the description of how they generate the splats, I think
         | you'd auto select the item out of the background, do this with
         | it, then paste it back.
         | 
         | The splatting process uses a pretty interesting idea, which is
         | to imagine two cameras, one the current "view" of the image,
         | the other one 180 degrees opposite looking back, but at a
         | "flat" mirror image of the front. This is going to constrain
          | the splats away from having weird rando shapes. You will
          | emphatically _not_ get the ability to rotate something along
          | a vertical axis here (e.g. "let me just see a little more of
          | that statue's other side"). You will instead get a nice
          | method to deform / rearrange.
        
       | zokier wrote:
       | Isn't it quite a leap to go from single image to usable 3DGS
       | model? The editing part seems relatively minor step afterwards. I
       | thought that 3DGS typically required multiple viewpoints, like
       | photogrammetry.
        
         | chamanbuga wrote:
          | This is what I initially thought. However, I have already
          | seen working demos of 3DGS using a single viewpoint, armed
          | with additional auxiliary data that is contextually relevant
          | to the subject.
        
         | vessenes wrote:
         | It's not "real" 3D -- the model doesn't infer anything about
         | unseen portions of the image. They get 3D-embedded splats out
         | of their pipeline, and then can do cool things with them. But
         | those splats represent a 2D image, without inferring (literally
         | or figuratively) anything about hidden parts of the image.
        
         | dheera wrote:
         | Yeah exactly, this page doesn't explain what's going on at all.
         | 
         | It says it uses a mirror image to do a Gaussian splat. How does
         | that infer any kind of 3D geometry? An image and its mirror are
         | explainable by a simple plane and that's probably what the
         | splat will converge to if given only those 2 images.
        
           | pwillia7 wrote:
            | Maybe NeRF? https://www.matthewtancik.com/nerf
        
       | chamanbuga wrote:
       | I learned about 3D Gaussian Splatting from the research team at
       | work just 2 weeks ago, and they demoed some incredible use cases.
       | This tech will definitely become mainstream in camera
       | technologies.
        
         | spookie wrote:
         | Having some sort of fast camera view position and orientation
         | computation with colmap + initial point prediction + gaussian
         | splatting for 5 minutes + cloudcompare normal estimation and 3d
          | recon yields some incredible results.
         | 
          | Much better than NeRF in my experience. There is, however, a
          | need to clean up the point cloud yourself and stuff like
          | that.
        
       | TheRealPomax wrote:
       | These underlines make reading the text pretty difficult, it might
       | be worth making the links a little less prominent to aid
       | legibility.
        
       | squidsoup wrote:
       | I've been exploring some creative applications of Gaussian splats
       | for photography/photogrammetry, which I think have an interesting
       | aesthetic. The stills of flowers on my Instagram if anyone is
       | interested: https://www.instagram.com/bayardrandel
        
         | echelon wrote:
         | These are great! What software do you use, and what does your
         | pipeline look like?
         | 
         | If you wanted to capture a full 3D scene, my experience with
         | photogrammetry and NeRFs has been that it requires a
         | tremendously large dataset that is meticulously captured. Are
         | Gaussian splat tools more data efficient? How little data can
         | you get away with using?
         | 
         | What are the best open source Gaussian Splat tools for both
         | building and presenting? Are there any that do web
         | visualization particularly well?
         | 
         | I might have to get back into this.
        
           | squidsoup wrote:
           | Thanks very much! I use Polycam on iOS for photogrammetry and
           | generating Gaussian splats from stills. It seems to work
           | remarkably well, but has a subscription fee (given there's
           | processing on their servers this seems reasonable). Typically
           | to build a splat model takes about 30-50 stills for good
           | results, depending on the subject.
           | 
           | The only open source tool I use in my workflow is
           | CloudCompare (https://www.danielgm.net/cc/), for
           | editing/cleaning point cloud data.
           | 
           | For animation I primarily use Touch Designer which is a node
           | based visual programming environment, exporting splats as
           | point clouds, and Ableton/misc instruments for sound.
           | 
           | No idea about web visualisation, but interesting idea!
        
       ___________________________________________________________________
       (page generated 2024-10-03 23:00 UTC)