[HN Gopher] Image Editing with Gaussian Splatting
___________________________________________________________________
Image Editing with Gaussian Splatting
Author : Hard_Space
Score : 208 points
Date : 2024-10-03 12:05 UTC (10 hours ago)
(HTM) web link (www.unite.ai)
(TXT) w3m dump (www.unite.ai)
| carlosjobim wrote:
| This is honestly genius. If I understand it correctly, instead of
| manipulating pixels, you turn any 2D image into a 3D model and
| then manipulate that model.
| papamena wrote:
| Yes! This really feels next-gen. After all, you're not actually
| interested in editing the 2D image itself; that's just an array
| of pixels. You want to edit what it represents, and this
| approach allows exactly that. Will be very interesting to see
| where this leads!
| riggsdk wrote:
| It's analogous to how you convert audio waveform data into
| frequencies with the fast Fourier transform, modify it in the
| frequency domain, and convert it back into a waveform again.
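|
| A minimal sketch of that round trip (my illustration, just
| numpy, not from the article):
|
|     import numpy as np
|
|     rate = 44100
|     t = np.linspace(0, 1, rate, endpoint=False)
|     # A 440 Hz tone plus an 880 Hz overtone
|     signal = np.sin(2 * np.pi * 440 * t) \
|            + 0.5 * np.sin(2 * np.pi * 880 * t)
|
|     spectrum = np.fft.rfft(signal)          # waveform -> frequencies
|     freqs = np.fft.rfftfreq(len(signal), 1 / rate)
|     spectrum[freqs > 600] = 0               # edit in frequency space
|     edited = np.fft.irfft(spectrum, len(signal))  # back to waveform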
|
| Their examples do, however, look a bit like distorted pixel
| data. The hands of the children seem to warp with the cloth,
| something they could have easily prevented.
|
| The cloth also looks very static despite being animated, mainly
| because its shading never changes. If they had more information
| about the scene from multiple cameras (or perhaps inferred from
| the color data), the Gaussian splat would be more accurate and
| could even incorporate the altered angle/surface normal after
| modification to cleverly simulate the changed specular
| highlights as it animates.
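|
| For the specular part, something like Blinn-Phong would be
| enough to fake it once you have per-splat normals (a toy
| sketch of the idea, not their pipeline):
|
|     import numpy as np
|
|     def specular(normal, light_dir, view_dir, shininess=32.0):
|         # Blinn-Phong: the highlight follows the half-vector
|         h = light_dir + view_dir
|         h = h / np.linalg.norm(h)
|         n = normal / np.linalg.norm(normal)
|         return max(0.0, float(n @ h)) ** shininess
|
|     light = np.array([0.5, 0.5, 1.0])
|     light = light / np.linalg.norm(light)
|     view = np.array([0.0, 0.0, 1.0])
|     n_before = np.array([0.0, 0.0, 1.0])    # flat cloth
|     n_after = np.array([0.2, 0.0, 0.98])    # deformed cloth
|     print(specular(n_before, light, view))  # highlight before
|     print(specular(n_after, light, view))   # highlight shifts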
| anamexis wrote:
| The type of 3D model, Gaussian splatting, is also pretty neat
| and has been getting a lot of attention lately.
|
| There's been some good previous discussion on it here, like
| this one:
|
| Gaussian splatting is pretty cool
| https://news.ycombinator.com/item?id=37415478
| kranke155 wrote:
| Gaussian splatting is clearly going to change a lot of things
| in 3D assets, surprise to see it doing the same for 2D here.
| aDyslecticCrow wrote:
| Now THIS is the kind of shit I signed up for when AI started to
| become able to understand images properly: no shitty prompt-
| based generators that puke out the most generalised version of
| every motif while draining the life from the whole illustration
| industry.
|
| It's just good-ass tooling for making cool-ass art. Hell yes!
| Finally, there is some useful AI tooling that empowers artistic
| creativity rather than drains it.
|
| Pardon the French; I just think this is too awesome for normal
| words.
| squigz wrote:
| Generative art hasn't been what you're describing for a long
| time.
| tobr wrote:
| Hasn't been, or has been more than?
| squigz wrote:
| Has been more than.
| aDyslecticCrow wrote:
| You should say that to anyone within a creative field and be
| ready to carry a riot shield.
|
| Current generative models devalue artists' work without
| giving much, if anything, in return to help them make better
| work themselves. They're also trained on artists' work
| without compensation and abuse copyright laws that are too
| outdated for the technology.
|
| Learning to draw or paint these days and expecting to get
| paid for it is getting harder, as there will always be the
| counter: "but I can just generate it with AI in a tenth of
| the time".
|
| And don't start throwing random papers at me about AI art
| technology. I follow the field closely, and there is probably
| nothing major you can find that I haven't already seen.
| squigz wrote:
| I know people within creative fields who use it.
| visarga wrote:
| Your fault for poor prompting. If you don't provide distinctive
| prompts, you can expect generalised answers.
| aDyslecticCrow wrote:
| Let's say you want to rotate a cat's head in an existing
| picture by 5 degrees, as in the most basic example suggested
| here. No prompt will reliably do that.
|
| A mesh-transform tool and some brush touchups could. Or this
| tool could. Diffusion models are too uncontrollable, even in
| the most basic examples, to be meaningfully useful for
| artists.
| 8n4vidtmkvmk wrote:
| No, but you could rotate the head with traditional tools
| and then inpaint the background and touch up the neckline.
| It's not useless, just different.
| jsheard wrote:
| Yep, there's a similar refrain amongst 3D artists who are
| begging for AI tools which can effectively speed up the tedious
| parts of their current process like retopo and UV unwrapping,
| but all AI researchers keep giving them are tools which take a
| text prompt or image and try to automate their _entire_ process
| from start to finish, with very little control and invariably
| low-quality results.
| aDyslecticCrow wrote:
| There have been some really nice AI tools to generate bump
| and diffuse maps from photos, so you can photograph a wall
| and get a detailed material texture with good light scatter
| and depth.
|
| That's the kind of awesome tech that got me into AI in the
| first place. But then prompt generators took over everything.
| jsheard wrote:
| Denoising is another good practical application of AI in
| 3D, you can save a lot of time without giving up any
| control by rendering an _almost_ noise-free image and then
| letting a neural network clean it up. Intel did some good
| work there with their open source OIDN library, but then
| genAI took over and now all the research focus is on trying
| to completely replace precise 3D rendering workflows with
| diffusion slot machines, rather than continuing to develop
| smarter AI denoisers.
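|
| The workflow itself is dead simple, which is part of the
| appeal. Roughly (a toy sketch with a naive blur standing in
| for the neural denoiser; OIDN's real API is native C/C++):
|
|     import numpy as np
|     from scipy.ndimage import gaussian_filter
|
|     def render(width, height, samples):
|         # Stand-in for a path tracer: noise shrinks as the
|         # sample count grows
|         image = np.full((height, width, 3), 0.5)
|         noise = np.random.randn(height, width, 3) / np.sqrt(samples)
|         return np.clip(image + noise, 0, 1)
|
|     noisy = render(640, 480, samples=16)  # cheap, almost clean
|     denoised = gaussian_filter(noisy, sigma=(1, 1, 0))  # clean up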
| treyd wrote:
| Because the investors funding development of those AI tools
| don't want to try to empower artists and give them more
| freedom, they want to try to replace them.
| ChadNauseam wrote:
| The investors want to make money, and if they make a tool
| that is usable by more people than just experienced 3D
| artists who are tired of retopologizing their models, that
| both empowers many more people and potentially makes them
| more money.
|
| Aside from that, it's impossible for tools to replace artists.
| Did cameras replace painting? I'm sure they reduced the
| demand for paintings, but if you want to create art and
| paint is your chosen medium it has never been easier. If
| you want to create art and 3D models are your chosen
| medium, the existence of AI tools for 3D model generation
| from a prompt doesn't stop you. However, if you want to
| create a game and you need a 3D model of a rock or
| something, you're not trying to make "art" with that rock,
| you're trying to make a game and a 3D model is just
| something you need to do that.
| doe_eyes wrote:
| There's a ton of room for using today's ML techniques to
| greatly simplify photo editing. The problem is, these are not
| billion-dollar ideas. You're not gonna raise a lot of money at
| crazy valuations by proposing to build a tool for relighting
| scenes or removing unwanted objects from a photo. Especially
| since there is a good chance that Google, Apple, or Adobe are
| going to just borrow your idea if it pans out.
|
| On the other hand, you can raise a lot of money if you promise
| to render an entire industry or an entire class of human labor
| obsolete.
|
| The end result is that far fewer people are working on ML-based
| dust or noise removal than on tools that are generating made-up
| images or videos from scratch.
| CaptainFever wrote:
| I share your excitement for this tool that assists artists.
| However, I don't share the same disdain for prompt generators.
|
| I find it enlightening to view it in the context of coding.
|
| GitHub Copilot assists programmers, while ChatGPT replaces the
| entire process. There are pros and cons though:
|
| GitHub Copilot is hard to use for non-programmers, but can be
| used to assist in the creation of complex programs.
|
| ChatGPT is easy to use for non-programmers, but is usually
| restricted to making simple scripts.
|
| However, this doesn't mean that ChatGPT is useless for
| professional programmers either, if you just need to make
| something simple.
|
| I think a similar dynamic happens in art. Both types of tools
| are awesome, they're just for different demographics and have
| different limitations.
|
| For example, using the coding analogy: MidJourney is like
| ChatGPT. Easy to use, but hard to control. Good for random
| people. InvokeAI, Generative Fill, and this new tool are like
| Copilot. Hard to use for non-artists, but easier to control and
| customise. Good for artists.
|
| However, I _do_ find it frustrating how most of the funding in
| AI art tools goes towards the easy-to-use side, instead of the
| easy-to-control side (this doesn't seem to be the case in
| coding, where Copilot is more well-developed than ChatGPT
| coding). More funding and development for the easy-to-control
| type would be very welcome indeed!
|
| (Note: ControlNet is probably a good example of easy-to-
| control. There's a very high skill ceiling in using Stable
| Diffusion right now.)
| aDyslecticCrow wrote:
| Good analogy. Yes, controllability is severely lacking, which
| is what makes diffusion models a very bad tool for artists.
| The current tools, even Photoshop's best attempt to implement
| them as a tool (smart infill), are situational at best.
| Artists need controllable specialized tools that simplify
| annoying operations, not prompt generators.
|
| As a programmer, I find Copilot a pretty decent tool, thanks
| to its good controllability. ChatGPT is less so, but it is
| decent for finding the right keywords or libraries I can look
| up later.
| TheRealPomax wrote:
| Except this is explicitly not AI, nor is it even tangentially
| _related_ to AI. This is a normal graphics algorithm, the kind
| you get from really smart people working on render-pipeline
| maths.
| aDyslecticCrow wrote:
| > nor is it even tangentially related to AI
|
| It's not a deep neural network, but it is a machine learning
| model. In very simple terms, it minimizes a loss while
| refining an estimated scene representation, which is about as
| much machine learning as old-school KNN or SVM.
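|
| If anyone wants to see why it counts as ML, here's a minimal
| sketch (my toy example in PyTorch, not the paper's code) that
| fits 2D Gaussians to an image by gradient descent on a pixel
| loss:
|
|     import torch
|
|     H, W, K = 32, 32, 16               # image size, splat count
|     target = torch.rand(H, W)          # stand-in target image
|     ys, xs = torch.meshgrid(
|         torch.arange(H, dtype=torch.float32),
|         torch.arange(W, dtype=torch.float32), indexing="ij")
|
|     mu = torch.rand(K, 2, requires_grad=True)       # centers
|     log_sigma = torch.zeros(K, requires_grad=True)  # sizes
|     amp = torch.rand(K, requires_grad=True)         # brightness
|     opt = torch.optim.Adam([mu, log_sigma, amp], lr=0.05)
|
|     for step in range(500):
|         opt.zero_grad()
|         cx = mu[:, 0, None, None] * W
|         cy = mu[:, 1, None, None] * H
|         sigma = torch.exp(log_sigma)[:, None, None] * 4 + 1e-3
|         d2 = (xs - cx) ** 2 + (ys - cy) ** 2
|         splats = amp[:, None, None] * torch.exp(-d2 / (2 * sigma**2))
|         loss = ((splats.sum(0) - target) ** 2).mean()  # the loss
|         loss.backward()
|         opt.step()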
|
| AI means nothing as a word; it is basically as descriptive as
| "smart" or "complicated". But yes, it's a very clever
| algorithm invented by clever people that is finding some nice
| applications.
| doctorpangloss wrote:
| When a foreground object is moved, how are the newly visible
| contents of the background filled?
| aDyslecticCrow wrote:
| It probably isn't.
|
| The most logical use of this is to replace mesh-transform tools
| in Photoshop or Adobe Illustrator. In that case, you probably
| work on a transparent layer anyway.
| doctorpangloss wrote:
| Why do Gaussian splats benefit you in mesh-transform
| applications? Name one, and think deeply about what is going
| on. The applications are generally non-physical
| transformations, so having a physical representation is
| worse, not better; and the weaknesses almost always involve
| foreground versus background separation.
|
| Another POV is: well, generative AI solves the issue I am
| describing, which should make you question why these guys are
| so emphatic about their thing not interacting with the
| generative AI thing. If they are not interested in the best
| technical solutions, what do they bring to the table besides
| vibes, and how would they compete against even vibesier
| vibes?
| aDyslecticCrow wrote:
| Mesh transform is extensively used to create animations and
| warped perspectives. The most useful kind of warping is
| emulating perspective and rotation, and Gaussian splats allow
| more intelligent perspective warping without manually moving
| every vertex by eye.
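|
| For reference, this is the kind of warp artists eyeball today
| (a minimal OpenCV sketch; "cat.png" and the corner offsets are
| made up):
|
|     import cv2
|     import numpy as np
|
|     img = cv2.imread("cat.png")
|     h, w = img.shape[:2]
|     # Drag the four corners to fake a small perspective change
|     src = np.float32([[0, 0], [w, 0], [w, h], [0, h]])
|     dst = np.float32([[w * 0.05, h * 0.02], [w * 0.95, 0],
|                       [w, h], [0, h * 0.98]])
|     M = cv2.getPerspectiveTransform(src, dst)
|     warped = cv2.warpPerspective(img, M, (w, h))
|     cv2.imwrite("cat_warped.png", warped)
|
| A splat-based tool could derive that geometry from the image
| itself instead of making you place the points by eye.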
|
| Foreground-background separation is entirely uninteresting.
| Masking manually is relatively easy, and there are good
| semi-intelligent tools that make it painless. Sure, it's a
| major topic discussed within AI papers for some reason, but
| from an artist's perspective, it doesn't matter much.
| Masking out from the background is generally step one in any
| image manipulation process, so why is that a weakness?
| vessenes wrote:
| The demos show either totally internal modifications (bouncing
| blanket changing shape / statue cheeks changing) or isolated
| images on white backgrounds that have been clipped out. Based
| on the description of how they generate the splats, I think
| you'd auto-select the item out of the background, do this with
| it, then paste it back.
|
| The splatting process uses a pretty interesting idea, which is
| to imagine two cameras: one the current "view" of the image,
| the other 180 degrees opposite looking back, but at a "flat"
| mirror image of the front. This is going to constrain the
| splats away from having weird rando shapes. You will
| emphatically _not_ get the ability to rotate something along a
| vertical axis here (e.g. "let me just see a little more of
| that statue's other side"). You will instead get a nice method
| to deform / rearrange.
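|
| As I read it, the setup is something like this toy sketch (my
| own reconstruction, not their code):
|
|     import numpy as np
|
|     def look_at(eye, center, up=np.array([0.0, 1.0, 0.0])):
|         # World-to-camera rotation looking from eye to center
|         f = center - eye
|         f = f / np.linalg.norm(f)
|         s = np.cross(f, up)
|         s = s / np.linalg.norm(s)
|         u = np.cross(s, f)
|         return np.stack([s, u, -f])
|
|     front_eye = np.array([0.0, 0.0, 3.0])
|     back_eye = -front_eye               # 180 degrees around
|     target = np.zeros(3)
|     R_front = look_at(front_eye, target)
|     R_back = look_at(back_eye, target)
|
|     front_view = np.zeros((64, 64, 3))  # the input photo
|     back_view = np.fliplr(front_view)   # "flat" mirror target
|
| The back camera never sees new information, which is exactly
| why you get deformation but no novel views out of it.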
| zokier wrote:
| Isn't it quite a leap to go from a single image to a usable
| 3DGS model? The editing part seems like a relatively minor step
| afterwards. I thought that 3DGS typically required multiple
| viewpoints, like photogrammetry.
| chamanbuga wrote:
| This is what I initially thought too. However, I have already
| witnessed working demos of 3DGS using a single viewpoint,
| armed with additional auxiliary data that is contextually
| relevant to the subject.
| vessenes wrote:
| It's not "real" 3D -- the model doesn't infer anything about
| unseen portions of the image. They get 3D-embedded splats out
| of their pipeline, and then can do cool things with them. But
| those splats represent a 2D image, without inferring (literally
| or figuratively) anything about hidden parts of the image.
| dheera wrote:
| Yeah exactly, this page doesn't explain what's going on at all.
|
| It says it uses a mirror image to do a Gaussian splat. How does
| that infer any kind of 3D geometry? An image and its mirror are
| explainable by a simple plane, and that's probably what the
| splat will converge to if given only those two images.
| pwillia7 wrote:
| Maybe NeRF? https://www.matthewtancik.com/nerf
| chamanbuga wrote:
| I learned about 3D Gaussian Splatting from the research team at
| work just 2 weeks ago, and they demoed some incredible use cases.
| This tech will definitely become mainstream in camera
| technologies.
| spookie wrote:
| Having some sort of fast camera position and orientation
| computation with COLMAP + initial point prediction + Gaussian
| splatting for 5 minutes + CloudCompare normal estimation and 3D
| reconstruction yields some incredible results.
|
| Much better than NeRF in my experience. There's however a need
| to clean up the point cloud yourself and stuff like that.
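|
| Roughly, as commands (a sketch; the paths are made up, and
| train.py is the entry point of the reference
| graphdeco-inria/gaussian-splatting repo):
|
|     import subprocess
|
|     IMAGES = "shoot/images"   # your stills
|     DB = "shoot/colmap.db"
|     SPARSE = "shoot/sparse"
|
|     # 1. Camera poses + sparse points with COLMAP
|     subprocess.run(["colmap", "feature_extractor",
|                     "--database_path", DB,
|                     "--image_path", IMAGES], check=True)
|     subprocess.run(["colmap", "exhaustive_matcher",
|                     "--database_path", DB], check=True)
|     subprocess.run(["colmap", "mapper",
|                     "--database_path", DB,
|                     "--image_path", IMAGES,
|                     "--output_path", SPARSE], check=True)
|
|     # 2. Train the splats on the posed images
|     subprocess.run(["python", "train.py", "-s", "shoot"],
|                    check=True)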
| TheRealPomax wrote:
| These underlines make reading the text pretty difficult, it might
| be worth making the links a little less prominent to aid
| legibility.
| squidsoup wrote:
| I've been exploring some creative applications of Gaussian splats
| for photography/photogrammetry, which I think have an interesting
| aesthetic. The stills of flowers on my Instagram if anyone is
| interested: https://www.instagram.com/bayardrandel
| echelon wrote:
| These are great! What software do you use, and what does your
| pipeline look like?
|
| If you wanted to capture a full 3D scene, my experience with
| photogrammetry and NeRFs has been that it requires a
| tremendously large dataset that is meticulously captured. Are
| Gaussian splat tools more data efficient? How little data can
| you get away with using?
|
| What are the best open source Gaussian Splat tools for both
| building and presenting? Are there any that do web
| visualization particularly well?
|
| I might have to get back into this.
| squidsoup wrote:
| Thanks very much! I use Polycam on iOS for photogrammetry and
| generating Gaussian splats from stills. It seems to work
| remarkably well, but has a subscription fee (given there's
| processing on their servers, this seems reasonable).
| Typically, building a splat model takes about 30-50 stills
| for good results, depending on the subject.
|
| The only open source tool I use in my workflow is
| CloudCompare (https://www.danielgm.net/cc/), for
| editing/cleaning point cloud data.
|
| For animation I primarily use Touch Designer, which is a node-
| based visual programming environment, exporting splats as
| point clouds, and Ableton/misc instruments for sound.
|
| No idea about web visualisation, but interesting idea!
___________________________________________________________________
(page generated 2024-10-03 23:00 UTC)