[HN Gopher] Splatter Image: Ultra-Fast Single-View 3D Reconstruction
       ___________________________________________________________________
        
       Splatter Image: Ultra-Fast Single-View 3D Reconstruction
        
       Author : heliophobicdude
       Score  : 140 points
       Date   : 2023-12-21 13:38 UTC (9 hours ago)
        
 (HTM) web link (szymanowiczs.github.io)
 (TXT) w3m dump (szymanowiczs.github.io)
        
       | rijx wrote:
       | Now we can finally turn Street View into a game world!
        
         | xnx wrote:
         | Waymo has done this for their simulations (a kind of game I
         | suppose): https://waymo.com/research/block-nerf/
        
           | speedgoose wrote:
            | Is there something similar to this, but with source code?
           | Papers with great results but without code are frustrating.
        
       | XorNot wrote:
       | I guess this is how you'd implement that thing in _Enemy Of The
       | State_ where they pan around a single-perspective camera view
        | (which I think doesn't come across as absurd in the movie anyway
       | since the tech guys point out it's basically a clever
       | extrapolation).
        
       | lawlessone wrote:
        | Am I imagining this, or is somebody making a newer and faster
        | one of these every day?
       | 
       | I'm expecting Overwhelming Fast Splatter by January.
        
         | kridsdale1 wrote:
         | The innovation rate in Splats is astounding.
        
         | xnx wrote:
          | I have already named my residential-dwelling-optimized
          | splatter "Splatterhouse".
        
       | tantalor wrote:
       | That "GT" method seems even better, we should just use that. /s
        
         | mft_ wrote:
         | Might I ask what that acronym stands for? :)
        
           | xnx wrote:
           | "Ground Truth" (i.e. real world, actual data)
        
             | mft_ wrote:
             | Thanks!
        
         | xnx wrote:
         | GT also always renders in real time!
        
         | cooper_ganglia wrote:
          | I didn't realize what GT stood for until I came across this
          | thread; I was confused why they weren't providing its
          | render-time results, hahaha
        
       | roflmaostc wrote:
       | Since it's based on 3D Gaussians in space, is there a way to
        | obtain sharp images? Inherently, Gaussian functions extend
       | infinitely, so images always look blurry. Don't they? Of course,
       | \sigma can be optimized to be small, but then it converges to
       | some point representation, doesn't it?
       | 
        | Maybe some CV/ML people can help me understand.
        
         | dahart wrote:
         | Yes. The main way to keep the images sharp is to render the
         | models at near the same size & resolution that they were
         | captured, or _slightly_ smaller in size. It's the same thing as
         | zooming into an image- if you zoom in it gets blurry because
         | the filtered pixels get too big, the highest frequency in the
         | data is now zoom-factor pixels wide. If you zoom out, the
         | Gaussian splat images become sharper automatically (and
         | eventually you run into aliasing issues). The way to obtain
         | sharp images if you want to zoom in is to let the NN
         | hallucinate some high frequency details based on what it learns
         | about similar objects (or otherwise have external knowledge of
         | the likely geometry and material properties not captured in the
         | original image.)
         | 
         | The theoretical Gaussian function is infinite, but splat
         | rendering doesn't use infinite extent, and that's not really
         | the reason images look blurry, nor do they always look blurry.
         | (Lots of anti-aliasing pixel filters have theoretically
         | infinite extent, but that doesn't matter in practice, i.e.,
         | what matters is only sigma, not extent, provided the finite
         | extent doesn't cut off too early.) There is a near optimal
         | range of Gaussian sizes for image sharpness that will antialias
         | without overblurring. The capture / optimization process of
         | opaque objects will probably produce Gaussians that are near
         | this optimal size at the smallest, so if you render them back
         | at the same size, it will stay near the optimal range.
         | Generally, the optimizers we have so far tend to blur a little
         | bit, which is why rendering the reconstruction slightly smaller
         | than the captured image currently tends to sharpen things.
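          | 
          | If it helps, here's a toy sketch (my own illustration, not
          | anything from the paper) of splatting a 1D Gaussian with its
          | footprint truncated at 3 standard deviations - the cutoff is
          | invisible, so sharpness is governed by sigma alone:
          | 
          |     import numpy as np
          | 
          |     def splat_1d(center, sigma, pixels, cutoff=3.0):
          |         # Beyond 3 sigmas the kernel value is already
          |         # exp(-4.5) ~ 0.01, so truncating there changes
          |         # almost nothing; only sigma sets the blur.
          |         x = np.arange(len(pixels), dtype=np.float64)
          |         r = np.abs(x - center)
          |         m = r < cutoff * sigma  # finite extent in practice
          |         pixels[m] += np.exp(-0.5 * (r[m] / sigma) ** 2)
          |         return pixels
          | 
          |     row = np.zeros(32)
          |     splat_1d(16.0, 1.0, row)  # near-optimal: ~1 px wide
          |     splat_1d(16.0, 4.0, row)  # "zoomed in": visibly blurry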
        
         | heliophobicdude wrote:
         | Hard edges are a challenge right now.
         | 
         | Thinking in 2D for a second, to get a nice crispy edge, you
         | need a long and opaque splat to mark the boundary. Sometimes
          | the long splat could wisp off, leaving fuzzy artifacts.
         | 
         | Take this example: https://www.shadertoy.com/view/dtSfDD
         | 
          | Peyman Milanfar [1] suggested using bump functions instead.
          | Bump functions would allow you to specify cut-off intervals
          | while still keeping the whole function smooth and continuous
          | (good for my gradient-optimization freaks).
         | 
         | 1: https://x.com/docmilanfar/status/1719584410348204233
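          | 
          | For the curious, a minimal sketch of the idea (my own toy
          | code, not Milanfar's exact formulation) - a classic bump
          | function that is exactly zero outside a chosen radius yet
          | smooth everywhere:
          | 
          |     import numpy as np
          | 
          |     def gaussian(r, sigma):
          |         return np.exp(-0.5 * (r / sigma) ** 2)  # never 0
          | 
          |     def bump(r, radius):
          |         # Exactly 0 for |r| >= radius, yet C-infinity, so
          |         # gradients stay well behaved for optimization.
          |         t = np.abs(np.asarray(r, dtype=float)) / radius
          |         out = np.zeros_like(t)
          |         inside = t < 1.0
          |         ti = t[inside]
          |         out[inside] = np.exp(1 - 1 / (1 - ti * ti))
          |         return out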
        
         | karmakaze wrote:
          | Not working in the field, I don't know the relevance, but I
          | thought that the "4D Gaussian Splatting"[0] looked like it
          | makes great efficiency gains.
         | 
         | [0] https://news.ycombinator.com/item?id=37905601
        
       | alkonaut wrote:
        | Wouldn't it be more useful to generate a vector model than a
        | "3D image" (voxel/radiance field/splats/whatever it's called)?
        | Apart from the use case "I want to spin the thing or walk
        | around in it", they feel like they're of limited use.
        | 
        | Unlike, say, a crude model of a fire hydrant, which you could
        | throw into a game or whatever. Maybe if the model were fed
        | some more constraints/assumptions? I think I saw a recent
        | paper that generated meshes instead of pixels.
        
         | tomp wrote:
          | Maybe check this out - it's based on NeRFs, not Gaussian
          | splats, but might be applicable
         | 
         | https://research.nvidia.com/labs/toronto-ai/adaptive-shells/
        
         | andybak wrote:
         | See my comment above about meshes. Games should adapt to new
         | representations, not the other way round.
         | 
         | What do games need? Relighting, animation, collision. All of
         | these can be done with non-mesh objects. At the moment it's all
          | in its infancy compared to conventional 3D, but it won't stay
         | that way for long.
        
       | catapart wrote:
       | So if I'm tracking the progress correctly, now we should be able
       | to do: Single Image -> Gaussian Splats -> Object Identification
       | -> [Nearest Known Object | Algo-based shell] Mesh Generation ->
       | Use-Case-Based Retopology -> Style-Trained Mesh Transformation
       | 
       | Which would produce a new mesh in the style of your other meshes,
       | based on a single photograph of a real-world object.
       | 
       | ...and, at this speed, you could do that as a real-time(ish)
       | import into a running application/game.
       | 
       | Gotta say, I'm looking forward to someone putting these puzzle
       | pieces together! But it really does feel like if we wait another
       | month, there might be some new AI that shrinks that pipeline by
       | another one or two steps! It's an exhausting time to be excited!
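          | 
          | In (entirely hypothetical) pseudocode - none of these
          | functions exist as one off-the-shelf library today:
          | 
          |     def photo_to_styled_mesh(image, style_model):
          |         splats = splatter_image(image)   # 1 view -> 3D
          |         label = identify_object(splats)
          |         mesh = mesh_from_splats(splats, hint=label)
          |         mesh = retopologize(mesh, target="game")
          |         return style_model.transform(mesh)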
        
         | andybak wrote:
         | I do wonder if we need to stop relying on meshes entirely.
         | NeRFs and splats have potentially much richer representations
         | of material and lighting response. Current hardware is very
         | focused on triangles and bitmaps but GPUs are versatile beasts.
        
           | efnx wrote:
           | I don't think the engines will switch their happy-paths to
           | splats until artists have the proper tools to create assets
           | with splats. As cool as generating splats with AI is, the
            | assets in an AAA game must fit the art director's vision,
            | which
           | means having artists in the loop.
           | 
           | I feel like the visual style of games will change as a result
           | of generative AI to be whatever style those AI models have a
           | hard time generating. Essentially the games that will stand
            | out will be truly original, art-wise.
        
             | andybak wrote:
             | > I don't think the engines will switch their happy-paths
             | to splats until artists have the proper tools to create
             | assets with splats.
             | 
             | Oh - I agree and it's a bit chicken and egg. I'm not
             | expecting this shift to be quick (or even universal). But I
             | do feel the need to put the idea out there that meshes
             | might not be the be-all and end-all for games and other
             | spatial media.
        
             | a_t48 wrote:
             | Collision geometry is also generally triangles or other
             | primitives.
        
               | andybak wrote:
                | But only because we generally start with a mesh, and
                | creating a low-res collision mesh from it reuses
                | existing tooling. Mesh colliders aren't ideal - you
                | need a lot of triangles. SDFs can be a better choice
                | in some cases.
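                | 
                | A toy sketch of what I mean (hypothetical
                | names): one analytic SDF can stand in for a
                | collision mesh of many triangles:
                | 
                |     import numpy as np
                | 
                |     def sdf_sphere(p, center, radius):
                |         # Signed distance: negative inside,
                |         # positive outside the sphere.
                |         return np.linalg.norm(p - center) - radius
                | 
                |     def collides(p, sdf, tol=0.0):
                |         return sdf(p) <= tol
                | 
                |     hydrant = lambda p: sdf_sphere(
                |         p, np.zeros(3), 0.5)
                |     collides(np.array([0.3, 0, 0]), hydrant)  # True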
        
             | anigbrowl wrote:
              | I suspect a _lot_ of game/film artists would be very happy
             | to go back to sculpting physical objects and taking a few
             | photographs as opposed to building the models from scratch
             | in the computer.
        
           | lainga wrote:
           | How do you do collisions and shadowing? How is UV mapping
           | done?
        
             | andybak wrote:
              | Collisions could be handled separately (they already are -
             | you don't use the render mesh for collisions). Maybe a
             | separate mesh, maybe an SDF or similar.
             | 
             | UV mapping is a mesh thing. That's Stockholm Syndrome
             | talking. ;-)
        
       | teunispeters wrote:
        | For a change, the [code] link works, but the [arXiv] link is
        | missing. Have to say this looks really interesting!
        
       | StreetChief wrote:
       | All I have to say is "ENHANCE!"
        
       | eurekin wrote:
        | For anybody wanting to take a look at the code: this time the
        | GitHub link actually includes it. It's not the empty repo
        | that's typical of those "too good to be true" publications.
        
       | joosters wrote:
       | Probably a dumb question, but is this trained by the use of lots
       | of inputs of similar objects, or is it 'just' estimating by the
       | look of the input image?
       | 
       | Like, if you have an image of a car, viewed at an angle, you can
       | gauge the shape of the 3d object from the image itself. You could
       | then assume that the hidden side of the car is similar to the
       | side that you can see, and when you generate a 360 rotation
       | animation of it, it will look pretty good (cars being roughly
       | symmetrical). But if you gave it a flat image of a playing card,
       | just showing the face up side, how would it reconstruct the
       | reverse side? Would it infer it based on the front, or would it
       | 'know' from training data that playing cards have a very
       | different patterned back to them?
        
         | zellyn wrote:
         | I came here to ask this. The output was impressive to the point
         | of magic... until they showed whole grids full of fire hydrants
         | and teddy bear training data.
        
           | lamerose wrote:
           | Where do they show that?
        
       | amelius wrote:
        | This would be more powerful if you could optionally feed it
        | more input images for a better result.
        
       | billconan wrote:
        | The paper link doesn't work for me. The correct link is
        | https://arxiv.org/pdf/2312.13150.pdf
        
       | mk_stjames wrote:
       | Oof, the dependency tree on this.
       | 
        | It uses diff-gaussian-rasterization from the original gaussian
        | splatting implementation (which is a linked submodule in the
        | git repo, so if you are cloning that dependency, remember to
        | use --recursive to actually download it).
       | 
       | But that is written in mostly pure CUDA.
       | 
        | That part is just used to display the resulting gaussian
        | splatt'd model, and there have been other cross-platform
        | implementations to render splats - there was even that web
        | demo a few weeks ago that was using WebGL [0] - and if that
        | were used as the display output in place of the original
        | implementation, there's no reason people couldn't use this
        | on non-Nvidia hardware, I think.
       | 
        | edit: also, device=cuda is hardcoded in the torch portions of
        | the training code (sigh!). This doesn't have to be the case;
        | PyTorch could probably push this onto mps (Metal) just fine.
       | 
       | [0] https://github.com/antimatter15/splat?tab=readme-ov-file
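        | 
        | A minimal device-selection pattern that would remove the
        | hardcoding (standard PyTorch, untested against this repo):
        | 
        |     import torch
        | 
        |     if torch.cuda.is_available():
        |         device = torch.device("cuda")
        |     elif torch.backends.mps.is_available():
        |         device = torch.device("mps")
        |     else:
        |         device = torch.device("cpu")
        | 
        |     x = torch.randn(4, 4, device=device)  # not just "cuda"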
        
       | anigbrowl wrote:
        | This could prove useful for autonomous navigation systems as
       | well.
        
       ___________________________________________________________________
       (page generated 2023-12-21 23:01 UTC)