[HN Gopher] Splatter Image: Ultra-Fast Single-View 3D Reconstruction
___________________________________________________________________
Splatter Image: Ultra-Fast Single-View 3D Reconstruction
Author : heliophobicdude
Score : 140 points
Date : 2023-12-21 13:38 UTC (9 hours ago)
(HTM) web link (szymanowiczs.github.io)
(TXT) w3m dump (szymanowiczs.github.io)
| rijx wrote:
| Now we can finally turn Street View into a game world!
| xnx wrote:
| Waymo has done this for their simulations (a kind of game I
| suppose): https://waymo.com/research/block-nerf/
| speedgoose wrote:
| Is there something similar to this, but with source code?
| Papers with great results but without code are frustrating.
| XorNot wrote:
| I guess this is how you'd implement that thing in _Enemy of the
| State_ where they pan around a single-perspective camera view
| (which I think doesn't come across as absurd in the movie anyway,
| since the tech guys point out it's basically a clever
| extrapolation).
| lawlessone wrote:
| Am I imagining this, or is somebody making a newer and faster one
| of these every day?
|
| I'm expecting Overwhelming Fast Splatter by January.
| kridsdale1 wrote:
| The innovation rate in Splats is astounding.
| xnx wrote:
| I have already named my residential-dwelling-optimized splatter
| "Splatterhouse".
| tantalor wrote:
| That "GT" method seems even better, we should just use that. /s
| mft_ wrote:
| Might I ask what that acronym stands for? :)
| xnx wrote:
| "Ground Truth" (i.e. real world, actual data)
| mft_ wrote:
| Thanks!
| xnx wrote:
| GT also always renders in real time!
| cooper_ganglia wrote:
| I didn't realize what GT stood for until I came across this
| thread; I was confused why they weren't providing its render-time
| results, hahaha
| roflmaostc wrote:
| Since it's based on 3D Gaussians in space, is there a way to
| obtain sharp images? Inherently, Gaussian functions extend
| infinitely, so images always look blurry, don't they? Of course,
| \sigma can be optimized to be small, but then it converges to
| some point representation, doesn't it?
|
| Maybe some CV/ML people can help me understand.
| dahart wrote:
| Yes. The main way to keep the images sharp is to render the
| models at near the same size & resolution at which they were
| captured, or _slightly_ smaller. It's the same thing as zooming
| into an image: if you zoom in, it gets blurry because the
| filtered pixels get too big; the highest frequency in the data
| is now zoom-factor pixels wide. If you zoom out, the Gaussian
| splat images become sharper automatically (and eventually you
| run into aliasing issues). The way to obtain sharp images if you
| want to zoom in is to let the NN hallucinate some high-frequency
| details based on what it learns about similar objects (or
| otherwise have external knowledge of the likely geometry and
| material properties not captured in the original image).
|
| The theoretical Gaussian function is infinite, but splat
| rendering doesn't use infinite extent, and that's not really
| the reason images look blurry, nor do they always look blurry.
| (Lots of anti-aliasing pixel filters have theoretically
| infinite extent, but that doesn't matter in practice, i.e.,
| what matters is only sigma, not extent, provided the finite
| extent doesn't cut off too early.) There is a near optimal
| range of Gaussian sizes for image sharpness that will antialias
| without overblurring. The capture / optimization process of
| opaque objects will probably produce Gaussians that are near
| this optimal size at the smallest, so if you render them back
| at the same size, it will stay near the optimal range.
| Generally, the optimizers we have so far tend to blur a little
| bit, which is why rendering the reconstruction slightly smaller
| than the captured image currently tends to sharpen things.
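|
| (Not from the paper, just a minimal Python sketch of the
| truncation point above: the tail past a few sigma carries
| negligible weight, so renderers can cut it off without any
| visible blur.)
|
|       import numpy as np
|
|       def splat_weight(d2, sigma, cutoff=3.0):
|           # Gaussian falloff at squared distance d2 from the
|           # splat center, truncated past `cutoff` standard
|           # deviations. exp(-0.5 * 3**2) ~ 0.011, so a 3-sigma
|           # cutoff discards almost nothing visually.
|           if d2 > (cutoff * sigma) ** 2:
|               return 0.0
|           return np.exp(-0.5 * d2 / sigma ** 2)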
| heliophobicdude wrote:
| Hard edges are a challenge right now.
|
| Thinking in 2D for a second, to get a nice crispy edge, you
| need a long and opaque splat to mark the boundary. Sometimes
| the long splat could wisp off leaving fuzzy artifacts.
|
| Take this example: https://www.shadertoy.com/view/dtSfDD
|
| Peyman Milanfar [1] suggested using bump functions instead. Bump
| functions would let you specify cut-off intervals while keeping
| the whole function smooth and continuous (good for my gradient
| optimization freaks).
|
| 1: https://x.com/docmilanfar/status/1719584410348204233
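|
| (A rough sketch of that idea in Python - my own, not from the
| linked post. The classic bump function is exactly zero outside
| |x| < 1 yet infinitely differentiable everywhere, including at
| the cutoff, so gradient-based optimization still works.)
|
|       import numpy as np
|
|       def gaussian(x, sigma=1.0):
|           # Tails never actually reach zero.
|           return np.exp(-0.5 * (x / sigma) ** 2)
|
|       def bump(x):
|           # exp(-1 / (1 - x^2)) on |x| < 1, exactly zero
|           # outside; smooth everywhere, even at x = +/-1.
|           y = np.zeros_like(x, dtype=float)
|           inside = np.abs(x) < 1.0
|           y[inside] = np.exp(-1.0 / (1.0 - x[inside] ** 2))
|           return y / np.exp(-1.0)  # normalize peak to 1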
| karmakaze wrote:
| Not working in the field, I don't know the relevance, but I
| thought the "4D Gaussian Splatting"[0] looked like it makes
| great efficiency gains.
|
| [0] https://news.ycombinator.com/item?id=37905601
| alkonaut wrote:
| Wouldn't it be more useful to generate a vector model than a "3D
| image" (voxel/radiance field/splats/whatever it's called)? Apart
| from the use case of "I want to spin the thing or walk around in
| it", they feel like they're of limited use.
|
| Unlike, say, a crude model of a fire hydrant, which you could
| throw into a game or whatever - perhaps if the model were fed
| some more constraints/assumptions. I think I saw a recent paper
| that generated meshes instead of pixels.
| tomp wrote:
| Maybe check this out - it's based on NeRFs, not Gaussian splats,
| but it might be applicable:
|
| https://research.nvidia.com/labs/toronto-ai/adaptive-shells/
| andybak wrote:
| See my comment above about meshes. Games should adapt to new
| representations, not the other way round.
|
| What do games need? Relighting, animation, collision. All of
| these can be done with non-mesh objects. At the moment it's all
| in its infancy compared to conventional 3D, but it won't stay
| that way for long.
| catapart wrote:
| So if I'm tracking the progress correctly, now we should be able
| to do: Single Image -> Gaussian Splats -> Object Identification
| -> [Nearest Known Object | Algo-based shell] Mesh Generation ->
| Use-Case-Based Retopology -> Style-Trained Mesh Transformation
|
| Which would produce a new mesh in the style of your other meshes,
| based on a single photograph of a real-world object.
|
| ...and, at this speed, you could do that as a real-time(ish)
| import into a running application/game.
|
| Gotta say, I'm looking forward to someone putting these puzzle
| pieces together! But it really does feel like if we wait another
| month, there might be some new AI that shrinks that pipeline by
| another one or two steps! It's an exhausting time to be excited!
| andybak wrote:
| I do wonder if we need to stop relying on meshes entirely.
| NeRFs and splats have potentially much richer representations
| of material and lighting response. Current hardware is very
| focused on triangles and bitmaps but GPUs are versatile beasts.
| efnx wrote:
| I don't think the engines will switch their happy-paths to
| splats until artists have the proper tools to create assets
| with splats. As cool as generating splats with AI is, the
| assets in a AAA game must fit the art director's vision, which
| means having artists in the loop.
|
| I feel like the visual style of games will change as a result
| of generative AI to be whatever style those AI models have a
| hard time generating. Essentially the games that will stand
| out will be truly original, art wise.
| andybak wrote:
| > I don't think the engines will switch their happy-paths
| to splats until artists have the proper tools to create
| assets with splats.
|
| Oh - I agree and it's a bit chicken and egg. I'm not
| expecting this shift to be quick (or even universal). But I
| do feel the need to put the idea out there that meshes
| might not be the be-all and end-all for games and other
| spatial media.
| a_t48 wrote:
| Collision geometry is also generally triangles or other
| primitives.
| andybak wrote:
| But only because we generally start with a mesh, and creating
| a low-res collision mesh from it reuses existing tooling.
| Mesh colliders aren't ideal: you need a lot of triangles.
| SDFs can be a better choice in some cases.
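|
| (A toy sketch, not engine code: a collision query against an
| SDF is just a sign test on the distance field, no triangles
| involved.)
|
|       import numpy as np
|
|       def sphere_sdf(p, center, radius):
|           # Signed distance: negative inside, positive outside.
|           return np.linalg.norm(p - center) - radius
|
|       def collides(p, center=(0.0, 0.0, 0.0), radius=1.0):
|           # Penetration whenever the signed distance is <= 0;
|           # the field's gradient gives the push-out direction.
|           return sphere_sdf(np.asarray(p),
|                             np.asarray(center), radius) <= 0.0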
| anigbrowl wrote:
| I suspect a _lot_ of game/film artists would be very happy
| to go back to sculpting physical objects and taking a few
| photographs as opposed to building the models from scratch
| in the computer.
| lainga wrote:
| How do you do collisions and shadowing? How is UV mapping
| done?
| andybak wrote:
| Collisions could be handled separately (they already are;
| you don't use the render mesh for collisions). Maybe a
| separate mesh, maybe an SDF or similar.
|
| UV mapping is a mesh thing. That's Stockholm Syndrome
| talking. ;-)
| teunispeters wrote:
| For a change, the [code] link works, but the [arXiv] link is
| missing. Have to say, this looks really interesting!
| StreetChief wrote:
| All I have to say is "ENHANCE!"
| eurekin wrote:
| For anybody wanting to take a look at the code: this time the
| GitHub link does include it. It's not empty, unlike the typical
| "too good to be true" publication.
| joosters wrote:
| Probably a dumb question, but is this trained on lots of inputs
| of similar objects, or is it 'just' estimating from the look of
| the input image?
|
| Like, if you have an image of a car, viewed at an angle, you can
| gauge the shape of the 3d object from the image itself. You could
| then assume that the hidden side of the car is similar to the
| side that you can see, and when you generate a 360 rotation
| animation of it, it will look pretty good (cars being roughly
| symmetrical). But if you gave it a flat image of a playing card,
| just showing the face up side, how would it reconstruct the
| reverse side? Would it infer it based on the front, or would it
| 'know' from training data that playing cards have a very
| different patterned back to them?
| zellyn wrote:
| I came here to ask this. The output was impressive to the point
| of magic... until they showed whole grids full of fire hydrants
| and teddy bear training data.
| lamerose wrote:
| Where do they show that?
| amelius wrote:
| This would be more powerful if you could feed it more input
| images for a better result, if desired.
| billconan wrote:
| The paper link doesn't work for me. The correct link:
| https://arxiv.org/pdf/2312.13150.pdf
| mk_stjames wrote:
| Oof, the dependency tree on this.
|
| It uses diff-gaussian-rasterization from the original Gaussian
| splatting implementation (which is a linked submodule in the
| repo, so if you are cloning that dependency, remember to use
| --recursive to actually download it).
|
| But that is written in mostly pure CUDA.
|
| That part is just used to display the resulting Gaussian
| splatt'd model, and there have been other cross-platform
| implementations for rendering splats - there was even that WebGL
| demo a few weeks ago [0] - so if one of those were used as the
| display output in place of the original implementation, there's
| no reason people couldn't use this on non-Nvidia hardware, I
| think.
|
| edit: also, device=cuda is hardcoded in the torch portions of
| the training code (sigh!). It doesn't have to be; PyTorch could
| probably push this onto mps (Metal) just fine.
|
| [0] https://github.com/antimatter15/splat?tab=readme-ov-file
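|
| A device-selection fallback would be a small patch, something
| like this (a sketch; it assumes the training code routes all
| tensor placement through one `device` variable, which may not
| hold):
|
|       import torch
|
|       # Prefer CUDA, fall back to Apple's Metal backend (mps),
|       # then CPU, instead of hardcoding device="cuda".
|       if torch.cuda.is_available():
|           device = torch.device("cuda")
|       elif torch.backends.mps.is_available():
|           device = torch.device("mps")
|       else:
|           device = torch.device("cpu")
|
| (The CUDA rasterizer dependency would still need one of those
| cross-platform replacements mentioned above, of course.)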
| anigbrowl wrote:
| This could prove useful for autonomous navigation systems as
| well.
___________________________________________________________________
(page generated 2023-12-21 23:01 UTC)