[HN Gopher] 3D Gaussian Splatting as Markov Chain Monte Carlo
___________________________________________________________________
3D Gaussian Splatting as Markov Chain Monte Carlo
Author : smusamashah
Score : 145 points
Date : 2024-06-18 17:08 UTC (5 hours ago)
(HTM) web link (ubc-vision.github.io)
(TXT) w3m dump (ubc-vision.github.io)
| corysama wrote:
| This is nice because the original 3DGS technique initialized
| itself with a point cloud generated using a traditional COLMAP
| process https://en.wikipedia.org/wiki/Structure_from_motion
| Maken wrote:
| The original 3DGS paper relied on some really hand-wavy
| heuristics to optimize the gaussians' location and size (and
| yet it worked remarkably well).
| Etheryte wrote:
| The value of the parameter was chosen by I picked it.
| Terr_ wrote:
| Hey guys, I think I found the NSA crypto-saboteur! :p
| jonas21 wrote:
| The point cloud is a by-product of aligning the input
| images. You essentially get it for free.
| programjames wrote:
| So, just to make it clear, the main difference in this paper is
| adding a small amount of noise to each update? I'm a little
| frustrated that I read through the whole paper and still am not
| sure about this.
| mafuyu wrote:
| I'm not an expert at all, but my understanding is that Gaussian
| splatting is essentially a rendering technique. Normally, you'd
| take actual data, like a set of photos, and optimize some
| Gaussians against it to arrive at a volumetric representation.
| In the case of AI-generated splats, it's kinda flipped. Instead
| of optimizing against a known ground truth, the AI is
| generating Gaussians to be rendered. The insight of this paper
| is that we already have great statistical tools for numerically
| estimating ground truth based on a bunch of Gaussians, so why
| not just apply those?
| dwallin wrote:
| No, the main difference is looking at the problem from a new
| perspective, relating it to a large body of existing work in
| statistics (see
| https://en.wikipedia.org/wiki/Markov_chain_Monte_Carlo and
| https://en.wikipedia.org/wiki/Stochastic_gradient_Langevin_d...).
| Then they are able to use this new perspective to add several
| improvements that lead to significant quality improvements,
| clearly establishing the validity and utility of the new
| theoretical underpinnings. It's likely this will be massively
| influential in the direction of future research in the space.
|
| The actual changes this led to are:
|
| 1) As you mentioned, they added noise. But notably, they state
| it was "designed carefully" to conform to the requirements of
| SGLD and they detail how they designed the noise.
|
| 2) They simplified the original operations of "move, split,
| clone, prune, and add" and their related heuristics, into a
| single operation type. They do so guided by existing knowledge
| about MCMC frameworks, leading to a simpler model with stronger
| theoretical underpinnings (huge win!).
|
| 3) Adjustments to how gaussians are added and pruned to better
| fit with the new model. This seems more like housekeeping
| rather than something novel in and of itself.
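|
| To make point 1 concrete, here is a minimal sketch of a plain SGLD
| update on a toy quadratic loss. It is only an illustration: the
| names sgld_step, toy_grad and run_chain are hypothetical, the
| sqrt(2 * lr * T) noise scale is the textbook Langevin choice, and
| none of this reproduces the paper's carefully designed noise term.

```python
import numpy as np


def sgld_step(theta, grad_fn, lr, rng, temperature=1.0):
    """One SGLD update: a gradient step plus scaled Gaussian noise."""
    noise = rng.normal(size=theta.shape)
    return theta - lr * grad_fn(theta) + np.sqrt(2.0 * lr * temperature) * noise


def toy_grad(theta):
    # Gradient of the toy loss L(theta) = 0.5 * ||theta||^2, a hypothetical
    # stand-in for a rendering loss.
    return theta


def run_chain(steps=50000, lr=0.05, seed=0):
    """Run a chain from theta = 3 and return every visited value."""
    rng = np.random.default_rng(seed)
    theta = np.array([3.0])
    samples = np.empty(steps)
    for i in range(steps):
        theta = sgld_step(theta, toy_grad, lr, rng)
        samples[i] = theta[0]
    return samples


if __name__ == "__main__":
    samples = run_chain()[1000:]  # drop a short burn-in
    print("sample mean:", samples.mean(), "sample variance:", samples.var())
```

| With temperature=0 the update reduces to ordinary gradient descent.
| With noise, the chain keeps moving and its samples approximate
| exp(-loss), which for this toy loss is close to a unit Gaussian.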
| Iwan-Zotow wrote:
| No, Markov Chain Monte Carlo is used to get the final state from a
| distribution which is hard to tackle any other way.
| dguest wrote:
| I'm sad that the PDF doesn't use hyperref.
|
| When you get used to clicking a link to get to a cited
| reference, the whole scroll-to-reference, copy-paste to Google,
| scroll to paper song and dance seems a lot less fun.
|
| Probably a cool paper though.
| hapofuino wrote:
| The paper is also on arXiv, which includes the TeX source. The
| experimental HTML view for this paper has automatic (internal)
| hyperlinks like hyperref: https://arxiv.org/abs/2404.09591
|
| Edit: The TeX source has the `draft` option for hyperref that
| disabled hyperlinks in the produced pdf. External links to
| references aren't recoverable from the included main.bbl
| (probably because it was built with the `draft` option).
| gessha wrote:
| Yeah, scientific papers are a pain to read. Some have a LaTeX
| extension which links you to the paper down in the references
| but it's not bidirectional - you have to scroll back to your
| previous position.
| flor1s wrote:
| Some PDF readers have a Back button (like the built-in Zotero
| PDF reader).
| StrLght wrote:
| Gaussian splatting is such an impressive technique. I hope it
| finds real-world applications useful for the average person -- right
| now it's probably the best way to show photorealistic scenes in
| VR. There ought to be more usages out there.
| hapofuino wrote:
| For an average person, I suspect that some real estate listings
| will adopt Gaussian splatting soon. Many newer listings include
| photogrammetry already, and splatting will provide a
| significant improvement to existing solutions.
| fallat wrote:
| I'm just starting to use photogrammetry in hobbyist game maps to
| provide "chill maps". Unfortunately splatting will not be
| possible there.
| dwallin wrote:
| There's a lot of use cases, and companies already using
| photogrammetry techniques have adopted it extremely quickly,
| but for many of these businesses the usage of gaussian splats
| is just a technical implementation detail, resulting in quality
| / feature gains, but doesn't as of now unlock entirely new
| business models.
| jesol wrote:
| It's true, Gaussian Splatting is just an alternative to
| meshing a pointcloud for companies which currently rely on
| photogrammetry or lidar (Lidar works well as a basis for
| splatting when there's reference images taken as part of the
| scan). But I think that misses all the new opportunities that
| exist with Gaussian Splatting, which really just don't exist with
| existing techniques.
|
| Gaussian Splats are able to handle more heterogeneous
| information sources, allowing more sources to help splat an
| environment. Devices like drones, surveillance cameras, or
| autonomous systems can be used to create or incrementally
| update a Gaussian Splat; and there's interesting work to
| allow them to locate themselves within the splat, not just to
| show themselves but also to place vision ML outputs into it
| (such as object detection or segmentation results).
|
| Up till now nearly all digital representations of physical
| environments are either based off the original designs (by
| things like CAD or BIM files), or are an approximation of the
| environment (from photogrammetry or Lidar scans). CAD and BIM
| files suffer from drift, the real environment almost never
| perfectly matches the design files, small (and large) changes
| are made; and many times those files aren't even available if
| the structure isn't new. Photogrammetry and Lidar scans
| struggle because their output is a pointcloud, and it's very
| difficult to accurately mesh a pointcloud (Matterport only
| partially solved this problem and sold for $1.6B). Gaussian
| Splats overcome these issues; they're comparatively easy to
| generate for any environment, and allow for very accurate and
| easy viewing from any angle.
|
| I think the Digital Twin space will be turned upside down,
| and they could potentially even cause huge changes in
| autonomous and semi-autonomous factories, warehouses, and
| depots. A single Gaussian Splat could be the source of truth
| that many autonomous vehicles update through their separate
| SLAM systems. Operators then would have access to this splat
| (and its history) as a source of truth for the environment.
| Then, using techniques like iComMa[1], it may be possible to
| directly align XR devices into the Gaussian Splat; allowing
| operators direct access to location-based information
| generated by the environment.
|
| That's a lot of words to say: Gaussian Splatting is a very
| neat new technology that could really underpin many future
| technologies, I'm really excited about it
|
| [1]: https://yuansun-xjtu.github.io/iComMa.io/
| dwallin wrote:
| I do agree that new use cases are emerging and it will
| probably enable tons of new businesses. I'm very gung-ho
| about the technology myself as well. I guess what I'm
| trying to say is that the new businesses that emerge
| because of this are not necessarily going to advertise that
| they use gaussian splats to do it, it's not a buzzy enough
| term, and many of the industries it serves just care about
| the results it delivers. Your average tech person is
| unlikely to hear much about it. Your average graphics
| engineer will have probably heard about it, but not know
| about all the use cases that are leveraging it. And your
| average person in the industry it is changing won't know
| what is causing the change (they will probably assign it to
| the nebulous ai bucket). I fully expect gaussian splats to
| be a quiet revolution.
| jesol wrote:
| Yeah, I see your point. I'd be surprised if Gaussian
| Splatting didn't make it into the advertising for Digital
| Twin services if/when they add it (like Bentley's iTwin or
| Dassault's Virtual Twin). Whether that translates more
| broadly into the market, I don't know.
|
| On the other hand, I'm playing with the idea of a
| platform which provides a Gaussian Splat based Digital
| Twin of an environment so other systems can utilize it to
| share location-based information. Even though I don't
| think it'll be possible to build without utilizing
| Gaussian Splatting; splatting may not end up in any of
| the pitches or advertising directly.
| golergka wrote:
| I may be mistaken, but it looks like it would only work for
| completely static scenes with pre-baked lighting.
| crubier wrote:
| There has been an implementation of GS for video since day
| one.
|
| They have also included spherical harmonics for view
| dependence since day one.
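|
| For anyone curious what that looks like, a rough sketch of
| degree-0 and degree-1 spherical harmonics color evaluation
| follows. The function name sh_to_rgb, the (4, 3) coefficient
| layout, the sign convention, and the +0.5 offset are assumptions
| modeled on common 3DGS code, not a definitive implementation. The
| upper clamp to 1 is added here only to keep the result displayable.

```python
import numpy as np

SH_C0 = 0.28209479177387814  # degree-0 constant, 1 / (2 * sqrt(pi))
SH_C1 = 0.4886025119029199   # degree-1 constant, sqrt(3 / (4 * pi))


def sh_to_rgb(sh, view_dir):
    """Evaluate degree-0/1 real SH coefficients along a view direction.

    sh has shape (4, 3): one DC row and three degree-1 rows, each an RGB triple.
    """
    sh = np.asarray(sh, dtype=float)
    d = np.asarray(view_dir, dtype=float)
    x, y, z = d / np.linalg.norm(d)
    rgb = SH_C0 * sh[0]  # view-independent base color
    rgb = rgb - SH_C1 * y * sh[1] + SH_C1 * z * sh[2] - SH_C1 * x * sh[3]
    return np.clip(rgb + 0.5, 0.0, 1.0)


if __name__ == "__main__":
    sh = np.zeros((4, 3))
    sh[2] = [0.5, 0.5, 0.5]  # brighter when viewed along +z
    print(sh_to_rgb(sh, [0.0, 0.0, 1.0]), sh_to_rgb(sh, [0.0, 0.0, -1.0]))
```

| With only the DC row set, the color is the same from every
| direction; the degree-1 rows are what make it change with the
| viewing angle.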
| dwallin wrote:
| It's important to disentangle this specific approach of
| generating a gaussian splatting model from 2d images, with
| gaussian splats as a general rendering technique.
|
| There is nothing that prevents gaussian splatting from being
| used dynamically. There are a variety of approaches to extend
| gaussian splats into the time dimension to capture and
| represent a 3d scene over time. The challenges here are about how
| to capture sufficient scene data (or use ai to fill in
| insufficient data) and how to compress it. There are also
| techniques that enable dynamic simulations, or real time
| animation of collections of splats.
|
| Also, adding un-baked lighting to gaussian splats is not
| particularly hard, you can already throw splats into several
| game engines / 3d renderers and add new lights to them. The
| hard part of relighting is taking an existing capture of a
| scene with baked-in lighting and deriving the resulting
| material properties and lighting sources. This isn't directly
| related to gaussian splats themselves though, you would have
| a similar problem recovering the base materials and lights
| from a 3d mesh with baked-in lighting textures. This really
| falls under a separate category of techniques called "inverse
| rendering". If anything, gaussian splats give us a new tool
| to help with these sorts of problems.
|
| Honestly the biggest remaining roadblocks to more elaborate
| and widespread uses of gaussians as a rendering method are
| probably storage and performance related. And I'm optimistic
| these will be convincingly solved, triangle rasterization has
| had many orders of magnitude more research, optimization, and
| custom hardware built around it.
| crazygringo wrote:
| Are there any apps that use gaussian splatting in VR?
|
| Is there something I can download on my Meta Quest to try it
| out?
| dwallin wrote:
| I've heard about this one but don't have an appropriate setup
| to test it myself: https://www.gracia.ai/
| 3abiton wrote:
| How does it benefit VR specifically?
| vessenes wrote:
| OK, I read the paper, agree the results look good, like the idea
| of better formal grounding for how to choose where your splats
| are, and ... I still have no idea what that top image is of. Is
| it the distribution of where they put initial splats for any
| given 2D image?? Why does the caption mention buildings? I'm
| really lost.
| barnabask wrote:
| It's a video. Right click and click play.
| Vt71fcAqt7 wrote:
| >Unlike existing approaches to 3D Gaussian Splatting, we propose
| to interpret the training process of placing and optimizing
| Gaussians as a sampling process. Rather than defining a loss
| function and simply taking steps towards a local minimum, we
| define a distribution G which assigns high probability to
| collections of Gaussians which faithfully reconstruct the
| training images.
|
| What is the practical difference here? MCMC itself samples more
| from higher probabilities than lower ones (ie. towards a local
| minimum). Is it just that we sample more from lower ends of the
| distribution? Or is it more about formalizing the previous
| algorithm so that it is easier to play with the different
| parameters? (eg. the acceptance threshold)
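|
| One way to see a practical difference is a toy experiment,
| sketched below under my own assumptions: a 1D tilted double-well
| loss, a temperature of 0.3, and the hypothetical helpers
| gradient_descent and langevin_chain. It is not the paper's setup.
| Note that plain Langevin updates of this kind have no
| accept/reject step, unlike Metropolis-Hastings.

```python
import numpy as np


def loss(x):
    # Toy tilted double well: shallow minimum near x = +0.93,
    # deeper minimum near x = -1.06.
    return (x**2 - 1.0) ** 2 + 0.5 * x


def grad(x):
    # Derivative of loss(x).
    return 4.0 * x * (x**2 - 1.0) + 0.5


def gradient_descent(x0, lr=0.01, steps=5000):
    """Plain gradient descent: follows the slope into the nearest basin."""
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x


def langevin_chain(x0, lr=0.01, steps=50000, temperature=0.3, seed=0):
    """Gradient steps plus Gaussian noise, targeting exp(-loss / temperature)."""
    rng = np.random.default_rng(seed)
    noise = rng.normal(size=steps) * np.sqrt(2.0 * lr * temperature)
    x = x0
    xs = np.empty(steps)
    for i in range(steps):
        x = x - lr * grad(x) + noise[i]
        xs[i] = x
    return xs


if __name__ == "__main__":
    print("gradient descent ends at:", gradient_descent(1.5))
    xs = langevin_chain(1.5)
    print("fraction of chain in the deeper basin:", np.mean(xs < 0))
```

| Started at x = 1.5, gradient descent settles in the shallow
| minimum and stays there. The noisy chain can cross the barrier
| and, in this toy setting, tends to spend most of its time near the
| deeper minimum, so the sampling view buys exploration rather than
| only a different way of writing the same descent.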
___________________________________________________________________
(page generated 2024-06-18 23:00 UTC)