[HN Gopher] 3D Gaussian Splatting as Markov Chain Monte Carlo
       ___________________________________________________________________
        
       3D Gaussian Splatting as Markov Chain Monte Carlo
        
       Author : smusamashah
       Score  : 145 points
       Date   : 2024-06-18 17:08 UTC (5 hours ago)
        
 (HTM) web link (ubc-vision.github.io)
 (TXT) w3m dump (ubc-vision.github.io)
        
       | corysama wrote:
       | This is nice because the original 3DGS technique initialized
       | itself with a point cloud generated using a traditional COLMAP
       | process https://en.wikipedia.org/wiki/Structure_from_motion
        
         | Maken wrote:
         | The original 3DGS paper relied on some really hand-wavy
         | heuristics to optimize the gaussians' location and size (and
         | yet it worked remarkably well).
        
           | Etheryte wrote:
           | The value of the parameter was chosen by I picked it.
        
             | Terr_ wrote:
             | Hey guys, I think I found the NSA crypto-saboteur! :p
        
         | jonas21 wrote:
         | The point cloud is a by-product of aligning the input
         | images. You essentially get it for free.
        
       | programjames wrote:
       | So, just to make it clear, the main difference in this paper is
       | adding a small amount of noise to each update? I'm a little
       | frustrated that I read through the whole paper and still am not
       | sure about this.
        
         | mafuyu wrote:
         | I'm not an expert at all, but my understanding is that Gaussian
         | splatting is essentially a rendering technique. Normally, you'd
         | take actual data, like a set of photos, and optimize some
         | Gaussians against it to arrive at a volumetric representation.
         | In the case of AI-generated splats, it's kinda flipped. Instead
         | of optimizing against a known ground truth, the AI is
         | generating Gaussians to be rendered. The insight of this paper
         | is that we already have great statistical tools for numerically
         | estimating ground truth based on a bunch of Gaussians, so why
         | not just apply those?
        
         | dwallin wrote:
         | No, the main difference is looking at the problem from a new
         | perspective, relating it to a large body of existing work in
         | statistics (see
         | https://en.wikipedia.org/wiki/Markov_chain_Monte_Carlo and
         | https://en.wikipedia.org/wiki/Stochastic_gradient_Langevin_d...).
         | Then they are able to use this new perspective to add several
         | improvements that lead to significant quality improvements,
         | clearly establishing the validity and utility of the new
         | theoretical underpinnings. It's likely this will be massively
         | influential in the direction of future research in the space.
         | 
         | The actual changes this led to are:
         | 
         | 1) As you mentioned, they added noise. But notably, they state
         | it was "designed carefully" to conform to the requirements of
         | SGLD and they detail how they designed the noise.
         | 
         | 2) They simplified the original operations of "move, split,
         | clone, prune, and add" and their related heuristics, into a
         | single operation type. They do so guided by existing knowledge
         | about MCMC frameworks, leading to a simpler model with stronger
         | theoretical underpinnings (huge win!).
         | 
         | 3) Adjustments to how gaussians are added and pruned to better
         | fit with the new model. This seems more like housekeeping
         | rather than something novel in and of itself.
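
To make point 1 concrete, here is a minimal sketch of a Langevin-style update on a toy 1D Gaussian target. It is not the paper's noise design: the target, `step_size`, `noise_scale`, and all function names are assumptions made for illustration, and the gradient is exact rather than a minibatch estimate as in SGLD proper.

```python
import math
import random


def grad_log_density(theta, mu=2.0, sigma=1.0):
    """Gradient of log N(theta; mu, sigma^2) with respect to theta."""
    return -(theta - mu) / sigma**2


def langevin_chain(n_steps, step_size, noise_scale, theta0=0.0, seed=0):
    """Iterate theta <- theta + (step/2) * grad + noise_scale * sqrt(step) * eta.

    noise_scale=1.0 gives the Langevin update; noise_scale=0.0 reduces it
    to plain gradient ascent on the log-density.
    """
    rng = random.Random(seed)
    theta = theta0
    trace = []
    for _ in range(n_steps):
        eta = rng.gauss(0.0, 1.0)  # standard normal noise
        theta += 0.5 * step_size * grad_log_density(theta)
        theta += noise_scale * math.sqrt(step_size) * eta
        trace.append(theta)
    return trace


def mean_and_variance(values):
    """Population mean and variance of a list of floats."""
    m = sum(values) / len(values)
    v = sum((x - m) ** 2 for x in values) / len(values)
    return m, v


# With noise: the chain keeps moving and its samples approximate the target.
# The first 1,000 steps are discarded as burn-in.
sampled = langevin_chain(50_000, step_size=0.1, noise_scale=1.0)[1_000:]
sample_mean, sample_var = mean_and_variance(sampled)

# Without noise: the same update collapses onto the mode (theta = mu).
optimized = langevin_chain(2_000, step_size=0.1, noise_scale=0.0)
final_point = optimized[-1]
```

With the noise on, `sample_mean` and `sample_var` should land close to the target's mean and variance (2 and 1 here, up to discretization and sampling error). With the noise off, `final_point` settles on the mode. That is the sense in which the added noise turns an optimizer into a sampler.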
        
         | Iwan-Zotow wrote:
         | No, it uses Markov Chain Monte Carlo to sample the final state
         | from a distribution that is hard to tackle any other way.
        
       | dguest wrote:
       | I'm sad that the PDF doesn't use hyperref.
       | 
       | When you get used to clicking a link to jump to a cited
       | reference, the whole scroll-to-reference, copy-paste to Google,
       | scroll to paper song and dance seems a lot less fun.
       | 
       | Probably a cool paper though.
        
         | hapofuino wrote:
         | The paper is also on arXiv, which includes the TeX source. The
         | experimental HTML view for this paper has automatic (internal)
         | hyperlinks like hyperref: https://arxiv.org/abs/2404.09591
         | 
         | Edit: The TeX source has the `draft` option for hyperref that
         | disables hyperlinks in the produced PDF. External links to
         | references aren't recoverable from the included main.bbl
         | (probably because it was built with the `draft` option).
        
         | gessha wrote:
         | Yeah, scientific papers are a pain to read. Some have a LaTeX
         | extension which links you to the paper down in the references
         | but it's not bidirectional - you have to scroll back to your
         | previous position.
        
           | flor1s wrote:
           | Some PDF readers have a Back button (like the built-in Zotero
           | PDF reader).
        
       | StrLght wrote:
       | Gaussian splatting is such an impressive technique. I hope it
       | finds real-world applications useful for the average person -- right
       | now it's probably the best way to show photorealistic scenes in
       | VR. There ought to be more usages out there.
        
         | hapofuino wrote:
         | For an average person, I suspect that some real estate listings
         | will adopt Gaussian splatting soon. Many newer listings include
         | photogrammetry already, and splatting will provide a
         | significant improvement to existing solutions.
        
           | fallat wrote:
           | I'm just starting to use photogrammetry in hobbyist game maps to
           | provide "chill maps". Unfortunately splatting will not be
           | possible there.
        
         | dwallin wrote:
         | There's a lot of use cases, and companies already using
         | photogrammetry techniques have adopted it extremely quickly,
         | but for many of these businesses the usage of gaussian splats
         | is just a technical implementation detail, resulting in quality
         | / feature gains, but doesn't as of now unlock entirely new
         | business models.
        
           | jesol wrote:
           | It's true, Gaussian Splatting is just an alternative to
           | meshing a pointcloud for companies which currently rely on
           | photogrammetry or lidar (Lidar works well as a basis for
           | splatting when there are reference images taken as part of the
           | scan). But I think that misses all the new opportunities that
           | exist with Gaussian Splatting, which really just don't exist
           | with existing techniques.
           | 
           | Gaussian Splats are able to handle more heterogeneous
           | information sources, allowing more sources to help splat an
           | environment. Devices like drones, surveillance cameras, or
           | autonomous systems can be used to create or incrementally
           | update a Gaussian Splat; and there's interesting work to
           | allow them to locate themselves within the splat, not just to
           | show themselves but also to place vision ML outputs into it
           | (such as object detection or segmentation results).
           | 
           | Up till now nearly all digital representations of physical
           | environments are either based off the original designs (by
           | things like CAD or BIM files), or are an approximation of the
           | environment (from photogrammetry or Lidar scans). CAD and BIM
           | files suffer from drift, the real environment almost never
           | perfectly matches the design files, small (and large) changes
           | are made; and many times those files aren't even available if
           | the structure isn't new. Photogrammetry and Lidar scans
           | struggle because their output is a pointcloud, and it's very
           | difficult to accurately mesh a pointcloud (Matterport only
           | partially solved this problem and sold for $1.6B). Gaussian
           | Splats overcome these issues; they're comparatively easy to
           | generate for any environment, and allow for very accurate and
           | easy viewing from any angle.
           | 
           | I think the Digital Twin space will be turned upside down,
           | and they could potentially even cause huge changes in
           | autonomous and semi-autonomous factories, warehouses, and
           | depots. A single Gaussian Splat could be the source of truth
           | that many autonomous vehicles update through their separate
           | SLAM systems. Operators then would have access to this splat
           | (and its history) as a source of truth for the environment.
           | Then, using techniques like iComMa[1], it may be possible to
           | directly align XR devices into the Gaussian Splat; allowing
           | operators direct access to location-based information
           | generated by the environment.
           | 
           | That's a lot of words to say: Gaussian Splatting is a very
           | neat new technology that could really underpin many future
           | technologies. I'm really excited about it.
           | 
           | [1]: https://yuansun-xjtu.github.io/iComMa.io/
        
             | dwallin wrote:
             | I do agree that new use cases are emerging and it will
             | probably enable tons of new businesses. I'm very gung-ho
             | about the technology myself as well. I guess what I'm
             | trying to say is that the new businesses that emerge
             | because of this are not necessarily going to advertise that
             | they use gaussian splats to do it, it's not a buzzy enough
             | term, and many of the industries it serves just care about
             | the results it delivers. Your average tech person is
             | unlikely to hear much about it. Your average graphics
             | engineer will have probably heard about it, but not know
             | about all the use cases that are leveraging it. And your
             | average person in the industry it is changing won't know
             | what is causing the change (they will probably assign it to
             | the nebulous ai bucket). I fully expect gaussian splats to
             | be a quiet revolution.
        
               | jesol wrote:
               | Yeah, I see your point. I'd be surprised if Gaussian
               | Splatting didn't make it into the advertising for Digital
               | Twin services if/when they add it (like Bentley's iTwin or
               | Dassault's Virtual Twin). Whether that translates more
               | broadly into the market, I don't know.
               | 
               | On the other hand, I'm playing with the idea of a
               | platform which provides a Gaussian Splat based Digital
               | Twin of an environment so other systems can utilize it to
               | share location-based information. Even though I don't
               | think it'll be possible to build without utilizing
               | Gaussian Splatting; splatting may not end up in any of
               | the pitches or advertising directly.
        
         | golergka wrote:
         | I may be mistaken, but it looks like it would only work for
         | completely static scenes with pre-baked lighting.
        
           | crubier wrote:
           | There has been an implementation of GS for video since day
           | one.
           | 
           | They have also included spherical harmonics for view
           | dependence since day one.
        
           | dwallin wrote:
           | It's important to disentangle this specific approach of
           | generating a gaussian splatting model from 2d images, with
           | gaussian splats as a general rendering technique.
           | 
           | There is nothing that prevents gaussian splatting from being
           | used dynamically. There are a variety of approaches to extend
           | gaussian splats into the time dimension to capture and
           | represent a 3d scene over time. The challenges here are about how
           | to capture sufficient scene data (or use ai to fill in
           | insufficient data) and how to compress it. There are also
           | techniques that enable dynamic simulations, or real time
           | animation of collections of splats.
           | 
           | Also, adding un-baked lighting to gaussian splats is not
           | particularly hard, you can already throw splats into several
           | game engines / 3d renderers and add new lights to them. The
           | hard part of relighting is taking an existing capture of a
           | scene with baked-in lighting and deriving the resulting
           | material properties and lighting sources. This isn't directly
           | related to gaussian splats themselves though, you would have
           | a similar problem recovering the base materials and lights
           | from a 3d mesh with baked-in lighting textures. This really
           | falls under a separate category of techniques called "inverse
           | rendering". If anything, gaussian splats give us a new tool
           | to help with these sorts of problems.
           | 
           | Honestly the biggest remaining roadblocks to more elaborate
           | and widespread uses of gaussians as a rendering method are
           | probably storage and performance related. And I'm optimistic
           | these will be convincingly solved, triangle rasterization has
           | had many orders of magnitude more research, optimization, and
           | custom hardware built around it.
        
         | crazygringo wrote:
         | Are there any apps that use gaussian splatting in VR?
         | 
         | Is there something I can download on my Meta Quest to try it
         | out?
        
           | dwallin wrote:
           | I've heard about this one but don't have an appropriate setup
           | to test it myself: https://www.gracia.ai/
        
         | 3abiton wrote:
         | How does it benefit VR specifically?
        
       | vessenes wrote:
       | OK, I read the paper, agree the results look good, like the idea
       | of better formal grounding for how to choose where your splats
       | are, and ... I still have no idea what that top image is of. Is
       | it the distribution of where they put initial splats for any
       | given 2D image?? Why does the caption mention buildings? I'm
       | really lost.
        
         | barnabask wrote:
         | It's a video. Right click and click play.
        
       | Vt71fcAqt7 wrote:
       | >Unlike existing approaches to 3D Gaussian Splatting, we propose
       | to interpret the training process of placing and optimizing
       | Gaussians as a sampling process. Rather than defining a loss
       | function and simply taking steps towards a local minimum, we
       | define a distribution G which assigns high probability to
       | collections of Gaussians which faithfully reconstruct the
       | training images.
       | 
       | What is the practical difference here? MCMC itself samples more
       | from higher probabilities than lower ones (i.e. towards a local
       | minimum). Is it just that we sample more from lower ends of the
       | distribution? Or is it more about formalizing the previous
       | algorithm so that it is easier to play with the different
       | parameters? (eg. the acceptance threshold)
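
One way to see the practical difference is a toy comparison of an optimizer and a sampler on a two-mode target. This is a sketch under stated assumptions, not the paper's procedure: it uses random-walk Metropolis (with an explicit accept/reject step) on a hypothetical 1D mixture, and `MODES`, `SIGMA`, `proposal_std`, and the function names are all made up for illustration.

```python
import math
import random

MODES = (-1.5, 1.5)  # hypothetical mode locations
SIGMA = 0.5          # shared standard deviation of both components


def density(x):
    """Unnormalized equal-weight mixture of two Gaussians."""
    return sum(math.exp(-0.5 * ((x - m) / SIGMA) ** 2) for m in MODES)


def grad_log_density(x):
    """Gradient of log density(x), weighting each component by its share."""
    weights = [math.exp(-0.5 * ((x - m) / SIGMA) ** 2) for m in MODES]
    total = sum(weights)
    return sum(w * -(x - m) / SIGMA**2 for w, m in zip(weights, MODES)) / total


def gradient_ascent(x0, step=0.05, n_steps=500):
    """Climb to the nearest local maximum of the log-density."""
    x = x0
    for _ in range(n_steps):
        x += step * grad_log_density(x)
    return x


def metropolis(x0, n_steps=20_000, proposal_std=2.0, seed=0):
    """Random-walk Metropolis: accept a move with probability min(1, p'/p)."""
    rng = random.Random(seed)
    x = x0
    samples = []
    for _ in range(n_steps):
        candidate = x + rng.gauss(0.0, proposal_std)
        if rng.random() < density(candidate) / density(x):
            x = candidate
        samples.append(x)
    return samples


# Both methods start next to the left mode.
x_opt = gradient_ascent(-1.0)
samples = metropolis(-1.0)
fraction_right = sum(s > 0 for s in samples) / len(samples)
```

Started at the same point, `gradient_ascent` stays at the nearby left mode, while the Metropolis chain spends a substantial share of its time in each mode (`fraction_right` measures the share for the right one). A sampler visits configurations in proportion to their probability instead of committing to whichever local optimum is closest.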
        
       ___________________________________________________________________
       (page generated 2024-06-18 23:00 UTC)