[HN Gopher] Gaussian Splatting SLAM
       ___________________________________________________________________
        
       Gaussian Splatting SLAM
        
       Author : shevis
       Score  : 81 points
       Date   : 2024-08-12 04:41 UTC (18 hours ago)
        
 (HTM) web link (rmurai.co.uk)
 (TXT) w3m dump (rmurai.co.uk)
        
       | andybak wrote:
        | This claims to work with monocular or RGB+depth input, but the
        | only live demo is for an Intel RealSense D455 RGB-D camera.
        | That seems a shame, as it significantly raises the bar for
        | people to try it out themselves. (Can you even still buy the
        | D455?)
        
         | markisus wrote:
          | I assumed that the algorithm only uses the RGB sensor and
          | that the depth is ignored. I bought two D455s within the
          | past year and they are quite nice.
        
           | dcanelhas wrote:
           | Seems it can use both, if available.
        
           | jacoblambda wrote:
            | Their model works on both. RGB-only performance is
            | comparable to other depth-based methods, which is why they
            | emphasized it over the RGB-D version.
           | 
           | Direct paper link for ref: https://arxiv.org/pdf/2312.06741
        
         | pzo wrote:
         | Most iphones since iphone x have truedepth camera sensor in
         | front that could be used instead of realsense I guess. However
         | since iphone 13 quality of those depthmaps got more noisy. You
         | can try with record3d to stream such rgbd data to your laptop
         | via USB and use in python https://github.com/marek-
         | simonik/record3d/
        
       | totalview wrote:
       | I love the "3D Gaussian Visualisation" section that illustrates
       | the difference between photos of the mono data and the splat
        | data. The splats are like a giant point cloud under the hood,
        | except that unlike point-cloud points, which all have a
        | uniform size, each splat has its own size.
       | 
        | This is all well and good when you are just using it for a
        | pretty visualization, but it appears Gaussians have the same
        | weakness as point clouds processed with structure from
        | motion: you need lots of camera angles to get accurate
        | surface reconstruction.
        
         | jacoblambda wrote:
          | > This is all well and good when you are just using it for
          | a pretty visualization, but it appears Gaussians have the
          | same weakness as point clouds processed with structure from
          | motion: you need lots of camera angles to get accurate
          | surface reconstruction.
         | 
          | The paper actually suggests the opposite: that Gaussian
          | splats outperform point clouds and other methods when given
          | the same amount of data. And not just by a little, but
          | dramatically.
         | 
          | Their Gaussian-splatting-based SLAM variants with RGB-D and
          | RGB (no depth) camera input both outperform essentially
          | everything else and are SOTA (state-of-the-art) for the
          | field. RGB-D obviously outperforms RGB, but RGB data used
          | with Gaussian splatting performs comparably to, or beats,
          | the competition even when the competition is using depth
          | data.
         | 
          | Beyond that, their metrics outperform everything except
          | systems operating on literal ground-truth data, and even
          | then they come within a few percent of those ground-truth
          | models.
         | 
          | And importantly, where most other models run at ~0.2-3 fps,
          | this model runs orders of magnitude faster, at an average
          | of 769 fps. While higher fps doesn't mean much past a
          | certain point, this means you can do SLAM on much weaker
          | hardware while still guaranteeing a WCET (worst-case
          | execution time) below the frame time.
         | 
          | So this actually is a massive advance on the SOTA, since
          | Gaussians let you very quickly and cheaply approximate a
          | lot of information in a way that can be efficiently
          | compared against, and refined from, the current sensor
          | inputs.
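A back-of-the-envelope illustration of the frame-budget argument above. The 20x slowdown factor is made up purely for illustration, and average fps is of course only a proxy for a true worst-case bound:

```python
# If the tracker averages 769 fps, its per-frame cost (~1.3 ms) fits a
# 30 fps sensor budget (~33.3 ms) with large headroom -- so even hardware
# many times slower can still finish each frame within the frame time.
SENSOR_FPS = 30.0
frame_budget_ms = 1000.0 / SENSOR_FPS        # ~33.3 ms per camera frame

tracker_fps = 769.0
avg_cost_ms = 1000.0 / tracker_fps           # ~1.3 ms per processed frame

slowdown = 20.0                              # hypothetical weaker hardware
print(avg_cost_ms * slowdown < frame_budget_ms)  # True: still within budget
```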
        
       | dwrodri wrote:
       | Tangentially related to the post: I have what I think is a
       | related computer vision problem I would like to solve and need
       | some pointers on how you would go about doing it.
       | 
       | My desk is currently set up such that I have a large monitor in
       | the middle. I'd like to look at the center of the screen when
       | taking calls. I'd also like it to appear as though I am looking
       | straight into the camera, and the camera is pointed at my face.
        | Obviously, I cannot physically place the camera right in
        | front of the monitor, as that would be seriously
        | inconvenient. Some laptops solve this, but I don't think
        | their methods apply here, as the top of my monitor ends up
        | quite a bit higher than what would look "good" for simple
        | eye correction.
       | 
       | I have multiple webcams that I can place around the monitor to my
        | liking. I would like to have something similar to what you
        | see when you open this webpage, but for a video, and
        | hopefully at higher quality since I'm not constrained to a
        | monocular source.
       | 
       | I've dabbled a bit with OpenCV in the past, but the most I've
       | done is a little camera calibration for de-warping fisheye
       | lenses. Any ideas on what work I should look into to get started
       | with this?
       | 
       | In my head, I'm picturing two camera sources: one above and one
       | below the monitor. The "synthetic" projected perspective would be
       | in the middle of the two.
       | 
       | Is capturing a point cloud from a stereo source and then
       | reprojecting with splats the most "straightforward" way to do
       | this? Any and all papers/advice are welcome. I'm a little rusty
        | on the math side, but I figure a healthy mix of Szeliski's
        | Computer Vision, Wolfram Alpha, a chatbot, and of course
        | perseverance will get me there.
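The core geometry behind the two-camera idea above (whether the intermediate representation is splats or a raw point cloud) is: back-project a pixel with known depth into 3D, then re-project it into a virtual camera between the two real ones. Here is a minimal pinhole-model sketch with made-up intrinsics and a pure vertical translation between cameras; a real pipeline would first estimate depth, e.g. via stereo matching:

```python
def backproject(u, v, depth, fx, fy, cx, cy):
    """Pixel + depth -> 3D point in that camera's frame (pinhole model)."""
    return ((u - cx) * depth / fx, (v - cy) * depth / fy, depth)

def project(X, Y, Z, fx, fy, cx, cy):
    """3D point in a camera's frame -> pixel coordinates."""
    return (fx * X / Z + cx, fy * Y / Z + cy)

# Assumed shared intrinsics for both webcams (illustrative values only).
fx = fy = 800.0
cx, cy = 640.0, 360.0
baseline = 0.5   # metres between the top and bottom cameras (vertical)

# A pixel observed by the top camera at 2 m depth:
X, Y, Z = backproject(700, 400, 2.0, fx, fy, cx, cy)

# The virtual camera sits baseline/2 below the top camera (translation only,
# no rotation, for simplicity), so shift the point by -baseline/2 along the
# camera's y axis before projecting.
u_mid, v_mid = project(X, Y - baseline / 2, Z, fx, fy, cx, cy)
print(round(u_mid, 1), round(v_mid, 1))  # prints: 700.0 300.0
```

In practice OpenCV covers the heavy lifting around this: `cv2.calibrateCamera` / `cv2.stereoRectify` for the geometry, `cv2.StereoSGBM_create` for disparity, and `cv2.reprojectImageTo3D` to get the point cloud you would then re-render from the synthetic viewpoint.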
        
         | com2kid wrote:
         | This is a solved problem on some platforms (Zoom and Teams),
         | which alter your eyes so they look like they are staring into
         | the camera. Basically you drop your monitor down low (so the
         | camera is more centered on your head) and let software fix your
         | eyes.
         | 
         | If you want your head to actually be centered, there are also
         | some "center screen webcams" that exist that plop into the
         | middle of your screen during a call. There are a few types,
         | thin webcams that drape down, and clear "webcam holders" that
         | hold your webcam at the center of your screen, which are a bit
         | less convenient.
         | 
          | Nvidia also has a software package you can use, but I
          | believe it is a bit fiddly to set up.
        
           | dwrodri wrote:
            | > Some laptops solve this, but I don't think their
            | methods apply here, as the top of my monitor ends up
            | quite a bit higher than what would look "good" for
            | simple eye correction.
           | 
           | I appreciate the pragmatism of buying another thing to solve
           | the problem but I am hoping to solve this with stuff I
           | already own.
           | 
            | I'd be lying if I said the nerd cred of overengineering
            | the solution wasn't attractive as well.
        
         | pedalpete wrote:
         | Have you seen the work done with multiple Kinect cameras in
         | 2015? https://www.inavateonthenet.net/news/article/kinect-
         | camera-a...
         | 
          | Creating a depth field from a monocular camera is now
          | possible, so that may help you get further with this.
        
       | Dig1t wrote:
       | I would love to use something like this to make a video game.
       | 
        | Are there any algorithms that can turn this into 3D objects
        | usable in a video game? Any examples of someone doing that?
        
         | shevis wrote:
          | I actually stumbled across this paper while researching
          | exactly that question! A reliable method for transforming
          | Gaussians into geometry seems like it could dramatically
          | change the gamedev asset pipeline.
        
           | Dig1t wrote:
           | I agree! Also I think it could open up entire new types of
           | games.
        
       ___________________________________________________________________
       (page generated 2024-08-12 23:01 UTC)