[HN Gopher] Gaussian Splatting SLAM
___________________________________________________________________
Gaussian Splatting SLAM
Author : shevis
Score : 81 points
Date : 2024-08-12 04:41 UTC (18 hours ago)
(HTM) web link (rmurai.co.uk)
(TXT) w3m dump (rmurai.co.uk)
| andybak wrote:
| This claims to work with monocular or RGB-D input, but the only
| live demo is for an Intel RealSense D455 RGB-D camera. That
| seems a shame, as it significantly raises the bar for people to
| try it out themselves. (Can you even still buy the D455?)
| markisus wrote:
| I assumed that the algorithm only uses the RGB sensor and the
| depth is ignored. I bought two d455 within the past year and
| they are quite nice.
| dcanelhas wrote:
| Seems it can use both, if available.
| jacoblambda wrote:
| Their model works on both. RGB only performance is comparable
| to other depth based methods which is why they emphasized it
| over the RGB-D version.
|
| Direct paper link for ref: https://arxiv.org/pdf/2312.06741
| pzo wrote:
| Most iPhones since the iPhone X have a TrueDepth camera sensor
| on the front that could be used instead of a RealSense, I
| guess. However, since the iPhone 13, those depth maps have
| become noisier. You can try record3d to stream such RGB-D data
| to your laptop via USB and use it in Python:
| https://github.com/marek-simonik/record3d/
| totalview wrote:
| I love the "3D Gaussian Visualisation" section that illustrates
| the difference between photos of the mono data and the splat
| data. Under the hood, the splats are like a giant point cloud,
| except that unlike point clouds, where every point has a
| uniform size, each splat has its own size.
|
| This is all well and good when you are just using it for a
| pretty visualization, but it appears Gaussians have the same
| weakness as point clouds processed with structure from motion,
| in that you need lots of camera angles to get quality surface
| reconstruction accuracy.
| jacoblambda wrote:
| > This is all well and good when you are just using it for a
| pretty visualization, but it appears Gaussians have the same
| weakness as point clouds processed with structure from motion,
| in that you need lots of camera angles to get quality surface
| reconstruction accuracy.
|
| The paper actually suggests the opposite: Gaussian splats
| outperform point clouds and other methods when given the same
| amount of data. And not just by a little bit, but ridiculously
| so.
|
| Their Gaussian splatting based SLAM variants with RGB-D and RGB
| (no depth) camera input both outperform essentially everything
| else and are SOTA (state-of-the-art) for the field. RGB-D
| obviously outperforms RGB but RGB data when used with gaussian
| splatting performs comparably to or beats the competition even
| when they are using depth data.
|
| On top of that, their metrics outperform everything else except
| for systems operating on literal ground-truth data, and even
| then they come within a few percent of those ground-truth
| models.
|
| And importantly, where most other models run at ~0.2-3 fps,
| this model runs several orders of magnitude faster, at an
| average of 769 fps. While higher fps doesn't mean much past a
| certain point, it does mean you can do SLAM on much weaker
| hardware while still guaranteeing a WCET (worst-case execution
| time) below the frame time.
|
| So this actually is a massive advancement in the SOTA since
| gaussians let you very quickly and cheaply approximate a lot of
| information in a way you can efficiently compare against and
| refine against the current inputs from sensors.
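To make the "point cloud with per-splat sizes" picture above concrete: each 3D Gaussian stores a mean plus an anisotropic covariance built from a per-splat scale and rotation, so every splat can have its own size and orientation. A minimal numpy sketch (illustrative only, not the paper's implementation; the names and values are made up):

```python
import numpy as np

def quat_to_rot(q):
    # Convert a unit quaternion (w, x, y, z) to a 3x3 rotation matrix.
    w, x, y, z = q
    return np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
    ])

def splat_covariance(scale, quat):
    # A splat's covariance is R * S * S^T * R^T: per-splat scale and
    # rotation give each Gaussian its own size and orientation,
    # unlike a point cloud where every point is the same size.
    R = quat_to_rot(quat)
    S = np.diag(scale)
    return R @ S @ S.T @ R.T

def gaussian_density(x, mean, cov):
    # Unnormalised Gaussian falloff used when blending splats.
    d = x - mean
    return float(np.exp(-0.5 * d @ np.linalg.solve(cov, d)))

# An elongated splat: long along x, thin along y and z.
mean = np.zeros(3)
cov = splat_covariance(np.array([0.5, 0.05, 0.05]),
                       np.array([1.0, 0.0, 0.0, 0.0]))

# Density falls off slowly along the long axis, fast across it.
along = gaussian_density(np.array([0.1, 0.0, 0.0]), mean, cov)
across = gaussian_density(np.array([0.0, 0.1, 0.0]), mean, cov)
```

The real renderer also stores opacity and view-dependent colour per splat and alpha-blends them in depth order, but the anisotropic covariance above is what distinguishes splats from uniform-size points.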
| dwrodri wrote:
| Tangentially related to the post: I have what I think is a
| related computer vision problem I would like to solve and need
| some pointers on how you would go about doing it.
|
| My desk is currently set up such that I have a large monitor in
| the middle. I'd like to look at the center of the screen when
| taking calls. I'd also like it to appear as though I am looking
| straight into the camera, and the camera is pointed at my face.
| Obviously, I cannot physically place the camera right in front
| of the monitor, as that would be seriously inconvenient. Some
| laptops solve this, but I don't think their methods apply here,
| as the top of my monitor ends up being quite a bit higher than
| what would look "good" for simple eye correction.
|
| I have multiple webcams that I can place around the monitor to
| my liking. I would like to have something similar to what is
| seen when you open this webpage, but for video, and hopefully
| at higher quality since I'm not constrained to a monocular
| source.
|
| I've dabbled a bit with OpenCV in the past, but the most I've
| done is a little camera calibration for de-warping fisheye
| lenses. Any ideas on what work I should look into to get started
| with this?
|
| In my head, I'm picturing two camera sources: one above and one
| below the monitor. The "synthetic" projected perspective would be
| in the middle of the two.
|
| Is capturing a point cloud from a stereo source and then
| reprojecting with splats the most "straightforward" way to do
| this? Any and all papers/advice are welcome. I'm a little rusty
| on the math side but I figure a healthy mix of Szeliski's
| Computer Vision, Wolfram Alpha, a chatbot, and of course
| perseverance will get me there.
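The two-camera idea above is, at its core, back-projection and re-projection with pinhole geometry: get depth from the stereo pair (e.g. OpenCV's StereoSGBM), lift pixels to 3D, then project into a virtual camera midway between the real ones. A toy numpy sketch, with made-up intrinsics and a hypothetical 0.3 m vertical baseline (no rotation between cameras, for brevity):

```python
import numpy as np

# Hypothetical shared pinhole intrinsics for all three cameras.
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])

def backproject(u, v, depth, K):
    # Pixel (u, v) with metric depth -> 3D point in camera frame.
    x = (u - K[0, 2]) * depth / K[0, 0]
    y = (v - K[1, 2]) * depth / K[1, 1]
    return np.array([x, y, depth])

def project(p, K, cam_center):
    # 3D point -> pixel in a camera translated by cam_center.
    q = K @ (p - cam_center)
    return q[:2] / q[2]

# Top camera at the origin, bottom camera 0.3 m below it
# (image y points down), virtual camera halfway between them.
c_virtual = np.array([0.0, 0.15, 0.0])

# A point seen at the top camera's principal point, 1 m away...
p = backproject(320.0, 240.0, 1.0, K)
# ...appears above the centre of the virtual mid view (smaller v).
uv = project(p, K, c_virtual)
```

The hard parts this sketch skips are exactly the ones splatting papers address: filling the holes where neither real camera sees a surface, and blending the two warped images without seams.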
| com2kid wrote:
| This is a solved problem on some platforms (Zoom and Teams),
| which alter your eyes so they look like they are staring into
| the camera. Basically you drop your monitor down low (so the
| camera is more centered on your head) and let software fix your
| eyes.
|
| If you want your head to actually be centered, there are also
| some "center screen webcams" that exist that plop into the
| middle of your screen during a call. There are a few types,
| thin webcams that drape down, and clear "webcam holders" that
| hold your webcam at the center of your screen, which are a bit
| less convenient.
|
| Nvidia also has a software package you can use, but I believe
| it is a bit fiddly to get set up.
| dwrodri wrote:
| > Some laptops solve this, but I don't think their methods
| apply here, as the top of my monitor ends up being quite a
| bit higher than what would look "good" for simple eye
| correction.
|
| I appreciate the pragmatism of buying another thing to solve
| the problem but I am hoping to solve this with stuff I
| already own.
|
| I'd be lying if I said the nerd cred of overengineering the
| solution wasn't attractive as well.
| pedalpete wrote:
| Have you seen the work done with multiple Kinect cameras in
| 2015? https://www.inavateonthenet.net/news/article/kinect-
| camera-a...
|
| Creating a depth map from a monocular camera is now possible,
| so that may help you get further with this.
| Dig1t wrote:
| I would love to use something like this to make a video game.
|
| Are there any examples or algorithms that can turn this into 3D
| objects that could be used in a video game? Any examples of
| someone doing that?
| shevis wrote:
| I actually stumbled across this paper while researching exactly
| that question! A reliable method for transforming gaussians
| into geometry seems like it could dramatically change the
| gamedev asset pipeline.
| Dig1t wrote:
| I agree! Also I think it could open up entire new types of
| games.
___________________________________________________________________
(page generated 2024-08-12 23:01 UTC)