[HN Gopher] Real-Time Coherent 3D Reconstruction from Monocular ...
___________________________________________________________________
Real-Time Coherent 3D Reconstruction from Monocular Video
Author : samber
Score : 88 points
Date : 2022-03-14 17:50 UTC (5 hours ago)
(HTM) web link (zju3dv.github.io)
(TXT) w3m dump (zju3dv.github.io)
| stefan_ wrote:
| Looked cool, then I read that there is some Apple ARKit magic
| black box in the middle of it all.
| cmelbye wrote:
| I don't think that's true. The paper says that a camera pose
| estimated by a SLAM system is required. ARKit implements SLAM
| and can easily provide camera pose for each frame through the
| ARFrame class. But there are countless other implementations of
| SLAM, including Android ARCore, Oculus Quest, Roomba, self-
| driving cars, and a number of GitHub repos
| (https://github.com/tzutalin/awesome-visual-slam).
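|
| Whatever the SLAM source, the reconstruction side just consumes,
| per frame, an image plus intrinsics and a camera-to-world pose
| (on iOS both come from the ARFrame's camera). A rough sketch of
| that per-frame interface, with made-up names, in Python:
|
|     # Illustrative only: the per-frame data any SLAM tracker
|     # (ARKit, ARCore, ORB-SLAM, ...) hands to a reconstruction
|     # backend.
|     from dataclasses import dataclass
|     import numpy as np
|
|     @dataclass
|     class PosedFrame:
|         rgb: np.ndarray           # H x W x 3 color image
|         K: np.ndarray             # 3 x 3 camera intrinsics
|         cam_to_world: np.ndarray  # 4 x 4 pose from the tracker
|
|     def backproject(f, u, v, depth):
|         """Lift pixel (u, v) at a given depth into world space."""
|         x = (u - f.K[0, 2]) * depth / f.K[0, 0]
|         y = (v - f.K[1, 2]) * depth / f.K[1, 1]
|         return (f.cam_to_world @ np.array([x, y, depth, 1.0]))[:3]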
| fxtentacle wrote:
| Yeah, I also find it odd to use LIDAR-based poses and then
| call it "monocular".
| spyder wrote:
| It's not exactly the same, but Neural Radiance Fields are
| getting more impressive:
|
| The first one was this, but it was slow:
| https://www.matthewtancik.com/nerf
|
| Then it got faster: https://www.youtube.com/watch?v=fvXOjV7EHbk
|
| Lots of interesting papers:
|
| https://github.com/yenchenlin/awesome-NeRF
| alhirzel wrote:
| Does anyone know what the state of the art is for doing this type
| of reconstruction as a streaming input to detection and
| recognition algorithms? For instance, this could be used for
| object detection and identification on a recycling conveyor line.
| beambot wrote:
| I don't believe either of those does reconstruction... but for
| the recycling application, there are a handful of companies
| tackling this problem -- e.g. Everest Labs & AMP Robotics.
| leobg wrote:
| So much for the folks who think Tesla is on a fool's errand when
| they're using cameras instead of LIDAR.
| kajecounterhack wrote:
| Companies like Waymo and Cruise use this kind of technology
| too. Unfortunately there are tons of corner cases of weird
| things you haven't seen before -- for example, some special
| vehicles self-occlude and you never get enough coverage to
| observe them correctly until you're too close. In general,
| radars and lidars used in _conjunction_ with cameras can handle
| occluded objects much better.
|
| Also, to measure the performance / evaluate observations
| generated from this tech, you would want to compare it to a
| pretty sizable 3D ground truth set which Tesla does not
| currently have. There are pretty big advantages to starting
| with a maximal set of sensors even if (eventually)
| breakthroughs turn them into unnecessary crutches.
| leobg wrote:
| That was very insightful. Do you work in that space? It is
| comments like yours that make HN a special place.
| ceejayoz wrote:
| The failure mode (for example: decapitation;
| https://www.latimes.com/business/la-fi-tesla-florida-
| acciden...) is pretty significant when used in a Tesla. Less so
| in this tech demo.
| billconan wrote:
| Can ARKit return accurate camera positions?
| upbeat_general wrote:
| I haven't looked at any metrics, but based on using ARKit
| applications (and various VIO SLAM implementations), it can;
| accuracy depends heavily on the scene, the camera motion, and
| whether LIDAR/stereo depth is available.
| AndrewKemendo wrote:
| Honestly this doesn't look any better than what we were doing
| back in 2016-2017. I'm not sure what's novel here.
|
| This is the only video I could find, but we were doing monocular
| reconstruction from a limited number of RGB (not depth) images
| AND doing voxel segmentation on the processing side.
| https://www.youtube.com/watch?v=nqy44VSWh3g
|
| Even as far back as 2010 people were doing reasonable monocular
| reconstruction, including software like Meshroom etc. The group
| at TU Munich under Matthias Niessner has also been doing this
| for a while.
|
| What's novel here?
| tintor wrote:
| Fast enough to be used for mobile robots?
| nobbis wrote:
| Their research doesn't just integrate depth maps into a TSDF -
| it uses NNs to incorporate surface priors.
|
| I don't recall you having similar real-time meshing
| functionality in 2016-2017, Andrew. Can you show what you had?
|
| As far as I'm aware, Abound was the first to demo real-time
| monocular mobile meshing: on Android in early 2017 (e.g.
| https://www.youtube.com/watch?v=K9CpT-sy7HE), and iOS in early
| 2018 (e.g.
| https://twitter.com/nobbis/status/972298968574013440).
| pj_mukh wrote:
| Looks like a much better response to white walls / textureless
| surfaces.
| fxtentacle wrote:
| This is a paper about a new way of storing/merging 3D data.
|
| The actual 3D reconstruction is so-so, I agree. And they kinda
| cheat by using ARKit (which uses LIDAR internally) to get good
| camera poses even if there is little texture.
|
| So the novel part here is that they can immediately merge all
| the images into a coherent representation of the 3D space, as
| opposed to first doing bundle adjustment, then doing pairwise
| depth matching, then doing streak-based depth matching, and
| then merging the resulting point clouds.
|
| Also, they can use learned 3D shape priors to improve their
| results. Basically that means "if there is no visible gap,
| assume the surface is flat". But AFAIK, that's not new.
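|
| Concretely, a "learned shape prior" here amounts to something
| like a small 3D network that takes noisy or partial TSDF values
| and outputs completed ones. A toy sketch of that idea (not the
| paper's architecture):
|
|     import torch
|     import torch.nn as nn
|
|     class TSDFPrior(nn.Module):
|         """Toy shape prior: denoise/complete a coarse TSDF chunk."""
|         def __init__(self):
|             super().__init__()
|             self.net = nn.Sequential(
|                 nn.Conv3d(1, 16, 3, padding=1), nn.ReLU(),
|                 nn.Conv3d(16, 16, 3, padding=1), nn.ReLU(),
|                 nn.Conv3d(16, 1, 3, padding=1), nn.Tanh(),  # keep in [-1, 1]
|             )
|
|         def forward(self, tsdf):   # (B, 1, D, H, W), values in [-1, 1]
|             return self.net(tsdf)  # refined TSDF, same shape
|
| The network ends up encoding statistics like "flat where no gap
| is visible", which is what the prior buys you.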
|
| EDIT: My main criticism of this paper after looking at the
| source code a bit would be that due to the TSDF, which is like
| a 3D voxel grid, they need insane amounts of GPU memory, or
| else the scenes need to be either very small or low resolution.
| That is most likely also the reason why the reconstruction
| looks so cartoon-like and is smoothed at all corners: they lack
| the memory to store more high-frequency detail.
|
| EDIT2: Mainly, it looks like they managed to reduce the GPU
| memory consumption of Atlas [1], which is why they can
| reconstruct larger areas and/or at higher resolution. But it
| still captures far less detail than Colmap [2].
|
| [1] https://github.com/magicleap/Atlas
|
| [2] https://colmap.github.io/
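|
| To make the memory point concrete, here is a minimal dense-TSDF
| fusion sketch in numpy. It is illustrative only (not the paper's
| code; grid size, voxel size, and truncation distance are made-up
| numbers), but it shows both the incremental per-frame integration
| and why a plain dense grid blows up in memory:
|
|     import numpy as np
|
|     VOXEL = 0.04                # 4 cm voxels (made-up)
|     DIMS = (256, 256, 128)      # ~10 m x 10 m x 5 m of space
|     TRUNC = 0.12                # truncation distance in meters
|
|     # Two float32 grids for 256*256*128 voxels is ~67 MB; the
|     # same volume at 1 cm voxels is 64x that, roughly 4 GB.
|     sdf = np.ones(DIMS, dtype=np.float32)
|     weight = np.zeros(DIMS, dtype=np.float32)
|
|     def integrate(depth, K, cam_to_world):
|         """Fuse one posed depth map (standard weighted TSDF update)."""
|         world_to_cam = np.linalg.inv(cam_to_world)
|         # World position of every voxel center (grid origin at 0,0,0).
|         pts = np.indices(DIMS).reshape(3, -1).T * VOXEL
|         cam = pts @ world_to_cam[:3, :3].T + world_to_cam[:3, 3]
|         z = cam[:, 2]
|         valid = z > 1e-6
|         u = np.full(z.shape, -1)
|         v = np.full(z.shape, -1)
|         u[valid] = np.round(cam[valid, 0] * K[0, 0] / z[valid] + K[0, 2]).astype(int)
|         v[valid] = np.round(cam[valid, 1] * K[1, 1] / z[valid] + K[1, 2]).astype(int)
|         ok = valid & (u >= 0) & (v >= 0) & \
|              (u < depth.shape[1]) & (v < depth.shape[0])
|         d = np.zeros_like(z)
|         d[ok] = depth[v[ok], u[ok]]
|         near = ok & (d > 0) & (d - z > -TRUNC)  # close to a surface
|         dist = np.clip((d[near] - z[near]) / TRUNC, -1.0, 1.0)
|         s, w = sdf.reshape(-1), weight.reshape(-1)  # views into grids
|         s[near] = (s[near] * w[near] + dist) / (w[near] + 1)
|         w[near] += 1
|
| Each posed depth map is folded in as it arrives, so there is no
| separate bundle-adjust / pairwise-match / merge pass; the price
| is that resolution is capped by the grid's memory footprint.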
| closetnerd wrote:
| Says it's real-time
| AndrewKemendo wrote:
| 2016 from Matthias Niessner's group
|
| https://www.youtube.com/watch?v=keIirXrRb1k
|
| http://graphics.stanford.edu/projects/bundlefusion/
| jonas21 wrote:
| That requires depth input
| AndrewKemendo wrote:
| Good point. I don't recall offhand which paper was the mono-RT
| one.
|
| At a minimum, though, 6D.ai and a few other companies were
| selling this as a service at least as far back as 2017.
| fxtentacle wrote:
| I always found ORB-SLAM2 pretty impressive; it can map 3D
| neighborhoods in real time while you drive around in a car:
|
| https://www.youtube.com/watch?v=ufvPS5wJAx0
|
| https://www.youtube.com/watch?v=3BrXWH6zRHg
| polishdude20 wrote:
| Shame there's no Android or iPhone app available
| adampk wrote:
| I am surprised that the team didn't choose to append "Fusion"
| to the name.
|
| This seems to fit into the genealogy of KinectFusion,
| ElasticFusion, BundleFusion, etc.
|
| https://www.microsoft.com/en-us/research/wp-content/uploads/...
| https://www.imperial.ac.uk/dyson-robotics-lab/downloads/elas...
| https://graphics.stanford.edu/projects/bundlefusion/
|
| Very impressive work. I have not seen any use cases for online 3D
| reconstruction unfortunately. 6D.ai made terrific progress in
| this tech but also could not find great use cases for online
| reconstruction and ended up having to sell to Niantic.
|
| Seems like what people want, if they want 3D reconstruction, is
| extremely high-fidelity scans (a la Matterport), and they are
| willing to wait for the model. Unfortunately, TSDF approaches
| create a "slimy" end look, which isn't usually what people are
| after if they want an accurate 3D reconstruction.
|
| It SEEMS like _online_ 3D reconstruction would be helpful, but I
| have yet to see a use case for "online"...
| [deleted]
| tintor wrote:
| Use case: Mobile robotics, lidar replacement in self-driving
| vehicles
| tonyarkles wrote:
| I'm very curious to see how well this would work for online
| terrain reconstruction. I've got a drone with a pretty powerful
| onboard computer and it's always nice to be able to solve and
| tune problems with software instead of additional (e.g. LIDAR)
| hardware.
___________________________________________________________________
(page generated 2022-03-14 23:00 UTC)