[HN Gopher] HybridNeRF: Efficient Neural Rendering
___________________________________________________________________
HybridNeRF: Efficient Neural Rendering
Author : tzmlab
Score : 133 points
Date : 2024-06-22 14:21 UTC (8 hours ago)
(HTM) web link (haithemturki.com)
(TXT) w3m dump (haithemturki.com)
| ofou wrote:
| I'd spend hours navigating Street View with this.
| tmilard wrote:
| Everyone believes, I think, that light data for rendering
| plus fast 3D reconstruction ======> big winner.
|
| So many laboratories and software devs have taken a shot at
| this. None has won yet.
|
| Success often lies in small (but important) details...
| ttul wrote:
| Does anyone else look forward to a game that lets you turn
| your house or neighborhood into a playable level with
| destructible objects? How far are we from recognizing the
| "car" and making it drivable, or the "tree" and making it
| choppable?
| TeMPOraL wrote:
| I've dreamed of that since I was a kid, so for nearly three
| decades now. It was entirely possible even then - it was
| just a matter of applying enough elbow grease. The problem
| is, the world is full of shiny happy people ready to call
| you a terrorist, assert their architectural copyright, or
| bring in the "creepiness factor" to shut down anyone who
| tries this.
| jsheard wrote:
| There's also just the fact that 1:1 reproductions of real-
| world places rarely make for good video game environments.
| Gameplay has to inform the layout and set dressing, and the
| way space is perceived in games requires taking liberties
| to keep interiors from feeling weirdly cramped (any kid who
| had the idea to measure their house and build it in Quake
| or CS found this out the hard way).
|
| The main exception I can think of is racing simulators:
| it's already common for the developers of those to drive
| LiDAR cars around real-world tracks and use that data to
| build a 1:1 replica for their game. NeRF might be a natural
| extension of that if they can figure out a way to combine it
| with dynamic lighting and weather conditions.
| smokel wrote:
| Having destructible objects is in no way possible on
| contemporary hardware, unless you simplify the physics to the
| extreme. Perhaps I'm misunderstanding your statement?
|
| Recognising objects for what they are has only recently
| become somewhat possible. Separating them in a 3D scan is
| still pretty much impossible.
| idiotsecant wrote:
| Destructible environments have been a thing for like....a
| decade or so? There are plenty of tricks to make them
| realistic enough to be fun without simulating every
| molecule.
| swiftcoder wrote:
| We've had destructible polygonal and voxel environments
| for a while now, yes. Destructible NeRFs are a whole
| other ball game - we're only just starting to get a
| handle on reliably segmenting objects within NeRFs, let
| alone animating them.
| chefandy wrote:
| A whole lot of manual work goes into making destructible
| 3D assets. Combined, I've put nearly a full work week
| into perfecting a breaking bottle simulation in Houdini
| to add to my demo reel, and it's still not quite there.
| And that's starting out with nice clean geometry that I
| made myself! A lot of it comes down to breaking things up
| based on Voronoi tessellation of the surfaces, which is
| easy when you've got an eight-vertex cube, but it takes
| a lot more effort and is much more error-prone as the
| geometric complexity increases. If you can figure out how
| to easily turn real-world 3D scans into simple, realistic-
| looking, manifold geometry that's clean enough for
| standard 3D asset pipelines, you'll make a lot of money
| doing it.
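|
| To give a feel for the core fracture step (a minimal 2D
| sketch using scipy - nothing like a real Houdini setup):
|
|     import numpy as np
|     from scipy.spatial import Voronoi
|
|     # Scatter seed points inside the object's bounding box;
|     # the Voronoi cell around each seed becomes one fragment.
|     rng = np.random.default_rng(0)
|     seeds = rng.uniform(0.0, 1.0, size=(20, 2))
|     vor = Voronoi(seeds)
|
|     for i, region_idx in enumerate(vor.point_region):
|         region = vor.regions[region_idx]
|         if -1 in region or not region:
|             continue  # skip cells that extend to infinity
|         polygon = vor.vertices[region]
|         # A real pipeline would clip each cell against the
|         # object's surface and build a closed fragment mesh.
|         print(f"fragment {i}: {len(polygon)} vertices")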
| TeMPOraL wrote:
| Two decades - see _Red Faction_, which is a first-person
| shooter from 2001.
| TeMPOraL wrote:
| My statement applies even without the destructive
| environment part - even though that was already mainstream
| 23 years ago! See _Red Faction_. No, just making a real-
| life place a detailed part of a video game is going to
| cause the pushback I mentioned.
| totalview wrote:
| I work in the rendering and gaming industry and also run a 3D
| scanning company. I have similarly wished for this capability,
| especially the destructability part. What you speak of is still
| pretty far off for several reasons:
|
| - No collision / poor collision on NeRFs and GS: to have a
| proper interactive world, you usually need accurate character
| collision so that your character or vehicle can move along the
| floor/ground (as opposed to falling through it), run into
| walls, go through door frames, etc. NeRFs suffer from the same
| issues as photogrammetry in that they need "structure from
| motion" (COLMAP or similar) to produce a mesh or 3D output
| that collision can register against. The mesh from reality
| capture is noisy and is not simple geometry. Think millions of
| triangles from a laser scanner or camera for "flat" ground
| that a video game would use 100 triangles for.
|
| - Scanning: there's no scanner available that provides both
| good 3D information and good photorealistic textures at a
| price people will want to pay. Scanning every square inch of
| playable space in even a modest-sized house is a pain, and
| players will look behind the television, underneath the
| furniture, and everywhere else that most of these scanning
| videos and demos never go. There are a lot of ugly angles
| where a player would go that these videos omit.
|
| - Post-processing: if you scan your house or any other real
| space, you will have poor lighting unless you took the time
| to do your own custom lighting and color setup. That will all
| need to be corrected in post so that you can dynamically
| light your environment. Lighting is one of the most "next-
| gen" things that people associate with games, and you will
| be fighting prebaked shadows throughout the entire house or
| area that you have scanned. You don't get away from this with
| NeRFs or Gaussian splats, because those scenes also have
| static, prebaked lighting in them.
|
| - Object destruction and physics: I love the game Teardown,
| and if you want to see what it's like to actually bust up and
| destroy structures that have been physically scanned, there is
| a plug-in to import reality-capture models directly into the
| game with a little bit of modding. That said, Teardown is
| voxel-based, and is one of the most advanced engines that has
| been built to do such a thing. I have seen nothing else
| capable of doing cool-looking destruction of any object,
| scanned or 3D modeled, without a large studio effort and a
| ton of optimization.
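|
| To make the triangle-count point concrete, here's a minimal
| sketch of collapsing a scan mesh down to a collision proxy
| (assuming Open3D; "scan.ply" is a placeholder - this is an
| illustration, not our production pipeline):
|
|     import open3d as o3d
|
|     # Load a noisy, multi-million-triangle capture mesh.
|     mesh = o3d.io.read_triangle_mesh("scan.ply")
|     mesh.remove_duplicated_vertices()
|     mesh.remove_degenerate_triangles()
|
|     # Quadric edge-collapse decimation: keeps the overall
|     # shape while cutting triangles by orders of magnitude.
|     collision = mesh.simplify_quadric_decimation(
|         target_number_of_triangles=500)
|     o3d.io.write_triangle_mesh("scan_collision.ply", collision)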
| knicholes wrote:
| Maybe a quick, cheap NeRF with some object recognition, 3D
| object generation, and replacement, so at least you have a
| sink where there is a sink and a couch where you have a
| couch, even though it might look different.
| andybak wrote:
| I think you're generalising from exacting use cases to ones
| that might be much more tolerant of imperfection.
| modeless wrote:
| I think collision detection is solvable. And the scanning
| process should be no harder than 3D modeling to the same
| quality level. Probably much easier, honestly. Modeling is
| labor intensive. I'm not sure why you say "there's no scanner
| available that provides both good 3-D information and good
| photo realistic textures" because these new techniques don't
| use "scanners"; all you need is regular cameras. The 3D
| information is inferred.
|
| Lighting is the big issue, IMO. As soon as you want any kind
| of interactivity besides moving the camera you need dynamic
| lighting. The problem is you're going to have to mix the
| captured absolutely perfect real-world lighting with
| extremely approximate real-time computed lighting (which will
| be much worse than offline-rendered path tracing, which still
| wouldn't match real-world quality). It's going to look awful.
| At least, until someone figures out a revolutionary neural
| relighting system. We are pretty far from that today.
|
| Scale is another issue. Two issues, really: rendering and
| storage. There's already a lot of research into scaling up
| rendering to large and detailed scenes, but I wouldn't say
| it's solved yet. And once you have rendering, storage will be
| the next issue. These scans will be massive and we'll need
| some very effective compression to be able to distribute
| large scenes to users.
| totalview wrote:
| You are correct; most of these new techniques are using a
| camera. In my line of work I consider a camera sensor a
| scanner of sorts, as we do a lot of photogrammetry and
| "scan" with a 45MP full frame. The inferred 3D from cameras
| is pretty bad when it comes to accuracy, especially from
| dimly lit areas or where you dip into a closet or closed
| space that doesn't have a good structural tie back to the
| main space you are trying to recreate in 3D. Laser scanners
| are far preferable to tie your photo pose estimation to,
| and most serious reality capture for video games is done
| with both a camera and a $40,000+ LiDAR scanner. Have you
| ever tried to scan every corner of a house with only a
| traditional DSLR or point and shoot camera? I have and the
| results are pretty bad from a 3D standpoint without a ton
| of post process.
|
| The collision detection problem is heavily related to
| having clean 3D, as mentioned above. My company is doing
| development on computing collision on reality capture right
| now in a clean way and I would be interested in any
| thoughts you have. We are chunking collision on the dataset
| at a fixed distance from the player character (you can't go
| too fast in a vehicle or it will outpace the collision and
| fall through the floor) and have a tunable LOD that
| influences collision resolution.
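|
| Roughly, the chunk selection works like this (a simplified
| sketch of the idea, not our actual code; names are made up):
|
|     import math
|
|     def active_chunks(player_pos, chunk_size, radius, lod_bands):
|         """Pick which collision chunks to load, and at what
|         resolution. lod_bands is a list of (max_distance,
|         lod_level) pairs, nearest band first."""
|         px, py = player_pos
|         n = int(math.ceil(radius / chunk_size))
|         cx, cy = int(px // chunk_size), int(py // chunk_size)
|         chunks = {}
|         for ix in range(cx - n, cx + n + 1):
|             for iy in range(cy - n, cy + n + 1):
|                 # Distance from player to chunk center.
|                 dx = (ix + 0.5) * chunk_size - px
|                 dy = (iy + 0.5) * chunk_size - py
|                 d = math.hypot(dx, dy)
|                 if d > radius:
|                     continue
|                 # Nearer chunks get finer collision meshes.
|                 for max_d, lod in lod_bands:
|                     if d <= max_d:
|                         chunks[(ix, iy)] = lod
|                         break
|         return chunks
|
|     # Full-res within 30 m, half-res to 80 m, coarse to 200 m:
|     # active_chunks((12.0, 40.0), 16.0, 200.0,
|     #               [(30.0, 0), (80.0, 1), (200.0, 2)])
|
| The radius has to outpace the fastest vehicle, or the player
| falls through the world before collision streams in.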
| sneak wrote:
| My iPhone and my Apple Vision Pro both have LiDAR
| scanners, FWIW.
|
| Frankly I'm surprised that I can't easily make crude 3D
| models of spaces with a simple app presently. It seems
| well within the capabilities of the hardware and
| software.
| totalview wrote:
| Those LiDAR sensors on phones and VR headsets are low
| resolution and mainly used to improve the photos and
| depth information from the camera. That's a different
| objective than mapping a space, which is mainly being
| disrupted by improvements from the self-driving car and
| ADAS industries.
| sneak wrote:
| Magic Room for the AVP does a good enough job. Seems the
| low-resolution issue can be mitigated by repeated or
| closer scans.
| crazygringo wrote:
| I feel like the lighting part will become "easy" once we're
| able to greatly simplify the geometry and correlate it
| across multiple "passes" through the same space at
| different times.
|
| In other words, if you've got a consistent 3D geometric map
| of the house with textures, then you can do a pass in the
| morning with only daylight, one at midday, one in the late
| afternoon, and then one at night with artificial light.
|
| If you're dealing with textures that map onto identical
| geometries (and assume no objects move during the day), it
| seems like it ought to be relatively straightforward to
| train AIs to produce a flat, unlit texture version,
| especially since you can train them on easily generated
| raytraced renderings. There might even be straight-up
| statistical methods to do it.
|
| So I think it's not the lighting itself that is the biggest
| problem -- it's having the clean, consistent geometries in
| the first place.
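|
| As a toy version of the "statistical methods" idea (my own
| sketch, assuming aligned texture-space captures and smooth
| lighting; names are made up):
|
|     import numpy as np
|     from scipy.ndimage import gaussian_filter
|
|     def estimate_albedo(captures, sigma=25.0, eps=1e-4):
|         """captures: list of (H, W) grayscale images of the
|         same surface under different lighting passes."""
|         flattened = []
|         for img in captures:
|             img = np.asarray(img, dtype=np.float64)
|             # The low-frequency component approximates shading.
|             shading = gaussian_filter(img, sigma) + eps
|             # Dividing it out leaves roughly the albedo,
|             # up to scale.
|             flattened.append(img / shading)
|         # Median across passes rejects shadows/highlights
|         # present in only some captures.
|         return np.median(np.stack(flattened), axis=0)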
| w-m wrote:
| Recognizing what is a car in a 3D NeRF/Gaussian Splatting scene
| can be done. Also research from CVPR:
| https://www.garfield.studio/
| antihero wrote:
| I mean, depending on your risk aversion, we're not that far,
| nor have we ever been.
| naet wrote:
| My parents had a floor plan of our house drawn up for some
| reason, and when I was in late middle school I found it and
| modeled the house in the hammer editor so my friends and I
| could play Counter Strike source in there.
|
| It wasn't very well done, but I figured out how to make the
| basic walls and building, add stairs, add some windows, grab
| some pre-existing props like simple couches, beds, and a TV,
| and it was pretty recognizable. After adding a couple of
| ladders to the outside so you could climb in the windows or
| on the roof, the map was super fun just as a map, and doubly
| so since I could do things like hide in my own bedroom
| closet and recognize the rooms.
|
| Took some work since I didn't know how to do anything but
| totally worth it. I feel like there has to be a much more
| accessible level editor in some game out there today, not sure
| what it would be though.
|
| I thought my school had great architecture for another map but
| someone rightfully convinced me that would be a very bad idea
| to add to a shooting game. So I never made any others besides
| the house.
| 55555 wrote:
| What's the state of the art right now that can be run on my
| laptop from a set of photos? I want to play with NeRFs,
| starting by generating one from a bunch of photos of my
| apartment, so I can then fly around the space virtually.
| sorenjan wrote:
| Probably Nerfstudio
|
| https://docs.nerf.studio/
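|
| The basic flow from the docs is roughly this (COLMAP and
| ffmpeg need to be installed for the preprocessing step):
|
|     pip install nerfstudio
|     # estimate camera poses and prepare the images
|     ns-process-data images --data ./photos --output-dir ./scene
|     # train; this also serves a web viewer you can fly around in
|     ns-train nerfacto --data ./scene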
| turkihaithem wrote:
| One of the paper authors here - happy to answer any questions
| about the work or chat about neural rendering in general!
| refibrillator wrote:
| Congrats on the paper! Any chance the code will be released?
|
| Also I'd be curious to hear, what are you excited about in
| terms of future research ideas?
|
| Personally I'm excited by the trend of eliminating the need for
| traditional SfM preprocessing (sparse point clouds via colmap,
| camera pose estimation, etc).
| turkihaithem wrote:
| Thank you! The code is unlikely to be released (it's built
| upon Meta-internal codebases that I no longer have access to
| post-internship), at least not in the form that we
| specifically used at submission time. The last time I caught
| up with the team someone was expressing interest in releasing
| some broadly useful rendering code, but I really can't speak
| on their behalf so no guarantees.
|
| IMHO it's a really exciting time to be in the neural
| rendering / 3D vision space - the field is moving quickly and
| there's interesting work across all dimensions. My personal
| interests lean towards large-scale 3D reconstruction, and to
| that end eliminating the need for traditional SfM/COLMAP
| preprocessing would be great. There's a lot of relevant
| recent work (https://dust3r.europe.naverlabs.com/,
| https://cameronosmith.github.io/flowmap/,
| https://vggsfm.github.io/, etc), but scaling these methods
| beyond several dozen images remains a challenge. I'm also
| really excited about using learned priors that can improve
| NeRF quality in underobserved regions
| (https://reconfusion.github.io). IMO using these priors will
| be super important to enabling dynamic 4D reconstruction
| (since it's otherwise infeasible to directly observe every
| space-time point in a scene). Finally, making NeRF
| environments more interactive (as other posts have described)
| would unlock many use cases, especially in simulation (e.g.
| for autonomous driving). This is kind of tricky for implicit
| representations (like the original NeRF and this work), but
| there have been some really cool papers in the 3D Gaussian
| space (https://xpandora.github.io/PhysGaussian/) that are
| exciting.
| aerodog wrote:
| Thanks for your work!
|
| From my experience, NeRF works great but depends on highly
| accurate camera location information. Unless the VR device
| has this baked in, one must run a COLMAP-style or SfM-style
| process to generate those camera extrinsics. Is there
| anything special HybridNeRF does around this?
| modeless wrote:
| The method in this paper relies on precomputed camera poses
| as input, but there have been tons of papers published on the
| topic of eliminating this requirement. Here are a few:
| https://dust3r.europe.naverlabs.com/
| https://arxiv.org/abs/2102.07064
| https://arxiv.org/abs/2312.08760v1
| https://x.com/_akhaliq/status/1734803566802407901
| turkihaithem wrote:
| Your understanding is correct!
| dr_dshiv wrote:
| Wow, it looks beautiful.
|
| Can regular phones capture the data required? How to get into
| this, as a hobbyist? I'm interested in the possibilities of
| scanning coral reefs and other ecological settings.
| turkihaithem wrote:
| One of the datasets we evaluated against in our paper uses a
| bespoke capture rig (https://github.com/facebookresearch/EyefulTower?tab=readme-o...)
| but you can definitely train very
| respectable NeRFs using a phone camera. In my experience it's
| less about camera resolution and more about getting a good
| capture - many NeRF methods assume that the scene is static,
| so minimizing things like lighting changes and transient
| shadows can make a big difference. If you're interested in
| getting your feet wet, I highly recommend Nerfstudio
| (https://docs.nerf.studio)!
| lxe wrote:
| Absolute noob question that I'm having a hard time understanding:
|
| In practice, why NeRF instead of Gaussian Splatting? I have very
| limited exposure to either, but a very cursory search on the
| subject yields a "it depends on the context" answer. What exact
| context?
| zlenyk wrote:
| It's just a completely different paradigm of rendering, and
| it's not clear which one will be dominant in the future.
| Gaussian splats usually depend on initialisation from a
| point cloud, which makes the whole process much more
| complicated.
| GistNoesis wrote:
| There are two aspects to the difference between NeRF and
| Gaussian Splatting:
|
| - The first aspect concerns how they solve the light
| rendering equation:
|
| NeRF has more potential for physically accurate rendering
| but is slower.
|
| NeRF uses raycasting. Gaussian Splatting projects and draws
| Gaussians directly in screen space.
|
| Each has various rendering artefacts. One distinction is in
| handling light reflections. When you use raycasting, you can
| bounce your ray off mirror surfaces. Whereas Gaussian
| Splatting, like Alice in Wonderland, creates a symmetric
| world on the other side of the mirror (and when the mirror
| surface is curved, it's hopeless).
|
| Although many NeRFs omit reflections as a simplification,
| they can handle them almost natively.
|
| Alternatively: NeRF is a volumetric representation, whereas
| Gaussian Splatting has surfaces baked in. Gaussian splats
| are rendered in front-to-back order. This means that when
| you have two thin objects one behind the other, like the two
| sides of a book, Gaussian Splatting will be able to render
| the front and hide the back, whereas NeRF will merge front
| and back because volumetric elements are transparent.
| (Though in a NeRF with spherical harmonics, the view-
| dependent radiance will allow culling back from front based
| on the viewing angle.)
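|
| For intuition, here's the standard front-to-back volumetric
| compositing NeRF does along one ray (a minimal numpy sketch;
| variable names are my own):
|
|     import numpy as np
|
|     def composite_ray(densities, colors, deltas):
|         """densities: (N,) sigma per sample along the ray;
|         colors: (N, 3) RGB per sample; deltas: (N,) spacing
|         between consecutive samples."""
|         # Opacity of each segment:
|         # alpha_i = 1 - exp(-sigma_i * delta_i)
|         alphas = 1.0 - np.exp(-densities * deltas)
|         # Transmittance: light surviving to reach sample i.
|         trans = np.cumprod(
|             np.concatenate([[1.0], 1.0 - alphas[:-1]]))
|         weights = trans * alphas
|         # Expected color: weighted sum of sample colors.
|         return (weights[:, None] * colors).sum(axis=0)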
|
| - The second aspect of NeRF vs Gaussian Splatting is the
| choice of representation:
|
| NeRF usually uses a neural network to store the scene in a
| compressed form, whereas Gaussian Splatting is more explicit
| and uncompressed; the scene is represented in a sort of
| "point cloud" fashion. This means that if your scene has
| potential for compression, like repetitive textures or
| objects, the NeRF will make use of it and hallucinate what's
| missing, whereas a Gaussian splat will show holes.
|
| Of course, as this article shows, you can hybridize them.
___________________________________________________________________
(page generated 2024-06-22 23:00 UTC)