[HN Gopher] HybridNeRF: Efficient Neural Rendering
       ___________________________________________________________________
        
       HybridNeRF: Efficient Neural Rendering
        
       Author : tzmlab
       Score  : 133 points
       Date   : 2024-06-22 14:21 UTC (8 hours ago)
        
 (HTM) web link (haithemturki.com)
 (TXT) w3m dump (haithemturki.com)
        
       | ofou wrote:
        | I'd spend hours navigating Street View with this.
        
         | tmilard wrote:
          | What everyone believes, I think: light data for rendering and
          | fast 3D reconstruction ======> big winner.
          | 
          | So many laboratories and software devs have taken a shot at
          | this. None has won yet.
          | 
          | Success often lies in small (but important) details...
        
       | ttul wrote:
        | Does anyone else look forward to a game that lets you transform
        | your house or neighborhood into a playable level with
        | destructible objects? How far are we from recognizing the "car"
        | and making it drivable, or the "tree" and making it choppable?
        
         | TeMPOraL wrote:
          | I've dreamed of that since I was a kid, so for nearly three
          | decades now. It was entirely possible even then - it was just a
          | matter of using enough elbow grease. The problem is, the world
          | is full of shiny happy people ready to call you a terrorist,
          | assert their architectural copyright, or bring in the
          | "creepiness factor" to shut down anyone who tries this.
        
           | jsheard wrote:
           | There's also just the fact that 1:1 reproductions of real-
           | world places rarely make for good video game environments.
            | Gameplay has to inform the layout and set dressing, and the
            | way space is perceived in games means liberties have to be
            | taken to keep interiors from feeling weirdly cramped (any kid
            | who had the idea to measure their house and build it in Quake
            | or CS found this out the hard way).
           | 
            | The main exception I can think of is racing simulators: it's
            | already common for the developers of those to drive LiDAR
            | cars around real-world tracks and use that data to build a
            | 1:1 replica for their game. NeRF might be a natural extension
            | of that if they can figure out a way to combine it with
            | dynamic lighting and weather conditions.
        
           | smokel wrote:
           | Having destructible objects is in no way possible on
           | contemporary hardware, unless you simplify the physics to the
           | extreme. Perhaps I'm misunderstanding your statement?
           | 
           | Recognising objects for what they are has only recently
           | become somewhat possible. Separating them in a 3D scan is
           | still pretty much impossible.
        
             | idiotsecant wrote:
              | Destructible environments have been a thing for like... a
              | decade or so? There are plenty of tricks to make it
              | realistic enough to be fun without simulating every
              | molecule.
        
               | swiftcoder wrote:
                | We've had destructible polygonal and voxel environments
                | for a while now, yes. Destructible NeRFs are a whole
                | other ball game - we're only just starting to get a
                | handle on reliably segmenting objects within NeRFs, let
                | alone animating them.
        
               | chefandy wrote:
               | A whole lot of manual work goes into making destructible
               | 3D assets. Combined, I've put nearly a full work week
               | into perfecting a breaking bottle simulation in Houdini
               | to add to my demo reel, and it's still not quite there.
               | And that's starting out with nice clean geometry that I
                | made myself! A lot of it comes down to breaking things up
                | based on Voronoi tessellation of the surfaces, which is
                | easy when you've got an eight-cornered cube, but it takes
                | a lot more effort and is much more error-prone as the
                | geometric complexity increases. If you can figure out how
                | to easily make simple enough, realistic-looking, manifold
                | geometry from real-world 3D scans that's clean enough for
                | standard 3D asset pipelines, you'll make a lot of money
                | doing it.
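                | 
                | For anyone curious what the Voronoi step looks like in
                | code, here's a minimal, illustrative sketch (nowhere
                | near a production Houdini workflow): scatter seed
                | points in the mesh's bounding box and group triangles
                | by nearest seed, which is exactly a Voronoi partition
                | of the surface.
                | 
                |     import numpy as np
                |     from scipy.spatial import cKDTree
                | 
                |     def voronoi_fracture(verts, tris, n_pieces=8, seed=0):
                |         # Coarse Voronoi-style fracture: group triangles
                |         # by the nearest of n_pieces random seed points.
                |         # Returns one triangle-index array per fragment;
                |         # cut faces are NOT capped or made manifold here.
                |         rng = np.random.default_rng(seed)
                |         lo, hi = verts.min(axis=0), verts.max(axis=0)
                |         seeds = rng.uniform(lo, hi, size=(n_pieces, 3))
                |         centroids = verts[tris].mean(axis=1)
                |         _, owner = cKDTree(seeds).query(centroids)
                |         return [np.where(owner == i)[0]
                |                 for i in range(n_pieces)]
                | 
                | Capping the cut faces and keeping each piece manifold is
                | the part that eats the work week.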
        
               | TeMPOraL wrote:
                | Two decades - see _Red Faction_, which is a first-person
                | shooter from 2001.
        
             | TeMPOraL wrote:
              | My statement applies even without the destructible
              | environment part - even though that was already mainstream
             | 23 years ago! See _Red Faction_. No, just making a real-
             | life place a detailed part of a video game is going to
             | cause the pushback I mentioned.
        
         | totalview wrote:
         | I work in the rendering and gaming industry and also run a 3D
         | scanning company. I have similarly wished for this capability,
          | especially the destructibility part. What you speak of is still
         | pretty far off for several reasons:
         | 
          | -No collision/poor collision on NeRFs and GS: to have a proper
          | interactive world, you usually need accurate character
          | collision so that your character or vehicle can move along the
          | floor/ground (as opposed to falling through it), run into
          | walls, go through door frames, etc. NeRFs suffer from the same
          | issues as photogrammetry in that they need "structure from
          | motion" (COLMAP or similar), and the resulting 3D output then
          | has to be meshed for collision to register off of. The mesh
          | from reality capture is noisy and is not simple geometry:
          | think millions of triangles from a laser scanner or camera for
          | "flat" ground that a video game would use 100 triangles for
          | (see the mesh-decimation sketch at the end of this comment).
         | 
          | -Scanning: there's no scanner available that provides both good
          | 3D information and good photorealistic textures at a price
          | people will want to pay. Scanning every square inch of playable
          | space in even a modest-sized house is a pain, and people will
          | look behind the television, underneath the furniture, and
          | everywhere else that most of these scanning videos and demos
          | never go. There are a lot of ugly angles where a player would
          | go that these videos omit.
         | 
          | -Post-processing: if you scan your house or any other real
          | space, you will have poor lighting unless you took the time to
          | do your own custom lighting and color setup. That will all need
          | to be corrected in post so that you can dynamically light your
          | environment. Lighting is one of the most "next generation"
          | things that people associate with games, and you will be
          | fighting prebaked shadows throughout the entire house or area
          | that you have scanned. You don't get away from this with NeRFs
          | or Gaussian splats, because those scenes also have static,
          | prebaked lighting in them.
         | 
          | -Object destruction and physics: I love the game Teardown, and
          | if you want to see what it's like to actually bust up and
          | destroy structures that have been physically scanned, there is
          | a plug-in to import reality-capture models directly into the
          | game with a little bit of modding. That said, Teardown is
          | voxel-based, and is one of the most advanced engines that has
          | been built to do such a thing. I have seen nothing else capable
          | of doing cool-looking destruction of any object, scanned or 3D
          | modeled, without a large studio effort and a ton of
          | optimization.
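          | 
          | To make the collision point concrete, here is a rough sketch
          | of the usual workaround (illustrative only; it assumes Open3D
          | and a mesh already exported from photogrammetry, and the file
          | names are made up): clean the noisy reconstruction and
          | decimate it into a low-poly proxy the physics engine can
          | actually cook.
          | 
          |     import open3d as o3d
          | 
          |     # Raw photogrammetry/NeRF-derived mesh: millions of tris.
          |     mesh = o3d.io.read_triangle_mesh("scan.ply")
          |     mesh.remove_duplicated_vertices()
          |     mesh.remove_degenerate_triangles()
          |     mesh.remove_non_manifold_edges()
          | 
          |     # Quadric decimation down to a physics-friendly budget.
          |     proxy = mesh.simplify_quadric_decimation(
          |         target_number_of_triangles=5000)
          |     proxy.compute_vertex_normals()
          |     o3d.io.write_triangle_mesh("scan_collision.obj", proxy)
          | 
          | Even then, floors and door frames usually need hand-fixing,
          | which is part of why this is still far off for whole houses.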
        
           | knicholes wrote:
           | Maybe a quick, cheap NeRF with some object recognition, 3D
           | object generation and replacement, so at least you have a
           | sink where there is a sink and a couch where you have a
            | couch, even though it might look different.
        
           | andybak wrote:
           | I think you're generalising from exacting use cases to ones
           | that might be much more tolerant of imperfection.
        
           | modeless wrote:
           | I think collision detection is solvable. And the scanning
           | process should be no harder than 3D modeling to the same
           | quality level. Probably much easier, honestly. Modeling is
           | labor intensive. I'm not sure why you say "there's no scanner
            | available that provides both good 3D information and good
            | photorealistic textures" because these new techniques don't
            | use "scanners"; all you need is regular cameras. The 3D
            | information is inferred.
           | 
           | Lighting is the big issue, IMO. As soon as you want any kind
           | of interactivity besides moving the camera you need dynamic
           | lighting. The problem is you're going to have to mix the
           | captured absolutely perfect real-world lighting with
           | extremely approximate real-time computed lighting (which will
           | be much worse than offline-rendered path tracing, which still
           | wouldn't match real-world quality). It's going to look awful.
           | At least, until someone figures out a revolutionary neural
           | relighting system. We are pretty far from that today.
           | 
           | Scale is another issue. Two issues, really, rendering and
           | storage. There's already a lot of research into scaling up
           | rendering to large and detailed scenes, but I wouldn't say
           | it's solved yet. And once you have rendering, storage will be
           | the next issue. These scans will be massive and we'll need
           | some very effective compression to be able to distribute
           | large scenes to users.
        
             | totalview wrote:
             | You are correct; most of these new techniques are using a
             | camera. In my line of work I consider a camera sensor a
             | scanner of sorts, as we do a lot of photogrammetry and
             | "scan" with a 45MP full frame. The inferred 3D from cameras
             | is pretty bad when it comes to accuracy, especially from
             | dimly lit areas or where you dip into a closet or closed
             | space that doesn't have a good structural tie back to the
             | main space you are trying to recreate in 3D. Laser scanners
             | are far preferable to tie your photo pose estimation to,
             | and most serious reality capture for video games is done
              | with both a camera and a $40,000+ LiDAR scanner. Have you
             | ever tried to scan every corner of a house with only a
             | traditional DSLR or point and shoot camera? I have and the
             | results are pretty bad from a 3D standpoint without a ton
              | of post-processing.
             | 
             | The collision detection problem is related heavily to
             | having clean 3D as mentioned above. My company is doing
             | development on computing collision on reality capture right
             | now in a clean way and I would be interested in any
             | thoughts you have. We are chunking collision on the dataset
             | at a fixed distance from the player character (can't go too
             | fast in a vehicle or it will outpace the collision and fall
              | through the floor) and have a tunable LOD that influences
             | collision resolution.
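              | 
              | FWIW, the chunking scheme is simple to sketch (illustrative
              | pseudocode of the idea, not our actual implementation):
              | keep cooked collision only for grid cells within a radius
              | of the player and drop everything else.
              | 
              |     import math
              | 
              |     CHUNK = 10.0   # metres per collision chunk
              |     RADIUS = 2     # chunks kept around the player
              | 
              |     def chunk_of(p):
              |         return (math.floor(p[0] / CHUNK),
              |                 math.floor(p[2] / CHUNK))
              | 
              |     def update(player_pos, loaded, build, free):
              |         # loaded: {chunk key: collision handle}
              |         # build/free: engine callbacks (hypothetical)
              |         cx, cz = chunk_of(player_pos)
              |         want = {(cx + dx, cz + dz)
              |                 for dx in range(-RADIUS, RADIUS + 1)
              |                 for dz in range(-RADIUS, RADIUS + 1)}
              |         for key in list(loaded):
              |             if key not in want:
              |                 free(loaded.pop(key))    # drop far chunks
              |         for key in want:
              |             if key not in loaded:
              |                 loaded[key] = build(key) # cook near player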
        
               | sneak wrote:
                | My iPhone and my Apple Vision Pro both have LiDAR
                | scanners, fwiw.
               | 
               | Frankly I'm surprised that I can't easily make crude 3D
               | models of spaces with a simple app presently. It seems
               | well within the capabilities of the hardware and
               | software.
        
               | totalview wrote:
                | Those LiDAR sensors on phones and VR headsets are low
                | resolution and mainly used to improve the photos and
                | depth information from the camera. That's a different
                | objective than mapping a space, which is mainly being
                | disrupted by improvements from the self-driving car and
                | ADAS industries.
        
               | sneak wrote:
               | Magic Room for the AVP does a good enough job. Seems the
                | low-resolution issue can be mitigated by
                | repeated/closer scans.
        
             | crazygringo wrote:
             | I feel like the lighting part will become "easy" once we're
             | able to greatly simplify the geometry and correlate it
             | across multiple "passes" through the same space at
             | different times.
             | 
             | In other words, if you've got a consistent 3D geometric map
             | of the house with textures, then you can do a pass in the
             | morning with only daylight, midday only daylight, late
             | afternoon only daylight, and then one at night with
             | artificial light.
             | 
             | If you're dealing with textures that map onto identical
             | geometries (and assume no objects move during the day), it
             | seems like it ought to be relatively straightforward to
              | train AIs to produce a flat, unlit texture version,
             | especially since you can train them on easily generated
             | raytraced renderings. There might even be straight-up
             | statistical methods to do it.
             | 
              | So I think it's not the lighting itself that is the biggest
              | problem -- it's having the clean, consistent geometries in
              | the first place.
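              | 
              | As a strawman for the "statistical methods" bit (purely
              | illustrative, and it assumes the passes are already
              | aligned to the same texture layout): with several captures
              | of the same texel under different lighting, a crude de-lit
              | albedo estimate is just a robust per-texel average after
              | normalizing each pass's overall brightness.
              | 
              |     import numpy as np
              | 
              |     def estimate_albedo(passes):
              |         # passes: list of HxWx3 float arrays of the same
              |         # texture under different lighting conditions.
              |         norm = [p / max(p.mean(), 1e-6) for p in passes]
              |         albedo = np.median(np.stack(norm), axis=0)
              |         return np.clip(albedo / albedo.max(), 0.0, 1.0)
              | 
              | Baked shadows and highlights mostly fall away in the
              | median; a learned approach would of course do much better.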
        
         | w-m wrote:
         | Recognizing what is a car in a 3D NeRF/Gaussian Splatting scene
         | can be done. Also research from CVPR:
         | https://www.garfield.studio/
        
         | antihero wrote:
          | I mean, depending on your risk aversion, we're not that far,
          | nor have we ever been.
        
         | naet wrote:
         | My parents had a floor plan of our house drawn up for some
         | reason, and when I was in late middle school I found it and
         | modeled the house in the hammer editor so my friends and I
         | could play Counter Strike source in there.
         | 
         | It wasn't very well done but I figured out how to make the
         | basic walls and building, add stairs, add some windows, grab
          | some pre-existing props like simple couches, beds, and a TV, and
         | it was pretty recognizable. After adding a couple ladders to
         | the outside so you could climb in the windows or on the roof
         | the map was super fun just as a map, and doubly so since I
         | could do things like hide in my own bedroom closet and
         | recognize the rooms.
         | 
         | Took some work since I didn't know how to do anything but
         | totally worth it. I feel like there has to be a much more
         | accessible level editor in some game out there today, not sure
         | what it would be though.
         | 
         | I thought my school had great architecture for another map but
         | someone rightfully convinced me that would be a very bad idea
         | to add to a shooting game. So I never made any others besides
         | the house.
        
       | 55555 wrote:
       | What's the state of the art right now that can be run on my
        | laptop from a set of photos? I want to play with NeRFs, starting
       | by generating one from a bunch of photos of my apartment, so I
       | can then fly around the space virtually.
        
         | sorenjan wrote:
          | Probably Nerfstudio
         | 
         | https://docs.nerf.studio/
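          | 
          | Roughly (check the Nerfstudio docs for exact flags; the paths
          | here are placeholders): ns-process-data runs COLMAP over your
          | photos to estimate camera poses, ns-train fits the default
          | "nerfacto" model, and training prints a local viewer URL you
          | can open to fly around the scene.
          | 
          |     import subprocess
          | 
          |     # Assumes nerfstudio is installed and COLMAP is on PATH.
          |     subprocess.run(["ns-process-data", "images",
          |                     "--data", "photos/apartment",
          |                     "--output-dir", "data/apartment"],
          |                    check=True)
          |     subprocess.run(["ns-train", "nerfacto",
          |                     "--data", "data/apartment"],
          |                    check=True)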
        
       | turkihaithem wrote:
       | One of the paper authors here - happy to answer any questions
       | about the work or chat about neural rendering in general!
        
         | refibrillator wrote:
         | Congrats on the paper! Any chance the code will be released?
         | 
         | Also I'd be curious to hear, what are you excited about in
         | terms of future research ideas?
         | 
         | Personally I'm excited by the trend of eliminating the need for
         | traditional SfM preprocessing (sparse point clouds via colmap,
         | camera pose estimation, etc).
        
           | turkihaithem wrote:
           | Thank you! The code is unlikely to be released (it's built
           | upon Meta-internal codebases that I no longer have access to
           | post-internship), at least not in the form that we
           | specifically used at submission time. The last time I caught
           | up with the team someone was expressing interest in releasing
           | some broadly useful rendering code, but I really can't speak
           | on their behalf so no guarantees.
           | 
           | IMHO it's a really exciting time to be in the neural
           | rendering / 3D vision space - the field is moving quickly and
           | there's interesting work across all dimensions. My personal
           | interests lean towards large-scale 3D reconstruction, and to
            | that end, eliminating the need for traditional SfM/COLMAP
           | preprocessing would be great. There's a lot of relevant
           | recent work (https://dust3r.europe.naverlabs.com/,
           | https://cameronosmith.github.io/flowmap/,
           | https://vggsfm.github.io/, etc), but scaling these methods
           | beyond several dozen images remains a challenge. I'm also
           | really excited about using learned priors that can improve
           | NeRF quality in underobserved regions
           | (https://reconfusion.github.io). IMO using these priors will
           | be super important to enabling dynamic 4D reconstruction
            | (since it's otherwise infeasible to directly observe every
           | space-time point in a scene). Finally, making NeRF
           | environments more interactive (as other posts have described)
            | would unlock many use cases, especially in simulation (e.g.,
            | for autonomous driving). This is kind of tricky for implicit
           | representations (like the original NeRF and this work), but
           | there have been some really cool papers in the 3D Gaussian
           | space (https://xpandora.github.io/PhysGaussian/) that are
           | exciting.
        
         | aerodog wrote:
         | Thanks for your work!
         | 
          | In my experience, NeRF works great but depends on highly
          | accurate camera location information. Unless the VR device has
          | this baked in, one must run a COLMAP-style or SfM-style process
          | to generate those camera extrinsics. Is there anything special
         | HybridNeRF does around this?
        
           | modeless wrote:
           | The method in this paper relies on precomputed camera poses
           | as input, but there have been tons of papers published on the
           | topic of eliminating this requirement. Here are a few:
           | https://dust3r.europe.naverlabs.com/
           | https://arxiv.org/abs/2102.07064
           | https://arxiv.org/abs/2312.08760v1
           | https://x.com/_akhaliq/status/1734803566802407901
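            | 
            | For context on what "precomputed camera poses" means in
            | practice: COLMAP's sparse reconstruction writes an
            | images.txt whose pose lines are IMAGE_ID QW QX QY QZ TX TY
            | TZ CAMERA_ID NAME (a world-to-camera rotation and
            | translation). A minimal parser sketch, assuming that text
            | format:
            | 
            |     import numpy as np
            | 
            |     def quat_to_rot(w, x, y, z):
            |         # unit quaternion -> 3x3 rotation matrix
            |         return np.array([
            |             [1-2*(y*y+z*z), 2*(x*y-z*w), 2*(x*z+y*w)],
            |             [2*(x*y+z*w), 1-2*(x*x+z*z), 2*(y*z-x*w)],
            |             [2*(x*z-y*w), 2*(y*z+x*w), 1-2*(x*x+y*y)]])
            | 
            |     def read_poses(path="sparse/0/images.txt"):
            |         # returns {image name: 4x4 camera-to-world matrix}
            |         lines = [l for l in open(path)
            |                  if not l.startswith("#")]
            |         poses = {}
            |         for line in lines[::2]:  # every other line is points
            |             f = line.split()
            |             q = list(map(float, f[1:5]))
            |             t = np.array(list(map(float, f[5:8])))
            |             R = quat_to_rot(*q)
            |             c2w = np.eye(4)
            |             c2w[:3, :3] = R.T      # invert world-to-camera
            |             c2w[:3, 3] = -R.T @ t  # camera center in world
            |             poses[f[9]] = c2w
            |         return poses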
        
             | turkihaithem wrote:
             | Your understanding is correct!
        
         | dr_dshiv wrote:
         | Wow, it looks beautiful.
         | 
         | Can regular phones capture the data required? How to get into
         | this, as a hobbyist? I'm interested in the possibilities of
         | scanning coral reefs and other ecological settings.
        
           | turkihaithem wrote:
           | One of the datasets we evaluated against in our paper uses a
           | bespoke capture rig (https://github.com/facebookresearch/Eyef
           | ulTower?tab=readme-o...) but you can definitely train very
           | respectable NeRFs using a phone camera. In my experience it's
           | less about camera resolution and more about getting a good
           | capture - many NeRF methods assume that the scene is static,
           | so minimizing things like lighting changes and transient
           | shadows can make a big difference. If you're interested in
           | getting your feet wet, I highly recommend Nerfstudio
           | (https://docs.nerf.studio)!
        
       | lxe wrote:
        | Absolute noob question that I'm having a hard time understanding:
       | 
       | In practice, why NeRF instead of Gaussian Splatting? I have very
       | limited exposure to either, but a very cursory search on the
        | subject yields an "it depends on the context" answer. What exact
       | context?
        
         | zlenyk wrote:
         | It's just a completely different paradigm of rendering and it's
         | not clear which one will be dominant in the future. Gaussian
          | splats are usually dependent on initialisation from a point
          | cloud, which makes the whole process much more complicated.
        
         | GistNoesis wrote:
          | There are two aspects to the difference between NeRF and
          | Gaussian Splatting:
         | 
          | - The first aspect concerns how they solve the light rendering
          | equation:
          | 
          | NeRF has more potential for physical rendering quality but is
          | slower.
          | 
          | NeRF uses raycasting. Gaussian Splatting projects and draws
          | Gaussians directly in screen space.
          | 
          | Each has various rendering artefacts. One distinction is in
          | handling light reflections. When you use raycasting, you can
          | bounce your ray off mirror surfaces. Whereas Gaussian
          | splatting, like Alice in Wonderland, creates a symmetric world
          | on the other side of the mirror (and when the mirror surface
          | is curved, it's hopeless).
         | 
          | Although many NeRFs skip reflections as a simplification, they
          | can handle them almost natively.
          | 
          | Also, NeRF is a volumetric representation, whereas Gaussian
          | Splatting has surfaces baked in: Gaussian splats are rendered
          | in order front to back. This means that when you have two thin
          | objects one behind the other, like the two sides of a book,
          | Gaussian splatting will be able to render the front and hide
          | the back, whereas NeRF will merge front and back because
          | volumetric elements are transparent. (Though in a NeRF with
          | spherical harmonics, the view-dependent radiance will allow
          | culling back from front based on the viewing angle.)
         | 
          | - The second aspect of NeRF vs Gaussian Splatting is the
          | choice of representation:
          | 
          | NeRF usually uses a neural network to store the scene in a
          | compressed form, whereas Gaussian Splatting is more explicit
          | and uncompressed: the scene is represented in a sort of "point
          | cloud" fashion. This means that if your scene has potential
          | for compression, like repetitive textures or objects, then the
          | NeRF will make use of it and hallucinate what's missing,
          | whereas Gaussian splats will show holes.
          | 
          | Of course, as this article shows, you can hybridize them.
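          | 
          | To ground the raycasting half of that: a NeRF renders a pixel
          | by querying density sigma and colour c at samples along the
          | ray and compositing them front to back,
          | C = sum_i T_i * (1 - exp(-sigma_i * delta_i)) * c_i, where T_i
          | is the accumulated transmittance. A tiny illustrative sketch
          | (the field below is a stand-in function, not a trained
          | network):
          | 
          |     import numpy as np
          | 
          |     def render_ray(field, origin, direction,
          |                    near=0.1, far=4.0, n=64):
          |         # field(points) -> (sigma (n,), rgb (n, 3))
          |         t = np.linspace(near, far, n)
          |         pts = origin + t[:, None] * direction
          |         sigma, rgb = field(pts)
          |         delta = (far - near) / n             # sample spacing
          |         alpha = 1.0 - np.exp(-sigma * delta)
          |         trans = np.cumprod(
          |             np.append(1.0, 1.0 - alpha))[:-1]  # T_i
          |         weights = trans * alpha
          |         return (weights[:, None] * rgb).sum(axis=0)
          | 
          |     # Toy field: a fuzzy grey ball of density at the origin.
          |     def ball(pts):
          |         sig = 5.0 * np.exp(-10.0 * (pts ** 2).sum(axis=1))
          |         return sig, np.full((len(pts), 3), 0.7)
          | 
          |     pixel = render_ray(ball, np.array([0.0, 0.0, -2.0]),
          |                        np.array([0.0, 0.0, 1.0]))
          | 
          | Gaussian splatting reaches the same compositing result by
          | depth-sorting projected Gaussians and alpha-blending them in
          | screen space instead of marching rays.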
        
       ___________________________________________________________________
       (page generated 2024-06-22 23:00 UTC)