https://smerf-3d.github.io/

SMERF: Streamable Memory Efficient Radiance Fields for Real-Time Large-Scene Exploration

 * Daniel Duckworth*^1
 * Peter Hedman*^2
 * Christian Reiser^2,4
 * Peter Zhizhin^2
 * Jean-Francois Thibert^3
 * Mario Lucic^1
 * Richard Szeliski^2
 * Jonathan T. Barron^2

 * Google DeepMind^1
 * Google Research^2
 * Google Inc.^3
 * Tubingen AI Center, University of Tubingen^4
 * equal contribution*

Abstract

Recent techniques for real-time view synthesis have rapidly advanced in fidelity and speed, and modern methods are capable of rendering near-photorealistic scenes at interactive frame rates. At the same time, a tension has arisen between explicit scene representations amenable to rasterization and neural fields built on ray marching, with state-of-the-art instances of the latter surpassing the former in quality while being prohibitively expensive for real-time applications. In this work, we introduce SMERF, a view synthesis approach that achieves state-of-the-art accuracy among real-time methods on large scenes with footprints up to 300 m^2 at a volumetric resolution of 3.5 mm^3. Our method is built upon two primary contributions: a hierarchical model partitioning scheme, which increases model capacity while constraining compute and memory consumption, and a distillation training strategy that simultaneously yields high fidelity and internal consistency. Our approach enables full six degrees of freedom (6DOF) navigation within a web browser and renders in real-time on commodity smartphones and laptops.
Extensive experiments show that our method exceeds the current state-of-the-art in real-time novel view synthesis by 0.78 dB on standard benchmarks and 1.78 dB on large scenes, renders frames three orders of magnitude faster than state-of-the-art radiance field models, and achieves real-time performance across a wide variety of commodity devices, including smartphones.

Real-Time Interactive Viewer Demos

 * Berlin
 * NYC
 * Alameda
 * London
 * Gardenvase
 * Bicycle
 * Kitchen Lego
 * Stump
 * Office Bonsai
 * Full Living Room
 * Kitchen Counter
 * Treehill & Flower

How we boost representation power to handle large scenes

[Figure: overview]

(a): We model large multi-room scenes with a number of independent submodels, each of which is assigned to a different region of the scene. During rendering, the submodel is selected based on the camera origin.

(b): To model complex view-dependent effects, within each submodel we additionally instantiate grid-aligned copies of the deferred MLP parameters \(\theta\). These parameters are trilinearly interpolated based on the camera origin \(\mathbf{o}\).

(c): While each submodel represents the entire scene, only the submodel's associated grid cell is modelled at high resolution, which is realized by contracting the submodel-specific local coordinates.

Getting the maximum out of our representation via distillation

[Figure: contraction]

We demonstrate that image fidelity can be greatly boosted via distillation. We first train a state-of-the-art offline radiance field (Zip-NeRF). We then use the RGB color predictions \(\mathbf{c}\) of this teacher model as supervision for our own model. Additionally, we exploit the volumetric density values \(\tau\) of the pre-trained teacher by minimizing the discrepancy between the volume rendering weights of teacher and student.
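The distillation objective described above can be sketched roughly as follows. This is a minimal illustration, not the authors' training code. It assumes that teacher and student are evaluated on the same ray intervals, and it uses a squared error on color and an absolute difference on weights; the page does not state the actual form of either term. The names render_weights, distillation_loss, deltas, and weight_scale are hypothetical. The weight formula itself is the standard volume rendering quadrature, w_i = T_i (1 - exp(-tau_i * delta_i)).

```python
import math


def render_weights(densities, deltas):
    """Standard volume rendering weights along one ray.

    w_i = T_i * (1 - exp(-tau_i * delta_i)), where T_i is the transmittance
    accumulated over all intervals in front of interval i.
    """
    weights = []
    transmittance = 1.0  # nothing has absorbed light before the first interval
    for tau, delta in zip(densities, deltas):
        alpha = 1.0 - math.exp(-tau * delta)
        weights.append(transmittance * alpha)
        transmittance *= 1.0 - alpha
    return weights


def distillation_loss(student_rgb, teacher_rgb, student_tau, teacher_tau,
                      deltas, weight_scale=1.0):
    """Hypothetical per-ray loss: color term plus rendering-weight term.

    Assumption: squared error on RGB and L1 on weights; the real SMERF
    losses may differ.
    """
    color_loss = sum((s - t) ** 2 for s, t in zip(student_rgb, teacher_rgb))
    color_loss /= len(teacher_rgb)
    w_student = render_weights(student_tau, deltas)
    w_teacher = render_weights(teacher_tau, deltas)
    weight_loss = sum(abs(a - b) for a, b in zip(w_student, w_teacher))
    return color_loss + weight_scale * weight_loss
```

Matching rendering weights rather than raw densities means the student is only penalized where a density difference changes what the ray actually sees; densities hidden behind an opaque surface contribute almost nothing to either weight vector.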
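Returning to the partitioning scheme in (a) and (b), the two lookups driven by the camera origin might look like the sketch below. It assumes a uniform, axis-aligned grid of submodels and, for the deferred MLP parameters, copies of \(\theta\) stored at the eight corners of a unit cube. The function names, the clamping at the grid boundary, and the dictionary layout are illustrative assumptions and are not taken from the SMERF implementation.

```python
import math


def pick_submodel(origin, scene_min, cell_size, grid_dims):
    """Index (i, j, k) of the grid cell containing the camera origin.

    Assumption: cells are cubes of side cell_size starting at scene_min;
    origins outside the grid are clamped to the nearest cell.
    """
    return tuple(
        min(max(int(math.floor((o - lo) / cell_size)), 0), n - 1)
        for o, lo, n in zip(origin, scene_min, grid_dims)
    )


def trilerp_params(theta, origin_local):
    """Trilinearly blend parameter vectors stored at the 8 cube corners.

    theta maps a corner (ix, iy, iz), each in {0, 1}, to a list of floats.
    origin_local is the camera origin expressed in [0, 1]^3 inside the cube.
    """
    x, y, z = origin_local
    blended = [0.0] * len(theta[(0, 0, 0)])
    for (ix, iy, iz), params in theta.items():
        # Corner weight is the product of the per-axis linear weights.
        w = (x if ix else 1.0 - x) * (y if iy else 1.0 - y) * (z if iz else 1.0 - z)
        for n, p in enumerate(params):
            blended[n] += w * p
    return blended
```

Because both lookups depend only on the camera origin and not on individual sample positions, they can be done once per frame rather than once per ray sample in a setup like this one.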
Citation

If you want to cite our work, please use:

    @misc{duckworth2023smerf,
      title={SMERF: Streamable Memory Efficient Radiance Fields for Real-Time Large-Scene Exploration},
      author={Daniel Duckworth and Peter Hedman and Christian Reiser and Peter Zhizhin and Jean-Francois Thibert and Mario Lucic and Richard Szeliski and Jonathan T. Barron},
      year={2023},
      eprint={2312.07541},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
    }

Acknowledgements

The website template was borrowed from Michael Gharbi. Image sliders are based on dics.