https://github.com/apple/ml-gaudi

 GAUDI: A Neural Architect for Immersive 3D Scene Generation, arXiv.

                     [Videos: samples from GAUDI]

Miguel Angel Bautista*, Pengsheng Guo*, Samira Abnar, Walter Talbott,
Alexander Toshev, Zhuoyuan Chen, Laurent Dinh, Shuangfei Zhai, Hanlin
      Goh, Daniel Ulbricht, Afshin Dehghan, Joshua M. Susskind
                     Apple (*equal contribution)

 Summary

  * We introduce GAUDI, a generative model that captures the
    distribution of 3D scenes parametrized as radiance fields.
  * We decompose the generative model into two steps: (i) optimizing a
    latent representation of 3D radiance fields and the corresponding
    camera poses; (ii) learning a powerful score-based generative
    model on the latent space.
  * GAUDI obtains state-of-the-art performance across multiple
    datasets for unconditional generation and enables conditional
    generation of 3D scenes from different modalities such as text or
    RGB images.

 Abstract

We introduce GAUDI, a generative model capable of capturing the
distribution of complex and realistic 3D scenes that can be rendered
immersively from a moving camera. We tackle this challenging problem
with a scalable yet powerful approach, where we first optimize a
latent representation that disentangles radiance fields and camera
poses. This latent representation is then used to learn a generative
model that enables both unconditional and conditional generation of
3D scenes. Our model generalizes previous works that focus on single
objects by removing the assumption that the camera pose distribution
can be shared across samples. We show that GAUDI obtains
state-of-the-art performance in the unconditional generative setting
across multiple datasets and allows for conditional generation of 3D
scenes given conditioning variables like sparse image observations or
text that describes the scene.

 Model

Our model is composed of two stages: latent representation
optimization and generative modeling. Finding a powerful latent
representation for scene radiance fields and camera poses is critical
to obtaining good performance. To achieve this, we design a decoder
with three modules:

  * A scene decoder $d$ that takes as input scene latents and outputs
    a tri-plane latent representation used to condition the radiance
    field MLP.
  * A camera pose decoder $c$ that takes as input a camera pose
    latent and a timestamp and outputs a camera pose.
  * A radiance field $f$ that takes as input a 3D point and is
    conditioned on the tri-plane representation.
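To make the data flow from $d$ to $f$ concrete, here is a minimal NumPy sketch of a tri-plane lookup feeding a radiance field. The README does not specify the architectures, so everything here is an assumption for illustration and not GAUDI's implementation: the linear `scene_decoder`, the summation of the three plane features, the tiny two-layer MLP, and its (density, RGB) output are all hypothetical. The camera pose decoder $c$ is omitted.

```python
import numpy as np

def scene_decoder(z_scene, D, C, H, W):
    """Hypothetical scene decoder d: one linear map from a scene latent
    to three (C, H, W) feature planes (xy, xz, yz)."""
    return (z_scene @ D).reshape(3, C, H, W)

def bilinear_sample(plane, uv):
    """Bilinearly sample a (C, H, W) plane at (N, 2) coordinates in [-1, 1]."""
    C, H, W = plane.shape
    uv = np.clip(uv, -1.0, 1.0)
    x = (uv[:, 0] + 1.0) * 0.5 * (W - 1)   # continuous column index
    y = (uv[:, 1] + 1.0) * 0.5 * (H - 1)   # continuous row index
    x0 = np.clip(np.floor(x).astype(int), 0, W - 2)
    y0 = np.clip(np.floor(y).astype(int), 0, H - 2)
    wx, wy = x - x0, y - y0                 # fractional offsets in [0, 1]
    out = (plane[:, y0, x0] * (1 - wx) * (1 - wy)
           + plane[:, y0, x0 + 1] * wx * (1 - wy)
           + plane[:, y0 + 1, x0] * (1 - wx) * wy
           + plane[:, y0 + 1, x0 + 1] * wx * wy)
    return out.T                            # (N, C)

def triplane_features(planes, points):
    """Project (N, 3) points onto the xy, xz and yz planes and sum the features."""
    return (bilinear_sample(planes[0], points[:, [0, 1]])
            + bilinear_sample(planes[1], points[:, [0, 2]])
            + bilinear_sample(planes[2], points[:, [1, 2]]))

def radiance_field(features, W1, b1, W2, b2):
    """Hypothetical tiny MLP f: per-point features -> (density, RGB)."""
    h = np.maximum(features @ W1 + b1, 0.0)   # ReLU hidden layer
    out = h @ W2 + b2                         # (N, 4)
    density = np.logaddexp(0.0, out[:, 0])    # softplus keeps density >= 0
    rgb = 1.0 / (1.0 + np.exp(-out[:, 1:]))   # sigmoid keeps colour in [0, 1]
    return density, rgb

# Toy forward pass with random, untrained weights.
rng = np.random.default_rng(0)
C, H, W, Z, HID = 8, 16, 16, 32, 32
planes = scene_decoder(rng.normal(size=Z),
                       rng.normal(scale=0.1, size=(Z, 3 * C * H * W)), C, H, W)
points = rng.uniform(-1.0, 1.0, size=(5, 3))
feats = triplane_features(planes, points)
density, rgb = radiance_field(feats, rng.normal(size=(C, HID)), np.zeros(HID),
                              rng.normal(size=(HID, 4)), np.zeros(4))
```

Projecting onto three axis-aligned planes keeps memory quadratic rather than cubic in resolution, which is the usual motivation for tri-plane representations; how GAUDI aggregates the three features is not stated in this README.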

The parameters of all the modules and the latents for scene and
camera poses are optimized in the first stage. In the second stage,
we learn a score-based generative model in latent space.
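In the first stage, each scene's latent is a free parameter optimized jointly with the shared decoder, with no encoder (an auto-decoder setup). The toy sketch below shows only that joint optimization. It assumes a linear decoder and a mean-squared reconstruction loss on plain vectors. In GAUDI the decoder is the three-module network described above, and its training objective is not reproduced here. `fit_autodecoder` is a hypothetical name.

```python
import numpy as np

def fit_autodecoder(Y, latent_dim, steps=3000, lr=0.2, reg=1e-3, seed=0):
    """Jointly optimise one free latent per scene (rows of Z) and a shared
    linear decoder W by gradient descent on mean squared reconstruction error."""
    rng = np.random.default_rng(seed)
    S, D = Y.shape
    Z = rng.normal(scale=0.1, size=(S, latent_dim))   # per-scene latents (no encoder)
    W = rng.normal(scale=0.1, size=(latent_dim, D))   # shared decoder parameters
    losses = []
    for _ in range(steps):
        R = Z @ W - Y                                  # reconstruction residual
        losses.append(np.mean(R ** 2) + reg * np.mean(Z ** 2))
        gR = 2.0 * R / R.size                          # d(mean squared error)/dR
        gZ = gR @ W.T + 2.0 * reg * Z / Z.size         # gradient w.r.t. latents
        gW = Z.T @ gR                                  # gradient w.r.t. decoder
        Z -= lr * gZ
        W -= lr * gW
    return Z, W, losses

# Toy "scenes": 20 observation vectors generated from a 3-D latent space.
rng = np.random.default_rng(1)
Y = rng.normal(size=(20, 3)) @ rng.normal(size=(3, 10))
Z, W, losses = fit_autodecoder(Y, latent_dim=3)
```

The fitted rows of `Z` play the role of the dataset that the second-stage generative model is trained on.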

[Figure: model overview]

 Results

We present qualitative results for both unconditional and conditional
generative modeling. During inference, we sample latents from the
generative model and feed them through the decoder to obtain a
radiance field and camera path. In the conditional setting, we train
the generative model using pairs of latents and conditioning
variables (such as text or images) and sample latents given
conditioning variables during inference.
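One standard way to sample from a score-based model is annealed Langevin dynamics. The sketch below is a stand-in, not GAUDI's sampler. It assumes the latents follow a hypothetical Gaussian `N(MU, STD^2 I)`, so the exact noise-perturbed score can replace a trained score network. It also uses the step-size schedule of Song & Ermon (2019). In the conditional setting, the score would additionally receive the conditioning variable as input. That part is omitted here.

```python
import numpy as np

# Hypothetical latent distribution standing in for the learned one.
MU = np.array([1.0, -2.0, 0.5, 0.0])
STD = 0.5

def score(z, sigma):
    """Exact score of N(MU, STD^2 I) perturbed by N(0, sigma^2 I) noise.
    A trained score network would replace this function."""
    return -(z - MU) / (STD ** 2 + sigma ** 2)

def annealed_langevin(n, sigmas, steps_per_level=100, eps=2e-5, seed=0):
    """Annealed Langevin dynamics over a decreasing noise schedule `sigmas`."""
    rng = np.random.default_rng(seed)
    z = rng.normal(scale=sigmas[0], size=(n, MU.size))
    for sigma in sigmas:
        alpha = eps * (sigma / sigmas[-1]) ** 2   # step size shrinks with the noise level
        for _ in range(steps_per_level):
            noise = rng.normal(size=z.shape)
            z = z + 0.5 * alpha * score(z, sigma) + np.sqrt(alpha) * noise
    return z

latents = annealed_langevin(2000, np.geomspace(10.0, 0.01, 10))
```

Each row of `latents` would then be passed through the decoder to obtain a radiance field and a camera path.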

 Unconditional generation

Random samples from the unconditional version of GAUDI for four
different datasets: Vizdoom, Replica, VLN-CE and ARKitScenes.

         [Videos: Vizdoom, Replica, VLN-CE and ARKitScenes samples]

 Text conditional generation

Random samples from a text-conditional GAUDI model trained on VLN-CE.

             Prompt: "go down the stairs" [three samples]
             Prompt: "go through the hallway" [three samples]
             Prompt: "go up the stairs" [three samples]
             Prompt: "walk into the kitchen" [three samples]

 Image conditional generation

Random samples from an image-conditional GAUDI model trained on
VLN-CE.

     [Four image prompts, each followed by three samples generated from it]

 Interpolation

We can linearly interpolate the latent representation of two scenes
(leftmost and rightmost columns) and move the camera to explore the
interpolated scene.
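Linear interpolation of two scene latents is a one-liner. The minimal sketch below assumes latents are plain vectors. The function `interpolate_latents` is hypothetical, and it returns evenly spaced points that include both endpoints.

```python
import numpy as np

def interpolate_latents(z_a, z_b, num):
    """Return `num` latents evenly spaced on the segment from z_a to z_b (inclusive)."""
    t = np.linspace(0.0, 1.0, num)[:, None]   # interpolation weights, shape (num, 1)
    return (1.0 - t) * z_a[None, :] + t * z_b[None, :]

path = interpolate_latents(np.zeros(4), np.full(4, 2.0), 5)
```

Each row of `path` would be decoded into its own radiance field and rendered while the camera moves through the interpolated scene.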

              [Video: interpolation between two scenes]

 Citation

@article{bautista2022gaudi,
    title={GAUDI: A Neural Architect for Immersive 3D Scene Generation},
    author={Miguel Angel Bautista and Pengsheng Guo and Samira Abnar and Walter Talbott and Alexander Toshev and Zhuoyuan Chen and Laurent Dinh and Shuangfei Zhai and Hanlin Goh and Daniel Ulbricht and Afshin Dehghan and Josh Susskind},
    journal={arXiv},
    year={2022}
}

The authors' copyright in the videos provided here is licensed under
the CC-BY-NC license.

 Source code

Source code will be available in the following weeks.

 Related links

Check out recent related work on making radiance fields generalize to
multiple objects/scenes:

  * PixelNeRF
  * GRAF
  * pi-GAN
  * IBRNet
  * GSN
  * NeRF-VAE
  * StyleNeRF
  * MP3D-license
