[HN Gopher] Nvidia Research Turns 2D Photos into 3D Scenes in th...
___________________________________________________________________
Nvidia Research Turns 2D Photos into 3D Scenes in the Blink of an
AI
Author : bcaulfield
Score : 144 points
Date : 2022-03-25 20:51 UTC (2 hours ago)
(HTM) web link (blogs.nvidia.com)
(TXT) w3m dump (blogs.nvidia.com)
| XorNot wrote:
| So the part which makes this interesting to me is the speed. My
| new desire in our video conferencing world these days has been to
| have my camera on but running a corrected model of myself so I
| can sustain apparent eye contact without needing to look
| directly at the camera.
| aaron695 wrote:
| siavosh wrote:
| I'm curious, for those who work with NeRFs, what their results
| look like for random images, as opposed to the 'nice' ones that
| are selected for publications/demos.
| jrib wrote:
| Just want to say I appreciate the cleverness of the title.
| daenz wrote:
| >The model requires just seconds to train on a few dozen still
| photos -- plus data on the camera angles they were taken from --
| and can then render the resulting 3D scene within tens of
| milliseconds.
|
| Generating the novel viewpoints is almost fast enough for VR,
| assuming you're tethered to a desktop computer with whatever GPUs
| they're using (probably the best setup possible).
|
| The holy grail (by my estimation) is getting both the training
| and the rendering to fit into a VR frame budget. They'll probably
| achieve it soon with some very clever techniques that only
| require differential re-training as the scene changes. The result
| will be a VR experience with live people and objects that feels
| photorealistic, because it essentially is based on real photos.
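|
| As rough back-of-the-envelope arithmetic (my own numbers, not
| Nvidia's): a 90 Hz headset leaves about 11 ms per frame, and
| that budget has to cover two eyes.
|
|     # Illustrative frame-budget arithmetic (assumed numbers).
|     refresh_hz = 90
|     frame_budget_ms = 1000 / refresh_hz     # ~11.1 ms per frame
|     render_ms_per_view = 20                 # low end of "tens of ms"
|     stereo_ms = 2 * render_ms_per_view      # one render per eye
|     print(frame_budget_ms, stereo_ms)       # 11.1 vs 40: not yet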
| simsla wrote:
| > plus data on the camera angles they were taken from
|
| Doesn't seem like much of a stretch to determine the angles as
| well.
|
| E.g. a semi-brute-forced approach with GANs.
| riotnrrd wrote:
| You don't even need anything that fancy. Traditional
| structure-from-motion or visual odometry gives accurate
| enough pose estimates.
|
| If you want to experiment, take a bunch (~100) of photos of
| an object, and use COLMAP to generate the poses. COLMAP
| implements an incremental SfM technique, so it will be very
| accurate but very slow.
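|
| A minimal sketch of that experiment (assuming the colmap
| binary is on your PATH; the paths are placeholders):
|
|     import os
|     import subprocess
|
|     db, imgs, out = "colmap.db", "photos/", "sparse/"
|     os.makedirs(out, exist_ok=True)
|
|     # Detect local features in every photo.
|     subprocess.run(["colmap", "feature_extractor",
|                     "--database_path", db,
|                     "--image_path", imgs], check=True)
|     # Match features between every pair of photos
|     # (exhaustive matching is the slow-but-accurate part).
|     subprocess.run(["colmap", "exhaustive_matcher",
|                     "--database_path", db], check=True)
|     # Recover camera poses and a sparse point cloud.
|     subprocess.run(["colmap", "mapper",
|                     "--database_path", db,
|                     "--image_path", imgs,
|                     "--output_path", out], check=True)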
| c4wrd wrote:
| I've spent a lot of time thinking about this (i.e. taking a
| video and creating a 3D scene) and I don't think that it is
| feasible in most cases to have good accuracy. If you need to
| infer the angle, you need to make a lot of biased assumptions
| about things like velocity, position, etc., of the camera and
| even if you were 99.9% accurate, that 0.1% inaccuracy is
| compounded over time. Now I'm not saying it's not possible,
| but I'd believe that if you want an accurate 3D scene, you'd
| rather be spending your computation budget on things other
| than determining those angles when it can simply be provided
| by hardware.
| krasin wrote:
| https://github.com/NVLabs/instant-ngp has a script that
| converts a video into frames and then uses COLMAP ([1]) to
| compute camera poses. You can then train a NeRF model
| within a few seconds.
|
| It all works pretty well. Trying it on your own video is
| pretty straightforward.
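|
| Roughly like this (script and flag names from memory, so
| double-check against the repo's README):
|
|     import subprocess
|
|     # Pull frames out of the video, run COLMAP on them, and
|     # write the transforms.json that instant-ngp trains from.
|     subprocess.run(["python", "scripts/colmap2nerf.py",
|                     "--video_in", "walkthrough.mp4",
|                     "--video_fps", "2",
|                     "--run_colmap",
|                     "--aabb_scale", "16"], check=True)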
|
| 1. https://colmap.github.io/
| riotnrrd wrote:
| You're far too pessimistic (or maybe you don't know the
| field well). The problem of estimating the relative poses
| of the cameras responsible for a set of photos is a
| long-standing and essentially "solved" problem in computer
| vision. I say "solved" because there is still active
| research (increasing accuracy, faster, more robust, etc.)
| but there are decades-old, well known techniques that any
| dedicated programmer could implement in a week.
|
| If you're genuinely curious, look into structure from
| motion, visual odometry, or SLAM.
| doliveira wrote:
| > even if you were 99.9% accurate, that 0.1% inaccuracy is
| compounded over time
|
| Not really; with SLAM there are various algorithms to keep
| inaccuracy in check. Basically it works as a feedback loop:
| predict an estimate of the position, then correct it using
| observed landmarks.
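|
| A toy 1-D version of that loop (illustrative only; real SLAM
| tracks full 6-DoF poses and builds a map of landmarks at the
| same time):
|
|     import random
|
|     x, var = 0.0, 0.0            # position estimate and its variance
|     for step in range(100):
|         # Predict: dead-reckon one unit forward; motion noise
|         # makes the variance (i.e. potential drift) grow.
|         x += 1.0
|         var += 0.1
|         true_x = step + 1.0      # ground truth, to simulate a sensor
|         # Correct: observe a landmark with measurement noise r
|         # and blend it in by the Kalman gain k.
|         z = true_x + random.gauss(0.0, 0.2)
|         r = 0.2 ** 2
|         k = var / (var + r)
|         x += k * (z - x)
|         var *= 1.0 - k           # uncertainty stays bounded
|     print(abs(x - true_x), var)  # error does not compound over time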
| anyfactor wrote:
| Tangent
|
| I wonder what happens to most people when they see innovation
| such as this. Over the years I have seen numerous mind-blowing
| AI achievements, which essentially feel like miracles. Yet
| literally an hour later I forget what I even saw. These
| innovations don't leave a lasting impression on me, or on the
| internet, except for the times when they are released to the
| public for tinkering and end up failing catastrophically.
|
| I remember having the same feeling about chatbots and TTS
| technology literally ages ago, but at present the practical use
| of those innovations feels very mediocre.
| tomatowurst wrote:
| Hmmm, I really find this to be different from chatbots. In
| fact, I had a lot of skepticism to overcome before using GitHub
| Copilot, but then I saw a new reality where it became part of
| the process; albeit not as prolific, but enough to make me
| ponder what the next evolution might be.
|
| For 3D modelers, this is huge, since it takes a lot of
| experience and grunt work to put the right touches on even a
| boilerplate 3D model. So much so that many game companies have
| outsourced non-human 3D modeling; this would certainly impact
| those markets.
|
| 1) It could further lower the cost and improve quality.
|
| 2) Studios could move those time-consuming tasks back on-shore
| and put an experienced in-house artist/modeler in charge of
| the production.
|
| 3) Hybrid of both
|
| What I see here is that NeRF will have a far bigger impact on
| the 3D modeling/animation industry than GitHub Copilot.
| Another certainty is that we are going to see a faster rate of
| innovation. We are at a point where a paper released merely
| months ago is being completely outpaced by another. The
| improvement in training time that this NeRF work offers is
| insane, especially given how quickly this new approach came
| out.
|
| We could be headed for a future where the release of AI
| products will not be able to keep up with published works. It
| could be as fast as somebody tweeting a new technique, only to
| be outdone by somebody else weeks or possibly days later.
|
| Truly exciting times.
| rilezg wrote:
| I think with any tech demo (or other corporate PR piece), it is
| good to assume the worst, because companies spin things to be
| as ducky as possible. This is a self-reinforcing cycle, because
| if two companies have identical products, then the best liar--
| er, marketer--will win.
|
| (not to say this sort of behavior is exclusive to corporate PR.
| as the best and smartest person ever, I would never need to
| exaggerate my achievements on a job application, but others
| may)
| TulliusCicero wrote:
| The problem is that when the thing is initially announced, it's
| not useful to anyone yet, because it's not productionized and
| released to the general public.
|
| But then once it IS released to the general public, it's
| probably been at least several months, maybe even multiple
| years since the announcement, so people are like, " _yawn_ ,
| this is old news."
| fleischhauf wrote:
| I have the impression that now some of them really do end up
| in practical applications. Funnily enough, someone just today
| showed me a feature on his phone where you can select
| undesired objects in your photo and it will replace them with
| a fitting background, indistinguishable from the original
| photo.
| ksec wrote:
| Nvidia is really turning into an AI powerhouse. The moat around
| CUDA helps, and their target customers aren't as stringent
| about budget, especially when the hardware cost is tiny
| compared to what they do with it.
|
| I wonder if they could reach a trillion-dollar market cap.
| alanwreath wrote:
| I'm probably going to ramp up the number of photos I take in
| the hope that Google Photos auto-applies this tech.
| danamit wrote:
| I am kinda skeptical; AI demos are impressive, but the real
| world results are underwhelming.
|
| How many resources does it take to generate images like that?
| Is this the most ideal situation?
|
| Can you take images from the web and, based on metadata, make
| a better street view?
|
| With all this AI, where is one accessible translation service?
| Or even an accent-adjusting service? Or just good
| auto-subtitles?
| woah wrote:
| Are there examples of this being used on large outdoor spaces?
| krasin wrote:
| Yes, Waymo did the whole San Francisco block:
| https://waymo.com/research/block-nerf/
| noduerme wrote:
| There's something very uncanny-valley about that video. I
| can't decide if it's the smoothness of the shading on the
| textures or if it's the way the parallax perspective on the
| buildings sometimes is just a tiny bit off. I don't generally
| get motion sickness from VR but I feel like this would cause
| it.
| jowday wrote:
| You'll find this is true of all NeRFs if you spend time
| playing around with them. If a NeRF is trying to render
| part of an object that wasn't observed in the input
| images, it's going to look strange, since it's ultimately
| just guessing at the appearance. The NVidia example in the
| link has the benefit of focusing on a single entity that's
| centered in all of the input photographs - the effect is
| much more pronounced in large scale scenes with tons of
| objects, like the Waymo one. You can still see some of this
| distortion in the NVidia one - pay close attention to the
| backside of the woman's left shoulder. You'll see a faint
| haze or blur near her shoulder - the input images didn't
| contain a clear shot of it from multiple angles, so the
| model has to guess when rendering it.
| gundmc wrote:
| Woah, this video is way more interesting than the Nvidia
| polaroid teaser in the original link.
| krasin wrote:
| Still, NVIDIA's achievement (and Thomas Müller's in
| particular) is amazing. Thomas and his collaborators
| achieved an almost 1000x performance improvement through a
| combination of algorithmic and implementation tricks.
|
| I highly recommend trying this at home:
|
| https://nvlabs.github.io/instant-ngp/
|
| https://github.com/NVlabs/instant-ngp
|
| Very straightforward and gives better insight into what
| NeRF is than any shiny marketing demo.
| cinntaile wrote:
| Waymo needed 2.8 million images to create that scene; I
| wonder how many Nvidia would need. Or was the focus only
| on speed? I skimmed the article and didn't really find
| info on that.
| krasin wrote:
| Waymo essentially trained several NeRF models for Block-
| NeRF that are rendered together. It's conceivable that
| NVIDIA's instant-ngp could be used for that.
| xrd wrote:
| This NeRF project is cool too.
|
| https://github.com/bmild/nerf
|
| I've been trying to get GANs to do this for a while, but NeRFs
| look like the perfect fit.
| maybelsyrup wrote:
| Is anyone else kinda terrified?
| sorenjan wrote:
| I don't really understand why NeRFs would be particularly useful
| in more than a few niche cases, perhaps because I don't fully
| understand what they really are.
|
| My impression is that you take a bunch of photos in various
| places and directions, then you use those as samples of a 3D
| function that describes the full scene, and optimize a neural
| network to minimize the difference between the true light field
| and what's described by the network. An approximation of the
| actual function that fits the training data. The millions of
| coefficients are seen as a black box that somehow describes the
| scene when combined in a certain way, I guess mapping a camera
| pose to a rendered image? But why would that be better than some
| other data structure, like a mesh, a point cloud, or signed
| distance field, where you have the scene as structured data you
| can reason about? What happens if you want to animate part of a
| NeRF, or crop it, or change it in any way? Do you have to throw
| away all trained coefficients and start again from training data?
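|
| In code, my mental model is roughly this (a sketch with
| made-up layer sizes; the real thing adds positional
| encodings, hierarchical sampling, etc.):
|
|     import torch
|     import torch.nn as nn
|
|     # The "black box": an MLP from a 3-D point plus a view
|     # direction to a density and an RGB color.
|     mlp = nn.Sequential(nn.Linear(6, 256), nn.ReLU(),
|                         nn.Linear(256, 256), nn.ReLU(),
|                         nn.Linear(256, 4))  # (sigma, r, g, b)
|
|     def render_ray(origin, direction, n=64, far=4.0):
|         # Sample points along the camera ray, query the MLP.
|         t = torch.linspace(0.0, far, n)
|         pts = origin + t[:, None] * direction
|         out = mlp(torch.cat([pts, direction.expand(n, 3)], -1))
|         sigma = torch.relu(out[:, 0])
|         rgb = torch.sigmoid(out[:, 1:])
|         # Volume rendering: each sample contributes according
|         # to its opacity and the transmittance in front of it.
|         alpha = 1.0 - torch.exp(-sigma * (far / n))
|         trans = torch.cumprod(
|             torch.cat([torch.ones(1), 1.0 - alpha]), 0)[:-1]
|         return ((trans * alpha)[:, None] * rgb).sum(0)
|
|     # Training minimizes photometric error against the photos:
|     #   loss = ((render_ray(o, d) - true_pixel) ** 2).mean()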
|
| Can you use this method as a part of a more traditional
| photogrammetry pipeline and extract the result as a regular mesh?
| Nvidia seems to suggest that NeRFs are in some way better than
| meshes, but according to my flawed understanding they just seem
| unwieldy.
| sennight wrote:
| I know that taste in comedy is seasonal (yes, there was a time
| when people thought vaudeville was the cat's pajamas), but has
| anyone ever greeted a pun with anything other than a pained
| sigh?
| noduerme wrote:
| It's ones like this that make me shake my head and go "Aiaiai."
| ModernMech wrote:
| Watch Bob's Burgers. The whole show is basically puns. I
| chuckle.
| cogman10 wrote:
| Puns aren't meant to make people laugh; the pained sigh is the
| point. It's schadenfreude for the person making the pun.
| sennight wrote:
| > It's schadenfreude for the person making the pun.
|
| Nah, if it is a joke at their own expense then it is "self
| deprecating humor", something which is definitely designed to
| get a laugh. Humiliation fetish, maybe? Obviously nothing is
| funny past a certain point of deconstruction... especially if
| you find yourself defending the distinguishing difference of
| the "meta". Just stop making puns, easy.
| bogwog wrote:
| Nvidia is leaving us all behind
| [deleted]
| syspec wrote:
| Is there a video of this? I'm not sure what the connection is
| to the top photo/video/matrix-360 effect.
|
| Was that created from a few photos? I didn't see any additional
| imagery below.
|
| --- Update
|
| It looks like these are the four source photos:
| https://blogs.nvidia.com/wp-content/uploads/2022/03/NVIDIA-R...
|
| Then it creates this 360 video from them:
| https://blogs.nvidia.com/wp-content/uploads/2022/03/2141864_...
| elil17 wrote:
| My prediction/hope is that NeRFs will totally revolutionize the
| film/TV industry. I can imagine:
|
| - Shooting a movie from a few cameras, creating a movie version
| of a NeRF using those angles, and then dynamically adding in
| other shots in post
|
| - Using lighting and depth information embedded in NeRFs to
| assist in lighting/integrating CG elements
|
| - Using NeRFs to generate virtual sets on LED walls (like those
| on The Mandalorian) from just a couple of photos of a location or
| a couple of renders of a scene (currently, the sets have to be
| built in a game engine and optimized for real time performance).
| jowday wrote:
| This sort of stuff (generating 3D assets from photographs of
| real objects) has been common for quite a while via
| photogrammetry. NeRFs are interesting because (in some cases)
| they can create renders that look higher quality with fewer
| photos, and they hint at the potential of future learned
| rendering models.
| cogman10 wrote:
| Perhaps even making non-gimmicky live action 3d films.
|
| Having 3D renders of the entire film without needing green
| screens and a bunch of tracking balls seems like it would
| make some of the post-processing work easier. You can add or
| remove
| elements. Adjust the camera angles. More effectively de-age
| actors. Heck, even create scenes from whole cloth if an actor
| unexpectedly dies (since you still have their model).
|
| Seems like you could also save some time having fewer takes.
| What you can fix in post would be dramatically expanded.
|
| The best part for filmmakers: they are often using multiple
| cameras anyway, so this doesn't seem like it'd be too much of
| a stretch.
| ALittleLight wrote:
| My, maybe too extreme, future fantasy version of this is
| turning existing movies into 3d movies you could watch in VR.
| andy_ppp wrote:
| Computer games, VR, and AR could also be pretty amazing uses
| for this technique.
| teaearlgraycold wrote:
| RIP photo-realistic modelers
| tomatowurst wrote:
| Hmmm, well, I still think they will be in demand, for the
| same reason software developers will not be automated away.
| NeRF is really mind-bogglingly good, but there are still
| artifacts, and that is something modelers have a good eye
| for.
|
| Having said that, it might be the end for any junior type
| of role, the same way GitHub Copilot really takes a bite
| out of the need for a junior developer.
|
| I'm very curious what will happen, because this will become
| a trend across other industries, apart from the legal and
| medical professions (peace of mind from a human in the
| loop).
| teaearlgraycold wrote:
| Maybe we'll have people spend their time building IRL
| sculptures and spaces to get digitized.
| EugeneOZ wrote:
| It will boost cut-scenes in games as well.
| usrusr wrote:
| > - Using lighting and depth information embedded in NeRFs to
| assist in lighting/integrating CG elements
|
| > - Using NeRFs to generate virtual sets on LED walls
|
| Sounds like a powerful set of tools to defeat a number of image
| manipulation detection tricks, with limited effort once the
| process is set up as routine. State-actor-level information
| warfare will soon be in a class of its own. Not just in terms of
| getting harder to detect, but more importantly in terms of
| becoming able to produce "quality" in high volume.
| gareth_untether wrote:
| AI and 3D content creation are becoming so exciting. Soon
| we'll be able to have an idea and make it with automated
| tools. Sure, having a deeper understanding of how 3D works
| will be beneficial, but it will no longer be the entry
| requirement.
| PaulHoule wrote:
| If you have a graphics card which is unobtainable.
| baron816 wrote:
| It would be really great to recreate loved ones, after they
| have passed, in some sort of digital space.
|
| As I've gotten older, and my parents get older as well, I've been
| thinking more about what my life will be like in old age (and
| beyond too). I've also been thinking what I would want "heaven"
| to be. Eternal life doesn't appeal to me much. Imagine living a
| quadrillion years. Even as a god, that would be miserable. That
| would be (by my rough estimate) the equivalent of 500 times the
| cumulative lifespans of all humans who have ever lived.
|
| What I would really like is to see my parents and my beloved dog
| again, decades after they have passed (along with any living ones
| at that time). Being able to see them and speak to them one last
| time at the end of my life before fading into eternal darkness
| would be how I would want to go.
|
| Anyway, there's a free startup idea for anyone--recreate loved
| ones in VR so people can see them again.
| rilezg wrote:
| This reminds me a lot of Black Mirror season 2 episode 1.
|
| Always good to treasure the time we are given.
| the_mar wrote:
| and then you have to put your loved ones in the attic
| olladecarne wrote:
| There are so many things we invent with good intentions that
| in the end go terribly wrong, and I think this is one of those
| things. I think it's OK to mourn and remember the past, but
| moving on and accepting reality is important to a healthy life.
|
| Let's be real though: the startup that makes this and appeals
| to our worst instincts will make bank. I can't imagine how
| much more messed up future generations will be as we keep
| making more dangerous technology that appeals to our primal
| instincts.
| lowdest wrote:
| >Imagine living a quadrillion years. Even as a god, that would
| be miserable.
|
| This seems very subjective, I don't agree at all.
| luckydata wrote:
| I'm really looking forward to this technology getting applied to
| home improvement.
___________________________________________________________________
(page generated 2022-03-25 23:00 UTC)