[HN Gopher] GET3D: A Generative Model of High Quality 3D Texture...
___________________________________________________________________
GET3D: A Generative Model of High Quality 3D Textured Shapes
Learned from Images
Author : lnyan
Score : 129 points
Date : 2022-09-24 13:49 UTC (9 hours ago)
(HTM) web link (nv-tlabs.github.io)
(TXT) w3m dump (nv-tlabs.github.io)
| ummonk wrote:
| Still nowhere near good enough to be able to generate a VFX or
| video game asset from some pictures, which is what we'd really
| want for a practical application of such a tool.
| mgraczyk wrote:
| Generating good video game assets from pictures is solved, but
| this does more than that. It generates modified versions from
| words.
| aaaaaaaaaaab wrote:
| >Generating good video game assets from pictures is solved
|
| lol no, not at all. It still needs tons of manual work to get
| it up to quality in terms of topology, material, etc.
| mgraczyk wrote:
| In terms of practical engineering it's not solved; I mean
| that the SOTA in photogrammetry is good enough to create
| high-quality textures and meshes directly from pictures.
| aaaaaaaaaaab wrote:
| Those meshes and textures are far from usable for
| realtime rendering in a 3D game.
| etaioinshrdlu wrote:
| On a somewhat related topic, I think we can just use stable
| diffusion to help convert single photos to 3D NERFs.
|
| 1. find the prompt that best generates the image
|
| 2. generate a (crude) NERF from your starting image and render
| views from other angles
|
| 3. use stable diffusion with the views from other angles as seed
| images, refining them using the prompt from step 1 combined with
| added view descriptions ("view from back", "view from top", etc.)
|
| 4. feed the refined views back to the NERF generator, keeping the
| initial photo view constant
|
| 5. Generate new views from the NERF, which should now be much
| more realistic.
|
| Run steps 2-5 above in a loop indefinitely (a rough sketch in
| code is below). Eventually you should end up with a highly
| accurate, realistic NERF which is full 3D from any angle, all
| from a single photo.
|
| Similar techniques could be used to extend the scene in all
| directions.
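|
| A rough sketch of the loop above in Python-style pseudocode
| (nerf_fit(), render_view(), sd_img2img() and the pose helpers
| are hypothetical placeholders, not real library calls):
|
|     def nerf_from_single_photo(photo, base_prompt, n_rounds=50):
|         views = {front_pose(): photo}  # pose -> image; photo stays fixed
|         nerf = nerf_fit(views)         # step 2: crude NERF from one view
|         for _ in range(n_rounds):
|             pose = random_pose()
|             rendered = render_view(nerf, pose)  # render a novel angle
|             # step 3: refine the novel view with SD img2img, conditioned
|             # on the original prompt plus a view description
|             prompt = base_prompt + ", " + describe_view(pose)
|             refined = sd_img2img(init_image=rendered, prompt=prompt)
|             # step 4: feed the refined view back, keeping the initial
|             # photo view constant
|             views[pose] = refined
|             nerf = nerf_fit(views)     # step 5: render new views next round
|         return nerf
|
| Re-fitting the NERF every round is the expensive part;
| warm-starting each fit from the previous one would presumably help.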
| eutectic wrote:
| I have my doubts that this will converge to anything
| meaningful.
| aliqot wrote:
| In the short term you may be right, but in the long run it's
| a certainty you won't.
| rsp1984 wrote:
| The problem with such an approach would be that NERFs require a
| set of input images _with their exact poses_, and exact poses
| are only available if the underlying geometry is static.
| However, if you use SD to generate new views it's only an
| approximation and you wouldn't be able to get the exact poses.
|
| Not all hope is lost though. I'm pretty sure in a few years
| (perhaps sooner) we'll be able to generate entire 3D scenes
| directly without going through 2D images as an intermediate
| step.
| londons_explore wrote:
| I don't see it as a blocker... Especially if you alternate
| NeRF and SD iterations - i.e. don't generate a whole image
| each time; instead just choose a random angle, render it from
| the NeRF, run a single SD iteration on that render, and do
| another training step of the NeRF.
|
| That way, you know the exact pose for each image, because you
| chose it when rendering the NeRF.
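|
| Roughly (pseudocode again; nerf, sd_step() and
| sample_random_pose() are hypothetical placeholders):
|
|     def alternate(nerf, photo, photo_pose, prompt, n_steps=10000):
|         for _ in range(n_steps):
|             pose = sample_random_pose()  # we picked it, so it's exact
|             rendered = nerf.render(pose)
|             # a single SD denoising step nudges the render toward a
|             # plausible image for this viewpoint
|             target = sd_step(rendered, prompt)
|             # one NeRF training step against that target, plus the real
|             # photo so the original view stays anchored
|             nerf.train_step({pose: target, photo_pose: photo})
|         return nerf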
| eutectic wrote:
| You can optimize the poses as part of the model.
| bno1 wrote:
| An AI that does good UV unwrapping would be much more interesting
| and useful.
| smoldesu wrote:
| And I'd love to see an AI-generated rigging tool for auto-
| generating bone structures so you don't have to do it by hand.
|
| Baby steps, though. The data required to train an
| unwrapping/rigging tool is a lot more domain-specific than
| correlating an OBJ file with its completed render.
| rytill wrote:
| Can you help me understand how you are imagining that? As in,
| you have texture images already and you want to apply them to
| the 3D object intelligently? Is that the case you're talking
| about? Or texture generation and UV unwrapping in one?
| caenorst wrote:
| I think what they mean by UV unwrapping is generating a UV
| map from a textured model (here the texture is generated by a
| tri-plane network).
|
| It's interesting for compression purposes, but kinda
| orthogonal to this method (a good UV unwrapping can be applied
| once the model is generated).
| bno1 wrote:
| Generating a UV map from an untextured mesh. The UV map is
| stored in the vertices as texture coordinates, and together
| with the topology of the mesh it defines how the texture
| (images) gets mapped onto the mesh. A good UV map preserves
| surface area (i.e. every region of the mesh maps to a region of
| the texture proportionally, otherwise you get stretching), has
| few seams, and has little empty space around the texture
| islands to reduce size.
|
| There are ways to do this automatically but they're far from
| perfect. Artists usually take the mesh and literally unwrap
| it until it's planar, and convert this transformed mesh to
| the UV mapping. The advantage of this method is that it gives
| you very good control of seams and texture islands, but it's
| tricky to preserve surface area.
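|
| A minimal sketch of what the UV map buys you at render time
| (numpy only; nearest-neighbour lookup for brevity, where real
| renderers would filter):
|
|     import numpy as np
|
|     def sample_texture(texture, tri_uvs, bary):
|         # texture: (H, W, 3) image; tri_uvs: (3, 2) per-vertex UVs of
|         # the triangle containing the point; bary: (3,) barycentric
|         # coordinates of the point within that triangle
|         u, v = np.asarray(bary) @ np.asarray(tri_uvs)  # interpolate UVs
|         h, w, _ = texture.shape
|         x = min(int(u * (w - 1)), w - 1)
|         y = min(int(v * (h - 1)), h - 1)
|         return texture[y, x]
|
| Stretching shows up where neighbouring triangles cover very
| different amounts of texture per unit of surface, and seams are
| the edges whose UVs jump to a different island.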
|
| Those neural rendering methods are very cool because they use
| light/color fields, but they still have a lot of catching up
| to do compared to modern 3D graphics.
| TOMDM wrote:
| Ok, so on the generative model modality landscape I'm now aware
| of:
|
| - speech
|
| - images
|
| - audio samples
|
| - text
|
| - code
|
| - 3d models
|
| I've seen basic attempts at music and video, and based on
| everything else we've seen, getting good results there seems to
| be mostly a matter of scaling.
|
| What content generation modalities are left? Will all corporate
| generation of these fall to progressively larger models, leaving
| a (relatively) niche "Made by humans!" industry in its wake?
| christiangenco wrote:
| HN comments.
| r1chardnl wrote:
| Management positions replaced by a supervising AI
| TOMDM wrote:
| I wonder what your input data would need to be for a
| competent AI in this space.
|
| Finance, goals, delivery timelines, capabilities of the team,
| employee availability, which employees work well with each
| other, office politics, regulatory constraints...
| Filligree wrote:
| It doesn't need to work as well as current management, just
| well enough to be cheaper.
|
| Perhaps not even that, factoring in loyalty.
| thanatos519 wrote:
| The problem is finding a good training set.
| monkeydust wrote:
| Don't see why not, especially for low-impact, high-frequency
| decisions. Some AI-guided assistance with the option to
| automate. The next level of autocomplete, I guess.
| Keyframe wrote:
| Animation, at least the 'background one'.
| snek_case wrote:
| It's not just a matter of different modalities. It's still a
| matter of sophistication.
|
| The end game is endless generative music or video streaming
| customized to your preferences. Being able to describe a story,
| or having the AI model take a guess at what you might find
| interesting/entertaining and generating a whole TV show or
| movie for you to watch. Or generating background music while
| you work and automatically adjusting to your tastes as well as
| adjusting if you're finding it hard to concentrate or you need
| to take a call.
| tetris11 wrote:
| Except it won't be, will it. Such things were promised for the
| internet and we had maybe a good 10 years or so before corps
| caught up and told us what to watch through their channels.
|
| I imagine this being much of the same: AI trained on corp-
| approved training sets to give suggestions to your
| preferences that they want.
|
| Sure, you could spin up your own and buy a machine that
| trains on its own training data, but watch how no one will
| do that because of the cost, or the diminishing access to
| untainted resources.
| theptip wrote:
| This seems like a weird take to me. We just saw Stable
| Diffusion land, an open-source community-trained SOTA
| model. There is an open reimplemented version of GPT.
|
| How is the correct extrapolation that corps will control
| all content generation?
|
| Sure, Google will always be an OOM or two (more?) ahead in
| terms of compute dedicated to the problem. And so the best-
| quality stuff will likely come from big corps; Netflix (or
| their successor) will have the best quality video-
| generation AI. That is how it always has been though;
| movies are heavily capital intensive.
|
| But this tech raises the quality of hobbyist-generated
| content vs. highly-capitalized studio content. So I think
| it's reasonable to extrapolate to even more content at the
| long tail, instead of consolidation.
| suby wrote:
| It's going to be akin to content creation on youtube,
| perhaps even just people using youtube as their
| distribution medium. Anyone can make a youtube video but we
| don't see everyone creating content.
|
| We should see a proliferation of the tech such that lots of
| small (even one-man) studios pop up pumping out high
| quality content, but the content is released on a schedule
| similar to how youtube videos are now. Your preferences
| come into play through your suggested watch list, it'll be
| populated from this pre-created media based on whatever
| preference machine learning algorithm the distribution
| platform (youtube?) decides. The feedback through watch-
| metrics will then be used by these micro studios to decide
| what to create next. It's basically what already happens
| now with youtube content creation, but the quality of what
| people will produce will be better than Hollywood movies /
| TV shows, and the pace of release will be much quicker.
|
| Not everyone needs to be training and generating their own
| content in order for your content preferences to be
| absolutely saturated with things you'd enjoy watching.
| thanatos519 wrote:
| I'm waiting for the "literal video" generator, which writes and
| sings new lyrics describing what is happening in the video.
| eezurr wrote:
| Music will continue to be made by humans because of strong
| copyright law. It's illegal to sample (and distribute) > 0
| seconds of recorded music. If that makes it into any AI-
| generated music, it's game over if you distribute it.
|
| Source on sampling: the head audio engineer at Juilliard School
| of Music.
| dinobones wrote:
| Why would being a "head audio engineer" at Juilliard give you
| any credibility on AI generated music and sampling/copyright
| law? Lol.
| eezurr wrote:
| Because they work with/teach electronic music/sampling in
| addition to recording classical acoustic music.
| Geee wrote:
| It's not sampling. Sampling is copying & pasting, but that
| doesn't happen, technically. Just like stable diffusion
| doesn't copy any artworks. AI learns from previous works, but
| doesn't copy them. It's quite similar to how humans learn and
| make adaptations based on other work.
| eezurr wrote:
| While technically true, the unspoken premise of my argument
| is that it can and will output distinctive samples derived
| from the source. E.g. you can't change the pitch and tempo,
| add other effects, and call the output your own, legally.
|
| It's a landmine.
| skybrian wrote:
| That seems like good advice for professionals, but I'm
| wondering if it's going to hold up with new ways of
| distribution.
|
| Would distributing a generative model that can sometimes
| generate such music also be considered illegal? Will it
| actually stop people from doing it in practice?
|
| Would it be illegal to share seeds and prompts?
|
| Through these alternative methods, you could have a lot of
| people listening to music that's never distributed as audio
| or video files. And if there's an API for it, games could use
| such generated music via a plugin.
|
| And then I suppose people start sharing on YouTube, and we
| see how good their copyright violation detection actually is.
| shadowfoxx wrote:
| Well, you're gonna need folks to sift through all the generated
| images and curate the results into something coherent. Taste is
| still a thing, after all.
| TOMDM wrote:
| Visions of the future where the consumer has to pick apart
| "human curated", "human assembled" and "human made" much the
| same way we do now for cage free and free range eggs at the
| grocer.
| gersh wrote:
| That would be the same as curating social media feeds.
| brnaftr361 wrote:
| To me these models look worthless. They'd be useless for
| anything other than BG props with high DOF and complementary
| lighting; you can see on the rear windows in particular that
| there are artifacts from the topology. If you hit most of that
| shit with a light from the side it would look horrendous.
|
| You can get away with a lot, but I think this is too much. I
| think future iterations could be promising, but this definitely
| isn't challenging any pipeline I'm aware of.
| smoldesu wrote:
| To be fair, these models are no worse than the ones the
| iPhone makes with LIDAR. It's pretty impressive for being
| generated from a single static image.
| jokethrowaway wrote:
| Great, now we can get the unreleased code for this paper and use
| it with the unreleased code for generating animations (really
| impressive stuff by Sebastian Starke, presented at various
| SIGGRAPH) and build a videogame generator.
|
| I wouldn't even be mad if it were a paid product and not free
| code; just release something to the world so we can start using it.
| calibas wrote:
| Some of the videos aren't working in Firefox. Here's the error:
|
| > Can't decode H.264 stream because its resolution is out of the
| maximum limitation
| Jipazgqmnm wrote:
| They all work for me on Firefox. Btw, it's using the system's
| decoder.
| corscans wrote:
| Hecking man
| wokwokwok wrote:
| https://github.com/nv-tlabs/GET3D
|
| > News
|
| > 2022-09-22: Code will be uploaded next week!
|
| Not really that interesting at this point; the 5 page paper has a
| lot of hand waving, and without the code to see how they actually
| implemented it...
|
| ...I'm left totally underwhelmed.
|
| No weights.
|
| No model.
|
| No code.
|
| The pictures were very pretty.
|
| /shrug
| _visgean wrote:
| That's very dismissive. The paper is 39 pages. Most of the
| details are in the appendix, which I think is fairly standard. I
| think they describe the network quite well (page 16).
| caenorst wrote:
| Disclaimer: This work is done by some of my colleagues.
|
| As someone pointed out, there are 25 pages of text (not
| including bibliography of course), not 5.
|
| Most publications come with a delay of multiple months before
| code release (if any); here you literally have a written soft
| deadline of 1 week. So maybe you can wait a few days before
| posting such a harsh comment?
| egnehots wrote:
| you are barking up the wrong tree; NVIDIA labs have a history
| of releasing their code and models quickly.
| incrudible wrote:
| Spoiler: The results are not high quality, at all.
| sk0g wrote:
| Higher quality than what I can whip up in Blender!
|
| But yeah, calling this high quality is quite disingenuous. I
| don't think this kind of mis-labelling is helpful or
| productive. The results are what they are, and a massive step
| forward from what was available/possible some years ago.
___________________________________________________________________
(page generated 2022-09-24 23:00 UTC)