[HN Gopher] SPAD: Spatially Aware Multiview Diffusers
___________________________________________________________________
SPAD: Spatially Aware Multiview Diffusers
Author : PaulHoule
Score : 92 points
Date : 2024-02-18 14:09 UTC (8 hours ago)
(HTM) web link (yashkant.github.io)
(TXT) w3m dump (yashkant.github.io)
| whimsicalism wrote:
| No major comment other than this tech is obviously going to
| transform gaming.
| Teknomancer wrote:
| My immediate thought as well, but I wonder how exactly this
| could be integrated into current game build chains such as
| Unreal or Unity?
| teaearlgraycold wrote:
| I expect people will implement rough geometry with polygons
| and feed a depth+material map to a diffuser for
| rasterization. This way you get photorealism from the models
| but maintain precise control over what the scene has in it
| and where.
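|
| Something close to that pipeline already exists as depth-
| conditioned diffusion. A minimal sketch, assuming the Hugging
| Face diffusers library and a public ControlNet depth model
| (not SPAD itself, and nowhere near real-time); the file names
| and prompt are made up for illustration:
|
|       # Depth-conditioned generation: the depth map comes from a
|       # rough polygon blockout, the diffusion model fills in detail.
|       import torch
|       from PIL import Image
|       from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
|
|       controlnet = ControlNetModel.from_pretrained(
|           "lllyasviel/sd-controlnet-depth", torch_dtype=torch.float16)
|       pipe = StableDiffusionControlNetPipeline.from_pretrained(
|           "runwayml/stable-diffusion-v1-5", controlnet=controlnet,
|           torch_dtype=torch.float16).to("cuda")
|
|       # Hypothetical depth render exported from the engine's G-buffer.
|       depth = Image.open("scene_depth.png").convert("RGB")
|       frame = pipe("photorealistic forest clearing, morning light",
|                    image=depth, num_inference_steps=30).images[0]
|       frame.save("frame.png")
|
| Precise control over materials would need more conditioning
| channels than depth alone, which is roughly what the
| depth+material idea above describes.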
| Etherlord87 wrote:
| to my understanding it produces 2D images (from various
| angles), not 3D models... But sure, it's very close to
| producing a 3D model.
| jsheard wrote:
| There's been some attempts at translating these techniques
| over to 3D models but the results so far aren't very useful,
| they have a tendency to produce extremely triangle-dense
| meshes which are ironically not actually very detailed or
| well defined, with most of the detailing just painted over
| the surface as textures, which also have terrible UV mapping.
| Not to mention the topology is a mess so it's impossible to
| rig for animation. I'm not sure how good they are at
| producing proper PBR material channels, but I'm guessing "not
| good" if they're derived from text-to-image models trained on
| fully rendered images/photos rather than individual PBR
| components, which is much more difficult data to source on a
| massive scale.
|
| I suspect this will continue to be an uphill battle because
| there aren't billions of high quality 3D models and PBR
| textures just lying around on the internet to slurp up and
| train a model on, so they're having to build it as a second-
| order system using images as the training data, and muddle
| through the rest of the steps to get to a usable 3D model.
| Etherlord87 wrote:
| I'm looking at such techniques that appear here on HN every
| now and then, and indeed, they are very impressive, but
| also something an intermediate 3D artist will do better
| with ease.
|
| I think however, that the existence of such tools could
| perhaps motivate more people to create low quality 3D
| assets, that for many purposes are good enough - but people
| might be shy to do that due to many high quality assets
| shaming them... Once AI 3D stuff will flood the space,
| amateurs might start competing with it - or so I hope.
|
| Also it's just a matter of time until we reach the 80%
| (from the 80/20 rule), 90% and 98% quality of these tools.
| The remaining 2% will still be of value in AAA titles
| though...
| dheera wrote:
| MVDream can be combined with a NeRF (or equivalent spatial
| interpolator such as InstantNGP/TorchNGP) followed by
| marching cubes to extract a mesh and produce a 3D model. SDS
| loss and guidance is done from multiple equally-spaced views
| simultaneously. MVDream has a repo that implements this:
|
| https://github.com/bytedance/MVDream-threestudio
|
| It should be fairly straightforward to adapt this repo (which
| is based on threestudio) to use SPAD instead of MVDream.
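|
| As a minimal sketch of just the marching-cubes step (not the
| MVDream/threestudio code), assuming the trained NeRF exposes
| some density-query function, here called query_density as a
| hypothetical stand-in:
|
|       # Sample the learned density field on a regular grid, then run
|       # marching cubes to turn the chosen level set into a triangle mesh.
|       import numpy as np
|       import trimesh
|       from skimage.measure import marching_cubes
|
|       def extract_mesh(query_density, res=256, bound=1.0, level=10.0):
|           xs = np.linspace(-bound, bound, res)
|           pts = np.stack(np.meshgrid(xs, xs, xs, indexing="ij"), -1)
|           sigma = query_density(pts.reshape(-1, 3)).reshape(res, res, res)
|           verts, faces, normals, _ = marching_cubes(sigma, level=level)
|           # Map voxel indices back into world coordinates.
|           verts = verts / (res - 1) * 2 * bound - bound
|           return trimesh.Trimesh(vertices=verts, faces=faces,
|                                  vertex_normals=normals)
|
| The density threshold and grid resolution are the usual knobs;
| the result still needs decimation and UV work before it is
| game-ready, as discussed elsewhere in the thread.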
| deepnet wrote:
| Are there any NERF to textured or vertex coloured polygon
| mesh point cloud tools or papers ?
|
| I.e. can these be assets for traditional game engines ?
|
| Could a sort of photogrammetry like Meshroom or Reality
| capture do the trick ?
| bugglebeetle wrote:
| I'm confused why there is so much focus on text to images and
| models. If you spent five minutes talking to anyone with artistic
| ability, they would tell you that this is not how they generate
| their work. Making images involves entirely different parts of
| reasoning than that for speech and language. We seem to be
| building an entirely faulty model of image generation (outside of
| things like ControlNet) on the premise that text and images are
| equivalent, solely because that's the training data we have.
| teaearlgraycold wrote:
| It's good for stock images.
|
| And for in-painting I think you'll find text-to-image is still
| useful to artists. It's extra metadata to guide the generation
| of a small portion of the final image.
| refulgentis wrote:
| Not even wrong, in the Pauli sense: to engage requires ceding
| the incorrect premises that image models only accept text as
| input and that the generation process relies on this text
| ummonk wrote:
| Project briefs to an artist typically contain both text and
| reference images. Image diffusion models and the like likewise
| typically use a text prompt together with optional reference
| images.
| bugglebeetle wrote:
| Project briefs are generally not descriptions of images and
| reference tends to be more style than content focused.
| Source: I used to be an illustrator for major media outlets
| like the NYTimes, etc.
| ilkke wrote:
| Check out invoke.ai for an example of something much closer to
| a professional tool.
| deepnet wrote:
| Can you share some of what you have found about the creative
| process by talking to people with artistic ability ?
|
| What are your ideas about the differences between a human and
| AI's creative process ?
|
| Are there any similarities, or analagous processes ?
|
| Do you think creators have an kind of latent space where
| different concepts are inspired by multi-modal inputs ( what
| sparks inspiration ? e.g. sometimes music or a mood inspires a
| picture ) and then the creators make different versions of
| their idea by combining different amounts of different concepts
| ?
|
| I am not being snarky, I am genuinely interested in views
| comparing human an AI's creative processes.
| bugglebeetle wrote:
| I used to work as an illustrator. Most images appeared to me
| as somewhere between fuzzy or clear image concepts,
| unaccompanied by any words. I then have to take these
| concepts and translate them using principles of design,
| color, composition, abstraction etc., such that they're
| coherent and understandable to others.
|
| Most illustration briefs are also not wrote descriptions of
| images either because people are remarkably bad at describing
| what they want in an image, beyond in the most general sense
| of its subject. This is why you see DALLE doing all kinds of
| prompt elaboration on user inputs to generate "good" images.
| Typically, the illustrator is given the work to be
| illustrated (e.g. an editorial), distills key concepts or
| from the work and translates these into various visual
| analogues, such as archetypes, metaphors and themes.
| Depending on the subject, one may have to include reference
| images or other work in a particular style, if the client has
| something specific in mind.
| fxtentacle wrote:
| Reducing geometric detail while keeping outlines intact is one of
| the major showstoppers that prevent current game engines from
| having realistic foliage. And that exact same problem is also why
| a Nerf with its near-infinite geometric detail is impractical to
| use for games. And this paper is yet another way to produce a
| Nerf.
|
| SpeedTree already used billboard textures 10 years ago and that's
| still the way to go if you need a forest in UE5. Fortnite did
| slightly improve upon that by having multiple billboard textures
| that get swapped based on viewing angle, and they call that
| impostors. But the core issue of how to reduce overdraw and poly
| count when starting with a high detail object is still unsolved.
|
| That's also the reason, BTW, why UE5's Nanite is used only for
| mostly solid objects like rocks and statues, but not for trees.
|
| But until this is solved, you always need a technical artist to
| make a low poly mesh onto whose textures you can bake your high
| resolution mesh.
| jsheard wrote:
| Nanite can actually do trees now, and Fortnite is using it in
| production, with fully modelled leaves rather than cutout
| textures because that turned out to be more efficient under
| Nanite. They talk about it here:
| https://www.unrealengine.com/en-US/tech-blog/bringing-nanite...
|
| That's still ultimately triangle meshes though, not some other
| weird representation like NeRF, or distance fields, or voxels,
| or any of the other supposed triangle-killers that didn't
| stick. Triangles are proving very difficult to kill.
| fxtentacle wrote:
| My understanding is that while they allow masked textures,
| the geometry is still fully emitted by Nanite. That means you
| still need to start with a mesh that has multiple leaves
| baked into a single polygon plane, as opposed to starting
| with individual leaf geometry and then that is somehow baked
| automatically.
|
| This illustration from the page you linked to shows that as
| well:
|
| https://cdn2.unrealengine.com/nanite-in-fortnite-
| chapter-4-p...
|
| The alpha masked holes move around, but the polygons remain
| static. That means if you draw a tree with this, you still
| have the full overdraw of the highest-poly mesh.
| jsheard wrote:
| Yeah alpha masking is inefficient under Nanite, but as they
| explain further down its handling of dense geometry is good
| enough that they were able to get away with not using
| masked materials for the foliage in Fortnite. The
| individual leaves are modelled as actual geometry and
| rendered with an opaque, non-masked material.
|
| https://cdn2.unrealengine.com/nanite-in-fortnite-
| chapter-4-t...
|
| https://cdn2.unrealengine.com/nanite-in-fortnite-
| chapter-4-t...
| iandanforth wrote:
| Please note that these results were obtained using a small amount
| of compute (compared to say a large language model training run)
| on a limited training set. Nothing in the paper makes me think
| that this won't scale. I wouldn't be surprised to see a AAA
| quality version of this within a few months.
| gruturo wrote:
| or Single Photon Avalanche Diode, coming to a LIDAR near you very
| soon if not already.
|
| Yay ambiguous acronyms.
| spacebacon wrote:
| Fortsense FL6031 - Automotive ready. For anyone not familiar
| with SPAD (Single Photon Avalanche Diode) YouTube it. Very
| impressive computational imagery through walls, around corners
| and such.
| denkmoon wrote:
| Signal Passed at Danger
___________________________________________________________________
(page generated 2024-02-18 23:00 UTC)