[HN Gopher] SPAD: Spatially Aware Multiview Diffusers
       ___________________________________________________________________
        
       SPAD: Spatially Aware Multiview Diffusers
        
       Author : PaulHoule
       Score  : 92 points
       Date   : 2024-02-18 14:09 UTC (8 hours ago)
        
 (HTM) web link (yashkant.github.io)
 (TXT) w3m dump (yashkant.github.io)
        
       | whimsicalism wrote:
       | No major comment other than this tech is obviously going to
       | transform gaming.
        
         | Teknomancer wrote:
          | My immediate thoughts as well, but I wonder how exactly this
          | could be implemented in current game build chains such as
          | Unreal or Unity.
        
           | teaearlgraycold wrote:
           | I expect people will implement rough geometry with polygons
           | and feed a depth+material map to a diffuser for
           | rasterization. This way you get photorealism from the models
           | but maintain precise control over what the scene has in it
           | and where.
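            | 
            | Something like today's depth ControlNets already covers the
            | depth half of that idea. A rough sketch with the diffusers
            | library (the model IDs are real checkpoints; the file names
            | and prompt are just placeholders):
            | 
            |   import torch
            |   from diffusers import (
            |       ControlNetModel,
            |       StableDiffusionControlNetPipeline,
            |   )
            |   from diffusers.utils import load_image
            | 
            |   # Depth map rendered from the rough polygon scene; it pins
            |   # down the geometry while the prompt controls the look.
            |   depth = load_image("scene_depth.png")
            | 
            |   controlnet = ControlNetModel.from_pretrained(
            |       "lllyasviel/sd-controlnet-depth",
            |       torch_dtype=torch.float16,
            |   )
            |   pipe = StableDiffusionControlNetPipeline.from_pretrained(
            |       "runwayml/stable-diffusion-v1-5",
            |       controlnet=controlnet,
            |       torch_dtype=torch.float16,
            |   ).to("cuda")
            | 
            |   image = pipe(
            |       "photorealistic mossy ruins, overcast lighting",
            |       image=depth,
            |       num_inference_steps=30,
            |   ).images[0]
            |   image.save("rendered_view.png")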
        
         | Etherlord87 wrote:
          | To my understanding, it produces 2D images (from various
          | angles), not 3D models... But sure, it's very close to
          | producing a 3D model.
        
           | jsheard wrote:
            | There have been some attempts at translating these
            | techniques over to 3D models, but the results so far aren't
            | very useful: they tend to produce extremely triangle-dense
            | meshes which are, ironically, not actually very detailed or
            | well defined, with most of the detailing just painted over
            | the surface as textures that also have terrible UV mapping.
            | Not to mention the topology is a mess, so it's impossible to
            | rig for animation. I'm not sure how good they are at
            | producing proper PBR material channels, but I'm guessing
            | "not good" if they're derived from text-to-image models
            | trained on fully rendered images/photos rather than
            | individual PBR components, which is much more difficult data
            | to source at a massive scale.
           | 
           | I suspect this will continue to be an uphill battle because
           | there aren't billions of high quality 3D models and PBR
           | textures just lying around on the internet to slurp up and
           | train a model on, so they're having to build it as a second-
           | order system using images as the training data, and muddle
            | through the rest of the steps to get to a usable 3D model.
        
             | Etherlord87 wrote:
              | I'm looking at such techniques as they appear here on HN
              | every now and then, and indeed, they are very impressive -
              | but also something an intermediate 3D artist could easily
              | do better.
             | 
              | I think, however, that the existence of such tools could
              | perhaps motivate more people to create low-quality 3D
              | assets that, for many purposes, are good enough - people
              | might currently be too shy to do that because so many
              | high-quality assets put theirs to shame... Once AI 3D
              | stuff floods the space, amateurs might start competing
              | with it - or so I hope.
             | 
              | Also, it's just a matter of time until these tools reach
              | 80% (per the 80/20 rule), then 90% and 98% quality. The
              | remaining 2% will still be of value in AAA titles
              | though...
        
           | dheera wrote:
            | MVDream can be combined with a NeRF (or an equivalent spatial
            | interpolator such as InstantNGP/TorchNGP) followed by
            | marching cubes to extract a mesh and produce a 3D model. SDS
            | loss and guidance are applied from multiple equally-spaced
            | views simultaneously. MVDream has a repo that implements
            | this:
           | 
           | https://github.com/bytedance/MVDream-threestudio
           | 
           | It should be fairly straightforward to adapt this repo (which
           | is based on threestudio) to use SPAD instead of MVDream.
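            | 
            | As a toy sketch of the last step only (not the threestudio
            | code), extracting a mesh from a sampled density field with
            | marching cubes; the density function below is a stand-in for
            | querying a trained NeRF:
            | 
            |   import numpy as np
            |   import trimesh
            |   from skimage import measure
            | 
            |   # Stand-in for a trained NeRF/Instant-NGP density field:
            |   # here just a solid sphere of radius 0.5.
            |   def density(xyz):
            |       r = np.linalg.norm(xyz, axis=-1)
            |       return np.clip(0.5 - r, 0.0, None)
            | 
            |   n = 128
            |   grid = np.linspace(-1.0, 1.0, n)
            |   xyz = np.stack(
            |       np.meshgrid(grid, grid, grid, indexing="ij"), axis=-1
            |   )
            |   sigma = density(xyz)
            | 
            |   # Extract the iso-surface and export a mesh that a game
            |   # engine or DCC tool can import.
            |   verts, faces, normals, _ = measure.marching_cubes(
            |       sigma, level=0.01
            |   )
            |   verts = verts / (n - 1) * 2.0 - 1.0  # back to [-1, 1]
            |   mesh = trimesh.Trimesh(
            |       vertices=verts, faces=faces, vertex_normals=normals
            |   )
            |   mesh.export("extracted.obj")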
        
           | deepnet wrote:
            | Are there any tools or papers for converting a NeRF to a
            | textured or vertex-coloured polygon mesh or point cloud?
            | 
            | I.e. can these become assets for traditional game engines?
            | 
            | Could a photogrammetry tool like Meshroom or RealityCapture
            | do the trick?
        
       | bugglebeetle wrote:
        | I'm confused why there is so much focus on going from text to
        | images and models. If you spent five minutes talking to anyone
        | with artistic ability, they would tell you that this is not how
        | they generate their work. Making images involves entirely
        | different kinds of reasoning than speech and language do. We
        | seem to be building an entirely faulty model of image generation
        | (outside of things like ControlNet) on the premise that text and
        | images are equivalent, solely because that's the training data
        | we have.
        
         | teaearlgraycold wrote:
         | It's good for stock images.
         | 
         | And for in-painting I think you'll find text-to-image is still
         | useful to artists. It's extra metadata to guide the generation
         | of a small portion of the final image.
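          | 
          | A rough sketch of that workflow with the diffusers library
          | (the model ID is a real checkpoint; the file names and prompt
          | are placeholders). The prompt only steers the masked region:
          | 
          |   import torch
          |   from diffusers import StableDiffusionInpaintPipeline
          |   from diffusers.utils import load_image
          | 
          |   pipe = StableDiffusionInpaintPipeline.from_pretrained(
          |       "runwayml/stable-diffusion-inpainting",
          |       torch_dtype=torch.float16,
          |   ).to("cuda")
          | 
          |   init = load_image("artwork.png")  # the artist's image
          |   mask = load_image("mask.png")     # white = area to redo
          |   out = pipe(
          |       prompt="weathered bronze statue, afternoon light",
          |       image=init,
          |       mask_image=mask,
          |       num_inference_steps=30,
          |   ).images[0]
          |   out.save("artwork_patched.png")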
        
         | refulgentis wrote:
          | Not even wrong, in the Pauli sense: to engage requires
          | conceding the incorrect premises that image models only accept
          | text as input and that the generation process relies on this
          | text.
        
         | ummonk wrote:
          | Project briefs to an artist typically contain both text and
          | reference images. Image diffusion models and the like
          | similarly use a text prompt together with optional reference
          | images.
        
           | bugglebeetle wrote:
            | Project briefs are generally not descriptions of images, and
            | reference tends to be more style- than content-focused.
           | Source: I used to be an illustrator for major media outlets
           | like the NYTimes, etc.
        
         | ilkke wrote:
         | Check out invoke.ai for an example of something much closer to
         | a professional tool.
        
         | deepnet wrote:
         | Can you share some of what you have found about the creative
         | process by talking to people with artistic ability ?
         | 
         | What are your ideas about the differences between a human and
         | AI's creative process ?
         | 
          | Are there any similarities, or analogous processes?
         | 
          | Do you think creators have a kind of latent space where
          | different concepts are inspired by multi-modal inputs (what
          | sparks inspiration? e.g. sometimes music or a mood inspires a
          | picture), and then the creators make different versions of
          | their idea by combining different amounts of different
          | concepts?
         | 
          | I am not being snarky; I am genuinely interested in views
          | comparing human and AI creative processes.
        
           | bugglebeetle wrote:
            | I used to work as an illustrator. Most images appeared to me
            | as somewhere between fuzzy and clear image concepts,
            | unaccompanied by any words. I then had to take these
            | concepts and translate them using principles of design,
            | color, composition, abstraction, etc., such that they were
            | coherent and understandable to others.
           | 
            | Most illustration briefs are also not rote descriptions of
            | images, because people are remarkably bad at describing what
            | they want in an image beyond the most general sense of its
            | subject. This is why you see DALLE doing all kinds of
            | prompt elaboration on user inputs to generate "good" images.
            | Typically, the illustrator is given the work to be
            | illustrated (e.g. an editorial), distills key concepts from
            | the work, and translates these into various visual
            | analogues, such as archetypes, metaphors and themes.
           | Depending on the subject, one may have to include reference
           | images or other work in a particular style, if the client has
           | something specific in mind.
        
       | fxtentacle wrote:
        | Reducing geometric detail while keeping outlines intact is one of
        | the major unsolved problems preventing current game engines from
        | having realistic foliage. That exact same problem is also why a
        | NeRF, with its near-infinite geometric detail, is impractical to
        | use for games. And this paper is yet another way to produce a
        | NeRF.
       | 
       | SpeedTree already used billboard textures 10 years ago and that's
       | still the way to go if you need a forest in UE5. Fortnite did
       | slightly improve upon that by having multiple billboard textures
       | that get swapped based on viewing angle, and they call that
       | impostors. But the core issue of how to reduce overdraw and poly
       | count when starting with a high detail object is still unsolved.
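        | 
        | Conceptually, the impostor swap is just mapping the view
        | direction to a cell in a pre-rendered billboard atlas. A toy
        | sketch (the 8x4 atlas layout is an illustrative assumption, not
        | what Fortnite actually ships):
        | 
        |   import math
        | 
        |   # Pick which pre-rendered billboard to show for a tree, based
        |   # on the direction from the tree to the camera.
        |   def impostor_cell(cam, tree, n_az=8, n_el=4):
        |       dx, dy, dz = (cam[i] - tree[i] for i in range(3))
        |       azimuth = math.atan2(dy, dx) % (2.0 * math.pi)
        |       elevation = max(0.0, math.atan2(dz, math.hypot(dx, dy)))
        |       a = int(azimuth / (2.0 * math.pi) * n_az) % n_az
        |       e = int(elevation / (math.pi / 2.0) * n_el)
        |       return a, min(n_el - 1, e)  # column/row in the atlas
        | 
        |   # Camera east of the tree and slightly above ground level.
        |   print(impostor_cell((10.0, 0.0, 3.0), (0.0, 0.0, 0.0)))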
       | 
       | That's also the reason, BTW, why UE5's Nanite is used only for
       | mostly solid objects like rocks and statues, but not for trees.
       | 
        | But until this is solved, you always need a technical artist to
        | make a low-poly mesh into whose textures you can bake the detail
        | of your high-resolution mesh.
        
         | jsheard wrote:
         | Nanite can actually do trees now, and Fortnite is using it in
         | production, with fully modelled leaves rather than cutout
         | textures because that turned out to be more efficient under
         | Nanite. They talk about it here:
         | https://www.unrealengine.com/en-US/tech-blog/bringing-nanite...
         | 
         | That's still ultimately triangle meshes though, not some other
         | weird representation like NERF, or distance fields, or voxels,
         | or any of the other supposed triangle-killers that didn't
         | stick. Triangles are proving very difficult to kill.
        
           | fxtentacle wrote:
           | My understanding is that while they allow masked textures,
           | the geometry is still fully emitted by Nanite. That means you
           | still need to start with a mesh that has multiple leaves
           | baked into a single polygon plane, as opposed to starting
           | with individual leaf geometry and then that is somehow baked
           | automatically.
           | 
           | This illustration from the page you linked to shows that as
           | well:
           | 
           | https://cdn2.unrealengine.com/nanite-in-fortnite-
           | chapter-4-p...
           | 
           | The alpha masked holes move around, but the polygons remain
           | static. That means if you draw a tree with this, you still
           | have the full overdraw of the highest-poly mesh.
        
             | jsheard wrote:
              | Yeah, alpha masking is inefficient under Nanite, but as
              | they explain further down, its handling of dense geometry
              | is good enough that they were able to get away with not
              | using masked materials for the foliage in Fortnite. The
              | individual leaves are modelled as actual geometry and
              | rendered with an opaque, non-masked material.
             | 
             | https://cdn2.unrealengine.com/nanite-in-fortnite-
             | chapter-4-t...
             | 
             | https://cdn2.unrealengine.com/nanite-in-fortnite-
             | chapter-4-t...
        
       | iandanforth wrote:
        | Please note that these results were obtained using a small amount
        | of compute (compared to, say, a large language model training
        | run) on a limited training set. Nothing in the paper makes me
        | think this won't scale. I wouldn't be surprised to see a
        | AAA-quality version of this within a few months.
        
       | gruturo wrote:
       | or Single Photon Avalanche Diode, coming to a LIDAR near you very
       | soon if not already.
       | 
       | Yay ambiguous acronyms.
        
         | spacebacon wrote:
          | Fortsense FL6031 - Automotive ready. For anyone not familiar
          | with SPAD (Single Photon Avalanche Diode), YouTube it. Very
         | impressive computational imagery through walls, around corners
         | and such.
        
         | denkmoon wrote:
         | Signal Passed at Danger
        
       ___________________________________________________________________
       (page generated 2024-02-18 23:00 UTC)