[HN Gopher] A working implementation of text-to-3D DreamFusion, ...
___________________________________________________________________
A working implementation of text-to-3D DreamFusion, powered by
Stable Diffusion
Author : nopinsight
Score : 203 points
Date : 2022-10-06 15:12 UTC (7 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| xwdv wrote:
| Feels like AI-generated art is approaching a sort of singularity
| at this point. Progress is getting exponential.
| antegamisou wrote:
| Only if you have low standards of what constitutes research on
| artificial intelligence and art perception.
| hwers wrote:
| These things take months and months to train (hardly fast
| progress). Any new model that's coming out is generally in the
| air beforehand (not unpredictable), and these applications were
| pretty expected the day Stable Diffusion came out.
| whazor wrote:
| I would love an AI model that does text-to-SVG.
| WASDx wrote:
| OpenAI Codex can do that.
| fsiefken wrote:
| That would be lovely! Perhaps it could also dream up interesting
| Game of Life patterns.
| nobbis wrote:
| Took a week, as predicted:
| https://twitter.com/DvashElad/status/1575614411834011651
| nobbis wrote:
| Key step in generating 3D - ask Stable Diffusion to score views
| from different angles:
|
|     for d in ['front', 'side', 'back', 'side', 'overhead', 'bottom']:
|         text = f"{ref_text}, {d} view"
|
| https://github.com/ashawkey/stable-dreamfusion/blob/0cb8c0e0...
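|
| (Sketched below is one plausible way the direction string could be
| picked from the sampled camera pose; the function and thresholds
| here are illustrative, not the repo's exact code:)
|
|     def view_direction(azimuth_deg: float, elevation_deg: float) -> str:
|         """Map a camera pose to one of the prompt suffixes above."""
|         if elevation_deg > 60:
|             return 'overhead'
|         if elevation_deg < -60:
|             return 'bottom'
|         azimuth_deg = azimuth_deg % 360
|         if azimuth_deg < 45 or azimuth_deg >= 315:
|             return 'front'
|         if azimuth_deg < 135 or azimuth_deg >= 225:
|             return 'side'
|         return 'back'
|
|     # e.g. text = f"{ref_text}, {view_direction(azimuth, elevation)} view"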
| dwallin wrote:
| Given the way the language model works these words could have
| multiple meanings. I wonder if training a form of textual
| inversion to more directly represent these concepts might
| improve the results. You could even try teaching it to
| represent more fine grained degree adjustments.
| shadowgovt wrote:
| I'm modestly surprised that those few angles give us enough
| data to build out a full 3D render, but I guess I shouldn't
| be too surprised, as that's tech that has had high demand and
| been understood for years (those kinds of front-cut / side-cut
| images are what 3D artists use to do their initial prototypes
| of objects if they're working from real-life models).
| nobbis wrote:
| DreamFusion doesn't directly build a 3D model from those
| generated images. It starts with a completely random 3D
| voxel model, renders it from 6 different angles, then asks
| Stable Diffusion how plausible each rendering is as an image
| of "X, side view".
|
| It then sprinkles some noise on the rendering, makes Stable
| Diffusion improve it a little, then adjusts the voxels to
| produce that image (using differentiable rendering.)
|
| Rinse and repeat for hours.
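|
| A minimal, single-view sketch of that score-distillation loop,
| using Hugging Face diffusers and a learnable image standing in
| for the NeRF renderer (model id, scheduler choice, and step
| ranges here are assumptions, not the repo's actual code):
|
|     import torch
|     from diffusers import StableDiffusionPipeline, DDPMScheduler
|
|     device = "cuda"
|     pipe = StableDiffusionPipeline.from_pretrained(
|         "CompVis/stable-diffusion-v1-4").to(device)
|     scheduler = DDPMScheduler.from_config(pipe.scheduler.config)
|
|     # Stand-in for the differentiable renderer: a single learnable
|     # image. In DreamFusion proper this would be the NeRF rendered
|     # from a random pose, and the gradient would reach the NeRF weights.
|     image = torch.rand(1, 3, 512, 512, device=device, requires_grad=True)
|     opt = torch.optim.Adam([image], lr=1e-2)
|
|     ids = pipe.tokenizer("a hamburger, side view", padding="max_length",
|                          max_length=pipe.tokenizer.model_max_length,
|                          return_tensors="pt").input_ids.to(device)
|     with torch.no_grad():
|         text_emb = pipe.text_encoder(ids)[0]
|
|     for step in range(1000):
|         # Encode the "rendering" into Stable Diffusion's latent space.
|         latents = pipe.vae.encode(image * 2 - 1).latent_dist.sample() * 0.18215
|         # Noise it at a random timestep; let the UNet predict the noise.
|         t = torch.randint(20, 980, (1,), device=device)
|         noise = torch.randn_like(latents)
|         noisy = scheduler.add_noise(latents, noise, t)
|         with torch.no_grad():
|             noise_pred = pipe.unet(noisy, t, encoder_hidden_states=text_emb).sample
|         # Score distillation: (predicted - injected) noise acts as the
|         # gradient on the latents, back-propagated through the VAE encoder.
|         latents.backward(gradient=(noise_pred - noise))
|         opt.step()
|         opt.zero_grad()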
| shadowgovt wrote:
| Thank you for the clarification; I hadn't grokked the
| algorithm yet.
|
| That's interesting for a couple of reasons. I can see why
| that works. It also implies that for closed objects, the
| voxel data on the interior (where no images can see it)
| will be complete noise, as there's no signal to pick any
| color or lack of a voxel.
| FeepingCreature wrote:
| text = f"{ref_text}, front cutaway drawing"
|
| Maybe?
| nobbis wrote:
| Yes, although not complete noise - probably empty.
| Haven't checked but assume there's regularization of the
| NeRF parameters.
| mhuffman wrote:
| I don't think NeRFs require too many images to produce
| impressive results.
| [deleted]
| bergenty wrote:
| I can't wait to generate novel 3D models to CNC/3D print! Can
| these be exported out as STL/OBJs?
| samstave wrote:
| Heckin A!
|
| I just got my 3D printer and was a bit too tipsy to assemble it
| the day it arrived - and have several things I want to print...
|
| It will be interesting to experiment with describing the thing
| I want to print with text instead of designing it in SolidEdge
| and see what AI thinks....
|
| I wonder if you can feed it specific dimensions?
|
| "A holder for a power supply for an e bike with two mounting
| holes 120mm apart with a carry capacity that is 5 inches long
| and 1.5 inches deep"
| wongarsu wrote:
| Well, here are 16 attempts of regular Stable Diffusion with
| that prompt [1], and here's what it thinks a technical
| drawing of it might look like [2].
|
| Maybe two papers down the line :D For now you might have more
| luck with something less specific.
|
| 1: https://i.imgur.com/RPNCwyM.png
|
| 2: https://i.imgur.com/c9pfM8U.png
| samstave wrote:
| Still dope... but are those also obj or stl?
|
| I like my DALL-E expressions of "Master Chief as Vitruvian
| Man as drawn by da Vinci"
|
| And my "technical exploded diagrams of cybernetic
| exoskeleton suits in blueprint"
|
| Try those out?
| moron4hire wrote:
In the usage notes, there's a line that mentions:
|
|     # test (exporting 360 video, and an obj mesh with png texture)
|     python main_nerf.py --text "a hamburger" --workspace trial -O --test
|
| So I guess so. That's pretty awesome.
| etaioinshrdlu wrote:
| I think it would be interesting to convert to a polygon mesh
| periodically in-the-loop. It could end up with more precise
| models.
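|
| A rough sketch of pulling a mesh out of the density field
| periodically; query_density is a stand-in for evaluating the NeRF
| on a grid, not the repo's actual function:
|
|     import numpy as np
|     import mcubes  # PyMCubes
|
|     def extract_mesh(query_density, resolution=128, threshold=10.0,
|                      out_path="snapshot.obj"):
|         # Evaluate the NeRF density on a regular grid in [-1, 1]^3.
|         xs = np.linspace(-1.0, 1.0, resolution)
|         grid = np.stack(np.meshgrid(xs, xs, xs, indexing="ij"), axis=-1)
|         sigma = query_density(grid.reshape(-1, 3)).reshape(
|             resolution, resolution, resolution)
|         # Marching cubes on the density field, then write an OBJ
|         # that can be inspected (or refined) mid-training.
|         vertices, triangles = mcubes.marching_cubes(sigma, threshold)
|         mcubes.export_obj(vertices, triangles, out_path)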
| Cook4986 wrote:
| Currently working with a student group to build out a 3D scene
| generator (https://github.com/Cook4986/Longhand), and the
| prospect of arbitrary, hyper-specific mesh arrays on demand is
| thrilling.
|
| Right now, we are relying on the Sketchfab API to populate our
| (Blender) scenes, which is an imperfect lens through which to
| visualize the contents of texts that our non-technical
| "clientele" are studying.
|
| Since we are publishing these scenes via WebXR (Hubs), we have
| specific criteria related to poly counts (latency, bandwidth,
| etc) and usability. Regarding the latter concern, it's not clear
| that our end users will want to wait/pay for compute.
|
| *copyedited
| sirianth wrote:
| wow
| Geee wrote:
| Would be cool to see it adapted to an img2img scenario, using one
| or more 'seed images'. It would be closer to a standard NeRF, but
| it would also be able to imagine novel angles, with guidance from
| a prompt.
| thehappypm wrote:
| Omg, imagine how useful this would be for video games or movies.
| Whipping up an asset in a matter of hours of computer time?
| Amazing
| sdwvit wrote:
| All this news on image/3D/video generation just shows that we
| live in the middle of an AI/ML breakthrough. Incredible to see
| news of extreme progress in the field like this popping up every
| day.
| dmingod666 wrote:
| Browse Lexica.art and your mind will be blown by the range and
| amount of detail in some of the art.
|
| Like this (nsfw content):
| https://lexica.art/?q=Intricate+goddess
|
| There is an addictive and trippy quality to this, and it is yet
| to hit the mainstream. The art itself is stunning, but it goes
| beyond that: the ability to nudge it around and make variations
| of it is incredible. Now add the fact that you can train it
| with your own content. People are going to go bonkers with this,
| and it's going to open up a lot of debates too.
| wongarsu wrote:
| There's a small gallery of success and failure cases here [1].
|
| It certainly doesn't look as good as the original, yet. I wonder
| if that's due to the implementation differences noted, less
| cherry picking in what they show, or inherent differences between
| Imagen and Stable Diffusion.
|
| Maybe Imagen just has a much better grasp of how images translate
| to actual objects, where Stable Diffusion is stuck more on the 2d
| image plane.
|
| 1: https://github.com/ashawkey/stable-dreamfusion/issues/1
| shadowgovt wrote:
| I feel like the Cthulhu head is extra-successful, given the
| subject matter.
|
| Non-Euclidean back-polygon imaging? Good work, algorithm. ;)
| nopenopenopeno wrote:
| One cannot help but notice the success cases are expected to
| have symmetry along at least one axis, whereas the failure
| cases are not.
| namarie wrote:
| Aren't squirrels and frogs expected to have an axis of
| symmetry? I think the reason for the failures is the presence
| of faces; it seems to be trying to make a face visible from
| all angles.
| gpm wrote:
| Which probably has a lot to do with us taking nearly all
| our pictures of things with faces from the face facing
| direction.
| codeflo wrote:
| I guess a lot can be done to force the model to create properly
| connected 3D shapes instead of these thin protruding 2D slices.
| But I noticed something else. Some of the angles "in between"
| the frog faces have three eyes. I wonder if part of the issue
| might be that those don't look especially wrong to Stable
| Diffusion. It's often surprisingly confused about the number of
| limbs it should generate.
| jrmann100 wrote:
| [Edited] The original Dreamfusion project was discussed here a
| few days ago: https://news.ycombinator.com/item?id=33025446
| naet wrote:
| This is a different project/implementation, based on the open
| source Stable Diffusion instead of Google's proprietary Imagen.
| jrmann100 wrote:
| Edited; thanks for clarifying!
|
| That makes this Stable-Dreamfusion adaptation even more
| promising.
| hwers wrote:
| The only downside with this is that each mesh takes like 5 hours
| to generate (on a V100, too). Obviously it'll speed up, but we're
| far from the panacea.
| etaioinshrdlu wrote:
| How long does it take? 5 hours?
| egypturnash wrote:
| Jesus. Well thanks for your contribution to putting the entire
| creative industry out of work, I guess, little anime girl icon
| person. Ugh.
| yieldcrv wrote:
| tent cities by the beach can use the showers
| ccity88 wrote:
| How very pessimistic. We should never shun technological
| progress for fear of upsetting the status quo or established
| agenda. All of this is only a matter of time away from
| emerging. Have fun being on the forgotten side of history.
| uni_rule wrote:
| This is a force multiplier. It doesn't take the place of
| artistic intent, dingus. Besides, you can't accomplish much with
| just "a model". This is an asset generator, hardly a threat to
| anyone, especially when these things will likely need some
| weight painting to touch up anyway.
| mrtranscendence wrote:
| I don't think the commenter is upset that this _particular_
| model will be deployed, putting creative professionals out of
| work. It's clearly a janky proof of concept. I think they're
| upset about what follow on work could eventually mean.
| axg11 wrote:
| A new technology is developed with the potential to make you
| 100x more efficient at your job. Today, a creative artist can
| only contribute to a project through a narrow slice. Tomorrow,
| the same creative artist can single-handedly orchestrate an
| entire project.
| astrange wrote:
| It's more like there were tasks that were previously so
| unproductive they couldn't be done at all, and now they're
| productive enough you might be able to be employed doing
| them.
|
| Automation creates jobs rather than destroying them. What
| destroys jobs is mainly bad macroeconomic conditions.
| cercatrova wrote:
| Another day, another AI media generation project, and yet
| another comment by egypturnash lamenting the "death of the
| creative industry."
| egypturnash wrote:
| representative of the industry currently under threat of
| disruption is not happy about this and continues to be vocal
| about her unhappiness, film at 11
| Gabriel_Martin wrote:
| A representative of the luddite contingent perhaps.
| tluyben2 wrote:
| I have a friend, an old-school professional artist from before
| affordable computing, who has been using AI (and computers
| before that) to aid his creations for many years now. He runs
| everything himself on his own machines (a pretty expensive
| setup), experimenting and training, and he loves every
| iteration.
|
| But I guess it depends on what "creative industry" means to you.
| Pumping out web UIs or 3D gaming models was never, for the most
| part, the creative industry; learning to see what people like
| and copying it for different situations is not necessarily
| creative, and that is exactly what AI does easily. Anything that
| doesn't require a lot of learning, practice, and talent beyond
| manual work will be replaced by AI soon; the other stuff will
| take somewhat longer.
|
| If you think this can replace you, you weren't/aren't in the
| creative industry. Same goes for coders afraid of no code.
| egypturnash wrote:
| So how many shitty "not creative industry" jobs did your
| friend take on the way to where he could have "a pretty
| expensive setup" to do this? What did he crank out solely to
| earn a paycheck with his art skills?
| tluyben2 wrote:
| Your tone is not great, but he never had those jobs; he was
| born into a poor family (for NL), but his talent was
| recognised by HR Giger when he sent him a paintbrushed work
| (via DHL, with a frame and all, on a whim) and that was
| enough. He is not rich but makes a nice living. Note that
| this is the EU; there is not much risk of dying under a
| bridge even if you don't succeed. But he did succeed, as far
| as he is concerned. He never compromised anything, as you
| imply he must have done.
|
| Edit: but you are also implying you think your job is gone
| with stuff like this? What do you do? Also, I am hoping I
| will be replaced: I have been thinking I would be replaced
| since the early 80s, as my work as a programmer is not so
| exciting (I love it and will keep doing it even if it's not
| viable anymore, which I believe is still very far off, AI-wise,
| for the 20% of people who do niche work; I think the same
| holds for creatives), but it seems closer now than ever.
|
| Edit2: looking at your profile work, it doesn't seem you
| will be replaced by anything soon; what is the anger about?
| Do you have public blogs/tweets about your feelings on
| this? Looking at your work (in your HN profile), you seem
| to be in the group not touched by this at all.
| mrtranscendence wrote:
| What is "soon" here? Admittedly I'm not particularly
| sanguine about the prospects of AI generated art or code
| taking many jobs in the near future, but at some point it
| could well happen even to talented engineers and artists.
| It's nice of _you_ to not mind being replaced, but of
| course not everyone will be happy about existential
| threats to their hard-earned livelihoods.
| GistNoesis wrote:
| >Stable-Diffusion is a latent diffusion model, which diffuses in
| a latent space instead of the original image space. Therefore, we
| need the loss to propagate back from the VAE's encoder
|
There is also an alternative way to handle this difference from
| the original paper that should also work:
|
| Instead of working in voxel color space, you push the latent
| into the voxel grid (i.e. instead of a voxel grid of 3D RGB
| colors, you have a voxel grid of dim_latent-dimensional latents;
| you can also use spherical harmonics if you want, as they work
| just the same in n dimensions).
|
| Only the color prediction network differs; the density is kept
| the same.
|
| The NeRF then renders directly into the latent space (so there
| are fewer rays to render), which means you only need to decode it
| with the VAE for visualization purposes, not in the training loop.
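|
| A minimal sketch of that idea: the radiance head predicts a
| 4-channel latent instead of RGB, and the volume renderer
| composites those latents into a 64x64 "latent image" that feeds
| the diffusion loss directly (names and shapes here are
| illustrative, not from the paper or the repo):
|
|     import torch
|     import torch.nn as nn
|
|     LATENT_DIM = 4  # Stable Diffusion's latent channels
|
|     class LatentRadianceField(nn.Module):
|         def __init__(self, hidden=64):
|             super().__init__()
|             self.density = nn.Sequential(nn.Linear(3, hidden), nn.ReLU(),
|                                          nn.Linear(hidden, 1))
|             # Only this head changes vs. a normal NeRF: it predicts a
|             # latent vector per point instead of an RGB color.
|             self.latent = nn.Sequential(nn.Linear(3, hidden), nn.ReLU(),
|                                         nn.Linear(hidden, LATENT_DIM))
|
|         def forward(self, xyz):
|             return self.density(xyz), self.latent(xyz)
|
|     def render_latent(field, rays_o, rays_d, n_samples=64):
|         """Alpha-composite latent vectors along each ray, exactly as
|         RGB would be composited in a standard NeRF."""
|         t = torch.linspace(0.1, 2.0, n_samples, device=rays_o.device)
|         pts = rays_o[:, None, :] + rays_d[:, None, :] * t[None, :, None]
|         sigma, lat = field(pts.reshape(-1, 3))
|         sigma = sigma.reshape(-1, n_samples)
|         lat = lat.reshape(-1, n_samples, LATENT_DIM)
|         alpha = 1 - torch.exp(-torch.relu(sigma) * (t[1] - t[0]))
|         trans = torch.cumprod(
|             torch.cat([torch.ones_like(alpha[:, :1]), 1 - alpha + 1e-10],
|                       dim=1), dim=1)[:, :-1]
|         weights = alpha * trans
|         return (weights[..., None] * lat).sum(dim=1)  # (n_rays, LATENT_DIM)
|
|     # Rendering one 64x64 latent view means 4,096 rays instead of
|     # 512x512; the VAE decoder is only needed to visualize results.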
| hwers wrote:
| This sounds really interesting, but I'm not sure I follow.
| I'm having a hard time expressing how I'm confused, though
| (maybe it's unfamiliar NeRF terminology), but if you have the
| time I'd be very interested if you could reformulate this
| alternative method somehow (I've been stuck on this very issue
| for two days now, trying to implement this myself).
| baxtr wrote:
| Can someone explain the significance of this? I am not familiar
| with what DreamFusion is.
___________________________________________________________________
(page generated 2022-10-06 23:00 UTC)