[HN Gopher] DreamFusion: Text-to-3D using 2D Diffusion
___________________________________________________________________
DreamFusion: Text-to-3D using 2D Diffusion
Author : going_ham
Score : 327 points
Date : 2022-09-29 18:57 UTC (4 hours ago)
(HTM) web link (dreamfusion3d.github.io)
(TXT) w3m dump (dreamfusion3d.github.io)
| etaioinshrdlu wrote:
| Huh, it's a pretty similar technique to what I outlined a couple
| days ago: https://news.ycombinator.com/item?id=32965139
|
| Although they start with random initialization and a text prompt.
| It seems to work well. I now see no reason we can't start with
| image initialization!
| efrank3 wrote:
| The version that you proposed wouldn't have worked
| ToJans wrote:
| "those who say it cannot be done should not interrupt the
| people doing it"
| parasj wrote:
| Correct link with full demo: https://dreamfusion3d.github.io/
| kentlyons wrote:
| That link also has a link to the authors and the paper
| preprint.
| dang wrote:
| Changed now. Thanks!
| MitPitt wrote:
| Coincidentally, this came out the same day as Meta's text-to-
| video. I wonder if Google deliberately held back the release to
| make a bigger impact somehow?
| astrange wrote:
| I think it's because of ICLR deadlines.
| bm-rf wrote:
| Would they publish it anonymously? I'd bet they'd want to take
| credit somehow.
| kmonsen wrote:
| Someone posted the "correct" URL that has names:
| https://dreamfusion3d.github.io/
| blondin wrote:
| Nvidia also released GET3D [1] a few days ago. Research seems to
| be heading towards similar goals.
|
| [1]: https://github.com/nv-tlabs/GET3D
| coolca wrote:
| This is like magic to me. The pace at which we are getting these
| tools amazes me.
| edgartaor wrote:
| I don't see a person in the gallery. Is it capable of generating
| a 3D model of me from only a photo?
| sirianth wrote:
| Is there code for any of these models? Or a Colab? Ajay Jain's
| Colab doesn't work, but I would love to see one for this.
| _just7_ wrote:
| My guess is we'll have to wait a good three months before
| someone makes an open-source version.
| sirianth wrote:
| Silly replying to myself, I know, but I had more thoughts. I'm
| an architect for 3D worlds and I am desperate, lol, for this
| kind of tool. I use both Blender and Grasshopper, but I use
| Midjourney to think and prototype all the time. Obvious, but it
| would be astonishing to have something like this for game
| worlds. I used another version of this to create "a forest
| emerging from an aircraft carrier"
| https://www.instagram.com/p/CiRfXKzpnLC/ but the technique
| didn't yet have good resolution (high fidelity).
| jaggs wrote:
| You should try the AUTOMATIC1111 version of stable diffusion.
| It's crazy fast and has great results -
| https://github.com/AUTOMATIC1111/stable-diffusion-
| webui/wiki...
| achr2 wrote:
| The thing that frightens me is that we are rapidly approaching
| broadly humanity-disrupting ML technologies without any of the
| social or societal frameworks to cope with them.
| narrator wrote:
| There were bigger disruptions in the past: the telegraph,
| railroads, explosives. "The Devils" by Dostoevsky is a great
| fictional account of what all these technological disruptions
| did to the fragile social order of the late-19th-century
| Russian countryside. All of a sudden, all these foreign people,
| ideas, technology, and commerce start streaming into these once
| isolated communities.
| throwaway675309 wrote:
| I'm usually not a fan of this general hand-wringing / fear-
| mongering around ML that a lot of people with too much time and
| not enough STEM background constantly bring up.
|
| Stable Diffusion has been available to the public for quite a
| while now and, if anything, has disproved a lot of the
| ungrounded nonsense that made companies like OpenAI censor
| their generative models.
| modeless wrote:
| The most incredible thing here is that this demonstrates a level
| of 3D understanding that I didn't believe existed in 2D image
| models yet. All of the 3D information in the output was inferred
| from the training set, which is exclusively uncurated and
| unsorted 2D still images. No 3D models, no camera parameters, no
| depth maps. No information about picture content other than a
| text label (scraped from the web and often incorrect!).
|
| From a pile of random undifferentiated images the model has
| learned the detailed 3D structure and plausible poses and
| variants of thousands (millions?) of everyday objects. And all we
| needed to get that 3D information out of the model was the right
| sampling procedure.
| bmpoole wrote:
| Co-author here - we were also surprised :) The breadth of
| knowledge of the visual world embedded in these 2D models and
| what they unlock is astounding.
| Vt71fcAqt7 wrote:
| Any word about how NeRF -> marching cubes works? I thought that
| was still an open problem. Is that another contribution of this
| research paper?
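|
| My mental model of the usual recipe (not sure it's what this
| paper does): sample the trained NeRF's density on a regular
| grid, then run marching cubes on the volume. Only the density
| query is NeRF-specific; here it's a toy stand-in:
|
|     import numpy as np
|     from skimage import measure
|
|     def density(pts):  # stand-in for the NeRF's sigma output
|         return np.exp(-np.linalg.norm(pts, axis=-1))
|
|     n = 128
|     xs = np.linspace(-1.0, 1.0, n)
|     grid = np.stack(np.meshgrid(xs, xs, xs, indexing="ij"), -1)
|     sigma = density(grid.reshape(-1, 3)).reshape(n, n, n)
|
|     # Isosurface at a chosen density threshold -> triangle mesh
|     verts, faces, normals, _ = measure.marching_cubes(sigma, 0.5)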
| adamredwoods wrote:
| So I wonder if unusual angles that normally do not get
| photographed will be distorted? For example, underneath a table
| looking up.
| jacobr1 wrote:
| They reapply noise to the potentially distorted image and
| then predict the de-noised version like the originally
| rendered first frame. So the image is at least internally
| consistent for the frame (to the extend the the system
| generates consistency whatsoever).
|
| The example with a squirrel wearing a hoodie demonstrates an
| interesting edge case, the "front" of the squirrel (with
| hoodie over the head) show a normal hooded face as expected,
| but when you rotate to the "back" you get another face where
| the hoodie is low over the eyes. Each looks fine in
| isolation, but in aggregate it seems like we have a two-faced
| squirrel.
| yarg wrote:
| It'll be delusions and guesses, rather than distortions.
|
| It'll just make up some colours and geometries that don't
| contradict anything it already knows from the defined
| perspectives.
|
| Or leave it empty.
| bmpoole wrote:
| Yes, this is often a problem. We use view-dependent prompts
| (e.g. "cat wearing sunglasses, back view"), but the pretrained
| 2D model often does not do a good job of interpreting non-
| canonical views and will put sunglasses on the back of the
| cat's head (as well as the front).
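|
| Roughly, the augmentation looks like this (a toy sketch with
| guessed thresholds, not the exact rules we use):
|
|     def augment_prompt(prompt: str, azimuth_deg: float) -> str:
|         a = azimuth_deg % 360
|         if a < 60 or a > 300:    # camera near the front
|             return f"{prompt}, front view"
|         if 120 < a < 240:        # camera behind the object
|             return f"{prompt}, back view"
|         return f"{prompt}, side view"
|
|     augment_prompt("cat wearing sunglasses", 180.0)
|     # -> 'cat wearing sunglasses, back view'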
| salawat wrote:
| > cat wearing sunglasses, back view
|
| Bad prompt: missing implied antecedent/ambiguous subject...
|
| You may want:
|
| "Back view of a cat which is wearing sunglasses." As written,
| "back view of a cat, wearing sunglasses" can parse as the view
| wearing the sunglasses, etc. I actually tried using projective
| terms from drafting books and didn't get great results. Nor
| anatomical terms either.
| LordDragonfang wrote:
| Some of the "mesh exports" used as examples on the page
| actually show this, to some extent. Look specifically at the
| plush corgi's belly and the weird geometry on the underside
| of the lemur's book, and to a lesser extent the underside of
| the bedsheet-ghost.
| GistNoesis wrote:
| As far as I understand from a quick read of the paper, the 2D
| diffusion model doesn't have a 3D understanding. It probably
| has some sort of local-neighborhood understanding, i.e. small
| geometric transformations of objects map close to each other
| in the diffusion space (that's why, as with latent spaces, you
| can "interpolate" (https://replicate.com/andreasjansson/stable-
| diffusion-animat...) in the diffusion space).
|
| But that's not really surprising, because when you have enough
| data, even simple clustering methods group objects like faces
| by the direction they are looking. With enough views, even a
| simple L2 distance in pixel space allows t-SNE to do that.
|
| They inject the 3D constraints via the NeRF and an optimization
| process that adds consistency between the frames.
|
| It's a DeepDream-like process that optimizes by alternating
| updates for 3D consistency and updates for text-to-2D-image
| correspondence. It's searching for a solution that satisfies
| these two constraints at the same time.
|
| Even though they only need to run a single diffusion step to
| get the update direction, this optimization process is quite
| long: 1.5 hours (but they are not using things like Instant
| NeRF (or even simple voxel grids)).
|
| But this will allow for the creation of a dataset of 3D
| objects with corresponding text, which will then allow training
| a diffusion model that does have a 3D understanding and can
| generate 3D objects directly with a single diffusion process.
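|
| A minimal sketch of that alternating loop (my own
| reconstruction, not the authors' code; nerf, render, unet,
| alphas, w, text_embedding, optimizer, and sample_random_camera
| are stand-ins for the NeRF, the differentiable renderer, the
| pretrained text-conditioned U-Net, its noise schedule, a
| timestep weighting, the prompt embedding, the NeRF optimizer,
| and a camera sampler):
|
|     import torch
|
|     for step in range(10_000):
|         cam = sample_random_camera()
|         img = render(nerf, cam)           # differentiable render
|         t = torch.randint(20, 980, (1,))  # random noise level
|         eps = torch.randn_like(img)
|         noisy = alphas[t].sqrt() * img \
|                 + (1 - alphas[t]).sqrt() * eps
|         with torch.no_grad():             # one diffusion step
|             eps_hat = unet(noisy, t, text_embedding)
|         # Score distillation: apply the residual directly as a
|         # pixel-space gradient (the U-Net Jacobian is skipped).
|         img.backward(gradient=w(t) * (eps_hat - eps))
|         optimizer.step()
|         optimizer.zero_grad()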
| samuell wrote:
| Gives a new perspective on a classic verse:
|
| "For he spoke, and it came to be;
|
| he commanded, and it stood firm."
|
| Psalm 33:9, NIV
|
| :)
| IshKebab wrote:
| Not really though.
| macawfish wrote:
| So does this mean I can use DreamBooth to create plausible NeRFs
| of myself in any scenario? The future is looking weird.
| parasj wrote:
| @dang The link should be updated to
| https://dreamfusion3d.github.io
| joewhatkins wrote:
| This is crazy good - most prior text-to-3d models produced weird
| amorphous blobs that would kind of look like the prompt from some
| angles, but had no actual spatial consistency.
|
| Blown away by how quickly this stuff is advancing, even as
| someone who's relatively cynical about AI art.
| drKarl wrote:
| Amazing! How long then until we get photorealistic AI generated
| 3D VR games and experiences in the metaverse?
| drKarl wrote:
| Why the downvote? I wasn't being sarcastic, it was an honest
| question. I'm really impressed how far this technology has come
| since GPT-3 two years ago, to DALL-E and Stable Diffusion, to
| Meta's text-to-video, to this...
| inerte wrote:
| I was wondering the same thing in the other thread about text-
| to-video. Someone asked about 3D Blender models, which made me
| think about animating Blender models. Bang, now on this thread
| we see animated images... it does feel like we can get to
| asking for a 3D environment, putting on VR glasses, and
| experiencing it. And with outpainting, we could even change it
| in real time.
|
| It's totally sci-fi, and at the same time it seems to be
| possible? I am amazed at how even image generation evolved over
| the last year, but that's just me daydreaming.
| macrolime wrote:
| Or in-painting with AR glasses. Change things in the real
| world just by looking at them (with eye tracking) and saying
| what you want them changed into.
| macrolime wrote:
| This sounds like something that could be made to work with stable
| diffusion if someone just implements the code based on the paper.
| coolspot wrote:
| Give it a week or two...
| naillo wrote:
| It's funny that the authors are 'anonymous' but they have access
| to Imagen so obviously it's by Google.
| cuuupid wrote:
| A large portion of the ML community (rightly) discounts Google
| papers because:
|
| - they rarely provide the data or code used, so it's basically
| "I swear it works bro" research
|
| - what they achieve is usually through having the most pristine
| dataset on the planet and is often unusable by other
| researchers
|
| - other times they publish papers that are basically "we
| slightly modified this excellent open source paper, slapped an
| internal name on it and trained it on our proprietary dataset"
|
| - sometimes they achieve remarkably little but their papers
| still get a shiny spot because they're a big name and sponsor
| all the conferences
|
| - they've also been caught trying to patent/copyright ML
| techniques; disregarding that this is the same as privatizing
| math, these are often techniques they plainly didn't come up
| with
|
| Also, ever since OpenAI's "we have to go closed-source
| for-profit to save humanity" PR campaign, every company that
| releases models that achieve a lot in NLP/CV gets dragged by
| the media and equated to Skynet.
| alphabetting wrote:
| A ton of the big advances in AI that the community has
| benefited from have come from published Google research.
| oldgradstudent wrote:
| > Paper under double-blind review
|
| Once the paper is accepted (or rejected) the names may be
| revealed.
|
| Though, in reality, the reviewers can often easily tell who
| wrote the paper.
| joewhatkins wrote:
| This is par for the course - there have been other instances
| where an 'anonymous' paper mentioned training on a cluster of
| TPUs that weren't publicly available yet - dead giveaway it was
| Google.
| VikingCoder wrote:
| Dead giveaway... Dead giveaway...
| googlryas wrote:
| Lots of reasons to stay anonymous besides hiding what org is
| behind the paper. Maybe they don't want to be kidnapped by the
| North Koreans and forced to produce new paeans to Kim Il-Sung
| with "lost footage".
| [deleted]
| parasj wrote:
| The full author list is on the updated link at:
| https://dreamfusion3d.github.io/
| dang wrote:
| Url changed from https://dreamfusionpaper.github.io/ to the page
| that names the authors.
| RosanaAnaDana wrote:
| This is getting asymptotic.
| layer8 wrote:
| Progress often happens in waves. There will be a trough again.
| sva_ wrote:
| Seems a bit like a tsunami currently. But I wonder how we'll
| think about it 10 years from now.
| gfodor wrote:
| AI might be different - as has been predicted for many years
| now - due to the compounding effects on intelligence.
| arisAlexis wrote:
| Is this a light version of the script where AGI arrives fast?
| ml_basics wrote:
| Awesome! I wonder how long it will be until there is an open
| source implementation compatible with Stable Diffusion
| owenpalmer wrote:
| Source?
| jonas21 wrote:
| Can someone explain what's going on in this example from the
| gallery? The prompt is "a humanoid robot using a rolling pin to
| roll out dough":
|
| https://dreamfusion-cdn.ajayj.com/gallery_sept28/crf20/a_DSL...
|
| But if you look closely, the pin looks like it's actually rolling
| across the dough as the camera orbits.
| WithinReason wrote:
| The rolling pin is above the table but the shading is wrong
| because they don't render shadows.
| ajayjain wrote:
| Hi! Ajay here. Correct, our shading model doesn't compute
| intersections, since that's a bit challenging with a NeRF
| scene representation.
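|
| For context, the shading is a simple diffuse model along these
| lines (a sketch, not our exact code); with no shadow or
| intersection test, a floating pin still shades as if nothing
| sat between it and the light:
|
|     import torch
|     import torch.nn.functional as F
|
|     def shade(albedo, normal, light_dir, ambient=0.1):
|         n = F.normalize(normal, dim=-1)   # from -grad(density)
|         l = F.normalize(light_dir, dim=-1)
|         diffuse = (n * l).sum(-1, keepdim=True).clamp(min=0.0)
|         return albedo * (ambient + (1.0 - ambient) * diffuse)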
| chaps wrote:
| Super interesting work. Do you think that's a solvable
| problem and something you'll work on?
| golemotron wrote:
| Anonymously authored research is very ominous.
| parasj wrote:
| The full author list is on the updated link at:
| https://dreamfusion3d.github.io/
| VikingCoder wrote:
| We're quickly approaching HNFusion: Text-to-HN-Article-That-
| Implements-That-Idea ...
| O__________O wrote:
| Unclear to me what is going on, but there's another URL that
| lists the authors' names. Given it's possible this change was
| done for a reason, I'm not linking to it, but it strikes me as
| odd that it's still up. Anyone know what's going on, without
| causing problems for the authors?
| parasj wrote:
| This link was from OpenReview which must be anonymous (double
| blind). The full author list is on the updated link at:
| https://dreamfusion3d.github.io
| O__________O wrote:
| Aware of the link, though you haven't clarified why there are
| two links; it strikes me as odd, if the authors are trying to
| post anonymously, that a simple Google search finds their
| names.
| gersh wrote:
| Is code available?
| yarg wrote:
| Cool.
|
| The samples are lacking definition, but they're otherwise
| spatially stable across perspectives.
|
| That's something people have struggled with for years.
| jianshen wrote:
| Did we hit some sort of technical inflection point in the last
| couple of weeks, or is it just coincidence that all of these ML
| papers around high-quality procedural generation are dropping
| every other day?
| the8472 wrote:
| This has been going on for years. The applications are just
| crossing thresholds now that are more salient for people, e.g.
| doing art.
| sirianth wrote:
| yay
| alphabetting wrote:
| Maybe deadline for neurips which is coming up?
| grandmczeb wrote:
| This was submitted to ICLR
| macrolocal wrote:
| Conference season?
| [deleted]
| dr_dshiv wrote:
| It's called the technological singularity. Pretty fun so far!
| AStrangeMorrow wrote:
| This isn't what is usually meant by "technological
| singularity". That is an inflection point where technological
| growth becomes uncontrollable and unpredictable, usually
| theorized to be caused by a self-improving agent (/AI) that
| becomes smarter with each of its iterations. This is still
| standard, human-controlled technological progress, even if
| very fast.
| layer8 wrote:
| From the abstract: "We introduce a loss based on probability
| density distillation that enables the use of a 2D diffusion
| model as a prior for optimization of a parametric image
| generator. Using this loss in a DeepDream-like procedure, we
| optimize a randomly-initialized 3D model (a Neural Radiance
| Field, or NeRF) via gradient descent such that its 2D
| renderings from random angles achieve a low loss."
|
| This seems like basically plugging together a couple of
| techniques that already existed, allowing 2D text-to-image to
| be turned into text-to-3D.
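|
| Concretely, the distillation gradient from the paper, as best
| I can reconstruct it from the abstract, is
|
|     \nabla_\theta \mathcal{L}_{\mathrm{SDS}}
|       = \mathbb{E}_{t,\epsilon}\big[\, w(t)\,
|         (\hat{\epsilon}_\phi(\mathbf{z}_t; y, t) - \epsilon)\,
|         \partial\mathbf{x}/\partial\theta \,\big]
|
| i.e. render an image x = g(theta), noise it to z_t, have the
| frozen 2D model predict the noise, and nudge theta so the
| prediction error shrinks, with no backprop through the U-Net.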
| macawfish wrote:
| Time and time again these ML techniques are proving to be
| wildly modular and pluggable. Maybe sooner or later someone
| will build a framework for end-to-end text-to-effective-ML-
| architecture that just plugs different things together and
| optimizes them.
| lbotos wrote:
| I think this is what Hugging Face (GitHub for machine
| learning) is attempting with the diffusers lib:
| https://huggingface.co/docs/diffusers/index
|
| They have others as well.
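|
| E.g. running a pipeline is only a few lines (model ID current
| as of this writing; the API may shift):
|
|     from diffusers import StableDiffusionPipeline
|
|     pipe = StableDiffusionPipeline.from_pretrained(
|         "CompVis/stable-diffusion-v1-4")
|     image = pipe("a DSLR photo of a corgi").images[0]
|     image.save("corgi.png")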
| sdan wrote:
| > This seems like basically plugging a couple of techniques
| together that already existed
|
| as with a majority of ML research
| WithinReason wrote:
| Isn't that what the Singularity was described as a few
| decades ago? Progress so fast it's unpredictable even in
| the short term.
| rileyphone wrote:
| Same as it ever was, scientific revolutions arrive all at
| once, punctuating otherwise uneventful periods. As I
| understand, the present one is the product of the paper
| "Attention is all you need":
| https://arxiv.org/pdf/1706.03762.pdf.
| ramesh31 wrote:
| >as with a majority of ML research
|
| Plus "we did the same thing, but with 10x the compute
| resources".
|
| But yeah.
| [deleted]
| anigbrowl wrote:
| True (I made such a proposal myself a few hours ago, albeit
| in vaguer terms). The thing is deployment infrastructure is
| good enough now that we can just treat it as modular signal
| flows and experiment a lot without having to engineer a
| whole pile of custom infrastructure for each impulsive
| experiment.
| beambot wrote:
| > This seems like basically plugging a couple of techniques
| together that already existed [...]
|
| In his Lex Fridman interview, John Carmack makes a similar
| assertion about the prospect for AGI: that it will likely
| be the clever combination of existing primitives (plus maybe
| a couple of novel ones) that makes the first AGI feasible in
| just a couple thousand lines of code.
| aliqot wrote:
| That's a great example that reminds me of another one:
| there was nothing new about Bitcoin conceptually; it was
| all concepts we already had, just in a new combination. IRC,
| hashing, proof of work, distributed consensus, difficulty
| algorithms, you name it. Aside from Base58 there wasn't
| much original other than the combination of those elements.
| stavros wrote:
| Base58 really should have been base57.
| aliqot wrote:
| Hello Stavros, I agree. When I look at the goals that Base58
| sought to achieve (eliminating visually similar characters), I
| couldn't help but wonder why more characters were not
| eliminated. There is quite a bit of typeface androgyny when
| you consider case and face.
| stavros wrote:
| Yeah, I don't know why 1 was left in there, seems like a
| lost opportunity. Discarding l, I, 0, O, but then leaving
| 1? I wonder why.
| aliqot wrote:
| I can only assume it was for a superstitious reason so
| that the original address prefixes could be a 1. This is
| the only sense I can make from it.
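|
| For reference, a quick check of the alphabet (in Python):
|
|     # 0, O, I, and l are dropped as visually ambiguous; 1 is
|     # kept, plausibly so address prefixes could be a 1.
|     ALPHABET = ("123456789"
|                 "ABCDEFGHJKLMNPQRSTUVWXYZ"
|                 "abcdefghijkmnopqrstuvwxyz")
|     assert len(ALPHABET) == 58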
| ImprobableTruth wrote:
| My hot take is that we're merely catching up with hardware
| improvements that until recently went unutilized. There's
| nothing 'self-improving'; it's largely "just" scaled-up
| methods or new, clever applications of scaled-up methods.
|
| The pace at which methods scale up is currently a lot faster
| than hardware improvements, so unless these scaled-up methods
| become incredibly lucrative (not impossible), I think it's
| quite likely we'll soon-ish (a couple years from now) see a
| slowdown.
| darkhorse222 wrote:
| I think DALLE really kicked things into high gear.
| gitfan86 wrote:
| It has become clear since AlphaGo that intelligence is an
| emergent property of neural networks. Since then, the time and
| cost required to create a useful intelligence have been coming
| down. The big change was in August, when Stable Diffusion
| became able to run on consumer hardware. Things were already
| accelerating before August, but that has really kicked up the
| speed, because millions of people can play around with it and
| discover intelligence applications, especially in the latent
| space.
| anigbrowl wrote:
| They're AI generated, the singularity already happened but the
| machines are trying to ease us into it.
| samstave wrote:
| Scary fn thought!
|
| And I agree with you!
|
| And the OP comment is by the magnanimous/infamous anigbrowl.
|
| You need to start doing AI legal admin (I don't have the
| terms, but you may; we need legal language to control how we
| deal with AI).
|
| And @dang: kill the gosh darn "posting too fast" thing.
| Jiminy Crickets, I have talked to you about this so many
| times...
| bhedgeoser wrote:
| > Scary fn thought!
|
| I'm a Kotlin programmer, so it's a scary fun thought for me.
| LordDragonfang wrote:
| "You're posting too fast" is a limit that's manually
| applied to accounts that have a history of "posting too
| many low-quality comments too quickly and/or getting into
| flame wars". You can email dang (hn@ycombinator.com) if you
| think it was applied in error, but if it's been applied to
| you more than once... you probably have a continuing
| problem with proliferating flame wars or posting otherwise
| combative comments.
| samstave wrote:
| THIS
|
| WTF - the singularity is closer than we thought!!!
| Workaccount2 wrote:
| SD is open source (for-real open source) and the community has
| been having a field day with it.
| [deleted]
___________________________________________________________________
(page generated 2022-09-29 23:00 UTC)