[HN Gopher] MeshGPT: Generating triangle meshes with decoder-onl...
___________________________________________________________________
MeshGPT: Generating triangle meshes with decoder-only transformers
Author : jackcook
Score : 381 points
Date : 2023-11-28 17:56 UTC (5 hours ago)
(HTM) web link (nihalsid.github.io)
(TXT) w3m dump (nihalsid.github.io)
| chongli wrote:
| This looks really cool! Seems like it would be an incredible boon
| for an indie game developer to generate a large pool of assets!
| stuckinhell wrote:
| I think indie game development is dead with these techniques.
| Instead big companies will create "make your own game" games.
|
| Indie games already seem pretty derivative these days. I think
| this tech will kill them in the mid-term as big companies adopt it.
| CamperBob2 wrote:
| For values of "dead" equal to "Now people who aren't 3D
| artists and can't afford to hire them will be able to make
| games," maybe.
|
| User name checks out.
| stuckinhell wrote:
| AI is already taking video game illustrators' jobs in China
| https://restofworld.org/2023/ai-image-china-video-game-
| layof...
|
| It feels like a countdown until every creative in the
| videogame industry is automated.
| owenpalmer wrote:
| People who use "make your own game" games aren't good at
| making games. They might enjoy a simplified process to feel
| the accomplishment of seeing quick results, but I find it
| unlikely they'll be competing with indie developers.
| CaptainFever wrote:
| Yeah, and if there was going to be such a tool, people who
| invest more time in it would be better than those casually
| using it. In other words, professionals.
| CamperBob2 wrote:
| Careful with that generalization. Game-changing FPS mods
| like Counterstrike were basically "make your own game"
| projects, built with the highest-level toolkits imaginable
| (editors for existing commercial games.)
| chongli wrote:
| "Make your own game" games will never replace regular games.
| They target totally different interests. People who play
| games (vast majority) just want to play an experience created
| by someone else. People who like "make your own game" games
| are creative types who just use that as a jumping off point
| to becoming a game designer.
|
| It's no different than saying "these home kitchen appliances
| are really gonna kill off the restaurant industry."
| stuckinhell wrote:
| Hmm I think it will destroy the market in a couple ways.
|
| AI creating video games would drastically increase the
| volume of games available in the market. This surge in
| supply could make it harder for indie games to stand out,
| especially if AI-generated games are high quality or novel. It
| could also lead to even more indie saturation (the average
| indie makes less than 1,000 dollars).
|
| As the market expectations shift, I think most indie
| development dies unless you are already rich or basically
| have patronage from rich clients.
| dexwiz wrote:
| The platform layer of the "make your own game" game is always
| too heavy and too limited to compete with a dedicated engine
| in the long run. Also the monetization strategy is bad for
| professionals.
| angra_mainyu wrote:
| I couldn't disagree more. RPGMaker didn't kill RPGs,
| Unity/Godot/Unreal didn't kill games, Minecraft didn't kill
| games, and Renpy didn't kill VNs.
|
| Far more people prefer playing games than making them.
|
| We'll probably see a new boom of indie games instead. Don't
| forget, a large part of what makes the gaming experience
| unique is the narrative elements, gameplay, and aesthetics -
| none of which are easily replaceable.
|
| This empowers indie studios to hit a faster pace on one of
| the most painful areas of indie game dev: asset generation
| (or at least for me as a solo dev hobbyist).
| stuckinhell wrote:
| Sorry, I guess I wasn't clear. None of those things made
| games automatically. The future is buying a game-making
| game and saying "I want a Zelda clone, but funnier."
|
| The AI game framework handles the full game creation
| pipeline.
| Vegenoid wrote:
| There are more amazing, innovative and interesting indie
| games being created now than ever before. There's just also
| way more indie games that aren't those things.
| airstrike wrote:
| This is revolutionary
| shaileshm wrote:
| This is what a truly revolutionary idea looks like. There are so
| many details in the paper. Also, we know that transformers can
| scale. Pretty sure this idea will be used by a lot of companies
| to train the general 3D asset creation pipeline. This is just too
| great.
|
| "We first learn a vocabulary of latent quantized embeddings,
| using graph convolutions, which inform these embeddings of the
| local mesh geometry and topology. These embeddings are sequenced
| and decoded into triangles by a decoder, ensuring that they can
| effectively reconstruct the mesh."
|
| This idea is simply beautiful and so obvious in hindsight.
|
| "To define the tokens to generate, we consider a practical
| approach to represent a mesh M for autoregressive generation: a
| sequence of triangles."
|
| More from paper. Just so cool!
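|
| To make that concrete, here is a rough toy sketch of what a
| "sequence of triangles" serialization could look like (my own
| illustration with naive uniform coordinate quantization and a
| simple sort; the paper instead learns quantized embeddings
| with graph convolutions):
|
|     import numpy as np
|
|     def mesh_to_triangle_sequence(vertices, faces, n_bins=128):
|         # Normalize coordinates into [0, 1] and snap them to a
|         # uniform grid so each coordinate is a discrete token.
|         lo, hi = vertices.min(0), vertices.max(0)
|         v = (vertices - lo) / (hi - lo + 1e-9)
|         v = np.clip((v * n_bins).astype(int), 0, n_bins - 1)
|         # One triangle = 3 vertices = 9 discrete coordinates.
|         tris = v[faces].reshape(len(faces), 9)
|         # Sort faces for a deterministic sequence order.
|         order = np.lexsort(tris.T[::-1])
|         return tris[order].reshape(-1)  # flat token sequence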
| tomcam wrote:
| Can someone explain quantized embeddings to me?
| _hark wrote:
| NNs are typically continuous/differentiable so you can do
| gradient-based learning on them. We often want to use some of
| the structure the NN has learned to represent data
| efficiently. E.g., we might take a pre-trained GPT-type
| model, and put a passage of text through it, and instead of
| getting the next-token prediction probability (which GPT was
| trained on), we just get a snapshot of some of the
| activations at some intermediate layer of the network. The
| idea is that these activations will encode semantically
| useful information about the input text. Then we might e.g.
| store a bunch of these activations and use them to do
| semantic search/lookup to find similar passages of text, or
| whatever.
|
| Quantized embeddings are just that, but you introduce some
| discrete structure into the NN, such that the representations
| there are not continuous. A typical way to do this these days
| is to learn a codebook VQ-VAE style. Basically, we take some
| intermediate continuous representation learned in the normal
| way, and replace it in the forward pass with the nearest
| "quantized" code from our codebook. It biases the learning
| since we can't differentiate through it, and we just pretend
| like we didn't take the quantization step, but it seems to
| work well. There's a lot more that can be said about why one
| might want to do this, the value of discrete vs continuous
| representations, efficiency, modularity, etc...
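|
| In code the quantization step itself is tiny. A minimal sketch
| in PyTorch (my own illustration, not any particular paper's
| implementation):
|
|     import torch
|     import torch.nn as nn
|
|     class VectorQuantizer(nn.Module):
|         def __init__(self, num_codes=512, dim=64):
|             super().__init__()
|             # learned codebook of discrete codes
|             self.codebook = nn.Embedding(num_codes, dim)
|
|         def forward(self, z):  # z: (batch, dim), continuous
|             # distance from each embedding to every code
|             d = torch.cdist(z, self.codebook.weight)
|             idx = d.argmin(dim=-1)    # index of nearest code
|             z_q = self.codebook(idx)  # quantized embeddings
|             # straight-through estimator: forward pass uses
|             # z_q, backward pretends we never quantized
|             z_q = z + (z_q - z).detach()
|             return z_q, idx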
| enjeyw wrote:
| If you're willing, I'd love your insight on the "why one
| might want to do this".
|
| Conceptually I understand embedding quantization, and I
| have some hint of why it works for things like WAV2VEC -
| human phonemes are (somewhat) finite so forcing the
| representation to be finite makes sense - but I feel like
| there's a level of detail that I'm missing regarding what's
| really going on and when quantisation helps/harms that I
| haven't been able to glean from papers.
| visarga wrote:
| Maybe it helps to point out that the first version of
| Dall-E (of 'baby daikon radish in a tutu walking a dog'
| fame) used the same trick, but they quantized the image
| patches.
| hedgehog wrote:
| Another thing to note here is this looks to be around seven
| total days of training on at most 4 A100s. Not all really
| cutting edge work requires a data center sized cluster.
| sram1337 wrote:
| What is the input? Is it converting a text query like "chair" to
| a mesh?
|
| edit: Seems like mesh completion is the main input-output method,
| not just a neat feature.
| CamperBob2 wrote:
| That's what I was wondering. From the diagram it looks like the
| input is other chair meshes, which makes it somewhat less
| interesting.
| tayo42 wrote:
| Really, the hardest thing with art is the details, and that's
| usually what separates good from bad. So if you can roughly
| sketch what you want without skill and have the details
| generated, that's extremely useful. And image-to-image with
| the existing diffusion models is useful and popular.
| nullptr_deref wrote:
| I have no idea about your background when I am commenting
| here. But these are my two cents.
|
| NO. Details are mostly icing on top of the cake. Sure, good
| details can make good art, but that is not always the case.
| True and beautiful art requires form + shape. What you are
| describing is merely something visually appealing. The reason
| diffusion models feel so bland is that they are good with
| details but do not have precise forms and shapes. They are
| getting better nowadays, but it still remains an issue.
|
| Form + shape > details is something they teach in Art 101.
| treyd wrote:
| There are also examples of tables, lamps, couches, etc. in the
| video.
| all2 wrote:
| You prompt this LLM using 3D meshes for it to complete, in the
| same manner you use language to prompt language specific LLMs.
| owenpalmer wrote:
| That's what it seems like. Although this is not an LLM.
|
| > Inspired by recent advances in powerful large language
| models, we adopt a sequence-based approach to
| autoregressively generate triangle meshes as sequences of
| triangles.
|
| It's only inspired by LLMs
| adw wrote:
| This is sort of a distinction without a difference. It's an
| autoregressive sequence model; the distinction is how
| you're encoding data into (and out of) a sequence of
| tokens.
|
| LLMs are autoregressive sequence models where the "role" of
| the graph convolutional encoder here is filled by a BPE
| tokenizer (also a learned model, just a much simpler one
| than the model used here). That this works implies that you
| can probably port this idea to other domains by designing
| clever codecs which map their feature space into discrete
| token sequences, similarly.
|
| (Everything is feature engineering if you squint hard
| enough.)
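|
| Loosely sketched (a purely illustrative interface, with
| made-up "codec" and "transformer" objects, not the paper's
| code):
|
|     def generate(transformer, codec, prompt, max_len=512):
|         # Only the codec knows whether tokens are BPE text
|         # pieces or quantized mesh embeddings.
|         tokens = codec.encode(prompt)    # domain -> token ids
|         while len(tokens) < max_len:
|             logits = transformer(tokens)  # next-token scores
|             nxt = int(logits[-1].argmax())  # greedy sampling
|             if nxt == codec.eos_id:
|                 break
|             tokens.append(nxt)
|         return codec.decode(tokens)      # token ids -> domain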
| ShamelessC wrote:
| The only difference is the label, really. The underlying
| transformer architecture and the approach of using a
| codebook is identical to a large language model. The same
| approach was also used originally for image generation in
| DALL-E 1.
| anentropic wrote:
| Yeah it's hard to tell.
|
| It looks like the input is itself a 3D mesh? So the model is
| doing "shape completion" (e.g. they show generating a chair
| from just some legs)... or possibly generating "variations"
| when the input shape is more complete?
|
| But I guess it's a starting point... maybe you could use
| another model that does worse quality text-to-mesh as the input
| and get something more crisp and coherent from this one.
| carbocation wrote:
| On my phone so I've only read this promo page - could this
| approach be modified for surface reconstruction from a 3D point
| cloud?
| kranke155 wrote:
| My chosen profession (3D / filmmaking) feels like being in some
| kind of combat trench at the moment. Both fascinating and scary.
| nextworddev wrote:
| What do you see as the use case for this in your field? Does
| it seem high quality? (I have no context.)
| zavertnik wrote:
| I'm not a professional in VFX, but I work in television and
| do a lot of VFX/3D work on the side. The quality isn't
| amazing, but it looks like this could be the start of a
| Midjourney-tier VFX/3D LLM, which would be awesome. For me,
| this would help bridge the gap between having to use/find
| premade assets and building what I want.
|
| For context, building from scratch in a 3D pipeline requires
| you to wear a lot of different hats (modeling, materials,
| lighting, framing, animating, etc.). It takes a lot of time to
| not only learn these hats but also use them together.
| The individual complexity of those skill sets makes it
| difficult to experiment and play around, which is how people
| learn with software.
|
| The shortcut is using premade assets or addons. For instance,
| being able to use the Source game assets in Source Filmmaker
| combined with SFM using a familiar game engine makes it easy
| to build an intuition with the workflow. This makes Source
| Filmmaker accessible, and it's why there's so much content out
| there made with it. So if you have gaps in your skillset or
| need to save time, you'll buy/use premade assets. This comes
| at a cost of control, but that's always been the tradeoff
| between building what you want and building with what you
| have.
|
| Just like GPT and DALL-E built a bridge between building what
| you want and building with what you have, a high fidelity GPT
| for the 3D pipeline would make that world so much more
| accessible and would bring the kind of attention NLE video
| editing got in the post-Youtube world. If I could describe in
| text and/or generate an image of a scene I want and have a
| GPT create the objects, model them, generate textures, and
| place them in the scene, I could suddenly just open Blender,
| describe a scene, and experiment with shooting in it, as if I
| were playing in a sandbox FPS game.
|
| I'm not sure if MeshGPT is the ChatGPT of the 3D pipeline,
| but I do think this kind of content generation is the conduit
| for the DALL-E of video that so many people are terrified
| and/or excited about.
| gavinray wrote:
| On an unrelated note, could I ask your opinion?
|
| My wife is passionate about film/TV production and VFX.
|
| She's currently in school for this but is concerned about
| the difficulty of landing a job afterwards.
|
| Do you have any recommendations on breaking into the
| industry without work experience?
| bsenftner wrote:
| So you're probably familiar with the role of a bidding
| producer; imagine the difficulty they are facing: on one side
| they have filmmakers saying they just read that such-and-such
| is now created by AI, while that is news to the bidding
| producer, and their VFX/animation studio clients are
| scrambling because everything they do is new again.
| sheepscreek wrote:
| Perhaps one way to look at this is as auto-scaffolding. The
| typical modelling and CAD tools might include this feature to
| get you up and running faster.
|
| Another massive benefit is composability. If the model can
| generate a cup and a table, it also knows how to generate a cup
| on a table.
|
| Think of all the complex gears and machine parts this could
| generate in the blink of an eye, while being relevant to the
| project - rotated and positioned exactly where you want it. Very
| similar to how GitHub Copilot works.
| worldsayshi wrote:
| I don't see that LLMs have come much further in 3D animation
| than in programming in this regard: they can spit out bits
| and pieces that look okay in isolation, but a human needs to
| solve the puzzle. And often solving the puzzle means
| rewriting/redoing most of the pieces.
|
| We're safe for now but we should learn how to leverage the new
| tech.
| andkenneth wrote:
| This is the "your job won't be taken away by AI, it will be
| taken away by someone who knows how to leverage AI better
| than you"
| trostaft wrote:
| Seems like the BibTeX on the page is broken? Or it might just
| be an extension of mine.
| alexose wrote:
| It sure feels like every remaining hard problem (i.e., the ones
| where we haven't made much progress since the 90s) is in line to
| be solved by transformers in some fashion. What a time to be
| alive.
| mclanett wrote:
| This is very cool. You can start with an image, generate a mesh
| for it, render it, and then compare the render to the image.
| Fully automated training.
| j7ake wrote:
| I love this field. The paper includes a nice website,
| examples, and videos.
|
| So much more refreshing than the dense abstract, intro, results
| paper style.
| valine wrote:
| Even if this is "only" mesh autocomplete, it is still massively
| useful for 3D artists. There's a disconnect right now between how
| characters are sculpted and how characters are animated. You'd
| typically need a time-consuming step to retopologize your model.
| Transformer-based retopology that takes a rough mesh and gives
| you clean topology would be a big time saver.
|
| Another application: take the output of your gaussian splatter or
| diffusion model and run it through MeshGPT. Instant usable assets
| with clean topology from text.
| toxik wrote:
| What you have to understand is that these methods are very
| sensitive to what is in distribution and out of distribution.
| If you just plug in user data, it will likely not work.
| toxik wrote:
| This was done years ago, with transformers. It was then dubbed
| Polygen.
| Sharlin wrote:
| You might want to RTFA. Polygen and other prior art are
| mentioned. This approach is superior.
| toxik wrote:
| I read the article. It has exactly the same limitations as
| Polygen from what I can tell.
| dymk wrote:
| Their comparison against PolyGen looks like it's a big
| improvement. What are the limitations that this has in
| common with PolyGen that make it still not useful?
| toxik wrote:
| I don't think it's as widely applicable as they try to
| make it seem. I have worked specifically with PolyGen,
| and the main problem is "out of distribution" data.
| Basically anything you want to do will likely be outside
| the training distribution. This surfaces as sequencing.
| How do you determine which triangle or vertex to place
| first? Why would a user do it that way? What if I want to
| draw a table with the legs last? Cannot be done. The
| model is autoregressive.
| mlsu wrote:
| The next breakthrough will be the UX to create 3d scenes in front
| of a model like this, in VR. This would basically let you
| _generate_ a permanent, arbitrary 3D environment, for any
| environment for which we have training data.
|
| Diffusion models could be used to generate textures.
|
| Mark is right and so so early.
| amelius wrote:
| Is this limited to shapes that have mostly flat faces?
| catapart wrote:
| Dang, this is getting so good! Still got a ways to go, with the
| weird edges, but at this point, that feels like 'iteration
| details' rather than an algorithmic or otherwise complex problem.
|
| It's really going to speed up my pipeline to not have to pipe all
| of my meshes into a procgen library with a million little mesh
| modifiers hooked up to drivers. Instead, I can just pop all of my
| meshes into a folder, train the network on them, and then start
| asking it for other stuff in that style, knowing that I won't
| have to re-topo or otherwise screw with the stuff it makes,
| unless I'm looking for more creative influence.
|
| Of course, until it's all the way to that point, I'm still better
| served by the procgen; but I'm very excited by how quickly this
| is coming together! Hopefully by next year's Unreal showcase,
| they'll be talking about their new "Asset Generator" feature.
| truckerbill wrote:
| Do you have a recommended procgen lib?
| catapart wrote:
| Oh man, sorry, I wish! I've been using cobbled together bits
| of python plugins that handle Blender's geometry nodes, and
| the geometry scripts tools in Unreal. I haven't even ported
| over to their new proc-gen tools, which I suspect can be
| pretty useful.
| circuit10 wrote:
| Can this handle more organic shapes?
| LarsDu88 wrote:
| As a machine learning engineer who dabbles with Blender and hobby
| gamedev, this is pretty impressive, but not quite to the point of
| being useful in any practical manner (as far as the limited
| furniture examples are concerned).
|
| A competent modeler can make these types of meshes in under 5
| minutes, and you still need to seed the generation with polys.
|
| I imagine the next step will be to have the seed generation
| controlled by an LLM, and to start adding image models to the
| autoregressive parts of the architecture.
|
| Then we might see truly mobile game-ready assets!
| th0ma5 wrote:
| This is a very underrated comment... As with any tech demo, if
| they don't show it, it can't do it. It is very, very easy to
| imagine generalizing these things to other purposes, but if it
| could actually do that, the presentation would be different.
| rawrawrawrr wrote:
| It's research, not meant for commercialization. The main
| point is in the process, not necessarily the output.
| empath-nirvana wrote:
| > A competent modeler can make these types of meshes in under 5
| minutes.
|
| I don't think this general complaint about AI workflows is that
| useful. Most people are not a competent <insert job here>. Most
| people don't know a competent <insert job here> or can't afford
| to hire one. Even something that takes longer than a
| professional would, at worse quality, is for many things better
| than _nothing_, which is the realistic alternative for most
| people who would use something like this.
| cannonpalms wrote:
| Is the target market really "most people," though? I would
| say not. The general goal of all of this economic investment
| is to improve the productivity of labor--that means first and
| foremost that things need to be useful and practical for
| those trained to make determinations such as "useful" and
| "practical."
| taneq wrote:
| Millions of people generating millions of images (some of
| them even useful!) using Dall-E and Stable Diffusion would
| say otherwise. A skilled digital artist could create most
| of these images in an hour or two, I'd guess... but 'most
| people' certainly could not, and it turns out that these
| people really want to.
| chefandy wrote:
| > I don't think this general complaint about AI workflows is
| that useful
|
| Maybe not to you, but it is useful if you're in these fields
| professionally. The difference between a neat
| hobbyist toolkit and a professional toolkit has gigantic
| financial implications, even if the difference is minimal to
| "most people."
| Kaijo wrote:
| The mesh topology here would see these rejected as assets in
| basically any professional context. A competent modeler
| could make much higher quality models, more suited to texturing
| and deformation, in under five minutes. A speed modeler could
| make the same in under a minute. And a procedural system in
| something like Blender geonodes can already spit out an endless
| variety of such models. But the pace of progress is staggering.
| frozencell wrote:
| Not reproducible with code = Not research.
___________________________________________________________________
(page generated 2023-11-28 23:00 UTC)