[HN Gopher] MeshGPT: Generating triangle meshes with decoder-onl...
___________________________________________________________________
MeshGPT: Generating triangle meshes with decoder-only transformers
Author : jackcook
Score : 700 points
Date   : 2023-11-28 17:56 UTC (1 day ago)
(HTM) web link (nihalsid.github.io)
(TXT) w3m dump (nihalsid.github.io)
| chongli wrote:
| This looks really cool! Seems like it would be an incredible boon
| for an indie game developer to generate a large pool of assets!
| stuckinhell wrote:
| I think indie game development is dead with these techniques.
| Instead, big companies will create "make your own game" games.
|
| Indie games already seem pretty derivative these days. I think
| this tech will kill them in the mid-term as big companies use
| them.
| CamperBob2 wrote:
| For values of "dead" equal to "Now people who aren't 3D
| artists and can't afford to hire them will be able to make
| games," maybe.
|
| User name checks out.
| stuckinhell wrote:
| AI is already taking video game illustrators' jobs in China
| https://restofworld.org/2023/ai-image-china-video-game-
| layof...
|
| It feels like a countdown until every creative in the
| videogame industry is automated.
| owenpalmer wrote:
| People who use "make your own game" games aren't good at
| making games. They might enjoy a simplified process to feel
| the accomplishment of seeing quick results, but I find it
| unlikely they'll be competing with indie developers.
| CaptainFever wrote:
| Yeah, and if there was going to be such a tool, people who
| invest more time in it would be better than those casually
| using it. In other words, professionals.
| mattigames wrote:
| Not really, "I" can make 2D pictures that look like
| masterpieces using stable diffusion and didn't invest
| more than 6 hours playing with it, the learning curve is
| not that high, and people already have a hard time
| telling apart AI art than those from real 2D masters who
| have a lifetime learning it, the same thing will happen
| with making videogames and 3D art.(Yeah nothing of this
| looks exiting to me, actually it looks completely bleak)
| CaptainFever wrote:
| I didn't mean comparing it to human-created art, I meant
| comparing it to other AI generated or assisted artworks.
| Currently the hard parts of that would probably be
| consistency, fidelity (e.g. multiple characters) and
| control, which definitely stands out when compared
| against the casual raw gens.
| CamperBob2 wrote:
| Careful with that generalization. Game-changing FPS mods
| like Counter-Strike were basically "make your own game"
| projects, built with the highest-level toolkits imaginable
| (editors for existing commercial games.)
| chongli wrote:
| "Make your own game" games will never replace regular games.
| They target totally different interests. People who play
| games (vast majority) just want to play an experience created
| by someone else. People who like "make your own game" games
| are creative types who just use that as a jumping off point
| to becoming a game designer.
|
| It's no different than saying "these home kitchen appliances
| are really gonna kill off the restaurant industry."
| stuckinhell wrote:
| Hmm I think it will destroy the market in a couple ways.
|
| AI creating video games would drastically increase the
| volume of games available in the market. This surge in
| supply could make it harder for indie games to stand out,
| especially if AI-generated games are of high quality or
| novelty. It could also lead to even more indie saturation
| (the average indie makes less than 1000 dollars).
|
| As the market expectations shift, I think most indie
| development dies unless you are already rich or basically
| have patronage from rich clients.
| crq-yml wrote:
| The likes of itch.io, Roblox, and the App Store already
| exist, each with more games than anyone can reasonably
| curate.
|
| The games market has been in the same place as the rest
| of the arts for some time now: if you want to be noticed,
| you have to mount a bit of a production around it, add
| layers of design effort, and find a marketing funnel for
| that particular audience. The days of just making a Pong
| clone passed in the 1970s.
|
| What technology has done to the arts, historically, is
| add either more precision or more repeatability. The
| relationship to production and arts as a business maps to
| what kinds of capital-and-labor-intensive endeavors
| leverage the tech.
|
| Photographs didn't end painting, they ended painting as
| the ideal of precisely representational art. In the
| classical era, just before the tech was good enough to
| switch, painting was a process of carefully staging a
| scene with actors and sketching it using a camera obscura
| to trace details, then transferring the result to your
| canvas. Afterwards, the exact scene could be generated
| precisely in a photo, and so a more candid, informal
| method became possible both through using photographs
| directly and using them as reference. As well, exact
| copies of photographs could be manufactured. What changed
| was that you had a repeatable way of getting a precise
| result, and so getting the precision or the product
| itself became uninteresting. But what happened next was
| that movies and comics were invented, and they brought us
| back to a place of needing production: staged scenes,
| large quantities of film or illustration, etc.
|
| With generative AI, you are getting a clip art tool - a
| highly repeatable way of getting a generic result. If you
| want the design to be specific, you still have to stage
| it with a photograph, model it as a scene, or draw it
| yourself using illustration techniques.
|
| And so the next step in the marketplace is simply in
| finding the approach to a production that will be
| differentiating with AI - the equivalent of movies to
| photography. This collapses not the indie space - because
| they never could afford productions to begin with - but
| existing modes of mobile gaming, because they were
| leveraging the old production framework. Nobody has need
| of microtransaction cosmetics if they can generate the
| look they want.
| stuckinhell wrote:
| Maybe if you were talking about the generative AI from 1
| year ago. The incredibly fast evolution makes most of your
| points irrelevant. For example, AI art doesn't need prompt
| engineers as jobs anymore, because a lot of prompt
| engineering is already being absorbed by other AIs.
|
| The chaining of various AIs and the feedback loops
| between them are accelerating far beyond what people
| think.
|
| Just yesterday major breakthroughs were released on
| stable diffusion video. It's the pace and categorical
| type of these breakthroughs that represent a paradigm
| shift, never seen before in the creative fields.
| chongli wrote:
| I have yet to see any evidence that would convince me
| that generative AIs can produce compelling gameplay.
| Furthermore, even the image generation stuff has a lot of
| issues, such as making all the people in an image into
| weird amalgamations of each other.
| stuckinhell wrote:
| https://blogs.nvidia.com/blog/gamegan-research-pacman-
| annive...
|
| Pac-Man was recreated with just AI
| quickthrower2 wrote:
| But the "make your own game" games are saleable games. Or at
| least they will be with enough AI. The Roblox Minecraft, for
| example, is super realistic.
| dexwiz wrote:
| The platform layer of the "make your own game" game is always
| too heavy and too limited to compete with a dedicated engine
| in the long run. Also the monetization strategy is bad for
| professionals.
| angra_mainyu wrote:
| I couldn't disagree more. RPGMaker didn't kill RPGs,
| Unity/Godot/Unreal didn't kill games, Minecraft didn't kill
| games, and Renpy didn't kill VNs.
|
| Far more people prefer playing games than making them.
|
| We'll probably see a new boom of indie games instead. Don't
| forget, a large part of what makes the gaming experience
| unique is the narrative elements, gameplay, and aesthetics -
| none of which are easily replaceable.
|
| This empowers indie studios to hit a faster pace on one of
| the most painful areas of indie game dev: asset generation
| (or at least for me as a solo dev hobbyist).
| stuckinhell wrote:
| Sorry, I guess I wasn't clear. None of those things made
| games automatically. The future is buying a game-making
| game and saying "I want a Zelda clone, but funnier."
|
| The ai game framework handles the full game creation
| pipeline.
| CaptainFever wrote:
| The issue with that is that it probably produces generic-
| looking games, since the AI can't read your mind. See
| ChatGPT or SD for example, if you just say "write me a
| story about Zelda but funnier" it will do it, but it's
| the blandest possible story. To truly make it good
| requires a lot of human intention and direction (i.e.
| soul), typically drawn from our own human experiences and
| emotions.
| stuckinhell wrote:
| Then I'll just update my prompt to "Zelda but funnier,
| with Curb Your Enthusiasm-style humor."
| Vegenoid wrote:
| There are more amazing, innovative and interesting indie
| games being created now than ever before. There's just also
| way more indie games that aren't those things.
| airstrike wrote:
| This is revolutionary
| shaileshm wrote:
| This is what a truly revolutionary idea looks like. There are so
| many details in the paper. Also, we know that transformers can
| scale. Pretty sure this idea will be used by a lot of companies
| to train the general 3D asset creation pipeline. This is just too
| great.
|
| "We first learn a vocabulary of latent quantized embeddings,
| using graph convolutions, which inform these embeddings of the
| local mesh geometry and topology. These embeddings are sequenced
| and decoded into triangles by a decoder, ensuring that they can
| effectively reconstruct the mesh."
|
| This idea is simply beautiful and so obvious in hindsight.
|
| "To define the tokens to generate, we consider a practical
| approach to represent a mesh M for autoregressive generation: a
| sequence of triangles."
|
| More from the paper. Just so cool!
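|
| A minimal sketch of that triangle-sequence representation
| (illustrative Python only; the paper's actual tokens are
| learned codebook indices, not raw coordinates):
|
|     def mesh_to_sequence(triangles):
|         # triangles: list of ((x, y, z), (x, y, z), (x, y, z))
|         # Sort vertices within each triangle, then sort the
|         # triangles, so the sequence order is canonical rather
|         # than arbitrary.
|         ordered = sorted(tuple(sorted(tri)) for tri in triangles)
|         seq = []
|         for tri in ordered:
|             for vertex in tri:
|                 seq.extend(vertex)  # 9 coordinates per triangle
|         return seq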
| tomcam wrote:
| Can someone explain quantized embeddings to me?
| _hark wrote:
| NNs are typically continuous/differentiable so you can do
| gradient-based learning on them. We often want to use some of
| the structure the NN has learned to represent data
| efficiently. E.g., we might take a pre-trained GPT-type
| model, and put a passage of text through it, and instead of
| getting the next-token prediction probability (which GPT was
| trained on), we just get a snapshot of some of the
| activations at some intermediate layer of the network. The
| idea is that these activations will encode semantically
| useful information about the input text. Then we might e.g.
| store a bunch of these activations and use them to do
| semantic search/lookup to find similar passages of text, or
| whatever.
|
| Quantized embeddings are just that, but you introduce some
| discrete structure into the NN, such that the representations
| there are not continuous. A typical way to do this these days
| is to learn a codebook VQ-VAE style. Basically, we take some
| intermediate continuous representation learned in the normal
| way, and replace it in the forward pass with the nearest
| "quantized" code from our codebook. It biases the learning
| since we can't differentiate through it, and we just pretend
| like we didn't take the quantization step, but it seems to
| work well. There's a lot more that can be said about why one
| might want to do this, the value of discrete vs continuous
| representations, efficiency, modularity, etc...
| enjeyw wrote:
| If you're willing, I'd love your insight on the "why one
| might want to do this".
|
| Conceptually I understand embedding quantization, and I
| have some hint of why it works for things like WAV2VEC -
| human phonemes are (somewhat) finite so forcing the
| representation to be finite makes sense - but I feel like
| there's a level of detail that I'm missing regarding what's
| really going on and when quantisation helps/harms that I
| haven't been able to glean from papers.
| visarga wrote:
| Maybe it helps to point out that the first version of
| Dall-E (of 'baby daikon radish in a tutu walking a dog'
| fame) used the same trick, but they quantized the image
| patches.
| topwalktown wrote:
| Quantization also works as regularization; it stops the
| neural network from being able to use arbitrarily complex
| internal rules.
|
| But it's only really useful if you absolutely need
| to have a discrete embedding space for some sort of
| downstream usage. VQ-VAEs can be difficult to get to
| converge, and they have problems stemming from the
| approximation of the gradient, like codebook collapse.
| hedgehog wrote:
| Another thing to note here is this looks to be around seven
| total days of training on at most 4 A100s. Not all really
| cutting edge work requires a data center sized cluster.
| ganzuul wrote:
| ...Is graph convolution matrix factorization by another name?
| fjkdlsjflkds wrote:
| No... a graph convolution is just a convolution (over a
| graph, like all convolutions).
|
| The difference from a "normal" convolution is that you can
| consider arbitrary connectivity of the graph (rather than the
| usual connectivity induced by a regular Euclidean grid), but
| the underlying idea is the same: to calculate the result of
| the operation at any single place (i.e., node), you need to
| perform a linear operation over that place (i.e., node) and
| its neighbourhood (i.e., connected nodes), the same way that
| (e.g.) in a convolutional neural network, you calculate the
| value of a pixel by considering its value and that of its
| neighbours, when performing a convolution.
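|
| A minimal sketch of one such layer (illustrative PyTorch;
| real graph-convolution variants differ mainly in how they
| normalize the aggregation):
|
|     import torch
|
|     def graph_conv(x, adj, weight):
|         # x: (num_nodes, in_dim) node features
|         # adj: (num_nodes, num_nodes) adjacency with self-loops
|         # weight: (in_dim, out_dim) linear map shared by all nodes
|         deg = adj.sum(dim=1, keepdim=True).clamp(min=1)
|         neighbourhood_mean = (adj @ x) / deg  # aggregate neighbours
|         return neighbourhood_mean @ weight    # same op at every node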
| donpark wrote:
| How does this differ from similar techniques previously applied
| to DNA and RNA sequences?
| legel wrote:
| It's cool, it's also par for the field of 3D reconstruction
| today. I wouldn't describe this paper as particularly
| innovative or exceptional.
|
| What do I think is really compelling in this field (given that
| it's my profession)?
|
| This has me star-struck lately -- 3D meshing from a single
| image, a very large 3D reconstruction model trained on millions
| of all kinds of 3D models... https://yiconghong.me/LRM/
| godelski wrote:
| > Also, we know that transformers can scale
|
| Do we have strong evidence that other models don't scale or
| have we just put more time into transformers?
|
| Convolutional resnets look to scale on vision and language:
| (cv) https://arxiv.org/abs/2301.00808, (cv)
| https://arxiv.org/abs/2110.00476, (nlp)
| https://github.com/HazyResearch/safari
|
| MLPs also seem to scale: (cv) https://arxiv.org/abs/2105.01601,
| (cv) https://arxiv.org/abs/2105.03404
|
| I mean I don't see a strong reason to turn away from attention
| as well but I also don't think anyone's thrown a billion
| parameter MLP or Conv model at a problem. We've put a lot of
| work into attention, transformers, and scaling these. Thousands
| of papers each year! Definitely don't see that for other
| architectures. The ResNet Strikes Back paper is great for
| one reason in particular: it should remind us all not to get
| lost in the hype, and that our advancements are coupled. We
| have learned a lot of training techniques since the original
| ResNet days, and pushing those back into ResNets also makes
| them a lot better and really closes the gaps, at least in
| vision (where I research). It is easy to get railroaded in
| research where we have publish-or-perish and hype-driven
| reviewing.
| sram1337 wrote:
| What is the input? Is it converting a text query like "chair" to
| a mesh?
|
| edit: Seems like mesh completion is the main input-output method,
| not just a neat feature.
| CamperBob2 wrote:
| That's what I was wondering. From the diagram it looks like the
| input is other chair meshes, which makes it somewhat less
| interesting.
| tayo42 wrote:
| Really, the hardest thing with art is the details, which
| usually separate good from bad. So if you can roughly
| sketch what you want without skill and have the details
| generated, that's extremely useful. And image-to-image
| with the existing diffusion models is useful and popular.
| nullptr_deref wrote:
| I have no idea about your background when I am commenting
| here. But these are my two cents.
|
| NO. Details are mostly like icing on top of the cake. Sure,
| good details make good art, but that is not always the case.
| True and beautiful art requires form + shape. What you are
| describing is something merely visually appealing. So, the
| reason why diffusion models feel so bland is that they are
| good with details but do not have precise forms and shapes.
| Nowadays they are getting better; however, it still remains
| an issue.
|
| Form + shape > details is something they teach in Art 101.
| treyd wrote:
| There are also examples of tables, lamps, couches, etc. in
| the video.
| all2 wrote:
| You prompt this LLM with 3D meshes for it to complete, in the
| same manner you use language to prompt language-specific LLMs.
| owenpalmer wrote:
| That's what it seems like. Although this is not an LLM.
|
| > Inspired by recent advances in powerful large language
| models, we adopt a sequence-based approach to
| autoregressively generate triangle meshes as sequences of
| triangles.
|
| It's only inspired by LLMs
| adw wrote:
| This is sort of a distinction without a difference. It's an
| autoregressive sequence model; the distinction is how
| you're encoding data into (and out of) a sequence of
| tokens.
|
| LLMs are autoregressive sequence models where the "role" of
| the graph convolutional encoder here is filled by a BPE
| tokenizer (also a learned model, just a much simpler one
| than the model used here). That this works implies that you
| can probably port this idea to other domains by designing
| clever codecs which map their feature space into discrete
| token sequences, similarly.
|
| (Everything is feature engineering if you squint hard
| enough.)
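|
| A minimal sketch of that shared structure (illustrative
| Python; `codec` and `model` are hypothetical stand-ins for
| the tokenizer/codec and the transformer):
|
|     def generate(model, codec, prompt, max_len, end_token):
|         tokens = codec.encode(prompt)  # text -> BPE, or mesh -> codes
|         while len(tokens) < max_len:
|             nxt = model.predict_next(tokens)  # autoregressive core
|             if nxt == end_token:
|                 break
|             tokens.append(nxt)
|         return codec.decode(tokens)    # tokens -> text or triangles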
| ShamelessC wrote:
| The only difference is the label, really. The underlying
| transformer architecture and the approach of using a
| codebook is identical to a large language model. The same
| approach was also used originally for image generation in
| DALL-E 1.
| anentropic wrote:
| Yeah it's hard to tell.
|
| It looks like the input is itself a 3D mesh? So the model is
| doing "shape completion" (e.g. they show generating a chair
| from just some legs)... or possibly generating "variations"
| when the input shape is more complete?
|
| But I guess it's a starting point... maybe you could use
| another model that does worse quality text-to-mesh as the input
| and get something more crisp and coherent from this one.
| carbocation wrote:
| On my phone so I've only read this promo page - could this
| approach be modified for surface reconstruction from a 3D point
| cloud?
| kranke155 wrote:
| My chosen profession (3D / filmmaking) feels like being in some
| kind of combat trench at the moment. Both fascinating and scary
| nextworddev wrote:
| What do you see as the use case for this in your field? Does
| it seem high quality? (I have no context)
| zavertnik wrote:
| I'm not a professional in VFX, but I work in television and
| do a lot of VFX/3D work on the side. The quality isn't
| amazing, but it looks like this could be the start of a
| Midjourney-tier VFX/3D LLM, which would be awesome. For me,
| this would help bridge the gap between having to use/find
| premade assets and building what I want.
|
| For context, building from scratch in a 3D pipeline requires
| you to wear a lot of different hats (modeling, materials,
| lighting, framing, animating, etc.). It costs a lot of time
| not only to learn these hats but also to use them together.
| The individual complexity of those skill sets makes it
| difficult to experiment and play around, which is how people
| learn software.
|
| The shortcut is using premade assets or addons. For instance,
| being able to use the Source game assets in Source Filmmaker
| combined with SFM using a familiar game engine makes it easy
| to build an intuition with the workflow. This makes Source
| Filmmaker accessible, and it's why there's so much content out
| there made with it. So if you have gaps in your skillset or
| need to save time, you'll buy/use premade assets. This comes
| at a cost of control, but that's always been the tradeoff
| between building what you want and building with what you
| have.
|
| Just like GPT and DALL-E built a bridge between building what
| you want and building with what you have, a high fidelity GPT
| for the 3D pipeline would make that world so much more
| accessible and would bring the kind of attention NLE video
| editing got in the post-YouTube world. If I could describe in
| text and/or generate an image of a scene I want, and have a
| GPT create the objects, model them, generate textures, and
| place them in the scene, I could suddenly just open Blender,
| describe a scene, and start experimenting with shooting in it,
| as if I were playing in a sandbox FPS game.
|
| I'm not sure if MeshGPT is the ChatGPT of the 3D pipeline,
| but I do think this kind of content generation is the
| conduit for the DALL-E of video that so many people are
| terrified and/or excited about.
| gavinray wrote:
| On an unrelated note, could I ask your opinion?
|
| My wife is passionate about film/TV production and VFX.
|
| She's currently in school for this but is concerned about
| the difficulty of landing a job afterwards.
|
| Do you have any recommendations on breaking into the
| industry without work experience?
| kranke155 wrote:
| As a producer? Huh. That's such a great question.
|
| I think producer roles are a little less ultra-competitive /
| scarce, as they are actual jobs where you have to use Excel
| and do planning and budgeting.
|
| Being a producer means being on the phone all the time,
| negotiating, haggling, finding solutions where they don't
| seem to exist.
|
| Be it in TV, advertising or somewhere in the media space,
| the common rule is that producers are mostly actually
| terrible at their jobs, that's my experience in London.
| So if she's really good and really dedicated and learns
| the job of everyone on set, I'd say she has a shot.
|
| The real secret to being good in filmmaking is learning
| everyone else's job. Toyota Production System says if you
| want to run a production line you have to know how it
| works.
|
| If she wants to do VFX production she could start doing
| her own test scenes, learning the basics in Nuke and Blender,
| and even understanding the role of Houdini and how it
| works.
|
| If she does that - any company will be lucky to have her.
| bsenftner wrote:
| So you're probably familiar with the role of a Bidding
| Producer; imagine the difficulty they are facing: on one side
| they have filmmakers saying they just read that such-and-such
| is now created by AI, while that is news to the bidding
| producer, and their VFX/animation studio clients are
| scrambling as everything they do is new again.
| sheepscreek wrote:
| Perhaps one way to look at this could be auto-scaffolding. The
| typical modelling and CAD tools might include this feature to
| get you up and running faster.
|
| Another massive benefit is composability. If the model can
| generate a cup and a table, it also knows how to generate a cup
| on a table.
|
| Think of all the complex gears and machine parts this could
| generate in the blink of an eye, while being relevant to the
| project - rotated and positioned exactly where you want them.
| Very similar to how GitHub Copilot works.
| worldsayshi wrote:
| I don't see that LLMs have come much further in 3D animation
| than in programming in this regard: they can spit out bits
| and pieces that look okay in isolation, but a human needs to
| solve the puzzle. And often solving the puzzle means
| rewriting/redoing most of the pieces.
|
| We're safe for now but we should learn how to leverage the new
| tech.
| andkenneth wrote:
| This is the "your job won't be taken away by AI, it will be
| taken away by someone who knows how to leverage AI better
| than you"
| ChatGTP wrote:
| Don't worry, the price of goods will be exponentially
| cheaper, so you won't need a job /s
| johnnyanmac wrote:
| Probably, but isn't that how most of the technical fields
| go? Software in particular moves blazingly fast, and you need
| to adapt to the market quickly to be marketable.
| adventured wrote:
| > We're safe for now
|
| Some are safe for several years (3-5), that's it. During that
| time it's going to wreck the bottom tiers of employees and
| progressively move up the ladder.
|
| GPT and the equivalent will be extraordinary at programming
| five years out. It will end up being a trivially easy task
| for AI in hindsight (15-20 years out), not a difficult task.
|
| Have you seen how far things like MidJourney, Dalle, Stable
| Diffusion have come in just a year or two? It's moving
| extremely fast. They've gone from generating stick figures to
| realistic photographs in two years.
| user432678 wrote:
| Yeah, I'd better buy more wedding cakes in advance.
|
| https://xkcd.com/605
| ChatGTP wrote:
| _GPT and the equivalent will be extraordinary at
| programming five years out._
|
| Being exceptional at programming isn't hard. Being
| exceptional at listening to bullshit requirements for 5
| hours a day is.
| kranke155 wrote:
| The reason AI generative tools are faster to become useful in
| artistic areas is that in the arts you can take "errors" as
| style.
|
| Doesn't apply too much to mesh generation but was certainly
| the case in image gen. Mistakes that wouldn't fly for a human
| artist (hands) were just accepted as part of AIgen.
|
| So these areas are much less strict about precision than
| coding, making these tools much more capable of replacing
| artists in some tasks than Copilot is for coders atm.
| orbital-decay wrote:
| I don't know, 3D CGI has already been moving at breakneck
| speed for the last three decades without any AI. Today's tools
| are qualitatively different (sculpting, simulation, auto-
| rigging etc etc etc).
| kranke155 wrote:
| 3D CGI has gotten faster, but I haven't seen any qualitative
| jump for quite some time.
|
| IMO the last time a major tech advance was visible was Davy
| Jones in the Pirates films. That was a fully photorealistic
| animated character that was plausible as a hero character in
| a major feature. That was a breakthrough. After that came a
| lot of refinement and speeding up.
|
| This is different. I have some positivity about it, but it's
| getting hard to keep track of everything that's going on tbh.
| Every week it's a new application and every few months it's
| some quantum leap.
|
| Like others said, Midjourney and DallE are essentially
| photorealistic.
|
| It seems to me that the next step is generative AI creating
| better and better assets.
|
| And then of course you have video generation which is
| happening as well...
| orbital-decay wrote:
| Both DE3 and MJ are essentially toys for single random
| pictures, unusable in a professional setting. DALL-E in
| particular has really bad issues with quality, and while it
| follows the prompt well it also rewrites it so it's barely
| controllable. Midjourney is RLHF'd to death.
|
| What you want for asset creation is not photorealism, but
| style and concept transfer, multimodal controllability
| (text alone is terrible at expressing artistic intent), and
| tooling. And tooling isn't something that is developed
| quickly (although there were several rapid breakthroughs in
| the past, for example ZBrush).
|
| Most of the fancy demos you hear about sound good on paper,
| but don't really go anywhere. Academia is throwing shit at
| the wall to see what sticks, this is its purpose,
| especially when practice is running ahead of theory. It's
| similar to building airplanes before figuring out
| aerodynamics (which happened long ago): watching a heavier-
| than-air thing fly is amazing, until you realize it's not
| very practical in the current form, or might even kill its
| brave inventor who tried to fly it.
|
| If you look at the field closely, most of the progress in
| visual generative tooling happens in the open source
| community; people are trying to figure out what works in
| real use and what doesn't. Little is being done in big
| houses, at least publicly and for now, as they're more
| interested in a DC-3 than a Caproni Ca.60. The change is
| really incremental and gradual, similarly to the current
| mature state of 3D. Paradigms are different but they are
| both highly technical and depend on academic progress. Once
| it matures, it's going to become another skill-demanding
| field.
| kranke155 wrote:
| With respect, I disagree with almost everything you said.
|
| The idea that somehow "AI isn't art directable" is one I
| keep hearing, but I remain unconvinced this is somehow an
| unsolvable problem.
|
| The idea that AI gen is unusable at the moment for
| professional work doesn't hold up in my experience, since
| I now regularly use Photoshop's gen feature.
| ChatGTP wrote:
| You can remain unconvinced but it's somewhat true.
|
| I can keep writing prompts for DE3 or similar until it
| gives me something like what I want, but the problem is,
| there are often subtle but important mistakes in many
| images that are generated.
|
| I think it's really good at portraits of people, but for
| anything requiring complex lighting, representation of
| real world situations or events, I don't think it's ready
| yet, unless we're ready to just write prompts, click
| buttons and just accept what we receive in return.
| kranke155 wrote:
| It's absolutely not ready yet for sure.
|
| Midjourney already has tools that allow you to select
| parts of the image to regenerate with new prompts,
| Photoshop-style. The tools are being built, even if a bit
| slowly, to make these things useful.
|
| I could totally see creating Matte paintings through
| Midjourney for indie filmmaking soon, and for tiny budget
| films, using a video generative tool to make, let's say,
| zombies in the distance seems within reach now or very
| soon. Slowly, for some kinds of VFX, I think AI will start
| being able to replace the human element.
| orbital-decay wrote:
| Photoshop combined with Firefly is exactly the rare kind
| of good tooling I'm talking about. In/outpainting was
| found to be working for creatives in practice, and got
| added to Photoshop.
|
| _> The idea that somehow "AI isn't art directable" is
| one I keep hearing, but I remain unconvinced this is
| somehow an unsolvable problem._
|
| That's not my point. AI _can be_ perfectly directable and
| usable, just not in the specific form DE3 /MJ do it. Text
| prompts alone don't have enough semantic capacity to
| guide it for useful purposes, and the tools they have
| (img2img, basic in/outpainting) aren't enough for
| production.
|
| In contrast, Stable Diffusion has a myriad of non-textual
| tools around it right now - style/concept/object transfer
| of all sorts, live painting, skeleton-based character
| posing, neural rendering, conceptual sliders that can be
| created at will, lighting control, video rotoscoping,
| etc. And plugins for existing digital painting and 3D
| software leveraging all this witchcraft.
|
| All this is extremely experimental and janky right now.
| It will be figured out in the upcoming years, though (if
| only the community's brains weren't deep-fried by porn...).
| This is exactly the sort of tooling the industry needs to
| get shit done.
| kranke155 wrote:
| Ah ok yes I agree. How many years is really the million
| dollar question. I've begun to act as if it's around 5
| years and sometimes I think I'm being too conservative.
| trostaft wrote:
| Seems like the BibTeX on the page is broken? Or it might just
| be an extension of mine.
| alexose wrote:
| It sure feels like every remaining hard problem (i.e., the ones
| where we haven't made much progress since the 90s) is in line to
| be solved by transformers in some fashion. What a time to be
| alive.
| mclanett wrote:
| This is very cool. You can start with an image, generate a mesh
| for it, render it, and then compare the render to the image.
| Fully automated training.
| de6u99er wrote:
| continuous training
| j7ake wrote:
| I love this field. The paper includes a nice website, examples,
| and videos.
|
| So much more refreshing than the dense abstract-intro-results
| paper style.
| valine wrote:
| Even if this is "only" mesh autocomplete, it is still massively
| useful for 3D artists. There's a disconnect right now between how
| characters are sculpted and how characters are animated. You'd
| typically need a time-consuming step to retopologize your model.
| Transformer based retopology that takes a rough mesh and gives
| you clean topology would be a big time saver.
|
| Another application: take the output of your Gaussian splatting
| or diffusion model and run it through MeshGPT. Instant usable
| assets with clean topology from text.
| toxik wrote:
| What you have to understand is that these methods are very
| sensitive to what is in distribution and out of distribution.
| If you just plug in user data, it will likely not work.
| mattigames wrote:
| Lol, "for 3D artists" - this will be used 99% by people who
| have never created a mesh by hand in their lives, to replace
| their need to hire a 3D artist: programmers who don't want to
| (or can't) pay a designer, architects who never learned
| anything other than CAD, Fiverr "jobs", et al.
|
| I don't think people here realize how we are inching toward
| automating the automation itself, and the programmers who will
| be able to make a living out of this will be a tiny fraction of
| those who can make a living out of it today.
| bradleyishungry wrote:
| sorry to tell you, but there's no way anything will be
| generating clean topology for characters for a long long time.
| valine wrote:
| There's no shortage of 3D mesh data to train on. Who's to say
| scaling up the parameter count won't allow for increasingly
| intricate topology, the same way scaling language models
| improved reading comprehension?
| toxik wrote:
| This was done years ago, with transformers. It was then dubbed
| Polygen.
| Sharlin wrote:
| You might want to RTFA. Polygen and other prior art are
| mentioned. This approach is superior.
| toxik wrote:
| I read the article. It has exactly the same limitations as
| Polygen from what I can tell.
| dymk wrote:
| Their comparison against PolyGen looks like it's a big
| improvement. What are the limitations that this has in
| common with PolyGen that make it still not useful?
| toxik wrote:
| I don't think it's as widely applicable as they try to
| make it seem. I have worked specifically with PolyGen,
| and the main problem is "out of distribution" data.
| Basically anything you want to do will likely be outside
| the training distribution. This surfaces as sequencing.
| How do you determine which triangle or vertex to place
| first? Why would a user do it that way? What if I want to
| draw a table with the legs last? Cannot be done. The
| model is autoregressive.
| GaggiX wrote:
| First, you use the word "transformers" to mean "autoregressive
| models"; they are not synonymous. Second, this model beats
| PolyGen on every metric; it's not even close.
| mlsu wrote:
| The next breakthrough will be the UX to create 3d scenes in front
| of a model like this, in VR. This would basically let you
| _generate_ a permanent, arbitrary 3D environment, for any
| environment for which we have training data.
|
| Diffusion models could be used to generate textures.
|
| Mark is right and so so early.
| ShamelessC wrote:
| Mark?
|
| edit: Oh, _that_ Mark? lol okay
|
| edit edit: Maybe credit LeCun or something? Mark going all in
| on the metaverse was definitely not because he somehow
| predicted deep learning would take off. Even the people who
| trained the earliest models weren't sure how well it would
| work.
| amelius wrote:
| Is this limited to shapes that have mostly flat faces?
| catapart wrote:
| Dang, this is getting so good! Still got a ways to go, with the
| weird edges, but at this point, that feels like 'iteration
| details' rather than an algorithmic or otherwise complex problem.
|
| It's really going to speed up my pipeline to not have to pipe all
| of my meshes into a procgen library with a million little mesh
| modifiers hooked up to drivers. Instead, I can just pop all of my
| meshes into a folder, train the network on them, and then start
| asking it for other stuff in that style, knowing that I won't
| have to re-topo or otherwise screw with the stuff it makes,
| unless I'm looking for more creative influence.
|
| Of course, until it's all the way to that point, I'm still better
| served by the procgen; but I'm very excited by how quickly this
| is coming together! Hopefully by next year's Unreal showcase,
| they'll be talking about their new "Asset Generator" feature.
| truckerbill wrote:
| Do you have a recommended procgen lib?
| catapart wrote:
| Oh man, sorry, I wish! I've been using cobbled together bits
| of python plugins that handle Blender's geometry nodes, and
| the geometry scripts tools in Unreal. I haven't even ported
| over to their new proc-gen tools, which I suspect can be
| pretty useful.
| circuit10 wrote:
| Can this handle more organic shapes?
| LarsDu88 wrote:
| As a machine learning engineer who dabbles with Blender and hobby
| gamedev, this is pretty impressive, but not quite to the point
| of being useful in any practical manner (as far as the limited
| furniture examples are concerned).
|
| A competent modeler can make these types of meshes in under 5
| minutes, and you still need to seed the generation with polys.
|
| I imagine the next step will be to have the seed generation
| controlled by an LLM, and to start adding image models to the
| autoregressive parts of the architecture.
|
| Then we might see truly mobile game-ready assets!
| th0ma5 wrote:
| This is a very underrated comment... As with any tech demo, if
| they don't show it, it can't do it. It is very, very easy to
| imagine a generalization of these things to other purposes,
| which, if it could do them, would be a different presentation.
| rawrawrawrr wrote:
| It's research, not meant for commercialization. The main
| point is in the process, not necessarily the output.
| th0ma5 wrote:
| What? If the research doesn't show it, it can't do it, is
| my point, or else they would've put it in their research.
| empath-nirvana wrote:
| > A competent modeler can make these types of meshes in under 5
| minutes.
|
| I don't think this general complaint about AI workflows is that
| useful. Most people are not a competent <insert job here>. Most
| people don't know a competent <insert job here> or can't afford
| to hire one. Even something that takes longer than a
| professional would, and at worse quality, is for many things
| better than _nothing_, which is the realistic alternative for
| most people who would use something like this.
| cannonpalms wrote:
| Is the target market really "most people," though? I would
| say not. The general goal of all of this economic investment
| is to improve the productivity of labor--that means first and
| foremost that things need to be useful and practical for
| those trained to make determinations such as "useful" and
| "practical."
| taneq wrote:
| Millions of people generating millions of images (some of
| them even useful!) using Dall-E and Stable Diffusion would
| say otherwise. A skilled digital artist could create most
| of these images in an hour or two, I'd guess... but 'most
| people' certainly could not, and it turns out that these
| people really want to.
| reubenmorais wrote:
| Are those millions of people actually creating something
| of lasting value, or just playing around with a new toy?
| Filligree wrote:
| Is there a problem with the latter?
| willy_k wrote:
| A lot, but how many people will start with the latter but
| find themselves (capable of) doing the former?
| chefandy wrote:
| > I don't think this general complaint about AI workflows is
| that useful
|
| Maybe not to you, but it's useful if you're in these fields
| professionally, though. The difference between a neat
| hobbyist toolkit and a professional toolkit has gigantic
| financial implications, even if the difference is minimal to
| "most people."
| ajuc wrote:
| Linux vs Unix. Wikipedia vs Britannica. GCC vs the Intel
| compiler. A good-enough free hobby toy beats expensive
| professional tools, given enough hobbyists.
| johnnyanmac wrote:
| 1. They don't beat them outright. They're simply more
| accessible.
|
| 2. those "hobbyists" in all examples are in fact
| professionals now. That's why they could scale up.
| chefandy wrote:
| First, we're talking about the state of the technology
| and what it can produce, not the fundamental worthiness
| of the approach. Right now, it's not up to the task. In
| the earliest phases of those technologies, they also
| weren't good enough for professional use cases.
|
| Secondly, the number of hobbyists only matters if you're
| talking about hobbyists that develop the technology-- not
| hobbyists that _use_ the technology. Until those tools
| are good enough, you could have every hobbyist on the
| planet collectively attempting to make a Disney-quality
| character model with tools that aren't capable of doing
| so, and it wouldn't get much closer to the requisite
| result than a single hobbyist doing the same.
| stuckinhell wrote:
| Blender is another good example.
| hutzlibu wrote:
| Blender was a professional tool from the start. The
| company behind it went insolvent ... and with
| crowdfunding the source could be freed.
| chefandy wrote:
| Right-- being open source doesn't automatically mean it's
| an amateur tool or has its roots in a collective hobbyist
| effort.
| johnnyanmac wrote:
| >Most people don't know a competent <insert job here> or
| can't afford to hire one
|
| May be relevant in the long run, but it'll probably be 5+
| years before this is commercially available. And it won't be
| cheap either, so out of the range of said people who can't
| hire a competent <insert job here>
|
| That's why a lot of this stuff is pitched to companies with
| competent people instead of offered as a general product to
| download.
| hipadev23 wrote:
| > but it'll probably be 5+ years before this is
| commercially available
|
| I think you should look at the progress of image, text, and
| video generation over the past 12 months and re-assess your
| timeline prediction.
| johnnyanmac wrote:
| Availability =/= viability. I'm sure as we speak some
| large studios are already leveraging this work or are
| close to leveraging it.
|
| But this stuff trickles down to the public very slowly,
| because indies aren't a good audience for what is likely
| an expensive tech focused on mid-to-large-scale
| production.
| willy_k wrote:
| Yes but no, none of that really describes current
| development.
| johnnyanmac wrote:
| perhaps, but I was responding to
|
| >Most people are not a competent <insert job here>. Most
| people don't know a competent <insert job here> or _can't
| afford to hire one._
|
| emphasis mine. Affordability doesn't have much to do with
| capabilities, but it is a strong factor to consider for
| an indie dev. Devs in fields (games, VFX) that don't
| traditionally pay well to begin with.
| LarsDu88 wrote:
| I have no doubt that 3d modeling will become commodified
| in the same way that art has with the dawn of AI art
| generation over the past year.
|
| I honestly think we'll get there within 18 months.
|
| My skepticism is whether the technique described here
| will be the basis of what people will be using in ~2
| years to replace their low level static 3d asset
| generation.
|
| There are several techniques out there, leveraging
| different sources of data right now. This looks like a
| step in the right direction, but who knows.
| jamilton wrote:
| Is there a reason to expect it'd be significantly more
| expensive than current-gen LLM? Reading the "Implementation
| Details" section, this was done with GPT2-medium, and
| assuming running it is about as intensive as the original
| GPT2, it can be run (slowly) on a regular computer, without
| a graphics card. Seems reasonable to assume future versions
| will be around GPT-3/4's price.
| 22c wrote:
| Agreed! There's also no way this is 5 years away from
| being viable.
|
| I just checked the timestamps on my Dall-E Mini generated
| images. They're dated June 2022.
|
| This is what people were doing on commodity hardware back
| then:
|
| https://cdn-
| uploads.huggingface.co/production/uploads/165537...
|
| This is what people are doing on commodity hardware now:
|
| https://civitai.com/images/3853761
|
| I'm not even going to try to predict what we'll be able
| to do in 2 years time; even when accounting for the
| current GenAI hype/bubble!
| johnnyanmac wrote:
| Perhaps not, but it raises the question of whether GPT is
| affordable for a dev to begin with. I don't know how they
| would monetize this sort of work, so it's hard to say. But
| making game models probably requires a lot more
| processing power than generating text or static images.
| stuckinhell wrote:
| The open source art AI community is far more mature than
| people think.
| johnnyanmac wrote:
| 2d, maybe. 3D, I haven't seen anything close to a game
| ready asset.
| Kaijo wrote:
| The mesh topology here would see these rejected as assets
| in basically any professional context. A competent modeler
| could make much higher quality models, more suited to texturing
| and deformation, in under five minutes. A speed modeler could
| make the same in under a minute. And a procedural system in
| something like Blender geonodes can already spit out an endless
| variety of such models. But the pace of progress is staggering.
| johnnyanmac wrote:
| I see it as a black triangle[0] more than anything else.
| Sounds like a really good first step that will scale to stuff
| that would take even a good modeler days to produce. That's
| where the real value will start to be seen.
|
| [0]: https://rampantgames.com/blog/?p=7745
| GaggiX wrote:
| A simple next step would be to scale the model, make it
| bigger, and train it on millions of images in the wild.
| hipadev23 wrote:
| > A competent modeler can make these types of meshes in under 5
| minutes
|
| Sweet. Can you point me to these modelers who work on-demand
| and bill for their time in 5 minute increments? I'd love to be
| able to just pay $1-2 per model and get custom <whatever>
| dropped into my game when I need it.
| dvngnt_ wrote:
| they said competent, though, not cheap
| quickthrower2 wrote:
| but the AI will be cheap. $1 per model would be the OpenAI
| wrapper's price. Let alone the wholesale price.
| bufferoverflow wrote:
| There's no competent modeler that can produce 12 models per
| hour for 8 hours a day, let alone 24/7.
|
| Sure, you can probably demo your skills on one such model,
| but to do it consistently non-stop is a fantasy.
| Art9681 wrote:
| Just like a competent developer can use LLMs to bootstrap
| workflows, a competent modeler will soon have tools like this
| as part of their normal workflow. A casual user would be able
| to do things that they otherwise wouldn't have been able to,
| but an expert in the ML model's knowledge domain can really
| make it shine.
|
| I really believe that the more experienced you are in a
| particular use case, the more use you can get out of an ML
| model.
|
| Unfortunately, it's those very same people that seem to be the
| most resistant to adopting this without really giving it the
| practice required to get somewhere useful with it. I suppose
| part of the problem is we expect it to be a magic wand. But
| it's really just the new Photoshop, or Blender, or Microsoft
| Word, or PowerPoint ...
|
| Most people open those apps, click mindlessly for a bit, and
| promptly leave, never to return. And so it is with "AI".
| eropple wrote:
| I think eventually it may settle into what you describe. I
| don't think it's guaranteed, and I fear that there will be a
| pretty huge amount of damage done before that by the hype
| freaks whose real interest isn't in making artists more
| productive, but in rendering them (and other members of the
| actually-can-do-a-thing creative class) _unemployed_.
|
| The pipeline problem also exists: if you need to still have
| the skillsets you build up through learning the craft, you
| still need to have avenues to learn the craft--and the people
| who already have will get old eventually.
|
| There's a golden path towards a better future for everybody
| out of this, but a lot of swamps to drive into instead
| without careful forethought.
| WhitneyLand wrote:
| As I understand it, their claim is more about efficiency and
| quality.
|
| Being able to model something is very different from being
| able to do it with the fewest triangles and/or without
| losing detail.
| stuckinhell wrote:
| Until you create an AI to do those other parts too. (There is
| an AI being tested right now that tries to do that in the
| game dev community)
| esperent wrote:
| > A competent modeler can make these types of meshes in under 5
| minutes
|
| It's not about competent modellers, any more than SD is for
| expert artists.
|
| It's about giving tools to the non-experts. And also about
| freeing up those competent modellers to work on more
| interesting things than the 10,000 chair variants needed for
| future AAA games. They can work on making unique and
| interesting characters instead, or novel futuristic models that
| aren't in the training set and require real imagination
| combined with their expertise.
| adventured wrote:
| Like most of the generative AI space, it'll eliminate
| something like the bottom half of modelers, and turn them
| into lower paid prompt wizards. The top half will become
| combo modelers / prompt wizards, using both skillsets as
| needed.
|
| Prompt wizard hands work off to the finisher/detailer.
|
| It'll boost productivity and lead to higher quality finished
| content. And you'll be able to spot when a production -
| whether video game or movie - lacks a finisher (relying just
| on generation by prompt). The objects won't have that higher
| tier level of realism or originality.
| boppo1 wrote:
| >freeing up those competent modellers to work on more
| interesting things than the 10,000 chair variants needed for
| future AAA games. They can work on making unique and
| interesting characters instead, or novel futuristic models
| that aren't in the training set and require real imagination
| combined with their expertise.
|
| Or flipping burgers at McDonald's!
|
| There are only so many games that the market can support, and
| in those, only so many unique characters[0] that are
| required. We're pretty much at saturation already.
|
| [0]Not to mention that if AI can generate chairs, from what
| we have seen from Dall-E & SDXL, it can generate characters
| too. Less great than human-generated ones? Sure, but it's
| clear that big boys like Bethesda and Activision do not care.
| eurekin wrote:
| I can imagine one use case, in typical architecture design,
| where the architect creates a design and always faces this
| stumbling block when wanting to make it look as lively as
| possible: sprinkling a lot of convincing assets everywhere.
|
| As they are generated, variations are much easier to come by
| than buying a couple of asset packs.
| frozencell wrote:
| Not reproducible with code = Not research.
| beebeepka wrote:
| Games and pretty much any other experience being generated by
| AI is obvious to anyone paying attention at this point. But how
| would it work? Are current AI-generated images and videos using
| rasterisation? Will they use rasterisation, path tracing or any
| other traditional rendering technique, or will it be an
| entirely different thing?
| wolfgang805 wrote:
| Why would a video or an image, something generated without a
| mesh, be using rasterization?
| beebeepka wrote:
| If it's faster to generate? I don't know, that's what I am
| asking
| KyleLewis wrote:
| Can't wait for the "multimodal" version that can take a written
| description and generate meshes.
| m3kw9 wrote:
| So you train it with vector sequences that represent furniture
| and it predicts the next token (triangles). How is this
| different from ChatGPT being trained with the same sequences,
| outputting all the 3D locations and triangle sizes/lengths in
| sequence, and having a 3D program piece it together?
| btbuildem wrote:
| This is fantastic! You can sketch, in broad strokes, the key
| features of the shape you want, and this will generate some
| "best" matches around that.
|
| What I really appreciate about this is that they took the concept
| (transformers) and applied it in a quite different-from-usual
| domain. Thinking outside of the (triangulated) box!
| Stevvo wrote:
| Fantastic, but still useless from a professional perspective;
| i.e., a mesh that represents a cube as 12 triangles is a better
| representation of the form than previous efforts, but barely
| more usable.
|
| Whilst it might not be the solution I'm waiting for, I can now
| see it as possible. If an AI model can handle triangles, it
| might handle edge loops and NURBS curves.
| BrokrnAlgorithm wrote:
| I'm not a 3D artist, but why are we still, for lack of a better
| word, "stuck" with having / wanting to use simple meshes? I
| appreciate the simplicity, but isn't this an unnecessary
| limitation of mesh generation? It feels like an approach that
| imitates the constraints of having both limited hardware and
| artist resources. Shouldn't AI models help us break these
| boundaries?
| ipsum2 wrote:
| We're not stuck on meshes. Check out neural radiance fields as
| an alternative.
| fireant wrote:
| My understanding is that it's quite hard to make convex
| objects with radiance fields, right? For example, the
| furniture in the OP would be quite problematic.
|
| We can create radiance fields with photogrammetry, but IMO we
| need much better algorithms for transforming these into high
| quality triangle meshes that are usable in lower triangle
| budget media like games.
| BrokrnAlgorithm wrote:
| "Lower triangle budget media" is what I wonder if its still
| a valid problem. Modern game engines coupled with modern
| hardware can already render insane number of triangles. It
| feels like the problem is rather in engines not handling
| LOD correctly (see city skylines 2), although stuff like
| UE5 nanite seems to have taken the right path here.
|
| I suppose, though, there is a case for AI models doing what
| Nanite does entirely algorithmically, and research like this
| paper may come in handy there.
| BrokrnAlgorithm wrote:
| I was referring to being stuck with having to create simple /
| low tri polygonal meshes as opposed to using complex poly
| meshes such as photogrammetry would provide. The paper
| specifically addresses clean low poly meshes as opposed to
| what they call complex iso surfaces created by photogrammetry
| and other methods
| hackerlight wrote:
| Lots of polys is bad for performance. For a flat object
| like a table you want that to be low poly. Parallax can
| also help to give a 3D look without increasing poly count.
| DeathArrow wrote:
| So maybe in a few years we can ask AI to generate a level or
| entire game.
| wolfgang805 wrote:
| It would be nice to see, and be part of, a field doing work
| that humans could not do, instead of creating work that just
| replaces what humans already know how to do.
| Mizza wrote:
| Great work. But I don't get from the demo how it knows what
| object to autocomplete the mesh with - if you give it four posts
| as an input, how does it know to autocomplete as a table and not
| a dog?
|
| So maybe the next step is something like CLIP, but for meshes?
| CLuMP?
| jhiggins777 wrote:
| Really cool, but in 3D modeling triangles are a "no-no". You
| are taught early on to design in quads.
___________________________________________________________________
(page generated 2023-11-29 23:02 UTC)