[HN Gopher] Stable Fast 3D: Rapid 3D Asset Generation from Single Images
___________________________________________________________________
Stable Fast 3D: Rapid 3D Asset Generation from Single Images
Author : meetpateltech
Score : 239 points
Date : 2024-08-01 15:16 UTC (7 hours ago)
(HTM) web link (stability.ai)
(TXT) w3m dump (stability.ai)
| talldayo wrote:
| > 0.5 seconds per 3D asset generation on a GPU with 7GB VRAM
|
| Holy cow - I was thinking this might be one of those datacenter-
| only models but here I am proven wrong. 7GB of VRAM suggests this
| could run on a lot of hardware that 3D artists own already.
| msp26 wrote:
| You can interact with the models on their project page:
| https://stable-fast-3d.github.io/
| calini wrote:
| I'm going to 3D print so much dumb stuff with this.
| jsheard wrote:
| They're still hesitant to show the untextured version of the
| models so I would assume it's like previous efforts where most
| of the detail is in the textures, and the model itself, the
| part you would 3D print, isn't so impressive.
| yazzku wrote:
| I was going to make the same comment; these 3D reconstructions
| often generate a mess of a topology, and this post does not
| show any of the mesh triangulations, so I assume they're
| still not good. Arguably, the meshes are bad even for
| rendering.
| dlivingston wrote:
| Presumably, these meshes can be cleaned up using standard
| mesh refinement algorithms, like those found in MeshLab:
| https://www.meshlab.net/#features
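| MeshLab's filters are also scriptable from Python via pymeshlab.
| A minimal cleanup sketch (the filter names below come from recent
| pymeshlab releases and may differ by version; the file names are
| just examples):
|
|     import pymeshlab
|
|     ms = pymeshlab.MeshSet()
|     ms.load_new_mesh("generated.obj")  # export the .glb to .obj first
|
|     # Basic hygiene: drop duplicate/unreferenced geometry
|     ms.meshing_remove_duplicate_vertices()
|     ms.meshing_remove_unreferenced_vertices()
|
|     # Retopologize toward evenly sized triangles, then decimate
|     ms.meshing_isotropic_explicit_remeshing()
|     ms.meshing_decimation_quadric_edge_collapse(targetfacenum=5000)
|
|     ms.save_current_mesh("cleaned.obj")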
| Keyframe wrote:
| Hopefully that's in the (near) future, but as of now
| there still exists 'retopo' in 3D work for a reason. Just
| like roto and similar menial tasks. We're getting there
| with automation though.
| mft_ wrote:
| You can download a .glb file (from the HuggingFace demo page)
| and open it locally (e.g. in MS 3D Viewer). I'm looking at a
| mesh from one of the better examples I tried and it's
| actually pretty good...
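| If you'd rather poke at the mesh programmatically than in a
| viewer, a minimal sketch with the trimesh library (the file name
| is just an example; details may vary by trimesh version):
|
|     import trimesh
|
|     # force="mesh" concatenates the GLB scene into a single mesh
|     mesh = trimesh.load("output.glb", force="mesh")
|
|     print("vertices:", len(mesh.vertices))
|     print("faces:", len(mesh.faces))
|     print("watertight:", mesh.is_watertight)  # matters for 3D printing
|
|     mesh.show()  # interactive viewer (needs pyglet installed)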
| jayd16 wrote:
| You know, I do wonder about this. If it's just for static
| assets does it really matter? In something like Unreal, the
| textures are going to be virtualized and the geometry is
| going to be turned into LOD'd triangle soup anyway.
|
| Has anyone tried to build an Unreal scene with these
| generated meshes?
| jsheard wrote:
| Usually the problem is that the model itself is severely lacking
| in detail; sure, Nanite could make light work of a poorly
| _optimized_ model, but it's not going to fix the model
| being a vague blob which doesn't hold up to close scrutiny.
| kaibee wrote:
| Generate the accompanying normal map and then just
| tesselate it?
| andybak wrote:
| So don't use them in a context where they require close
| scrutiny?
| fragmede wrote:
| hueforge
| bloopernova wrote:
| Closer and closer to the automatic mapping drones from
| _Prometheus_.
|
| I wonder what the optimum group of technologies is that would
| enable that kind of mapping? Would you pile on LIDAR, RADAR, this
| tech, ultrasound, magnetic sensing, etc etc. Although, you're
| then getting a flying tricorder. Which could enable some cool
| uses even outside the stereotypical search and rescue.
| nycdatasci wrote:
| High-res images from multiple perspectives should be
| sufficient. If you have a consumer drone, this product (no
| affiliation) is extremely impressive:
| https://www.dronedeploy.com/
|
| You basically select an area on a map that you want to model in
| 3d, it flies your drone (take-off, flight path, landing), takes
| pictures, uploads to their servers for processing, generates
| point cloud, etc. Very powerful.
| thetoon wrote:
| What you could do with WebODM is already quite impressive
| alsodumb wrote:
| Are you talking about mapping tunnels with drones? That's
| already done and it doesn't really need any 'AI': it's plain
| old SLAM.
|
| DARPA's subterranean challenge had many teams that did some
| pretty cool stuff in this direction:
| https://spectrum.ieee.org/darpa-subterranean-challenge-26571...
| sorenjan wrote:
| You don't need or want generative AI for mapping, you "just"
| need lidar and drones for slam.
|
| https://www.youtube.com/watch?v=1CWWP9jb4cE
| pzo wrote:
| You already have Depth Anything V2, which can generate depth maps
| in realtime even on an iPhone. Quality is pretty good and will
| probably improve further. Actually, in many ways those depth maps
| are much better quality than the iPhone LiDAR or TrueDepth camera
| (which cannot handle transparent, metallic, or reflective
| surfaces, and are also quite noisy).
|
| https://github.com/DepthAnything/Depth-Anything-V2
|
| https://huggingface.co/spaces/pablovela5620/depth-compare
|
| https://huggingface.co/apple/coreml-depth-anything-v2-small
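| A minimal sketch of running it locally via the transformers
| depth-estimation pipeline (the checkpoint id is an assumption
| based on the links above; check the hub for the exact name):
|
|     from transformers import pipeline
|     from PIL import Image
|
|     depth = pipeline("depth-estimation",
|                      model="depth-anything/Depth-Anything-V2-Small-hf")
|
|     result = depth(Image.open("photo.jpg"))
|
|     # "depth" is a ready-to-view PIL image;
|     # "predicted_depth" holds the raw tensor
|     result["depth"].save("depth_map.png")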
| specproc wrote:
| Be still my miniature-painting heart.
| timr wrote:
| For all of the hype around LLMs, this general area (image
| generation and graphical assets) seems to me to be the big long-
| term winner of current-generation AI. It hits the sweet spot for
| the fundamental limitations of the methods:
|
| * so-called "hallucination" (actually just how generative models
| work) is a feature, not a bug.
|
| * anyone can easily _see_ the unrealistic and biased outputs
| without complex statistical tests.
|
| * human intuition is useful for evaluation, and not fundamentally
| misleading (i.e. the equivalent of _"this text sounds fluent, so
| the generator must be intelligent!"_ hype doesn't really exist
| for imagery. We're capable of treating it as _technology_ and
| evaluating it fairly, because there's no equivalent human
| capability.)
|
| * even lossy, noisy, collapsed and over-trained methods can be
| valuable for different creative pursuits.
|
| * perfection is not required. You can easily see distorted
| features in output, and iteratively try to improve them.
|
| * consistency is not required (though it will unlock hugely
| valuable applications, like video, should it ever arrive).
|
| * technologies like LoRA allow even unskilled users to train
| character-, style- or concept-specific models with ease.
|
| I've been amazed at how much better image / visual generation
| models have become in the last year, and IMO, the pace of
| improvement has not been slowing as much as text models.
| Moreover, it's becoming increasingly clear that the future isn't
| the wholesale replacement of photographers, cinematographers,
| etc., but rather, a generation of crazy AI-based power tools that
| can do things like add and remove _concepts_ in imagery with a
| few text prompts. It's insanely useful, and just like Photoshop
| in the 90s, a new generation of power-users is already emerging,
| and doing wild things with the tools.
| CuriouslyC wrote:
| Image models are a great way to understand generative AI. It's
| like surveying a battlefield from the air as opposed to the
| ground.
| ibash wrote:
| > anyone can easily see the unrealistic outputs without complex
| statistical tests.
|
| This is key, we're all pre-wired with fast correctness tests.
|
| Are there other data types that match this?
| batch12 wrote:
| Audio to a lesser degree
| sounds wrote:
| Software (I mean the product, not the code)
|
| Mundane tasks that can be visually inspected at the end
| (cleaning, organizing, maintenance and mechanical work)
| leetharris wrote:
| > For all of the hype around LLMs, this general area (image
| generation and graphical assets) seems to me to be the big
| long-term winner of current-generation AI. It hits the sweet
| spot for the fundamental limitations of the methods:
|
| I am biased (I work at Rev.com and Rev.ai), but I totally agree
| and would add one more thing: transcription. Accurate human
| transcription takes a really, really long time to do right.
| Often a ratio of 3:1-10:1 of transcriptionist time to original
| audio length.
|
| Though ASR is only ~90-95% accurate on many "average" recordings,
| it is often 100% accurate on high-quality audio.
|
| It's not only a cost savings thing, but there are entire
| industries that are popping up around AI transcription that
| just weren't possible before with human speed and scale.
| timr wrote:
| I agree. I think it's more of a niche use-case than image
| models (and fundamentally harder to evaluate), but
| transcription and summarization is my current front-runner
| for winning use-case of LLMs.
|
| That said, "hallucination" is more of a fundamental problem
| for this area than it is for imagery, which is why I still
| think imagery is the most interesting category.
| llm_trw wrote:
| Are there any models that can do diarization well yet?
|
| I need one for a product and the state of the art, e.g.
| pyannote, is so bad it's better to not use them.
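| For context, running it locally is only a few lines; a minimal
| sketch (the checkpoint id and the HF token needed to fetch the
| gated weights are assumptions; check the pyannote docs):
|
|     from pyannote.audio import Pipeline
|
|     # Downloads the weights once, then runs fully locally
|     pipeline = Pipeline.from_pretrained(
|         "pyannote/speaker-diarization-3.1",
|         use_auth_token="hf_...",  # token only used to fetch weights
|     )
|
|     diarization = pipeline("meeting.wav")
|     for turn, _, speaker in diarization.itertracks(yield_label=True):
|         print(f"{turn.start:.1f}s - {turn.end:.1f}s: {speaker}")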
| throw03172019 wrote:
| Deepgram has been pretty good for our product. Fast and
| fairly accurate for English.
| llm_trw wrote:
| Do they have a local model?
|
| I keep getting burned by APIs with stupid restrictions
| that make use cases impossible which would be trivial if you
| could run the thing locally.
| toddmorey wrote:
| Also the other way around: text to speech. We're at the point
| where I can finally listen to computer generated voice for
| extended periods of time without fatigue.
|
| There was a project mentioned here on HN where someone was
| creating audio book versions of content in the public domain
| that would never have been converted through the time and
| expense of human narrators because it wouldn't be
| economically feasible. That's a huge win for accessibility.
| Screen readers are also about to get dramatically better.
| qup wrote:
| > a project mentioned here on HN where someone was creating
| audio book versions of content in the public domain
|
| Maybe this: https://news.ycombinator.com/item?id=40961385
| toddmorey wrote:
| That's the one! Thanks!
| mitthrowaway2 wrote:
| > it's becoming increasingly clear that the future isn't the
| wholesale replacement of photographers, cinematographers, etc.
|
| I'd refrain from making any such statements about the future;*
| the pace of change makes it hard to see the horizon beyond a
| few years, especially relative to the span of a career. It's
| already wholesale-replacing many digital artists and editorial
| illustrators, and while it's still early, there's a clear push
| starting in the cinematography direction. (I fully agree with
| the rest of your comment, and it's strange how much diffusion
| models seem to be overlooked relative to LLMs when people think
| about AI progress these days.)
|
| * (edit: about the future _impact of AI on jobs_.)
| timr wrote:
| I mean, my whole comment is a prediction of the future, so
| that's water under the bridge. Maybe you're right and this is
| the start of the apocalypse for digital artists, but it feels
| more like photoshop in 1990 to me -- and people were saying
| the same stuff back then.
|
| > It's already wholesale-replacing many digital artists and
| editorial illustrators
|
| I think you're going to need to cite some data on a claim
| like that. Maybe it's replacing the fiverr end of the market?
| It's certainly much harder to justify paying someone to
| generate a (bad) logo or graphic when a diffusion model can
| do the same thing, but there's no way that a model, today,
| can replace a _skilled_ artist. Or said differently: a
| skilled artist, combined with a good AI model, is vastly more
| productive than an unskilled artist with the same model.
| cjbgkagh wrote:
| What happens when the AI takes the low end of the market is
| that the people who catered to the low end now have to try
| to compete more in the mid-to-high end. The mid end facing
| increased competition has to try to move up to the high
| end. So while AI may not be able to compete directly with
| the high end it will erode the negotiating power and thus
| the earning potential of the high end.
| sroussey wrote:
| We have watched this same process repeat a few times over
| the last century with photography.
| timr wrote:
| Or graphic design, or video editing, or audio mastering,
| or...every new tool has come with a bunch of people
| saying things like _"what will happen to the linotype
| operators!?"_
|
| I sort of hate this line of argument, but it also has
| been manifestly true of the past, and rhymes with the
| present.
| llm_trw wrote:
| >For all of the hype around LLMs, this general area (image
| generation and graphical assets) seems to me to be the big
| long-term winner of current-generation AI.
|
| Let me show you the future:
| https://www.youtube.com/watch?v=eVlXZKGuaiE
|
| This is an LLM controlling an embodied VR body in a physics
| simulation.
|
| It is responding to human voice input not only with voice but
| body movements.
|
| Transformers aren't just chatbots, they are general symbolic
| manipulation machines. Anything that can be expressed as a
| series of symbols is a thing they can do.
| latentsea wrote:
| >This is an LLM controlling an embodied VR body in a physics
| simulation.
|
| No it's not. It's VAM that is controlling the character; it's
| literally just using a bog-standard LLM as a chatbot and
| feeding the text into a plugin in VAM, and VAM itself does the
| animation. Don't get me wrong, it's absolutely next level to
| experience chatbots this way, but it's still a chatbot.
| llm_trw wrote:
| The animation, not the movement decisions.
|
| This is as naive as calling an industrial robot 'just a
| calculator'.
| kkukshtel wrote:
| > This general area (image generation and graphical assets)
| seems to me to be the big long-term winner of current-
| generation AI
|
| I think it's easy to totally miss that LLMs are just being
| completely and quietly subsumed into a ton of products. They
| have been far more successful, and many image generation models
| use LLMs on the backend to generate "better" prompts for the
| models themselves. LLMs are the bedrock.
| derefr wrote:
| I would argue the opposite -- image generation is the clear
| loser. If you've ever tried to do it yourself, grabbing a bunch
| of LoRAs from Civitai to try to convince a model to draw
| something it doesn't initially know how to draw -- it becomes
| clear that there's far too much unavoidable correlation between
| "form" and "representation" / "style" going on in even a SOTA
| diffusion model's hidden layers.
|
| Unlike LLMs, which really seem to translate the text into
| "concepts" at a certain embedding layer, the (current, 2D)
| diffusion models will store (and thus require to be trained on)
| a completely different idea of a thing, if it's viewed from a
| slightly different angle, or is a different size. Diffusion
| models can _interpolate_ but not _extrapolate_ -- they can't
| see a prompt that says "lion goat dragon monster" and come up
| with the ancient-greek Chimera, unless they've actually been
| _trained on_ a Chimera. You can tell them "asian man, blond
| hair" -- and if their training dataset contains asian men and
| men with blonde hair but never _at the same time_, then they
| _won't_ be able to "hallucinate" a blond asian man for you,
| because that won't be an established point in the model's
| latent space.
|
| ---
|
| On a tangent: IMHO the _true_ breakthrough would be a model for
| "text to textured-3D-mesh" -- where it builds the model out of
| parts that it shapes individually and assembles in 3D space not
| out of tris, but _by writing/manipulating tokens representing
| shader code_ (i.e. it creates "procedural art"); and then it
| consistency-checks itself at each step not just against a
| textual embedding, but _also_ against an arbitrary (i.e.
| controlled for each layer at runtime by data) set of 2D
| projections that can be decoded out _to_ textual embeddings.
|
| (I imagine that such a model would need some internal
| "blackboard" of representational memory that it can set up
| arbitrarily-complex "lenses" for between each layer -- i.e. a
| camera with an arbitrary projection matrix, through which is
| read/written a memory matrix. This would allow the model to
| arbitrarily re-project its internal working visual "conception"
| of the model between each step, _in a way controllable by the
| output of each step_. Just like a human would rotate and zoom a
| 3D model while working on it[1]. But (presumably) with all the
| edits needing a particular perspective done in parallel on the
| first layer where that perspective is locked in.)
|
| Until we have something like that, though, all we're really
| getting from current {text,image}-to-{image,video} models is
| the parallel layered inpainting of a _decently, but not
| remarkably_ exhaustive pre-styled patch library, with each
| patch of each layer being applied with an arbitrary Photoshop-
| like "layer effect" (convolution kernel.) Which is the big
| reason that artists get mad at AI for "stealing their work" --
| but also why the results just aren't very flexible. Don't have
| a patch of a person's ear with a big earlobe seen in profile?
| No big-earlobe ear in profile for you. It either becomes a
| small-earlobe ear or the whole image becomes not-in-profile.
| (Which is an improvement from earlier models, where _just the
| ear_ became not-in-profile.)
|
| [1] Or just like our _minds_ are known to rotate and zoom
| objects in our "spatial memory" to snap them into our mental
| visual schemas!
| earthnail wrote:
| I think you're arguing about slightly different things. OP
| said that image generation is useful despite all its
| shortcomings, and that the shortcomings are easy to deal with
| for humans. OP didn't argue that the image generation AIs are
| actually smart. Just that they are useful tech for a variety
| of use cases.
| mrandish wrote:
| > Until we have something like that...
|
| The kind of granular, human-assisted interaction interface
| and workflow you're describing is, IMHO, the high-value path
| for the evolution of AI creative tools for non-text
| applications such as imaging, video and music, etc. Using a
| single or handful of images or clips as a starting place is
| good but as a semi-talented, life-long aspirational creative,
| current AI generation isn't that practically useful to me
| without the ability to interactively guide the AI toward what
| I want in more granular ways.
|
| Ideally, I'd like an interaction model akin to real-time
| collaboration. Due to my semi-talent, I've often done initial
| concepts myself and then worked with more technically
| proficient artists, modelers, musicians and sound designers
| to achieve my desired end result. By far the most valuable
| such collaborations weren't necessarily with the most
| technically proficient implementers, but rather those who had
| the most evolved real-time collaboration skills. The 'soft
| skill' of interpreting my directional inputs and then
| interactively refining or extrapolating them into new options
| or creative combinations proved simply invaluable.
|
| For example, with graphic artists I've developed a strong
| preference for working with those able to start out by
| collaboratively sketching rough ideas on paper in real-time
| before moving to digital implementation. The interaction and
| rapid iteration of tossing evolving ideas back and forth
| tended to yield vastly superior creative results. While I
| don't expect AI-assisted creative tools to reach anywhere
| near the same interaction fluidity as a collaboratively-
| gifted human anytime soon, even minor steps in this direction
| will make such tools far more useful for concepting and
| creative exploration.
| derefr wrote:
| ...but I wasn't describing a "human-assisted interaction
| interface and workflow." I was describing a different way
| for an AI to do things "inside its head" in a feed-forward
| span-of-a-few-seconds inference pass.
| thrance wrote:
| Honestly, I have yet to see an AI-generated image that makes me
| go "oh wow". It's missing those last 10 percent that always
| seem to elude neural networks.
|
| Also, the very bad press gen AI gets is very much slowing down
| adoption, particularly among creative-minded people, who would
| be the most likely users.
| jokethrowaway wrote:
| Hop on civitai
|
| There are plenty of mind-blowing images.
| quantumwoke wrote:
| Great result. Just had a play around with the demo models and
| they preserve structure really nicely, although the textures are
| still not great. It's kind of a voxelized version of the input
| image.
| mft_ wrote:
| I'm really excited for something in this area to really deliver,
| and it's really cool that I can just drag pictures into the demo
| on HuggingFace [0] to try it.
|
| However... mixed success. It's not good with (real) cats yet -
| which was obvs the first thing I tried. It did reasonably well
| with a simple image of an iPhone, and actually pretty
| impressively with a pancake with fruit on top, terribly with a
| rocket, and impressively again with a rack of pool balls.
|
| [0] https://huggingface.co/spaces/stabilityai/stable-fast-3d
| kleiba wrote:
| This is good news for the indie game dev scene, I suppose?
| jayd16 wrote:
| The models aren't really optimized for game dev. Fine for
| machinima, probably.
| ww520 wrote:
| This is a great step forward.
|
| I wonder whether RAG-based 3D animation generation can be done
| with this (a rough sketch follows the list):
|
| 1. Textual description of a story.
|
| 2. Extract/generate keywords from the story using LLM.
|
| 3. Search and look up 2D images by the keywords.
|
| 4. Generate 3D models from the 2D images using Stable Fast 3D.
|
| 5. Extract/generate path description from the story using LLM.
|
| 6. Generate movement/animation/gait using some AI.
|
| ...
|
| 7. Profit??
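|
| A rough skeleton of how those steps could chain together (every
| helper below is a hypothetical placeholder, not a real API; it
| only shows the data flow):
|
|     def extract_keywords(story: str) -> list[str]:
|         # Step 2: ask an LLM for the key entities in the story (stubbed)
|         return ["knight", "dragon", "castle"]
|
|     def search_image(keyword: str) -> str:
|         # Step 3: look up a 2D reference image for the keyword (stubbed)
|         return f"images/{keyword}.png"
|
|     def image_to_3d(image_path: str) -> str:
|         # Step 4: run Stable Fast 3D on the image, return a mesh path (stubbed)
|         return image_path.replace(".png", ".glb")
|
|     def extract_paths(story: str, assets: dict) -> dict:
|         # Step 5: ask an LLM for per-asset movement paths (stubbed)
|         return {name: [(0, 0, 0), (1, 0, 2)] for name in assets}
|
|     def animate(assets: dict, paths: dict) -> None:
|         # Step 6: hand meshes and paths to an animation/gait model (stubbed)
|         for name, path in paths.items():
|             print(f"animating {assets[name]} along {path}")
|
|     story = "A knight rides from the castle to face the dragon."
|     assets = {kw: image_to_3d(search_image(kw))
|               for kw in extract_keywords(story)}
|     animate(assets, extract_paths(story, assets))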
| nwoli wrote:
| Pre-generate a bunch of images via SDXL, convert them to 3D, and
| then serve the nearest mesh after querying.
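| A minimal sketch of the lookup side, using CLIP embeddings so a
| text query can be matched against the pre-generated images (the
| library paths are hypothetical):
|
|     import numpy as np
|     from PIL import Image
|     from sentence_transformers import SentenceTransformer
|
|     # Hypothetical pre-baked library: SDXL image -> Stable Fast 3D mesh
|     library = {"images/robot.png": "meshes/robot.glb",
|                "images/chair.png": "meshes/chair.glb"}
|
|     model = SentenceTransformer("clip-ViT-B-32")  # embeds images and text
|     image_embs = model.encode([Image.open(p) for p in library])
|     mesh_paths = list(library.values())
|
|     def nearest_mesh(query: str) -> str:
|         # Cosine similarity between the query text and library images
|         q = model.encode(query)
|         sims = image_embs @ q / (
|             np.linalg.norm(image_embs, axis=1) * np.linalg.norm(q))
|         return mesh_paths[int(np.argmax(sims))]
|
|     print(nearest_mesh("a shiny humanoid robot"))  # -> meshes/robot.glb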
| nwoli wrote:
| Huggingface space to try it
| https://huggingface.co/spaces/stabilityai/stable-fast-3d
| Y_Y wrote:
| It really looks like they've been doing that classic infomercial
| tactic of desaturating the images of the things they're comparing
| against to make theirs seem better.
| woolion wrote:
| This is the third image-to-3D AI I've tested, and in all cases
| the examples they give look like 2D renders of 3D models already.
| My tests were with cel-shaded images (cartoony, not with
| realistic lighting) and the model outputs something very flat but
| with very bad topology, which is worse than starting with a low
| poly or extruding the drawing. I suspect it is unable to give
| decent results without accurate shadows from which the normal
| vectors could be recomputed and thus lacks any 'understanding' of
| what the structure would be from the lines and forms.
|
| In any case it would be cool if they specified the set of inputs
| that is expected to give decent results.
| quitit wrote:
| It might not just be your tests.
|
| All of my tests of img2mesh technologies have produced poor
| results, even when using images that are very similar to the
| ones featured in their demo. I've never got fidelity like what
| they've shown.
|
| I'll give this a whirl and see if it performs better.
| quitit wrote:
| Tried it with a collection of images, and in my opinion it
| performs -worse- than earlier releases.
|
| It is however fast.
| woolion wrote:
| All right, I was hesitating to try shading some images to see
| if that improves the quality. It's probably still too early.
| diggan wrote:
| What stuck out to me from this release was this:
|
| > Optional quad or triangle remeshing (adding only 100-200ms to
| processing time)
|
| But it seems to be optional. Did you try it with that
| turned on? I'd be very interested in those results, as I had
| the same experience as you: the models don't generate good
| enough meshes, so I was hoping this one would be a bit better at
| that.
|
| Edit: I just tried it out myself on their Huggingface demo and
| even with the predefined images they have there, the mesh
| output is just not good enough. https://i.imgur.com/e6voLi6.png
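| If you want to script the Space rather than click through it, the
| gradio_client package can drive it; a minimal sketch (the endpoint
| name and parameter order are assumptions read off the demo UI, so
| check view_api() for the real signature):
|
|     from gradio_client import Client
|
|     client = Client("stabilityai/stable-fast-3d")
|     print(client.view_api())  # lists the actual endpoints and parameters
|
|     glb_path = client.predict(
|         "input.png",             # source image
|         0.85,                    # foreground ratio (assumed default)
|         "triangle",              # remesh option: "none" | "triangle" | "quad" (assumed)
|         api_name="/run_button",  # assumed endpoint name
|     )
|     print(glb_path)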
| nextworddev wrote:
| For those reading from Stability - just tried it - API seems to
| be down and the notebook doesn't have the example code it claimed
| to have.
| hansc wrote:
| Looks very good on examples, but testing a few Ikea chairs or a
| Donald Duck image gives very wrong results.
|
| You can test here:
| https://huggingface.co/spaces/stabilityai/stable-fast-3d
| ksec wrote:
| Given that the graphics asset part of AA or AAA games is the
| most expensive, I wonder if 3D asset generation could perhaps
| drastically lower that cost by 50% or more? At least for the
| same output. Because in reality I guess artists will just spend
| more time in other areas.
| causi wrote:
| Man it would be so cool to get AI-assisted photogrammetry.
| Imagine that instead of taking a hundred photos or a slow scan
| and having to labor over a point cloud, you could just take like
| three pictures and then go down a checklist. "Is this circular?
| How long is this straight line? Is this surface flat? What's the
| angle between these two pieces?" and get a perfect replica or
| even a STEP file out of it. Heaven for 3D printers.
| puppycodes wrote:
| I really can't wait for this technology to improve. Unfortunately,
| just from testing this, it seems not very useful. It takes more
| work to modify the bad model it approximates from the image
| than to start from scratch with a good foundation. I would
| rather see something that took a series of steps to slowly reach
| a higher quality end product instead of expecting everything to
| come from one image. Perhaps I'm missing the use case?
| MrTrvp wrote:
| Perhaps it'll require a series of segmentations and transforms
| that improve individual components and then work up towards
| the full 3D model of the image.
| andybak wrote:
| > not very useful
|
| Useful for what? I think use cases will emerge.
|
| A lot of critiques assume you're working in VFX or game
| development. Making image-to-3D (and by extension text-to-image-
| to-3D) effortless opens up a whole host of new applications,
| which might not be anywhere near so demanding.
| fsloth wrote:
| Not the holy grail yet, but pretty cool!
|
| I see these being usable not as main assets, but as something you
| would add as a low-effort embellishment to add complexity to the
| main scene. The fact that they maintain their profile makes them
| usable for situations where a mere 2D billboard impostor (i.e. the
| original image always oriented towards the camera) would not cut
| it.
|
| You can totally create a figure image (Midjourney|Bing|Dalle3),
| drag and drop it into the image input, and get a surprisingly good
| 3D presentation that is not a highly detailed model, but is
| something you could very well put on a shelf in a 3D scene as an
| embellishment where the camera never sees the back of it and the
| model is never at the center of attention.
| abidlabs wrote:
| Official Gradio demo is here:
| https://huggingface.co/spaces/stabilityai/stable-fast-3d
___________________________________________________________________
(page generated 2024-08-01 23:01 UTC)