[HN Gopher] Stability.ai - Introducing Stable Video 3D
       ___________________________________________________________________
        
       Stability.ai - Introducing Stable Video 3D
        
       Author : ed
       Score  : 236 points
       Date   : 2024-03-18 20:06 UTC (2 hours ago)
        
 (HTM) web link (stability.ai)
 (TXT) w3m dump (stability.ai)
        
       | Filligree wrote:
        | If the animations shown are representative, then the mesh output
        | may very well be good enough to use with a 3D printer.
       | 
       | Looking forward to experimenting with this.
        
         | neom wrote:
         | I don't know much about 3D printing, would be very interested
         | in learning more about this idea if you'd be so kind as to
          | expand on it. Could I have AI spend all day auto-scanning what
          | teens are doing on Instagram, auto-generating toys based on
          | it, auto-generating advertisements for the toys, and auto-3D-
          | printing them on demand?
        
           | SirSourdough wrote:
            | Hypothetically, sure, assuming the parent comment is correct
            | that these meshes are sufficient for modelling, and that you
            | can find any teens who want a non-digital toy.
           | 
           | I think a good hobbyist application for this would be
           | something like modelling figurines for games, which is
           | already a pretty popular 3D printing application. This would
           | allow people with limited modelling skills to bring
           | fantastical, unique characters to life "easily".
        
             | Filligree wrote:
             | Pretty much. We're already generating images of monsters
             | and characters for a D&D campaign; being able to print
             | those in 3D would be pretty amazing.
        
           | CobrastanJorji wrote:
           | I think their suggestion was more "I have a photo of a cool
           | horse, and now I would like a 3D model of that same horse."
           | 
            | Another way of looking at it: 3D artists often begin projects
           | by taking reference images of their subject from multiple
           | angles, then very manually turning that into a 3D model. That
           | step could potentially be greatly sped up with an algorithm
           | like this one. The artist could (hopefully) then focus on
           | cleanup, rigging, etc, and have a quality asset in
           | significantly less time.
        
           | maicro wrote:
           | OP is suggesting that this (AI model? I honestly am behind on
           | the terminology) could replace one of the common steps of 3D
           | printing - specifically, the step where you create a digital
           | representation of the physical object you would want to end
           | up with.
           | 
           | There are other steps to 3D printing in general, though; a
           | super rough outline:
           | 
           | - Model generation
           | 
           | - "Slicing" - processing the 3D model into instructions that
           | the 3D printer can handle, as well as adding any support
           | structures or other modifications to make it printable
           | 
           | - Printing - the actual printing process
           | 
           | - Post-processing - depending on the 3D printing technology
           | used, the desired resulting product, and the specific
           | model/slicing settings, this can be as simple as "remove from
           | bed and use" to "carefully snip off support structures, let
           | cure in a UV chamber for X minutes, sand and fill, then
           | paint"
           | 
           | As I said before, this AI model specifically would cover 3D
           | model generation. If you were to use a printing technology
           | that doesn't require support structures, and handles color
           | directly in the printing process (I think powder bed fusion
           | is the only real option here?), the entire process should be
           | fairly automatable - a human might be needed to remove the
           | part from the printer, but there might not be much post-
           | processing to do.
           | 
           | The rest of your desired workflow is a bit more nebulous - I
           | don't know how you would handle "scanning what teens are
           | doing on instagram", at least in a way that would let you
           | generate toys from the information; generating and posting
           | the advertisement shouldn't be too hard - have a standardish
           | template that you fill in with a render from the model, and
           | the description; printing on demand again is possible, though
           | you'll likely need a human to remove the part, check it for
           | quality and ship it. You could automate the latter, but that
           | would probably be more trouble than it's worth.
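            | 
            | To make the automatable middle of that concrete, here is a
            | minimal sketch of the pipeline. Every helper is a
            | hypothetical placeholder standing in for a real tool (an
            | image-to-3D model, a slicer, a printer job queue) - none of
            | these are real APIs:
            | 
            |   # Hypothetical stubs marking where real tools would plug in.
            |   def generate_mesh(image_path: str) -> bytes:
            |       """Image -> mesh, e.g. via an image-to-3D model."""
            |       raise NotImplementedError
            |   
            |   def slice_mesh(mesh: bytes, supports: bool = True) -> bytes:
            |       """Mesh -> printer instructions, via a slicer."""
            |       raise NotImplementedError
            |   
            |   def send_to_printer(gcode: bytes) -> None:
            |       """Queue the job on the printer."""
            |       raise NotImplementedError
            |   
            |   def print_from_image(image_path: str) -> None:
            |       mesh = generate_mesh(image_path)
            |       gcode = slice_mesh(mesh)  # adds supports, toolpaths
            |       send_to_printer(gcode)
            |       # Post-processing stays manual for most technologies.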
        
             | neom wrote:
              | Interesting. To be clear, I don't think this is a good
              | idea, and it's kinda my nightmare post-capitalism hell. I
              | just think it's interesting that this could be done now.
             | 
              | On finding out what teens want, that part is somewhat easy-
              | ish. I guess you'd need a couple of agents: one that scans
              | teen blogs for stories and converts them to keywords, then
              | another agent that feeds the keywords (#taylorswift
              | #HaileyBieberChiaPudding #latestkdrama etc.) into
              | Instagram. After a while your recommendations page will
              | turn into a pretty accurate representation of what teens
              | are into; then just have an agent look at those images and
              | generate diffs of them. I doubt it would work for a bunch
              | of reasons, but it's an interesting thought experiment!
              | Thanks!
        
         | jsheard wrote:
         | With previous attempts at this problem the shaded examples
         | could be quite misleading because details that appeared to be
         | geometric were actually just painted over the surface as part
         | of the texture, so when you took that texture away you just had
         | a melted looking blob with nowhere near as much detail as you
         | thought. I'd reserve judgement until we see some unshaded
         | meshes.
         | 
         | What they show in the demo: https://i.imgur.com/9bZNTcd.jpeg
         | 
         | What comes out of the 3D printer:
         | https://i.imgur.com/MZrzsfh.png
        
           | SV_BubbleTime wrote:
            | It's always been like this. None of these ever show the
            | untextured model.
            | 
            | When I see a demo that shows wireframes, I'll know it's good
            | enough.
        
             | jsheard wrote:
              | Seems like a tougher nut to crack than image generation
              | was. Since there aren't a bajillion high-quality 3D models
              | lying around on the internet to use as training data,
              | everyone is trying to do 3D model generation as a second-
              | order system, using images as the training data again. The
              | things that make 3D assets good - the tiny geometric
              | details that are hard to infer without many input views of
              | the same object, the quality of the mesh topology and UV
              | mapping, rigging and skinning for animation, reducing
              | materials down to PBR channels that can be fed into a
              | renderer, and so on - aren't represented in the input
              | training data, so the model is expected to make far more
              | logical leaps than image generators do.
        
               | refulgentis wrote:
                | It almost seems easier, in that you have an arbitrary #
                | of real-world objects to scan and the hardware is heavily
                | commoditized (IIRC iPhones have this built in at high res
                | now?)
        
           | euazOn wrote:
            | Therefore, what is the main use case of this model?
            | Generating cheap 3D assets for video games?
        
       | ionwake wrote:
        | I'm sorry for the dumb, lazy question, but would the input
        | require more than one image? Is there a demo URL to test this? I
        | think it might just be time to buy a 3D printer.
       | 
       | EDIT> Does "single image inputs" mean more than one image?
        
         | kylebenzle wrote:
         | Single image means one image.
        
           | dartos wrote:
           | Can confirm the word single means 1
        
           | ionwake wrote:
            | lol c'mon guys, don't be too hard on me, it does say "inputs"
        
             | stavros wrote:
             | I do see how "single image inputs" can be conflated with
             | "multiple inputs of a single image each time", as opposed
             | to "video".
        
               | ionwake wrote:
                | TBH I always look at the worst-case scenario. I was
                | worried it meant it needed 3 images, each input as a
                | single image at different steps of the process, thus
                | requiring different angles. I wasn't sure, but thought it
                | best to check. I feel like it would have been clearer to
                | have said something like "generates a 3D model from a
                | single image" (not exact wording, but you catch my
                | drift). Sorry I am over-analysing, but all feedback is
                | good, right?
        
             | ganeshkrishnan wrote:
             | Describe in single words only the good things that come
             | into your mind about... your mother.
        
         | simonw wrote:
         | It's just a single image. It guesses the shape of the bits it
         | can't see based on vast amounts of training data.
        
           | ionwake wrote:
           | Amazing! Thank you
        
       | airstrike wrote:
       | that demo animation is so clever and satisfying
        
         | amelius wrote:
         | But it doesn't look very realistic, tbh.
        
           | dreadlordbone wrote:
            | it doesn't break Euclidean space, at least
        
       | ddtaylor wrote:
        | Does anyone know what hardware inference can run on, or what the
        | memory requirements are?
        
         | Mathnerd314 wrote:
          | In the repo the model weights file is 9.37 GB, whereas SDXL
          | Turbo's is 13.9 GB, and I don't see any mention of huge context
          | windows, so it probably just needs a decent graphics card.
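          | 
          | A rough way to sanity-check that on your own machine is to
          | compare the checkpoint's size on disk (a loose lower bound on
          | the VRAM needed) against what the GPU actually has. A minimal
          | sketch, assuming a local checkpoint path:
          | 
          |   import os
          |   import torch
          |   
          |   ckpt = "checkpoints/sv3d_u.safetensors"  # assumed path
          |   weights_gib = os.path.getsize(ckpt) / 1024**3
          |   props = torch.cuda.get_device_properties(0)
          |   vram_gib = props.total_memory / 1024**3
          |   print(f"weights {weights_gib:.1f} GiB, VRAM {vram_gib:.1f} GiB")
          |   
          |   # Inference needs headroom beyond the weights (activations,
          |   # the video decoder, etc.), so the weights fitting is
          |   # necessary but not sufficient.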
        
         | kouteiheika wrote:
         | It crashes with an out-of-memory error on my 24GB 4090, so at
         | least when it comes to their sample script the answer is "a
         | lot". Maybe it's just an inefficient implementation though.
        
       | canadiantim wrote:
       | I can't wait until we can use something like this for
       | architectural design
        
       | kouteiheika wrote:
       | Just tried to run this using their sample script on my 4090
       | (which has 24GB of VRAM). It ran for a little over 1 minute and
       | crashed with an out-of-memory error. I tried both SV3D_u and
       | SV3D_p models.
        
         | ganeshkrishnan wrote:
            | The 4090 is in a weird spot: high speed but low RAM.
            | Theoretically everything in AI should run on it, but
            | practically nothing runs.
        
           | LoganDark wrote:
            | The 4090 has more VRAM than most computers have system RAM.
            | Surprised this is considered "low RAM" in any way except
            | relative to datacenter cards and top-spec ASi.
        
           | jokethrowaway wrote:
            | What can't you run? Unquantised large text models are the
            | only thing I can't run.
            | 
            | Stable Diffusion, Stable Video, text models, audio models - I
            | have never had issues with anything yet.
        
         | GistNoesis wrote:
          | I managed to get it working with a 4090. You need to adjust the
          | decoding_t parameter of the sample function in
          | simple_video_sample.py to a lower value (decoding_t = 5 works
          | fine for me). I also needed to install imageio==2.19.3 and
          | imageio-ffmpeg.
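          | 
          | For reference, a sketch of what that looks like. decoding_t is
          | the number of video frames the VAE decodes at once, which is
          | where the peak VRAM spike happens, so lowering it trades decode
          | speed for memory. The script path, the "sv3d_u" version string,
          | and the default of 14 are from memory, so double-check them
          | against the repo:
          | 
          |   # pip install imageio==2.19.3 imageio-ffmpeg
          |   # run from the repo root so the import resolves
          |   from scripts.sampling.simple_video_sample import sample
          |   
          |   sample(
          |       input_path="assets/test_image.png",  # your input image
          |       version="sv3d_u",                    # or "sv3d_p"
          |       decoding_t=5,  # decode 5 frames at a time, not 14
          |   )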
        
           | kouteiheika wrote:
           | Ah, yep! You're right! It works now!
        
       | bugbuddy wrote:
       | "3D"
        
       ___________________________________________________________________
       (page generated 2024-03-18 23:00 UTC)