[HN Gopher] Stability.ai - Introducing Stable Video 3D
___________________________________________________________________
Stability.ai - Introducing Stable Video 3D
Author : ed
Score : 692 points
Date : 2024-03-18 20:06 UTC (1 day ago)
(HTM) web link (stability.ai)
(TXT) w3m dump (stability.ai)
| Filligree wrote:
| If the animations shown are representative, then the mesh output
| may very well be good enough to use in a 3d printer.
|
| Looking forward to experimenting with this.
| neom wrote:
| I don't know much about 3D printing, would be very interested
| in learning more about this idea if you'd be so kind as to
| expand on it. Could I have AI spend all day auto scanning what
| teens are doing on instagram, auto generate toys based on it,
| auto generate advertisements for the toys, auto 3D print on
| demand?
| SirSourdough wrote:
| Hypothetically, sure, assuming the parent comment that these
| meshes are sufficient for modelling is correct and that you
| can find any teens who want a non-digital toy.
|
| I think a good hobbyist application for this would be
| something like modelling figurines for games, which is
| already a pretty popular 3D printing application. This would
| allow people with limited modelling skills to bring
| fantastical, unique characters to life "easily".
| Filligree wrote:
| Pretty much. We're already generating images of monsters
| and characters for a D&D campaign; being able to print
| those in 3D would be pretty amazing.
| CobrastanJorji wrote:
| I think their suggestion was more "I have a photo of a cool
| horse, and now I would like a 3D model of that same horse."
|
| Another way of looking at it, 3D artists often begin projects
| by taking reference images of their subject from multiple
| angles, then very manually turning that into a 3D model. That
| step could potentially be greatly sped up with an algorithm
| like this one. The artist could (hopefully) then focus on
| cleanup, rigging, etc, and have a quality asset in
| significantly less time.
| bobba27 wrote:
| The question is whether this actually "creates a 3d model
| based on the picture", or if it "finds an existing model
| that looks similar to the picture and texture map it".
| maicro wrote:
| OP is suggesting that this (AI model? I honestly am behind on
| the terminology) could replace one of the common steps of 3D
| printing - specifically, the step where you create a digital
| representation of the physical object you would want to end
| up with.
|
| There are other steps to 3D printing in general, though; a
| super rough outline:
|
| - Model generation
|
| - "Slicing" - processing the 3D model into instructions that
| the 3D printer can handle, as well as adding any support
| structures or other modifications to make it printable
|
| - Printing - the actual printing process
|
| - Post-processing - depending on the 3D printing technology
| used, the desired resulting product, and the specific
| model/slicing settings, this can be as simple as "remove from
| bed and use" to "carefully snip off support structures, let
| cure in a UV chamber for X minutes, sand and fill, then
| paint"
|
| As I said before, this AI model specifically would cover 3D
| model generation. If you were to use a printing technology
| that doesn't require support structures, and handles color
| directly in the printing process (I think powder bed fusion
| is the only real option here?), the entire process should be
| fairly automatable - a human might be needed to remove the
| part from the printer, but there might not be much post-
| processing to do.
|
| The rest of your desired workflow is a bit more nebulous - I
| don't know how you would handle "scanning what teens are
| doing on instagram", at least in a way that would let you
| generate toys from the information; generating and posting
| the advertisement shouldn't be too hard - have a standardish
| template that you fill in with a render from the model, and
| the description; printing on demand again is possible, though
| you'll likely need a human to remove the part, check it for
| quality and ship it. You could automate the latter, but that
| would probably be more trouble than it's worth.
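|
| The outline above can be sketched as a small pipeline. This is
| only an illustration: the stage names mirror the list, and the
| needs_human flags are assumptions about a typical setup, not
| facts about any specific printer or slicer.

```python
# Hypothetical stage list mirroring the outline; the needs_human flags
# are illustrative assumptions, not facts about any particular printer.
PIPELINE = [
    {"stage": "model generation", "needs_human": False},
    {"stage": "slicing", "needs_human": False},
    {"stage": "printing", "needs_human": False},
    {"stage": "post-processing", "needs_human": True},
]


def manual_stages(pipeline):
    """Return the names of the stages that still need a person."""
    return [step["stage"] for step in pipeline if step["needs_human"]]


print(manual_stages(PIPELINE))
```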
| neom wrote:
| Interesting, to be clear I don't think this is a good idea
| and it's kinda my nightmare post capitalism hell. I just
| think it's interesting this could be done now.
|
| On finding out what teens want, that part is somewhat easy-
| ish, I guess you'd need a couple of agents, one that is
| scanning teen blogs for stories and then converting them to
| key words, then another agent that takes the key words
| (#taylorswift #HaileyBieberChiaPudding #latestkdrama etc)
| into Instagram, after a while your recommend page will turn
| into a pretty accurate representation of what teens are
| into, then just have an agent look at those images and
| generate difs of them. I doubt it would work for a bunch of
| reasons, but it's an interesting thought experiment!
| Thanks!
| jsheard wrote:
| With previous attempts at this problem the shaded examples
| could be quite misleading because details that appeared to be
| geometric were actually just painted over the surface as part
| of the texture, so when you took that texture away you just had
| a melted looking blob with nowhere near as much detail as you
| thought. I'd reserve judgement until we see some unshaded
| meshes.
|
| What they show in the demo: https://i.imgur.com/9bZNTcd.jpeg
|
| What comes out of the 3D printer:
| https://i.imgur.com/MZrzsfh.png
| SV_BubbleTime wrote:
| It's always been this. None of these ever show the untextured
| model.
|
| When I see a demo where they are showing wireframes I know
| it'll be good enough.
| jsheard wrote:
| Seems like a tougher nut to crack than image generation
| was, since there isn't a bajillion high quality 3D models
| lying around on the internet to use as training data,
| everyone is trying to do 3D model generation as a second-
| order system using images as the training data again. The
| things that make 3D assets good, the tiny geometric details
| that are hard to infer without many input views of the same
| object, the quality of the mesh topology and UV mapping,
| rigging and skinning for animation, reducing materials down
| to PBR channels that can be fed into a renderer and so on
| aren't represented in the input training data, so the model
| is expected to make far more logical leaps than image
| generators do.
| refulgentis wrote:
| It almost seems easier, in that you have an arbitrary #
| of real world objects to scan and the hardware is heavily
| commoditized (IIRC iPhones have this built in at highres
| now?)
| polygamous_bat wrote:
| How is building a dataset easier than using a prebuilt
| dataset?
| refulgentis wrote:
| In context, the conversation was beyond a dichotomy -
| thankfully. Having only 2 choices leaves conversation at
| people insisting one is better, and becomes an argument
| about definitions where people take turns alternating
| being "right" from the viewpoint of a neutral observer.
|
| It's proposing a solution to the author's observation
| that everyone is doing it in second order fashion and
| missing a significant amount of necessary data.
|
| The implication is that rather than doing it the hard way
| via the already-obtained 2nd order dataset, it'll be
| easier to get a new dataset, and getting that dataset
| will be significantly easier than it was to get the
| second-order dataset, as you don't need to worry about
| aesthetic variety as much as teaching what level of
| detail is needed in the mesh for it to be "real"
| derefr wrote:
| > since there isn't a bajillion high quality 3D models
| lying around on the internet to use as training data
|
| There aren't a bajillion high-quality 3D models of
| _everything_ , but there are an unbounded number of high-
| quality 3D models of _some_ things, due to the existence
| of procedural mesh systems for things like foliage.
|
| You could, at the very least, train an ML model to
| translate images of jungles into 3D meshes of the trees
| composing them right now.
|
| Although I wonder if having a few very-well-understood
| object types like these, to serve as a base, would be
| enough to allow such a model to deduce more generalized
| rules of optics, such that it could then be trained on
| other object categories with much smaller training
| sets...
| wincy wrote:
| I know where I could get several hundred terabytes (maybe
| an exabyte? It's constantly growing) of ultra high
| quality STL files designed for 3D printing. I just don't
| have the storage or the knowledge of how to turn those
| into a model that outputs new STL files.
|
| I'd imagine it'd require a ton of tagging, although I
| have a good idea of how I could leverage existing APIs to
| tag it mostly automatically by generating three still
| image thumbnails of the content, then feeding that
| through CLIP, and verifying that all two or three agree
| on what it's an STL of, and manually tag the ones that
| fail that test.
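|
| A minimal sketch of that agreement check, assuming some image
| classifier is available. The classify callable and the file
| names are hypothetical stand-ins; no real CLIP API is called.

```python
from collections import Counter


def consensus_tag(labels, min_agree=2):
    """Return the most common label if at least `min_agree` views agree, else None."""
    if not labels:
        return None
    label, count = Counter(labels).most_common(1)[0]
    return label if count >= min_agree else None


def tag_model(thumbnails, classify):
    """Classify each thumbnail; a None result means 'send to manual tagging'."""
    return consensus_tag([classify(t) for t in thumbnails])


# Stand-in classifier for the demo; a real one would call an image model.
fake_classify = {"front.png": "dragon", "side.png": "dragon", "top.png": "rock"}.get

print(tag_model(["front.png", "side.png", "top.png"], fake_classify))
```

| Raising min_agree to 3 would require all three views to match,
| at the cost of more manual tagging.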
| supermatt wrote:
| There's a pretty big difference between hundreds of
| terabytes and an exabyte. Maybe you meant petabyte?
| clbrmbr wrote:
| Couldn't a deep network learn the latent 3D
| representation just on video input?
| bobba27 wrote:
| Yes. But it is still promising. Things are getting
| incrementally better.
|
| (I dream of the day when this can be used to automatically
| create paper-craft templates.)
| euazOn wrote:
| Therefore, what is the main usecase of this model? Generating
| cheap 3D assets for videogames?
| jsheard wrote:
| I don't think they have a specific use-case for this model,
| they're throwing ideas at the wall again in the hopes some
| of them stick and eventually turn into another product. The
| paper doesn't discuss any of the problems that would need
| to be solved in order to easily generate game-ready assets
| so I think it's safe to assume that it currently doesn't.
|
| For games at the very least you need to consider polygon
| budget, getting reasonably good UVs, and generating
| materials which fit into a PBR shader pipeline, at least if
| it's going to work with rendering pipelines as we know them
| today (as opposed to rendering neural representations
| directly, which is a thing people are trying to do but is
| totally unproven in production).
| pksebben wrote:
| I'd be willing to bet you could create a diffusion model
| to map unrefined meshes to UV-fixed and remeshed
| surfaces. If you had a large enough library of good
| meshes you just programmatically mess 'em up and use that
| as the dataset.
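|
| A rough sketch of the "mess 'em up" step, assuming vertex jitter
| as the only degradation. Real damage would also touch topology
| and UVs; corrupt_vertices and make_pairs are hypothetical names.

```python
import random


def corrupt_vertices(vertices, noise=0.05, seed=0):
    """Return a copy of `vertices` with uniform jitter added to every coordinate."""
    rng = random.Random(seed)
    return [
        tuple(c + rng.uniform(-noise, noise) for c in v)
        for v in vertices
    ]


def make_pairs(meshes, noise=0.05):
    """Build (degraded, clean) training pairs from a list of clean vertex lists."""
    return [(corrupt_vertices(m, noise, seed=i), m) for i, m in enumerate(meshes)]


# A single clean "mesh": the corners of a unit triangle.
clean = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
pairs = make_pairs([clean])
degraded, target = pairs[0]
print(degraded, target)
```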
| strich wrote:
| There exists software to reproject texture normals back on to
| a high poly model. So this problem does have a solution for
| anyone interested.
| jsheard wrote:
| That's assuming your generator produces a normal map, the
| ones I've seen do not, the only texture channel they output
| is color. That being the one channel that a model trained
| on images is naturally equipped to produce.
| pksebben wrote:
| I may be speaking out of ignorance here, but couldn't you
| use photogrammetry techniques to translate these to a
| higher resolution mesh?
| zo1 wrote:
| Only if you have multiple images of the same areas so
| that you can extract actual position. And there is no
| guarantee that multiple pictures of the same model have
| the same detail, much less in a manner that can be
| triangulated with accuracy. A lot of the photogrammetry
| algorithms discard points that don't match certain error-
| bars.
|
| So yes, there might be a wooden frame in the middle of
| that window, but does it match the math on both angles of
| it? Doubt it.
| huytersd wrote:
| You can generate pretty reliable texture depth maps from
| just an image. It's going to be trash if you're trying to
| generate the depth for the entire 3D model but I presume
| it's going to do a good job with just texture. Then you
| just use a displacement based on the depth map.
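|
| As a toy sketch of the displacement step: assume the base
| surface is a flat grid facing +z, so moving each vertex along
| its normal just sets z from the depth value. displace_grid is
| a hypothetical helper, not part of any tool mentioned here.

```python
def displace_grid(depth, scale=1.0):
    """Turn a 2D depth map (rows of floats) into displaced (x, y, z) vertices.

    The base surface is assumed to be a flat plane with normal +z, so
    displacing along the normal just sets z = depth * scale.
    """
    return [
        [(float(x), float(y), d * scale) for x, d in enumerate(row)]
        for y, row in enumerate(depth)
    ]


depth_map = [
    [0.0, 0.5],
    [1.0, 0.25],
]
vertices = displace_grid(depth_map, scale=2.0)
print(vertices[1][0])
```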
| Oioioioiio wrote:
| There are AI models that can create proper meshes though.
| dgellow wrote:
| Which ones?
| Oioioioiio wrote:
| This for example:
| https://research.nvidia.com/labs/toronto-ai/flexicubes/
| ionwake wrote:
| I'm sorry for the dumb, lazy question, but would the input require
| more than one image? Is there a demo URL to test this? I think it
| might just be time to buy a 3D printer.
|
| EDIT> Does "single image inputs" mean more than one image?
| kylebenzle wrote:
| Single image means one image.
| dartos wrote:
| Can confirm the word single means 1
| ionwake wrote:
| lol cmon guys don't be too hard on me it does say "inputs"
| stavros wrote:
| I do see how "single image inputs" can be conflated with
| "multiple inputs of a single image each time", as opposed
| to "video".
| ionwake wrote:
| TBH I always look at the worst case scenario. I was
| worried it meant it needed 3 images inputted as a single
| image at direct steps of the process, so requiring
| different angles. I wasn't sure, but thought best to
| check. I feel like it would have been clearer to have
| said something like "generates a 3D model from a single
| image" (not exact wording but you catch my drift).
| Sorry I am over analysing but all feedback is good right?
| ganeshkrishnan wrote:
| Describe in single words only the good things that come
| into your mind about... your mother.
| simonw wrote:
| It's just a single image. It guesses the shape of the bits it
| can't see based on vast amounts of training data.
| ionwake wrote:
| Amazing! Thank you
| exodust wrote:
| I have an even lazier question after failing to speed-read the
| article.
|
| Does this output an actual 3D mesh? Or does it only output a
| 3d-looking rendered animation?
| airstrike wrote:
| that demo animation is so clever and satisfying
| amelius wrote:
| But it doesn't look very realistic, tbh.
| dreadlordbone wrote:
| it doesn't break Euclidean space at least
| itsgrimetime wrote:
| I can't get them to play
| ddtaylor wrote:
| Does anyone know what hardware inference can run on or memory
| requirements?
| Mathnerd314 wrote:
| In the repo the model weights file is 9.37GB, whereas sdxl
| turbo is 13.9GB, and I don't see any mention of huge context
| windows, so probably it just needs a decent graphics card.
| kouteiheika wrote:
| It crashes with an out-of-memory error on my 24GB 4090, so at
| least when it comes to their sample script the answer is "a
| lot". Maybe it's just an inefficient implementation though.
| dragonwriter wrote:
| Pretty much every initial Stability release has been
| inefficient, and its resource requirements have dropped a lot
| once community engines optimized for real consumer hardware
| appeared for running the model.
|
| OTOH, with their shift to a less open licensing structure,
| community tooling probably won't emerge with the same level
| of energy.
| canadiantim wrote:
| I can't wait until we can use something like this for
| architectural design
| whywhywhywhy wrote:
| SDXL+Controlnet and then feeding it just blocked out depth maps
| are probably more useful for that.
| kouteiheika wrote:
| Just tried to run this using their sample script on my 4090
| (which has 24GB of VRAM). It ran for a little over 1 minute and
| crashed with an out-of-memory error. I tried both SV3D_u and
| SV3D_p models.
|
| [edit]Managed to generate by tweaking the script to generate fewer
| frames simultaneously. 19.5GB peak VRAM usage, 1 min 25 secs to
| generate at 225 watts.[/edit]
| ganeshkrishnan wrote:
| 4090 is in a weird spot. High speed but low RAM. Theoretically
| everything should run in ai but practically nothing runs
| LoganDark wrote:
| 4090 has more VRAM than most computers have system RAM.
| Surprised this is considered "low RAM" in any way except for
| relative to datacenter cards and top-spec ASi.
| samplatt wrote:
| You're comparing RAM amounts to other RAM amounts without
| considering requirements. 24GB is more than (most) current
| games would ever require, but is considered an
| uncomfortably constrictive minimum for most industrial
| work.
|
| Traditional CPU-bound physics/simulation models have
| typically wanted all the RAM they could get; the more RAM
| the more accurate the model. The same is true for AI
| models.
|
| I can max out 24GB just using spreadsheets and databases,
| let alone my 3D work or anything computational.
| jokethrowaway wrote:
| What can't you run? Unquantised large text models are the
| only thing I can't run
|
| Stable diffusion, stable video, text models, audio models, I
| never had issues with anything yet
| michaelt wrote:
| The 4090 is in a bit of a funny space for LLMs.
|
| There's a lot of open weights activity around 7B/13B models
| which the 4090 will run with ease. But you can run
| those OK on much cheaper cards like the 4070Ti (which is of
| course why they're popular).
|
| And there's a lot of open weights activity around 70B and
| 8x7B models which are state-of-the-art - but too big to fit
| on a 4090. There's not much activity around 30B models,
| which are too big to be mainstream and too small to be
| cutting edge.
|
| If you're specifically looking to QLoRA fine-tune a 7B/13B
| model a 4090 can do that - but if you want to go bigger
| than that you'll end up using a cloud multi-gpu machine
| anyway.
| Hikikomori wrote:
| Maybe don't use a gaming card for AI then? 24 is plenty as
| most games don't use more than half in 4k.
| smcleod wrote:
| Maybe give me lots of money to give Nvidia for a card with
| more memory then?
|
| Nvidia have held back the majority of their cards from
| going over 24GB for years now. It's 2024 and my laptop has
| 96GB of RAM available to the GPU but desktop GPUs that cost
| several thousands just by themselves are stuck at 24GB.
| dannyw wrote:
| They don't get their absurd profit margins by
| cannibalising their data centre chips.
|
| This is like Intel and their refusal to support ECC
| memory; when AMD does on nearly all Ryzens.
|
| --
|
| Note: your laptop is probably using a 64-bit memory bus
| for system RAM. For GPUs, the 4090 is 384-bit. That takes
| up a lot more die area for the bus and memory controller.
| versteegen wrote:
| But GP's laptop with 96GB of unified memory would be a M2
| Max Macbook or better. The M2 Max has a 4 x 128-bit
| memory bus (410GB/s) and the M2 Ultra is 8 x 128bit
| (819GB/s), versus a 4090 at 1008GB/s. But see here for
| caveats about Mac bandwidth:
| https://news.ycombinator.com/item?id=38811290
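|
| Those figures follow from bus width times per-pin data rate.
| A quick check, assuming LPDDR5-6400 for the M2 parts and 21
| GT/s GDDR6X for the 4090 (the rates are not stated above and
| are taken as assumptions from public spec sheets):

```python
def bandwidth_gb_s(bus_width_bits, transfers_gt_s):
    """Peak bandwidth in GB/s = bus width in bytes * transfers per second (GT/s)."""
    return bus_width_bits / 8 * transfers_gt_s


# Per-pin data rates are assumptions taken from public spec sheets:
# LPDDR5-6400 (6.4 GT/s) for the M2 family, 21 GT/s GDDR6X for the 4090.
m2_max = bandwidth_gb_s(4 * 128, 6.4)
m2_ultra = bandwidth_gb_s(8 * 128, 6.4)
rtx_4090 = bandwidth_gb_s(384, 21.0)

print(round(m2_max, 1), round(m2_ultra, 1), round(rtx_4090, 1))
```

| These are theoretical peaks; the linked thread covers why real
| throughput on the Mac side can be lower.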
| Hikikomori wrote:
| Why would they do that with a gaming card? If you want
| more you can rent in Aws etc.
| smcleod wrote:
| It wouldn't be a local model if it has to work on AWS.
| DSMan195276 wrote:
| Isn't there the risk that if they give the gaming cards
| enough RAM for such tasks then they'll get bought up for
| that purpose and the second-hand price will go even
| higher?
|
| I guess my point is, rather than give the cards more RAM,
| the gaming cards should just be priced cheaper.
| chaostheory wrote:
| Which laptop models share system RAM with an Nvidia RTX
| card?
| stygiansonic wrote:
| Op probably referring to an M series MacBook since it has
| a unified memory architecture and the same memory space
| used by both cpu and gpu
| karolist wrote:
| This is unfairly downvoted. They launched 3090 on Sep 2020
| with 24GB which was more than AMD's 16GB 6900XT launched on
| that same month. Maybe before blaming Nvidia, blame AMD for
| lack of trying to compete with them? Of course they're not
| gonna release a gaming card with loads more VRAM because a)
| no competitor has a gaming card with more VRAM, b) it would
| all be bought up for AI workloads, and c) games don't really
| need more, as the parent said.
| Sohcahtoa82 wrote:
| They don't want to cannibalize sales of the super-expensive
| GPUs dedicated to ML/AI.
|
| 5090 likely won't have more than 32 GB, if even that much.
| Tenoke wrote:
| I made a Manifold market[0] on the amount of ram a 5090
| will have, and while pretty much nobody has participated, I
| just checked and the market is amusingly at the 32GB you've
| also quoted. Just like you, I hope it will be more but I
| fear it will be even less.
|
| 0. https://manifold.markets/Tenoke/how-much-vram-will-
| nvidia-50...
| karolist wrote:
| Even 32GB would be great for a gaming card, any more and
| you're never seeing it on sale as it will be bought by
| truckloads for AI, so of course they're not gonna balloon
| the VRAM. I suspect we'd still be at 16GB but they launched
| 3090 on Sep 2020 with 24GB, before all this craze really,
| and lowering is bad optics now.
| Culonavirus wrote:
| Meanwhile Apple will sell you a chip with 96GB of unified
| memory for the price of two 4090s ... and that is with
| the Apple tax ... it's ridiculous. I know the memory
| bandwidth of M2 Max is like 1/2 of a 4090, but still, the
| artificial kneecapping Nvidia does is absurd.
| Zenst wrote:
| Perhaps NVIDIA or somebody could invent a RAM upgrade via
| NVLINK? Seems plausible and not every problem would want to
| add another GPU when the ability to add the extra memory
| alone is all they need.
| wongarsu wrote:
| But why would NVIDIA do that when they can just sell you an
| A100 for ten times the price of a 4090?
| margorczynski wrote:
| We need AMD to compete, but from what I know their
| software is subpar to NVIDIA's offering and most of the
| current ML stacks are built around CUDA. Still there's a
| lot of money to be made in this area now so competition
| big and small should pop up.
| idonotknowwhy wrote:
| I'd love it if AMD and Intel teamed up to make a wrapper
| layer for CUDA. Surely they'd both benefit greatly.
| versteegen wrote:
| First Intel and then AMD funded a wrapper, yes.
| Unfortunately the new version supports AMD but no longer
| Intel.
|
| https://github.com/vosen/ZLUDA
|
| That's a binary level wrapper. Of course there's also
| ROCm HIP at the source level, and many other things, such
| as SYCL
| dragonwriter wrote:
| In a hypothetical near-future world, competition?
| dacryn wrote:
| the memory is inherent to the gpu architecture. You cannot
| just add VRAM and expect no other bottlenecks to pop up.
| Yes they can reduce the VRAM to create budget models and
| save a bit here and there. But adding VRAM to a top model
| is a tricky endeavour
| Animats wrote:
| There is a mini-industry in China buying old NVidia GPUs,
| upgrading the memory, and reselling them.
| devit wrote:
| What's the best that can be achieved with this method?
| Animats wrote:
| 2X. Converting old NVidia 2080s from 11GB to 22GB. [1]
|
| [1] https://www.tomshardware.com/pc-
| components/gpus/chinese-work...
| chaostheory wrote:
| Yeah, I'm still debating whether I go with a Mac Studio with
| the RAM maxed out (approx $7500 for 192 GB) or a PC with a
| 4090. Is there a better value path with the Nvidia A series
| or something else? (I'm not sure about tinygrad)
| karolist wrote:
| I have an M1 Max with 64GB and 3090 Ti. M1 Max is ~4x
| slower at inference for the same models than 3090 (i.e.
| 7t/s vs 30t/s), which depending on the task can be very
| annoying. As a plus you get to run really large models,
| albeit very slowly. Think if that will bother you or not. I
| will not give up my 3090 Ti and am rather waiting for 5090
| to see what it can do because when programming, the Mac is
| too slow to shoot off questions. I use it mostly to better
| understand book topics now and 3090 Ti to do fast chat
| sessions.
| chaxor wrote:
| Groq may be an option?
| Oioioioiio wrote:
| Just don't max out the Mac Studio and get both...
| kristianp wrote:
| You can get a previous gen RTX A6000 with 48GB of gddr6 for
| about $5000 (1). Disclosure: I run that website. Is anyone
| using the pro cards for inference?
|
| (1) https://gpuquicklist.com/pro?models=RTX%20A6000
| idonotknowwhy wrote:
| Didn't know 24GB was considered low lol.
| mcbuilder wrote:
| For AI that's either a very fat SDXL model at its max
| native resolution, or a quantized 34B parameter model, so
| it's on the low side. Compare that with the Blackwell AI
| "superchip" announced yesterday that appears to the
| programmer as a single GPU with 30TB of RAM.
| michaelt wrote:
| Here's a vram requirements table for fine-tuning an LLM:
| https://github.com/hiyouga/LLaMA-Factory?tab=readme-ov-
| file#...
|
| No matter how much vram you have, there's something that
| doesn't fit :)
| throwing_away wrote:
| This is also how I learned that 8X7B doesn't mean "eight
| 7B models joined somehow".
| karolist wrote:
| You can add multiple, but practically speaking you're better
| off with used 3090s which you get 2 for the price of one
| 4090.
|
| I have 3090 Ti and I can run Q4 quant 33b models at 30t/s
| with 8k context. A 4090 would allow me to do the same but
| with ~45t/s, both inference speeds are more than fast enough
| for people so 3090 is the usual choice. In my tests on
| runpod, H100 with 80GB memory is around the same speed as
| 3090, so slower than a 4090.
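|
| A back-of-the-envelope check of why a 4-bit 33B model fits in
| 24GB. This counts weights only and assumes exactly 4 bits per
| weight; real quant formats carry some extra overhead, and the
| KV cache for the context comes on top.

```python
def weight_memory_gb(params_billion, bits_per_weight):
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9


print(weight_memory_gb(33, 4))    # 4-bit quantized 33B model
print(weight_memory_gb(33, 16))   # same model in 16-bit
```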
| ynniv wrote:
| Don't forget the 24GB P40, which is a third the speed but
| also a third the cost of a 3090 (both used).
| chaxor wrote:
| "Theoretically everything should run in ai"
|
| Odd statement. I don't really know what you mean by that.
| Perhaps 'math _works_, code should too' ?
|
| I would definitely agree that it _should_ work.
|
| I'm of the belief that no one should _have to_ publish (e.g.
| to graduate, get promotions, etc) in academia, and that
| publications should only occur if they're believed to be near
| Nobel Prize worthy, and fully reproducible by code with
| packaging that should last and work in 10 years, from data
| archives that will exist in 10 years.
|
| But it seems I have been outvoted by the administration in
| academia.
|
| Hence, we get this "ai that doesn't run" phenomenon
| KeplerBoy wrote:
| What's the point of academia if not to publish?
|
| Do you want to publicly fund researchers only for the
| industrial research partner's benefit?
| chaxor wrote:
| It already is effectively just for industry benefit. It's
| been like that since the start. Work that is too
| expensive for industry to do (research and discovery) was
| put into the public sphere such that the role of industry
| was to take that innovation and optimize it. That's at
| least how it is intentionally constructed.
|
| My main point was that there is a lot of noise in
| scientific journals, caused by pressures in academia that
| make publishing a requirement. If these are removed, then
| the quality of published work increases and the quantity
| decreases.
|
| There are other places to post work that is derivative
| and non-novel like blogs. The field of biology has an
| immense amount of work that is mostly observational
| without strong conclusions or predictivity. A tabulation
| of observation should definitely be put out by a lab, and
| it should be much sooner with far less pressures than
| today, such as the typical dance of putting the data in
| during publication. The SRA is one example of a place to
| share data. The typical way to work could be to put all data
| immediately onto a public repo, sometimes comment on it
| in ways that have been seen before on blogs and other
| venues below scientific journals, and then publish only if
| something truly substantial comes out of it (a novel model
| that is analytical and highly predictive of cell behavior
| in all situations, for example).
|
| It could help separate the signal from the noise. LLMs are
| one case where the noise is very strong, in that many papers
| are simply 'we fine tuned an llm'.
| michaelmior wrote:
| So how should knowledge be shared in academia without
| publishing? Any work worthy of a Nobel Prize (or more
| likely, a Turing Award) is built on top of significant
| amounts of other research that itself wasn't so
| groundbreaking.
|
| That said, I certainly think that researchers can do more
| to make their code and data more accessible. We have the
| tools to do so already but the incentives are often
| misaligned.
| jug wrote:
| Almost sounds like a GPU vendor who isn't seeing enough
| competition.
| ImHereToVote wrote:
| Almost like the only competition of Nvidia is the niece of
| the CEO.
| paxys wrote:
| Or, you know, the fact that the card is made for playing
| video games, not training AI models.
| blackoil wrote:
| It is targeted at gamers, but professionals are buying it. They
| should be buying A6000 which has 48GB.
| GistNoesis wrote:
| I managed to get it working with a 4090. You need to adjust the
| parameter decoding_t of the sample function in
| simple_video_sample.py to a lower value (decoding_t = 5 works
| fine for me). I also needed to install imageio==2.19.3 and
| imageio-ffmpeg
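|
| Why a smaller decoding_t helps can be sketched generically,
| assuming it controls how many frames are decoded at once (as
| the "fewer frames simultaneously" fix elsewhere in the thread
| suggests). This is not the code from simple_video_sample.py;
| decode_in_chunks and fake_decode are hypothetical, and 21 is
| an arbitrary frame count for the demo.

```python
def decode_in_chunks(latents, decode_fn, chunk_size):
    """Decode `latents` a few frames at a time to bound peak memory.

    `decode_fn` takes a list of latents and returns a list of frames.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    frames = []
    for start in range(0, len(latents), chunk_size):
        frames.extend(decode_fn(latents[start:start + chunk_size]))
    return frames


# Stand-in decoder that records how many frames it was asked to hold at once.
batch_sizes = []


def fake_decode(batch):
    batch_sizes.append(len(batch))
    return [f"frame-{x}" for x in batch]


frames = decode_in_chunks(list(range(21)), fake_decode, chunk_size=5)
print(len(frames), max(batch_sizes))
```

| The output is the same as decoding everything in one call; only
| the peak number of frames held by the decoder changes.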
| kouteiheika wrote:
| Ah, yep! You're right! It works now!
| monkeynotes wrote:
| Yeah, this is to be expected with early adoption. This stuff
| comes out of the lab and it's not perfect. The key thing to
| evaluate is the trajectory and pace of development. Much of
| what folks challenged ChatGPT with a year ago is long lost in
| the dust. Go look at stable diffusion this time last year.
| Dall-E couldn't do words and hands, it nails that 90% of the
| time in my experience today.
| remotefonts wrote:
| About words, Dall-E is not even close to nailing it 90% of the
| time. Not even 50%. Maybe they nerf it when you request a
| logo from it, but that was my experience in the last few
| days.
| whywhywhywhy wrote:
| Dunno why the defaults for this stuff aren't the base
| performance, feel I always have to tweak the batch size down on
| all the base scripts even with 24gb cos everything assumes 48gb
| londons_explore wrote:
| All the examples resemble plastic children's toys...
|
| How would it handle other objects? (People, fabrics, buildings,
| plants, mountains, mechanical parts, etc)
| programjames wrote:
| It's hard to get camera position tracking for random objects,
| so it looks like they used simulations. There's probably a lot
| more plastic children's toy models in Blender than people,
| fabrics, buildings, &c.
| issung wrote:
| > Stable Video 3D (SV3D) is a generative model based on Stable
| Video Diffusion that takes in a still image of an object as a
| conditioning frame, and generates an orbital video of that
| object.
|
| So can it actually output a 3d model? Or just images of what it
| thinks the object would look like from other angles?
| krebby wrote:
| The reference video (https://youtu.be/Zqw4-1LcfWg) says they
| use a NeRF / structure from motion and then create a mesh with
| marching cubes from the generated radiance field. This is how
| most SOTA text-to-object generators work now as well
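|
| To make the meshing step concrete, here is the first half of
| marching cubes: finding the grid cells whose corners straddle
| the surface. The density function is a toy sphere standing in
| for a learned radiance field, and the triangle lookup table
| that a full implementation needs is left out.

```python
def density(x, y, z, radius=1.0):
    """Toy stand-in for a learned field: positive inside a sphere, negative outside."""
    return radius - (x * x + y * y + z * z) ** 0.5


def surface_cells(n=8, lo=-1.5, hi=1.5, level=0.0):
    """Return indices of grid cells whose 8 corners straddle `level`.

    Marching cubes emits triangles only for these cells; the triangle
    lookup table itself is omitted here.
    """
    step = (hi - lo) / n
    cells = []
    for i in range(n):
        for j in range(n):
            for k in range(n):
                corners = [
                    density(lo + (i + di) * step,
                            lo + (j + dj) * step,
                            lo + (k + dk) * step)
                    for di in (0, 1) for dj in (0, 1) for dk in (0, 1)
                ]
                if min(corners) < level < max(corners):
                    cells.append((i, j, k))
    return cells


cells = surface_cells()
print(len(cells), "of", 8 ** 3, "cells cross the surface")
```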
| 2StepsOutOfLine wrote:
| I'm also struggling to find any examples of how to actually get
| a 3D model output. Very few references to this capability
| outside of the blog post.
| dubin wrote:
| I'd like to play around with something like this, but from my
| understanding my machine (Macbook, 2021 M1) isn't nearly powerful
| enough (right?). Are there remote/cloud environments where I can
| run models like this?
| ilaksh wrote:
| I suggest just using Stability's API. You aren't allowed to use
| it locally for commercial use anyway.
|
| You could set something up on RunPod or AWS, but I doubt it's
| worth the effort.
| dubin wrote:
| Awesome, thank you!
|
| It does look like SV3D is not a part of the API currently,
| but only a matter of time I imagine.
| thrdbndndn wrote:
| The emphasis here is Single Image, but can this model generate
| with multiple images too?
|
| We know that a single image of an object physically can't cover
| all the sides of it, so it's all guesswork in AI. This is totally
| fine for certain scenarios, but in lots of other cases, it's
| trivial to have multiple images of the same object, and if that
| offers higher fidelity, it's totally worth it.
|
| I'm aware there are many algorithms or AI models that already do
| that. I'm asking about Stability's one specifically because if
| they have impressive Single Image results, surely their
| multi-image results would also be much better than state-of-the-art?
| pksebben wrote:
| If it's not there yet, I'm willing to bet it will be soon
| enough given folks hacking it apart and injecting their own
| solutions.
| leesec wrote:
| I wonder when Emad will be outed as a fed or a fraud. He's
| certainly leaving a trail of nasty behavior in the industry.
| preommr wrote:
| elaborate?
| esafak wrote:
| https://en.wikipedia.org/wiki/Emad_Mostaque#Controversy_and_.
| ..
| dheera wrote:
| They compare against Zero123-XL, but they should compare against
| MVDream instead. MVDream is quite good. If you fiddle with the
| loss you can get even better results.
| abdellah123 wrote:
| Did you write the blog post using AI ?
| throwaway743 wrote:
| Anyone know of anything that'll auto rig/add weights?
| ImHereToVote wrote:
| There are numerous tools that auto-rig humanoid figures. The
| obvious one: https://www.mixamo.com/#/
___________________________________________________________________
(page generated 2024-03-19 23:01 UTC)