[HN Gopher] Stability.ai - Introducing Stable Video 3D
       ___________________________________________________________________
        
       Stability.ai - Introducing Stable Video 3D
        
       Author : ed
       Score  : 692 points
       Date   : 2024-03-18 20:06 UTC (1 day ago)
        
 (HTM) web link (stability.ai)
 (TXT) w3m dump (stability.ai)
        
       | Filligree wrote:
       | If the animations shown are representative, then the mesh output
       | may very well be good enough to use in a 3d printer.
       | 
       | Looking forward to experimenting with this.
        
         | neom wrote:
         | I don't know much about 3D printing, would be very interested
         | in learning more about this idea if you'd be so kind as to
         | expand on it. Could I have AI spend all day auto-scanning
         | what teens are doing on Instagram, auto-generating toys based
         | on it, auto-generating advertisements for the toys, and
         | auto-3D-printing on demand?
        
           | SirSourdough wrote:
           | Hypothetically, sure, assuming the parent comment is correct
           | that these meshes are sufficient for modelling, and that you
           | can find any teens who want a non-digital toy.
           | 
           | I think a good hobbyist application for this would be
           | something like modelling figurines for games, which is
           | already a pretty popular 3D printing application. This would
           | allow people with limited modelling skills to bring
           | fantastical, unique characters to life "easily".
        
             | Filligree wrote:
             | Pretty much. We're already generating images of monsters
             | and characters for a D&D campaign; being able to print
             | those in 3D would be pretty amazing.
        
           | CobrastanJorji wrote:
           | I think their suggestion was more "I have a photo of a cool
           | horse, and now I would like a 3D model of that same horse."
           | 
           | Another way of looking at it: 3D artists often begin projects
           | by taking reference images of their subject from multiple
           | angles, then very manually turning that into a 3D model. That
           | step could potentially be greatly sped up with an algorithm
           | like this one. The artist could (hopefully) then focus on
           | cleanup, rigging, etc, and have a quality asset in
           | significantly less time.
        
             | bobba27 wrote:
             | The question is whether this actually "creates a 3D model
             | based on the picture", or whether it "finds an existing
             | model that looks similar to the picture and texture-maps
             | it".
        
           | maicro wrote:
           | OP is suggesting that this (AI model? I honestly am behind on
           | the terminology) could replace one of the common steps of 3D
           | printing - specifically, the step where you create a digital
           | representation of the physical object you would want to end
           | up with.
           | 
           | There are other steps to 3D printing in general, though; a
           | super rough outline:
           | 
           | - Model generation
           | 
           | - "Slicing" - processing the 3D model into instructions that
           | the 3D printer can handle, as well as adding any support
           | structures or other modifications to make it printable
           | 
           | - Printing - the actual printing process
           | 
           | - Post-processing - depending on the 3D printing technology
           | used, the desired resulting product, and the specific
           | model/slicing settings, this can be as simple as "remove from
           | bed and use" to "carefully snip off support structures, let
           | cure in a UV chamber for X minutes, sand and fill, then
           | paint"
           | 
           | As I said before, this AI model specifically would cover 3D
           | model generation. If you were to use a printing technology
           | that doesn't require support structures, and handles color
           | directly in the printing process (I think powder bed fusion
           | is the only real option here?), the entire process should be
           | fairly automatable - a human might be needed to remove the
           | part from the printer, but there might not be much post-
           | processing to do.
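           | 
           | As a sketch of how the slicing step could be driven
           | automatically: most slicers expose a CLI. This assumes
           | PrusaSlicer is installed and on PATH, and the file names
           | are made up:
           | 
           |     import subprocess
           | 
           |     # Hand a generated STL to the slicer's CLI and
           |     # get printable G-code back.
           |     def slice_model(stl_path, gcode_path):
           |         subprocess.run(
           |             ["prusa-slicer", "--export-gcode",
           |              "--output", gcode_path, stl_path],
           |             check=True,
           |         )
           | 
           |     slice_model("toy.stl", "toy.gcode")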
           | 
           | The rest of your desired workflow is a bit more nebulous - I
           | don't know how you would handle "scanning what teens are
           | doing on instagram", at least in a way that would let you
           | generate toys from the information; generating and posting
           | the advertisement shouldn't be too hard - have a standardish
           | template that you fill in with a render from the model, and
           | the description; printing on demand again is possible, though
           | you'll likely need a human to remove the part, check it for
           | quality and ship it. You could automate the latter, but that
           | would probably be more trouble than it's worth.
        
             | neom wrote:
             | Interesting. To be clear, I don't think this is a good idea
             | and it's kinda my nightmare post-capitalism hell. I just
             | think it's interesting that this could be done now.
             | 
             | On finding out what teens want, that part is somewhat
             | easy-ish. I guess you'd need a couple of agents: one that
             | scans teen blogs for stories and converts them to keywords,
             | then another agent that feeds the keywords (#taylorswift
             | #HaileyBieberChiaPudding #latestkdrama etc.) into
             | Instagram. After a while your recommendations page will
             | turn into a pretty accurate representation of what teens
             | are into; then just have an agent look at those images and
             | generate diffs of them. I doubt it would work for a bunch
             | of reasons, but it's an interesting thought experiment!
             | Thanks!
        
         | jsheard wrote:
         | With previous attempts at this problem the shaded examples
         | could be quite misleading because details that appeared to be
         | geometric were actually just painted over the surface as part
         | of the texture, so when you took that texture away you just had
         | a melted looking blob with nowhere near as much detail as you
         | thought. I'd reserve judgement until we see some unshaded
         | meshes.
         | 
         | What they show in the demo: https://i.imgur.com/9bZNTcd.jpeg
         | 
         | What comes out of the 3D printer:
         | https://i.imgur.com/MZrzsfh.png
        
           | SV_BubbleTime wrote:
           | It's always been like this. None of these ever show the
           | untextured model.
           | 
           | When I see a demo where they're showing wireframes, I'll
           | know it's good enough.
        
             | jsheard wrote:
             | Seems like a tougher nut to crack than image generation
             | was. Since there aren't a bajillion high-quality 3D models
             | lying around on the internet to use as training data,
             | everyone is trying to do 3D model generation as a second-
             | order system using images as the training data again. The
             | things that make 3D assets good - the tiny geometric
             | details that are hard to infer without many input views of
             | the same object, the quality of the mesh topology and UV
             | mapping, rigging and skinning for animation, reducing
             | materials down to PBR channels that can be fed into a
             | renderer, and so on - aren't represented in the input
             | training data, so the model is expected to make far more
             | logical leaps than image generators do.
        
               | refulgentis wrote:
               | It almost seems easier, in that you have an arbitrary #
               | of real-world objects to scan and the hardware is heavily
               | commoditized (IIRC iPhones have this built in at high res
               | now?)
        
               | polygamous_bat wrote:
               | How is building a dataset easier than using a prebuilt
               | dataset?
        
               | refulgentis wrote:
               | In context, the conversation was beyond a dichotomy -
               | thankfully. Having only 2 choices leaves the conversation
               | at people insisting one is better, and it becomes an
               | argument about definitions where people take turns being
               | "right" from the viewpoint of a neutral observer.
               | 
               | It's proposing a solution to the author's observation
               | that everyone is doing it in second order fashion and
               | missing a significant amount of necessary data.
               | 
               | The implication is that rather than doing it the hard way
               | via the already-obtained 2nd-order dataset, it'll be
               | easier to get a new dataset, and getting that dataset
               | will be significantly easier than it was to get the
               | second-order dataset, as you don't need to worry about
               | aesthetic variety as much as teaching what level of
               | detail is needed in the mesh for it to be "real".
               | derefr wrote:
               | > since there aren't a bajillion high-quality 3D models
               | lying around on the internet to use as training data
               | 
               | There aren't a bajillion high-quality 3D models of
               | _everything_ , but there are an unbounded number of high-
               | quality 3D models of _some_ things, due to the existence
               | of procedural mesh systems for things like foliage.
               | 
               | You could, at the very least, train an ML model to
               | translate images of jungles into 3D meshes of the trees
               | composing them right now.
               | 
               | Although I wonder if having a few very-well-understood
               | object types like these, to serve as a base, would be
               | enough to allow such a model to deduce more generalized
               | rules of optics, such that it could then be trained on
               | other object categories with much smaller training
               | sets...
        
               | wincy wrote:
               | I know where I could get several hundred terabytes (maybe
               | an exabyte? It's constantly growing) of ultra high
               | quality STL files designed for 3D printing. I just don't
               | have the storage or the knowledge of how to turn those
               | into a model that outputs new STL files.
               | 
               | I'd imagine it'd require a ton of tagging, although I
               | have a good idea of how I could leverage existing APIs to
               | tag it mostly automatically: generate three still-image
               | thumbnails of the content, feed those through CLIP,
               | verify that all three agree on what it's an STL of, and
               | manually tag the ones that fail that test.
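               | 
               | A sketch of that tagging loop, assuming trimesh +
               | matplotlib for the thumbnails and open_clip for the
               | labels (the label list and file names are made up):
               | 
               |     import numpy as np
               |     import torch, trimesh, open_clip
               |     import matplotlib.pyplot as plt
               |     from PIL import Image
               | 
               |     def render_views(path, n=3):
               |         # cheap shaded thumbnails from a few
               |         # azimuth angles, no GPU renderer needed
               |         m = trimesh.load(path, force="mesh")
               |         v, f = m.vertices, m.faces
               |         out = []
               |         for az in np.linspace(0, 240, n):
               |             fig = plt.figure(figsize=(2, 2))
               |             ax = fig.add_subplot(projection="3d")
               |             ax.plot_trisurf(v[:, 0], v[:, 1],
               |                 v[:, 2], triangles=f)
               |             ax.view_init(20, az)
               |             ax.set_axis_off()
               |             fig.canvas.draw()
               |             out.append(Image.fromarray(np.asarray(
               |                 fig.canvas.buffer_rgba())
               |                 ).convert("RGB"))
               |             plt.close(fig)
               |         return out
               | 
               |     model, _, pre = \
               |         open_clip.create_model_and_transforms(
               |             "ViT-B-32",
               |             pretrained="laion2b_s34b_b79k")
               |     tok = open_clip.get_tokenizer("ViT-B-32")
               |     labels = ["a dragon figurine",
               |               "a phone stand", "a vase"]
               | 
               |     def tag(path):
               |         votes = []
               |         with torch.no_grad():
               |             t = model.encode_text(tok(labels))
               |             t /= t.norm(dim=-1, keepdim=True)
               |             for im in render_views(path):
               |                 e = model.encode_image(
               |                     pre(im).unsqueeze(0))
               |                 e /= e.norm(dim=-1, keepdim=True)
               |                 votes.append(
               |                     int((e @ t.T).argmax()))
               |         # auto-tag only if all views agree,
               |         # else fall through to manual tagging
               |         return (labels[votes[0]]
               |                 if len(set(votes)) == 1 else None)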
        
               | supermatt wrote:
               | There's a pretty big difference between hundreds of
               | terabytes and an exabyte. Maybe you meant petabyte?
        
               | clbrmbr wrote:
               | Couldn't a deep network learn the latent 3D
               | representation just on video input?
        
             | bobba27 wrote:
             | Yes. But it is still promising. Things are getting
             | incrementally better.
             | 
             | (I dream of the day when this can be used to automatically
             | create paper-craft templates.)
        
           | euazOn wrote:
           | Therefore, what is the main use case of this model?
           | Generating cheap 3D assets for video games?
        
             | jsheard wrote:
             | I don't think they have a specific use-case for this model,
             | they're throwing ideas at the wall again in the hopes some
             | of them stick and eventually turn into another product. The
             | paper doesn't discuss any of the problems that would need
             | to be solved in order to easily generate game-ready
             | assets, so I think it's safe to assume that it currently
             | can't.
             | 
             | For games at the very least you need to consider polygon
             | budget, getting reasonably good UVs, and generating
             | materials which fit into a PBR shader pipeline, at least if
             | it's going to work with rendering pipelines as we know them
             | today (as opposed to rendering neural representations
             | directly, which is a thing people are trying to do but is
             | totally unproven in production).
        
               | pksebben wrote:
               | I'd be willing to bet you could create a diffusion model
               | to map unrefined meshes to UV-fixed and remeshed
               | surfaces. If you had a large enough library of good
               | meshes you just programmatically mess 'em up and use that
               | as the dataset.
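               | 
               | The data-generation half of that bet is cheap to sketch,
               | e.g. with trimesh (depending on your trimesh version the
               | decimation call may need the fast-simplification
               | package; the file name is made up):
               | 
               |     import numpy as np
               |     import trimesh
               | 
               |     def corrupt(mesh, noise=0.01, keep=0.3):
               |         # degrade a clean mesh into a plausible
               |         # "generator output": decimate it, then
               |         # jitter the vertices
               |         bad = mesh.simplify_quadric_decimation(
               |             face_count=int(
               |                 len(mesh.faces) * keep))
               |         s = bad.bounding_box.extents.max()
               |         bad.vertices += np.random.normal(
               |             0, noise * s, bad.vertices.shape)
               |         return bad
               | 
               |     clean = trimesh.load("good_asset.obj",
               |                          force="mesh")
               |     # (input, target) pair for training
               |     pair = (corrupt(clean), clean)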
        
           | strich wrote:
           | There exists software to reproject texture normals back onto
           | a high-poly model. So this problem does have a solution for
           | anyone interested.
        
             | jsheard wrote:
             | That's assuming your generator produces a normal map; the
             | ones I've seen do not. The only texture channel they output
             | is color, that being the one channel that a model trained
             | on images is naturally equipped to produce.
        
               | pksebben wrote:
               | I may be speaking out of ignorance here, but couldn't you
               | use photogrammetry techniques to translate these to a
               | higher resolution mesh?
        
               | zo1 wrote:
               | Only if you have multiple images of the same areas so
               | that you can extract actual positions. And there is no
               | guarantee that multiple pictures of the same model have
               | the same detail, much less in a manner that can be
               | triangulated with accuracy. A lot of photogrammetry
               | algorithms discard points that fall outside certain
               | error bars.
               | 
               | So yes, there might be a wooden frame in the middle of
               | that window, but does it match the math from both angles
               | of it? Doubt it.
        
               | huytersd wrote:
               | You can generate pretty reliable texture depth maps from
               | just an image. It's going to be trash if you're trying to
               | generate the depth for the entire 3D model, but I presume
               | it's going to do a good job with just texture. Then you
               | just use a displacement based on the depth map.
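               | 
               | A sketch of that, using the MiDaS weights from torch.hub
               | (the output is relative depth, so it only makes sense as
               | a displacement after normalising; file names are made
               | up):
               | 
               |     import numpy as np
               |     import torch
               |     from PIL import Image
               | 
               |     midas = torch.hub.load(
               |         "intel-isl/MiDaS", "MiDaS_small")
               |     tfs = torch.hub.load(
               |         "intel-isl/MiDaS", "transforms")
               |     midas.eval()
               | 
               |     img = np.asarray(Image.open(
               |         "texture.png").convert("RGB"))
               |     with torch.no_grad():
               |         depth = midas(
               |             tfs.small_transform(img)
               |         ).squeeze().numpy()
               |     # normalise to 0..1 and save as a
               |     # displacement map
               |     d = (depth - depth.min()) / (
               |         np.ptp(depth) + 1e-8)
               |     Image.fromarray(
               |         (d * 255).astype("uint8")
               |     ).save("displacement.png")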
        
           | Oioioioiio wrote:
           | There are AI models that can create proper meshes though.
        
             | dgellow wrote:
             | Which ones?
        
               | Oioioioiio wrote:
               | This for example:
               | https://research.nvidia.com/labs/toronto-ai/flexicubes/
        
       | ionwake wrote:
       | I'm sorry for the dumb lazy question, but would the input
       | require more than one image? Is there a demo URL to test this?
       | I think it might just be time to buy a 3D printer.
       | 
       | EDIT> Does "single image inputs" mean more than one image?
        
         | kylebenzle wrote:
         | Single image means one image.
        
           | dartos wrote:
           | Can confirm the word single means 1
        
           | ionwake wrote:
           | lol cmon guys don't be too hard on me it does say "inputs"
        
             | stavros wrote:
             | I do see how "single image inputs" can be conflated with
             | "multiple inputs of a single image each time", as opposed
             | to "video".
        
               | ionwake wrote:
               | TBH I always look at the worst-case scenario. I was
               | worried it meant it needed 3 images input as a single
               | image at different steps of the process, so requiring
               | different angles. I wasn't sure, but thought it best to
               | check. I feel like it would have been clearer to have
               | said something like "generates a 3D model from a single
               | image" (not exact wording, but you catch my drift).
               | Sorry, I am over-analysing, but all feedback is good,
               | right?
        
             | ganeshkrishnan wrote:
             | Describe in single words only the good things that come
             | into your mind about... your mother.
        
         | simonw wrote:
         | It's just a single image. It guesses the shape of the bits it
         | can't see based on vast amounts of training data.
        
           | ionwake wrote:
           | Amazing! Thank you
        
         | exodust wrote:
         | I have an even lazier question after failing to speed-read the
         | article.
         | 
         | Does this output an actual 3D mesh? Or does it only output a
         | 3d-looking rendered animation?
        
       | airstrike wrote:
       | that demo animation is so clever and satisfying
        
         | amelius wrote:
         | But it doesn't look very realistic, tbh.
        
           | dreadlordbone wrote:
           | it doesn't break Euclidean space at least
        
         | itsgrimetime wrote:
         | I can't get them to play
        
       | ddtaylor wrote:
       | Does anyone know what hardware inference can run on, or what
       | the memory requirements are?
        
         | Mathnerd314 wrote:
         | In the repo the model weights file is 9.37GB, whereas SDXL
         | Turbo is 13.9GB, and I don't see any mention of huge context
         | windows, so probably it just needs a decent graphics card.
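         | 
         | Back of the envelope (a guess; whether the checkpoint is
         | stored fp16 or fp32 isn't stated):
         | 
         |     ckpt_bytes = 9.37e9
         |     ckpt_bytes / 2 / 1e9  # ~4.7B params if fp16
         |     ckpt_bytes / 4 / 1e9  # ~2.3B params if fp32
         |     # loaded at checkpoint precision, the weights
         |     # alone occupy ~9.4GB of VRAM before activations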
        
         | kouteiheika wrote:
         | It crashes with an out-of-memory error on my 24GB 4090, so at
         | least when it comes to their sample script the answer is "a
         | lot". Maybe it's just an inefficient implementation though.
        
           | dragonwriter wrote:
           | Pretty much every initial Stability release has been
           | inefficient, and resource requirements have dropped a lot
           | once community engines optimized for real consumer hardware
           | appeared for running the model.
           | 
           | OTOH, with their shift to a less open licensing structure,
           | community tooling probably won't emerge with the same level
           | of energy.
        
       | canadiantim wrote:
       | I can't wait until we can use something like this for
       | architectural design
        
         | whywhywhywhy wrote:
         | SDXL + ControlNet fed with blocked-out depth maps is probably
         | more useful for that.
        
       | kouteiheika wrote:
       | Just tried to run this using their sample script on my 4090
       | (which has 24GB of VRAM). It ran for a little over 1 minute and
       | crashed with an out-of-memory error. I tried both SV3D_u and
       | SV3D_p models.
       | 
       | [edit]Managed to generate by tweaking the script to generate
       | fewer frames simultaneously. 19.5GB peak VRAM usage, 1 min 25
       | secs to generate at 225 watts.[/edit]
        
         | ganeshkrishnan wrote:
         | The 4090 is in a weird spot: high speed but low RAM.
         | Theoretically everything should run in ai but practically
         | nothing runs.
        
           | LoganDark wrote:
           | The 4090 has more VRAM than most computers have system RAM.
           | Surprised this is considered "low RAM" in any way except
           | relative to datacenter cards and top-spec Apple silicon.
        
             | samplatt wrote:
             | You're comparing RAM amounts to other RAM amounts without
             | considering requirements. 24GB is more than (most) current
             | games would ever require, but is considered an
             | uncomfortably constrictive minimum for most industrial
             | work.
             | 
             | Traditional CPU-bound physics/simulation models have
             | typically wanted all the RAM they could get; the more RAM
             | the more accurate the model. The same is true for AI
             | models.
             | 
             | I can max out 24GB just using spreadsheets and databases,
             | let alone my 3D work or anything computational.
        
           | jokethrowaway wrote:
           | What can't you run? Unquantised large text models are the
           | only thing I can't run
           | 
           | Stable Diffusion, Stable Video, text models, audio models -
           | I've never had issues with anything yet.
        
             | michaelt wrote:
             | The 4090 is in a bit of a funny space for LLMs.
             | 
             | There's a lot of open weights activity around 7B/13B models
             | which the 4090 will run with ease. But you could can run
             | those OK on much cheaper cards like the 4070Ti (which is of
             | course why they're popular).
             | 
             | And there's a lot of open weights activity around 70B and
             | 8x7B models which are state-of-the-art - but too big to fit
             | on a 4090. There's not much activity around 30B models,
             | which are too big to be mainstream and too small to be
             | cutting edge.
             | 
             | If you're specifically looking to QLoRA fine-tune a 7B/13B
             | model a 4090 can do that - but if you want to go bigger
             | than that you'll end up using a cloud multi-gpu machine
             | anyway.
        
           | Hikikomori wrote:
           | Maybe don't use a gaming card for AI then? 24GB is plenty,
           | as most games don't use more than half of that at 4K.
        
             | smcleod wrote:
             | Maybe give me lots of money to give Nvidia for a card with
             | more memory then?
             | 
             | Nvidia have held back the majority of their cards from
             | going over 24GB for years now. It's 2024 and my laptop has
             | 96GB of RAM available to the GPU but desktop GPUs that cost
             | several thousands just by themselves are stuck at 24GB.
        
               | dannyw wrote:
               | They don't get their absurd profit margins by
               | cannibalising their data centre chips.
               | 
               | This is like Intel and their refusal to support ECC
               | memory, while AMD does on nearly all Ryzens.
               | 
               | --
               | 
               | Note: your laptop is probably using a 64-bit memory bus
               | for system RAM. For GPUs, the 4090 is 384-bit. That takes
               | up a lot more die area for the bus and memory controller.
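               | 
               | The bus width is also where the headline bandwidth
               | numbers come from:
               | 
               |     # peak BW = bus width x per-pin data rate
               |     bus_bits = 384       # 4090, GDDR6X
               |     gbps_per_pin = 21    # GDDR6X data rate
               |     bus_bits / 8 * gbps_per_pin  # = 1008 GB/s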
        
               | versteegen wrote:
               | But GP's laptop with 96GB of unified memory would be a M2
               | Max Macbook or better. The M2 Max has a 4 x 128-bit
               | memory bus (410GB/s) and the M2 Ultra is 8 x 128bit
               | (819GB/s), versus a 4090 at 1008GB/s. But see here for
               | caveats about Mac bandwidth:
               | https://news.ycombinator.com/item?id=38811290
        
               | Hikikomori wrote:
               | Why would they do that with a gaming card? If you want
               | more you can rent on AWS etc.
        
               | smcleod wrote:
               | It wouldn't be a local model if it has to work on AWS.
        
               | DSMan195276 wrote:
               | Isn't there the risk that if they give the gaming cards
               | enough RAM for such tasks then they'll get bought up for
               | that purpose and the second-hand price will go even
               | higher?
               | 
               | I guess my point is, rather than give the cards more RAM,
               | the gaming cards should just be priced cheaper.
        
               | chaostheory wrote:
               | Which laptop models share system RAM with an Nvidia RTX
               | card?
        
               | stygiansonic wrote:
               | OP is probably referring to an M-series MacBook, since it
               | has a unified memory architecture and the same memory
               | space is used by both the CPU and GPU.
        
             | karolist wrote:
             | This is unfairly downvoted. They launched the 3090 in Sep
             | 2020 with 24GB, which was more than AMD's 16GB 6900XT
             | launched that same month. Maybe before blaming Nvidia,
             | blame AMD for not trying to compete with them? Of course
             | they're not gonna release a gaming card with loads more
             | VRAM because a) the competition doesn't exist, nor does it
             | have gaming cards with more VRAM, b) it would all be bought
             | up for AI workloads, and c) games don't really need more,
             | as parent said.
        
           | Sohcahtoa82 wrote:
           | They don't want to cannibalize sales of the super-expensive
           | GPUs dedicated to ML/AI.
           | 
           | 5090 likely won't have more than 32 GB, if even that much.
        
             | Tenoke wrote:
             | I made a Manifold market[0] on the amount of RAM a 5090
             | will have, and while pretty much nobody has participated, I
             | just checked and the market is amusingly at the 32GB you've
             | also quoted. Just like you, I hope it will be more but I
             | fear it will be even less.
             | 
             | 0. https://manifold.markets/Tenoke/how-much-vram-will-
             | nvidia-50...
        
             | karolist wrote:
             | Even 32GB would be great for a gaming card; any more and
             | you're never seeing it on sale, as it will be bought by the
             | truckload for AI, so of course they're not gonna balloon
             | the VRAM. I suspect we'd still be at 16GB, but they
             | launched the 3090 in Sep 2020 with 24GB, before all this
             | craze really took off, and lowering it now would be bad
             | optics.
        
               | Culonavirus wrote:
               | Meanwhile Apple will sell you a chip with 96GB of unified
               | memory for the price of two 4090s ... and that is with
               | the Apple tax ... it's ridiculous. I know the memory
               | bandwidth of the M2 Max is like 1/2 of a 4090's, but
               | still, the artificial kneecapping Nvidia does is absurd.
        
           | Zenst wrote:
           | Perhaps NVIDIA or somebody could invent a RAM upgrade via
           | NVLINK? Seems plausible, and not every problem would want to
           | add another GPU when extra memory alone is all they need.
        
             | wongarsu wrote:
             | But why would NVIDIA do that when they can just sell you an
             | A100 for ten times the price of a 4090?
        
               | margorczynski wrote:
               | We need AMD to compete, but from what I know their
               | software is subpar compared to NVIDIA's offering, and
               | most of the current ML stacks are built around CUDA.
               | Still, there's a lot of money to be made in this area
               | now, so competition big and small should pop up.
        
               | idonotknowwhy wrote:
               | I'd love it if AMD and Intel teamed up to make a wrapper
               | layer for CUDA. Surely they'd both benefit greatly.
        
               | versteegen wrote:
               | First Intel and then AMD funded a wrapper, yes.
               | Unfortunately the new version supports AMD but no longer
               | Intel.
               | 
               | https://github.com/vosen/ZLUDA
               | 
               | That's a binary-level wrapper. Of course there's also
               | ROCm HIP at the source level, and many other things,
               | such as SYCL.
        
               | dragonwriter wrote:
               | In a hypothetical near-future world, competition?
        
             | dacryn wrote:
             | The memory is inherent to the GPU architecture. You cannot
             | just add VRAM and expect no other bottlenecks to pop up.
             | Yes, they can reduce the VRAM to create budget models and
             | save a bit here and there, but adding VRAM to a top model
             | is a tricky endeavour.
        
             | Animats wrote:
             | There is a mini-industry in China buying old NVidia GPUs,
             | upgrading the memory, and reselling them.
        
               | devit wrote:
               | What's the best that can be achieved with this method?
        
               | Animats wrote:
               | 2X. Converting old NVidia 2080s from 11GB to 22GB. [1]
               | 
               | [1] https://www.tomshardware.com/pc-
               | components/gpus/chinese-work...
        
           | chaostheory wrote:
           | Yeah, I'm still debating whether to go with a Mac Studio
           | with the RAM maxed out (approx $7500 for 192 GB) or a PC
           | with a 4090. Is there a better value path with the Nvidia A
           | series or something else? (I'm not sure about tinygrad.)
        
             | karolist wrote:
             | I have an M1 Max with 64GB and a 3090 Ti. The M1 Max is
             | ~4x slower at inference for the same models than the 3090
             | (i.e. 7t/s vs 30t/s), which depending on the task can be
             | very annoying. As a plus you get to run really large
             | models, albeit very slowly. Think about whether that will
             | bother you or not. I will not give up my 3090 Ti and am
             | rather waiting for the 5090 to see what it can do, because
             | when programming the Mac is too slow to shoot off
             | questions. I use it mostly to better understand book
             | topics now, and the 3090 Ti to do fast chat sessions.
        
             | chaxor wrote:
             | Groq may be an option?
        
             | Oioioioiio wrote:
             | Just don't max out the Mac Studio and get both...
        
             | kristianp wrote:
             | You can get a previous gen RTX A6000 with 48GB of gddr6 for
             | about $5000 (1). Disclosure: I run that website. Is anyone
             | using the pro cards for inference?
             | 
             | (1) https://gpuquicklist.com/pro?models=RTX%20A6000
        
           | idonotknowwhy wrote:
           | Didn't know 24GB was considered low lol.
        
             | mcbuilder wrote:
             | For AI that's either a very fat SDXL model at its max
             | native resolution, or a quantized 34B-parameter model, so
             | it's on the low side. Compare that with the Blackwell AI
             | "superchip" announced yesterday, which appears to the
             | programmer as a single GPU with 30TB of RAM.
        
             | michaelt wrote:
             | Here's a vram requirements table for fine-tuning an LLM:
             | https://github.com/hiyouga/LLaMA-Factory?tab=readme-ov-
             | file#...
             | 
             | No matter how much vram you have, there's something that
             | doesn't fit :)
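             | 
             | The usual rule of thumb for why: full fine-tuning with
             | Adam in mixed precision costs roughly 16 bytes per
             | parameter before activations, so:
             | 
             |     # 2 (fp16 weights) + 2 (fp16 grads)
             |     # + 4 (fp32 master) + 8 (Adam moments)
             |     params = 7e9
             |     params * 16 / 2**30  # ~104 GiB for a 7B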
        
               | throwing_away wrote:
               | This is also how I learned that 8X7B doesn't mean "eight
               | 7B models joined somehow".
        
           | karolist wrote:
           | You can add multiple cards, but practically speaking you're
           | better off with used 3090s, where you get 2 for the price of
           | one 4090.
           | 
           | I have a 3090 Ti and I can run Q4-quant 33B models at 30t/s
           | with 8k context. A 4090 would allow me to do the same but
           | with ~45t/s; both inference speeds are more than fast enough
           | for most people, so the 3090 is the usual choice. In my
           | tests on RunPod, an H100 with 80GB memory is around the same
           | speed as a 3090, so slower than a 4090.
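           | 
           | The arithmetic behind a 33B fitting in 24GB:
           | 
           |     # Q4 quants average roughly 4.5-5 bits per
           |     # weight depending on the variant
           |     33e9 * 4.5 / 8 / 2**30  # ~17.3 GiB
           |     # plus the KV cache for the 8k context and
           |     # some runtime overhead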
        
             | ynniv wrote:
             | Don't forget the 24GB P40, which is a third the speed but
             | also a third the cost of a 3090 (both used).
        
           | chaxor wrote:
           | "Theoretically everything should run in ai"
           | 
           | Odd statement. I don't really know what you mean by that.
           | Perhaps 'math _works_, code should too' ?
           | 
           | I would definitely agree that it _should_ work.
           | 
           | I'm of the belief that no one should _have to_ publish (e.g.
           | to graduate, get promotions, etc) in academia, and that
           | publications should only occur if they're believed to be
           | near Nobel Prize worthy, and fully reproducible by code with
           | packaging that should last and work in 10 years, from data
           | archives that will exist in 10 years.
           | 
           | But it seems I have been outvoted by the administration in
           | academia.
           | 
           | Hence, we get this "ai that doesn't run" phenomenon.
        
             | KeplerBoy wrote:
             | What's the point of academia if not to publish?
             | 
             | Do you want to publicly fund researchers only for the
             | industrial research partner's benefit?
        
               | chaxor wrote:
               | It already is effectively just for industry benefit. It's
               | been like that since the start. Work that is too
               | expensive for industry to do (research and discovery) was
               | put into the public sphere such that the role of industry
               | was to take that innovation and optimize it. That's at
               | least how it is intentionally constructed.
               | 
               | My main point was that there is a lot of noise in
               | scientific journals that is caused by pressures in
               | academia that are requirements of publishing. If these
               | were removed, then the quality of work published would
               | increase and the quantity would decrease.
               | 
               | There are other places, like blogs, to post work that is
               | derivative and non-novel. The field of biology has an
               | immense amount of work that is mostly observational,
               | without strong conclusions or predictivity. A tabulation
               | of observations should definitely be put out by a lab,
               | and it should come much sooner and with far less pressure
               | than today, rather than the typical dance of putting the
               | data in during publication. The SRA is one example of a
               | place to share data. The typical way to work would be:
               | put all data immediately onto a public repo; sometimes
               | comment on it in ways that have been seen before on
               | blogs and other venues below scientific journals; and
               | then, if something truly substantial comes out of it (a
               | novel model that is analytical and highly predictive of
               | cell behavior in all situations, for example), publish.
               | 
               | It could separate the noise from the signal. LLMs are one
               | case where the noise is very strong, in that many papers
               | are simply 'we fine-tuned an LLM'.
        
             | michaelmior wrote:
             | So how should knowledge be shared in academia without
             | publishing? Any work worthy of a Nobel Prize (or more
             | likely, a Turing Award) is built on top of significant
             | amounts of other research that itself wasn't so
             | groundbreaking.
             | 
             | That said, I certainly think that researchers can do more
             | to make their code and data more accessible. We have the
             | tools to do so already but the incentives are often
             | misaligned.
        
           | jug wrote:
           | Almost sounds like a GPU vendor who isn't seeing enough
           | competition.
        
             | ImHereToVote wrote:
             | Almost like the only competition of Nvidia is the niece of
             | the CEO.
        
             | paxys wrote:
             | Or, you know, the fact that the card is made for playing
             | video games, not training AI models.
        
           | blackoil wrote:
         | It is targeted at gamers, but professionals are buying it.
         | They should be buying the A6000, which has 48GB.
        
         | GistNoesis wrote:
         | I managed to get it working with a 4090. You need to adjust the
         | parameter decoding_t of the sample function in
         | simple_video_sample.py to a lower value (decoding_t = 5 works
         | fine for me). I also needed to install imageio==2.19.3 and
         | imageio-ffmpeg.
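         | 
         | For reference, roughly what the tweak looks like (paraphrased
         | from the repo's script, not its exact signature):
         | 
         |     # scripts/sampling/simple_video_sample.py
         |     sample(
         |         input_path="assets/my_object.png",
         |         version="sv3d_u",
         |         decoding_t=5,  # frames the VAE decodes
         |                        # per pass; lower value =>
         |                        # lower peak VRAM
         |     )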
        
           | kouteiheika wrote:
           | Ah, yep! You're right! It works now!
        
         | monkeynotes wrote:
         | Yeah, this is to be expected with early adoption. This stuff
         | comes out of the lab and it's not perfect. The key thing to
         | evaluate is the trajectory and pace of development. Much of
         | what folks challenged ChatGPT with a year ago is long lost in
         | the dust. Go look at Stable Diffusion this time last year.
         | Dall-E couldn't do words and hands; now it nails that 90% of
         | the time in my experience.
        
           | remotefonts wrote:
           | About words, Dall-E is not even close to nailing it 90% of
           | the time. Not even 50%. Maybe they nerf it when you request
           | a logo from it, but that was my experience in the last few
           | days.
        
         | whywhywhywhy wrote:
         | Dunno why the defaults for this stuff aren't set for baseline
         | hardware. I feel like I always have to tweak the batch size
         | down on all the base scripts even with 24GB, because
         | everything assumes 48GB.
        
       | londons_explore wrote:
       | All the examples resemble plastic children's toys...
       | 
       | How would it handle other objects? (People, fabrics, buildings,
       | plants, mountains, mechanical parts, etc)
        
         | programjames wrote:
         | It's hard to get camera-position tracking for random objects,
         | so it looks like they used simulations. There are probably a
         | lot more plastic children's-toy models in Blender than people,
         | fabrics, buildings, &c.
        
       | issung wrote:
       | > Stable Video 3D (SV3D) is a generative model based on Stable
       | Video Diffusion that takes in a still image of an object as a
       | conditioning frame, and generates an orbital video of that
       | object.
       | 
       | So can it actually output a 3d model? Or just images of what it
       | thinks the object would look like from other angles?
        
         | krebby wrote:
         | The reference video (https://youtu.be/Zqw4-1LcfWg) says they
         | use a NeRF / structure-from-motion approach and then create a
         | mesh with marching cubes from the generated radiance field.
         | This is how most SOTA text-to-object generators work now as
         | well.
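         | 
         | The mesh-extraction step at the end is off-the-shelf: given a
         | density grid sampled from the radiance field, it's one call
         | (a sketch with scikit-image; the grid file and threshold are
         | made up):
         | 
         |     import numpy as np
         |     import trimesh
         |     from skimage import measure
         | 
         |     # (N, N, N) densities sampled from the NeRF
         |     density = np.load("density_grid.npy")
         |     verts, faces, normals, _ = \
         |         measure.marching_cubes(density, level=0.5)
         |     trimesh.Trimesh(
         |         vertices=verts, faces=faces,
         |         vertex_normals=normals,
         |     ).export("object.obj")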
        
         | 2StepsOutOfLine wrote:
         | I'm also struggling to find any examples of how to actually get
         | a 3D model output. Very few references to this capability
         | outside of the blog post.
        
       | dubin wrote:
       | I'd like to play around with something like this, but from my
       | understanding my machine (MacBook, 2021 M1) isn't nearly
       | powerful enough (right?). Are there remote/cloud environments
       | where I can run models like this?
        
         | ilaksh wrote:
         | I suggest just using Stability's API. You aren't allowed to use
         | it locally for commercial use anyway.
         | 
         | You could set something up on RunPod or AWS, but I doubt it's
         | worth the effort.
        
           | dubin wrote:
           | Awesome, thank you!
           | 
           | It does look like SV3D is not a part of the API currently,
           | but only a matter of time I imagine.
        
       | thrdbndndn wrote:
       | The emphasis here is Single Image, but can this model generate
       | with multiple images too?
       | 
       | We know that a single image of an object physically can't cover
       | all sides of it, so the rest is all guesswork by the AI. This
       | is totally fine for certain scenarios, but in lots of other
       | cases it's trivial to have multiple images of the same object,
       | and if that offers higher fidelity, it's totally worth it.
       | 
       | I'm aware there are many algorithms or AI models that already
       | do that. I'm asking about Stability's one specifically because
       | if they have impressive single-image results, surely their
       | multi-image results would also be much better than the state of
       | the art?
        
         | pksebben wrote:
         | If it's not there yet, I'm willing to bet it will be soon
         | enough given folks hacking it apart and injecting their own
         | solutions.
        
       | leesec wrote:
       | I wonder when Emad will be outed as a fed or a fraud. He's
       | certainly leaving a trail of nasty behavior in the industry.
        
         | preommr wrote:
         | elaborate?
        
           | esafak wrote:
           | https://en.wikipedia.org/wiki/Emad_Mostaque#Controversy_and_.
           | ..
        
       | dheera wrote:
       | They compare against Zero123-XL, but they should compare against
       | MVDream instead. MVDream is quite good. If you fiddle with the
       | loss you can get even better results.
        
       | abdellah123 wrote:
       | Did you write the blog post using AI?
        
       | throwaway743 wrote:
       | Anyone know of anything that'll auto-rig/add weights?
        
         | ImHereToVote wrote:
         | There are numerous tools that auto-rig humanoid figures. The
         | obvious one: https://www.mixamo.com/#/
        
       ___________________________________________________________________
       (page generated 2024-03-19 23:01 UTC)