[HN Gopher] Stable Video Diffusion
___________________________________________________________________
Stable Video Diffusion
Author : roborovskis
Score : 675 points
Date : 2023-11-21 19:01 UTC (3 hours ago)
(HTM) web link (stability.ai)
(TXT) w3m dump (stability.ai)
| minimaxir wrote:
| Model weights (two variations, each 10GB) are available without
| waitlist/approval: https://huggingface.co/stabilityai/stable-
| video-diffusion-im...
|
| The LICENSE is a special non-commercial one:
| https://huggingface.co/stabilityai/stable-video-diffusion-im...
|
| It's unclear how exactly to run it easily: diffusers has video
| generation support now, but it remains to be seen whether it plugs
| in seamlessly.
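|
| For illustration, a rough sketch of what running it through
| diffusers might look like, assuming the in-progress integration
| exposes a StableVideoDiffusionPipeline along the lines of the
| existing image pipelines (class and argument names here are my
| guesses, not confirmed API):
|
|     import torch
|     from diffusers import StableVideoDiffusionPipeline
|     from diffusers.utils import load_image, export_to_video
|
|     # Load the image-to-video weights in half precision to save VRAM.
|     pipe = StableVideoDiffusionPipeline.from_pretrained(
|         "stabilityai/stable-video-diffusion-img2vid",
|         torch_dtype=torch.float16,
|     )
|     pipe.enable_model_cpu_offload()  # park idle submodules on the CPU
|
|     # Condition on a single still image and sample a short clip.
|     image = load_image("input.png").resize((1024, 576))
|     frames = pipe(image, decode_chunk_size=4).frames[0]
|     export_to_video(frames, "output.mp4", fps=7)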
| ronsor wrote:
| Regular reminder that it is very likely that model weights
| can't be copyrighted (and thus can't be licensed).
| chankstein38 wrote:
| It looks like the huggingface page links their github that
| seems to have python scripts to run these:
| https://github.com/Stability-AI/generative-models
| minimaxir wrote:
| Those scripts aren't as easy to use or iterate upon since
| they are CLI apps instead of a REPL like a Colab/Jupyter
| Notebook (although these models probably will not run in a
| normal Colab without shenanigans).
|
| They can be hacked into a Jupyter Notebook but it's really
| not fun.
| valine wrote:
| The rate of progress in ML this past year has been breathtaking.
|
| I can't wait to see what people do with this once controlnet is
| properly adapted to video. Generating videos from scratch is
| cool, but the real utility of this will be the temporal
| consistency. Getting stable video out of stable diffusion
| typically involves lots of manual post processing to remove
| flicker.
| Der_Einzige wrote:
| ControlNet is already adapted to video; the issue is that it's
| very slow. Haven't you seen the insane quality of videos on
| civitai?
| valine wrote:
| I have seen them; the workflows to create those videos are
| extremely labor intensive. ControlNet lets you maintain
| poses between frames, but it doesn't solve the temporal
| consistency of small details.
| mattnewton wrote:
| People use animatediff's motion module (or other models
| that have cross frame attention layers). Consistency is
| close to being solved.
| valine wrote:
| Hopefully this new model will be a step beyond what you
| can do with animatediff
| dragonwriter wrote:
| Temporal consistency is improving, but "close to being
| solved" is very optimistic.
| mattnewton wrote:
| No, I think we're actually close. My source: I'm working
| on this problem, and the incredible progress of our tiny
| 3-person team at drip.art (http://api.drip.art) - we can
| generate a lot of frames that are consistent, and with
| interpolation between them, smoothly restyle even long
| videos. Cross-frame attention works for most cases, it
| just needs to be scaled up.
|
| And that's just for diffusion focused approaches like
| ours. There are probably other techniques from the token
| flow or nerf family of approaches close to breakout
| levels of quality, tons of talented researchers working
| on that too.
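|
| For anyone unfamiliar with the term, cross-frame attention roughly
| means every frame's queries attend to keys/values taken from a
| reference frame (often the first) rather than only its own, which
| is what keeps small details from drifting. A toy sketch of the
| idea (shapes and simplifications are mine, not drip.art's code):
|
|     import torch
|     import torch.nn.functional as F
|
|     def cross_frame_attention(x, to_q, to_k, to_v):
|         """x: (batch, frames, tokens, channels); to_q/to_k/to_v are
|         linear projections. Every frame attends to frame 0."""
|         b, t, n, c = x.shape
|         q = to_q(x)                          # queries from each frame
|         ref = x[:, :1].expand(b, t, n, c)    # frame 0, broadcast to all frames
|         k, v = to_k(ref), to_v(ref)          # keys/values from reference frame
|         out = F.scaled_dot_product_attention(
|             q.reshape(b * t, n, c),
|             k.reshape(b * t, n, c),
|             v.reshape(b * t, n, c),
|         )
|         return out.reshape(b, t, n, c)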
| capableweb wrote:
| > Haven't you seen the insane quality of videos on civitai?
|
| I have not, so I went to https://civitai.com/ which I guess
| is what you're talking about? But I cannot find a single
| video there, just images and models.
| Kevin09210 wrote:
| https://www.youtube.com/shorts/ZN-NbdFwfNQ
|
| https://www.youtube.com/watch?v=3WWy98ylLT4
|
| The inconsistencies are what's most interesting in these
| videos in fact
| alberth wrote:
| What was the big "unlock" that allowed so much progress this
| past year?
|
| I ask as a noob in this area.
| mlboss wrote:
| Stable Diffusion open source release and LLaMA release
| alberth wrote:
| But what technically allowed for so much progress?
|
| There's been open source AI/ML for 20+ years.
|
| Nothing comes close to the massive milestones over the past
| year.
| Chabsff wrote:
| Public availability of large transformer-based foundation
| models trained at great expense, which is what OP is
| referring to, is definitely unprecedented.
| kmeisthax wrote:
| Attention, transformers, diffusion. Prior image synthesis
| techniques - i.e. GANs - had problems that made it
| difficult to scale them up, whereas the current
| techniques seem to have no limit other than the amount of
| RAM in your GPU.
| jasonjmcghee wrote:
| People figuring out how to train and scale newer
| architectures (like transformers) effectively, to be
| wildly larger than ever before.
|
| Take AlexNet - the major "oh shit" moment in image
| classification.
|
| It had an absolutely mind-blowing number of parameters at
| a whopping 62 million.
|
| Holy shit, what a large network, right?
|
| Absolutely unprecedented.
|
| Now, for language models, anything under 1B parameters is
| a toy that barely works.
|
| Stable diffusion has around 1B or so - or the early
| models did, I'm sure they're larger now.
|
| A whole lot of smart people had to do a bunch of cool
| stuff to be able to keep networks working at all at that
| size.
|
| Many, many times over the years, people have tried to
| make larger networks, which fail to converge (read: learn
| to do something useful) in all sorts of crazy ways.
|
| At this size, it's also expensive to train these things
| from scratch, and takes a shit-ton of data, so
| research/discovery of new things is slow and difficult.
|
| But, we kind of climbed over a cliff, and now things are
| absolutely taking off in all the fields around this kind
| of stuff.
|
| Take a look at XTTSv2 for example, a leading open source
| text-to-speech model. It uses multiple models in its
| architecture, but one of them is GPT.
|
| There are a few key models that are still being used in a
| bunch of different modalities like CLIP, U-Net, GPT, etc.
| or similar variants. When they were released / made
| available, people jumped on them and started
| experimenting.
| dragonwriter wrote:
| > Stable diffusion has around 1B or so - or the early
| models did, I'm sure they're larger now.
|
| SDXL is 6.6 billion.
| 4death4 wrote:
| I think these are the main drivers behind the progress:
|
| - Unsupervised learning techniques, e.g. transformers and
| diffusion models. You need unsupervised techniques in order
| to utilize enough data. There have been other unsupervised
| techniques in the past, e.g. GANs, but they don't work as
| well.
|
| - Massive amounts of training data.
|
| - The belief that training these models will produce
| something valuable. It costs anywhere from hundreds of thousands to
| millions of dollars to train these models. The people doing
| the training need to believe they're going to get something
| interesting out at the end. More and more people and teams
| are starting to see training a large model as something worth
| pursuing.
|
| - Better GPUs, which enables training larger models.
|
| - Honestly the fall of crypto probably also contributed,
| because miners were eating a lot of GPU time.
| mkaic wrote:
| I don't think transformers or diffusion models are
| inherently "unsupervised", especially not the way they're
| used in Stable Diffusion and related models (which are very
| much trained in a supervised fashion). I agree with the
| rest of your points though.
| ebalit wrote:
| Generative methods have usually been considered
| unsupervised.
|
| You're right that conditional generation starts to blur
| the lines though.
| Cyphase wrote:
| One factor is that Stable Diffusion and ChatGPT were released
| about three months apart - August 22, 2022 and November
| 30, 2022, respectively. That brought a lot of attention and
| excitement to the field. More excitement, more people, more
| work being done, more progress.
|
| Of course those two releases didn't fall out of the sky.
| hanniabu wrote:
| > but the real utility of this will be the temporal consistency
|
| The main utility will be misinformation
| ericpauley wrote:
| I'm still puzzled as to how these "non-commercial" model licenses
| are supposed to be enforceable. Software licenses govern the
| redistribution of the _software_ , not products produced with it.
| An image isn't GPL'd because it was produced with GIMP.
| cubefox wrote:
| Nobody claimed otherwise?
| not2b wrote:
| There are sites that make Stable Diffusion-derived models
| available, along with GPU resources, and they sell the
| service of generating images from the models. The company
| isn't permitting that use, and it seems that they could find
| violators and shut them down.
| littlethoughts wrote:
| Fantasy.ai was subject to controversy for attempting to
| license models.
| Der_Einzige wrote:
| They're not enforceable.
| yorwba wrote:
| The license is a contract that allows you to use the software
| provided you fulfill some conditions. If you do not fulfill the
| conditions, you have no _right_ to a _copy_ of the software and
| can be sued. This enforcement mechanism is the same whether the
| conditions are that you include source code with copies you
| redistribute, or that you may only use it for evil, or that you
| must pay a monthly fee. Of course this enforcement mechanism
| may turn out to be ineffective if it's hard to discover that
| you're violating the conditions.
| comex wrote:
| It also somewhat depends on open legal questions like whether
| models are copyrightable and, if so, whether model outputs
| are derivative works of the model. Suppose that models are
| not copyrightable, due to their not being the product of
| human creativity (this is debatable). Then the creator can
| still require people to agree to contractual terms before
| downloading the model from them, presumably including the
| usage limitations as well as an agreement not to redistribute
| the model to anyone else who does not also agree. Agreement
| can happen explicitly by pressing a button, or potentially
| implicitly just by downloading the model from them, if the
| terms are clearly disclosed beforehand. But if someone
| decides on their own (not induced by you in any way) to
| violate the contract by uploading it somewhere else, and you
| passively download it from there, then you may be in the
| clear.
| ronsor wrote:
| > Then the creator can still require people to agree to
| contractual terms before downloading the model from them,
| presumably including the usage limitations as well as an
| agreement not to redistribute the model to anyone else who
| does not also agree.
|
| I don't think it's possible to invent copyright-like
| rights.
| dist-epoch wrote:
| Visual Studio Community (and many other products) only allows
| "non-commercial" usage. Sounds like it limits what you can do
| with what you produce with it.
|
| At the end of the day, a license is a legal contract. If you
| agree that an image which you produce with some software will
| be GPL'ed, it's enforceable.
|
| As an example, see the Creative Commons license, ShareAlike
| clause:
|
| > If you remix, transform, or build upon the material, you must
| distribute your contributions under the same license as the
| original.
| blibble wrote:
| > At the end of the day, a license is a legal contract. If
| you agree that an image which you produce with some software
| will be GPL'ed, it's enforceable.
|
| you can put whatever you want in a contract, doesn't mean
| it's enforceable
| antonyt wrote:
| Do you have link for the VS Community terms you're
| describing? What I've found is directly contradictory: "Any
| individual developer can use Visual Studio Community to
| create their own free or paid apps." From
| https://visualstudio.microsoft.com/vs/community/
| dist-epoch wrote:
| Enterprise organizations are not allowed to use VS
| Community for commercial purposes:
|
| > _In enterprise organizations (meaning those with >250 PCs
| or >$1 Million US Dollars in annual revenue), no use is
| permitted beyond the open source, academic research, and
| classroom learning environment scenarios described above._
| kmeisthax wrote:
| So, there's a few different things interacting here that are a
| little confusing.
|
| First off, you have copyright law, which grants monopolies on
| the act of copying to the creators of the original. In order to
| legally make use of that work you need to either have
| permission to do so (a license), or you need to own a copy of
| the work that was made by someone with permission to make and
| sell copies (a sale). For the purposes of computer software,
| you will almost always get rights to the software through a
| license and _not_ a sale. In fact, there is an argument that
| usage of computer software requires a license and that a sale
| wouldn't be enough because you wouldn't have permission to
| load it into RAM[0].
|
| Licenses are, at least under US law, contracts. These are
| Turing-complete priestly rites written in a special register of
| English that legally bind people to do or not do certain
| things. A license can grant rights, or, confusingly, take them
| away. For example, you could write a license that takes away
| your fair use rights[1], and courts will actually respect that.
| So you can also have a license that says you're only allowed to
| use software for specific listed purposes but not others.
|
| In copyright you also have the notion of a derivative work.
| This was invented whole-cloth by the US Supreme Court, who
| needed a reason to prosecute someone for making a SSSniperWolf-
| tier abridgement[2] of someone else's George Washington
| biography. Normal copyright infringement is evidenced by
| substantial similarity and access: i.e. you saw the original,
| then you made something that's nearly identical, ergo
| infringement. The law regarding derivative works goes a step
| further and counts hypothetical works that an author _might_
| make - like sequels, translations, remakes, abridgements, and
| so on - as requiring permission in order to make. Without that
| permission, you don't own anything and your work has no right
| to exist.
|
| The GPL is the anticopyright "judo move", invented by a really
| ornery computer programmer that was angry about not being able
| to fix their printer drivers. It disclaims _almost_ the entire
| copyright monopoly, but it leaves behind one license
| restriction, called a "copyleft": any derivative work must be
| licensed under the GPL. So if you modify the software and
| distribute it, you have to distribute your changes under GPL
| terms, thus locking the software in the commons.
|
| Images made with software are not derivative works of the
| software, nor do they contain a substantially similar copy of
| the software in them. Ergo, the GPL copyleft does not trip. In
| fact, _even if it did trip_, your image is still not a
| derivative work of the software, so you don't lose ownership
| over the image because you didn't get permission. This also
| applies to model licenses on AI software, inasmuch as the AI
| companies don't own their training data[3].
|
| However, there's still something that licenses can take away:
| your right to use the software. If you use the model for
| "commercial" purposes - whatever those would be - you'd be in
| breach of the license. What happens next is also determined by
| the license. It could be written to take away your
| noncommercial rights if you breach the license, or it could
| preserve them. In either case, however, the primary enforcement
| mechanism would be a court of law, and courts usually award
| money damages. If particularly justified, they _could_ demand
| you destroy all copies of the software.
|
| If it went to SCOTUS (unlikely), they might even decide that
| images made by software are derivative works of the software
| after all, just to spite you. The Betamax case said that
| advertising a copying device with potentially infringing
| scenarios was fine as long as that device could be used in a
| non-infringing manner, but then the Grokster case said it was
| "inducement" and overturned it. Static, unchanging rules are
| ultimately a polite fiction, and the law can change behind your
| back if the people in power want or need it to. This is why you
| don't talk about the law in terms of something being legal or
| illegal, you talk about it in terms of risk.
|
| [0] Yes, this is a real argument that courts have actually
| made. Or at least the Ninth Circuit.
|
| The actual facts of the case are even more insane - basically a
| company trying to sue former employees for fixing its
| customers' computers. Imagine if Apple sued Louis Rossman for
| pirating macOS every time he turned on a customer laptop. The
| only reason why they _can't_ is because Congress actually
| created a special exemption for computer repair and made it
| part of the DMCA.
|
| [1] For example, one of the things you agree to when you buy
| Oracle database software is to give up your right to benchmark
| the software. I'm serious! The tech industry is evil and needs
| to burn down to the ground!
|
| [2] They took 300 pages worth of material from 12 books and
| copied it into a separate, 2 volume work.
|
| [3] Whether or not copyright on the training data images flows
| through to make generated images a derivative work is a
| separate legal question in active litigation.
| dragonwriter wrote:
| > Licenses are, at least under US law, contracts
|
| Not necessarily; gratuitous licenses are not contracts.
| Licenses which happen to also meet the requirements for
| contracts (or be embedded in agreements that do) are
| contracts or components of contracts, but that's not all
| licenses.
| SXX wrote:
| It doesn't have to be enforceable. This licensing model works
| exactly the same as Microsoft Windows licensing or WinRAR
| licensing. Lots and lots of people have pirated Windows or just
| bought some cheap keys off eBay, but none of them in their sane
| mind would use anything like that at their company.
|
| In the same way, you can easily violate any "non-commercial"
| clauses of models like this one as a private person or as some
| tiny startup, but a company that decides to use them for its
| business will more likely just go and pay.
|
| So it's possible to ignore the license, but the legal and
| financial risks are not worth it for businesses.
| helpmenotok wrote:
| Can this be used for porn?
| theodric wrote:
| If it can't, someone will massage it until it can. Porn, and
| probably also stock video to sell to YouTubers.
| citrusui wrote:
| Very unusual comment.
|
| I do not think so as the chance of constructing a fleshy
| eldritch horror is quite high.
| tstrimple wrote:
| > I do not think so as the chance of constructing a fleshy
| eldritch horror is quite high.
|
| There is a market for everything!
| johndevor wrote:
| How is that not the first question to ask? Porn has proven to
| be a fantastic litmus test of fast market penetration when it
| comes to new technologies.
| citrusui wrote:
| This is true. I was hoping my educated guess of the outcome
| would minimize the possibility of anyone attempting this.
| And yet, here we are - the only losing strategy in the
| technology sector is to not try at all.
| throwaway743 wrote:
| No pun intended?
| xanderlewis wrote:
| Market what?
| crtasm wrote:
| That didn't stop people using PornPen for images and it
| wouldn't stop them using something else for video.
| ben_w wrote:
| A surprisingly large number of people are into fleshy
| eldritch horrors.
| 1024core wrote:
| The question reminded me of this classic:
| https://www.youtube.com/watch?v=YRgNOyCnbqg
| Racing0461 wrote:
| Nope, all commercial models are severely gated.
| hbn wrote:
| Depends on whether trains, cars, and/or black cowboys tickle
| your fancy.
| boppo1 wrote:
| Whatever this is:
|
| https://i.4cdn.org/g/1700595378919869.png
| artursapek wrote:
| Porn will be one of the main use cases for this technology.
| Porn sites pioneered video streaming technologies back in the
| day, and drove a lot of the innovation there.
| SXX wrote:
| It's already been posted to the Unstable Diffusion Discord, so
| soon we'll know.
|
| After all, fine-tuning wouldn't take that long.
| christkv wrote:
| Looks like I'm still good for my bet with some friends that
| before 2028, a team of 5-10 people will create, on a shoestring
| budget, a blockbuster-style movie that today costs 100+ million
| USD, and we won't be able to tell the difference.
| CamperBob2 wrote:
| It'll happen, but I think you're early. 2038 for sure, unless
| something drastic happens to stop it (or is forced to happen.)
| accrual wrote:
| The first full-length AI generated movie will be an important
| milestone for sure, and will probably become a "required watch"
| for future AI history classes. I wonder what the Rotten
| Tomatoes page will look like.
| jjkaczor wrote:
| As per the reviews - it will be hard to say, as both positive
| and negative takes will be uploaded by ChatGPT bots (or its
| myriad descendants).
| qiine wrote:
| "I wonder what the Rotten Tomatoes page will look like"
|
| Surely it will be written using machine vision and llms !
| throwaway743 wrote:
| Definitely a big first for benchmarks. After that hyper
| personalized content/media generated on-demand
| ben_w wrote:
| I wouldn't bet either way.
|
| Back in the mid 90s to 2010 or so, graphical improvements were
| hailed as photorealistic only to be improved upon with each
| subsequent blockbuster game.
|
| I think we're in a similar phase with AI[0]: every new release
| in $category is better, gets hailed as super fantastic world
| changing, is improved upon in the subsequent Two Minute Papers
| video on $category, and the cycle repeats.
|
| [0] all of them: LLMs, image generators, cars, robots, voice
| recognition and synthesis, scientific research, ...
| Keyframe wrote:
| Your comment reminded me of this: https://www.reddit.com/r/ga
| ming/comments/ktyr1/unreal_yes_th...
|
| Many more examples, of course.
| ben_w wrote:
| Yup, that castle flyby, those reflections. I remember being
| mesmerised by the sequence as a teenager.
|
| Big quality improvement over Marathon 2 on a mid-90s Mac,
| which itself was a substantial boost over the Commodore 64
| and NES I'd been playing on before that.
| marcusverus wrote:
| I'm pumped for this future, but I'm not sure that I buy your
| optimistic timeline. If the history of AI has taught us
| anything, it is that the last 1% of progress is the hardest
| half. And given the unforgiving nature of the uncanny valley,
| the video produced by such a system will be worthless until it
| is damn-near perfect. That's a tall order!
| deckard1 wrote:
| I'm imagining more of an AI that takes a standard movie
| screenplay and a sidecar file, similar to a CSS file for the
| web and generates the movie. This sidecar file would contain
| the "director" of the movie, with camera angles, shot length
| and speed, color grading, etc. Don't like how the new Dune
| movie looks? Edit the stylesheet and make it your own.
| Personalized remixed blockbusters.
|
| On a more serious note, I don't think Roger Deakins has
| anything to worry about right now. Or maybe ever. We've been
| here before. DAWs opened up an entire world of audio production
| to people that could afford a laptop and some basic gear. But
| we certainly do not have a thousand Beatles out there. It still
| requires talent and effort.
| timeon wrote:
| > thousand Beatles out there. It still requires talent and
| effort
|
| As well as marketing.
| btbuildem wrote:
| In the video towards the bottom of the page, there are two birds
| (blue jays), but in the background there are two identical
| buildings (which look a lot like the CN Tower). CN Tower is the
| main landmark of Toronto, whose baseball team happens to be the
| Blue Jays. It's located near the main sportsball stadium
| downtown.
|
| I vaguely understand how text-to-image works, and so it makes
| sense that the vector space for "blue jays" would be near
| "toronto" or "cn tower". The improvements in scale and speed
| (image -> now video) are impressive, but given how incredibly
| able the image generation models are, they simultaneously feel
| crippled and limited by their lack of editing / iteration
| ability.
|
| Has anyone come across a solution where model can iterate (eg,
| with prompts like "move the bicycle to the left side of the
| photo")? It feels like we're close.
| appplication wrote:
| I don't spend a lot of time keeping up with the space, but I
| could have sworn I've seen a demo that allowed you to iterate
| in the way you're suggesting. Maybe someone else can link it.
| accrual wrote:
| It's not exactly like GP described (e.g. move bike to the
| left) but there is a more advanced SD technique called
| inpainting [0] that allows you to manually recompose parts of
| the image, e.g. to fix bad eyes and hands.
|
| [0] https://stable-diffusion-art.com/inpainting_basics/
| ssalka wrote:
| My guess is you're thinking of InstructPix2Pix[1], with
| prompts like "make the sky green" or "replace the fruits with
| cake"
|
| [1] https://github.com/timothybrooks/instruct-pix2pix
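|
| If anyone wants to try it, a minimal diffusers sketch (the
| checkpoint name is the one published on Hugging Face, if I
| recall correctly; the guidance values are just reasonable
| defaults, not anything official):
|
|     import torch
|     from diffusers import StableDiffusionInstructPix2PixPipeline
|     from diffusers.utils import load_image
|
|     pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
|         "timbrooks/instruct-pix2pix", torch_dtype=torch.float16
|     ).to("cuda")
|
|     image = load_image("photo.png")
|     # The edit instruction is plain text; the original image is the
|     # conditioning input.
|     edited = pipe(
|         "make the sky green",
|         image=image,
|         num_inference_steps=20,
|         image_guidance_scale=1.5,
|     ).images[0]
|     edited.save("edited.png")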
| appplication wrote:
| This is exactly it!
| tjoff wrote:
| Emu-Edit is the closest I've seen.
|
| https://emu-edit.metademolab.com/
|
| https://ai.meta.com/blog/emu-text-to-video-generation-
| image-...
| kshacker wrote:
| Assuming we can post links, you mean this video:
| https://youtu.be/G7mihAy691g?si=o2KCmR2Uh_97UQ0N
|
| Also, maybe you can't edit post facto, but when you give
| prompts, would you not be able to say: two blue jays but no CN
| tower
| FrozenTuna wrote:
| Yes, it's called a negative prompt. Idk if txt2video has it,
| but both LLMs and Stable Diffusion have it so I'd assume it's
| good to go.
| nottheengineer wrote:
| Haven't implemented negative prompts yet, but from what I
| can tell it's as simple as subtracting from the prompt in
| embedding space.
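|
| For reference, the way it's usually wired into classifier-free
| guidance: the negative prompt takes the place of the empty
| unconditional prompt, which effectively pushes the prediction
| away from it. A rough sketch (variable names are mine; `unet`
| stands in for the real model):
|
|     import torch
|
|     def guided_noise_pred(unet, latents, t, prompt_emb, negative_emb,
|                           scale=7.5):
|         # One pass conditioned on the prompt, one on the negative
|         # prompt (which replaces the usual empty/unconditional prompt).
|         cond = unet(latents, t, encoder_hidden_states=prompt_emb).sample
|         uncond = unet(latents, t, encoder_hidden_states=negative_emb).sample
|         # Move away from the negative direction, toward the positive one.
|         return uncond + scale * (cond - uncond)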
| FrozenTuna wrote:
| Not _exactly_ what you're asking for, but AnimateDiff has
| introduced creating gifs to SD. Still takes quite a bit of
| tweaking IME.
| xianshou wrote:
| Emu edit should be exactly what you're looking for:
| https://ai.meta.com/blog/emu-text-to-video-generation-image-...
| smcleod wrote:
| It doesn't look like the code for that is available anywhere
| though?
| TacticalCoder wrote:
| > Has anyone come across a solution where model can iterate
| (eg, with prompts like "move the bicycle to the left side of
| the photo")? It feels like we're close.
|
| I feel like we're close too, but for another reason.
|
| For although I love SD and these video examples are great...
| It's a flawed method: they never get lighting correctly and
| there are many incoherent things just about everywhere. Any 3D
| artist or photographer can immediately spot that.
|
| However I'm willing to bet that we'll soon have something
| _much_ better: you 'll describe something and you'll get a full
| 3D scene, with 3D models, source of lights set up, etc.
|
| And the scene shall be sent into Blender and you'll click on a
| button and have an actual rendering made by Blender, with
| correct lighting.
|
| Wanna move that bicycle? Move it in the 3D scene exactly where
| you want.
|
| That is coming.
|
| And for audio it's the same: why generate an audio file when
| soon models shall be able to generate the various tracks, with
| all the instruments and whatnots, allowing to create the audio
| file?
|
| That is coming too.
| p1esk wrote:
| Are you working on all that?
| cptaj wrote:
| Probably not. But there does seem to be a clear path to it.
|
| The main issue is going to be having the right dataset. You
| basically need to record user actions in something like
| blender (ie: moving a model of a bike to the left of a
| scene), match it to a text description of the action (ie;
| "move bike to the left") and match those to before/after
| snapshots of the resulting file format.
|
| You need a whole metric fuckton of these.
|
| After that, you train your model to produce those 3d scene
| files instead of image bitmaps.
|
| You can do this for a lot of other tasks. These general
| purpose models can learn anything that you can usefully
| represent in data.
|
| I can imagine AGI being, at least in part, a large set of
| these purpose trained models. Heck, maybe our brains work
| this way. When we learn to throw a ball, we train a model
| in a subset of our brain to do just this and then this
| model is called on by our general consciousness when
| needed.
|
| Sorry, I'm just rambling here but its very exciting stuff.
| sterlind wrote:
| The hard part of AGI is the self-training and few
| examples. Your parents didn't attach strings to your body
| and puppeteer you through a few hundred thousand games of
| baseball. And the humans that invented baseball had zero
| training data to go on.
| atentaten wrote:
| Whats your reasoning for feeling that we're close?
| cptaj wrote:
| We do it for text, audio and bitmapped images. A 3D scene
| file format is no different, you could train a model to
| output a blender file format instead of a bitmap.
|
| It can learn anything you have data for.
|
| Heck, we do it with geospatial data already, generating
| segmentation vectors. Why not 3D?
| boppo1 wrote:
| >3D scene file format is no different
|
| Not in theory, but the level of complexity is way higher
| and the amount of data available is much smaller.
|
| Compare bitmaps to this: https://fossies.org/linux/blende
| r/doc/blender_file_format/my...
| kaibee wrote:
| Also the level of fault tolerance... if your pixels are a
| bit blurry, chances are no one notices at a high enough
| resolution. If your json is a bit blurry you have
| problems.
| jncfhnb wrote:
| Text, audio, and bitmapped images are data. Numbers and
| tokens.
|
| A 3D scene is vastly more complex, and the way you
| consume it is tangential to the rendering of it we use to
| interpret. It is a collection of arbitrary data
| structures.
|
| We'll need a new approach for this kind of problem
| dragonwriter wrote:
| > Text, audio, and bitmapped images are data. Numbers and
| tokens.
|
| > A 3D scene is vastly more complex
|
| 3D scenes, in fact, are also data, numbers and tokens.
| (Well, numbers, but so are tokens.)
| dragonwriter wrote:
| We do it for 3D, too.
|
| https://guytevet.github.io/mdm-page/
| bob1029 wrote:
| > However I'm willing to bet that we'll soon have something
| much better: you'll describe something and you'll get a full
| 3D scene, with 3D models, source of lights set up, etc.
|
| I agree with this philosophy - Teach the AI to work with the
| same tools the human does. We already have a lot of human
| experts to refer to. Training material is everywhere.
|
| There isn't a "text-to-video" expert we can query to help us
| refine the capabilities around SD. It's a one-shot, Jupiter-
| scale model with incomprehensible inertia. Contrast this with
| an expert-tuned model (i.e. natural language instructions)
| that can be nuanced precisely and to the point of
| imperceptibility with a single sentence.
|
| The other cool thing about the "use existing tools" path is
| that if the AI fails part way through, it's actually possible
| for a human operator to step in and attempt recovery.
| epr wrote:
| > you'll describe something and you'll get a full 3D scene,
| with 3D models, source of lights set up, etc.
|
| I'm always confused why I don't hear more about projects
| going in this direction. Controlnets are great, but there's
| still quite a lot of hallucination and other tiny mistakes
| that a skilled human would never make.
| boppo1 wrote:
| Blender files are dramatically more complex than any image
| format, which are basically all just 2D arrays of 3-value
| vectors. The blender filetype uses a weird DNA/RNA struct
| system that would probably require its own training run.
|
| More on the Blender file format: https://fossies.org/linux/
| blender/doc/blender_file_format/my...
| mikepurvis wrote:
| But surely you wouldn't try to emit that format directly,
| but rather some higher level scene description? Or even
| just a set of instructions for how to manipulate the UI
| to create the imagined scene?
| BirdieNZ wrote:
| I've seen this but producing Python scripts that you run
| in Blender, e.g.
| https://www.youtube.com/watch?v=x60zHw_z4NM (but I saw
| something marginally more impressive, not sure where
| though!)
| Keyframe wrote:
| Scene layouts, models and their attributes are a result
| of user input (ok and sometimes program output). One
| avenue to take there would be to train on input expecting
| an output. Like teaching a model to draw instead of
| generate images.. which in a sense we already did by
| broadly painting out silhouettes and then rendering
| details.
| guyomes wrote:
| Voxel files could be a simpler step for 3D images.
| bozhark wrote:
| One was on the front page the other day, I'll search for a
| link
| jowday wrote:
| There's a lot of issues with it, but perhaps the biggest is
| that there aren't just troves of easily scrapable and
| digestible 3D models lying around on the internet to train
| on top of like we have with text, images, and video.
|
| Almost all of the generative 3D models you see are actually
| generative image models that essentially (very crude
| simplification) perform something like photogrammetry to
| generate a 3D model - 'does this 3D object, rendered from
| 25 different views, match the text prompt as evaluated by
| this model trained on text-image pairs'?
|
| This is a shitty way to generate 3D models, and it's why
| they almost all look kind of malformed.
| sterlind wrote:
| If reinforcement learning were farther along, you could
| have it learn to reproduce scenes as 3D models. Each
| episode's task is to mimic an image, each step is a
| command mutating the scene (adding a polygon, or rotating
| the camera, etc.), and the reward signal is image
| similarity. You can even start by training it with
| synthetic data: generate small random scenes and make
| them increasingly sophisticated, then later switch over
| to trying to mimic images.
|
| You wouldn't need any models to learn from. But my
| intuition is that RL is still quite weak, and that the
| model would flounder after learning to mimic background
| color and placing a few spheres.
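|
| The loop being described, as a purely hypothetical sketch (every
| class and action format here is made up for illustration):
|
|     import numpy as np
|
|     class SceneEnv:
|         """Episode: mimic a target image by issuing scene-editing
|         commands; reward is image similarity after each render."""
|         def __init__(self, target_image, renderer):
|             self.target = target_image    # (H, W, 3) array to mimic
|             self.renderer = renderer      # hypothetical scene renderer
|             self.scene = []               # list of commands so far
|
|         def step(self, action):
|             self.scene.append(action)     # e.g. "add sphere at (x, y, z)"
|             rendered = self.renderer.render(self.scene)
|             reward = -np.mean((rendered - self.target) ** 2)
|             return rendered, reward
|
| Training would then be ordinary policy-gradient RL over this
| environment, starting from tiny random synthetic scenes and
| scaling up, as described above.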
| skdotdan wrote:
| Deepmind tried something similar in 2018
| https://deepmind.google/discover/blog/learning-to-write-
| prog...
| dragonwriter wrote:
| > I'm always confused why I don't hear more about projects
| going in this direction.
|
| Probably because they aren't as advanced and the demos
| aren't as impressive to nontechnical audiences who don't
| understand the implications: there's lots of work on text-
| to-3d-model generation, and even plugins for some stable
| diffusion UIs (e.g., MotionDiff for ComfyUI).
| a_bouncing_bean wrote:
| Thanks! This is exactly what I have been thinking, only
| you've expressed it much more eloquently than I would be
| able to.
| internet101010 wrote:
| I am guessing it will be similar to inpainting in normal
| stable diffusion, which is easy when using the workflow
| feature in the InvokeAI UI.
| coldtea wrote:
| > _For although I love SD and these video examples are
| great... It's a flawed method: they never get lighting
| correctly and there are many incoherent things just about
| everywhere. Any 3D artist or photographer can immediately
| spot that._
|
| The question is whether the 99% of the audience would even
| care...
| sheepscreek wrote:
| Excellent point.
|
| Perhaps a more computationally expensive but better looking
| method will be to pull all objects in the scene from a 3D
| model library, then programmatically set the scene and render
| it.
| Kuinox wrote:
| This isn't coming, it's already here.
| https://github.com/gsgen3d/gsgen Yes, it's just 3D models for
| now, but it can do whole-scene generation; it's just not
| great at it yet. The tech is there but just needs to improve.
| treesciencebot wrote:
| Have you seen fal.ai/dynamic where you can perform image to
| image synthesis (basically editing an existing image with the
| help of diffusion process) using LCMs to provide a real time
| UI?
| filterfiber wrote:
| > Has anyone come across a solution where model can iterate
| (eg, with prompts like "move the bicycle to the left side of
| the photo")? It feels like we're close.
|
| Emu can do that.
|
| The bluejay/toronto thing may be addressable later (I suspect
| via more detailed annotations a la dalle3) - these current
| video models are highly focused on figuring out temporal
| coherence
| JoshTriplett wrote:
| I also wonder if the model takes capitalization into account.
| Capitalized "Blue Jays" seems more likely to reference the
| sports team; the birds would be lowercase.
| psunavy03 wrote:
| > sportsball
|
| This is not the flex you think it is. You don't have to like
| sports, but snarking on people who do doesn't make you
| intellectual, it just makes you come across as a douchebag, no
| different than a sports fan making fun of "D&D nerds" or
| something.
| chaps wrote:
| Ah, Mr. Kettle, I see you've met my friend, Mr. Pot!
| Zetaphor wrote:
| This has become a colloquial term for describing all sports,
| not the insult you're perceiving it to be.
|
| Rather than projecting your own hangups and calling people
| names, try instead assuming that they're not trying to offend
| you personally and are just using common vernacular.
| amoshebb wrote:
| I wonder what other odd connections are made due to city-name
| almost certainly being the most common word next to sportsball-
| name.
|
| Do the parameters think that Jazz musicians are mormon? Padres
| often surf? Wizards like the Lincoln Memorial?
| ProfessorZoom wrote:
| that sounds like v0 by vercel, you can iterate just like you
| asked, to combine that type of iteration with video would be
| really awesome
| dinvlad wrote:
| Seems relatively unimpressive tbh - it's not really a video, and
| we've seen this kind of thing for a few months now
| accrual wrote:
| It seems like the breakthrough is that the video generating
| method is now baked into the model and generator. I've seen
| several fairly impressive AI animations as well, but until now,
| I assumed they were tediously cobbled together by hacking on
| the still-image SD models.
| youssefabdelm wrote:
| Can't wait for these things to not suck
| accrual wrote:
| It's definitely pretty impressive already. If there could be
| some kind of "final pass" to remove the slightly glitchy
| generative artifacts, these would look completely passable for
| simple .gif/.webm header images. Especially if they could be made
| to loop smoothly a la Snapchat's bounce filter.
| accrual wrote:
| Fascinating leap forward.
|
| It makes me think of the difference between ancestral and non-
| ancestral samplers, e.g. Euler vs Euler Ancestral. With Euler,
| the output is somewhat deterministic and doesn't vary with
| increasing sampling steps, but with Ancestral, noise is added to
| each step which creates more variety but is more
| random/stochastic.
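|
| For anyone curious, the difference is essentially one line in the
| sampler update. A stripped-down sketch of the two steps (simplified
| from the k-diffusion formulation):
|
|     import torch
|
|     def euler_step(x, denoised, sigma, sigma_next):
|         d = (x - denoised) / sigma            # estimated derivative
|         return x + d * (sigma_next - sigma)   # deterministic update
|
|     def euler_ancestral_step(x, denoised, sigma, sigma_next):
|         # Split the step into a deterministic part plus fresh noise.
|         sigma_up = min(sigma_next,
|                        (sigma_next**2 * (sigma**2 - sigma_next**2)
|                         / sigma**2) ** 0.5)
|         sigma_down = (sigma_next**2 - sigma_up**2) ** 0.5
|         d = (x - denoised) / sigma
|         x = x + d * (sigma_down - sigma)
|         return x + torch.randn_like(x) * sigma_up  # extra noise each step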
|
| I assume to create video, the sampler needs to lean heavily on
| the previous frame while injecting some kind of sub-prompt, like
| rotate <object> to the left by 5 degrees, etc. I like the phrase
| another commenter used, "temporal consistency".
|
| Edit: Indeed the special sauce is "temporal layers". [0]
|
| > Recently, latent diffusion models trained for 2D image
| synthesis have been turned into generative video models by
| inserting temporal layers and finetuning them on small, high-
| quality video datasets
|
| [0] https://stability.ai/research/stable-video-diffusion-
| scaling...
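|
| Conceptually, the pretrained spatial layers keep operating on each
| frame independently, and new layers that only mix information along
| the time axis are interleaved between them and finetuned. A
| schematic sketch of one such layer (not the actual SVD
| architecture):
|
|     import torch
|     import torch.nn as nn
|
|     class TemporalMixer(nn.Module):
|         """Toy temporal layer: a 1D conv over the frame axis, applied
|         independently at every spatial position."""
|         def __init__(self, channels):
|             super().__init__()
|             self.conv = nn.Conv1d(channels, channels, kernel_size=3,
|                                   padding=1)
|
|         def forward(self, x, num_frames):
|             bt, c, h, w = x.shape            # (batch*frames, C, H, W)
|             b = bt // num_frames
|             x = x.reshape(b, num_frames, c, h, w)
|             x = x.permute(0, 3, 4, 2, 1).reshape(b * h * w, c, num_frames)
|             x = self.conv(x)                 # mix across frames only
|             x = x.reshape(b, h, w, c, num_frames).permute(0, 4, 3, 1, 2)
|             return x.reshape(bt, c, h, w)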
| adventured wrote:
| The hardest problem the Stable Diffusion community has dealt
| with in terms of quality has been in the video space, largely
| in relation to the consistency between frames. It's probably
| the most commonly discussed problem for example on
| r/stablediffusion. Temporal consistency is the popular term for
| that.
|
| So this example was posted an hour ago, and it's jumping all
| over the place frame to frame (somewhat weak temporal
| consistency). The author appears to have used pretty straight-
| forward text2img + Animatediff:
|
| https://www.reddit.com/r/StableDiffusion/comments/180no09/on...
|
| Fixing that frame to frame jitter related to animation is
| probably the most in-demand thing around Stable Diffusion right
| now.
|
| Animatediff motion painting made a splash the other day:
|
| https://www.reddit.com/r/StableDiffusion/comments/17xnqn7/ro...
|
| It's definitely an exciting time around SD + animation. You can
| see how close it is to reaching the next level of generation.
| torginus wrote:
| I admit I'm ignorant about these models' inner workings, but I
| don't understand why text is the chosen input format for these
| models.
|
| It was the same for image generation, where one needed to produce
| text prompts to create the image, and it took stuff like img2img
| and ControlNet to allow things like controlling poses and
| inpainting, or having multiple prompts with masks controlling
| which part of the image is influenced by which prompt.
| gorbypark wrote:
| According to the GitHub repo this is an "image-to-video model".
| They tease an upcoming "text to video" interface on the
| linked landing page, though. My guess is that interface will
| use a text-to-image model and then feed that into the image-to-
| video model.
| pizzafeelsright wrote:
| Imago Deo? The Word is what is spoken when we create.
|
| The input eventually becomes meanings mapped to reality.
| awongh wrote:
| It makes sense that they had to take out all of the cuts and
| fades from the training data to improve results.
|
| In the background section of the research paper they mention
| "temporal convolution layers", can anyone explain what that is?
| What sort of training data is the input to represent temporal
| states between images that make up a video? Or does that mean
| something else?
| machinekob wrote:
| I would assume it's something similar to joining multiple
| frames/attention maps in the channel dimension and then shifting
| values around so the convolution has access to some channels from
| other video frames.
|
| I was working on a similar idea a few years ago using this paper
| as a reference, and it was working extremely well for consistency,
| also helping with flicker. https://arxiv.org/abs/1811.08383
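|
| That paper is the Temporal Shift Module (TSM): before each 2D
| convolution, a slice of the channels is shifted one frame forward
| in time and another slice one frame backward, so the convolution
| sees a bit of its temporal neighbours essentially for free. A
| minimal sketch of the shift:
|
|     import torch
|
|     def temporal_shift(x, num_frames, fold_div=8):
|         """x: (batch*frames, channels, H, W). Shift 1/fold_div of the
|         channels forward in time, 1/fold_div backward, keep the rest."""
|         bt, c, h, w = x.shape
|         b = bt // num_frames
|         x = x.reshape(b, num_frames, c, h, w)
|         fold = c // fold_div
|         out = torch.zeros_like(x)
|         out[:, 1:, :fold] = x[:, :-1, :fold]             # frame t sees t-1
|         out[:, :-1, fold:2 * fold] = x[:, 1:, fold:2 * fold]  # t sees t+1
|         out[:, :, 2 * fold:] = x[:, :, 2 * fold:]        # untouched channels
|         return out.reshape(bt, c, h, w)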
| spaceman_2020 wrote:
| A seemingly off topic question, but with enough compute and
| optimization, could you eventually simulate "reality"?
|
| Like, at this point, what are the technical counters to the
| assertion that our world is a simulation?
| refulgentis wrote:
| A little too freshman's-first-hit-off-a-bong for me. There are,
| of course, substantial differences between video and reality.
|
| Let's steel-man -- you mean 3D VR. Let's stipulate there's a
| headset today that renders 3D visually indistinguishable from
| reality. We're still short the other 4 senses
|
| Much like faith, there's always a way to sort of escape the
| traps here and say "can you PROVE this is base reality"
|
| The general technical argument against "brain in a vat being
| stimulated" would be the computation expense of doing such, but
| you can also write that off with the equivalent of foveated
| rendering but for all senses / entities
| 2-718-281-828 wrote:
| > Like, at this point, what are the technical counters to the
| assertion that our world is a simulation?
|
| How about: this theory is neither verifiable nor falsifiable.
| vidarh wrote:
| The _general concept_ is not falsifiable, but many variations
| might be, or their inverse might be. E.g. the theory that we
| are _not_ in a simulation would in general be falsifiable by
| finding an "escape" from a simulation and so showing we are
| in one (but not finding an escape of course tells us
| nothing).
|
| It's not a very useful endeavour to worry about, but it can
| be fun to speculate about what might give rise to testable
| hypotheses and what that might tell us about the world.
| tracerbulletx wrote:
| The brain does simulate reality in the sense that what you
| experience isn't direct sensory input, but more like a dream
| being generated to predict what it thinks is happening based on
| conflicting and imperfect sensory input.
| accrual wrote:
| To illustrate your point, an easily accessible example of
| this is how the second hand on clocks appears to freeze for
| longer than a second when you quickly glance at it. The brain
| is predicting/interpolating what it expects to see, creating
| the illusion of a delay.
|
| https://www.popsci.com/how-time-seems-to-stop/
| danielbln wrote:
| Example: vision comes in from the optic nerve warped and
| upside down and as small patches of high resolution captured
| by the eyes zigzagging across the visual field (saccades),
| all of which is assembled and integrated into a coherent
| field of vision by our trusty old grey blob.
| beepbooptheory wrote:
| Why does it matter? Not trying to dismiss, but truly, what
| would it mean to you if you could somehow verify the
| "simulation"?
|
| If it _would_ mean something drastic to you, I would be very
| curious to hear your preexisting existential beliefs
| /commitments.
|
| People say this sometimes and its kind of slowly revealed to me
| that its just a new kind of geocentrism: its not _just_ a
| simulation people have in mind, but one where earth /humans are
| centered, and the rest of the universe is just for the benefit
| of "our" part of the simulation.
|
| Which is a fine theory I guess, but is also just essentially
| wanting God to exist with extra steps!
| KineticLensman wrote:
| (disclaimer: worked in the sim industry for 25 years, still
| active in terms of physics-based rendering).
|
| First off, there are zero technical proofs that we are in a
| sim, just a number of philosophical arguments.
|
| In practical terms, we cannot yet simulate a single human cell
| at the molecular level, given the massive number of
| interactions that occur every microsecond. Simulating our
| entire universe is not technically possible within the lifetime
| of our universe, according to our current understanding of
| computation and physics. You either have to assume that 'the
| sim' is very narrowly focussed in scope and fidelity, and / or
| that the outer universe that hosts 'the sim' has laws of
| physics that are essentially magic from our perspective. In
| which case the simulation hypothesis is essentially a religious
| argument, where the creator typed 'let there be light' into his
| computer. If there isn't such a creator, the sim hypothesis
| 'merely' suggests that our universe, at its lowest levels,
| looks somewhat computational, which is an entirely different
| argument.
| freedomben wrote:
| I don't think you would need to simulate the entire universe,
| just enough of it that the consciousness receiving sense data
| can't encounter any missing info or "glitches" in the
| metaphorical matrix. Still hard of course, but substantially
| less compute intensive than every molecule in the universe.
| gcanyon wrote:
| And if you're in charge of the simulation, you get to
| decide how many "consciousnesses" there are, constraining
| them to be within your available compute. Maybe that's ~8
| billion -- maybe it's 1. Yeah, I'm feeling pretty
| Boltzmann-ish right now...
| KineticLensman wrote:
| > but substantially less compute intensive than every
| molecule in the universe
|
| Very true, but to me this view of the universe and one's
| existence within it as a sort of second-rate solipsist
| bodge isn't a satisfyingly profound answer to the question
| of life the universe and everything.
|
| Although put like that it explains quite a lot.
|
| [Edit] There is also a sense in which the sim-as-a-
| focussed-mini-universe view is even less falsifiable,
| because sim proponents address any doubt about the sim by
| moving the goal posts to accommodate what they claim is
| actually achievable by the putative creator/hacker on
| Planet Tharg or similar.
| kaashif wrote:
| And you don't have to simulate it in real time, maybe 1
| second here takes years or centuries to simulate outside
| the simulation. It's not like we'd have any way to tell.
| hackerlight wrote:
| These are all open questions in philosophy of mind.
| Nobody knows what causes consciousness/qualia so nobody
| knows if it's substrate dependent or not and therefore
| nobody knows if it can be simulated in a computer, or if
| it can nobody knows what type of computer is required for
| consciousness to be a property of the resulting
| simulation.
| SXX wrote:
| Actually it was already done by sentdex with GAN Theft Auto:
|
| https://youtu.be/udPY5rQVoW0
|
| To an extent...
|
| PS: Video is 2 years old, but still really impressive.
| epiccoleman wrote:
| This is really, really cool. A few months ago I was playing with
| some of the "video" generation models on Replicate, and I got
| some really neat results[1], but it was very clear that the
| resulting videos were made from prompting each "frame" with the
| previous one. This looks like it can actually figure out how to
| make something that has a higher level context to it.
|
| It's crazy to see this level of progress in just a bit over half
| a year.
|
| [1]: https://epiccoleman.com/posts/2023-03-05-deforum-stable-
| diff...
| richthekid wrote:
| This is gonna change everything
| jetsetk wrote:
| Is it? How so?
| Chabsff wrote:
| It's really not.
|
| Don't get me wrong, this is insanely cool, but it's still a
| long way from good enough to be truly disruptive.
| echelon wrote:
| One year.
|
| All of Hollywood falls.
| Chabsff wrote:
| No offense, but this is absolutely delusional.
|
| As long as people can "clock" content generated from these
| models, it will be treated by consumers as low-effort
| drivel, no matter how much actual artistic effort goes in
| the exercise. Only once these systems push through the
| threshold of being indistinguishable from artistry will all
| hell break loose, and we are still very far from that.
|
| Paint-by-numbers low-effort market-driven stuff will take a
| hit for sure, but that's only a portion of the market, and
| frankly not one I'm going to be missing.
| ben_w wrote:
| Very far, yes, but also in a fast moving field.
|
| CGI in films used to be obvious all the time no matter
| how good the artists using it, now it's everywhere and
| only noticeable when that's the point; the gap from Tron
| to Fellowship of the Ring was 19.5 years.
|
| My guess is the analogy here puts the quality of existing
| genAI somewhere near the equivalent of early TV CGI,
| given its use in one of the Marvel title sequences etc.,
| but it is just an analogy and there's no guarantees of
| anything either way.
| r3d0c wrote:
| Something unrelated improved over time, so something else
| unrelated will also improve to whatever goal you've set
| in your mind?
|
| Weird logic circles y'all keep making to justify your
| beliefs. I mean, the world is very easy, like you just
| described, if you completely strip all nuance and
| complexity.
|
| People used to believe at the start of the space race that we'd
| have Mars colonies by now, because they looked at the rate of
| technological advancement from 1910 to 1970, from the first
| flight to landing on the Moon; yet that didn't happen, because
| everything doesn't follow the same repeatable patterns.
| pessimizer wrote:
| People also believed that recorded music would destroy
| the player piano industry and the market for piano rolls.
| Just because recorded music is cheaper doesn't mean that
| the audience will be willing to give up the actual sound
| of a piano being played.
| ben_w wrote:
| First, lotta artists already upset with genAI and the
| impact it has.
|
| Second, I _literally_ wrote the same point you seem to
| think is a gotcha:
|
| > it is just an analogy and there's no guarantees of
| anything either way
| woeirua wrote:
| Every time something like this is released someone comments
| how it's going to blow up legacy studios. The only way you
| can possibly think that is that: 1-the studios themselves
| will somehow be prevented from using this tech themselves,
| and 2-that somehow customers will suddenly become amenable
| to low grade garbage movies. Hollywood already produces
| thousands of low grade B or C movies every year that cost
| fractions of what it costs to make a blockbuster. Those
| movies make almost nothing at the box office.
|
| If anything, a deluge of cheap AI generated movies is going
| to lead to a flight to quality. The big studios will be
| more powerful because they will reap the productivity gains
| and use traditional techniques to smooth out the rough
| edges.
| underscoring wrote:
| > 2-that somehow customers will suddenly become amenable
| to low grade garbage movies
|
| People have been amenable to low grade garbage movies for
| a long, long time. See Adam Sandler's back catalog.
| evrenesat wrote:
| In a few years' time, teenagers will be consuming shows and
| films made by their peers, not by streaming providers.
| They'll forgive and perhaps even appreciate the technical
| imperfections for the sake of uncensored, original content
| that fits perfectly with their cultural identity.
|
| Actually, when processing power catches up, I'm expecting a
| movie engine with well-defined characters, scenes, entities,
| etc., so people will be able to share mostly text-based
| scenarios to watch on their hardware players.
| Chabsff wrote:
| Similar to how all the kids today only play itch.io games
| thanks to Unity and Unreal dramatically lowering the bar of
| entry into game development.
|
| Oh wait... No.
|
| All it has done is create an environment where indy games
| are now assumed to be trash unless proven otherwise, making
| getting traction as a small developer orders of magnitude
| harder than it has ever been because their efforts are
| drowning in a sea of mediocrity.
|
| That same thing is already starting to happen on youtube
| with AI content, and there's no reason for me to expect
| this going any other way.
| evrenesat wrote:
| It took ~2 years for my 10-year-old daughter to get bored
| and give up the shitty user-made Roblox games and start
| playing on Switch, Steam, or PS4.
| nwienert wrote:
| They do that now (I forget the name, but there's a popular one my
| niece uses to make animated comics, and others do similar
| things in Minecraft etc.), and have been doing that since
| forever - nearly 30 years ago my friends and I were
| scribbling comic panels into our notebooks and sharing them
| around class.
| nbzso wrote:
| Model chain:
|
| Instance One: Act as a top tier Hollywood scenarist, use the
| public available data for emotional sentiment to generate a
| storyline, apply the well known archetypes from proven
| blockbusters for character development. Move to instance two.
|
| Instance Two: Act as top tier producer. {insert generated
| prompt}. Move to instance three.
|
| Instance Three: Generate Meta-humans and load personality traits.
| Move to instance four.
|
| Instance Four: Act as a top tier director.{insert generated
| prompt}. Move to instance five.
|
| Instance Five: Act as a top tier editor.{insert generated
| prompt}. Move to instance six.
|
| Instance Six: Act as a top tier marketing and advertisement
| agency.{insert generated prompt}. Move to instance seven.
|
| Instance Seven: Act as a top tier accountant, generate an
| interface to real-time ROI data and give me the results on an
| optimized timeline into my AI induced dream.
|
| Personal GPT: Buy some stocks, diversify my portfolio, stock up
| on synthetic meat, bug-coke and Soma. Call my mom and tell her I
| made it.
| aliljet wrote:
| I've been following this space very very closely and the killer
| feature would be to be able to generate these full featured
| videos for longer than a few seconds with consistently shaped
| "characters" (e.g., flowers, and grass, and houses, and cars,
| actors, etc.). Right now, it's not clear to me that this is
| achieving that objective. This feels like it could be great to
| create short GIFs, but at what cost?
|
| To be clear, this remains wicked, wicked, wicked exciting.
| speedgoose wrote:
| Has anyone managed to run the thing? I got the streamlit demo to
| start after fighting with pytorch, mamba, and pip for half an
| hour, but the demo runs out of GPU memory after a little while. I
| have 24GB on GPU on the machine I used, does it need more?
| mkaic wrote:
| Have heard from others attempting it that it needs 40GB, so
| basically an A100/A6000/H100 or other large card. Or an Apple
| Silicon Mac with a bunch of unified memory, I guess.
| mlboss wrote:
| Give it a week.
| speedgoose wrote:
| Alright thanks for the information. I will try to justify
| using one A100 for my "very important" research activities.
| skonteam wrote:
| Yeah, I've got a 24GB 4090; try to reduce the number of frames
| decoded at a time to something like 4 or 8. Although, keep in mind
| it maxes out the 24GB and spills over to system RAM (with the
| latest NVIDIA drivers).
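|
| Decoding all of the latent frames through the VAE at once is a
| big part of the memory use; decoding a few frames at a time trades
| speed for memory. Conceptually (the real scripts expose this as a
| parameter; the names here are placeholders):
|
|     import torch
|
|     def decode_in_chunks(vae_decode, latents, chunk_size=4):
|         """latents: (num_frames, C, h, w) video latents. Decode a few
|         frames at a time so peak VRAM stays bounded."""
|         frames = []
|         for i in range(0, latents.shape[0], chunk_size):
|             with torch.no_grad():
|                 frames.append(vae_decode(latents[i:i + chunk_size]))
|         return torch.cat(frames, dim=0)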
| speedgoose wrote:
| Oh yes it works, thanks!
| nwoli wrote:
| Is the checkpoint default fp16 or fp32?
| neaumusic wrote:
| It's funny that we still don't really have video wallpapers on most
| devices (I'm only aware of Wallpaper Engine on Windows)
| pcj-github wrote:
| Soon the hollywood strike won't even matter, won't need any of
| those jobs. Entire west coast economy obliterated.
| jonplackett wrote:
| Is this available in the stability API any time soon?
| chrononaut wrote:
| Much like in static images, the subtle unintended imperfections
| are quite interesting to observe.
|
| For example, the man in the cowboy hat seems like he is almost
| gagging. In the train video the tracks seem to be too wide while
| the train ice skates across them.
___________________________________________________________________
(page generated 2023-11-21 23:00 UTC)