[HN Gopher] Let's talk about animation quality
___________________________________________________________________
Let's talk about animation quality
Author : ibobev
Score : 179 points
Date : 2024-10-10 10:38 UTC (12 hours ago)
(HTM) web link (theorangeduck.com)
(TXT) w3m dump (theorangeduck.com)
| Scene_Cast2 wrote:
| Something I keep seeing is that modern ML makes for some really
| cool and impressive tech demos in the creative field, but is not
| productionizable due to a lack of creative control.
|
| Namely, anything generating music / video / images - tweaking the
| output is not workable.
|
| Some notable exceptions are when you need stock art for a blog
| post (no need for creative control), Adobe's recolorization tool
| (lots of control built in), and a couple more things here and
| there.
|
| I don't know how it is for 3D assets or rigged-model animation
| (as per the article); I've never worked with them. I'd be
| curious to hear about successful applications - maybe there's
| a pattern.
| jncfhnb wrote:
| Probably accurate for videos and music. Videos because there
| are going to be just too many things to correct for it to be
| time efficient. Music because music just needs to be excellent
| or it's trash. That is for high-quality art, of course. You
| can ship filler garbage for lots of things.
|
| 2D art has a lot of strong tooling though. If you're actually
| trying to use AI art tooling, you won't be just dropping a
| prompt and hoping for the best. You will be using a workflow
| graph and carefully iterating on the same image with controlled
| seeds, then inpainting specific areas.
|
| We are at an awkward inflection point where we have great
| tooling for the last generation of models like SDXL, but
| haven't really made it ready for the current gen of models
| (Flux), which are substantially better. But it's basically an
| inevitability on the order of months.
| jsheard wrote:
| Even with the relatively strong tooling for 2D art it's still
| very difficult to push the generated image in novel
| directions though, hence the heavy reliance on LoRAs trained
| on prior examples. There doesn't seem to be an answer to "how
| would you create [artist's] style with AI" that doesn't
| require [artist] to already exist so you can throw their
| life's work into a blender and make a model that copies it.
|
| I've found this to be observable in practice - I follow
| hundreds of artists who I could reliably name by seeing a new
| example of their work, even if they're only amateurs, but I
| find that AI art just blurs together into a samey mush with
| nothing to distinguish the person at the wheel from anyone
| else using the same models. The tool speaks much louder than
| the person supposedly directing it, which isn't the case with,
| say, Photoshop, Clip Studio, or Blender.
| jncfhnb wrote:
| Shrug. That's a very different goal. Yes, if you want to
| leverage a different style, your best bet is to train a LoRA
| on a dozen images in that style.
|
| Art made by unskilled randos is always going to blur
| together. But the question I feel we're discussing here is
| whether a dedicated artist can use these tools for production-
| grade content. And the answer is yes.
| RunSet wrote:
| https://www.kiplingsociety.co.uk/poem/poems_conundrum.htm
| AlienRobot wrote:
| Something I realized about AI is that an AI that generates
| "art", be it text, image, animation, video, photography,
| etc., is cool. The product it generates, however, is not.
|
| It's very cool that we have a technology that can generate
| video, but what's cool is the tech, not the video. It doesn't
| matter if it's a man eating spaghetti or a woman walking in
| front of dozens of reflections. The tech is cool, the video is
| not. It could be ANY video; just the fact that AI can
| generate it is cool. But nobody likes a video generated by AI.
|
| A very cool technology to produce products that nobody wants.
| postexitus wrote:
| While I am in the same camp as you, there is one exception:
| music, especially music with lyrics (like suno.com). Although
| I know that it's not created by humans, the music created by
| Suno is still very listenable and it evokes feelings just like
| any other piece of music does. Especially if I am on a
| playlist and doing something else and the songs just progress
| into the unknown. Even when I am in a more conscious state -
| i.e. creating my own songs in Suno - the end result is so good
| that I can listen to it over and over again. Especially the
| ones I create for special events (like mocking a friend's
| passing phase of communism and reversion back to capitalism).
| calflegal wrote:
| appreciate your position but mine is that everything out of
| suno sounds like copycat dog water.
| xerox13ster wrote:
| Makes sense that GP appreciates the taste of dog water
| when they're mocking their friends for having had values
| (friends who likely gave up their values to stop being
| mocked)
| Loughla wrote:
| In my opinion, Suno is good for making really funny songs,
| but not for making really moving songs. Examples of songs
| that make me chuckle that I've had it do:
|
| A Bluegrass song about how much fun it is to punch holes in
| drywall like a karate master.
|
| A post-punk/hardcore song about the taste of the mud and
| rocks at the bottom of a mountain stream in the newly
| formed mountains of Oklahoma.
|
| A hair band power ballad about white dad sneakers.
|
| But for "serious" songs, the end result sounds like generic
| muzak you might hear in the background at Wal-Mart.
| w0m wrote:
| That's an oversimplification, I think. If you're only
| generating a video because 'I can, oooh, AI' - then of course
| no one wants it. If you treat the tools as what they are -
| tools - then people may want it.
|
| No one really cares about a tech demo, but if generative
| tools help you make a cool music video to an awesome song?
| People will want it.
|
| Well, as long as they aren't put off by a regressive stigma
| against new tools, at least.
| AlienRobot wrote:
| If you used AI to make something awesome, even if I liked
| it, I'd feel scammed if it wasn't clearly labelled as AI,
| and if it was clearly labelled as AI I wouldn't even look
| at it.
| w0m wrote:
| > I'd feel scammed if it wasn't clearly labelled as AI
|
| TBF - have you looked at a digital photo made in the last
| decade? It likely had significant 'AI' processing applied to
| it. That's why I call it a regressive pattern to dislike
| anything with a new label attached - it minimizes at best
| and often flat out ignores the very real work very real
| artists put in to leverage the new tools.
| AlienRobot wrote:
| You still have to take the photo. That's a billion times
| more effort than typing a prompt in ChatGPT.
| mindcandy wrote:
| > if it was clearly labelled as AI I wouldn't even look
| at it.
|
| If you dislike it without even seeing it, that would
| indicate the problem isn't with the video...
| AlienRobot wrote:
| Yes, the problem is with AI. I'm tired of trying to find
| X and finding "AI X" instead. I google "pixel art" I get
| "AI pixel art." I google clipart I get "AI clipart." I go
| to /r/logodesign to see some cool logo designs, it's 50%
| people who used ChatGPT asking if it looks good enough.
|
| The only good AI is AI out of my sight.
| giraffe_lady wrote:
| Are there any valid reasons people might not like this or
| is it only "regressive stigma?"
| bobthepanda wrote:
| Humans find lots of value in human effort towards
| culturally important things.
|
| See: a grandmother's food vs. the industrial equivalent
| namtab00 wrote:
| > A very cool technology to produce products that nobody
| wants.
|
| creative power without control is like a rocket with no
| navigation--sure, you'll launch, but who knows where you'll
| crash!
| krapp wrote:
| Yes, it turns out there's more to creating good art than
| simulating the mechanics and technique of good artists. The
| human factor actually matters, and that factor can't be
| extrapolated from the data in the model itself. In essence
| it's a lossy compression problem.
|
| It is _technically_ interesting, and a lot of what it creates
| does have its own aesthetic appeal just because of how
| uncanny it can get, particularly in a photorealistic format.
| It's like looking at the product of an alien mind, or an
| alternate reality. But as an expression of actual human
| creative potential and directed intent I think it will always
| fall short of the tools we already have. They require skilled
| human beings who require paychecks and sustenance and sleep
| and toilets, and sometimes form unions, and unfortunately
| _that's_ the problem AI is being deployed to solve in the
| hope that "extruded AI art product" is good enough to make a
| profit from.
| noja wrote:
| > or a woman walking in front of dozens of reflections
|
| A lot of people will not notice the missing reflections, and
| because of this our gatekeepers of quality will disappear.
| jncfhnb wrote:
| The problem in your example is that you wouldn't think a
| picture of a man eating spaghetti taken by a real person
| would be cool.
|
| You may feel differently if it's, say, art assets in your new
| favorite video game, frames of a show, or supplementary art
| assets in some sort of media.
| detourdog wrote:
| The generated artwork will initially displace clipart/stock
| footage and then illustrators and graphic designers.
|
| The last two can have tremendous talent, but society at large
| isn't that sensitive to the higher-quality output.
| doctorpangloss wrote:
| > but is not productionizable due to a lack of creative
| control.
|
| It's just a matter of time until some big IP holder makes
| "productionizable" generative art, no? "Tweaking the output" is
| just an opinion, and people already ship tons of AAA art with
| flaws they lacked the budget to tweak. How is this going to be
| any different?
| fwip wrote:
| No, it's not "just a matter of time." It's an open question
| whether it's even possible with anything resembling current
| techniques.
| LoganDark wrote:
| Seems like this site is getting hugged to death right now
| Bilal_io wrote:
| I haven't checked, but I think some of the videos on the page
| might be served directly from the server.
|
| Edit: Wow! They are loaded directly from the server, where I
| assume no CDN is involved. And what's even worse, they're not
| lazy loaded. No wonder it cannot handle a little bit of
| traffic.
| numpad0 wrote:
| 3p mirror:
| https://megalodon.jp/2024-1010-2132-09/https://theorangeduck...
| Retr0id wrote:
| Seems like the media files still load from the original domain
| oDot wrote:
| I spend a lot of my time researching live-action anime[0][1], and
| there's an important thing to learn from Japanese animators:
| sometimes an animation style may seem technically lacking yet
| be visually stunning.
|
| When animator Ken Arto was on the Trash Taste podcast, he
| mentioned how Disney had the resources to perfect the animation,
| while in Japan they had to achieve more with less.
|
| This basically shifts the "what is good animation" discussion in
| ways that are not as clear from looking at the stats.
|
| [0] https://blog.nestful.app/p/ways-to-use-nestful-outlining-
| ani...
|
| [1] https://www.youtube.com/watch?v=WiyqBHNNSlo
| oreally wrote:
| These kinds of perspectives are often found and parroted in
| perceived 'elite' circles. It's no wonder the author works at
| Epic Games, a place where one needs high technical chops.
|
| It's also no wonder that such people get disconnected from
| some realities on the ground. Sure, on paper people do want
| higher-quality things, but they don't even know what those
| are. Most people have low-brow tastes; they'd take a cheaper,
| well-marketed thing over a 1% improvement.
|
| Japan didn't need to compete on the same ladder for success;
| it needed to mix the various elements it's good at to achieve
| its own success.
| jncfhnb wrote:
| Those dumb artists focusing on quality instead of revenue!
| oDot wrote:
| Exactly right. Sometimes those "higher quality" things may
| lead to reduced quality, most commonly by reaching the
| uncanny valley.
|
| Interestingly, that does not happen in the opposite direction.
| When "reducing" certain stats on real footage (which is what
| live-action anime should do[0]) the uncanny valley is
| skipped. Maybe it's harder to fall into when going backwards?
| More research is needed.
|
| BTW, I love your books
|
| [0] https://www.youtube.com/shorts/3ZiBu5Il2eY
| cameron_b wrote:
| I love the statement in the conclusion.
|
| Curation is something we intrinsically favor over engagement
| algorithms. Noise is easy to quantify, but greatness is not.
| Greatness might have a lag in engagement metrics while folks read
| or watch the material. It might provoke consideration, instead of
| reaction.
|
| Often we need seasons of production in order to calibrate our
| selection criteria, and hopefully this season of booming
| generation leads to a very rich new opportunity to curate great
| things to elevate from the noise.
| MichaelZuo wrote:
| Why is curation relevant to 'greatness'?
|
| By definition 99% of the content produced has to be in the
| bottom 99 percentiles, in any given year.
|
| Even if the entire world decided everything must be curated,
| that would just mean the vast vast majority of curators have
| not-great taste.
|
| Whereas in a future world where 99% of it is driven by
| algorithms, that would mean the vast majority of curators have
| 'great' taste.
|
| But this seems entirely orthogonal.
| baruchthescribe wrote:
| The author did some very cool work with Raylib interpolating
| between animations to make transitions more natural. I remember
| being blown away at how realistic it looked from the videos he
| posted in the Discord. Glad to see he's still pushing the
| boundaries on what's possible with quality animation. And of
| course Cello rocks!
| meebob wrote:
| Something I really enjoyed about this article is that it
| really helps explain a counterintuitive result in hand-drawn
| 2D animation. It's a well-known phenomenon that naively
| tracing over live-action footage usually results in
| unconvincing, poor-quality animation. The article demonstrates
| how sampling and even small amounts of noise can make a
| movement seem unconvincing or jittery - and seeing that, it
| suddenly makes sense why something like simple tracing at 12
| fps would produce bad results without substantial error
| correction (which is where traditional wisdom like arcs,
| simplification, etc. comes in).
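|
| As a rough illustration (my toy sketch, not from the article):
| add ~1% positional noise to a smooth arc sampled at 12 fps and
| look at the frame-to-frame velocity, which is what reads as
| jitter on playback.
|
|     import numpy as np
|
|     fps = 12
|     t = np.arange(0, 2, 1 / fps)         # two seconds at 12 fps
|     clean = np.sin(2 * np.pi * 0.5 * t)  # smooth half-hertz arc
|     noisy = clean + np.random.normal(0, 0.01, t.shape)
|
|     # Differencing amplifies tiny positional noise, so the
|     # velocity the eye tracks gets visibly rougher.
|     v_clean = np.diff(clean) * fps
|     v_noisy = np.diff(noisy) * fps
|     print("velocity std, clean:", v_clean.std())
|     print("velocity std, noisy:", v_noisy.std())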
| kderbe wrote:
| 2D animation traced over live action is called rotoscoping.
| Many of Disney's animated movies from the Walt Disney era used
| rotoscoping, so I don't think it's fair to say it results in
| poor quality.
|
| https://en.wikipedia.org/wiki/List_of_rotoscoped_works#Anima...
| autoexec wrote:
| Rotoscoping has its place. It can save a lot of time/money
| for scenes with complex motion and can produce good results,
| but overreliance on it does tend to produce worse animation
| since it can end up being constrained to just what was
| captured on film. Without it, animators are more free to
| exaggerate certain motions, or manipulate the framerate, or
| animate things that could never be captured on camera in the
| first place. That kind of freedom is part of what makes
| animation such a cool medium. Animation would definitely be
| much worse off if rotoscoping was all we had.
| tuna74 wrote:
| "Animation would definitely be much worse off if
| rotoscoping was all we had." Yeah, then it wouldn't be
| animation anymore.
| autoexec wrote:
| I mean, rotoscoping is still animation, but it's just one
| technique/tool of the trade. I thought it was used well
| in Undone, and I enjoyed The Case of Hana & Alice.
| Isamu wrote:
| The comment was about naive tracing. When Disney used
| rotoscoping, they had animators draw on top of the live-action
| pose while conforming to a character model.
|
| The experienced animators and inbetweeners knew how to produce
| smooth line motion, and the live action was used for lifelike
| pose, movement, etc. It wasn't really tracing.
|
| There are examples of this in the Disney animation books; the
| finished animation looks very different from the live actors,
| but with the same movement.
| FuriouslyAdrift wrote:
| A Scanner Darkly is rotoscoped
|
| https://youtu.be/l1-xKcf9Q4s
| engeljohnb wrote:
| Rotoscoping was utilized for some difficult shots. Mostly,
| live action was used for reference, not directly traced
| Fleischer-style. I've never seen rotoscoping that looked as
| masterful as in Snow White and similar golden-age films.
|
| https://www.youtube.com/watch?v=smqEmTujHP8
| tech_ken wrote:
| The points about the effects of noise are super interesting. Kind
| of mind blowing to think about the sensitivity of our perception
| being so different across visual channels (color, shape,
| movement, etc).
| djmips wrote:
| "Obviously a huge part of this is the error propagation that
| we get down the joint chain... but"
|
| This shouldn't be glossed over, and a proper consideration of
| the error metric here is key to storing quality animation with
| fewer bits, lower bandwidth, and higher performance.
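|
| A toy 2D forward-kinematics sketch of the effect (my example,
| not from the article): the same 0.01 rad of per-joint error
| costs far more tip error near the root, which is why a good
| compression metric weights each joint by what hangs off it.
|
|     import numpy as np
|
|     def tip(angles, lengths):
|         """World position of the end of a planar joint chain."""
|         pos, heading = np.zeros(2), 0.0
|         for a, l in zip(angles, lengths):
|             heading += a
|             pos += l * np.array([np.cos(heading),
|                                  np.sin(heading)])
|         return pos
|
|     angles = np.array([0.3, -0.2, 0.5, 0.1])  # radians
|     lengths = np.ones(4)
|     base = tip(angles, lengths)
|
|     # Perturb each joint in turn by the same small error.
|     for j in range(len(angles)):
|         p = angles.copy()
|         p[j] += 0.01
|         err = np.linalg.norm(tip(p, lengths) - base)
|         print(f"joint {j}: tip error {err:.4f}")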
| doctorpangloss wrote:
| Fitting joints onto a text-prompted Sora-generated video: could
| "transformers" not make all this stuff obsolete too? You might
| need the motion capture data for ground truth to fit joints,
| but maybe not to generate animation itself.
| doctorpangloss wrote:
| > The people who are actually trying to build quality content are
| being forced to sink or swim - optimize for engagement or else be
| forgotten... There are many people involved in deep learning who
| are trying very hard to sell you the idea that in this new world
| of big-data...
|
| It's always easy to talk about "actually trying to build quality
| content" in the abstract. Your thing, blog post or whatever,
| doesn't pitch us a game. Where is your quality content?
|
| That said, having opinions is _a_ pitch. A16Z will maybe give you
| like, $10m for your "Human Generated Authentic badge" anti-AI
| company or whatever. Go for it dude, what are you waiting for?
| Sure it's a lot less than $220m for "Spatial Intelligence." But
| it's $10m! Just take it!
|
| You can slap your badge onto Fortnite and try to become a
| household name by shipping someone else's IP. That makes sense to
| me. Whether you can get there without considering "engagement," I
| don't know.
| pvillano wrote:
| Image generation has its own problems with non-cancelling noise.
|
| For example, images are often generated with JPEG artifacts in
| some regions but not globally.
|
| Watermarks are also reproduced.
|
| Some generated images have artifacts from CCD cameras:
|
| https://www.eso.org/~ohainaut/ccd/CCD_artifacts.html
|
| Images generated from Google Street View data would likely
| contain features specific to the cars/cameras used in each
| country:
|
| https://www.geometas.com/metas/categories/google_car/
| doctorpangloss wrote:
| It seems like such an obvious and surmountable problem though.
| Indeed, since 2020 there have been robust approaches to
| eliminating JPEG artifacts - for example, browse around here:
| https://openmodeldb.info/
| nmacias wrote:
| The shoulder rotation plotted at various frequencies sparked a
| thought for me: is there an "MP3" of character animation data?
| The way we have compression optimized for auditory
| perception... it feels like we might be missing an open
| standard for compressing this kind of animation data.
|
| edit: Claude is thinking MP3 could work directly: pack 180Hz
| animation channels into a higher frequency audio signal with some
| scheme like Frequency Division / Time Division Multiplexing, or
| Amplitude Modulation. Boom, high compression with commonplace
| hardware support.
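|
| The packing half of that idea is easy to sketch (a toy of
| mine, assuming plain round-robin TDM), though I'd guess a
| psychoacoustic codec like MP3 would smear the interleaved
| channels together, since its loss model assumes one listener
| rather than N independent joints:
|
|     import numpy as np
|
|     def tdm_pack(channels):
|         """Interleave N equal-length 180 Hz channels into one
|         N * 180 Hz signal, round-robin like a TDM frame."""
|         return np.stack(channels, axis=1).reshape(-1)
|
|     def tdm_unpack(signal, n):
|         return [signal[i::n] for i in range(n)]
|
|     # Toy demo: three joint-angle channels at 180 Hz.
|     t = np.arange(0, 1, 1 / 180)
|     chans = [np.sin(2 * np.pi * f * t) for f in (0.5, 1.3, 2.0)]
|     packed = tdm_pack(chans)          # 540 Hz mono "audio"
|     restored = tdm_unpack(packed, 3)  # lossless round trip
|     assert all(np.allclose(a, b)
|                for a, b in zip(chans, restored))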
| cfstras wrote:
| That same graph had me jump towards the sampling theorem -
| playing back an animation with linear interpolation creates
| hard edges, i.e. frequency spikes. I'm not sure if the
| movement space is comparable to audio here, but I can't see
| why not.
|
| So, if the sampling theorem applies, sampling at 2x the
| maximum movement "frequency" should be enough to perfectly
| recreate it, as long as you "filter out" any higher
| frequencies when playing back the animation by using something
| like FFT upscaling (re-sampling) instead of linear or bezier
| interpolation.
|
| (Having written this, I realize that's probably what everyone
| is doing.)
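|
| For anyone who wants to try it: scipy ships an FFT-based
| resampler, so a toy comparison (my sketch, not from the
| article) of band-limited reconstruction vs naive linear
| interpolation looks like this:
|
|     import numpy as np
|     from scipy.signal import resample
|
|     # A 2 Hz rotation curve sampled at 10 Hz (above Nyquist).
|     t = np.arange(0, 1, 1 / 10)
|     samples = np.sin(2 * np.pi * 2 * t)
|
|     # Band-limited (FFT) reconstruction at a 60 Hz play rate.
|     t_fine = np.arange(0, 1, 1 / 60)
|     fft_up = resample(samples, 60)
|
|     # Linear interpolation adds corners: high-frequency
|     # content that was never in the original motion.
|     lin_up = np.interp(t_fine, t, samples)
|
|     truth = np.sin(2 * np.pi * 2 * t_fine)
|     print("fft error:", np.abs(fft_up - truth).max())
|     print("lin error:", np.abs(lin_up - truth).max())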
| xMissingno wrote:
| I would love to be corrected on this - but my understanding of
| frequency compression is that you have to decode the entire
| file before being able to play back the audio. Therefore, in
| real-time applications with limited RAM (video games) you don't
| want to wait for the entire animation to be decoded before
| streaming the first frames.
|
| Can anyone think of a system with better time-to-first-frame
| that achieves good compression?
| nmacias wrote:
| Most audio and video schemes support streaming; in the case
| of MP3 we are talking about frame-based compression.
|
| I guess to restate my curiosity: are things like Animation
| Pose Compression in Unity or equivalents in other engines
| remotely as good as audio techniques with hardware support?
| The main work on this seems to be here, and I didn't see any
| references to audio codecs in the issue history, fwiw:
| https://github.com/nfrechette/acl
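|
| Re: time-to-first-frame upthread, my understanding is that
| compressors like ACL cut tracks into fixed windows that decode
| independently, so playback can start after one window. A toy
| sketch of the idea (names and numbers are mine, not ACL's):
|
|     import numpy as np
|
|     WINDOW = 16  # frames per block (toy choice, not ACL's)
|
|     def compress(track, bits=8):
|         """Quantize each window against its own value range."""
|         blocks = []
|         for i in range(0, len(track), WINDOW):
|             w = track[i:i + WINDOW]
|             lo, hi = w.min(), w.max()
|             scale = (hi - lo) or 1.0
|             q = np.round((w - lo) / scale * (2**bits - 1))
|             blocks.append((lo, scale, q.astype(np.uint8)))
|         return blocks
|
|     def decode_window(block, bits=8):
|         lo, scale, q = block
|         return lo + q / (2**bits - 1) * scale
|
|     track = np.sin(np.linspace(0, 8 * np.pi, 480))
|     blocks = compress(track)
|     first = decode_window(blocks[0])  # playable immediately
|     print("first-window max error:",
|           np.abs(first - track[:WINDOW]).max())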
| roughly wrote:
| The author discusses the perceptual allowances for different
| kinds of inputs (the noise in images, etc), and it's a really
| interesting point that helps sketch some boundaries around where
| the LLM/Diffusion model paradigms are useful.
|
| Human color perception is almost entirely comparative - we
| see something as blue because, given the other objects in the
| scene and the perceived lighting, an object that looked that
| way would be blue (this is the blue dress phenomenon) - and
| so noise in images is easy for us to ignore. Similarly, audio
| and especially speech
| perception is also very strongly contextually dependent (as
| attested by the McGurk effect), so we can also deal with a lot of
| noise or imprecision - in other words, generative guesswork.
|
| Motion, on the other hand, and especially human motion, is
| something we're exquisitely attentive to - think of how many
| horror movies convey a character's 'off-ness' by subtle
| variations in how they move. In this case, the diffusion model's
| tendency towards guesswork is much, much less easily ignored -
| our brains are paying tight attention to subtle variations, and
| anything weird alarms us.
|
| A constant part of the conversation around LLMs, etc. is exactly
| this level of detail-mindedness (or, the "hallucinations"
| conversation), and I think that's basically where you're going to
| land with things like this - where you need actual genuine
| precision, where there's some proof point on whether or not
| something is accurate, the generative models are going to be a
| harder fit, whereas areas where you can get by with "pretty
| good", they'll be transformative.
|
| (I've said it elsewhere here, but my rule of thumb for LLMs
| and generative models is that if a mediocre answer, fast,
| moves the needle - basically, if there's more value in speed
| than precision - they're a good fit. If not, they're not.)
| BugsJustFindMe wrote:
| > _one of the highest quality publicly available datasets of
| motion capture in the graphics community_
|
| > _This data is sampled at 120 Hz, with finger and toe motions_
|
| But when I watch the videos, they look like the dancer had
| palsy affecting their hands or was wearing astronaut gloves,
| because the fingers barely move for the most part.
| javier_e06 wrote:
| If one looks at the Yoda puppet in The Empire Strikes Back,
| it moves like a puppet, of course, but the motion is real.
| Jerky, emotional, human-like.
|
| Move on to The Clone Wars and the CGI movements are
| mechanical. Maybe the way to judge animation is not the eye
| of the beholder but careful comparison of analog vs digital
| renderings: film a human running on analog and pair it pixel
| by pixel with the digital CGI counterpart.
___________________________________________________________________
(page generated 2024-10-10 23:00 UTC)