[HN Gopher] Let's talk about animation quality
       ___________________________________________________________________
        
       Let's talk about animation quality
        
       Author : ibobev
       Score  : 179 points
       Date   : 2024-10-10 10:38 UTC (12 hours ago)
        
 (HTM) web link (theorangeduck.com)
 (TXT) w3m dump (theorangeduck.com)
        
       | Scene_Cast2 wrote:
       | Something I keep seeing is that modern ML makes for some really
       | cool and impressive tech demos in the creative field, but is not
       | productionizable due to a lack of creative control.
       | 
       | Namely, anything generating music / video / images - tweaking the
       | output is not workable.
       | 
       | Some notable exceptions are when you need stock art for a blog
       | post (no need for creative control), Adobe's recolorization tool
       | (lots of control built in), and a couple more things here and
       | there.
       | 
       | I don't know how it is for 3D assets or rigged model animation
       | (as per the article), never worked with them. I'd be curious to
       | hear about successful applications, maybe there's a pattern.
        
         | jncfhnb wrote:
         | Probably accurate for videos and music. Videos because there's
         | going to be just too many things to correct to make it time
         | efficient. Music because music just needs to be excellent or
         | it's trash. That is for high quality art of course. You can
         | ship filler garbage for lots of things.
         | 
         | 2D art has a lot of strong tooling though. If you're actually
         | trying to use AI art tooling, you won't be just dropping a
         | prompt and hoping for the best. You will be using a workflow
         | graph and carefully iterating on the same image with controlled
         | seeds and then specific areas for inpainting.
         | 
         | We are at an awkward inflection point where we have great
         | tooling for the last generation of models like SDXL, but
         | haven't really made them ready for the current gen of models
         | (Flux) which are substantially better. But it's basically an
         | inevitability on the order of months.
        
           | jsheard wrote:
           | Even with the relatively strong tooling for 2D art it's still
           | very difficult to push the generated image in novel
           | directions though, hence the heavy reliance on LoRAs trained
           | on prior examples. There doesn't seem to be an answer to "how
           | would you create [artist]'s style with AI" that doesn't
           | require [artist] to already exist so you can throw their
           | life's work into a blender and make a model that copies it.
           | 
           | I've found this to be observable in practice - I follow
           | hundreds of artists who I could reliably name by seeing a new
           | example of their work, even if they're only amateurs, but I
           | find that AI art just blurs together into a samey mush with
           | nothing to distinguish the person at the wheel from anyone
           | else using the same models. The tool speaks much louder than
           | the person supposedly directing it, which isn't the case with
           | say Photoshop, Clip Studio or Blender.
        
             | jncfhnb wrote:
             | Shrug. That's a very different goal. Yes, if you want to
             | leverage a different style your best bet is to train a LoRA
             | off a dozen images in that style.
             | 
             | Art made by unskilled randos is always going to blur
             | together. But the question I feel we're discussing here is
             | whether a dedicated artist can use them for production
             | grade content. And the answer is yes.
        
             | RunSet wrote:
             | https://www.kiplingsociety.co.uk/poem/poems_conundrum.htm
        
         | AlienRobot wrote:
         | Something I realized about AI is that an AI that generates
         | "art" be it text, image, animation, video, photography, etc.,
         | is cool. The product it generates, however, is not.
         | 
         | It's very cool that we have a technology that can generate
         | video, but what's cool is the tech, not the video. It doesn't
         | matter if it's a man eating spaghetti or a woman walking in
         | front of dozens of reflections. The tech is cool, the video is
         | not. It could be ANY video and just the fact AI can generate it
         | is cool. But nobody likes a video that is generated by AI.
         | 
         | A very cool technology to produce products that nobody wants.
        
           | postexitus wrote:
           | While I am in the same camp as you, there is one exception:
           | Music. Especially music with lyrics (like suno.com) -
           | Although I know that it's not created by humans, the music
           | created by Suno is still very listenable and it evokes
           | feelings just like any other piece of music does. Especially
           | if I am on a playlist and doing something else and the songs
           | just progress into the unknown. Even when I am in a more
           | conscious state - i.e. creating my own songs in Suno, the end
           | result is so good that I can listen to it over and over
           | again. Especially those ones that I create for special events
           | (like mocking a friend's passing phase of communism and
           | reverting back to capitalism).
        
             | calflegal wrote:
             | appreciate your position but mine is that everything out of
             | suno sounds like copycat dog water.
        
               | xerox13ster wrote:
               | Makes sense that GP appreciates the taste of dog water
               | when they're mocking their friends for having had values
               | (friends who likely gave up their values to stop being
               | mocked)
        
             | Loughla wrote:
             | In my opinion, Suno is good for making really funny songs,
             | but not for making really moving songs. Examples of songs
             | that make me chuckle that I've had it do:
             | 
             | A Bluegrass song about how much fun it is to punch holes in
             | drywall like a karate master.
             | 
             | A post-punk/hardcore song about the taste of the mud and
             | rocks at the bottom of a mountain stream in the newly
             | formed mountains of Oklahoma.
             | 
             | A hair band power ballad about white dad sneakers.
             | 
             | But for "serious" songs, the end result sounds like generic
             | muzak you might hear in the background at Wal-Mart.
        
           | w0m wrote:
           | That's an oversimplification I think. If you're only
           | generating a video because 'I can oooh AI' - then of course
           | no one wants it. If you treat the tools as what they are,
           | Tools - then people may want it.
           | 
           | No one really cares about a tech demo, but if generative
           | tools help you make a cool music video to an awesome song?
           | People will want it.
           | 
           | Well, as long as they aren't put off by a regressive stigma
           | against new tools at least.
        
             | AlienRobot wrote:
             | If you used AI to make something awesome, even if I liked
             | it, I'd feel scammed if it wasn't clearly labelled as AI,
             | and if it was clearly labelled as AI I wouldn't even look
             | at it.
        
               | w0m wrote:
               | > I'd feel scammed if it wasn't clearly labelled as AI
               | 
               | TBF - have you looked at a digital photo made in the last
               | decade? Likely had significant 'AI' processing applied to
               | it. That's why I call it a regressive pattern to dislike
               | anything with a new label attached - it minimizes at best
               | and often flat out ignores the very real work very real
               | artists put in to leverage the new tools.
        
               | AlienRobot wrote:
               | You still have to take the photo. That's a billion times
               | more effort than typing a prompt in ChatGPT.
        
               | mindcandy wrote:
               | > if it was clearly labelled as AI I wouldn't even look
               | at it.
               | 
               | If you dislike it without even seeing it, that would
               | indicate the problem isn't with the video...
        
               | AlienRobot wrote:
               | Yes, the problem is with AI. I'm tired of trying to find
               | X and finding "AI X" instead. I google "pixel art" I get
               | "AI pixel art." I google clipart I get "AI clipart." I go
               | to /r/logodesign to see some cool logo designs, it's 50%
               | people who used ChatGPT asking if it looks good enough.
               | 
               | The only good AI is AI out of my sight.
        
             | giraffe_lady wrote:
             | Are there any valid reasons people might not like this or
             | is it only "regressive stigma?"
        
               | bobthepanda wrote:
               | Humans find lots of value in human effort towards
               | culturally important things.
               | 
               | See: a grandmother's food vs. the industrial equivalent
        
           | namtab00 wrote:
           | > A very cool technology to produce products that nobody
           | wants.
           | 
           | creative power without control is like a rocket with no
           | navigation--sure, you'll launch, but who knows where you'll
           | crash!
        
           | krapp wrote:
           | Yes, it turns out there's more to creating good art than
           | simulating the mechanics and technique of good artists. The
           | human factor actually matters, and that factor can't be
           | extrapolated from the data in the model itself. In essence
           | it's a lossy compression problem.
           | 
           | It is _technically_ interesting, and a lot of what it creates
           | does have its own aesthetic appeal just because of how
           | uncanny it can get, particularly in a photorealistic format.
           | It's like looking at the product of an alien mind, or an
           | alternate reality. But as an expression of actual human
           | creative potential and directed intent I think it will always
           | fall short of the tools we already have. They require skilled
           | human beings who require paychecks and sustenance and sleep
           | and toilets, and sometimes form unions, and unfortunately
           | _that's_ the problem AI is being deployed to solve in the
           | hope that "extruded AI art product" is good enough to make a
           | profit from.
        
           | noja wrote:
           | > or a woman walking in front of dozens of reflections
           | 
           | A lot of people will not notice the missing reflections and
           | because of this our gatekeepers to quality will disappear.
        
           | jncfhnb wrote:
           | The problem in your example is that you wouldn't think a
           | picture of a man eating spaghetti taken by a real person
           | would be cool.
           | 
           | You may feel different if it's, say, art assets in your new
           | favorite video game, frames of a show, or supplementary art
           | assets in some sort of media.
        
         | detourdog wrote:
         | The generated artwork will initially displace clipart/stock
         | footage and then illustrators and graphic designers.
         | 
         | The last 2 can have tremendous talent but the society at large
         | isn't that sensitive to the higher quality output.
        
         | doctorpangloss wrote:
         | > but is not productionizable due to a lack of creative
         | control.
         | 
         | It's just a matter of time until some big IP holder makes
         | "productionizable" generative art, no? "Tweaking the output" is
         | just an opinion, and people already ship tons of AAA art with
         | flaws that lacked budget to tweak. How is this going to be any
         | different?
        
           | fwip wrote:
           | No, it's not "just a matter of time." It's an open question
           | whether it's even possible with anything resembling current
           | techniques.
        
       | LoganDark wrote:
       | Seems like this site is getting hugged to death right now
        
         | Bilal_io wrote:
         | I haven't checked, but I think some of the videos on the page
         | might be served directly from the server.
         | 
         | Edit: Wow! They are loaded directly from the server, where I
         | assume no CDN is involved. And what's even worse, they're not
         | lazy loaded. No wonder it cannot handle a little bit of
         | traffic.
        
       | numpad0 wrote:
       | 3p mirror:
       | https://megalodon.jp/2024-1010-2132-09/https://theorangeduck...
        
         | Retr0id wrote:
         | Seems like the media files still load from the original domain
        
       | oDot wrote:
       | I spend a lot of my time researching live-action anime[0][1], and
       | there's an important thing to learn from Japanese animators:
       | sometimes an animation style may seem technically lacking, but
       | visually stunning.
       | 
       | When animator Ken Arto was on the Trash Taste podcast he
       | mentioned how Disney had the resources to perfect the animation,
       | while in Japan they had to achieve more with less.
       | 
       | This basically shifts the "what is good animation" discussion in
       | ways that are not as clear from looking at the stats.
       | 
       | [0] https://blog.nestful.app/p/ways-to-use-nestful-outlining-
       | ani...
       | 
       | [1] https://www.youtube.com/watch?v=WiyqBHNNSlo
        
         | oreally wrote:
         | These kinds of perspectives are often found and parroted in
         | perceived 'elite' circles. It's no wonder the author works at
         | Epic Games, a place where one needs strong technical chops to
         | work.
         | 
         | It's also no wonder why such people get disconnected from some
         | realities on the ground. Sure on paper people do want higher
         | quality things but they don't even know what those are. Most
         | people have low-brow tastes; they'd take a cheaper and well-
         | marketed thing over a 1% improvement.
         | 
         | Japan didn't need to compete on the same ladder for success, it
         | needed to mix various elements of what they're good at to
         | achieve its own success.
        
           | jncfhnb wrote:
           | Those dumb artists focusing on quality instead of revenue!
        
           | oDot wrote:
           | Exactly right. Sometimes those "higher quality" things may
           | lead to reduced quality, most commonly by reaching the
           | uncanny valley.
           | 
           | Interestingly that does not happen in the opposite direction.
           | When "reducing" certain stats on real footage (which is what
           | live-action anime should do[0]) the uncanny valley is
           | skipped. Maybe it's harder to fall into when going backwards?
           | More research is needed.
           | 
           | BTW, I love your books
           | 
           | [0] https://www.youtube.com/shorts/3ZiBu5Il2eY
        
       | cameron_b wrote:
       | I love the statement in the conclusion.
       | 
       | Curation is something we intrinsically favor over engagement
       | algorithms. Noisy is easy to quantify, but greatness is not.
       | Greatness might have a lag in engagement metrics while folks read
       | or watch the material. It might provoke consideration, instead of
       | reaction.
       | 
       | Often we need seasons of production in order to calibrate our
       | selection criteria, and hopefully this season of booming
       | generation leads to a very rich new opportunity to curate great
       | things to elevate from the noise.
        
         | MichaelZuo wrote:
         | Why is curation relevant to 'greatness'?
         | 
         | By definition 99% of the content produced has to be in the
         | bottom 99 percentiles, in any given year.
         | 
         | Even if the entire world decided everything must be curated,
         | that would just mean the vast vast majority of curators have
         | not-great taste.
         | 
         | Whereas in a future world where 99% of it is driven by
         | algorithms, that would mean the vast majority of curators have
         | 'great' taste.
         | 
         | But this seems entirely orthogonal.
        
       | baruchthescribe wrote:
       | The author did some very cool work with Raylib interpolating
       | between animations to make transitions more natural. I remember
       | being blown away at how realistic it looked from the videos he
       | posted in the Discord. Glad to see he's still pushing the
       | boundaries on what's possible with quality animation. And of
       | course Cello rocks!
        
       | meebob wrote:
       | Something I really enjoyed about this article is that it really
       | helps explain a counterintuitive result in hand drawn 2D
       | animation. It's a well known phenomenon in hand drawn 2D
       | animation that naively tracing over live action footage usually
       | results in unconvincing and poor quality animation. The article
       | demonstrates how sampling and even small amounts of noise can
       | make a movement seem unconvincing or jittery- and seeing that, it
       | suddenly helps make sense how something like simple tracing at 12
       | fps would produce bad results, without substantial error
       | correction (which is where traditional wisdom like arcs,
       | simplification etc comes in).
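A rough numerical sketch of the point above about small amounts of noise reading as jitter. It assumes a made-up joint angle (a slow 0.5 Hz swing), a 120 Hz sample rate, and independent half-degree errors per sample; none of these numbers come from the article. The function names are hypothetical.

```python
import numpy as np

def finite_difference_velocity(samples, rate_hz):
    """Frame-to-frame velocity estimate, in units per second."""
    return np.diff(samples) * rate_hz

def expected_velocity_noise(position_noise_std, rate_hz):
    """Std of the velocity error for independent per-sample noise.

    The difference of two independent errors has std sqrt(2) * sigma,
    and dividing by the frame time multiplies it by the sample rate.
    """
    return np.sqrt(2.0) * position_noise_std * rate_hz

rng = np.random.default_rng(0)
rate_hz = 120.0                                   # assumed capture rate
t = np.arange(0.0, 10.0, 1.0 / rate_hz)
clean = 30.0 * np.sin(2.0 * np.pi * 0.5 * t)      # degrees, slow 0.5 Hz swing
noise_std = 0.5                                   # assumed per-sample error, degrees
noisy = clean + rng.normal(0.0, noise_std, t.shape)

clean_velocity = finite_difference_velocity(clean, rate_hz)
velocity_error = finite_difference_velocity(noisy, rate_hz) - clean_velocity

measured = velocity_error.std()
predicted = expected_velocity_noise(noise_std, rate_hz)
peak_speed = np.abs(clean_velocity).max()
print(f"velocity noise: measured {measured:.1f}, predicted {predicted:.1f} deg/s")
print(f"peak speed of the clean motion: {peak_speed:.1f} deg/s")
```

Under these assumptions, an error of half a degree on a 30 degree swing is barely visible in a single pose, yet the predicted frame-to-frame velocity error (roughly 85 deg/s) is of the same order as the fastest part of the clean motion (about 94 deg/s). That is one plausible reading of why unfiltered tracing looks jittery.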
        
         | kderbe wrote:
         | 2D animation traced over live action is called rotoscoping.
         | Many of Disney's animated movies from the Walt Disney era used
         | rotoscoping, so I don't think it's fair to say it results in
         | poor quality.
         | 
         | https://en.wikipedia.org/wiki/List_of_rotoscoped_works#Anima...
        
           | autoexec wrote:
           | Rotoscoping has its place. It can save a lot of time/money
           | for scenes with complex motion and can produce good results,
           | but overreliance on it does tend to produce worse animation
           | since it can end up being constrained to just what was
           | captured on film. Without it, animators are more free to
           | exaggerate certain motions, or manipulate the framerate, or
           | animate things that could never be captured on camera in the
           | first place. That kind of freedom is part of what makes
           | animation such a cool medium. Animation would definitely be
           | much worse off if rotoscoping was all we had.
        
             | tuna74 wrote:
             | "Animation would definitely be much worse off if
             | rotoscoping was all we had." Yeah, then it wouldn't be
             | animation anymore.
        
               | autoexec wrote:
               | I mean, rotoscoping is still animation, but it's just one
               | technique/tool of the trade. I thought it was used well
               | in Undone, and I enjoyed The Case of Hana & Alice
        
           | Isamu wrote:
           | The comment was about naive tracing. When Disney used
           | rotoscoping they had animators draw conforming to a character
           | model on top of the live action pose.
           | 
           | The experienced animator and inbetweeners knew how to produce
           | smooth line motion, and the live action was used for lifelike
           | pose, movement, etc. It wasn't really tracing.
           | 
           | There's examples of this in the Disney animation books, the
           | finished animation looks very different from the live actors,
           | but with the same movement.
        
           | FuriouslyAdrift wrote:
           | A Scanner Darkly is rotoscoped
           | 
           | https://youtu.be/l1-xKcf9Q4s
        
           | engeljohnb wrote:
           | Rotoscoping was utilized for some difficult shots. Mostly
           | live action was used for reference, not directly traced,
           | Fleischer style. I've never seen rotoscoping that looked so
           | masterful as Snow White and similar golden age films.
           | 
           | https://www.youtube.com/watch?v=smqEmTujHP8
        
       | tech_ken wrote:
       | The points about the effects of noise are super interesting. Kind
       | of mind blowing to think about the sensitivity of our perception
       | being so different across visual channels (color, shape,
       | movement, etc).
        
       | djmips wrote:
       | "Obviously a huge part of this is the error propagation that we
       | get down the joint chain... but"
       | 
       | This shouldn't be glossed over and a proper consideration of the
       | error metric here is key to storing quality animation with fewer
       | bits, lower bandwidth and higher performance.
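A minimal sketch of the error propagation mentioned above, assuming a planar chain of four joints with made-up link lengths (not the article's skeleton). It applies the same 1 degree error to each joint in turn and measures how far the end of the chain moves, which is one way to see why a per-joint angle error is a poor metric on its own.

```python
import numpy as np

def joint_positions(angles, lengths):
    """Planar forward kinematics: position of the end of every link."""
    world_angles = np.cumsum(angles)      # each angle is relative to the parent link
    steps = np.stack([lengths * np.cos(world_angles),
                      lengths * np.sin(world_angles)], axis=1)
    return np.cumsum(steps, axis=0)

def drift_from_joint_error(angles, lengths, joint, error):
    """Distance each link end moves when one joint angle is off by `error`."""
    perturbed = np.array(angles, dtype=float)
    perturbed[joint] += error
    delta = joint_positions(perturbed, lengths) - joint_positions(angles, lengths)
    return np.linalg.norm(delta, axis=1)

# Hypothetical shoulder-to-fingertip chain; all values are illustrative.
angles = np.radians([20.0, 15.0, -10.0, 5.0])
lengths = np.array([0.30, 0.25, 0.08, 0.04])      # metres
error = np.radians(1.0)

for joint in range(len(angles)):
    drift = drift_from_joint_error(angles, lengths, joint, error)
    print(f"1 degree error at joint {joint}: end of chain moves {drift[-1] * 1000:.1f} mm")
```

The same angular error costs more the closer it sits to the root, so a compression scheme that budgets bits by world-space position error would spend more precision on parent joints than on leaf joints.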
        
         | doctorpangloss wrote:
         | Fitting joints onto a text-prompted Sora-generated video: could
         | "transformers" not make all this stuff obsolete too? You might
         | need the motion capture data for ground truth to fit joints,
         | but maybe not to generate animation itself.
        
       | doctorpangloss wrote:
       | > The people who are actually trying to build quality content are
       | being forced to sink or swim - optimize for engagement or else be
       | forgotten... There are many people involved in deep learning who
       | are trying very hard to sell you the idea that in this new world
       | of big-data...
       | 
       | It's always easy to talk about "actually trying to build quality
       | content" in the abstract. Your thing, blog post or whatever,
       | doesn't pitch us a game. Where is your quality content?
       | 
       | That said, having opinions is _a_ pitch. A16Z will maybe give you
       | like, $10m for your "Human Generated Authentic badge" anti-AI
       | company or whatever. Go for it dude, what are you waiting for?
       | Sure it's a lot less than $220m for "Spatial Intelligence." But
       | it's $10m! Just take it!
       | 
       | You can slap your badge onto Fortnite and try to become a
       | household name by shipping someone else's IP. That makes sense to
       | me. Whether you can get there without considering "engagement," I
       | don't know.
        
       | pvillano wrote:
       | Image generation has its own problems with non-cancelling noise.
       | 
       | For example, images are often generated with jpeg artifacts in
       | regions but not globally.
       | 
       | Watermarks are also reproduced.
       | 
       | Some generated images have artifacts from CCD cameras
       | 
       | https://www.eso.org/~ohainaut/ccd/CCD_artifacts.html
       | 
       | Images generated from Google Street View data would likely
       | contain features specific to the cars/cameras used in each
       | country
       | 
       | https://www.geometas.com/metas/categories/google_car/
        
         | doctorpangloss wrote:
         | It seems like such an obvious and surmountable problem though.
         | Indeed since 2020 there are robust approaches to eliminating
         | JPEG artifacts, for example - browse around here -
         | https://openmodeldb.info/.
        
       | nmacias wrote:
       | the shoulder rotation plotted at various frequencies sparked for
       | me: is there an "MP3" of character animation data? The way that
       | we have compression optimized for auditory perception... it feels
       | like we might be missing an open standard for compressing this
       | kind of animation data?
       | 
       | edit: Claude is thinking MP3 could work directly: pack 180Hz
       | animation channels into a higher frequency audio signal with some
       | scheme like Frequency Division / Time Division Multiplexing, or
       | Amplitude Modulation. Boom, high compression with commonplace
       | hardware support.
        
         | cfstras wrote:
         | That same graph had me jump towards the sampling theorem -
         | playing back an animation with linear interpolation creates
         | hard edges, e.g. frequency spikes. I'm not sure if the movement
         | space is comparable to audio here, but I can't see why not.
         | 
         | So, if the sampling theorem applies, sampling at 2x the maximum
         | movement "frequency" should be enough to perfectly recreate
         | the motion, as long as you "filter out" any higher frequencies
         | when playing back the animation by using something like FFT
         | upscaling (re-sampling) instead of linear or Bezier
         | interpolation.
         | 
         | (having written this, I realize that's probably what everyone
         | is doing.)
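A small sketch of the comparison described above, under strong assumptions: the clip loops perfectly (so the FFT's periodicity is harmless) and the test motion is a made-up sum of two sinusoids below the Nyquist limit. `fft_upsample` is a hypothetical helper, not something any engine is known to use.

```python
import numpy as np

def fft_upsample(samples, factor):
    """Band-limited resampling of one period of a looping signal (factor >= 2)."""
    n = len(samples)
    spectrum = np.fft.rfft(samples)
    padded = np.zeros(n * factor // 2 + 1, dtype=complex)
    padded[:len(spectrum)] = spectrum
    if n % 2 == 0:
        padded[n // 2] *= 0.5     # the old Nyquist bin is shared by +/- frequencies
    # irfft normalises by the new length, so rescale to keep amplitudes.
    return np.fft.irfft(padded, n * factor) * factor

def motion(t):
    """Illustrative band-limited motion: 3 and 5 cycles per loop."""
    return np.sin(2 * np.pi * 3 * t) + 0.5 * np.cos(2 * np.pi * 5 * t)

n, factor = 16, 8                                  # 16 keys per loop, Nyquist at 8 cycles
t_coarse = np.arange(n) / n
t_dense = np.arange(n * factor) / (n * factor)
coarse = motion(t_coarse)

linear = np.interp(t_dense, t_coarse, coarse, period=1.0)
band_limited = fft_upsample(coarse, factor)

linear_error = np.abs(linear - motion(t_dense)).max()
fft_error = np.abs(band_limited - motion(t_dense)).max()
print(f"max error, linear interpolation: {linear_error:.3f}")
print(f"max error, FFT resampling:       {fft_error:.2e}")
```

For a signal that really is band-limited and periodic, the FFT route recovers the in-between values to floating-point precision while linear interpolation cuts the corners. Real clips are neither perfectly band-limited nor periodic, so this is an idealised case.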
        
         | xMissingno wrote:
         | I would love to be corrected on this - but my understanding of
         | frequency compression is that you have to decode the entire
         | file before being able to play back the audio. Therefore, in
         | real time applications with limited RAM (video games) you don't
         | want to wait for the entire animation to be decoded before
         | streaming the first frames.
         | 
         | Can anyone think of a system with better time-to-first-frame
         | that achieves good compression?
        
           | nmacias wrote:
           | most audio and video schemes support streaming, in the case
           | of MP3 we are talking about frame-based compression
           | 
           | I guess to restate my curiosity: are things like Animation
           | Pose Compression in Unity or equivalents in other engines
           | remotely as good as audio techniques with hardware support?
           | The main work on this seems to be here and I didn't see any
           | references to audio codecs in the issue history fwiw.
           | https://github.com/nfrechette/acl
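To illustrate the frame-based idea above, here is a toy sketch that compresses one animation channel in independent blocks, so the first frames can be decoded without touching the rest of the clip. It is not MP3 (real audio codecs use overlapping transforms and perceptual models) and it is not how ACL works; the block size, the number of kept coefficients and all names are assumptions.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix; its transpose is the inverse transform."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (i + 0.5) * k / n)
    m[0] /= np.sqrt(2.0)
    return m

BLOCK = 32                      # samples per independently decodable block
KEEP = 8                        # low-frequency coefficients stored per block
DCT = dct_matrix(BLOCK)

def encode_block(block, keep=KEEP):
    """Transform one block and drop the high-frequency coefficients."""
    return (DCT @ block)[:keep]

def decode_block(coeffs):
    """Rebuild one block from its stored coefficients alone."""
    full = np.zeros(BLOCK)
    full[:len(coeffs)] = coeffs
    return DCT.T @ full

rate_hz = 120.0
t = np.arange(BLOCK * 8) / rate_hz
channel = 45.0 * np.sin(2 * np.pi * 1.0 * t)       # made-up joint angle, degrees

stream = [encode_block(channel[s:s + BLOCK]) for s in range(0, len(channel), BLOCK)]

# Time to first frame: only stream[0] is needed to play the first 32 samples.
first_frames = decode_block(stream[0])
rms = np.sqrt(np.mean((first_frames - channel[:BLOCK]) ** 2))
print(f"{len(stream)} blocks, {KEEP}/{BLOCK} coefficients kept, first-block RMS error {rms:.4f} deg")
```

Because every block decodes on its own, memory use and start-up latency depend on the block size rather than the clip length. The cost of skipping overlap is possible discontinuities at block boundaries, which a real codec would have to address.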
        
       | roughly wrote:
       | The author discusses the perceptual allowances for different
       | kinds of inputs (the noise in images, etc), and it's a really
       | interesting point that helps sketch some boundaries around where
       | the LLM/Diffusion model paradigms are useful.
       | 
       | Human color perception is almost entirely comparative - we see
       | something as Blue because within the context of the other objects
       | in a scene and the perceived lighting, the color an object would
       | be that looked the way the object in the scene does is Blue (this
       | is the blue dress phenomenon) - and so noise in images is easy
       | for us to ignore. Similarly, audio and especially speech
       | perception is also very strongly contextually dependent (as
       | attested by the McGurk effect), so we can also deal with a lot of
       | noise or imprecision - in other words, generative guesswork.
       | 
       | Motion, on the other hand, and especially human motion, is
       | something we're exquisitely attentive to - think of how many
       | horror movies convey a character's 'off-ness' by subtle
       | variations in how they move. In this case, the diffusion model's
       | tendency towards guesswork is much, much less easily ignored -
       | our brains are paying tight attention to subtle variations, and
       | anything weird alarms us.
       | 
       | A constant part of the conversation around LLMs, etc. is exactly
       | this level of detail-mindedness (or, the "hallucinations"
       | conversation), and I think that's basically where you're going to
       | land with things like this - where you need actual genuine
       | precision, where there's some proof point on whether or not
       | something is accurate, the generative models are going to be a
       | harder fit, whereas areas where you can get by with "pretty
       | good", they'll be transformative.
       | 
       | (I've said it elsewhere here, but my rule of thumb for the LLMs
       | and generative models is that if a mediocre answer fast moves the
       | needle - basically, if there's more value in speed than precision
       | - the LLMs are a good fit. If not, they're not.)
        
       | BugsJustFindMe wrote:
       | > _one of the highest quality publicly available datasets of
       | motion capture in the graphics community_
       | 
       | > _This data is sampled at 120 Hz, with finger and toe motions_
       | 
       | But when I watch the videos they look like the dancer had palsy
       | affecting their hands or was wearing astronaut gloves, because
       | the fingers barely move for the most part.
        
       | javier_e06 wrote:
       | If one looks at the Yoda puppet in The Empire Strikes Back, it
       | moves like a puppet, of course, but the motion is real. Jerky,
       | emotional, human-like.
       | 
       | Move on to The Clone Wars and the CGI movements are mechanical.
       | Maybe the way to go about animation is not in the eye of the
       | beholder but in careful comparison of analog vs digital
       | renderings: film a human running on analog and pair it pixel by
       | pixel with the digital CGI counterpart.
        
       ___________________________________________________________________
       (page generated 2024-10-10 23:00 UTC)