[HN Gopher] TheseToonsDoNotExist: StyleGAN2-ADA trained on CG ca...
       ___________________________________________________________________
        
       TheseToonsDoNotExist: StyleGAN2-ADA trained on CG cartoon faces
        
       Author : codetrotter
       Score  : 107 points
       Date   : 2021-09-25 14:33 UTC (8 hours ago)
        
 (HTM) web link (www.thesetoonsdonotexist.com)
 (TXT) w3m dump (www.thesetoonsdonotexist.com)
        
       | andreygrehov wrote:
        | Another way to achieve a similar result is to load a face from
        | ThisPersonDoesNotExist.com [1] and then pipe it through
        | Toonify.Photos [2]. Give it a label from ThisWordDoesNotExist.com
        | [3] and there you go - you've got a character :)
       | 
       | Edit: wire it up with Stripe and sell characters to Pixar? Ha!
       | 
       | [1] https://news.ycombinator.com/item?id=19144280
       | 
       | [2] https://news.ycombinator.com/item?id=24494377
       | 
       | [3] https://news.ycombinator.com/item?id=23169962
        
       | hn_throwaway_99 wrote:
        | Most of the toons I got look like normal characters but with a
        | mild genetic defect: a lazy eye, a potentially concerning lump
        | on a cheek, etc.
       | 
       | Edit: actually, yeah, after looking at more examples nearly every
       | one has some amount of cross-eye/focus disorder where both eyes
       | aren't pointing in the same direction.
        
         | techrat wrote:
          | Like with thispersondoesnotexist, the toons also tend to have
          | severe problems with the ears. Nearly every one of them has an
          | ear that blends into the background or is badly mismatched
          | with the other.
        
       | arketyp wrote:
        | StyleGAN interpolations of latent space are mind-blowing, but
        | they are fundamentally superficial, which you can sometimes tell
        | and more often simply get bored of. Instead, I would like to see
        | a transformer network trained on ontogenetic/phylogenetic
        | development and evolution material, which could then generate
        | new creatures. The representations could be abstract (identified
        | by key topological properties, for instance) and could then be
        | used for whatever artistic renderings you like. Of course, in
        | the end, perhaps the truest abstract representation would be
        | genes and proteins.
        
         | high_byte wrote:
          | lol. good idea but genes only get you so far. environment is
          | where you get those proteins, so that means simulating the
          | world. ;)
         | 
         | just to get lots of retarded-baby monkey-fish-frogs.
        
         | qayxc wrote:
         | Unfortunately mainstream ML research is obsessed with end-to-
         | end solutions, so a lot of interesting ideas fall by the
         | wayside.
         | 
         | A rule-based system combined with a Transformer and CV-based
         | postprocessing to filter the most plausible and interesting
         | results would be awesome.
        
       | backspace_ wrote:
        | It's like they took Disney, Pixar, and DreamWorks and just
        | mashed them all together.
        
       | dqpb wrote:
       | This would be much more useful if it generated a model.
        
         | ravenstine wrote:
          | I don't see why that would be particularly difficult to
          | accomplish. A dataset made up of 3D assets would actually give
          | an algorithm more information to work with. The question is
          | whether it could generate a _usable_ model and not an Eldritch
          | horror of disconnected triangles and non-manifold geometry.
          | 
          | The easiest win would probably be to have an algorithm pick
          | between predetermined types of assets (heads, appendages,
          | clothes, etc.), reshape them without actually adding new
          | geometry, and then do essentially what the linked page does
          | with skins and shaders.
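          | 
          | A minimal sketch of that assemble-and-reshape idea (the
          | asset names and shape-key parameters below are made up for
          | illustration, not taken from any real pipeline):
          | 
          |     import random
          | 
          |     # Hypothetical asset libraries; a real system would
          |     # load actual meshes here.
          |     HEADS = ["round_head", "square_head", "oval_head"]
          |     EYES = ["big_eyes", "sleepy_eyes", "bead_eyes"]
          |     SHADERS = ["pixar_skin", "dreamworks_skin", "clay"]
          | 
          |     def random_character(seed=None):
          |         """Pick fixed assets, then only reshape them via
          |         blend-shape weights so no new geometry is added."""
          |         rng = random.Random(seed)
          |         return {
          |             "head": rng.choice(HEADS),
          |             "eyes": rng.choice(EYES),
          |             "shader": rng.choice(SHADERS),
          |             # Shape keys deform existing vertices only, so
          |             # the mesh stays manifold by construction.
          |             "shape_keys": {k: rng.random() for k in
          |                 ("jaw_width", "brow_height", "nose_length")},
          |         }
          | 
          |     print(random_character(seed=42))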
        
           | barbecue_sauce wrote:
           | At that point, you might as well just make a shape-key based
           | character creator.
        
       | codegladiator wrote:
       | pixar
        
       | stared wrote:
        | I would love to see character names and synopses (e.g. generated
        | with GPT) - see https://www.thiswaifudoesnotexist.net/.
        
       | aasasd wrote:
       | Yeah, the 'DreamWorks face' is strong with this one:
       | https://filmschoolrejects.com/wp-content/uploads/2017/03/dre...
       | 
        | Though here it's exhibited more as some kind of confused smirk,
        | which is of course also ubiquitous in cartoons for some reason.
        
       | hi5dev wrote:
        | Wow, the app powering this requires at least one high-end NVIDIA
        | GPU with 12GB or more of GPU memory, with 8 GPUs recommended.
        | All to generate some cartoon faces using AI. At least for that
        | website. Dang.
        
         | gwern wrote:
          | I think those are just the training requirements. For simple
          | TXDNE sites you don't need so much as a single GPU to serve,
          | since you can pregenerate all the images.
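          | 
          | For instance, with NVIDIA's stylegan2-ada-pytorch repo, the
          | load-and-sample pattern from its README is enough to
          | pregenerate a pile of PNGs once and serve them statically.
          | A sketch (run from a checkout of that repo; the checkpoint
          | name and output path are placeholders):
          | 
          |     import pickle
          |     import numpy as np
          |     import PIL.Image
          |     import torch
          | 
          |     with open('toons.pkl', 'rb') as f:
          |         G = pickle.load(f)['G_ema'].cuda()  # generator
          | 
          |     with torch.no_grad():
          |         for seed in range(10000):  # pregenerate 10k faces
          |             z = torch.from_numpy(np.random.RandomState(
          |                 seed).randn(1, G.z_dim)).cuda()
          |             img = G(z, None)  # NCHW float32 in [-1, 1]
          |             img = ((img.permute(0, 2, 3, 1) * 127.5 + 128)
          |                    .clamp(0, 255).to(torch.uint8))
          |             PIL.Image.fromarray(img[0].cpu().numpy(),
          |                 'RGB').save(f'static/{seed}.png')
          | 
          | After that, the web server just hands out random files, with
          | no GPU anywhere in the serving path.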
        
           | hi5dev wrote:
           | Makes sense.
           | 
           | I was going to set it up to mess around with until I saw the
           | requirements.
        
       | [deleted]
        
       | diskzero wrote:
       | What would be really interesting, and possibly impossible to do,
       | would be to get all the character designs that were rejected
       | during the development process at the various studios. As has
        | been pointed out in other comments, the GAN is generating some
        | pretty boring variations of faces from source material that has
        | been focus-grouped and designed by committee to death before
        | being released to the public.
       | 
       | I have seen some really cool and wacky designs in the hallways of
       | DreamWorks, Blue Sky, Pixar, etc. during my time in the industry.
       | I would love to get all of those designs into a training set as
       | well.
        
       | aaaaaaaaaaab wrote:
        | Yuck... So this Pixar 3D crap is what's called "cartoons"
        | nowadays.
        
         | peter-m80 wrote:
         | Is Pixar 3D crap?
        
           | aaaaaaaaaaab wrote:
            | Compared to real hand-drawn cartoons? Yes. Soulless,
            | uniform, factory-produced crap.
        
             | krapp wrote:
             | >factory-produced crap.
             | 
             | Wait until you find out how hand-drawn animation is made.
        
               | vlunkr wrote:
                | A quick Google shows that hand-drawn animation has been
                | outsourced since the 60s.
        
               | krapp wrote:
               | Yes. Mostly to high-volume animation sweatshops, aka
               | "soulless, uniform factory production."
        
             | steve_adams_86 wrote:
              | From a technical and artistic perspective, I can't think of
              | much to criticize. I think of us as fortunate to have such
              | incredible accomplishments in storytelling and
              | presentation.
              | 
              | There are modern 3D movies I particularly dislike and many
              | I don't like or love. I just don't agree that the medium is
              | innately lacking.
        
             | ChrisClark wrote:
             | Now get off my lawn!
        
             | codetrotter wrote:
              | Did someone say factory-produced? I love the classical
              | animations too, but you are aware of this, right?
              | https://youtu.be/hjmaOj3_sKk
             | 
             | :^)
        
               | mongol wrote:
               | Looks like it was mainly Robin Hood that was taking the
               | cheap way out. And Aristocats? I think that was during a
               | low point of Disney animations.
        
               | codetrotter wrote:
                | Here's a comprehensive overview, including both the
                | full-length movies that you saw in that YouTube video
                | and a lot of shorter animations that were reused.
               | 
                | https://disney.fandom.com/wiki/List_of_recycled_animation_in...
               | 
               | Definitely something they did quite a bit more than just
               | a couple of times.
               | 
               | Not saying there's anything "wrong" with that per se btw.
               | Just found it relevant to the discussion about comparing
               | modern animation to factory output.
        
         | colordrops wrote:
         | Yeah, it's all the same look. Not much experimentation or
         | creativity at all when it comes to character design, at least
         | for human characters.
        
         | ergot_vacation wrote:
          | Well, really "Pixar." Pixar at least knew how to leverage
          | their weird style; everyone else ran with it and made it god
          | awful. But my thought exactly. Why even do this with something
          | that's so offensively awful to look at?
         | 
         | That gripe aside, if you're just training on a bunch of
         | headshots and generating new ones, it's been done, over and
         | over at this point. Want to impress? Figure out how to generate
         | a full sequence of coherent animation frames.
        
       | joshspankit wrote:
       | "Do all current movie 'toons' have one eyebrow up?"
       | 
       | I was suddenly struck by this question, and think there might be
       | something to it. Clearly it was a standard feature of the
       | training set.
        
         | slavik81 wrote:
         | It seems to be generating a lot of characters with "DreamWorks
         | Face".
         | https://tvtropes.org/pmwiki/pmwiki.php/Main/DreamworksFace
        
           | joshspankit wrote:
           | Solid reference. Thank you
        
       | tomtimtall wrote:
        | This displays much more clearly the problem with all of these
        | "this thing does not exist" services: even though the source
        | material isn't shown, it's abundantly clear that these are just
        | simple stitches of the training toons.
        | 
        | Like, you get Elsa with different hair, or the grandpa from Up
        | in a suit.
        | 
        | Once you compare them to the closest examples from the training
        | data, it becomes a lot less impressive than the implied "this
        | face came entirely out of the imagination of an AI model". It
        | turns out the model just imagined someone from the training set
        | with someone else's hair. Quite boring.
        
         | the8472 wrote:
         | > it's abundantly clear that these are just simple stitches of
         | the training toons.
         | 
          | That's not how GANs work in general. With a limited training
          | set the end result may appear _as if_ the samples were
          | stitched together, but that's not what happens under the hood.
          | 
          | Interpolation videos make it obvious that the model encodes
          | visual concepts, can freely manipulate them, and can even
          | crank parameters up beyond anything found in the training set,
          | giving exaggerated results.
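          | 
          | (Easy to reproduce with a stylegan2-ada-pytorch generator
          | pickle: a straight line between two Gaussian samples in
          | z-space already gives a smooth morph. A sketch, with a
          | placeholder checkpoint name:)
          | 
          |     import pickle
          |     import numpy as np
          |     import torch
          | 
          |     with open('toons.pkl', 'rb') as f:
          |         G = pickle.load(f)['G_ema'].cuda()
          | 
          |     z0 = torch.from_numpy(
          |         np.random.RandomState(0).randn(1, G.z_dim)).cuda()
          |     z1 = torch.from_numpy(
          |         np.random.RandomState(1).randn(1, G.z_dim)).cuda()
          | 
          |     frames = []
          |     with torch.no_grad():
          |         for t in np.linspace(0.0, 1.0, 60):  # 60 frames
          |             z = (1 - t) * z0 + t * z1  # lerp in z-space
          |             frames.append(G(z, None))  # NCHW in [-1, 1]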
        
         | exit wrote:
         | i think the results shown in this paper contradict your
         | assertion:
         | 
         | https://openaccess.thecvf.com/content_ICCV_2019/papers/Abdal...
         | 
         | given an arbitrary face, we can find its embedding in the
         | latent space of the model. this shows that the model has the
         | potential to generalise to real but unseen examples?
         | 
         | on the other hand, i suspect you might be observing a bias in
         | the structuring of the latent space.
         | 
         | thispersondoesnotexist.com likely samples the latent space with
         | a gaussian or uniform distribution, and while the latent space
         | may contain the full spectrum of possibilities, the density of
         | semantically meaningful embeddings may be structured around the
         | distribution of the training set rather than a uniform or
         | gaussian.
         | 
         | i'm stretching my understanding of the topic in trying to
         | convey this.
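          | 
          | (the sampling part is concrete in code: the stylegan2-ada-
          | pytorch mapping network keeps a running mean w_avg of the
          | w vectors seen in training, and the usual "truncation
          | trick" pulls gaussian samples toward that dense center. a
          | sketch, with a placeholder checkpoint name:)
          | 
          |     import pickle
          |     import torch
          | 
          |     with open('faces.pkl', 'rb') as f:
          |         G = pickle.load(f)['G_ema'].cuda()
          | 
          |     z = torch.randn([1, G.z_dim]).cuda()  # gaussian sample
          |     w = G.mapping(z, None)   # embed into the learned W space
          |     w_avg = G.mapping.w_avg  # dense center of training w's
          | 
          |     psi = 0.7  # 1.0 = raw sample, 0.0 = the "average face"
          |     w = w_avg + psi * (w - w_avg)  # pull toward the center
          |     img = G.synthesis(w)  # lower psi: blander, cleaner faces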
        
         | gfodor wrote:
         | How do you know this is the case for all of them?
        
           | qayxc wrote:
            | A good indicator is that when I clicked on the "more..."
            | button, I _instantly_ recognised copied features (eyes, face
            | shape, nose, hair) from The Incredibles and Frozen in 3 out
            | of the 4 samples, just mashed together.
           | 
           | This shouldn't be the case unless you start actively looking
           | for it.
           | 
           | It's just much easier to recognise with these cartoon
           | characters than with realistic faces as there's naturally
           | much less variety in the training material. Also the features
           | are simplified to a point of being easily recognisable as
           | well.
        
         | version_five wrote:
          | As others have said, that's not the way a GAN should work.
          | Regurgitating the training set is basically a failure mode
          | that is actively avoided when models are built and trained.
         | 
          | Looking at these images, and not being familiar with how the
          | underlying CG training set was made, I wonder if the source
          | material itself has a comparatively small set of latent
          | features - dimensions you could adjust when drawing the faces -
          | that the model is just learning, so that newly generated faces
          | are effectively what you would get by changing whatever
          | settings the underlying tool exposes.
        
         | Tenoke wrote:
          | I see what you mean, but this is definitely not universal to
          | StyleGAN; it depends on factors such as the size of the
          | training set (I'm guessing it was small here) and the training
          | parameters.
        
           | godelski wrote:
            | Honestly this seems to be common for GANs in general, though
            | I don't think most people have looked through CelebA. If you
            | are lazier, you can scroll through thispersondoesnotexist
            | and you'll essentially find celebrities with the
            | characteristics the OP describes. What's more, you actually
            | see better-quality images the closer to a celebrity they
            | look (you see the same thing in the toon version here). I do
            | think ADA is typically worse than vanilla StyleGAN2 here,
            | but that's the tradeoff of a smaller sample size (worse
            | because people train it on smaller datasets, so more
            | memorization).
        
             | Tenoke wrote:
              | I believe thispersondoesnotexist is also trained on FFHQ,
              | not just CelebA, though.
        
       | Pxtl wrote:
        | It's really easy to spot the training references in these - lots
        | of Wreck-It Ralph and Big Hero 6 characters with slightly
        | reshaped faces and hair.
        | 
        | Edit: whenever it gets inspired by Mr. Incredible, the result
        | ends up looking like Conan O'Brien.
        
       | sam-2727 wrote:
        | This is interesting -- I can almost recognize the various Pixar
        | characters these are influenced by. For instance, one has the
        | distinct jaw of the old man from "Up".
        
         | nhinck wrote:
          | I always did wonder how much StyleGAN was compositing existing
          | features rather than generating wholly distinct ones.
          | 
          | With real human faces it's almost impossible to tell, but with
          | these you can definitely pick out a character per feature.
        
         | mysterydip wrote:
            | Raises the question: at what point does ownership change?
            | Can I trace Batman but change his hair color? What about
            | just his face? Or just his mouth? Can I cut and paste a
            | bunch of superheroes together and claim the result as my
            | own? Etc.
        
           | zokier wrote:
           | I guess that's the big debate around GitHub Copilot and open
           | source code, especially GPL (but also others).
        
           | high_byte wrote:
            | I think that would depend on the license of each image in
            | the dataset that influenced the output image. Maybe not, but
            | it would make sense for it to work that way.
        
           | routerl wrote:
           | This will certainly be the crux of intellectual property
           | lawsuits in the future.
           | 
            | It's striking that some of these examples have distinct
            | features from specific, identifiable datasets: we can
            | occasionally recognize specific characters (the old man from
            | Pixar's Up is getting mentioned a lot), but the model also
            | reproduces more general aesthetic patterns. Even when I
            | can't recognize the source data, I can distinctly see in
            | some of these faces "the Pixar look", and in others "the
            | DreamWorks look".
           | 
            | Were I an IP lawyer, I would start thinking of arguments
            | along the lines of "this technology simply obfuscates the
            | source of plagiarism". I would also start to think about
           | trying to force anyone who uses this technology to disclose
           | the sources of their training data, since a model trained
           | largely on "the Pixar look" could be benefiting from Pixar's
           | character design processes without having to hire any of
           | Pixar's artists.
           | 
           | And, if I were philosophically inclined, I would also start
           | thinking about how this is any different from hiring a random
           | artist and instructing them to "design characters that look
           | like Pixar characters".
           | 
           | I suspect that one key difference is that the human artist's
           | success can't easily be measured, but the GAN's success can
           | very easily be measured.
        
           | amelius wrote:
           | At some point big producers like Pixar will probably use
           | something like StyleGAN to extend their copyright coverage.
           | E.g. generate as many variations as they can, which then all
           | fall under their own copyright.
           | 
           | So in the end this technology might not be as "liberating" as
           | people think it is.
        
             | godelski wrote:
              | Wouldn't it make more sense to use a density-based model
              | and then describe some hull centered around your original
              | creation?
        
             | echelon wrote:
             | Not if a tech firm gets there first. Imagine the productive
             | and legal power being in the algorithm.
             | 
             | In a way, this is a much better setup for artists and
             | creatives. There isn't some giant licensing firm
             | controlling your work. You simply buy or rent the best
             | tools to make your work.
             | 
             | That said, it'll only be good for creatives and consumers
             | if there is sufficient competition. And open source
             | equivalents that still enable creation.
        
           | itronitron wrote:
           | The rights surrounding caricatures may provide some insight
           | here. I know some celebrities are particularly vigilant about
           | keeping unapproved photos of their faces out of circulation
           | but they probably wouldn't have the same success with a hand-
           | drawn likeness or caricature.
        
       | [deleted]
        
       | ghoomketu wrote:
        | Really interesting. Speaking as a total noob, if somebody wanted
        | to make a thispersondoesnotexist type of program, e.g. this
        | site:
        | 
        | 1) Approximately how much would it cost to rent the servers to
        | train such models?
        | 
        | 2) Can this be done on a home computer running a $1k NVIDIA card
        | in a reasonable time?
        | 
        | 3) Can I use free tools like Google Colab(?) for this purpose?
        | 
        | I've always been interested in learning more about this field
        | but haven't really bothered because I feel it would cost an arm
        | and a leg just to experiment. Can somebody please shed some
        | light on this?
        
         | Tenoke wrote:
          | For yourself you can use Colab. For serving it as a site, a
          | single $1k GPU is fine, but it depends on traffic.
        
         | qayxc wrote:
         | 1) That depends entirely on the model in question (size,
         | complexity) and the amount of training material.
         | 
          | Using a standard dataset like CelebA [1] and an "HQ" model
          | (512x512) like StyleGAN2, training requires at least one GPU
          | with 12 GiB of VRAM and takes about a week on a single V100.
          | 
          | Depending on your provider of choice, this will cost anywhere
          | from ~$514 (AWS) or ~$420 (Google) down to ~$210 (Lambda Labs,
          | RTX 6000 - should be in the same ballpark); see the arithmetic
          | at the end of this comment.
          | 
          | If your training process is interruptible and can be resumed
          | at any time (most training scripts support this), costs will
          | drop _dramatically_ for AWS and Google (think $50 to $200).
         | 
         | 2) Yes. A used ~$200 Tesla K80 will do. Alternatively any
         | NVIDIA card with at least 8 GiB of VRAM is capable of doing the
         | job, but lower batch sizes and increased training time are to
         | be expected. If you can use a dedicated machine with an RTX
         | 3060 or a brand new A4000 (if you're willing to pay the
         | premium), close to a week of training time can be achieved.
         | 
         | 3) Yes*
         | 
         | *your work will be freely available to everyone and your
         | training process is limited to 12h or so per day.
         | 
         | All in all I wouldn't recommend training a StyleGAN model from
         | scratch anyway. Finetuning a pretrained model using your own
         | dataset can be done much more quickly (think hours to a day or
         | two) and on consumer-level hardware (I train my models on an
         | old desktop with a GTX 1070).
         | 
         | [1] http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html
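          | 
          | For reference, the arithmetic behind those provider figures
          | (the hourly rates are my assumption, based on on-demand
          | pricing at the time; check current price lists):
          | 
          |     # ~1 week (168 h) of single-GPU, on-demand training
          |     HOURS = 7 * 24
          | 
          |     rates = {  # $/h, assumed circa 2021
          |         "AWS p3.2xlarge (V100)": 3.06,
          |         "Google Cloud (V100)": 2.48,
          |         "Lambda Labs (RTX 6000)": 1.25,
          |     }
          | 
          |     for provider, rate in rates.items():
          |         print(f"{provider}: ~${rate * HOURS:.0f}")
          |     # -> ~$514, ~$417, ~$210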
        
           | codetrotter wrote:
           | > Finetuning a pretrained model using your own dataset can be
           | done much more quickly (think hours to a day or two) and on
           | consumer-level hardware (I train my models on an old desktop
           | with a GTX 1070).
           | 
           | This is interesting! Do you have some links about doing that?
           | 
           | My desktop computer has a GTX 1060 with 6 GB of VRAM. But
           | hopefully I can use it for something like this.
           | 
           | I've only used Google Colab in the past, and only tried stuff
           | with prompting existing models.
           | 
           | Would love to experiment a bit with fine-tuning models on my
           | own datasets to get some kind of unique stuff.
        
             | godelski wrote:
              | It's the same transfer learning you would do with any
              | model. StyleGAN (and StyleGAN2-ADA) provides pretrained
              | weights for you. Just start there and train on the new
              | dataset. The ADA GitHub repo even has tools to format
              | your dataset correctly, as in the sketch below.
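              | 
              | Roughly like this (a sketch assuming the stylegan2-ada-
              | pytorch CLI as documented in its README; the paths and
              | kimg budget are placeholders):
              | 
              |     import subprocess
              | 
              |     # Pack same-sized images into the zip train.py expects
              |     subprocess.run(["python", "dataset_tool.py",
              |         "--source=./toon_faces",
              |         "--dest=./datasets/toons.zip"], check=True)
              | 
              |     # Fine-tune from a pretrained FFHQ checkpoint rather
              |     # than training from scratch
              |     subprocess.run(["python", "train.py",
              |         "--outdir=./runs", "--data=./datasets/toons.zip",
              |         "--gpus=1", "--resume=ffhq512",
              |         "--kimg=1000"], check=True)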
        
       | [deleted]
        
       | 01100011 wrote:
       | Now train it to generate cartoons similar to a human face or a
       | dog and take in the $$$.
        
         | kortex wrote:
         | Pixarize-my-pets? Sign me up!
        
         | corysama wrote:
          | https://linktr.ee/voilaaiartist The mobile app does a good job
          | with people. I'd be surprised if it works on dogs.
        
       | kortex wrote:
        | I like how these faces exhibit many of the usual face-GAN
        | quirks: asymmetric ears, eyes with slight nystagmus, and a
        | background that's some abstract, blurry, surreal mosaic. At
        | least they get the hair pretty darn right? Probably because hair
        | is more regular to begin with.
        
       | peanut_worm wrote:
        | Most of these look like warped versions of existing characters.
        | Maybe the dataset is too small or something.
        
         | qayxc wrote:
          | > Maybe the dataset is too small or something.
         | 
          | Yeah, I think that might be the problem here. While there's
          | plenty of material for real human faces, there are only so
          | many high-quality 3D cartoon characters.
        
       | ogurechny wrote:
       | Way too boring, and has, like, 5 reference faces. I expected a
       | crazy mix of all possible cartoon styles similar to
       | thisanimedoesnotexist.
        
         | gwern wrote:
         | TADNE is trained on n ~2.8m source images (augmented to ~4m),
         | covering k ~ tens of thousands of different characters. OP is
         | trained on... I'm not sure because the site is devoid of any
         | information, but I would guess it's closer to 10k images from a
         | few hundred characters at most. So the diversity will be
         | drastically reduced, although the use of -ADA should mean that
         | it doesn't overfit as catastrophically as one would expect from
         | previous GANs to such small n/k.
        
       ___________________________________________________________________
       (page generated 2021-09-25 23:01 UTC)