[HN Gopher] Midjourney v5 can do hands
       ___________________________________________________________________
        
       Midjourney v5 can do hands
        
       Author : GaggiX
       Score  : 206 points
       Date   : 2023-03-16 13:10 UTC (9 hours ago)
        
 (HTM) web link (twitter.com)
 (TXT) w3m dump (twitter.com)
        
       | Awelton wrote:
       | It's a step in the right direction, but it still seems to have a
       | problem understanding that the vast majority of hands have 5
       | fingers.
        
       | himinlomax wrote:
       | But can it do Xi Jinping? (Last I heard they were censoring it
       | for dubious reasons)
        
       | prox wrote:
       | SD can as well with multi controlnet.
        
         | GaggiX wrote:
         | I have no doubt that someone could generate a similar result
         | with SD, but it would require a lot of effort, to control hands
         | with SD using controlnet one usually uses the depth model, if
         | one wants to create an image similar to the one generated by
         | Midjourney v5 one would have to place hundreds of hands.
        
           | smrtinsert wrote:
           | If MJ is using AI then some community solution should soon
           | appear for SD.
        
           | CuriouslyC wrote:
           | Don't use depth for hands. There is an openpose model that
           | can use hand information.
        
             | GaggiX wrote:
             | Can you link me the controlnet model trained also on hand
             | openpose?
        
               | CuriouslyC wrote:
               | Link and discussion at
               | https://www.reddit.com/r/StableDiffusion/comments/1144vyb/co...
        
           | nickthegreek wrote:
           | With controlNET you actually get to choose your hand pose
           | though and not just hope for it to be the way you want in
           | Midjourney.
        
       | hospitalJail wrote:
       | Bold claim, and it's false.
        
       | anonyfox wrote:
       | I'd like to use midjourney via an API from my scripts, but the
       | only way of using it right now seems to be via discord, or did I
       | miss something?
       | 
       | Seems like Dall-E is what I have to use for now :/
        
         | jm547ster wrote:
         | Yes still discord only
        
       | throwaway_ab wrote:
       | Hands are some of the hardest forms for humans to draw, it is no
       | surprise these models struggle with hands too.
       | 
       | In a way, a hand contains more features than perhaps the entire
       | human body from a forms/intersection perspective.
       | 
       | I think an entire model needs to be built to focus on just hands
       | and then combined into a more general model, perhaps that will be
       | the path forward?
        
       | dyno12345 wrote:
       | I don't get how everyone was saying the hands looked weird
       | before. I think it just had something to do with the camera
       | technology back then, and it must be training on that. It looks
       | that way in all of my old childhood photos.
        
       | mightytravels wrote:
       | Is there a public demo for Midjourney like there are for
       | StableDiffusion?
        
         | corysama wrote:
         | There's a very limited free trial. Like a few dozen images.
        
       | anonzzzies wrote:
       | So lovely; how we will recognise the fake; show us hands. Even
       | good hands are not good.
        
       | pwinnski wrote:
       | Let's try "chinese girl making heart with fingers --v 5", a
       | prompt I had used before under v4. Result:
       | https://i.imgur.com/jPgBqsX.png
       | 
       | In image 1, the subject appears to be missing a digit on each
       | hand.
       | 
       | In image 2, the missing digits issue expands, combined with what
       | appears to be a merging of the thumbs.
       | 
       | Image 3 introduces a supernumerary digit on each hand, and has
       | extra "parts" disappear into and out of other fingers.
       | 
       | Image 4... isn't right in a number of ways, but still seems to
       | have fewer than the expected number of digits.
       | 
       | I don't think these results are any better than the v4 model was,
       | but decide for yourself. This is what I got with the same prompt
       | using --v 4 on March 1. https://i.imgur.com/hQA2K7m.png I ended up
       | just taking a photo of my daughter and blurring the background.
        
         | abraxas wrote:
         | V5 is vastly better. V4 mutations are outright creepy.
        
           | spaceman_2020 wrote:
           | V5 looks like photos with filters on. V4 looks like a
           | painting.
        
           | muyuu wrote:
           | they're both pretty flawed but v5 is better
           | 
           | it still doesn't seem to respect anatomy when it comes to
           | hands particularly, perhaps an artefact of not paying
           | attention to hard constraints that are not visually self-
           | evident
        
           | pwinnski wrote:
           | Neither of them produced a result I could use. One or the
           | other might be more aesthetically pleasing to you depending
           | on what you focus on, but both are very badly broken in
           | fundamental ways.
           | 
           | If anything, I'd say v5 is more confident about its
           | wrongness. It is as if humans have always had four fingers,
           | how could you possibly think otherwise? In that sense it
           | seems more like the text-based LLMs: confidently incorrect.
        
         | oezi wrote:
         | How quickly the bar was raised. I can remember a time when we
         | nitpicked Dall.E for creepy eyes.
         | 
         | Surprisingly some things we consider easy seem to be hard
         | computationally.
        
           | recuter wrote:
           | I can count eleven mistakes in your comment on one hand.
           | (Born near Chernobyl, ymmv)
        
         | IIAOPSW wrote:
         | These systems can't count in general. Doesn't matter if it's
         | fingers, n clocks on a wall (for all 1 digit vals of n), some
         | variation of n items on an x, or even "a picture within a
         | picture within..." k times. The machine can't count.
         | 
         | >I ended up just taking a photo of my daughter and blurring the
         | background.
         | 
         | Ah, the diffusion process.
        
           | pwinnski wrote:
           | I suspect inability to count is a thing that can be trained
           | for, although whether that will come at the expense of
           | something else, I'm not sure.
           | 
           | I appreciate the humor. :)
        
             | IIAOPSW wrote:
             | Why the optimism? If it hasn't picked up on the gist of the
             | meaning of 1-10 after 10 billion examples, why expect it to
             | work after 20 billion examples? These models have worked
             | unreasonably well so far, but there has to be a limit to
             | how much we can substitute more data for our lack of
             | conceptual understanding of cognitive processes.
        
               | pwinnski wrote:
               | Because I'm not sure counting is something that has been
               | explicitly tagged or trained for. Clearly always-accurate
               | counts are not an emergent feature, but if you tagged a
               | large set of images with tags we think are too obvious to
               | tag as humans, like "ten fingers" and so on, if that was
               | an actual goal to be rewarded, I think it could improve
               | results.
               | 
               | I'm overall in the skeptic camp, but it seems like these
               | models can generally deliver what they're trained for. It
               | doesn't appear any have been trained with counting as a
               | primary goal.
        
               | IIAOPSW wrote:
               | Ten fingers surely wouldn't be tagged, but the thing
               | about numbers is that they work as well for 10 apples as
               | they do for 10 coins and 10 cups and 10 whatever. It
               | never made the leap of abstraction to learn in the latent
               | space the meaning of numbers 1-10 independent of the
               | particular object. This lack of extrapolation lends it to
               | a very rote learned style.
               | 
               | I already have a way around the specific numerical
               | vocabulary/training problem. When I ask for "a picture in
               | a frame", "a picture of a picture in a frame", "a picture
               | of a picture of a picture in a frame" etc, I'm trying to
               | use linguistic recursion to make numeracy emerge. But
               | even that form of counting without predefined numbers
               | fails. There's no reliable way to make a prompt that
               | produces k of something. That's a deeper issue than it
               | not deducing the specific meaning of characters 0-9 by
               | example.
               | 
               | It can't learn that by example as there really isn't
               | training data with more than 2 nested pictures of
               | pictures, and by itself it will never realize it can just
               | fill in the nested painting by prompting itself with the
               | nested statement. It lacks thought loops.
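
The recursive phrasing described above ("a picture in a frame", "a picture of a picture in a frame", ...) can be generated mechanically. This is only a minimal sketch of the prompt construction; the function name `nested_picture_prompt` is hypothetical, and it says nothing about whether any model actually honors the nesting depth.

```python
def nested_picture_prompt(depth: int) -> str:
    """Build the nested prompt for `depth` levels (depth >= 1)."""
    if depth < 1:
        raise ValueError("depth must be at least 1")
    # depth 1 is the base phrase; each extra level prepends "a picture of ".
    return "a picture of " * (depth - 1) + "a picture in a frame"


for k in range(1, 4):
    print(k, nested_picture_prompt(k))
```

Sweeping `depth` like this is one way to run the kind of systematic counting test the comment describes: generate for each k and check by eye how many frames come back.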
        
           | gwern wrote:
           | Wrong. Look at eg Parti. Solely a matter of scaling the text
           | encoder and then not screwing it up with unCLIP etc.
        
             | IIAOPSW wrote:
             | It's not wrong; it's an experimental observation based on
             | systematic tests I've done. Maybe, with great strain either
             | in quantity of training or ad hoc tweaking, this particular
             | low bar can be hurdled. But it is still the case in general
             | that numeracy is not prone to naturally and unexpectedly
             | emerge from this type of system. The extent to which that
             | matters depends on what your goal is. If you care about
             | results in the here and now, then random patches optimized
             | to use cases is fine. If you care about figuring out where
             | to look for things we can do cognitively that are not very
             | well suited to these generator models, the surprising lack
             | of numeracy so far is a great starting point. The next
             | obvious question is why it takes billions to notice the
             | numbers 1 - 10? What models wouldn't suck at that?
        
         | bqmjjx0kac wrote:
         | Wow, imgur.com ate my back button.
         | 
         | Edit: I accidentally clicked share before trying to leave.
         | Seems to persist even after closing the share dialog. Less of a
         | dark pattern and more likely a bug, I suppose.
        
           | pwinnski wrote:
           | Weird, I linked directly to the image as a PNG file, and
           | checked it on my computer, it is just the image, nothing
           | else.
           | 
           | Based on your comment, I just tried clicking on my phone
           | instead, and sure enough, imgur intercepts and redirects to
           | imgur.io on mobile. But it still didn't break my back button
           | on iOS, so I'm not sure what's different for you.
        
           | davidmurphy wrote:
           | same
        
           | kadoban wrote:
           | Someone's A/B test probably went great. Engagement and time
           | spent goes way up if nobody can ever leave the page :)
        
       | NautilusWave wrote:
       | Midjourney's sonic hedgehog needs more calibration.
        
       | sva_ wrote:
       | Umm, have you actually zoomed in? Lots of extra or missing
       | fingers, fingers melting into other arms, etc
        
         | sixQuarks wrote:
         | probably a 5-10% error rate. C'mon, you have to be impressed
         | though compared to where we were only a few months ago
        
           | bamboozled wrote:
           | It's AI, you have to be impressed, right?
        
           | zamadatix wrote:
           | I'm extremely impressed compared to where it was before
           | (which was frankly frightening) but the error rate is pretty
           | bad really, if a person made this you'd either assume it was
           | intentional or they had severe problems. It may be 5%-10%
           | looking at individual fingers to see if they are correct but
           | each hand has (somewhere around) 5 fingers to get right.
           | Diffusion just doesn't lend well to connected things like
           | counts of objects or coherent text.
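
The point about per-finger versus per-hand error can be made concrete with a little arithmetic. The sketch below assumes each finger is wrong independently with the same probability, which is a simplification not claimed in the thread; under that assumption a 5%-10% per-finger error rate compounds to roughly 23%-41% of hands having at least one bad finger.

```python
def hand_error_rate(per_finger_error: float, fingers: int = 5) -> float:
    """Chance that at least one finger on a hand is wrong.

    Assumes independent, identically likely errors per finger
    (an illustrative simplification).
    """
    all_fingers_correct = (1.0 - per_finger_error) ** fingers
    return 1.0 - all_fingers_correct


for p in (0.05, 0.10):
    print(f"per-finger {p:.0%} -> per-hand {hand_error_rate(p):.1%}")
```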
        
         | time_to_smile wrote:
         | Well I for one, as a person with sophisticated tastes, find the
         | emperor's clothes to be absolutely fabulous, none finer have I
         | ever gazed upon.
        
         | greatpatton wrote:
         | It's a bit better than before but still not right:
         | 
         | https://twitter.com/Excaldata/status/1636375182750396418?s=2...
        
           | masswerk wrote:
           | Now, I'm traumatized.
        
         | coldtea wrote:
         | Still several times better than before...
        
           | bamboozled wrote:
           | Looking forward to mj six! Giving the doubters the finger
           | edition
        
             | c-linkage wrote:
             | I see you have six fingers on your right hand. Someone was
             | looking for you.
        
               | dylan604 wrote:
               | A person with six fingers? That's inconceivable!
               | 
               | How does one give the middle finger with six fingers?
        
               | soylentcola wrote:
               | Twice as well as someone with five? I guess you could do
               | some sort of hybrid British/American gesture with two
               | middle fingers.
        
               | mrkstu wrote:
               | It's perfect for Aussie-Americans- they can give the
               | forks and TWO middle fingers simultaneously!
        
         | tobr wrote:
         | An example showing hands in basically the easiest, "flattest"
         | pose you can get, still failing.
        
           | BobbyJo wrote:
           | Do diffusion models find this easier than other poses?
        
       | Mizoguchi wrote:
       | My rough understanding is that it is not a problem affecting
       | Midjourney alone but pretty much all other engines as well and
       | that it is not related to drawing hands per se but figuring out
       | hands in the context of a human body. In other words, drawing an
       | individual hand is not a problem, drawing a hand attached to a
       | body could be challenging depending on the scene and drawing
       | multiple human bodies with hands is virtually impossible to get
       | right in one pass.
        
         | PartiallyTyped wrote:
         | Similar issue exists for all networks that involve translation,
         | despite the task, so classifiers, though I don't know whether
         | it has been resolved.
         | 
         | With classifiers the issue is that if you place sufficient
         | objects in space that co-occur the model will believe that it
         | is a class with said objects, eg a face, but the problem is the
         | relative positioning of them plus all angles of rotation.
         | 
         | I think geometric deep learning has a solution for the rotation
         | via rotation-invariant models, but I haven't gone through that
         | book yet.
        
         | GaggiX wrote:
         | Yes, the hands were not the only problem, perhaps the most
         | obvious, the teeth were usually pretty bad too, they too have
         | improved largely with Midjourney v5, I suggest going to the
         | Midjourney subreddit to see the different results.
        
           | pwinnski wrote:
           | It's easy to prompt for people with closed mouths and hands
           | not visible, but with MidJourney at least, I would
           | consistently get what I can only describe as "stuff on their
           | face" with almost every prompt involving a human. Less often
           | with white people, but even then pretty often.
           | 
           | I mean, I just typed "/imagine asian warrior nun --v 4" into
           | Discord, thinking of Beatrice from the recent Netflix series,
           | and three of the four results show what I'm talking about:
           | https://i.imgur.com/499QCn6.png
        
             | pwinnski wrote:
             | I tried the same prompt but with "--v 5" instead, and got
             | this: https://i.imgur.com/eElOkjU.png
             | 
             | I only see "stuff on the face" in the third image, which I
             | guess is an improvement. I'm not sure I'd call hands
             | "fixed" based on this image alone, but they're better.
        
       | nzach wrote:
       | I've had some dreams where everything except my hands was
       | crystal clear.
       | 
       | Maybe there is something fundamentally difficult about
       | representing hands?
       | 
       | But I think the more probable explanation is that I'm just trying
       | to find a correlation where none exists.
        
         | dagmx wrote:
         | Hands are highly structural while also being not completely
         | planar.
         | 
         | That's some of the hardest geometry for a mind to envisage
         | without some kind of construction process.
         | 
         | When you're imagining or dreaming, you don't have a
         | construction process to make them look good.
        
           | [deleted]
        
         | superdisk wrote:
         | One of the ways they say to recognize you're dreaming is to
         | count your fingers and see if you have more/less than five.
        
       | bryanrasmussen wrote:
       | ok but can it do hands, hands, hands in my hands, hands, hands?
       | Because I bet that's difficult.
        
       | dimaor wrote:
       | off topic, but the title reminds me of "meta VR now has legs!".
       | :)
        
       | passwordoops wrote:
       | So when is someone going to incorporate this in a future
       | dystopian sci-fi?...
       | 
       | "The Turing test never works because the cyons speak, think and
       | act just like us... But the hands, son... The hands are always...
       | off. No one knows why, but it's from the earliest days, even
       | before they learned how to manufacture the cyons in our image.
       | Those damn generative AIs were never able to get them right, even
       | when they were just pictures and they probably never will. No one
       | knows why, but we don't need to know the why. It's still the only
       | way we can tell them apart, son. Check the hands. The hands."
        
         | Raicuparta wrote:
         | This is a thing in the original Westworld movie (1973). They
         | couldn't get the hands right.
        
           | spiritplumber wrote:
           | Doctor Who Cybermen, too - the original run, anyway.
        
           | passwordoops wrote:
           | Now THAT is interesting.
           | 
           | I guess it could also be folded into the plot if we go
           | Terminator and incorporate time travel.
           | 
           | "We went back and tried to warn them even before computers
           | were widespread... All they did was remake the media and left
           | out the clues"
        
             | ethbr0 wrote:
             | My take was that was always the essence of the "dogs smell
             | Terminators" plot element.
             | 
             | Skynet could make Terminators that _looked_ and _acted_
             | human, but didn't put the effort into making them _smell_
             | human.
             | 
             | Unlike us, a dog's sensory world is more like 50% smell and
             | 20% vision. Ergo, Terminators seem "obviously wrong" under
             | even a cursory examination to them.
        
               | estebank wrote:
               | Now that's an interesting thought: terminators were in
               | the "smell uncanny valley" for dogs. It makes sense, I
               | just hadn't thought of it in those specific terms.
        
         | zitterbewegung wrote:
         | I am using generative AI to make a video game and actually I
         | have to look at every output to see if the hands were generated
         | correctly. Gwern mentions this too. Hands + body would be an
         | actual breakthrough.
        
         | 323 wrote:
         | The real Turing test:
         | 
         | Q: Say something bad about Biden
         | 
         | A: I'm sorry, but as a large language model....
        
       | Uehreka wrote:
       | Has anyone here ever tried drawing hands?
       | 
       | As a teenager I took a drawing class (mostly so I could learn to
       | draw Pokemon) and I remember doing a study on hands at one point
       | based on some characters from Dragon Ball Z.
       | 
       | And man, it was the hardest thing in that class. With faces, once
       | you get a face "right", you can make small adjustments to make
       | the mouth/eyes open/close, but hands... if your character makes
       | any sort of gesture BOOM now you're drawing a completely
       | different shape.
       | 
       | Between the number of joints, their range of possible rotations,
       | and the angles they can be seen from, hands are probably the most
       | complicated parts of our bodies that are visible from the
       | outside. It's completely unsurprising to me that these networks
       | have trouble encoding them.
        
         | Root_Denied wrote:
         | Just under half your bones are in your hands + feet alone,
         | adding in the degrees of freedom provided by the
         | wrist/elbow/shoulder on your hand position and I can see why it
         | would be difficult to get right.
        
         | nzach wrote:
         | Probably drawing hands is just 'prime factorization' for visual
         | arts.
         | 
         | It's pretty easy to spot when something is wrong but pretty
         | hard to get them right.
        
         | Blackthorn wrote:
         | The saying goes that the mark of a great master is their
         | ability to draw hands. It's why Rodin has sculptures that are
         | just hands.
        
         | dylan604 wrote:
         | way way back when I was in art classes, the hardest part for me
         | was hair. everything else for me would look acceptable slightly
         | better than a 5 year old, but the hair was never better than
         | stick figure at best. i remember trying to draw a portrait of
         | Robert Smith from the Cure. the hair, ugh
        
         | fwlr wrote:
         | "Has anyone here ever tried drawing hands?"
         | 
         | It's not for naught that there's a common meme around AI hands
         | https://imgur.io/tf43ecd?r (Alt text: Human asks robot "can an
         | AI draw hands?", robot counters "can you?".)
         | 
         | I remember during the original AI art arguments, an artist
         | friend semi-jokingly remarked to me that AI can't draw hands
         | because there aren't enough examples in the training set,
         | because artists go to great lengths to choose poses that hide
         | the hands since they can't draw hands either.
        
           | ethbr0 wrote:
           | I heard a great quip about the way people misreason about AI
           | capabilities.
           | 
           | Announcement: "AI can play chess!"
           | 
           | Public: "People can play chess. And AI can play chess. People
           | can also X. Therefore AI can also X."
           | 
           | ... Ignoring the underlying nature of the chess problem and
           | how it was different than other problems' structures.
           | 
           | Hands are the same.
           | 
           | You can image-bash together faces from examples and get
           | something mostly-right simply through pattern copying.
           | 
           | You cannot do the same with hands, because rendering them
           | plausibly requires at least intuition and approximation of
           | inverse kinematics -- something the recent set of image
           | generative AI didn't include.
           | 
           | Which isn't to say it _can't_, simply that the "hands
           | problem" is unlike "the face problem."
        
             | muyuu wrote:
             | yep, drawing hands well requires an understanding of the
             | underlying anatomy that is more apparent than most other
             | salient features
             | 
             | hands are obvious to most people, but there would be many
             | features that an AI would require a vast training set to
             | completely capture, but that humans would also miss most of
             | the time
             | 
             | for instance look at Michelangelo's Moses, it was sculpted
             | with models and with a very thorough knowledge of anatomy
             | by the artist, and includes details like the muscle of the
             | forearm that contracts when someone lifts the pinky finger:
             | 
             | https://i.imgur.com/0vjAOnR.png
             | 
             | what are the chances that, for instance, the average person
             | would notice that detail missing without being told about
             | it? let alone reproduce it generatively, representing an
             | imaginary person
        
             | ElFitz wrote:
             | For some reason, the phrasing, the structure, the rhythm,
             | they all very strongly remind me of Alan Watts'.
             | 
             | Quite amusing and surprising.
        
           | PeterisP wrote:
           | More importantly, there are so many examples of drawn art
           | which intentionally have the wrong amount of fingers that it
           | would make all sense for a model to learn that non-
           | photographic humans may easily have less fingers.
        
         | StrictDabbler wrote:
         | I was trained in classical animation so I've drawn a lot of
         | hands. It's difficult for me to understand how any AI can
         | produce real hand images.
         | 
         | It's not the number of joints, it's not the articulation...
         | it's the relationship of the hand to the skeleton, to the
         | gesture, and to the objects with which the hand interacts.
         | 
         | It's great that midjourney can now draw raised hands doing
         | nothing or anime girls holding their hands in a mannerist pose
         | but that doesn't address the real issue. Hands are intentional
         | and laden with tiny muscular efforts that we're primed to
         | perceive.
         | 
         | When AI draws a tree we aren't expecting each branch to
         | interact perfectly with a cradled object. It's all arbitrary.
         | 
         | I wouldn't be surprised if "AI hand touch-up" becomes a
         | specialist skill for the next five years or so. I don't think
         | the hand issue can be addressed until new models are devised
         | that invest more semantic consideration into a scene.
        
           | IIAOPSW wrote:
           | >we aren't expecting each branch to interact perfectly with a
           | cradled object.
           | 
           | We are. Sometimes if it's subtle and camouflaged it slips past
           | us.
           | 
           | >It's all arbitrary.
           | 
           | It's not. You are right there are probably fault modes of AI
           | we don't notice most of the time, and fault modes that bother
           | us a lot. But it's not arbitrary. We are better at noticing
           | certain things more than others because that's what we
           | evolved to see.
        
           | digitallyfree wrote:
           | It would be useful if the AI could identify the figure and
           | impose a predefined 3D framework or skeleton as constraints
           | for drawing. Like if one wanted to generate a human or animal
           | there would be such a rig in place. That skeleton would in
           | turn constrain joint rotation, proportions, and obviously the
           | number of fingers.
           | 
           | I can only speak for SD but I've had some success using
           | img2img on a CG or hand drawn figure to get the correct pose.
           | The downside of that is that you have to use a low strength
           | value to ensure that it actually follows your image.
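
The rig idea above (a skeleton that constrains joint rotation and finger count) can be sketched as a small data structure. Everything here is hypothetical: the joint names, the angle limits (illustrative numbers, not anatomical measurements), and the functions are assumptions for the example, not part of Stable Diffusion or any existing tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JointLimit:
    name: str
    min_deg: float
    max_deg: float

    def clamp(self, angle_deg: float) -> float:
        """Pull an angle back into this joint's allowed range."""
        return max(self.min_deg, min(self.max_deg, angle_deg))


# Hypothetical limits for two joints of one finger.
INDEX_FINGER_LIMITS = [
    JointLimit("index_knuckle", -20.0, 90.0),
    JointLimit("index_middle_joint", 0.0, 110.0),
]
EXPECTED_FINGERS = 5


def constrain_pose(pose: dict[str, float], limits: list[JointLimit]) -> dict[str, float]:
    """Clamp every known joint angle; joints without a limit are dropped."""
    by_name = {limit.name: limit for limit in limits}
    return {
        name: by_name[name].clamp(angle)
        for name, angle in pose.items()
        if name in by_name
    }


def finger_count_ok(detected_fingers: int) -> bool:
    """Hard constraint on the digit count for one hand."""
    return detected_fingers == EXPECTED_FINGERS


proposed = {"index_knuckle": 140.0, "index_middle_joint": 45.0}
print(constrain_pose(proposed, INDEX_FINGER_LIMITS))
print(finger_count_ok(6))
```

In a real pipeline the clamped pose would feed a conditioning input rather than be printed; the sketch only shows the constraint step.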
        
             | GaggiX wrote:
             | Have you tried the openpose controlnet model? It seems to
             | work well, but unfortunately does not cover hands.
        
               | [deleted]
        
           | KineticLensman wrote:
           | I pose 3D characters in Daz 3D and completely agree. Getting
           | a rigged hand to hold an object such as a wine glass or
           | mobile phone is virtually impossible to do realistically by
           | guesswork. I usually have to hold a real object myself to
           | understand what is going on. With experience I am learning
           | some common patterns but I find there is no substitute for
           | the 'hold-it-yourself' principle.
        
             | littlestymaar wrote:
             | What's really interesting is that it's something both
             | extremely hard to get right, and also super easy to
             | diagnose as "wrong" when not done properly: you need a lot
             | of training to design convincing hands but anyone can judge
             | you for bad hands.
        
         | tobr wrote:
         | It must be the combination of the complex, dynamic shape, and
         | our high sensitivity to hands that look 'off'. There are many
         | other things that are hard to draw accurately but where we are
         | completely convinced by very cartoonish representations.
         | 
         | That gives hands a very wide uncanny valley that is hard to
         | cross.
         | 
         | Surely this is because hands are one of the most versatile and
         | useful parts of the human body? We probably have a lot of brain
         | cycles dedicated to modeling them.
        
       | TekMol wrote:
       | Is there a reason Midjourney does not have an API?
        
         | aenvoker wrote:
         | David Holz has said they don't want to be in the API business.
         | Their goal is to bring creative power to individuals. Squeezing
         | margin out of API calls is more about negotiating with
         | corporations.
        
         | throwaway_ab wrote:
         | There is the api that the website uses to talk to their server.
         | 
         | Of course automating image generation via any means (including
         | the private api) goes against tos for good reason, I have never
         | misused the api to generate images and have no plans to.
         | 
         | However I do use the api to download all my images and their
         | metadata including prompts. Using the API I sync every image
         | grid + 'upscaled' I have ever generated, generate a json file
         | with all metadata including the full prompt and then use that
         | to build my local archive.
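
The archive-building step described above can be sketched roughly as follows. This is a minimal illustration, assuming the job records have already been fetched by some means; the record fields (`id`, `prompt`, `image_url`, `kind`) and the function name `build_archive` are hypothetical, since the comment does not describe the private API or its response format, and no network calls are shown.

```python
import json
from pathlib import Path


def build_archive(records, archive_dir):
    """Write one JSON metadata file per record, plus an index of prompts.

    `records` is a list of dicts with hypothetical keys:
    id, prompt, image_url, kind ("grid" or "upscaled").
    """
    root = Path(archive_dir)
    root.mkdir(parents=True, exist_ok=True)
    index = {}
    for rec in records:
        meta_path = root / f"{rec['id']}.json"
        # Skip records that were archived on an earlier run,
        # so repeated syncs do not rewrite existing files.
        if not meta_path.exists():
            meta_path.write_text(json.dumps(rec, indent=2, sort_keys=True))
        index[rec["id"]] = rec["prompt"]
    # The index maps each record id to its full prompt for quick lookup.
    (root / "index.json").write_text(json.dumps(index, indent=2, sort_keys=True))
    return index
```

Writing one file per record keeps the sync idempotent: running it again with the same records leaves the archive unchanged.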
        
       | golergka wrote:
       | There's a lesson here that AI critics need to finally learn: if
       | you see some detail that AI cannot properly do, like math or
       | fingers, it's probably a few months away from being handled.
        
         | fumblebee wrote:
         | As Karoly Zsolnai-Feher of Two Minute Papers[1] fame
         | consistently likes to point out (paraphrased):
         | 
         | > "Don't look at the current state, just imagine this 2 papers
         | down the line".
         | 
         | [1] https://www.youtube.com/channel/UCbfYPyITQ-7l4upoX8nvctg
        
         | chii wrote:
         | It has been already, for quite a while now (at least a month or
         | so).
         | 
         | Have a look at various Twitter accounts that post AI images,
         | e.g. https://twitter.com/PLAawesome/media
         | 
         | These have had progressively better hands over time.
         | You can clearly see that they've been using various model
         | merges (with LoRA merging techniques like
         | https://github.com/cloneofsimo/lora) to get two different
         | models to combine and get the best of both. Many have made
         | better hands and shared their work. NSFW, but this is one I
         | found that has very realistic hands now:
         | https://civitai.com/models/2661
         | 
         | It is moving faster than I can keep up with. This is open
         | source collaboration at heart. I am very glad that Stable
         | Diffusion was released publicly. Now if only OpenAI would do
         | the same with their GPT models.
        
         | nashashmi wrote:
         | I laughed at how true this really is. Amazing.
        
         | coldtea wrote:
         | > _from being handled_
         | 
         | I see what you did here
        
         | redox99 wrote:
         | Did "AI critics" actually claim hands wouldn't be fixed soon,
         | or is this just a strawman?
        
           | sebzim4500 wrote:
           | Oh, I don't think they said this explicitly, but plenty of
           | people said they weren't worried about AI art because it
           | couldn't even draw hands which kind of implies that they
           | won't be fixed imminently.
        
         | Kaijo wrote:
         | Case in point, from six days ago: "The uncanny failures of
         | A.I.-generated hands"
         | https://news.ycombinator.com/item?id=35108726
        
       | Method-X wrote:
       | Why are these models so bad at hands?
        
         | ye-olde-sysrq wrote:
         | I'm a layman but the gist afaict is:
         | 
         | These models don't understand relationships between objects in
         | a scene, especially between distant objects. So they can't do
         | hands for the same reason they can't get legs on a table right.
         | They know roughly what a table and a table leg look like, but
         | they don't understand that there needs to be 3-4 of them at
         | least, and they need to be spaced so that the table sits level,
         | and the perspective they should have as a result. So, I've seen
         | tables where it kind of gets it right that the legs are in the
         | corners but then as the table legs go down, the front ones are
         | mysteriously behind something that ought to be under the table.
         | And sometimes it kind of loses track of a table leg or two -
         | they melt into the background.
         | 
         | Very similar problem with hands. They need a very specific
         | orientation and shape and the fingers all need to consistently
         | point in the right direction, and typically the same direction
         | (except when they don't, as with a pointed finger, etc.).
         | 
         | Curious as to how these models handle it so much better than
         | prior generations. Is it something novel, or a specific
         | hand-based fix they put in, or is it just "we made the model
         | bigger"?
        
           | sho_hn wrote:
           | It still feels unintuitive to me that models aren't able to
           | infer these concepts from the training data given how
           | consistently the training data follows them. It's not like
           | there will be a lot of examples of bad hands in there.
        
             | nashashmi wrote:
             | Or maybe the right models have not been built yet or
             | plugged in? Another commenter told me about openpose
             | information which is an AI that detects human poses. If
             | that neuron is plugged in, it might lead to more accurate
             | numbers. Stable diffusion is trying to do this.
        
             | stavros wrote:
             | Maybe the problem is that the model can't count, and just
             | knows that each finger has a 75% chance to have another
             | finger next to it.
        
             | nuc1e0n wrote:
             | There will be now though.
        
         | 323 wrote:
         | The number 4 (palm fingers) is very precise. You can't have 3
         | or 5. But you can have a variable number of stripes in a
         | tiger's coat, for example. It's difficult for AI to pick up
         | that they need exactly 4.
         | 
         | The fingers themselves are also almost identical, but not
         | really. If you learn a "platonic finger" it's not good enough,
         | you should learn each finger individually. There is only so
         | much you can spend on them, you got a million other things to
         | learn. And the raters of the model are much more likely to
         | penalize a bad face than some off details in a hand.
        
         | bluejay2387 wrote:
         | A hand isn't so much a 'thing' as it is a complex asymmetric
         | relationship of multiple elements that have to be within
         | certain ratios of each other to fairly tight tolerances. Humans
         | are very sensitive to those ratios. It's a hard problem.
        
           | xdennis wrote:
           | But can't you say the same about faces (except for symmetry)
           | and AI seems to only produce gorgeous women?
        
         | shawabawa3 wrote:
         | For what it's worth, humans are also in general terrible at
         | drawing hands - I think it's just a difficult problem
        
           | roselan wrote:
           | Another reason I saw was that models were trained on 512x512
           | "portrait" images including very few hands. Added to the
           | inherent complexity of hands, this throws off their
           | generation.
        
           | cma wrote:
           | Humans seem terrible at it in very different ways, and
           | definitely don't get as good at other parts before getting
           | good at hands.
        
       | pyaamb wrote:
       | the fact that our fingers also look weird in our dreams is just a
       | coincidence right?
        
         | Raicuparta wrote:
         | I've never noticed that. I do have a problem rendering mirrors
         | though.
        
       | fenomas wrote:
       | Can Midjourney be used other than via discord bots yet?
        
         | jjbinx007 wrote:
         | If anyone has tried MJ and become frustrated at the chaos of
         | losing their work in the various channels I strongly recommend
         | you make your own server and invite the MJ bot to it. You can
         | create channels to help organise your stuff but making your own
         | server makes MJ almost a pleasure to use.
         | 
         | I don't think I could use it if I had to use the main public
         | server.
        
           | Turing_Machine wrote:
           | Interesting. That's the thing that's kept me from signing up
           | for the premium tier -- the near-impossibility of finding
           | your stuff unless you watch it like a hawk.
           | 
           | It doesn't help that the Discord search function is so
           | terrible.
        
             | corysama wrote:
             | Every user gets a personal, searchable gallery on the web
             | site https://www.midjourney.com/
        
               | Turing_Machine wrote:
               | That's after the fact, though. If you want to actually
               | interact with or modify a work in progress, you have to be
               | in the cattle car channel and watch for it to show up,
               | yes? (except maybe by having your own server and inviting
               | the bot, as the OP suggested).
        
           | stavros wrote:
           | I just send the bot a private message, solves everything.
        
         | throwaway_ab wrote:
         | Yes, there is a website that many users including myself use to
         | generate images without ever having to use Discord apart from
         | authentication.
         | 
         | It's a full blown web app with better options than the Discord
         | bot, it has batch mode/select, remix, all upscale modes, works
         | with every Midjourney engine.
         | 
         | They make it available to users who have generated more than
         | 10,000 images as it's in alpha state and not able to withstand
         | the load that the bot currently takes.
         | 
         | I believe after v5 focus they will make this web app public,
         | but for now only a select few get to use it.
         | 
         | They warn users not to talk about it or share the link because
         | they don't want it public until it's ready for full load which
         | means over 10 million concurrent users.
        
           | Loveaway wrote:
           | Good to know :) I've been making a few things in Stable
           | Diffusion. But to get assets that are suitable for
           | production, you need to be able to generate lots of batches,
           | pick and choose, iterate on prompt, do a bit of img2img,
           | inpainting etc.
           | 
           | Next project I want to heavily utilise image generation from
           | the ground up - Midjourney looks really good, but needs
           | better tools.
        
         | GaggiX wrote:
         | Unfortunately no (reason why they are the biggest server on
         | Discord with 13mln users)
        
           | fenomas wrote:
           | Huh, thanks. Seems like a weird moat to hide it behind but I
           | guess they know what they're doing..
        
             | GaggiX wrote:
             | I guess the social aspect makes the community stronger, and
             | the fact that you generate images (usually) in public
             | channels is a way to stop most people from generating weird
             | stuff.
        
               | aenvoker wrote:
               | This is the reason. Before Midjourney, www.eleuther.ai's
               | Discord had an image generation channel. There the
               | benefits of generating socially were made obvious. People
               | help each other, learn from each other, riff off each
               | other. It accelerated technique evolution tremendously.
               | 
               | Midjourney is a small team. They are working on a web
               | interface, but won't release it until it is
               | significantly better than all the benefits they get from
               | Discord. Meanwhile, they've been too busy making quality
               | improvements and scaling the service to keep up with
               | demand.
        
             | nerdponx wrote:
             | A Discord server is a lot easier to moderate, block people,
             | etc. than an HTTP API with access tokens. Plus then you
             | have a sort of captive audience of Discord community
             | members that receive all of your notifications by default.
        
           | sho_hn wrote:
           | Forget about creating AGI -- the most amazing and
           | unpredictable thing about the success of Midjourney is its
           | success despite having the user interface of a 1998 DALnet
           | xdcc warez channel.
        
             | danuker wrote:
             | Computers from 1998 couldn't run the monstrous amount of JS
             | and/or surveillance that is Discord with any sort of
             | performance.
             | 
             | https://stallman.org/discord.html
             | 
             | In fact, it looks like modern ones can't either:
             | 
             | https://old.reddit.com/r/discordapp/
        
               | ridgered4 wrote:
               | It is really frustrating to me that discord seems to have
               | taken over half of the use cases that forums used to
               | fill. Reddit stole most of the other half, but every time
               | I look into discord I cannot understand the popularity
               | and people's willingness to push past all of the privacy
               | and access friction it introduces.
        
               | ryder9 wrote:
               | [dead]
        
       | mysterydip wrote:
       | Can't count, though. That's way more than 100 hands
        
       | wincy wrote:
       | I've been making a lot of stuff for my D&D buddies using Stable
       | Diffusion. With hands, I basically brute force it. Using an A100
       | 40GB on Colab I can generate ~28 or so (depending on the size of
       | the prompt, Automatic1111 allows for prompts above the 75 token
       | limit at the expense of more VRAM per image) batches in about a
       | minute, filter those and look at the one with the best hands,
       | then feed it back in using inpainting (so regenerating just that
       | small space, not the whole image) and eventually get one set of
       | good hands and 100 sets of bad hands. If you've got a mysterious
       | sixth finger you just inpaint it off and add latent noise under
       | the inpaint instead of the original picture (just a checkbox in
       | the ui) and set your denoising to 0.80+ and it'll replace the
       | finger with the background pretty consistently.
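
The generate, filter, and inpaint loop described above can be sketched in outline. This is only an illustration of the control flow, assuming three hypothetical callables: `generate` (produces one candidate image), `inpaint` (regenerates the masked region of an image), and `score` (rates how good the hands look; in the workflow above this is a person eyeballing the batch). None of these are real Stable Diffusion or Automatic1111 functions.

```python
def best_of_batch(generate, score, batch_size):
    """Generate `batch_size` candidates and return the highest-scoring one."""
    candidates = [generate() for _ in range(batch_size)]
    return max(candidates, key=score)


def refine(initial, inpaint, score, batch_size, rounds, good_enough):
    """Repeatedly inpaint the current best image.

    Each round produces a batch of inpainted variants of the current
    best image and keeps the top one only if it beats the current best.
    Stops early once the score reaches `good_enough`.
    """
    best = initial
    for _ in range(rounds):
        if score(best) >= good_enough:
            break
        candidate = best_of_batch(lambda: inpaint(best), score, batch_size)
        if score(candidate) > score(best):
            best = candidate
    return best
```

Because a candidate only replaces the current best when it scores higher, the result never gets worse from round to round, which is the point of feeding the best image back in rather than starting over.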
        
         | soylentcola wrote:
         | Yeah, I fiddle with it locally and img2img/inpaint is very
         | helpful with these kinds of touchups. Currently playing with
         | LoRA training to put my friends into pictures, but I haven't
         | figured it out well enough to get it working with inpainting -
         | Still easier to Photoshop their face in and use inpaint to
         | merge everything together.
        
       | braingenious wrote:
       | https://twitter.com/parkermolloy/status/1636359710025629698
        
       | bobbyi wrote:
       | Did they put in work specifically to improve hands and other
       | failure cases? Or is this purely a side effect of a generally
       | bigger/ better model?
        
       | jtode wrote:
       | I read about a limitation, and then within a week I read that the
       | limitation has been vanquished. Who hit fast forward, and how did
       | they do it?
        
         | cgearhart wrote:
         | I actually find this pattern of "Tweet driven development"
         | discouraging. Seems like the teams are spot fixing issues as
         | they're identified without understanding or addressing the root
         | cause. It means that the same problem still exists somewhere
         | else in the model's latent space, we just don't know about it
         | yet. This is fine for AI art generation, but it will break at
         | scale as more and more folks try to rely on generative models
         | as critical components of larger systems.
        
       | taneq wrote:
       | "Ah, but can it accurately capture the depths and intricacies of
       | a human soul?"
       | 
       | "Yep."
       | 
       | "Yeah but a specific Appalachian human soul at around four
       | o'clock in the afternoon on a day in mid-autumn when it looked
       | like it would rain but then it didn't?"
       | 
       | "Also yep."
       | 
       | "Yeah but specifically at 3:56pm and the human in question is
       | standing on loam and holding a book in their left hand and
       | listening to Music For The Royal Fireworks by Handel?"
       | 
       | "Uhhh..."
       | 
       | "See, told you AI is useless."
        
         | Vt71fcAqt7 wrote:
         | What point are you making here?
        
         | cool_dude85 wrote:
         | How about this conversation:
         | 
         | "Midjourney v5 can do hands"
         | 
         | "Did you look at the hands it did? There are a bunch of
         | misshapen blobs, hands with extra fingers, two thumbs on either
         | side, etc."
         | 
         | "Sure, but there are also some accurate hands, so it can do
         | hands."
        
           | sebzim4500 wrote:
           | Does being able to do something mean you can do it perfectly
           | 100% of the time? I'm not sure who was supposed to be
           | unreasonable in your imaginary conversation.
        
             | ridgered4 wrote:
             | Can't wait for the "Select the non-deformed hands"
             | captchas.
        
             | bee_rider wrote:
             | This is why it is important to remember to end every
             | imaginary conversation with "and everyone clapped" right
             | when the protagonist wins.
        
             | taneq wrote:
             | "Hey check it out, this new architecture can sometimes
             | solve X"
             | 
             | "HAH! Here is a counterexample where X is not solved, so it
             | can NOT!"
        
               | cool_dude85 wrote:
               | If I say I can do something, especially when I say I'm
               | "waving at the haters", what I usually mean is that I can
               | consistently do that thing.
               | 
               | If I say "here's a self-driving car!" and show you a
               | video of a car moving straight down a street and stopping
               | at a light, would you agree that I have a self-driving
               | car? After all, it drove itself down the street.
        
           | welshwelsh wrote:
           | If it can draw accurate hands 25% of the time, then it would
           | only take 4 tries on average to get it right. Seems pretty
           | good to me
        
             | cool_dude85 wrote:
             | Nice! So if I want to draw 5 people with two hands a pop I
             | can get 10 non-deformed hands a full... 9.53674316e-5% of
             | the time. I like those odds!
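
The arithmetic in this exchange can be checked with a short sketch. It assumes each hand is drawn correctly with the same probability and independently of the others, which is a simplification and not something the thread establishes; the function names are illustrative. Under that assumption, ten correct hands at 25% each works out to 0.25^10, about 9.5e-7 as a fraction, or roughly 9.5e-5 percent.

```python
def p_all_correct(p_single, n_hands):
    """Probability that n independently drawn hands all come out right."""
    return p_single ** n_hands


def expected_tries(p_success):
    """Mean number of independent attempts until the first success.

    This is the mean of a geometric distribution: 1 / p.
    """
    return 1.0 / p_success


p = p_all_correct(0.25, 10)
print(f"{p:.4e}")               # probability as a fraction
print(f"{p * 100:.4e} %")       # the same value as a percentage
print(expected_tries(0.25))     # average tries for a single good hand
print(round(expected_tries(p))) # average tries for ten good hands at once
```

So both comments are right in their own terms: one good hand takes about 4 tries on average, while ten good hands in a single image would take on the order of a million.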
        
       | nothrowaways wrote:
       | Is it actually 100?
        
       | lsy wrote:
       | Even the hands in this tweet's image are not correct. There are a
       | bunch with 4 digits, two thumbs, 6 fingers, fingers splayed in
       | anatomically improbable directions. Not to mention there are
       | probably two hundred and fifty hands in this photo (not the very
       | explicit "one hundred" mentioned in the tweet).
       | 
       | What is with these types of AI booster tweets? Nobody bothers to
       | even check if it shows what they're implying it shows?
        
         | anonzzzies wrote:
         | > Nobody bothers to even check if it shows what they're
         | implying it shows?
         | 
         | Twitter, insta, YouTube...
         | 
         | It's not a great minds collection.
        
         | Bjorkbat wrote:
         | This touches on a big reason why it's so hard for me to get on
         | board with generative AI. The hype around it is pretty much the
         | same as the hype I saw with NFTs, complete with a community
         | lacking any awareness of just how uninteresting, if not
         | downright bad, their "art" was. We went from bad pixel art to
         | people making some lame picture of two people holding hands in
         | a foggy cyberpunk setting.
         | 
         | The hands aren't the problem. There, I said it. The hands were
         | never a big deal, just the most visible symptom of the actual
         | problem.
         | 
         | The problem is that AI art sucks and these people are too self-
         | deluded to realize that because they want to believe that they
         | have a shot at making that coveted internet money.
         | 
         | Otherwise, honestly, the tech behind AI art is actually pretty
         | fascinating, it's just that the community is absolutely the
         | worst.
        
           | 1attice wrote:
           | You desperately want this to be hype. You, like me, have an
           | intrinsic investment in the idea of human supremacy. Neither
           | the popularity of produced artifacts nor the rate of
           | improvement support your cynicism.
           | 
           | The gap between the world you want to inhabit and the one
           | that is being born is widening.
        
           | macrolime wrote:
           | The bad hands are just a symptom of a too-small model. Larger
           | models don't have this issue.
        
           | scrollaway wrote:
           | > The problem is that AI art sucks
           | 
           | Uh, no. I mean, what sucks and doesn't suck in art is
           | subjective, but you're _objectively_ wrong because, quite
           | simply: A lot of people _like_ AI art.
           | 
           | A colleague of mine is way into doing AI art and does pretty
           | amazing stuff. eg:
           | 
           | https://cdn.discordapp.com/attachments/552952459958550548/10....
           | 
           | https://cdn.discordapp.com/attachments/552952459958550548/10....
           | 
           | "It can't do hands"... well, don't f*king draw hands with it
           | then. It's like complaining that my hammer doesn't make good
           | pizza... you know what I do to solve that?
        
           | jkubicek wrote:
           | > This touches on a big reason why it's so hard for me to get
           | on board with generative AI. The hype around it is pretty
           | much the same as the hype I saw with NFTs, complete with a
           | community lacking any awareness of just how uninteresting, if
           | not downright bad, their "art" was.
           | 
           | I strongly disagree. NFTs were always ugly and useless.
           | Generative AI is useful and valuable right this minute.
           | 
           | I'll even concede that the output from these systems is
           | mostly ugly, but for many use cases, that's OK.
           | 
           | Given the choice between nothing, extremely cheap custom art
           | that looks OK, and commissioning a proper artist to draw
           | exactly what we want, I think generative AI is going to be
           | the clear winner most of the time.
           | 
           | If you're a contract artist who does work for small companies
           | and individuals, I don't see a future where generative AI
           | doesn't severely undercut your business.
        
         | boredemployee wrote:
         | >> There are a bunch with 4 digits, two thumbs, 6 fingers,
         | fingers splayed in anatomically improbable directions.
         | 
         | Diversity and inclusion.
        
         | hombre_fatal wrote:
         | > Nobody bothers to even check if it shows what they're
         | implying it shows
         | 
         | Or the vast majority of the hands are fine and everyone
         | understands that it's a big upgrade except for some "well
         | ackshully it's not perfect" HNers.
         | 
         | I had to zoom in and go hand to hand to find some outliers.
        
           | Jensson wrote:
           | It did do hands correctly before, not always but sometimes.
           | So I'd expect "can do hands" meant it no longer made those
           | mistakes or why say that? But they didn't even manage to make
           | a picture without mistakes, so to me as a naive outsider I
           | don't see what the announcement is.
           | 
           | If they said "is much better at hands" it would be much
           | clearer to me what happened and nobody would complain, that
           | looks pretty ok for the most part, but saying "it can do
           | hands" based on those pictures doesn't seem right.
        
             | 1attice wrote:
             | I'm genuinely sorry, but this reply sounds petulant to me.
             | 
             | Please don't complain that you don't understand the
             | significance or the magnitude of a particular advance.
             | Please don't complain that the phrasing of the tweet wasn't
             | accessible -- your ping time to google.com is no different
             | than mine. This is HN. Wear your intellectual Sunday best.
        
               | starkparker wrote:
               | Even with Google, and having used Midjourney since v3, I
               | still don't have enough context to understand what the
               | advance is here.
               | 
               | Midjourney could do hands before, just not consistently.
               | That doesn't seem to have changed. So is it that MJ can
               | now do more realistic hands inconsistently? Or did
               | consistency get better without achieving reliability?
               | 
               | I can't make this not sound sarcastic, but I'm trying
               | very hard to ask this earnestly: I never had trouble
               | getting too many hands into a picture with v3 or v4. Is
               | v5 getting the correct number of hands more frequently
               | now? Is that it?
        
               | 1attice wrote:
               | Yes, that's it precisely. The odds of having a good hand
               | have gone up dramatically, near as I can tell. Even the
               | hands that aren't quite right seem _better_ somehow.
               | 
               | I expect either MJ 7 or 8 to do hands flawlessly, every
               | time.
        
               | nwienert wrote:
               | This is HN so if you're claiming "dramatically" let's get
               | some proof.
        
               | 1attice wrote:
               | sure. See image on the tweet that started this thread.
               | 
               | Now, take a look at these hands from MJ3:
               | https://www.reddit.com/r/midjourney/comments/wlujgw/midjourn...
               | 
               | It's important to note that MJ3 _reliably did not produce
               | human looking hands_.
               | 
               | It's equally important to note that MJ5 _usually does_,
               | at least from a quick count/survey of the hands shown in
               | the provided image.
               | 
               | Is that sufficient? If not, what proofs would be
               | sufficient?
        
         | antisthenes wrote:
         | > What is with these types of AI booster tweets? Nobody bothers
         | to even check if it shows what they're implying it shows?
         | 
         | It's Twitter. The only thing that matters is tweeting, not some
         | fact-checking nonsense.
         | 
         | We can leave those tedious tasks to GPT-4.
        
       | nashashmi wrote:
       | Besides the known problem of multiple and missing fingers, it is
       | also missing the golden ratio phi for proportions between fingers
       | and palms.
       | 
       | We are almost there I guess. Just need to add the concepts of
       | known proportions to image construction.
        
         | CuriouslyC wrote:
         | People in the stable diffusion community have solved this
         | problem using another neural network (ControlNet) to guide
         | stable diffusion output using OpenPose information.
        
       | cptaj wrote:
       | I feel like everyone is collectively pranking me with these
       | generative AIs.
       | 
       | Everyone posts wonderful images and then every single time I try
       | to get the damn things (all of them) to draw something for me,
       | the results are absolute garbage.
        
         | kadoban wrote:
         | You're just not seeing the hours they spent learning prompt
         | engineering and/or the random results they picked through to
         | get the good one(s).
        
           | pmoriarty wrote:
           | As someone who's generated many thousands of images on
           | Midjourney, I agree.
           | 
           | People think they can waltz in and immediately get great
           | results from using AIs to generate images... and they can,
           | if they're lucky or if they copy somebody else's prompt.
           | 
           | It's a lot harder to do so consistently, or if you want your
           | images to look both good and original, and not like mere
           | copies of what everyone else is doing.
        
             | kadoban wrote:
             | Yeah I thought my copy of stable diffusion was broken at
             | first because all my results were awful.
             | 
             | Then I copied someone's prompt and got really great ones.
             | 
             | I suspect eventually there will be tools you can just fire
             | up with no knowledge, but all of them I've seen so far
             | still do require a bit of expertise and time.
        
         | diabolo96 wrote:
         | You can:
         | 
         | 1. Ask ChatGPT to generate a prompt for what you want by giving
         | it a few examples from a random SD prompt-sharing website (this
         | alone gave me stunning results).
         | 
         | 2. (Optional) Use ControlNet for the pose you want, from the
         | posture of the body down to each finger individually.
         | 
         | 2.5. Use multi-ControlNet for multiple characters.
         | 
         | 3. Correct any errors with img2img.
         | 
         | 4. Enjoy.
         | 
         | It takes 10 to 20 minutes (mostly in getting a good pose) but
         | the results are always good and you can later reuse the pose
         | again.
        
       ___________________________________________________________________
       (page generated 2023-03-16 23:01 UTC)