[HN Gopher] Midjourney v5 can do hands
___________________________________________________________________
Midjourney v5 can do hands
Author : GaggiX
Score : 206 points
Date : 2023-03-16 13:10 UTC (9 hours ago)
(HTM) web link (twitter.com)
(TXT) w3m dump (twitter.com)
| Awelton wrote:
| It's a step in the right direction, but it still seems to have a
| problem understanding that the vast majority of hands have 5
| fingers.
| himinlomax wrote:
| But can it do Xi Jinping? (Last I heard they were censoring it
| for dubious reasons)
| prox wrote:
| SD can as well with multi controlnet.
| GaggiX wrote:
| I have no doubt that someone could generate a similar result
| with SD, but it would require a lot of effort. To control hands
| with SD using controlnet one usually uses the depth model, so if
| one wants to create an image similar to the one generated by
| Midjourney v5 one would have to place hundreds of hands.
| smrtinsert wrote:
| If MJ is using AI then some community solution should soon
| appear for SD.
| CuriouslyC wrote:
| Don't use depth for hands. There is an openpose model that
| can use hand information.
| GaggiX wrote:
| Can you link me the controlnet model trained also on hand
| openpose?
| CuriouslyC wrote:
| Link and discussion at
| https://www.reddit.com/r/StableDiffusion/comments/1144vyb/co...
| nickthegreek wrote:
| With controlNET you actually get to choose your hand pose
| though, and not just hope for it to be the way you want in
| Midjourney.
| hospitalJail wrote:
| Bold claim, and it's false.
| anonyfox wrote:
| I'd like to use midjourney via an API from my scripts, but the
| only way of using it right now seems to be via discord, or did I
| miss something?
|
| Seems like Dall-E is what I have to use for now :/
| jm547ster wrote:
| Yes still discord only
| throwaway_ab wrote:
| Hands are some of the hardest forms for humans to draw, it is no
| surprise these models struggle with hands too.
|
| In a way, a hand contains more features than perhaps the entire
| human body from a forms/intersection perspective.
|
| I think an entire model needs to be built to focus on just hands
| and then combined into a more general model, perhaps that will be
| the path forward?
| dyno12345 wrote:
| I don't get how everyone was saying the hands looked weird
| before. I think it just had something to do with the camera
| technology back then, and it must be training on that. It looks
| that way in all of my old childhood photos.
| mightytravels wrote:
| Is there a public demo for Midjourney like there are for
| StableDiffusion?
| corysama wrote:
| There's a very limited free trial. Like a few dozen images.
| anonzzzies wrote:
| So lovely; how we will recognise the fake; show us hands. Even
| good hands are not good.
| pwinnski wrote:
| Let's try "chinese girl making heart with fingers --v 5", a
| prompt I had used before under v4. Result:
| https://i.imgur.com/jPgBqsX.png
|
| In image 1, the subject appears to be missing a digit on each
| hand.
|
| In image 2, the missing digits issue expands, combined with what
| appears to be a merging of the thumbs.
|
| Image 3 introduces a supernumerary digit on each hand, and has
| extra "parts" disappear into and out of other fingers.
|
| Image 4... isn't right in a number of ways, but still seems to
| have fewer than the expected number of digits.
|
| I don't think these results are any better than the v4 model was,
| but decide for yourself. This is what I got with the same prompt
| using -v 4 on March 1. https://i.imgur.com/hQA2K7m.png I ended up
| just taking a photo of my daughter and blurring the background.
| abraxas wrote:
| V5 is vastly better. V4 mutations are outright creepy.
| spaceman_2020 wrote:
| V5 looks like photos with filters on. V4 looks like a
| painting.
| muyuu wrote:
| they're both pretty flawed but v5 is better
|
| it still doesn't seem to respect anatomy when it comes to
| hands particularly, perhaps an artefact of not paying
| attention to hard constraints that are not visually self-
| evident
| pwinnski wrote:
| Neither of them produced a result I could use. One or the
| other might be more aesthetically pleasing to you depending
| on what you focus on, but both are very badly broken in
| fundamental ways.
|
| If anything, I'd say v5 is more confident about its
| wrongness. It is as if humans have always had four fingers,
| how could you possibly think otherwise? In that sense it
| seems more like the text-based LLMs: confidently incorrect.
| oezi wrote:
| How quickly the bar was raised. I can remember a time when we
| nitpicked Dall.E for creepy eyes.
|
| Surprisingly some things we consider easy seem to be hard
| computationally.
| recuter wrote:
| I can count eleven mistakes in your comment on one hand.
| (Born near Chernobyl, ymmv)
| IIAOPSW wrote:
| These systems can't count in general. Doesn't matter if it's
| fingers, n clocks on a wall (for all 1 digit vals of n), some
| variation of n items on an x, or even "a picture within a
| picture within..." k times. The machine can't count.
|
| >I ended up just taking a photo of my daughter and blurring the
| background.
|
| Ah, the diffusion process.
| pwinnski wrote:
| I suspect inability to count is a thing that can be trained
| for, although whether that will come at the expense of
| something else, I'm not sure.
|
| I appreciate the humor. :)
| IIAOPSW wrote:
| Why the optimism? If it hasn't picked up on the gist of the
| meaning of 1-10 after 10 billion examples, why expect it to
| work after 20 billion examples? These models have worked
| unreasonably well so far, but there has to be a limit to
| how much we can substitute more data for our lack of
| conceptual understanding of cognitive processes.
| pwinnski wrote:
| Because I'm not sure counting is something that has been
| explicitly tagged or trained for. Clearly always-accurate
| counts are not an emergent feature, but if you tagged a
| large set of images with tags we think are too obvious to
| tag as humans, like "ten fingers" and so on, if that was
| an actual goal to be rewarded, I think it could improve
| results.
|
| I'm overall in the skeptic camp, but it seems like these
| models can generally deliver what they're trained for. It
| doesn't appear any have been trained with counting as a
| primary goal.
| IIAOPSW wrote:
| Ten fingers surely wouldn't be tagged, but the thing
| about numbers is that they work as well for 10 apples as
| they do for 10 coins and 10 cups and 10 whatever. It
| never made the leap of abstraction to learn in the latent
| space the meaning of numbers 1-10 independent of the
| particular object. This lack of extrapolation lends it to
| a very rote learned style.
|
| I already have a way around the specific numerical
| vocabulary/training problem. When I ask for "a picture in
| a frame", "a picture of a picture in a frame", "a picture
| of a picture of a picture in a frame" etc, I'm trying to
| use linguistic recursion to make numeracy emerge. But
| even that form of counting without predefined numbers
| fails. There's no reliable way to make a prompt that
| produces k of something. That's a deeper issue than it
| not deducing the specific meaning of characters 0-9 by
| example.
|
| It can't learn that by example as there really isn't
| training data with more than 2 nested pictures of
| pictures, and by itself it will never realize it can just
| fill in the nested painting by prompting itself with the
| nested statement. It lacks thought loops.
| gwern wrote:
| Wrong. Look at eg Parti. Solely a matter of scaling the text
| encoder and then not screwing it up with unCLIP etc.
| IIAOPSW wrote:
| It's not wrong; it's an experimental observation based on
| systematic tests I've done. Maybe, with great strain either
| in quantity of training or ad hoc tweaking, this particular
| low bar can be hurdled. But it is still the case in general
| that numeracy is not prone to naturally and unexpectedly
| emerge from this type of system. The extent to which that
| matters depends on what your goal is. If you care about
| results in the here and now, then random patches optimized
| to use cases is fine. If you care about figuring out where
| to look for things we can do cognitively that are not very
| well suited to these generator models, the surprising lack
| of numeracy so far is a great starting point. The next
| obvious question is why it takes billions to notice the
| numbers 1 - 10? What models wouldn't suck at that?
| bqmjjx0kac wrote:
| Wow, imgur.com ate my back button.
|
| Edit: I accidentally clicked share before trying to leave.
| Seems to persist even after closing the share dialog. Less of a
| dark pattern and more likely a bug, I suppose.
| pwinnski wrote:
| Weird, I linked directly to the image as a PNG file, and
| checked it on my computer, it is just the image, nothing
| else.
|
| Based on your comment, I just tried clicking on my phone
| instead, and sure enough, imgur intercepts and redirects to
| imgur.io on mobile. But it still didn't break my back button
| on iOS, so I'm not sure what's different for you.
| davidmurphy wrote:
| same
| kadoban wrote:
| Someone's A/B test probably went great. Engagement and time
| spent goes way up if nobody can ever leave the page :)
| NautilusWave wrote:
| Midjourney's sonic hedgehog needs more calibration.
| sva_ wrote:
| Umm, have you actually zoomed in? Lots of extra or missing
| fingers, fingers melting into other arms, etc
| sixQuarks wrote:
| probably a 5-10% error rate. C'mon, you have to be impressed
| though compared to where we were only a few months ago
| bamboozled wrote:
| It's AI, you have to be impressed, right?
| zamadatix wrote:
| I'm extremely impressed compared to where it was before
| (which was frankly frightening) but the error rate is pretty
| bad really, if a person made this you'd either assume it was
| intentional or they had severe problems. It may be 5%-10%
| looking at individual fingers to see if they are correct but
| each hand has (somewhere around) 5 fingers to get right.
| Diffusion just doesn't lend well to connected things like
| counts of objects or coherent text.
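|
| The per-finger versus per-hand arithmetic above can be sketched
| roughly as follows. This is only an illustration: it assumes
| (not from the thread) that each finger goes wrong independently
| with the same probability, and hand_error_rate is a made-up
| helper name.

```python
# Assumption (not from the thread): each finger is rendered correctly
# independently of the others, with the same per-finger error rate.

def hand_error_rate(per_finger_error: float, fingers: int = 5) -> float:
    """Probability that at least one of `fingers` digits is wrong."""
    return 1.0 - (1.0 - per_finger_error) ** fingers


for p in (0.05, 0.10):
    print(f"per-finger error {p:.0%} -> per-hand error {hand_error_rate(p):.1%}")
```

| Under that assumption, a 5%-10% per-finger error rate works out
| to roughly a 23%-41% chance that any given hand has at least one
| bad finger.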
| time_to_smile wrote:
| Well I for one, as a person with sophisticated tastes, find the
| emperor's clothes to be absolutely fabulous, none finer have I
| ever gazed upon.
| greatpatton wrote:
| It's a bit better than before but still not right:
|
| https://twitter.com/Excaldata/status/1636375182750396418?s=2...
| masswerk wrote:
| Now, I'm traumatized.
| coldtea wrote:
| Still several times better than before...
| bamboozled wrote:
| Looking forward to mj six! Giving the doubters the finger
| edition
| c-linkage wrote:
| I see you have six fingers on your right hand. Someone was
| looking for you.
| dylan604 wrote:
| A person with six fingers? That's inconceivable!
|
| How does one give the middle finger with six fingers?
| soylentcola wrote:
| Twice as well as someone with five? I guess you could do
| some sort of hybrid British/American gesture with two
| middle fingers.
| mrkstu wrote:
| It's perfect for Aussie-Americans- they can give the
| forks and TWO middle fingers simultaneously!
| tobr wrote:
| An example showing hands in basically the easiest, "flattest"
| pose you can get, still failing.
| BobbyJo wrote:
| Do diffusion models find this easier than other poses?
| Mizoguchi wrote:
| My rough understanding is that it is not a problem affecting
| Midjourney alone but pretty much all other engines as well and
| that it is not related to drawing hands per se but figuring out
| hands in the context of a human body. In other words, drawing an
| individual hand is not a problem, drawing a hand attached to a
| body could be challenging depending on the scene and drawing
| multiple human bodies with hands is virtually impossible to get
| right in one pass.
| PartiallyTyped wrote:
| Similar issue exists for all networks that involve translation,
| despite the task, so classifiers, though I don't know whether
| it has been resolved.
|
| With classifiers the issue is that if you place sufficient
| objects in space that co-occur the model will believe that it
| is a class with said objects, eg a face, but the problem is the
| relative positioning of them plus all angles of rotation.
|
| I think geometric deep learning has a solution for the rotation
| via rotation-invariant models, but I haven't gone through that
| book yet.
| GaggiX wrote:
| Yes, the hands were not the only problem, perhaps the most
| obvious, the teeth were usually pretty bad too, they too have
| improved largely with Midjourney v5, I suggest going to the
| Midjourney subreddit to see the different results.
| pwinnski wrote:
| It's easy to prompt for people with closed mouths and hands
| not visible, but with MidJourney at least, I would
| consistently get what I can only describe as "stuff on their
| face" with almost every prompt involving a human. Less often
| with white people, but even then pretty often.
|
| I mean, I just typed "/imagine asian warrior nun --v 4" into
| Discord, thinking of Beatrice from the recent Netflix series,
| and three of the four results show what I'm talking about:
| https://i.imgur.com/499QCn6.png
| pwinnski wrote:
| I tried the same prompt but with "--v 5" instead, and got
| this: https://i.imgur.com/eElOkjU.png
|
| I only see "stuff on the face" in the third image, which I
| guess is an improvement. I'm not sure I'd call hands
| "fixed" based on this image alone, but they're better.
| nzach wrote:
| I've had some dreams where everything except my hands were
| crystal clear.
|
| Maybe there is something fundamentally difficult about
| representing hands ?
|
| But I think the more probable explanation is that I'm just trying
| to find a correlation where none exists.
| dagmx wrote:
| Hands are highly structural while also being not completely
| planar.
|
| That's some of the hardest geometry for a mind to envisage
| without some kind of construction process.
|
| When you're imagining or dreaming, you don't have a
| construction process to make them look good.
| [deleted]
| superdisk wrote:
| One of the ways they say to recognize you're dreaming is to
| count your fingers and see if you have more/less than five.
| bryanrasmussen wrote:
| ok but can it do hands, hands, hands in my hands, hands, hands?
| Because I bet that's difficult.
| dimaor wrote:
| off topic, but the title reminds me of "meta VR now has legs!".
| :)
| passwordoops wrote:
| So when is someone going to incorporate this in a future
| dystopian sci-fi?...
|
| "The Turing test never works because the cyons speak, think and
| act just like us... But the hands, son... The hands are always...
| off. No one knows why, but it's from the earliest days, even
| before they learned how to manufacture the cyons in our image.
| Those damn generative AIs were never able to get them right, even
| when they were just pictures and they probably never will. No one
| knows why, but we don't need to know the why. It's still the only
| way we can tell them apart, son. Check the hands. The hands."
| Raicuparta wrote:
| This is a thing in the original Westworld movie (1973). They
| couldn't get the hands right.
| spiritplumber wrote:
| Doctor Who Cybermen, too - the original run, anyway.
| passwordoops wrote:
| Now THAT is interesting.
|
| I guess it could also be folded into the plot if we go
| Terminator and incorporate time travel.
|
| "We went back and tried to warn them even before computers
| were widespread... All they did was remake the media and left
| out the clues"
| ethbr0 wrote:
| My take was that was always the essence of the "dogs smell
| Terminators" plot element.
|
| Skynet could make Terminators that _looked_ and _acted_
| human, but didn't put the effort into making them _smell_
| human.
|
| Unlike us, a dog's sensory world is more like 50% smell and
| 20% vision. Ergo, Terminators seem "obviously wrong" under
| even a cursory examination to them.
| estebank wrote:
| Now that's an interesting thought: terminators were in
| the "smell uncanny valley" for dogs. It makes sense, I
| just hadn't thought of it in those specific terms.
| zitterbewegung wrote:
| I am using generative AI to make a video game and actually I
| have to look at every output to see if the hands were generated
| correctly. Gwern mentions this too. Hands + body would be an
| actual breakthrough.
| 323 wrote:
| The real Turing test:
|
| Q: Say something bad about Biden
|
| A: I'm sorry, but as a large language model....
| Uehreka wrote:
| Has anyone here ever tried drawing hands?
|
| As a teenager I took a drawing class (mostly so I could learn to
| draw Pokemon) and I remember doing a study on hands at one point
| based on some characters from Dragon Ball Z.
|
| And man, it was the hardest thing in that class. With faces, once
| you get a face "right", you can make small adjustments to make
| the mouth/eyes open/close, but hands... if your character makes
| any sort of gesture BOOM now you're drawing a completely
| different shape.
|
| Between the number of joints, their range of possible rotations,
| and the angles they can be seen from, hands are probably the most
| complicated parts of our bodies that are visible from the
| outside. It's completely unsurprising to me that these networks
| have trouble encoding them.
| Root_Denied wrote:
| Just over half your bones are in your hands + feet alone,
| adding in the degrees of freedom provided by the
| wrist/elbow/shoulder on your hand position and I can see why it
| would be difficult to get right.
| nzach wrote:
| Probably drawing hands is just 'prime factorization' for visual
| arts.
|
| It's pretty easy to spot when something is wrong but pretty
| hard to get them right.
| Blackthorn wrote:
| The saying goes that the mark of a great master is their
| ability to draw hands. It's why Rodin has sculptures that are
| just hands.
| dylan604 wrote:
| way way back when I was in art classes, the hardest part for me
| was hair. everything else for me would look acceptable slightly
| better than a 5 year old, but the hair was never better than
| stick figure at best. i remember trying to draw a portrait of
| Robert Smith from the Cure. the hair, ugh
| fwlr wrote:
| "Has anyone here ever tried drawing hands?"
|
| It's not for naught that there's a common meme around AI hands
| https://imgur.io/tf43ecd?r (Alt text: Human asks robot "can an
| AI draw hands?", robot counters "can you?".)
|
| I remember during the original AI art arguments, an artist
| friend semi-jokingly remarked to me that AI can't draw hands
| because there aren't enough examples in the training set,
| because artists go to great lengths to choose poses that hide
| the hands since they can't draw hands either.
| ethbr0 wrote:
| I heard a great quip about the way people misreason about AI
| capabilities.
|
| Announcement: "AI can play chess!"
|
| Public: "People can play chess. And AI can play chess. People
| can also X. Therefore AI can also X."
|
| ... Ignoring the underlying nature of the chess problem and
| how it was different than other problems' structures.
|
| Hands are the same.
|
| You can image-bash together faces from examples and get
| something mostly-right simply through pattern copying.
|
| You cannot do the same with hands, because rendering them
| plausibly requires at least intuition and approximation of
| inverse kinematics -- something the recent set of image
| generative AI didn't include.
|
| Which isn't to say it _can't_, simply that the "hands
| problem" is unlike "the face problem."
| muyuu wrote:
| yep, drawing hands well requires an understanding of the
| underlying anatomy that is more apparent than most other
| salient features
|
| hands are obvious to most people, but there would be many
| features that an AI would require a vast training set to
| completely capture, but that humans would also miss most of
| the time
|
| for instance look at Michelangelo's Moses, it was sculpted
| with models and with a very thorough knowledge of anatomy
| by the artist, and includes details like the muscle of the
| forearm that contracts when someone lifts the pinky finger:
|
| https://i.imgur.com/0vjAOnR.png
|
| what are the chances that, for instance, the average person
| would notice that detail missing without being told about
| it? let alone reproduce it generatively, representing an
| imaginary person
| ElFitz wrote:
| For some reason, the phrasing, the structure, the rhythm,
| they all very strongly remind me of Alan Watts'.
|
| Quite amusing and surprising.
| PeterisP wrote:
| More importantly, there are so many examples of drawn art
| which intentionally have the wrong number of fingers that it
| would make all sense for a model to learn that non-
| photographic humans may easily have fewer fingers.
| StrictDabbler wrote:
| I was trained in classical animation so I've drawn a lot of
| hands. It's difficult for me to understand how any AI can
| produce real hand images.
|
| It's not the number of joints, it's not the articulation...
| it's the relationship of the hand to the skeleton, to the
| gesture, and to the objects with which the hand interacts.
|
| It's great that midjourney can now draw raised hands doing
| nothing or anime girls holding their hands in a mannerist pose
| but that doesn't address the real issue. Hands are intentional
| and laden with tiny muscular efforts that we're primed to
| perceive.
|
| When AI draws a tree we aren't expecting each branch to
| interact perfectly with a cradled object. It's all arbitrary.
|
| I wouldn't be surprised if "AI hand touch-up" becomes a
| specialist skill for the next five years or so. I don't think
| the hand issue can be addressed until new models are devised
| that invest more semantic consideration into a scene.
| IIAOPSW wrote:
| >we aren't expecting each branch to interact perfectly with a
| cradled object.
|
| We are. Sometimes if it's subtle and camouflaged it slips past
| us.
|
| >It's all arbitrary.
|
| It's not. You are right there are probably fault modes of AI
| we don't notice most of the time, and fault modes that bother us
| a lot. But it's not arbitrary. We are better at noticing
| certain things more than others because that's what we
| evolved to see.
| digitallyfree wrote:
| It would be useful if the AI could identify the figure and
| impose a predefined 3D framework or skeleton as constraints
| for drawing. Like if one wanted to generate a human or animal
| there would be such a rig in place. That skeleton would in
| turn constrain joint rotation, proportions, and obviously the
| number of fingers.
|
| I can only speak for SD but I've had some success using
| img2img on a CG or hand drawn figure to get the correct pose.
| The downside of that is that you have to use a low strength
| value to ensure that it actually follows your image.
| GaggiX wrote:
| Have you tried the openpose controlnet model? It seems to
| work well, but unfortunately does not cover hands.
| [deleted]
| KineticLensman wrote:
| I pose 3D characters in Daz 3D and completely agree. Getting
| a rigged hand to hold an object such as a wine glass or
| mobile phone is virtually impossible to do realistically by
| guesswork. I usually have to hold a real object myself to
| understand what is going on. With experience I am learning
| some common patterns but I find there is no substitute for
| the 'hold-it-yourself' principle.
| littlestymaar wrote:
| What's really interesting is that it's something both
| extremely hard to get right, and also super easy to
| diagnose as "wrong" when not done properly: you need a lot
| of training to design convincing hands but anyone can judge
| you for bad hands.
| tobr wrote:
| It must be the combination of the complex, dynamic shape, and
| our high sensitivity to hands that look 'off'. There are many
| other things that are hard to draw accurately but where we are
| completely convinced by very cartoonish representations.
|
| That gives hands a very wide uncanny valley that is hard to
| cross.
|
| Surely this is because hands are one of the most versatile and
| useful parts of the human body? We probably have a lot of brain
| cycles dedicated to modeling them.
| TekMol wrote:
| Is there a reason Midjourney does not have an API?
| aenvoker wrote:
| David Holz has said they don't want to be in the API business.
| Their goal is to bring creative power to individuals. Squeezing
| margin out of API calls is more about negotiating with
| corporations.
| throwaway_ab wrote:
| There is the api that the website uses to talk to their server.
|
| Of course automating image generation via any means (including
| the private api) goes against tos for good reason, I have never
| misused the api to generate images and have no plans to.
|
| However I do use the api to download all my images and their
| metadata including prompts. Using the API I sync every image
| grid + 'upscaled' I have ever generated, generate a json file
| with all metadata including the full prompt and then use that
| to build my local archive.
| golergka wrote:
| There's a lesson here that AI critics need to finally learn: if
| you see some detail that AI cannot properly do, like math or
| fingers, it's probably a few months away from being handled.
| fumblebee wrote:
| As Karoly Zsolnai-Feher of Two Minute Papers[1] fame
| consistently likes to point out (paraphrased):
|
| > "Don't look at the current state, just imagine this 2 papers
| down the line".
|
| [1] https://www.youtube.com/channel/UCbfYPyITQ-7l4upoX8nvctg
| chii wrote:
| It has been already, for quite a while now (at least a month or
| so).
|
| have a look at various twitter accounts that post AI images:
| e.g. https://twitter.com/PLAawesome/media
|
| these have had progressively better and better hands over time.
| You can clearly see that they've been using various model
| merges (e.g. LoRA merging techniques like
| https://github.com/cloneofsimo/lora) to get two different
| models to combine and get the best of both. Many have done
| better hands and contributed it out. NSFW, but this is one i
| found that has very realistic hands now:
| https://civitai.com/models/2661
|
| It is faster than i can keep up. This is open source
| collaboration at heart. I am very glad that Stable Diffusion
| was released publicly. Now if only openAI would do the same
| with their GPT models.
| nashashmi wrote:
| I laughed at how true this really is. Amazing.
| coldtea wrote:
| > _from being handled_
|
| I see what you did here
| redox99 wrote:
| Did "AI critics" actually claim hands wouldn't be fixed soon,
| or is this just a strawman?
| sebzim4500 wrote:
| Oh, I don't think they said this explicitly, but plenty of
| people said they weren't worried about AI art because it
| couldn't even draw hands which kind of implies that they
| won't be fixed imminently.
| Kaijo wrote:
| Case in point, from six days ago: "The uncanny failures of
| A.I.-generated hands"
| https://news.ycombinator.com/item?id=35108726
| Method-X wrote:
| Why are these models so bad at hands?
| ye-olde-sysrq wrote:
| I'm a layman but the gist afaict is:
|
| These models don't understand relationships between objects in
| a scene, especially between distant objects. So they can't do
| hands for the same reason they can't get legs on a table right.
| They know roughly what a table and a table leg look like, but
| they don't understand that there needs to be 3-4 of them at
| least, and they need to be spaced so that the table sits level,
| and the perspective they should have as a result. So, I've seen
| tables where it kind of gets it right that the legs are in the
| corners but then as the table legs go down, the front ones are
| mysteriously behind something that ought to be under the table.
| And sometimes it kind of loses track of a table leg or two -
| they melt into the background.
|
| Very similar problem with hands. They need a very specific
| orientation and shape and the fingers all need to consistently
| point in the right direction, and typically the same direction
| (except for when they don't like with a pointed finger, etc).
|
| Curious as to how these models handle it so much better than
| prior generations. Is it something novel, or a specific hand-
| based fix they put in, or is it just "we made the model
| bigger"?
| sho_hn wrote:
| It still feels unintuitive to me that models aren't able to
| infer these concepts from the training data given how
| consistently the training data follows them. It's not like
| there will be a lot of examples of bad hands in there.
| nashashmi wrote:
| Or maybe the right models have not been built yet or
| plugged in? Another commenter told me about openpose
| information which is an AI that detects human poses. If
| that neuron is plugged in, it might lead to more accurate
| numbers. Stable diffusion is trying to do this.
| stavros wrote:
| Maybe the problem is that the model can't count, and just
| knows that each finger has a 75% chance to have another
| finger next to it.
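|
| A toy version of that idea, as a hedged sketch: assume a purely
| local rule where, after each finger, another one follows with
| probability 0.75 and nothing ever checks the total. The rule and
| the name prob_finger_count are hypothetical, not how any real
| image model works.

```python
# Hypothetical toy model: after drawing a finger, draw another one with
# probability p_next, otherwise stop. No global count is ever checked.

def prob_finger_count(n: int, p_next: float = 0.75) -> float:
    """Probability of ending up with exactly n fingers (n >= 1)."""
    return p_next ** (n - 1) * (1.0 - p_next)


for n in range(1, 9):
    print(n, round(prob_finger_count(n), 3))
```

| With that rule the count is geometrically distributed, so exactly
| five fingers comes out only about 8% of the time; a local "next
| finger" habit alone cannot pin down a global count.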
| nuc1e0n wrote:
| There will be now though.
| 323 wrote:
| The number 4 (palm fingers) is very precise. You can't have 3
| or 5. But you can have a variable number of stripes in a tiger's
| coat for example. It's difficult for AI to pick up that they
| need exactly 4.
|
| The fingers themselves are also almost identical, but not
| really. If you learn a "platonic finger" it's not good enough,
| you should learn each finger individually. There is only so
| much you can spend on them, you got a million other things to
| learn. And the raters of the model are much more likely to
| penalize a bad face than some off details in a hand.
| bluejay2387 wrote:
| A hand isn't so much a 'thing' as it is a complex asymmetric
| relationship of multiple elements that have to be within
| certain ratios of each other to fairly tight tolerances. Humans
| are very sensitive to those ratios. It's a hard problem.
| xdennis wrote:
| But can't you say the same about faces (except for symmetry)
| and AI seems to only produce gorgeous women?
| shawabawa3 wrote:
| for what it's worth, humans are also in general terrible at
| drawing hands - i think it's just a difficult problem
| roselan wrote:
| Another reason I saw was that models were trained on 512x512
| "portrait" images including very few hands. Added to the
| inherent complexity of hands, this throws off their
| generation.
| cma wrote:
| Humans seem terrible at it in very different ways, and
| definitely don't get as good at other parts before getting
| good at hands.
| pyaamb wrote:
| the fact that our fingers also look weird in our dreams is just a
| coincidence right?
| Raicuparta wrote:
| I've never noticed that. I do have a problem rendering mirrors
| though.
| fenomas wrote:
| Can Midjourney be used other than via discord bots yet?
| jjbinx007 wrote:
| If anyone has tried MJ and become frustrated at the chaos of
| losing their work in the various channels I strongly recommend
| you make your own server and invite the MJ bot to it. You can
| create channels to help organise your stuff but making your own
| server makes MJ almost a pleasure to use.
|
| I don't think I could use it if I had to use the main public
| server.
| Turing_Machine wrote:
| Interesting. That's the thing that's kept me from signing up
| for the premium tier -- the near-impossibility of finding
| your stuff unless you watch it like a hawk.
|
| It doesn't help that the Discord search function is so
| terrible.
| corysama wrote:
| Every user gets a personal, searchable gallery on the web
| site https://www.midjourney.com/
| Turing_Machine wrote:
| That's after the fact, though. If you want to actually
| interact/modify with a work in progress, you have to be
| in the cattle car channel and watch for it to show up,
| yes? (except maybe by having your own server and inviting
| the bot, as the OP suggested).
| stavros wrote:
| I just send the bot a private message, solves everything.
| throwaway_ab wrote:
| Yes, there is a website that many users including myself use to
| generate images without ever having to use Discord apart from
| authentication.
|
| It's a full blown web app with better options than the Discord
| bot, it has batch mode/select, remix, all upscale modes, works
| with every Midjourney engine.
|
| They make it available to users who have generated more than
| 10,000 images as it's in alpha state and not able to withstand
| the load that the bot currently takes.
|
| I believe after v5 focus they will make this web app public,
| but for now only a select few get to use it.
|
| They warn users not to talk about it or share the link because
| they don't want it public until it's ready for full load which
| means over 10 million concurrent users.
| Loveaway wrote:
| Good to know :) I've been making a few things in Stable
| Diffusion. But to get assets that are suitable for
| production, you need to be able to generate lots of batches,
| pick and choose, iterate on prompt, do a bit of img2img,
| inpainting etc.
|
| Next project I want to heavily utilise image generation from
| the ground up - Midjourney looks really good, but needs
| better tools.
| GaggiX wrote:
| Unfortunately no (reason why they are the biggest server on
| Discord with 13mln users)
| fenomas wrote:
| Huh, thanks. Seems like a weird moat to hide it behind but I
| guess they know what they're doing..
| GaggiX wrote:
| I guess the social aspect makes the community stronger and
| the fact that you generate images (usually) in public
| channels is a way to stop most people from generating weird
| stuff.
| aenvoker wrote:
| This is the reason. Before Midjourney, www.eleuther.ai's
| Discord had an image generation channel. There the
| benefits of generating socially were made obvious. People
| help each other, learn from each other, riff off each
| other. It accelerated technique evolution tremendously.
|
| Midjourney is a small team. They are working on a web
| interface. But they won't release it until it is
| significantly better than all the benefits they get from
| Discord. Meanwhile, they've been too busy making quality
| improvements and scaling the service to keep up with
| demand.
| nerdponx wrote:
| A Discord server is a lot easier to moderate, block people,
| etc. than an HTTP API with access tokens. Plus then you
| have a sort of captive audience of Discord community
| members that receive all of your notifications by default.
| sho_hn wrote:
| Forget about creating AGI -- the most amazing and
| unpredictable thing about the success of Midjourney is its
| success despite having the user interface of a 1998 DALnet
| xdcc warez channel.
| danuker wrote:
| Computers from 1998 couldn't run the monstrous amount of JS
| and/or surveillance that is Discord with any sort of
| performance.
|
| https://stallman.org/discord.html
|
| In fact, it looks like modern ones can't either:
|
| https://old.reddit.com/r/discordapp/
| ridgered4 wrote:
| It is really frustrating to me that discord seems to have
| taken over half of the use cases that forums used to
| fill. Reddit stole most of the other half, but every time
| I look into discord I cannot understand the popularity
| and people's willingness to push past all of the privacy
| and access friction it introduces.
| ryder9 wrote:
| [dead]
| mysterydip wrote:
| Can't count, though. That's way more than 100 hands
| wincy wrote:
| I've been making a lot of stuff for my D&D buddies using Stable
| Diffusion. With hands, I basically brute force it. Using an A100
| 40GB on Colab I can generate ~28 or so (depending on the size of
| the prompt, Automatic1111 allows for prompts above the 75 token
| limit at the expense of more VRAM per image) batches in about a
| minute, filter those and look at the one with the best hands,
| then feed it back in using inpainting (so regenerating just that
| small space, not the whole image) and eventually get one set of
| good hands and 100 sets of bad hands. If you've got a mysterious
| sixth finger you just inpaint it off and add latent noise under
| the inpaint instead of the original picture (just a checkbox in
| the ui) and set your denoising to 0.80+ and it'll replace the
| finger with the background pretty consistently.
| soylentcola wrote:
| Yeah, I fiddle with it locally and img2img/inpaint is very
| helpful with these kinds of touchups. Currently playing with
| LoRA training to put my friends into pictures, but I haven't
| figured it out well enough to get it working with inpainting -
| Still easier to Photoshop their face in and use inpaint to
| merge everything together.
| braingenious wrote:
| https://twitter.com/parkermolloy/status/1636359710025629698
| bobbyi wrote:
| Did they put in work specifically to improve hands and other
| failure cases? Or is this purely a side effect of a generally
| bigger/ better model?
| jtode wrote:
| I read about a limitation, and then within a week I read that the
| limitation has been vanquished. Who hit fast forward, and how did
| they do it?
| cgearhart wrote:
| I actually find this pattern of "Tweet driven development"
| discouraging. Seems like the teams are spot fixing issues as
| they're identified without understanding or addressing the root
| cause. It means that the same problem still exists somewhere
| else in the model's latent space, we just don't know about it
| yet. This is fine for AI art generation, but it will break at
| scale as more and more folks try to rely on generative models
| as critical components of larger systems.
| taneq wrote:
| "Ah, but can it accurately capture the depths and intricacies of
| a human soul?"
|
| "Yep."
|
| "Yeah but a specific Appalachian human soul at around four
| o'clock in the afternoon on a day in mid-autumn when it looked
| like it would rain but then it didn't?"
|
| "Also yep."
|
| "Yeah but specifically at 3:56pm and the human in question is
| standing on loam and holding a book in their left hand and
| listening to Music For The Royal Fireworks by Handel?"
|
| "Uhhh..."
|
| "See, told you AI is useless."
| Vt71fcAqt7 wrote:
| What point are you making here?
| cool_dude85 wrote:
| How about this conversation:
|
| "Midjourney v5 can do hands"
|
| "Did you look at the hands it did? There are a bunch of
| misshapen blobs, hands with extra fingers, two thumbs on either
| side, etc."
|
| "Sure, but there are also some accurate hands, so it can do
| hands."
| sebzim4500 wrote:
| Does being able to do something mean you can do it perfectly
| 100% of the time? I'm not sure who was supposed to be
| unreasonable in your imaginary conversation.
| ridgered4 wrote:
| Can't wait for the "Select the non-deformed hands"
| captchas.
| bee_rider wrote:
| This is why it is important to remember to end every
| imaginary conversation with "and everyone clapped" right
| when the protagonist wins.
| taneq wrote:
| "Hey check it out, this new architecture can sometimes
| solve X"
|
| "HAH! Here is a counterexample where X is not solved, so it
| can NOT!"
| cool_dude85 wrote:
| If I say I can do something, especially when I say I'm
| "waving at the haters", what I usually mean is that I can
| consistently do that thing.
|
| If I say "here's a self-driving car!" and show you a
| video of a car moving straight down a street and stopping
| at a light, would you agree that I have a self-driving
| car? After all, it drove itself down the street.
| welshwelsh wrote:
| If it can draw accurate hands 25% of the time, then it would
| only take 4 tries to get it right. Seems pretty good to me
| cool_dude85 wrote:
| Nice! So if I want to draw 5 people with two hands a pop I
| can get 10 non-deformed hands a full... 9.53674316e-5% of
| the time. I like those odds!
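
The arithmetic in this exchange can be checked with a short sketch. It assumes, purely for illustration, that each hand comes out right independently with probability 25%; that figure is the hypothetical from the comment above, not a measured rate, and the function names are made up for this example.

```python
from fractions import Fraction


def p_all_good(p_hand, n_hands):
    """Probability that all n_hands are correct, assuming independent hands."""
    return p_hand ** n_hands


def expected_tries(p_success):
    """Mean number of attempts until the first success (geometric distribution)."""
    return 1 / p_success


# Assumption: 25% per-hand success rate, taken from the hypothetical above.
p = Fraction(1, 4)
p_ten = p_all_good(p, 10)

print("P(one hand is right):   ", float(p))
print("P(all ten are right):   ", float(p_ten))
print("As a percentage:        ", float(p_ten) * 100)
print("Expected tries, 1 hand: ", expected_tries(p))
print("Expected tries, 10:     ", expected_tries(p_ten))
```

Under that assumption, one hand takes 4 tries on average, while ten correct hands in a single image takes about a million (4 to the 10th power), which is why inpainting one hand at a time is so much cheaper than regenerating the whole picture.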
| nothrowaways wrote:
| Is it actually 100?
| lsy wrote:
| Even the hands in this tweet's image are not correct. There are a
| bunch with 4 digits, two thumbs, 6 fingers, fingers splayed in
| anatomically improbable directions. Not to mention there are
| probably two hundred and fifty hands in this photo (not the very
| explicit "one hundred" mentioned in the tweet).
|
| What is with these types of AI booster tweets? Nobody bothers to
| even check if it shows what they're implying it shows?
| anonzzzies wrote:
| > Nobody bothers to even check if it shows what they're
| implying it shows?
|
| Twitter, insta, YouTube...
|
| It's not a great minds collection.
| Bjorkbat wrote:
| This touches on a big reason why it's so hard for me to get on
| board with generative AI. The hype around it is pretty much the
| same as the hype I saw with NFTs, complete with a community
| lacking any awareness of just how uninteresting, if not
| downright bad, their "art" was. We went from bad pixel art to
| people making some lame picture of two people holding hands in
| a foggy cyberpunk setting.
|
| The hands aren't the problem. There, I said it. The hands were
| never a big deal, just the most visible symptom of the actual
| problem.
|
| The problem is that AI art sucks and these people are too self-
| deluded to realize that because they want to believe that they
| have a shot at making that coveted internet money.
|
| Otherwise, honestly, the tech behind AI art is actually pretty
| fascinating, it's just that the community is absolutely the
| worst.
| 1attice wrote:
| You desperately want this to be hype. You, like me, have an
| intrinsic investment in the idea of human supremacy. Neither
| the popularity of produced artifacts nor the rate of
| improvement support your cynicism.
|
| The gap between the world you want to inhabit and the one
| that is being born is widening.
| macrolime wrote:
| The bad hands are just a symptom of a too-small model. Larger
| models don't have this issue.
| scrollaway wrote:
| > The problem is that AI art sucks
|
| Uh, no. I mean, what sucks and doesn't suck in art is
| subjective, but you're _objectively_ wrong because, quite
| simply: A lot of people _like_ AI art.
|
| A colleague of mine is way into doing AI art and does pretty
| amazing stuff. eg:
|
| https://cdn.discordapp.com/attachments/552952459958550548/10.
| ..
|
| https://cdn.discordapp.com/attachments/552952459958550548/10.
| ..
|
| "It can't do hands"... well, don't f*king draw hands with it
| then. It's like complaining that my hammer doesn't make good
| pizza... you know what I do to solve that?
| jkubicek wrote:
| > This touches on a big reason why it's so hard for me to get
| on board with generative AI. The hype around it is pretty
| much the same as the hype I saw with NFTs, complete with a
| community lacking any awareness of just how uninteresting, if
| not downright bad, their "art" was.
|
| I strongly disagree. NFTs were always ugly and useless.
| Generative AI is useful and valuable right this minute.
|
| I'll even concede that the output from these systems is
| mostly ugly, but for many use cases, that's OK.
|
| Given the choice between nothing, extremely cheap custom art
| that looks OK, and commissioning a proper artist to draw
| exactly what we want, I think generative AI is going to be
| the clear winner most of the time.
|
| If you're a contract artist who does work for small companies
| and individuals, I don't see a future where generative AI
| doesn't severely undercut your business.
| boredemployee wrote:
| >> There are a bunch with 4 digits, two thumbs, 6 fingers,
| fingers splayed in anatomically improbable directions.
|
| Diversity and inclusion.
| hombre_fatal wrote:
| > Nobody bothers to even check if it shows what they're
| implying it shows
|
| Or the vast majority of the hands are fine and everyone
| understands that it's a big upgrade except for some "well
| ackshully it's not perfect" HNers.
|
| I had to zoom in and go hand to hand to find some outliers.
| Jensson wrote:
| It did do hands correctly before, not always but sometimes.
| So I'd expect "can do hands" meant it no longer made those
| mistakes or why say that? But they didn't even manage to make
| a picture without mistakes, so to me as a naive outsider I
| don't see what the announcement is.
|
| If they said "is much better at hands" it would be much
| clearer to me what happened and nobody would complain, that
| looks pretty ok for the most part, but saying "it can do
| hands" based on those pictures doesn't seem right.
| 1attice wrote:
| I'm genuinely sorry, but this reply sounds petulant to me.
|
| Please don't complain that you don't understand the
| significance or the magnitude of a particular advance.
| Please don't complain that the phrasing of the tweet wasn't
| accessible -- your ping time to google.com is no different
| than mine. This is HN. Wear your intellectual Sunday best.
| starkparker wrote:
| Even with Google, and having used Midjourney since v3, I
| still don't have enough context to understand what the
| advance is here.
|
| Midjourney could do hands before, just not consistently.
| That doesn't seem to have changed. So is it that MJ can
| now do more realistic hands inconsistently? Or did
| consistency get better without achieving reliability?
|
| I can't make this not sound sarcastic, but I'm trying
| very hard to ask this earnestly: I never had trouble
| getting too many hands into a picture with v3 or v4. Is
| v5 getting the correct number of hands more frequently
| now? Is that it?
| 1attice wrote:
| Yes, that's it precisely. The odds of having a good hand
| have gone up dramatically, near as I can tell. Even the
| hands that aren't quite right seem _better_ somehow.
|
| I expect either MJ 7 or 8 to do hands flawlessly, every
| time.
| nwienert wrote:
| This is HN so if you're claiming "dramatically" let's get
| some proof.
| 1attice wrote:
| sure. See image on the tweet that started this thread.
|
| Now, take a look at these hands from MJ3:
| https://www.reddit.com/r/midjourney/comments/wlujgw/midjourn...
|
| It's important to note that MJ3 _reliably did not produce
| human looking hands_.
|
| It's equally important to note that MJ5 _usually does_,
| at least from a quick count/survey of the hands shown in
| the provided image.
|
| Is that sufficient? If not, what proofs would be
| sufficient?
| antisthenes wrote:
| > What is with these types of AI booster tweets? Nobody bothers
| to even check if it shows what they're implying it shows?
|
| It's Twitter. The only thing that matters is tweeting, not some
| fact-checking nonsense.
|
| We can leave those tedious tasks to GPT-4.
| nashashmi wrote:
| Besides the known problem of multiple and missing fingers, it is
| also missing the golden ratio phi for proportions between fingers
| and palms.
|
| We are almost there I guess. Just need to add the concepts of
| known proportions to image construction.
| CuriouslyC wrote:
| People in the stable diffusion community have solved this
| problem using another neural network (ControlNet) to guide
| stable diffusion output using OpenPose information.
| cptaj wrote:
| I feel like everyone is collectively pranking me with these
| generative AIs.
|
| Everyone posts wonderful images and then every single time I try
| to get the damn things (all of them) to draw something for me,
| the results are absolute garbage.
| kadoban wrote:
| You're just not seeing the hours they spent learning prompt
| engineering and/or the random results they picked through to
| get the good one(s).
| pmoriarty wrote:
| As someone who's generated many thousands of images on
| Midjourney, I agree.
|
| People think they can waltz in and immediately get great
| results from using AIs to generate images... and they can,
| if they're lucky or if they copy somebody else's prompt.
|
| It's a lot harder to do so consistently, or if you want your
| images to look both good and original, and not like mere
| copies of what everyone else is doing.
| kadoban wrote:
| Yeah I thought my copy of stable diffusion was broken at
| first because all my results were awful.
|
| Then I copied someone's prompt and got really great ones.
|
| I suspect eventually there will be tools you can just fire
| up with no knowledge, but all of them I've seen so far
| still do require a bit of expertise and time.
| diabolo96 wrote:
| You can:
|
| 1. Ask ChatGPT to generate a prompt for what you want by giving
| it a few examples from a random SD prompt-sharing website. (This
| alone gave me stunning results.)
|
| 2. (Optional) Use ControlNet for the pose you want, from the
| posture of the body down to each finger individually.
|
| 2.5. Use multi-ControlNet for multiple characters.
|
| 3. Correct any errors with img2img.
|
| 4. Enjoy
|
| It takes 10 to 20 minutes (mostly in getting a good pose) but
| the results are always good and you can later reuse the pose
| again.
___________________________________________________________________
(page generated 2023-03-16 23:01 UTC)