[HN Gopher] Show HN: YoHa - A practical hand tracking engine
___________________________________________________________________
Show HN: YoHa - A practical hand tracking engine
Author : b-3-n
Score : 180 points
Date : 2021-10-11 07:41 UTC (15 hours ago)
(HTM) web link (handtracking.io)
(TXT) w3m dump (handtracking.io)
| rglover wrote:
| Great demo.
| b-3-n wrote:
| Thank you for your feedback.
| programmarchy wrote:
| Was wondering how easy it'd be to port to native mobile, so went
 | looking for the source code, but it doesn't appear to actually
 | be open source. The meat is distributed as binary (WASM for
 | "backend" code and a .bin for model weights).
|
| Aside from being a cool hand tracker, it's a very clever way to
| distribute closed source JavaScript packages.
| b-3-n wrote:
 | Thank you for the feedback. You are right that the project is
 | not open source right now. It's "only" MIT licensed. That's why
 | I also don't advertise it as open source (if you see the words
 | "open source" anywhere, that's a mistake on my end; feel free
 | to point it out). I wanted to start out from just an API
 | contract so that it is easier to manage and get started. In
 | general I have no problem open sourcing the JS part. But first
 | there is some refactoring to do so it is easier to maintain
 | once open sourced. Stay tuned!
|
| As a side note: The wasm files are actually from the inference
| engine (tfjs).
|
| Please let me know if you have any more questions in that
| regard.
| boxfire wrote:
| This architecture was also used in the link referenced when
| bringing up alternative implementations:
|
| https://github.com/google/mediapipe/issues/877#issuecomment-...
| itake wrote:
 | I think these tools are super interesting, but I worry tools
 | like this marginalize users with a non-standard number of limbs
 | or fingers.
| colordrops wrote:
| What do you suggest be done about it?
| itake wrote:
 | I'm fine with using them, as long as alternatives are
 | available so that people with disabilities are able to
 | participate as well.
|
| Imagine if your bank started using these to access your
| account and suddenly disabled customers could no longer use
| their adaptive input devices to interact with their account.
| rpmisms wrote:
| So does the real world. Things are hard to do with
| disabilities. That's what the word means. This has great
| potential, and it's not worth shutting down because some people
| aren't able to use it.
|
| I can also see this being very helpful for people who have
 | cerebral palsy, for example. Larger movements are easier, so
 | this might help someone use the web more easily.
| itake wrote:
 | What if a bank used this for authentication and disabled
 | people couldn't use their custom interface devices? Does that
 | mean that disabled people shouldn't have access to their bank
 | accounts?
|
| Maybe if this was the input device that interacts with the
| standard web, then there is potential here, but it would be
| unfortunate if a company used this as a primary means of
| input.
| mkl wrote:
| That's the bank's mistake, not this library's.
| pantulis wrote:
| This is a very valid point, but as a counter argument the
| technique implemented here could be adapted to help users with
| other needs like say, a browser extension that can help you
| navigate back and forward with the blink of an eye.
| itake wrote:
| This all gets complicated, because not everyone has 2 eyes
| :-/.
|
| You end up with complicated systems trying to cover all of
| the edge cases.
| [deleted]
| [deleted]
| jakearmitage wrote:
| I wish there was a nice open-source model for tracking hands and
| arms with multiple viewpoints (multiple cameras), similar to
| commercial software like this: https://www.ipisoft.com/
| borplk wrote:
| Very impressive.
|
| I want something like this so I can bind hand gestures to
| commands.
|
| For example scroll down on a page by a hand gesture.
| b-3-n wrote:
 | One can build this pretty easily for a website that you are
 | hosting with the existing API
 | (https://github.com/handtracking-io/yoha/tree/master/docs).
|
| However, you likely want this functionality on any website that
| you are visiting for which you probably need to build a browser
| extension. I haven't tried incorporating YoHa into a browser
| extension but if somebody were to try I'd be happy to help.
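 |
 | For illustration, a minimal sketch of such a binding, assuming
 | a per-frame callback that reports pose probabilities (the
 | callback shape, field names, and threshold below are
 | illustrative, not the exact YoHa API):
 |
 |   // Hypothetical per-frame result shape.
 |   interface FrameResult {
 |     poses: { fistProb: number };
 |   }
 |
 |   const FIST_THRESHOLD = 0.8; // assumed confidence cutoff
 |   const SCROLL_STEP = 40;     // pixels per detected frame
 |
 |   // Scroll down while the user holds a fist.
 |   function handleFrame(result: FrameResult): void {
 |     if (result.poses.fistProb > FIST_THRESHOLD) {
 |       window.scrollBy({ top: SCROLL_STEP, behavior: 'smooth' });
 |     }
 |   }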
| layer8 wrote:
| What would be nice is a version that can be used to paint on the
| screen with your fingers, such that the lines are visible on a
| remotely shared screen. The use-case is marking up/highlighting
| on a normal desktop monitor (i.e. non-touch) while screen-
| sharing, which is awkward using a mouse or touchpad (think
| circling stuff in source code and documents, drawing arrows
| etc.). That would mean (a) a camera from behind (facing the
| screen), so that the fingers can touch (or almost touch) the
| screen (i.e. be co-located to the screen contents you want to
| markup), and (b) native integration, so that the painting is done
| on a transparent always-on-top OS window (so that it's picked up
| by the screen-sharing software); or just as a native pointing
| device, since such on-screen painting/diagramming software
| already exists.
| adnanc wrote:
| Great idea which is brilliantly executed.
|
| So many educational uses, well done.
| b-3-n wrote:
| Thank you for the feedback.
| tomcooks wrote:
| This is a GREAT website, I can understand what it does with zero
| clicks, zero scrolls.
|
| Really great, congratulations, I hope that I can find a way to
| apply this lesson to my SaaS.
| SV_BubbleTime wrote:
 | Agreed, but it's also the nature of the beast. It's really easy
 | to explain hand tracking software in a single media element.
 | It's a lot harder to explain some crypto AEAD encapsulation
 | format the same way.
|
| I assume YoHa means Your Hands... I don't think I could have
| resisted OhHi for hand tracking.
| brundolf wrote:
| Bit of feedback: the home page is pretty sparse. The video is
| great, but it wasn't obvious how to find the repo or where to get
| the package (or even what language it can be used with). I had to
| open the Demo, wait for it to load, and then click the Github
| link there, and then the readme told me it was available on NPM.
|
| Otherwise looks pretty impressive! I've been looking for
 | something like this and I may give it a whirl.
| b-3-n wrote:
| Thank you for the feedback. You are right, the home page should
| probably be enriched with more information and maybe I can make
| the information you were looking for stand out better. As a
| side note: There is a link to GitHub in the footer. The
| language ("TypeScript API") is also mentioned in the body of
| the page. But I see that these two can quickly go unnoticed.
| tomcooks wrote:
 | BTW this would be great for spaced repetition foreign character
 | learning (Chinese, Arabic, Japanese, Korean, etc.): if the drawn
 | figure is similar enough to the character the student is
 | learning, mark it as studied.
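 |
 | One very rough way to score "similar enough", assuming both the
 | drawing and the reference character are rasterized to
 | equal-size binary bitmaps (an illustrative sketch, not part of
 | YoHa):
 |
 |   // Similarity of two equal-size binary bitmaps (nonzero =
 |   // inked pixel) via intersection-over-union.
 |   function strokeSimilarity(a: Uint8Array, b: Uint8Array): number {
 |     let inter = 0;
 |     let union = 0;
 |     for (let i = 0; i < a.length; i++) {
 |       const ai = a[i] !== 0;
 |       const bi = b[i] !== 0;
 |       if (ai && bi) inter++;
 |       if (ai || bi) union++;
 |     }
 |     return union === 0 ? 0 : inter / union;
 |   }
 |
 |   // e.g. mark as studied when
 |   // strokeSimilarity(drawing, template) >= 0.6
 |   // (threshold would need tuning).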
|
| Congrats again
| b-3-n wrote:
| Thank you for your feedback and for sharing this potential use
| case. I think it is a very creative idea.
| eminence32 wrote:
| The demo doesn't seem to work on my chromebook. Maybe it's too
| underpowered?
|
| Web page doesn't say anything after `Warming up...` and the
| latest message in the browser console is:
| Setting up wasm backend.
|
| I expected to see a message from my browser along the lines of
| "Do you want to let this site use your camera", but I saw no such
| message.
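 |
 | For reference, camera access can be checked in isolation with
 | the plain browser API (nothing YoHa-specific); if this also
 | fails without a prompt, the problem is upstream of the tracking
 | engine:
 |
 |   // Request the camera directly; surfaces the permission
 |   // prompt (or the reason it never appears) in the console.
 |   async function checkCamera(): Promise<void> {
 |     try {
 |       const stream = await navigator.mediaDevices.getUserMedia({
 |         video: true,
 |       });
 |       console.log('Camera OK:', stream.getVideoTracks()[0].label);
 |       stream.getTracks().forEach((t) => t.stop());
 |     } catch (err) {
 |       console.error('Camera unavailable:', err);
 |     }
 |   }
 |   checkCamera();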
| karxxm wrote:
 | Could you provide the paper related to this approach?
| b-3-n wrote:
 | In contrast to similar works, there is no dedicated paper that
 | presents, e.g., the neural network or the training procedure. Of
| course ideas from many papers influenced this work and I can't
| list them all here. Maybe it helps that the backbone of the
| network is very similar to MobileNetV2
| (https://arxiv.org/abs/1801.04381). Let me know if you have any
| more questions in that regard.
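 |
 | For a concrete picture, this is roughly what a single
 | MobileNetV2-style inverted residual block looks like in tfjs
 | layers; it illustrates the published backbone idea only, not
 | YoHa's actual (unreleased) architecture:
 |
 |   import * as tf from '@tensorflow/tfjs';
 |
 |   type ST = tf.SymbolicTensor; // alias to keep lines short
 |
 |   // One inverted residual block: 1x1 expand -> 3x3 depthwise
 |   // -> 1x1 linear projection, with a skip when shapes allow.
 |   function invertedResidual(
 |     input: ST, filters: number, stride: number, expansion = 6,
 |   ): ST {
 |     const inCh = input.shape[input.shape.length - 1] as number;
 |     let x = tf.layers.conv2d({
 |       filters: inCh * expansion, kernelSize: 1, useBias: false,
 |     }).apply(input) as ST;
 |     x = tf.layers.batchNormalization().apply(x) as ST;
 |     x = tf.layers.reLU({ maxValue: 6 }).apply(x) as ST;
 |     x = tf.layers.depthwiseConv2d({
 |       kernelSize: 3, strides: stride, padding: 'same',
 |       useBias: false,
 |     }).apply(x) as ST;
 |     x = tf.layers.batchNormalization().apply(x) as ST;
 |     x = tf.layers.reLU({ maxValue: 6 }).apply(x) as ST;
 |     x = tf.layers.conv2d({
 |       filters, kernelSize: 1, useBias: false,
 |     }).apply(x) as ST;
 |     x = tf.layers.batchNormalization().apply(x) as ST;
 |     if (stride === 1 && inCh === filters) {
 |       x = tf.layers.add().apply([input, x]) as ST;
 |     }
 |     return x;
 |   }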
| karxxm wrote:
 | Thanks for your reply! I just thought that, with SIGCHI around
 | the corner, it might be presented there! Awesome work!
| gitgud wrote:
| The demo really sells it here [1]. It's amazingly intuitive and
| easy to use, it should be a part of video-conferencing software.
|
| [1] https://handtracking.io/draw_demo/
| Graffur wrote:
| It's like an initial beta of the software - it's not production
| ready. I can't imagine this adding value to a meeting _yet_.
| Seems promising though.
| b-3-n wrote:
| Thank you for the feedback. Such an integration would be nice
| indeed.
| lost-found wrote:
| Demo keeps crashing on iOS.
| iainctduncan wrote:
 | Hi, I'm not sure if you've looked into this or not, but another
 | community that is interested in this sort of thing and might be
 | very excited is musical gesture recognition.
| hondadriver wrote:
 | Also look at Leap Motion
 | (https://www.ultraleap.com/product/leap-motion-controller/;
 | tip: Mouser has them in stock and usually at the best price)
 | together with midipaw (http://www.midipaw.com/, free).
 |
 | Latency is very low, which is very important for this use case.
 | Look on YouTube for demos.
| b-3-n wrote:
| Hey, I believe there are multiple things you could have meant.
 | Off the top of my head, one thing that might be interesting
| would be an application that allows conductors to conduct a
| virtual orchestra. But there are other possibilities in this
| space too I'm sure! If you had something else in mind feel free
| to share.
|
| I have not explored this space much so far as my focus is
| rather to build the infrastructure that enables such
| applications rather than building the applications myself.
| tjchear wrote:
| This reminds me of TAFFI [0], a pinching gesture recognition
| algorithm that is surprisingly easy to implement with classical
| computer vision techniques.
|
 | [0] https://www.microsoft.com/en-us/research/publication/robust-...
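 |
 | The core TAFFI trick: segment the hand, then a pinch shows up
 | as a "hole", i.e. a background region fully enclosed by the
 | hand blob. A sketch of that check on a binary mask (assuming
 | segmentation has already been done):
 |
 |   // True if the binary hand mask (nonzero = hand) contains a
 |   // background region not reachable from the image border,
 |   // i.e. an enclosed hole, which TAFFI reads as a pinch.
 |   function hasEnclosedHole(
 |     mask: Uint8Array, width: number, height: number,
 |   ): boolean {
 |     const seen = new Uint8Array(mask.length);
 |     const stack: number[] = [];
 |     // Seed the flood fill with all border pixels.
 |     for (let x = 0; x < width; x++) {
 |       stack.push(x, (height - 1) * width + x);
 |     }
 |     for (let y = 0; y < height; y++) {
 |       stack.push(y * width, y * width + width - 1);
 |     }
 |     // Flood-fill background reachable from the border.
 |     while (stack.length > 0) {
 |       const i = stack.pop() as number;
 |       if (seen[i] || mask[i] !== 0) continue;
 |       seen[i] = 1;
 |       const x = i % width;
 |       if (x > 0) stack.push(i - 1);
 |       if (x < width - 1) stack.push(i + 1);
 |       if (i >= width) stack.push(i - width);
 |       if (i < mask.length - width) stack.push(i + width);
 |     }
 |     // Any unseen background pixel lies in an enclosed hole.
 |     for (let i = 0; i < mask.length; i++) {
 |       if (mask[i] === 0 && !seen[i]) return true;
 |     }
 |     return false;
 |   }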
| phailhaus wrote:
| An "undo" gesture seems necessary, it was a bit too easy to
| accidentally wipe the screen. Aside from that, this is fantastic!
| Love to see what WASM is enabling these days on the web.
| b-3-n wrote:
 | Thank you for the feedback. Indeed, such functionality would
 | be nice. One could solve this via another hand pose, or in
 | some way with the existing hand poses: e.g., hold a fist for,
 | say, 2 seconds to clear the whole screen, while anything
 | shorter just issues an "undo".
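 |
 | A sketch of how the timing could work, evaluated once per frame
 | (the threshold and pose-probability input are illustrative):
 |
 |   declare function clearCanvas(): void;    // app-specific
 |   declare function undoLastStroke(): void; // app-specific
 |
 |   const CLEAR_HOLD_MS = 2000; // assumed: >= 2s clears all
 |   const FIST_THRESHOLD = 0.8; // assumed confidence cutoff
 |
 |   let fistStart: number | null = null;
 |
 |   // Distinguish a short fist (undo) from a long fist (clear)
 |   // by measuring how long the pose was held.
 |   function onFrame(fistProb: number, nowMs: number): void {
 |     const fist = fistProb > FIST_THRESHOLD;
 |     if (fist && fistStart === null) {
 |       fistStart = nowMs; // fist just started
 |     } else if (!fist && fistStart !== null) {
 |       const heldMs = nowMs - fistStart;
 |       fistStart = null;
 |       if (heldMs >= CLEAR_HOLD_MS) {
 |         clearCanvas();
 |       } else {
 |         undoLastStroke();
 |       }
 |     }
 |   }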
|
 | YoHa uses tfjs (TensorFlow.js), which provides several backends
 | for computation. One indeed uses WASM; the other is WebGL
 | based. The latter is usually the more powerful one.
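 |
 | For context, backend selection in tfjs itself looks like this
 | (generic tfjs usage, not necessarily how YoHa wires it up
 | internally):
 |
 |   import * as tf from '@tensorflow/tfjs';
 |   import '@tensorflow/tfjs-backend-wasm';
 |
 |   // Prefer the (usually faster) WebGL backend and fall back
 |   // to WASM when WebGL is unavailable.
 |   async function initBackend(): Promise<void> {
 |     if (!(await tf.setBackend('webgl'))) {
 |       await tf.setBackend('wasm');
 |     }
 |     await tf.ready();
 |     console.log('tfjs backend:', tf.getBackend());
 |   }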
| smoyer wrote:
| I've been working on a couple of chording keyboard designs and
| was thinking I might be able to create a virtual keyboard using
| this library. It would be nice to also be able to recognize the
 | hand from the back. A keyboard would also obviously require
 | tracking two hands at a time.
|
| How does the application deal with different skin-tones?
| b-3-n wrote:
 | That's an interesting idea. I have not tried to build something
 | similar, but a humble word of caution: no matter what kind of
 | ML you use, the mechanical version of the instrument will
 | always be more precise (you are likely aware of this, I just
 | want to make sure). However, you might be able to approximate
 | the precision of the mechanical version.
|
| Two hand support would be nice and I would love to add it in
| the future.
|
| The engine should work well with different skin tones as the
| training data was collected from a set of many and diverse
| individuals. The training data will also grow further over time
| making it more and more robust.
| inetsee wrote:
 | My first question is whether this could be adapted to
 | interpret/translate American Sign Language (ASL).
| b-3-n wrote:
| Thank you for this inspiring question. For interpreting sign
| language you need multi-hand support which YoHa is currently
| lacking. Apart from that you likely also need to account for
| the temporal dimension which YoHa also does not do right now.
| If those things were implemented I'm confident that it would
| produce meaningful results.
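 |
 | As a sketch of the "temporal dimension" part: per-frame hand
 | features would be buffered into a sliding window, and the
 | window, not a single frame, would then be classified (all
 | names here are illustrative):
 |
 |   // Sliding window over per-frame feature vectors; a sequence
 |   // classifier would consume full windows, not single frames.
 |   class TemporalWindow {
 |     private frames: number[][] = [];
 |     constructor(private readonly size: number) {}
 |
 |     // Returns the full window once enough frames accumulated.
 |     push(features: number[]): number[][] | null {
 |       this.frames.push(features);
 |       if (this.frames.length > this.size) this.frames.shift();
 |       return this.frames.length === this.size
 |         ? [...this.frames]
 |         : null;
 |     }
 |   }
 |
 |   // e.g. const win = new TemporalWindow(30); // ~1s at 30 fps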
| rafamct wrote:
| It's worth noting that movements of the mouth are extremely
| important in ASL (and other sign languages) and so this
| probably isn't as useful as it might seem at first.
___________________________________________________________________
(page generated 2021-10-11 23:00 UTC)