[HN Gopher] Hand Tracking for Mouse Input (2023)
       ___________________________________________________________________
        
       Hand Tracking for Mouse Input (2023)
        
       Author : wonger_
       Score  : 216 points
        Date   : 2024-11-19 17:18 UTC (1 day ago)
        
 (HTM) web link (chernando.com)
 (TXT) w3m dump (chernando.com)
        
       | SomeoneOnTheWeb wrote:
        | Very impressive! This opens up a whole new set of uses for
        | this headset.
        
       | ancientstraits wrote:
       | It unsettled me a lot about just how much work was put into
       | making the JavaScript version of this work instead of a purely
       | Python version, due to how OpenCV works. I wonder how universal
       | the laggy OpenCV thing is, because my friend faced it too when
       | working on an OpenCV application. Is it so unavoidable that the
       | only option is to not use Python? I really hope that there is
       | another way of going about this.
       | 
        | Anyways, I am very glad that you put in all that effort to
        | make the JavaScript version work well. Working under
        | limitations is sometimes cool. I remember having to figure
        | out how PyTorch evaluated neural networks, and having to
        | convert a PyTorch neural network into Java code that could
        | evaluate the model without any external libraries (it was
        | very inefficient) for a Java coding competition. Although
        | there may have been a better way, what I did was good enough.
        
         | kevmo314 wrote:
          | Creating a faster Python implementation can definitely be
          | done. OpenCV is a thin wrapper over the C++ API, so it's
          | not due to some intrinsic Python slowness. It is not easy
          | to resolve, though, and I suspect the way Python code is
          | typically written lends itself to an accidental blocking
          | operation more often than JS code does. It's hard to know
          | without seeing the code.
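          | 
          | A minimal sketch of one usual fix (assuming the bottleneck
          | is a blocking cv2.VideoCapture.read() on the main thread;
          | this is not the article's code): grab frames on a worker
          | thread so the processing loop never waits on the camera.
          | 
          |   import threading
          |   import cv2
          | 
          |   class ThreadedCapture:
          |       def __init__(self, index=0):
          |           self.cap = cv2.VideoCapture(index)
          |           self.frame = None
          |           self.lock = threading.Lock()
          |           threading.Thread(target=self._reader,
          |                            daemon=True).start()
          | 
          |       def _reader(self):
          |           while True:
          |               ok, frame = self.cap.read()  # blocking call
          |               if ok:
          |                   with self.lock:
          |                       self.frame = frame
          | 
          |       def latest(self):
          |           with self.lock:
          |               return self.frame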
        
         | reynaldi wrote:
          | Author here. Sorry you have to see my janky JavaScript
          | solution XD. But one good thing about going with Tauri is
          | that developing the UI is pretty easy, since it's basically
          | just some web pages, but with access to the system through
          | the JS <-> Rust communication.
          | 
          | Also, rewriting a neural network from PyTorch to Java
          | sounds like a big task. I wonder if people are doing ML in
          | Java.
        
       | xnx wrote:
       | Mediapipe is a lot of fun to play with and I'm surprised how
       | little it seems to be used.
       | 
       | You might also be interested in Project Gameface, open source
       | Windows and Android software for face input:
       | https://github.com/google/project-gameface
       | 
       | Also https://github.com/takeyamayuki/NonMouse
        
         | brcmthrowaway wrote:
         | Probably because the API is written like enterprise Java
         | garbage
        
       | KaoruAoiShiho wrote:
       | If compelling enough I don't mind setting up a downward facing
       | camera. Would like to see some more examples though where it
       | shows some supremacy over just using a mouse. I'm sure there are
       | some scenarios where it is.
        
       | liendolucas wrote:
        | Very nice! The sort of thing that I expect to see on HN. Do
        | you currently use it? I mean, maybe it's not perfect as a
        | mouse replacement, but remote movie control, as shown in one
        | of the last videos, is definitely a legit use case. Congrats!
        
         | reynaldi wrote:
          | I'm glad it is up to the HN standard :) No, I don't
          | currently use it; I am back on mouse and touchpad. But I
          | can definitely see what you mean by remote movie control. I
          | would love to control my movie projector with my hand.
         | 
          | I've been thinking on and off about how to improve the
          | forward-facing mode. Having the hand straight ahead of the
          | camera messes with the readings; I think MediaPipe is
          | trained on seeing the hand from above or below (and maybe
          | the sides) but not straight ahead.
         | 
          | Ideally, the camera should be somewhere above the hand
          | (pointing downwards) to get the best results. But in the
          | current version of downward-facing mode, the way to move
          | the cursor is actually by moving the hand around (the x and
          | y position of the hand translates to the x and y of the
          | cursor). If the camera FOV is very big (capturing from far
          | away), then you would have to move your hand very far in
          | order to move the cursor, which is probably not ideal.
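          | 
          | A sketch of one common mitigation (the region bounds are
          | made-up values, not the project's): map a smaller "active
          | region" of the frame to the full screen, so a big FOV does
          | not require big arm movements.
          | 
          |   def to_screen(hx, hy, w=1920, h=1080,
          |                 x0=0.3, x1=0.7, y0=0.3, y1=0.7):
          |       # hx, hy: MediaPipe-normalized [0, 1] hand coords
          |       sx = min(max((hx - x0) / (x1 - x0), 0.0), 1.0)
          |       sy = min(max((hy - y0) / (y1 - y0), 0.0), 1.0)
          |       return sx * w, sy * h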
         | 
          | I later found an idea for improving this when playing
          | around with a smart TV, where the remote controls a cursor.
          | You do that by tilting the remote up and down or left and
          | right; I think it uses a gyroscope or accelerometer (idk
          | which is which). I wish I had a video of it to show it
          | better, but I don't. I think it is possible to apply the
          | same concept to the hand tracking, so we use the tilt of
          | the hand to control the cursor. This way, we don't have to
          | rely on the hand position captured by the camera. Plus,
          | this will work if the camera is far away, since it is only
          | detecting the hand tilt. Still thinking about this.
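          | 
          | As a hypothetical sketch of that tilt idea (names and gains
          | are made up): derive yaw and pitch from the wrist-to-
          | middle-knuckle vector of the MediaPipe hand landmarks and
          | treat them as a cursor velocity, so the hand's absolute
          | position in the frame stops mattering.
          | 
          |   import math
          | 
          |   def tilt_to_velocity(wrist, mcp, gain=900.0, dead=0.08):
          |       # wrist, mcp: normalized (x, y, z) landmark tuples;
          |       # in MediaPipe Hands, z is depth relative to wrist
          |       dz = -(mcp[2] - wrist[2])       # toward the camera
          |       yaw = math.atan2(mcp[0] - wrist[0], dz)
          |       pitch = math.atan2(mcp[1] - wrist[1], dz)
          |       vx = 0.0 if abs(yaw) < dead else gain * yaw
          |       vy = 0.0 if abs(pitch) < dead else gain * pitch
          |       return vx, vy  # add to cursor position each frame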
         | 
         | Anyway, I'm glad you find the article interesting!
        
           | DontNoodles wrote:
            | I tried to implement Johnny Lee's amazing idea
            | (https://www.youtube.com/watch?v=Jd3-eiid-Uw) using
            | mediapipe face tracking. I could not get very far using
            | simple webcams, since it was difficult to determine the
            | distance of the face from the camera when the face was
            | turned. I had an Intel RealSense D415 depth camera from a
            | different project, and it took care of the distance
            | problem at least. But the jitter had me stumped for a
            | long time, and I put the project away. With your ideas, I
            | have the motivation to revisit it. Thanks!
        
       | aranelsurion wrote:
       | > Python version is super laggy, something to do with OpenCV
       | 
        | Most probably I'm wrong, but I wonder if it has anything to
        | do with all the text being written to stdout. On the off
        | chance that it happens on the same thread, it might be
        | blocking.
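        | 
        | It is easy to test either way with a throwaway timing sketch:
        | 
        |   import time
        | 
        |   t0 = time.perf_counter()
        |   for i in range(1000):
        |       print("landmarks:", i)  # comment out for baseline run
        |   print(time.perf_counter() - t0)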
        
         | ikanreed wrote:
          | Could it then be resolved by using the no-GIL version of
          | Python they just released?
        
           | mananaysiempre wrote:
            | I'm not sure what your reasoning is, but note that
            | blocking I/O, including print(), releases the GIL. (So a
            | seemingly innocent debugging print can be far from
            | harmless under the wrong circumstances.)
        
         | reynaldi wrote:
          | Hmm, I can't remember whether I tried it without the text
          | being written to stdout. But that's an interesting point; I
          | just didn't expect the print() blocking to be significant.
        
       | kelseyfrog wrote:
       | It's projects like this that _really_ make me want to start on a
       | virtual theremin. Wish I had the time :(
        
         | polishdude20 wrote:
         | Oh that's an awesome idea!
        
         | jcheng wrote:
         | My son did a basic version for a class project, surprisingly
         | simple with MediaPipe
         | 
         | https://s-ocheng.github.io/theremin/
         | 
         | https://github.com/s-ocheng/theremin
        
       | vkweb wrote:
        | Man, I feel like making diagrams or writing handwritten
        | notes will be great with this!
        
       | AlfredBarnes wrote:
        | I did a very similar project a few months back. My goal was
        | to help alleviate some of the RSI issues I have, and to give
        | myself a different input device.
        | 
        | The precision was always tricky, and while it was fun, I
        | eventually abandoned the project and switched to face
        | tracking and blinking so I didn't have to hold up my hand.
        | 
        | For some reason, the idea of pointing my webcam down never
        | dawned on me. I then discovered Project Gameface and just
        | started using that.
        | 
        | Happy programming, and thank you for the excellent write-up!
        
         | bottom999mottob wrote:
          | I'm curious what your experience is like using Gameface for
          | day-to-day tasks like coding. I assume you still use a
          | keyboard for typing, but what about selecting blocks of
          | text or general navigation?
        
         | reynaldi wrote:
          | Glad you enjoyed reading it! I just checked the Project
          | Gameface demo [1], and it's really cool that it is accurate
          | enough for drawing text. I wonder what it is tracking. Are
          | you still using it?
         | 
         | [1] https://blog.google/technology/ai/google-project-gameface/
        
         | maeil wrote:
          | Similar situation here; super interested in hearing how
          | well Gameface works for you. Do you use it for non-gaming
          | tasks as well?
          | 
          | I've succeeded in fully replacing the keyboard (I use
          | Talon voice), but I find replacing the mouse tougher. I
          | tried eye tracking but could never get it accurate enough
          | not to be frustrating.
        
       | omikun wrote:
        | Such a cool and inspirational project! Regarding the drift
        | on pinch, have you tried storing the pointer position from
        | the last second and using that as the click position? You
        | could maybe show this position as a second cursor. I've
        | always wondered why Apple doesn't do this for their "eye
        | moves faster than hands" issue as well.
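        | 
        | A sketch of that idea (hypothetical, not from the article):
        | keep a short history of cursor positions and click at the
        | position from a moment before the pinch began.
        | 
        |   import time
        |   from collections import deque
        | 
        |   history = deque()  # (timestamp, x, y) samples
        | 
        |   def record(x, y, window=1.0):
        |       now = time.monotonic()
        |       history.append((now, x, y))
        |       while history and history[0][0] < now - window:
        |           history.popleft()
        | 
        |   def click_position(lookback=0.25):
        |       cutoff = time.monotonic() - lookback
        |       for t, x, y in reversed(history):
        |           if t <= cutoff:
        |               return x, y  # where the cursor was pre-pinch
        |       return None          # not enough history yet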
        
       | Aspos wrote:
       | Some problems in life can be easily fixed with crimson red nail
       | polish.
        
         | MrMcCall wrote:
         | That made me smirk, but I am curious, "What would be the best
         | color for general webcam colored-object tracking?" I'm sure it
         | would depend on the sensor, but I wonder if one color would be
         | best for the most basic hardware.
        
           | 0_____0 wrote:
            | Something not found in the background. If you can cleanly
            | segment the image purely on color, that makes the
            | tracking very, very easy if you're tracking a single
            | object.
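            | 
            | For a single bright color, the classic OpenCV recipe is
            | roughly this sketch (the HSV range is a rough guess for a
            | crimson red; OpenCV hue runs 0-179, and red also wraps
            | around 0):
            | 
            |   import cv2
            |   import numpy as np
            | 
            |   def track_color(frame_bgr):
            |       hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
            |       mask = cv2.inRange(hsv, np.array([170, 120, 70]),
            |                          np.array([179, 255, 255]))
            |       cnts, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
            |                                  cv2.CHAIN_APPROX_SIMPLE)
            |       if not cnts:
            |           return None
            |       m = cv2.moments(max(cnts, key=cv2.contourArea))
            |       if m["m00"] == 0:
            |           return None
            |       return m["m10"] / m["m00"], m["m01"] / m["m00"]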
        
       | mufasachan wrote:
        | An inspiring project. I am looking forward to seeing some
        | gloves connected to a VR device. I think some cheap sensors,
        | a bit of Bayesian modelling, and a calibration step could
        | offer proper realtime hand gesture tracking.* I am already
        | picturing being able to type on an AR keyboard. If the
        | gloves are more expensive, there might be some haptic
        | feedback. VR devices might have more open OSes in the
        | future, or could use a "streaming" platform to access remote
        | desktop environments. I am eager to see all the incoming use
        | cases!
        | 
        | *: a lot of it. Plus, the tracking might be task-centered. I
        | would not bet on general hand gesture tracking with cheap
        | sensors and Bayesian modelling alone.
        
         | hoc wrote:
          | Tap (tapwithus.com) had an IMU-based solution early in the
          | current VR hype cycle, using an IMU for each finger and
          | some kind of chord-based letter typing system. Wearing them
          | was a fancy proof of your geekiness during VR meetups back
          | then.
          | 
          | I think they have a camera-based wristband version now.
          | 
          | It still doesn't have any room positioning info though,
          | AFAIK.
        
       | 0x20cowboy wrote:
       | This is very cool - can you do window focus based on the window I
       | am looking at next? :)
        
       | jacobsimon wrote:
        | So cool! I was just wondering the other day if it would be
        | possible to build this! For front-facing mode, I wonder if
        | you could add a brief "calibration" step to help it learn the
        | correct scale and adjust angles, e.g. give users a few
        | targets to hit on the screen.
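        | 
        | A sketch of what such a calibration could compute (all names
        | are hypothetical): record the raw hand reading at a few known
        | on-screen targets, then least-squares-fit an affine map from
        | hand space to screen space.
        | 
        |   import numpy as np
        | 
        |   def fit_affine(hand_pts, screen_pts):
        |       # hand_pts, screen_pts: N >= 3 matching (x, y) pairs
        |       A = np.hstack([np.asarray(hand_pts, float),
        |                      np.ones((len(hand_pts), 1))])
        |       M, *_ = np.linalg.lstsq(A,
        |                               np.asarray(screen_pts, float),
        |                               rcond=None)
        |       return M  # 3x2 affine matrix
        | 
        |   def apply_cal(M, hx, hy):
        |       return np.array([hx, hy, 1.0]) @ M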
        
         | reynaldi wrote:
          | Hi Jacob, thanks for checking it out. Regarding the
          | calibration step for front-facing mode, I'm glad you
          | brought it up. I did think of this, because the distance
          | from the camera/screen to the hand affects the movement so
          | much (the part where the angle of the hand is part of the
          | position calculation).
          | 
          | And you are absolutely right about its use for finding the
          | correct scale. For my implementation, I actually just
          | hardcoded the calibration values based on where I wanted
          | the boundaries for the Z axis. These values I got from the
          | readings, so in a way it's like a manual calibration. :D
          | Having calibration is definitely the right idea; I just
          | didn't want to overcomplicate things at the time.
          | 
          | BTW, I am a happy user of Exponent, thanks for making it! I
          | am doing some courses and also peer mocks for interview
          | prep!
        
       | zh3 wrote:
        | Related online demos on using mediapipe for flying spaceships
        | and camera/hand interaction to grab VR cubes (the second link
        | is the demo). There was a discussion on Hackaday recently
        | [2].
        | 
        | [0]
        | https://tympanus.net/codrops/2024/10/24/creating-a-3d-hand-c...
        | 
        | [1] https://tympanus.net/Tutorials/webcam-3D-handcontrols/
        | 
        | [2] https://hackaday.com/2024/10/25/diy-3d-hand-controller-
        | using... (DIY 3D hand controller)
        
       | hoc wrote:
       | Cool path and write-up. Thank you!
       | 
        | Because of the use case, and because I've wanted to use it
        | in an AR app but haven't yet, I'd like to point to
        | doublepoint.com's totally different but great-working
        | approach, where they trained a NN to interpret a Samsung
        | watch's IMU data to detect taps. They also added a mouse
        | mode.
        | 
        | I think Google's OS also allows client BT mode for the
        | device, so it can be paired directly as a HID, IIRC.
       | 
       | Not affiliated, but impressed by the funding they received :)
        
         | reynaldi wrote:
          | Wow, interesting. It reminded me of that Meta Orion
          | wristband; I wonder if that is the goal.
        
       | jcheng wrote:
       | Mediapipe makes hand tracking so easy and it looks SO cool. I did
       | a demo at PyData NYC a couple of years ago that let you rotate a
       | Plotly 3D plot using your hand:
       | 
       | https://youtu.be/ijRBbtT2tgc?si=2jhYLONw0nCNfs65&t=1453
       | 
       | Source: https://github.com/jcheng5/brownian
        
         | notpublic wrote:
         | That demo is pretty impressive!
        
       | bogardon wrote:
       | could this be the next evolution of gaming mice?
        
       | ps8 wrote:
        | Reminds me of the Leap Motion controller; now there's a
        | version 2: https://leap2.ultraleap.com/downloads/leap-motion-
        | controller...
        
       | plasticeagle wrote:
        | This is cool, but a moving average filter is pretty bad at
        | removing noise - it tends to be longer than it needs to be
        | because its frequency response is so poor. Try using an IIR
        | filter instead. You don't need to worry about calculating
        | the coefficients exactly, because they'll just be
        | empirically determined:
        | 
        | out = last_out * x + input * (1-x)
        | 
        | where x is between zero and one. The closer x is to one, the
        | more filtering you'll do. You can cascade these too, to make
        | a higher-order filter, which will work even better.
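        | 
        | For instance, a sketch of that same recurrence, cascaded:
        | 
        |   class OnePole:
        |       def __init__(self, x):
        |           self.x = x      # 0..1; closer to 1 = smoother
        |           self.last = None
        | 
        |       def step(self, value):
        |           if self.last is None:
        |               self.last = value  # avoid startup transient
        |           self.last = (self.last * self.x
        |                        + value * (1 - self.x))
        |           return self.last
        | 
        |   # two stages in series give a second-order rolloff
        |   stages = [OnePole(0.8), OnePole(0.8)]
        | 
        |   def smooth(value):
        |       for s in stages:
        |           value = s.step(value)
        |       return value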
        
         | thefroh wrote:
          | I've heard good things about using the 1 Euro filter for
          | user-input tasks, where you're trying to remove noise
          | effectively but also keep latency down.
          | 
          | See https://gery.casiez.net/1euro/ with plenty of existing
          | implementations to pick from.
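          | 
          | The algorithm itself is tiny; a minimal sketch after Casiez
          | et al. (parameter names follow the reference implementation
          | on that page):
          | 
          |   import math
          | 
          |   class OneEuro:
          |       def __init__(self, freq, mincutoff=1.0, beta=0.0,
          |                    dcutoff=1.0):
          |           self.freq = freq  # sample rate, Hz
          |           self.mincutoff = mincutoff
          |           self.beta = beta
          |           self.dcutoff = dcutoff
          |           self.x_prev = None
          |           self.dx_prev = 0.0
          | 
          |       def _alpha(self, cutoff):
          |           tau = 1.0 / (2 * math.pi * cutoff)
          |           return 1.0 / (1.0 + tau * self.freq)
          | 
          |       def __call__(self, x):
          |           if self.x_prev is None:
          |               self.x_prev = x
          |               return x
          |           dx = (x - self.x_prev) * self.freq
          |           ad = self._alpha(self.dcutoff)
          |           dx_hat = ad * dx + (1 - ad) * self.dx_prev
          |           # faster movement -> higher cutoff -> less lag
          |           cutoff = self.mincutoff + self.beta * abs(dx_hat)
          |           a = self._alpha(cutoff)
          |           x_hat = a * x + (1 - a) * self.x_prev
          |           self.x_prev, self.dx_prev = x_hat, dx_hat
          |           return x_hat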
        
           | plasticeagle wrote:
           | That sounds very interesting. I've been needing a filter to
           | deal with noisy A/D conversions for pots in an audio project.
           | Noise on a volume control turns into noise on the output, and
           | sounds horrible, but excessive filtering causes unpleasant
           | latency when using the dials.
        
         | reynaldi wrote:
          | Interesting, I'd never heard of the IIR filter before. I
          | will keep it in mind as an option if I ever work on
          | removing noise again. Thanks for sharing!
        
           | jmiskovic wrote:
            | You are already using an IIR filter as part of the 1 Euro
            | filter. The 1 Euro filter is an adaptive filter that uses
            | a first-order IIR (also called an exponential filter) as
            | its basis. Depending on your filtering parameters, you
            | can turn off the adaptive part and be left with just the
            | IIR.
        
       | HermanMartinus wrote:
       | This is a very cool demo! Well done!
       | 
        | One suggestion for fixing the cursor drift during finger
        | taps: instead of using the hand position, use the index
        | finger, then tap the middle finger to the thumb for
        | selection. This doesn't change the cursor position, yet is
        | still a comfortable and easy-to-parse action.
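        | 
        | In MediaPipe Hands terms, that could look like this sketch
        | (landmark 8 is the index fingertip, 4 the thumb tip, 12 the
        | middle fingertip; the threshold is a made-up value):
        | 
        |   def dist(a, b):
        |       return ((a.x - b.x) ** 2 + (a.y - b.y) ** 2) ** 0.5
        | 
        |   def read_gesture(hand, pinch_threshold=0.05):
        |       lm = hand.landmark
        |       cursor = (lm[8].x, lm[8].y)  # index tip moves cursor
        |       clicked = dist(lm[4], lm[12]) < pinch_threshold
        |       return cursor, clicked  # clicking never moves cursor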
        
         | reynaldi wrote:
          | Thanks Herman, glad you enjoyed it! I agree with your
          | suggestion; having the middle finger + thumb for the tap
          | and the index finger for the movement would mitigate the
          | cursor drift. The only reason I used index finger + thumb
          | is so that it is like the Apple Vision Pro input. But it
          | could definitely be an improvement.
          | 
          | Unrelated, but shoutout to bearblog. My first blog was on
          | bearblog, which is what got me started writing, although I
          | later ended up self-hosting my own blog.
        
       | alana314 wrote:
       | This has tons of potential in the creative technology space.
       | Thanks for sharing!
        
       | ewuhic wrote:
        | A great demo, but how I wish there were a keyboard-less
        | method for word input based on swipe typing, meaning I don't
        | press virtual keys; I just wave my index finger in the air,
        | and the vision picks up the traced path and converts it into
        | words. And if there's something that asks for even less
        | effort, maybe even something that's already implemented, I
        | am all open to suggestions!
        
       | pacifi30 wrote:
        | Amazing work! I have been working on robotifying an
        | operations task for my company - a robot hand and a vision
        | system that can complete a task on the monitor just like
        | humans do. I have been toying with the OpenAI vision model
        | to get the mouse coordinates, but it's slow and does not
        | always return the correct coordinates (probably due to the
        | LLM not understanding geometry).
        | 
        | Anyhow, looking forward to trying your approach with
        | mediapipe. Thanks for the write-up and demo. Inspirational!
        
       ___________________________________________________________________
       (page generated 2024-11-20 23:01 UTC)