[HN Gopher] Hand Tracking for Mouse Input
       ___________________________________________________________________
        
       Hand Tracking for Mouse Input
        
       Author : wonger_
       Score  : 119 points
       Date   : 2024-11-19 17:18 UTC (5 hours ago)
        
 (HTM) web link (chernando.com)
 (TXT) w3m dump (chernando.com)
        
       | SomeoneOnTheWeb wrote:
        | Very impressive! This opens up a whole new set of uses for this
        | headset.
        
       | ancientstraits wrote:
        | It unsettled me a lot just how much work had to be put into
        | making the JavaScript version of this work instead of a purely
        | Python version, due to how OpenCV works. I wonder how universal
        | the laggy-OpenCV problem is, because a friend of mine hit it
        | too when working on an OpenCV application. Is it so unavoidable
        | that the only option is to not use Python? I really hope there
        | is another way of going about this.
       | 
       | Anyways, I am very glad that you put in all that effort to make
       | the JavaScript version work well. Working under limitations is
       | sometimes cool. I remember having to figure out how PyTorch
       | evaluated neural networks, and having to convert the PyTorch
       | neural network into Java code that could evaluate the model
       | without any external libraries (it was very inefficient) for a
       | Java code competition. Although there may have been a better way,
       | what I did was good enough.
        
         | kevmo314 wrote:
          | Creating a faster Python implementation can definitely be
          | done. OpenCV is a thin wrapper over the C++ API, so it's not
          | due to some intrinsic Python slowness. It is not easy to
          | resolve, though, and I suspect the way Python code is
          | typically written lends itself to accidental blocking
          | operations more often than JS code does. It's hard to know
          | without seeing the code.
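One common source of that accidental blocking is calling the camera's blocking read on the same thread as inference, so each frame waits on capture. A minimal stdlib-only sketch of the usual fix, with a generic `read_fn` standing in for something like OpenCV's `cap.read` (an illustration, not the author's actual code):

```python
import threading

class LatestFrame:
    """Run a blocking frame source on a background thread and keep only
    the newest frame, so the processing loop never waits on capture."""

    def __init__(self, read_fn):
        self._read = read_fn          # blocking read, e.g. a cap.read wrapper
        self._lock = threading.Lock()
        self._frame = None
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._loop, daemon=True)
        self._thread.start()

    def _loop(self):
        while not self._stop.is_set():
            frame = self._read()      # may block; happens off the main thread
            with self._lock:
                self._frame = frame   # overwrite: stale frames are dropped

    def latest(self):
        """Return the most recent frame without blocking (None at startup)."""
        with self._lock:
            return self._frame

    def stop(self):
        self._stop.set()
        self._thread.join()
```

The processing loop then polls `latest()` at its own pace instead of pulling every frame; dropped frames are the price for never stalling.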
        
         | reynaldi wrote:
          | Author here, sorry you have to see my janky JavaScript
          | solution XD. One good thing about going with Tauri is that
          | developing the UI is pretty easy, since it's basically just
          | some web pages, but with access to the system through the JS
          | <-> Rust communication.
          | 
          | Also, rewriting a neural network from PyTorch to Java sounds
          | like a big task. I wonder if people are doing ML in Java.
        
       | xnx wrote:
       | Mediapipe is a lot of fun to play with and I'm surprised how
       | little it seems to be used.
       | 
       | You might also be interested in Project Gameface, open source
       | Windows and Android software for face input:
       | https://github.com/google/project-gameface
       | 
       | Also https://github.com/takeyamayuki/NonMouse
        
       | KaoruAoiShiho wrote:
       | If compelling enough I don't mind setting up a downward facing
       | camera. Would like to see some more examples though where it
       | shows some supremacy over just using a mouse. I'm sure there are
       | some scenarios where it is.
        
       | liendolucas wrote:
        | Very nice! The sort of thing that I expect to see on HN. Do you
        | currently use it? Maybe it is not perfect as a mouse
        | replacement, but as a remote movie control, as shown in one of
        | the last videos, it is definitely a legit use case. Congrats!
        
         | reynaldi wrote:
          | I'm glad it is up to the HN standard :) No, I don't currently
          | use it; I am back on the mouse and touchpad. But I can
          | definitely see what you mean by remote movie control. I would
          | love to control my movie projector with my hand.
          | 
          | I've been thinking on and off about how to improve the
          | forward-facing mode. Having the hand straight ahead of the
          | camera messes with the readings; I think MediaPipe is trained
          | on seeing the hand from above or below (and maybe the sides),
          | but not straight on.
         | 
          | Ideally, the camera should be somewhere above the hand
          | (pointing downwards) to get the best results. But in the
          | current version of the downward-facing mode, you move the
          | cursor by moving the hand around (the x and y position of the
          | hand translates to the x and y of the cursor). If the camera
          | FOV is very big (capturing from far away), then you would
          | have to move your hand very far in order to move the cursor,
          | which is probably not ideal.
         | 
          | I later found an idea for improving this while playing around
          | with a smart TV, where the remote controls a cursor. You do
          | that by tilting the remote up and down or left and right; I
          | think it uses a gyroscope or accelerometer (idk which is
          | which). I wish I had a video of it to show this better, but I
          | don't. I think it is possible to apply the same concept to
          | the hand tracking, so we use the tilt of the hand to control
          | the cursor. This way, we don't have to rely on the hand
          | position captured by the camera. Plus, this will work if the
          | camera is far away, since it is only detecting the hand tilt.
          | Still thinking about this.
         | 
         | Anyway, I'm glad you find the article interesting!
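The tilt idea above could be sketched from two MediaPipe hand landmarks, e.g. the wrist (landmark 0) and the middle-finger knuckle (landmark 9): the wrist-to-knuckle vector's deviation from "straight up" in the image drives the cursor as a velocity, like a TV remote pointer. The axis conventions, gain, and deadzone below are illustrative assumptions, not anything from the project:

```python
import math

def tilt_to_velocity(wrist, knuckle, gain=600.0, deadzone=0.1):
    """Map hand tilt to a cursor velocity in pixels/second.

    wrist, knuckle: (x, y, z) landmark coordinates, assumed to follow
    MediaPipe's convention of x right, y down, z roughly toward/away
    from the camera. An upright hand yields zero velocity; tilting the
    hand left/right or forward/back moves the cursor.
    """
    dx = knuckle[0] - wrist[0]
    dy = knuckle[1] - wrist[1]
    dz = knuckle[2] - wrist[2]
    yaw = math.atan2(dx, -dy)    # left/right tilt (radians)
    pitch = math.atan2(dz, -dy)  # forward/back tilt (radians)

    def apply(angle):
        # Small tilts inside the deadzone are ignored so a relaxed hand
        # doesn't drift the cursor.
        if abs(angle) < deadzone:
            return 0.0
        return gain * (angle - math.copysign(deadzone, angle))

    return apply(yaw), apply(pitch)
```

Because only the tilt matters, this keeps working when the camera is far away, which is exactly the property described above.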
        
       | aranelsurion wrote:
        | > Python version is super laggy, something to do with OpenCV
        | 
        | Most probably I'm wrong, but I wonder if it has anything to do
        | with all the text being written to stdout. On the off chance
        | that it happens on the same thread, it might be blocking.
        
         | ikanreed wrote:
            | Could it then be resolved by using the no-GIL version of
            | Python they just released?
        
           | mananaysiempre wrote:
           | I'm not sure what your reasoning is, but note that blocking
           | I/O including print() releases the GIL. (So your seemingly
           | innocent debugging print can be extremely not harmless under
           | the wrong circumstances.)
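If the debug output has to stay, one simple mitigation is rate-limiting it so a slow terminal can only stall the frame loop a couple of times per second. A small stdlib sketch of that idea (not from the article):

```python
import sys
import time

class ThrottledLog:
    """Emit at most one message per interval; extra messages are
    counted and dropped, so per-frame debug prints can't dominate a
    tight processing loop when stdout/stderr is slow."""

    def __init__(self, interval=0.5):
        self.interval = interval   # seconds between emitted messages
        self._last = 0.0
        self.dropped = 0           # messages suppressed since start

    def log(self, msg):
        now = time.monotonic()
        if now - self._last >= self.interval:
            print(msg, file=sys.stderr)
            self._last = now
        else:
            self.dropped += 1
```

Calling `log()` once per frame then costs a cheap clock read most of the time instead of a potentially blocking write.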
        
       | kelseyfrog wrote:
       | It's projects like this that _really_ make me want to start on a
       | virtual theremin. Wish I had the time :(
        
         | polishdude20 wrote:
         | Oh that's an awesome idea!
        
         | jcheng wrote:
         | My son did a basic version for a class project, surprisingly
         | simple with MediaPipe
         | 
         | https://s-ocheng.github.io/theremin/
         | 
         | https://github.com/s-ocheng/theremin
        
       | vkweb wrote:
       | Man, I feel making diagrams / writing handwritten notes will be
       | great with this!
        
       | AlfredBarnes wrote:
        | I did a very similar project a few months back. My goal was to
        | help alleviate some of the RSI issues I have, and to give
        | myself a different input device.
        | 
        | The precision was always tricky, and while it was fun, I
        | eventually abandoned the project and switched to face tracking
        | and blinking so I didn't have to hold up my hand.
        | 
        | For some reason the idea of pointing my webcam down never
        | dawned on me. I then discovered Project Gameface and just
        | started using that.
        | 
        | Happy programming, and thank you for the excellent write-up!
        
         | bottom999mottob wrote:
         | I'm curious how your experience is using Gameface for day-to-
         | day tasks like coding. I assume you still use a keyboard for
         | typing, but what about selecting blocks of text or general
         | navigation?
        
       | omikun wrote:
        | Such a cool and inspirational project! Regarding the drift on
        | pinch, have you tried storing the pointer position from the
        | last second and using that as the click position? Maybe you
        | could show this position as a second cursor? I've always
        | wondered why Apple doesn't do this for their "eye moves faster
        | than hands" issue as well.
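The suggestion above could be sketched as a short position history that the pinch handler samples from: when a pinch is detected, click where the cursor was a moment *before* the gesture started dragging it. The 0.15 s lookback below is an arbitrary assumption for illustration:

```python
import collections
import time

class ClickAnchor:
    """Remember recent cursor positions so a pinch 'click' can land
    where the cursor was just before the pinch motion moved it."""

    def __init__(self, lookback=0.15, maxlen=256):
        self.lookback = lookback                      # seconds to rewind
        self.history = collections.deque(maxlen=maxlen)  # (t, x, y)

    def update(self, x, y, t=None):
        """Record the cursor position each frame."""
        now = time.monotonic() if t is None else t
        self.history.append((now, x, y))

    def click_position(self, t=None):
        """Position `lookback` seconds ago, or the oldest one we have."""
        if not self.history:
            return None
        now = time.monotonic() if t is None else t
        for ts, x, y in reversed(self.history):
            if now - ts >= self.lookback:
                return x, y
        ts, x, y = self.history[0]
        return x, y
```

Drawing a second cursor at `click_position()`, as suggested, would also make the behavior predictable to the user.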
        
       | Aspos wrote:
       | Some problems in life can be easily fixed with crimson red nail
       | polish.
        
         | MrMcCall wrote:
         | That made me smirk, but I am curious, "What would be the best
         | color for general webcam colored-object tracking?" I'm sure it
         | would depend on the sensor, but I wonder if one color would be
         | best for the most basic hardware.
        
       | mufasachan wrote:
        | An inspiring project. I am looking forward to seeing some
        | gloves connected to a VR device. I think some cheap sensors, a
        | bit of Bayesian modelling, and a calibration step could offer
        | proper realtime hand gesture tracking.* I am already picturing
        | being able to type on an AR keyboard. If the gloves are more
        | expensive, there might be some haptic feedback. VR devices
        | might have more open OSes in the future, or could use a
        | "streaming" platform to access remote desktop environments. I
        | am eager to see all the incoming use cases!
        | 
        | *: a lot of it. Plus, the tracking might be task-centered. I
        | would not bet on general hand gesture tracking with cheap
        | sensors and Bayesian modelling only.
        
         | hoc wrote:
          | Tap (tapwithus.com) had an IMU-based solution early in the
          | current VR hype cycle, using an IMU for each finger and some
          | kind of chord-based letter typing system. It was a fancy
          | proof of your geekiness to wear them during VR meetups back
          | then.
          | 
          | I think they have a camera-based wristband version now.
          | 
          | Still doesn't have any room positioning info though, AFAIK.
        
       | 0x20cowboy wrote:
       | This is very cool - can you do window focus based on the window I
       | am looking at next? :)
        
       | jacobsimon wrote:
        | So cool! I was just wondering the other day if it would be
        | possible to build this! For the front-facing mode, I wonder if
        | you could add a brief "calibration" step to help it learn the
        | correct scale and adjust angles, e.g. give users a few targets
        | to hit on the screen.
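A calibration step like that could be as simple as a per-axis least-squares fit from the hand positions recorded while the user hits each target to the targets' known screen coordinates. This is a sketch of the idea only (the four-corner targets and 1920x1080 screen in the usage below are assumptions, not anything from the project):

```python
def fit_axis(samples):
    """Least-squares fit screen = a * hand + b for one axis.

    samples: list of (hand_coord, screen_coord) pairs."""
    n = len(samples)
    sh = sum(h for h, s in samples)
    ss = sum(s for h, s in samples)
    shh = sum(h * h for h, s in samples)
    shs = sum(h * s for h, s in samples)
    a = (n * shs - sh * ss) / (n * shh - sh * sh)
    b = (ss - a * sh) / n
    return a, b

def calibrate(hand_pts, screen_pts):
    """Build a hand->screen mapping from calibration samples.

    hand_pts: normalized hand positions recorded while the user
    pinches on a few known on-screen targets (screen_pts)."""
    ax, bx = fit_axis([(h[0], s[0]) for h, s in zip(hand_pts, screen_pts)])
    ay, by = fit_axis([(h[1], s[1]) for h, s in zip(hand_pts, screen_pts)])

    def to_screen(hx, hy):
        return ax * hx + bx, ay * hy + by

    return to_screen
```

With four corner targets the fit recovers the scale and offset for each axis, so the usable hand range maps onto the full screen regardless of camera FOV.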
        
       | zh3 wrote:
       | Related online demo on using mediapipe for flying spaceships and
       | camera/hand interaction to grab VR cubes (2nd link for the demo).
       | There was a discussion on hackaday recently [2].
       | 
       | [0]
       | https://tympanus.net/codrops/2024/10/24/creating-a-3d-hand-c...
       | 
       | [1] https://tympanus.net/Tutorials/webcam-3D-handcontrols/
       | 
        | [2] https://hackaday.com/2024/10/25/diy-3d-hand-controller-
        | using... (DIY 3D hand controller)
        
       | hoc wrote:
       | Cool path and write-up. Thank you!
       | 
        | Just because of the use case, and because I've wanted to use
        | it in an AR app but haven't, I'd like to point to
        | doublepoint.com's totally different but great-working approach,
        | where they trained a neural network to interpret a Samsung
        | Watch's IMU data to detect taps. They also added a mouse mode.
        | 
        | I think Google's OS also allows client BT mode for the device,
        | so it can be paired directly as a HID, IIRC.
        | 
        | Not affiliated, but impressed by the funding they received :)
        
         | reynaldi wrote:
         | Wow interesting, reminded me of that Meta Orion wristband, I
         | wonder if that is the goal.
        
       | jcheng wrote:
       | Mediapipe makes hand tracking so easy and it looks SO cool. I did
       | a demo at PyData NYC a couple of years ago that let you rotate a
       | Plotly 3D plot using your hand:
       | 
       | https://youtu.be/ijRBbtT2tgc?si=2jhYLONw0nCNfs65&t=1453
       | 
       | Source: https://github.com/jcheng5/brownian
        
       ___________________________________________________________________
       (page generated 2024-11-19 23:00 UTC)