[HN Gopher] Hand Tracking for Mouse Input
___________________________________________________________________
Hand Tracking for Mouse Input
Author : wonger_
Score : 119 points
Date : 2024-11-19 17:18 UTC (5 hours ago)
(HTM) web link (chernando.com)
(TXT) w3m dump (chernando.com)
| SomeoneOnTheWeb wrote:
| Very impressive! This opens up a whole new set of uses for
| this headset.
| ancientstraits wrote:
| It unsettled me just how much work had to go into making the
| JavaScript version of this work instead of a purely Python
| version, due to how OpenCV behaves. I wonder how universal the
| laggy-OpenCV problem is, because a friend of mine hit it too
| when working on an OpenCV application. Is it so unavoidable
| that the only option is to not use Python? I really hope there
| is another way of going about this.
|
| Anyways, I am very glad that you put in all that effort to make
| the JavaScript version work well. Working under limitations is
| sometimes cool. I remember having to figure out how PyTorch
| evaluated neural networks, and having to convert the PyTorch
| neural network into Java code that could evaluate the model
| without any external libraries (it was very inefficient) for a
| Java code competition. Although there may have been a better way,
| what I did was good enough.
| kevmo314 wrote:
| Creating a faster Python implementation can definitely be
| done. OpenCV is a thin wrapper over the C++ API, so the lag is
| not due to some intrinsic Python slowness. It is not easy to
| resolve, though, and I suspect the way Python code is
| typically written lends itself to an accidental blocking
| operation more often than JS code does. It's hard to know
| without seeing the code.
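One common pattern behind this kind of lag, whatever the root cause here turns out to be, is doing blocking frame capture, inference, and UI updates on a single thread. A minimal sketch of one fix, assuming nothing about the original code: a reader thread overwrites a single-slot "latest frame" buffer, so the consumer never waits on camera I/O (the camera is faked here; in a real app `grab` would wrap `cv2.VideoCapture(0).read`):

```python
import itertools
import threading
import time

class LatestFrame:
    """Single-slot frame buffer: a reader thread overwrites the
    slot, so the consumer always sees the newest frame and never
    waits on camera I/O."""
    def __init__(self):
        self._cond = threading.Condition()
        self._frame = None

    def put(self, frame):
        with self._cond:
            self._frame = frame          # drop any stale frame
            self._cond.notify()

    def get(self):
        with self._cond:
            while self._frame is None:
                self._cond.wait()
            return self._frame

def camera_reader(buf, grab, stop):
    # In a real app, `grab` would wrap cv2.VideoCapture(0).read().
    while not stop.is_set():
        buf.put(grab())

# Demo with a fake ~100 fps camera that yields frame numbers.
counter = itertools.count()
def fake_grab():
    time.sleep(0.01)                     # simulated capture latency
    return next(counter)

buf, stop = LatestFrame(), threading.Event()
reader = threading.Thread(target=camera_reader,
                          args=(buf, fake_grab, stop), daemon=True)
reader.start()
time.sleep(0.2)                          # pretend inference is slow
latest = buf.get()
stop.set()
reader.join()
print(latest)                            # a recent frame number
```

The key detail is that `put` overwrites rather than queues, so a slow consumer skips frames instead of falling progressively further behind the camera.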
| reynaldi wrote:
| author here, sorry you have to see my janky JavaScript
| solution XD but one good thing about going with Tauri is that
| developing the UI is pretty easy, since it's basically just
| some web pages, but with access to the system through the JS
| <-> Rust communication.
|
| also, rewriting a neural network from PyTorch to Java sounds
| like a big task, I wonder if people are doing ML in Java
| xnx wrote:
| Mediapipe is a lot of fun to play with and I'm surprised how
| little it seems to be used.
|
| You might also be interested in Project Gameface, open source
| Windows and Android software for face input:
| https://github.com/google/project-gameface
|
| Also https://github.com/takeyamayuki/NonMouse
| KaoruAoiShiho wrote:
| If it's compelling enough, I don't mind setting up a
| downward-facing camera. I would like to see some more
| examples, though, where it shows an advantage over just using
| a mouse. I'm sure there are some scenarios where it does.
| liendolucas wrote:
| Very nice! The sort of thing that I expect to see on HN. Do
| you currently use it? I mean, maybe it's not perfect as a
| mouse replacement, but as a remote movie control, as shown in
| one of the last videos, it is definitely a legit use case.
| Congrats!
| reynaldi wrote:
| I'm glad it is up to the HN standard :) No, I don't currently
| use it, I am back on mouse and touchpad, but I can definitely
| see what you mean by remote movie control. I would love to
| control my movie projector with my hand.
|
| I've been thinking on and off about how to improve the forward
| facing mode. Since having the hand straight ahead of the
| camera messes with the readings, I think MediaPipe is trained
| on seeing the hand from above or below (and maybe the sides),
| but not straight ahead.
|
| Ideally, the camera should sit somewhere above the hand
| (pointing downwards) to get the best results. But in the
| current version of downward facing mode, the way to move the
| cursor is actually by moving the hand around (x and y position
| of the hand translates to x and y of the cursor). If the camera
| FOV is very big (capturing from far away), then you would have
| to move your hand very far in order to move the cursor, which
| is probably not ideal.
|
| I later got an idea for improving this while playing around
| with a smart TV, where the remote controls a cursor. You do
| that by tilting the remote up and down or left and right; I
| think it uses a gyroscope or an accelerometer (I'm not sure
| which is which). I wish I had a video of it to show it better,
| but I don't. I think it is possible to apply the same concept
| here to the hand tracking, so we use the tilt of the hand for
| controlling the cursor. This way, we don't have to rely on the
| hand position captured by the camera. Plus, this will work if
| the camera is far away, since it is only detecting the hand
| tilt. Still thinking about this.
|
| Anyway, I'm glad you find the article interesting!
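The tilt idea described above can be sketched in a few lines; the landmark choice (wrist to middle-finger knuckle) and the sign conventions are assumptions layered on top of MediaPipe's normalized (x, y, z) landmark output:

```python
import math

def hand_tilt(wrist, knuckle):
    """Pitch/yaw (radians) of the wrist -> knuckle direction.
    Inputs are (x, y, z) landmark tuples; z is assumed to grow
    away from the camera, y downward in image coordinates."""
    dx, dy, dz = (k - w for w, k in zip(wrist, knuckle))
    yaw = math.atan2(dx, -dz)     # left/right tilt
    pitch = math.atan2(-dy, -dz)  # up/down tilt
    return pitch, yaw

def cursor_velocity(pitch, yaw, gain=800.0, dead_zone=0.05):
    """Map tilt angles to cursor velocity (px/s). The dead zone
    keeps a steady hand from drifting the cursor."""
    def axis(angle):
        if abs(angle) < dead_zone:
            return 0.0
        return gain * (angle - math.copysign(dead_zone, angle))
    return axis(yaw), axis(-pitch)

# Hand pointing straight at the camera: cursor stays put.
print(cursor_velocity(*hand_tilt((0.5, 0.5, 0.0),
                                 (0.5, 0.5, -0.2))))  # (0.0, 0.0)
```

Because only the angle matters, this works regardless of how far the hand is from the camera, which is exactly the property the comment is after.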
| aranelsurion wrote:
| > Python version is super laggy, something to do with OpenCV
|
| I'm most probably wrong, but I wonder if it has anything to do
| with all the text being written to stdout. On the off chance
| that it happens on the same thread, it might be blocking.
| ikanreed wrote:
| Could it then be resolved by using the no-gil version of python
| they just released?
| mananaysiempre wrote:
| I'm not sure what your reasoning is, but note that blocking
| I/O including print() releases the GIL. (So your seemingly
| innocent debugging print can be extremely not harmless under
| the wrong circumstances.)
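If per-frame prints do turn out to be the bottleneck, the standard library already has a pattern for getting that I/O off the hot loop: the `QueueHandler`/`QueueListener` pair, where the frame loop only enqueues log records and a background thread does the blocking writes. A small sketch:

```python
import logging
import logging.handlers
import queue

# Keep the per-frame loop free of blocking I/O: the hot path
# only enqueues log records; a background listener thread does
# the actual (possibly blocking) writes to stderr.
log_q = queue.Queue()
logger = logging.getLogger("tracker")
logger.setLevel(logging.INFO)
logger.addHandler(logging.handlers.QueueHandler(log_q))

listener = logging.handlers.QueueListener(log_q,
                                          logging.StreamHandler())
listener.start()

for frame in range(3):
    logger.info("frame %d processed", frame)  # cheap: a queue put

listener.stop()                               # flushes remaining records
```

The hot path's cost drops to a thread-safe `queue.put`, no matter how slow the terminal or pipe on the other end is.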
| kelseyfrog wrote:
| It's projects like this that _really_ make me want to start on a
| virtual theremin. Wish I had the time :(
| polishdude20 wrote:
| Oh that's an awesome idea!
| jcheng wrote:
| My son did a basic version for a class project, surprisingly
| simple with MediaPipe
|
| https://s-ocheng.github.io/theremin/
|
| https://github.com/s-ocheng/theremin
| vkweb wrote:
| Man, I feel making diagrams / writing handwritten notes will be
| great with this!
| AlfredBarnes wrote:
| I did a very similar project a few months back. My goal was to
| help alleviate some of the RSI issues I have, and give myself a
| different input device.
|
| The precision was always tricky, and while it was fun, I
| eventually abandoned the project and switched to face tracking
| and blinking so I didn't have to hold up my hand.
|
| For some reason, the idea of pointing my webcam down never
| dawned on me. I then discovered Project Gameface and just
| started using that.
|
| Happy programming, and thank you for the excellent write-up!
| bottom999mottob wrote:
| I'm curious how your experience is using Gameface for day-to-
| day tasks like coding. I assume you still use a keyboard for
| typing, but what about selecting blocks of text or general
| navigation?
| omikun wrote:
| Such a cool and inspirational project! Regarding the drift on
| pinch, have you tried storing the pointer position from a
| second earlier and using that as the click position? You could
| show this
| position as a second cursor maybe? I've always wondered why Apple
| doesn't do this for their "eye moves faster than hands" issue as
| well.
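The fix proposed above can be sketched with a fixed-length position history (the delay length here is an arbitrary guess, and it is counted in frames rather than seconds):

```python
from collections import deque

class ClickAnchor:
    """Remember where the cursor was shortly before a pinch is
    detected, so the pinch gesture itself can't drag the click
    point. A sketch of the idea from the comment above."""
    def __init__(self, delay_frames=15):   # ~0.5 s at 30 fps
        self.history = deque(maxlen=delay_frames)

    def update(self, x, y):
        self.history.append((x, y))

    def click_position(self):
        # Oldest retained position: where the hand was before
        # the pinch started to form.
        return self.history[0]

anchor = ClickAnchor(delay_frames=5)
for i in range(10):            # cursor drifts right as fingers close
    anchor.update(100 + i, 200)
print(anchor.click_position())  # (105, 200): 5 frames back
```

On pinch detection you would read `click_position()` instead of the live cursor, so the finger motion that makes the pinch cannot move the click. Showing that anchored point as a second cursor, as suggested, would make the behavior predictable to the user.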
| Aspos wrote:
| Some problems in life can be easily fixed with crimson red nail
| polish.
| MrMcCall wrote:
| That made me smirk, but I am curious, "What would be the best
| color for general webcam colored-object tracking?" I'm sure it
| would depend on the sensor, but I wonder if one color would be
| best for the most basic hardware.
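For what it's worth, classic colored-object tracking segments in HSV space, where a saturated hue survives lighting changes better than any particular RGB value; that is why vivid, rarely-occurring hues (like crimson) tend to work well even on basic hardware. A toy per-pixel check (the thresholds are illustrative guesses, not tuned values):

```python
import colorsys

def is_target_color(r, g, b, target_hue=350, hue_tol=20,
                    min_sat=0.5, min_val=0.3):
    """True if an RGB pixel (0..255 per channel) falls in a hue
    window (degrees, wrapping at 360) and is saturated and bright
    enough to be the tracked object rather than background."""
    h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
    hue_deg = h * 360
    dist = min(abs(hue_deg - target_hue),
               360 - abs(hue_deg - target_hue))
    return dist <= hue_tol and s >= min_sat and v >= min_val

print(is_target_color(200, 20, 50))    # crimson pixel -> True
print(is_target_color(120, 120, 120))  # grey pixel -> False
```

In practice the best color is one that is both highly saturated and rare in the scene; which hue wins also depends on the camera's sensor and white balance, so it is worth testing a few.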
| mufasachan wrote:
| An inspiring project. I am looking forward to seeing some
| gloves connected to a VR device. I think that some cheap
| sensors, a bit of Bayesian modelling, and a calibration step
| could offer proper realtime hand gesture tracking.* I am
| already picturing being able to type on an AR keyboard. If the
| gloves are more expensive, there might be some haptic
| feedback. VR devices might have more
| open OSes in the future or could use a "streaming" platform to
| access remote desktop environments. I am eager to see all the
| incoming use cases!
|
| *: a lot of it. Plus, the tracking might be task-centered. I
| would not bet on general hand gesture tracking with cheap
| sensors and Bayesian modelling alone.
| hoc wrote:
| Tap (tapwithus.com) had an IMU-based solution early on in the
| current VR hype cycle, using an IMU for each finger and some
| kind of chord-based letter typing system. Wearing them was a
| fancy proof of your geekiness at VR meetups back then.
|
| I think they have a camera-based wristband version now.
|
| Still doesn't have any room positioning info though, AFAIK.
| 0x20cowboy wrote:
| This is very cool - can you do window focus based on the window I
| am looking at next? :)
| jacobsimon wrote:
| So cool! I was just wondering the other day if it would be
| possible to build this! For front facing mode, I wonder if you
| could add a brief "calibration" step to help it learn the correct
| scale and adjust angles, e.g. give users a few targets to hit on
| the screen
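The calibration idea could be sketched as fitting an affine map from raw hand coordinates to screen coordinates from the recorded target hits. All names and numbers below are illustrative; with exactly three non-collinear targets the map is determined exactly (solved here with Cramer's rule):

```python
def fit_affine(hand_pts, screen_pts):
    """Fit screen = a*hx + b*hy + c per axis from exactly three
    non-collinear calibration targets (Cramer's rule)."""
    (x1, y1), (x2, y2), (x3, y3) = hand_pts
    det = x1 * (y2 - y3) - y1 * (x2 - x3) + (x2 * y3 - x3 * y2)

    def solve(t1, t2, t3):
        a = (t1 * (y2 - y3) - y1 * (t2 - t3)
             + (t2 * y3 - t3 * y2)) / det
        b = (x1 * (t2 - t3) - t1 * (x2 - x3)
             + (x2 * t3 - x3 * t2)) / det
        c = (x1 * (y2 * t3 - y3 * t2) - y1 * (x2 * t3 - x3 * t2)
             + t1 * (x2 * y3 - x3 * y2)) / det
        return a, b, c

    return (solve(*(p[0] for p in screen_pts)),
            solve(*(p[1] for p in screen_pts)))

def apply_affine(coeffs, hx, hy):
    (ax, bx, cx), (ay, by, cy) = coeffs
    return ax * hx + bx * hy + cx, ay * hx + by * hy + cy

# Three targets: top-left, top-right, bottom-left of a 1920x1080
# screen, hit with the hand at normalized camera positions.
coeffs = fit_affine([(0.25, 0.25), (0.75, 0.25), (0.25, 0.75)],
                    [(0, 0), (1920, 0), (0, 1080)])
print(apply_affine(coeffs, 0.5, 0.5))  # (960.0, 540.0): center
```

With more than three targets a least-squares fit would average out measurement noise, but the three-point version already captures scale, offset, and a bit of camera skew.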
| zh3 wrote:
| Related online demo on using mediapipe for flying spaceships and
| camera/hand interaction to grab VR cubes (2nd link for the demo).
| There was a discussion on hackaday recently [2].
|
| [0]
| https://tympanus.net/codrops/2024/10/24/creating-a-3d-hand-c...
|
| [1] https://tympanus.net/Tutorials/webcam-3D-handcontrols/
|
| [2] https://hackaday.com/2024/10/25/diy-3d-hand-controller-using... (DIY 3D hand controller)
| hoc wrote:
| Cool path and write-up. Thank you!
|
| Just because of the use case (and because I've wanted to use
| it in an AR app but haven't yet), I'd like to point to
| doublepoint.com's totally different but well-working approach,
| where they trained a NN to interpret a Samsung Watch's IMU
| data to detect taps. They also added a mouse mode.
|
| I think Google's OS also allows client BT mode for the device,
| so it can be paired directly as a HID, IIRC.
|
| Not affiliated, but impressed by the funding they received :)
| reynaldi wrote:
| Wow, interesting. It reminded me of that Meta Orion wristband;
| I wonder if that is the goal.
| jcheng wrote:
| Mediapipe makes hand tracking so easy and it looks SO cool. I did
| a demo at PyData NYC a couple of years ago that let you rotate a
| Plotly 3D plot using your hand:
|
| https://youtu.be/ijRBbtT2tgc?si=2jhYLONw0nCNfs65&t=1453
|
| Source: https://github.com/jcheng5/brownian
___________________________________________________________________
(page generated 2024-11-19 23:00 UTC)