[HN Gopher] Hand Tracking for Mouse Input (2023)
___________________________________________________________________
Hand Tracking for Mouse Input (2023)
Author : wonger_
Score : 216 points
Date : 2024-11-19 17:18 UTC (1 day ago)
(HTM) web link (chernando.com)
(TXT) w3m dump (chernando.com)
| SomeoneOnTheWeb wrote:
| Very impressive! This opens up a whole new set of uses for this
| headset.
| ancientstraits wrote:
| It unsettled me a lot to see just how much work had to go into
| making the JavaScript version of this work instead of a purely
| Python version, due to how OpenCV behaves. I wonder how
| universal the laggy-OpenCV problem is, because a friend of mine
| ran into it too when working on an OpenCV application. Is it so
| unavoidable that the only option is to not use Python? I really
| hope there is another way of going about this.
|
| Anyways, I am very glad that you put in all that effort to make
| the JavaScript version work well. Working under limitations is
| sometimes cool. I remember having to figure out how PyTorch
| evaluated neural networks, and having to convert the PyTorch
| neural network into Java code that could evaluate the model
| without any external libraries (it was very inefficient) for a
| Java code competition. Although there may have been a better way,
| what I did was good enough.
| kevmo314 wrote:
| Creating a faster Python implementation can definitely be done.
| OpenCV is a thin wrapper over the C++ API, so it's not due to
| some intrinsic Python slowness. It is not easy to resolve,
| though, and I suspect the way Python code is typically written
| lends itself to an accidental blocking operation more often
| than JS code does. It's hard to know without seeing the code.
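| A minimal sketch of that kind of fix - hypothetical code, not
| the article's - is to do the blocking capture read on a
| background thread and let the processing loop always take the
| newest frame (the camera is stubbed as any callable, e.g. a
| wrapper around cv2.VideoCapture.read):

```python
import threading

class LatestFrameGrabber:
    """Continuously reads frames on a background thread and keeps
    only the newest one, so the processing loop never blocks on
    capture and stale frames are simply dropped."""
    def __init__(self, read_frame):
        self._read = read_frame          # the blocking capture call
        self._lock = threading.Lock()
        self._frame = None
        self._running = True
        self._thread = threading.Thread(target=self._loop, daemon=True)
        self._thread.start()

    def _loop(self):
        while self._running:
            frame = self._read()         # blocks off the main thread
            with self._lock:
                self._frame = frame      # overwrite: keep only latest

    def latest(self):
        with self._lock:
            return self._frame

    def stop(self):
        self._running = False
        self._thread.join()
```

| With OpenCV this would be used as something like
| LatestFrameGrabber(lambda: cap.read()[1]).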
| reynaldi wrote:
| author here, sorry you have to see my janky JavaScript solution
| XD but one good thing about going with Tauri is that developing
| the UI is pretty easy, since it's basically just some web
| pages, but with access to the system through the JS <-> Rust
| communication.
|
| also, rewriting a neural network from PyTorch to Java sounds
| like a big task, I wonder if people are doing ML in Java
| xnx wrote:
| Mediapipe is a lot of fun to play with and I'm surprised how
| little it seems to be used.
|
| You might also be interested in Project Gameface, open source
| Windows and Android software for face input:
| https://github.com/google/project-gameface
|
| Also https://github.com/takeyamayuki/NonMouse
| brcmthrowaway wrote:
| Probably because the API is written like enterprise Java
| garbage
| KaoruAoiShiho wrote:
| If it's compelling enough, I don't mind setting up a downward
| facing camera. Would like to see some more examples, though,
| where it shows some advantage over just using a mouse. I'm sure
| there are some scenarios where it does.
| liendolucas wrote:
| Very nice! The sort of thing that I expect to see on HN. Do you
| currently use it? I mean, maybe it's not perfect as a mouse
| replacement, but as a remote movie control, as shown in one of
| the last videos, it's definitely a legit use case. Congrats!
| reynaldi wrote:
| I'm glad it is up to the HN standard :) No, I don't currently
| use it, I am back on mouse and touchpad, but I can definitely
| see what you mean by remote movie control. I would love to
| control my movie projector with my hand.
|
| I've been thinking on and off about how to improve the forward
| facing mode. Since having the hand straight ahead of the camera
| messes with the readings, I think MediaPipe is trained on
| seeing the hand from above or below (and maybe the sides) but
| not straight ahead.
|
| Ideally, the camera should be kind of above the hand (pointing
| downwards) to get the best results. But in the current version
| of downward facing mode, the way to move the cursor is actually
| by moving the hand around (the x and y position of the hand
| translates to the x and y of the cursor). If the camera FOV is
| very big (capturing from far away), then you would have to move
| your hand very far in order to move the cursor, which is
| probably not ideal.
|
| I later found an idea for improving this while playing around
| with a smart TV, where the remote controls a cursor. You do
| that by tilting the remote up and down or left and right; I
| think it uses a gyroscope or accelerometer (idk which is
| which). I wish I had a video of it to show it better, but I
| don't. I think it is possible to apply the same concept here to
| the hand tracking, so we use the tilt of the hand to control
| the cursor. This way, we don't have to rely on the hand
| position captured by the camera. Plus, this will work if the
| camera is far away, since it is only detecting the hand tilt.
| Still thinking about this.
|
| Anyway, I'm glad you find the article interesting!
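| That tilt idea could be sketched roughly like this (hypothetical
| code, not from the article; the gain, dead zone, and the choice
| of landmarks are all made-up assumptions - landmarks 0 and 9 are
| MediaPipe's wrist and middle-finger MCP):

```python
import math

def tilt_to_cursor_delta(wrist, knuckle, gain=600.0, dead_zone=0.05):
    """Map hand tilt to a cursor velocity, smart-TV-remote style.

    wrist and knuckle are (x, y, z) points in normalized image
    coordinates. Only the direction of the wrist->knuckle vector
    matters, so the hand's absolute position in the frame (and the
    camera's distance) is irrelevant."""
    vx = knuckle[0] - wrist[0]
    vy = knuckle[1] - wrist[1]
    vz = knuckle[2] - wrist[2]
    # Roll: left/right lean; zero when the fingers point straight up
    # (image y grows downward, hence -vy).
    roll = math.atan2(vx, -vy)
    # Pitch: forward/back lean, from how far the vector leaves the
    # image plane.
    pitch = math.atan2(vz, math.hypot(vx, vy))
    # Dead zone keeps the cursor still when the hand is held upright.
    dx = 0.0 if abs(roll) < dead_zone else gain * roll
    dy = 0.0 if abs(pitch) < dead_zone else gain * pitch
    return dx, dy
```

| The returned (dx, dy) would be added to the cursor position each
| frame, so holding the hand upright stops the cursor.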
| DontNoodles wrote:
| I tried to implement Johnny Lee's amazing idea
| (https://www.youtube.com/watch?v=Jd3-eiid-Uw) using MediaPipe
| face tracking. I could not get very far using simple webcams,
| since it was difficult to determine the distance of the face
| from the camera when the face was turned. I had an Intel
| RealSense D415 depth camera from a different project, and it
| took care of the distance problem at least. But the jitter
| problem had me stumped for a long time and I put the project
| away. With your ideas, I have the strength to revisit it.
| Thanks!
| aranelsurion wrote:
| > Python version is super laggy, something to do with OpenCV
|
| Most probably I'm wrong, but I wonder if it has anything to do
| with all the text being written to stdout. On the off chance
| that it happens on the same thread, it might be blocking.
| ikanreed wrote:
| Could it then be resolved by using the no-GIL version of Python
| they just released?
| mananaysiempre wrote:
| I'm not sure what your reasoning is, but note that blocking
| I/O including print() releases the GIL. (So your seemingly
| innocent debugging print can be extremely not harmless under
| the wrong circumstances.)
| reynaldi wrote:
| Hmm, I can't remember whether I tried it without the text being
| written to stdout. But that's an interesting point; I just
| didn't expect print() blocking to be significant.
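| One generic way to keep debug output without risking a blocking
| print() in the frame loop (a sketch, not the project's code) is
| to push log lines onto a queue and let a background thread do
| the actual writing:

```python
import queue
import sys
import threading

log_q = queue.Queue()  # holds str messages; None signals shutdown

def _log_worker():
    # The potentially blocking stdout write happens here, off the
    # frame loop's thread.
    while True:
        msg = log_q.get()
        if msg is None:
            break
        sys.stdout.write(msg + "\n")

worker = threading.Thread(target=_log_worker, daemon=True)
worker.start()

def log(msg):
    """Called from the frame loop: only enqueues, never blocks on I/O."""
    log_q.put(msg)
```

| The frame loop calls log(...) instead of print(...), and I/O
| stalls no longer delay frame processing.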
| kelseyfrog wrote:
| It's projects like this that _really_ make me want to start on a
| virtual theremin. Wish I had the time :(
| polishdude20 wrote:
| Oh that's an awesome idea!
| jcheng wrote:
| My son did a basic version for a class project, surprisingly
| simple with MediaPipe
|
| https://s-ocheng.github.io/theremin/
|
| https://github.com/s-ocheng/theremin
| vkweb wrote:
| Man, I feel making diagrams / writing handwritten notes will be
| great with this!
| AlfredBarnes wrote:
| I did a very similar project a few months back. My goal was to
| help alleviate some of the RSI issues I have, and give myself a
| different input device.
|
| The precision was always tricky, and while fun, I eventually
| abandoned the project and switched to face tracking and
| blinking so I didn't have to hold up my hand.
|
| For some reason the idea of pointing my webcam down never
| dawned on me. I then discovered Project Gameface and just
| started using that.
|
| Happy programming! Thank you for the excellent write-up and
| read!
| bottom999mottob wrote:
| I'm curious how your experience is using Gameface for day-to-
| day tasks like coding. I assume you still use a keyboard for
| typing, but what about selecting blocks of text or general
| navigation?
| reynaldi wrote:
| Glad you enjoyed reading it! I just checked the Project
| Gameface demo [1], and it's really cool that it is accurate
| enough for drawing text; I wonder what it is tracking. Are you
| still using it?
|
| [1] https://blog.google/technology/ai/google-project-gameface/
| maeil wrote:
| Similar situation here, super interested in hearing how well
| gameface works for you. Do you use it for non-gaming as well?
|
| I've succeeded in fully replacing the keyboard (I use Talon
| voice) but find replacing the mouse tougher. Tried eyetracking
| but could never get it accurate enough not to be frustrating.
| omikun wrote:
| Such a cool and inspirational project! Regarding the drift on
| pinch, have you tried storing the pointer position from the
| last second and using that as the click position? You could
| maybe show this position as a second cursor? I've always
| wondered why Apple doesn't do this for their "eye moves faster
| than hands" issue as well.
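| That rewind idea can be sketched with a short position history
| (hypothetical code; the 0.2 s default lookback is an arbitrary
| guess):

```python
import time
from collections import deque

class ClickAnchor:
    """Remembers recent cursor positions so a click can be applied
    at the position from just before the pinch began, cancelling
    the drift caused by the pinch motion itself."""
    def __init__(self, lookback=0.2, maxlen=120):
        self.lookback = lookback             # seconds to rewind on click
        self.history = deque(maxlen=maxlen)  # (timestamp, (x, y))

    def update(self, pos, now=None):
        """Record the cursor position each frame."""
        t = now if now is not None else time.monotonic()
        self.history.append((t, pos))

    def click_position(self, now=None):
        """Newest recorded position older than the lookback window."""
        now = now if now is not None else time.monotonic()
        cutoff = now - self.lookback
        for t, pos in reversed(self.history):
            if t <= cutoff:
                return pos
        return self.history[0][1] if self.history else None
```

| The anchor position could also be drawn as the second cursor
| suggested above, so the user sees where the click will land.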
| Aspos wrote:
| Some problems in life can be easily fixed with crimson red nail
| polish.
| MrMcCall wrote:
| That made me smirk, but I am curious, "What would be the best
| color for general webcam colored-object tracking?" I'm sure it
| would depend on the sensor, but I wonder if one color would be
| best for the most basic hardware.
| 0_____0 wrote:
| Something not found in the background. If you can cleanly
| segment the image purely on color, that makes the object
| tracking very very easy if you're tracking a single object.
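| As a toy illustration of that color-keying idea (pure Python on
| a nested-list pixel grid; real code would more likely build an
| HSV mask with OpenCV's inRange): keep the pixels whose hue is
| near the target color and take their centroid:

```python
import colorsys

def track_colored_object(pixels, target_hue=0.98, tol=0.05):
    """Centroid of pixels whose hue is within tol of target_hue.

    Hues are in [0, 1); ~0.98 is a crimson red. pixels is a 2D
    grid (list of rows) of (r, g, b) tuples in 0..255. Returns
    (x, y) or None if no pixel matches."""
    xs, ys = [], []
    for y, row in enumerate(pixels):
        for x, (r, g, b) in enumerate(row):
            h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
            # Hue wraps around, so use circular distance.
            d = min(abs(h - target_hue), 1 - abs(h - target_hue))
            if d < tol and s > 0.5 and v > 0.3:  # saturated and bright
                xs.append(x)
                ys.append(y)
    if not xs:
        return None
    return (sum(xs) / len(xs), sum(ys) / len(ys))
```

| The saturation/value checks are what make a painted nail stand
| out from dull background pixels of a similar hue.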
| mufasachan wrote:
| An inspiring project. I am looking forward to seeing some
| gloves connected to a VR device. I think that some cheap
| sensors, a bit of Bayesian modelling, and a calibration step
| could offer proper realtime hand gesture tracking.* I am
| already picturing being able to type on an AR keyboard. If the
| gloves are more expensive, there might be some haptic feedback.
| VR devices might have more open OSes in the future, or could
| use a "streaming" platform to access remote desktop
| environments. I am eager to see all the incoming use cases!
|
| *: a lot of it. Plus, the tracking might be task-centered. I
| would not bet on general hand gesture tracking with cheap
| sensors and Bayesian modelling only.
| hoc wrote:
| Tap (tapwithus.com) had an IMU-based solution early in the
| current VR hype cycle, using an IMU for each finger and some
| kind of chord-based letter typing system. It was a fancy proof
| of your geekiness to wear them during VR meetups back then.
|
| I think they have a camera-based wristband version now.
|
| Still doesn't have any room positioning info though, AFAIK.
| 0x20cowboy wrote:
| This is very cool - can you do window focus based on the window I
| am looking at next? :)
| jacobsimon wrote:
| So cool! I was just wondering the other day if it would be
| possible to build this! For front facing mode, I wonder if you
| could add a brief "calibration" step to help it learn the correct
| scale and adjust angles, e.g. give users a few targets to hit on
| the screen
| reynaldi wrote:
| Hi Jacob, thanks for checking it out. Regarding the calibration
| step for front facing mode, I'm glad you brought this up. I did
| think of this, because the distance from the camera/screen to
| the hand affects the movement so much (the part where the angle
| of the hand is part of the position calculation).
|
| And you are absolutely right regarding its use for the correct
| scale. For my implementation, I actually just hardcoded the
| calibration values, based on where I want the boundaries for
| the Z axis. I got these values from the readings, so in a way
| it's like a manual calibration. :D But having calibration is
| definitely the right idea, I just didn't want to overcomplicate
| things at that time.
|
| BTW, I am a happy user of Exponent, thanks for making it! I am
| doing some courses and also peer mocks for interview prep!
| zh3 wrote:
| Related online demos on using MediaPipe for flying spaceships
| [0] and camera/hand interaction to grab VR cubes (second link
| [1] is the demo). There was a discussion on Hackaday recently
| [2].
|
| [0]
| https://tympanus.net/codrops/2024/10/24/creating-a-3d-hand-c...
|
| [1] https://tympanus.net/Tutorials/webcam-3D-handcontrols/
|
| [2] https://hackaday.com/2024/10/25/diy-3d-hand-controller-using...
|     (DIY 3D hand controller)
| hoc wrote:
| Cool path and write-up. Thank you!
|
| Just because of the use case, and me not having used it in an
| AR app while wanting to, I'd like to point to doublepoint.com's
| totally different but great-working approach, where they
| trained an NN to interpret a Samsung Watch's IMU data to detect
| taps. They also added a mouse mode.
|
| I think Google's OS also allows client BT mode for the device, so
| I think it can be paired directly as a HID, IIRC.
|
| Not affiliated, but impressed by the funding they received :)
| reynaldi wrote:
| Wow interesting, reminded me of that Meta Orion wristband, I
| wonder if that is the goal.
| jcheng wrote:
| Mediapipe makes hand tracking so easy and it looks SO cool. I did
| a demo at PyData NYC a couple of years ago that let you rotate a
| Plotly 3D plot using your hand:
|
| https://youtu.be/ijRBbtT2tgc?si=2jhYLONw0nCNfs65&t=1453
|
| Source: https://github.com/jcheng5/brownian
| notpublic wrote:
| That demo is pretty impressive!
| bogardon wrote:
| could this be the next evolution of gaming mice?
| ps8 wrote:
| Reminds me of the Leap Motion controller; now there's a version
| 2: https://leap2.ultraleap.com/downloads/leap-motion-controller...
| plasticeagle wrote:
| This is cool, but a moving average filter is pretty bad at
| removing noise - it tends to be longer than it needs to be
| because its passband is so bad. Try using an IIR filter
| instead. You don't need to calculate the coefficients
| analytically; they can just be determined empirically:
|
| out = last_out * x + input * (1 - x)
|
| where x is between zero and one. The closer to one, the more
| filtering you'll do. You can cascade these too, to make a
| higher order filter, which will work even better.
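| In code, that one-pole filter and a cascade of them might look
| like this (a generic sketch of the formula above):

```python
class OnePole:
    """First-order IIR (exponential) smoother:
    out = x * last_out + (1 - x) * input, with x in [0, 1)."""
    def __init__(self, x):
        self.x = x
        self.out = None

    def step(self, value):
        if self.out is None:
            self.out = value  # seed with the first sample
        else:
            self.out = self.x * self.out + (1 - self.x) * value
        return self.out

def cascade(filters, value):
    """Run one sample through a chain of one-pole filters for a
    steeper, higher-order response."""
    for f in filters:
        value = f.step(value)
    return value
```

| Feeding each new cursor sample through cascade([f1, f2], sample)
| gives a second-order smoother from two empirically tuned x's.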
| thefroh wrote:
| i've heard good things about using the 1 euro filter for user
| input tasks, where you're trying to effectively remove noise
| but also keep latency down.
|
| See https://gery.casiez.net/1euro/ with plenty of existing
| implementations to pick from.
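| For reference, the core of the 1 euro filter is small - roughly
| this (a sketch following the published description; the
| parameter defaults here are arbitrary starting points, not
| recommendations):

```python
import math

class OneEuroFilter:
    """1 euro filter: an exponential smoother whose cutoff rises
    with signal speed, so it smooths hard when the hand is still
    but lags little during fast moves."""
    def __init__(self, freq, min_cutoff=1.0, beta=0.007, d_cutoff=1.0):
        self.freq = freq              # sample rate, Hz
        self.min_cutoff = min_cutoff  # smoothing at rest
        self.beta = beta              # how fast cutoff grows with speed
        self.d_cutoff = d_cutoff      # cutoff for the derivative estimate
        self.x_prev = None
        self.dx_prev = 0.0

    def _alpha(self, cutoff):
        tau = 1.0 / (2 * math.pi * cutoff)
        return 1.0 / (1.0 + tau * self.freq)

    def step(self, x):
        if self.x_prev is None:
            self.x_prev = x
            return x
        # Smoothed estimate of the signal's derivative (speed).
        dx = (x - self.x_prev) * self.freq
        a_d = self._alpha(self.d_cutoff)
        dx_hat = a_d * dx + (1 - a_d) * self.dx_prev
        # Speed-dependent cutoff, then the usual exponential smoothing.
        cutoff = self.min_cutoff + self.beta * abs(dx_hat)
        a = self._alpha(cutoff)
        x_hat = a * x + (1 - a) * self.x_prev
        self.x_prev = x_hat
        self.dx_prev = dx_hat
        return x_hat
```

| With beta = 0 the adaptive part is off and this collapses to the
| plain first-order IIR from the parent comment.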
| plasticeagle wrote:
| That sounds very interesting. I've been needing a filter to
| deal with noisy A/D conversions for pots in an audio project.
| Noise on a volume control turns into noise on the output, and
| sounds horrible, but excessive filtering causes unpleasant
| latency when using the dials.
| reynaldi wrote:
| Interesting, I'd never heard of the IIR filter before. I'll
| keep it in mind as one of the options if I ever work on
| removing noise again, thanks for sharing!
| jmiskovic wrote:
| You are already using an IIR filter as part of the one-euro
| filter. The 1EUR filter is an adaptive filter that uses a
| first-order IIR, also called an exponential filter, as its
| basis. Depending on your filtering parameters you can turn off
| the adaptive part and you are left with just the IIR.
| HermanMartinus wrote:
| This is a very cool demo! Well done!
|
| One suggestion for fixing the cursor drift during finger taps:
| instead of using the hand position, use the index finger, then
| tap the middle finger to the thumb for selection. This doesn't
| change the cursor position, yet is still a comfortable and
| easy-to-parse action.
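| A sketch of that middle-finger tap (hypothetical code; indices
| 4 and 12 are MediaPipe's thumb-tip and middle-fingertip
| landmarks, and the distance thresholds are guesses - the
| hysteresis keeps the state from flickering at the boundary):

```python
import math

THUMB_TIP, MIDDLE_TIP = 4, 12  # MediaPipe hand landmark indices

class PinchDetector:
    """Detects a middle-finger-to-thumb tap from normalized
    landmark coordinates. The cursor can keep following the index
    finger, so the tap itself doesn't drag the cursor."""
    def __init__(self, close_at=0.05, open_at=0.08):
        self.close_at = close_at  # pinch starts below this distance
        self.open_at = open_at    # pinch ends above this distance
        self.pinched = False

    def update(self, landmarks):
        """landmarks: list of (x, y) pairs. Returns True exactly
        once, on the frame the pinch closes (i.e. a click)."""
        tx, ty = landmarks[THUMB_TIP]
        mx, my = landmarks[MIDDLE_TIP]
        d = math.hypot(tx - mx, ty - my)
        if not self.pinched and d < self.close_at:
            self.pinched = True
            return True
        if self.pinched and d > self.open_at:
            self.pinched = False
        return False
```

| Because open_at is larger than close_at, small jitter near the
| threshold can't fire a burst of spurious clicks.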
| reynaldi wrote:
| Thanks Herman, glad you enjoyed it! I agree with your
| suggestion; having middle finger + thumb for the tap and the
| index finger for movement would mitigate the cursor drift. The
| only reason I used index finger + thumb is so that it is like
| the Apple Vision Pro input. But it could definitely be an
| improvement.
|
| Unrelated, but shoutout to bearblog. My first blog was on
| bearblog, which made me start writing. Although I later ended
| up self-hosting my own blog.
| alana314 wrote:
| This has tons of potential in the creative technology space.
| Thanks for sharing!
| ewuhic wrote:
| A great demo, but how I wish there were a keyboard-less method
| for word input based on swipe typing, meaning I don't press
| virtual keys, I just wave my index finger in the air, and the
| vision picks up the traced path and converts it into words.
| Well, if there's something asking for even less effort, maybe
| even something that's already implemented - I am all open to
| suggestions!
| pacifi30 wrote:
| Amazing work! I have been working on robotifying an operations
| task for my company - a robot hand and a vision system that can
| complete a task on the monitor just like humans do. I have been
| toying with the OpenAI vision model to get the mouse
| coordinates, but it's slow and does not always return the
| correct coordinates (probably due to the LLM not understanding
| geometry).
|
| Anyhow, looking forward to trying your approach with MediaPipe.
| Thanks for the write-up and demo, inspirational.
___________________________________________________________________
(page generated 2024-11-20 23:01 UTC)