[HN Gopher] Numen: Voice Control for Handsfree Computing
___________________________________________________________________
Numen: Voice Control for Handsfree Computing
Author : memorable
Score : 98 points
Date : 2023-02-16 07:24 UTC (2 days ago)
(HTM) web link (numenvoice.com)
(TXT) w3m dump (numenvoice.com)
| penjelly wrote:
| interesting... I just broke my arm, so this is potentially useful
| for me. The words you use will take some getting used to, though.
| teknopaul wrote:
| This is so needed.
|
| All of big tech's voice offerings have so far required Internet
| access and are creepy. Google's is appalling in that it changes,
| so things that did work stop working.
|
| What voice needed was for humans to adjust a little to make the
| computer's job easier, e.g. "Computer" "file save" is much more
| efficient all round than sending off audio to the bork for AI to
| try to work out what it means.
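A minimal sketch of the "Computer" "file save" idea: an exact-match lookup behind a wake word, with no network round trip. The wake word, the phrases, and the key chords below are hypothetical and not taken from Numen or any other product.

```python
# Hypothetical fixed command table: spoken phrase -> key chord to send.
COMMANDS = {
    "file save": "ctrl+s",
    "file open": "ctrl+o",
}

WAKE_WORD = "computer"


def handle(words):
    """Return the key chord for an utterance, or None if it is not a command.

    `words` is the list of words an offline recognizer produced, e.g.
    ["computer", "file", "save"]. Utterances that don't start with the wake
    word, or don't match a known phrase exactly, are ignored.
    """
    if not words or words[0] != WAKE_WORD:
        return None
    phrase = " ".join(words[1:])
    return COMMANDS.get(phrase)


print(handle(["computer", "file", "save"]))  # ctrl+s
print(handle(["file", "save"]))              # None
```

Because matching is exact, anything outside the table is simply dropped rather than guessed at, which is the trade-off the comment argues for.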
| jzellis wrote:
| Dunno what the video is, but it's broken on Firefox mobile at
| least.
| pfortuny wrote:
| Doesn't work on iPad Safari either...
| pimlottc wrote:
| Broken in Safari on iPhone as well
| 58x14 wrote:
| Same here - but I bet it's the HN hug
| zerop wrote:
| Interesting. Why would someone use hands-free computing? It's
| slow; I would rather type.
| replicanteven wrote:
| TBF, you need working hands to use hands-on computers.
|
| Plus, the offline part could make a good starting point for a
| DIY personal assistant.
|
| That said, their "getting started" sounds...esoteric.
|
| >There normally isn't any output but you should be able to type
| "hey" by saying "hoof eve yank" and transcribe a sentence after
| saying "scribe". You can terminate it by pressing Ctrl+c or
| saying "troll cap".
| simplyinfinity wrote:
| Because they might be physically impaired in some way. Or have
| severe Repetitive Strain Injury (RSI).
| gramiro wrote:
| Interesting project for providing better accessibility!
|
| Reminded me a bit of that scene in Blade Runner where Deckard
| asks the computer to zoom in on a certain area and enhance the
| image :D
| noduerme wrote:
| > Deckard
|
| It does. Bloody awesome. I'm re-watching this video trying to
| understand some of the shorthand being used. There's "bang" for
| exclamation mark; "cap drum" (?) for `cd`. I can't figure out
| what words he uses to invoke `git clone` at 1:27 but it's
| incredibly futuristic. I wish my daily driver wasn't a Mac
| these days =(
| ArchieMaclean wrote:
| It looks like they have a word (or multiple) for each letter
| of the alphabet. So cd is "change drum", and git clone is
| "guest ice traps space cap look [Ctrl right - autocomplete]",
| where you can read the commands from the first letters of
| each word.
|
| Edit: the default 'phrases' are here:
| https://git.sr.ht/~geb/numen/tree/master/item/phrases
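The first-letter reading described above can be sketched as a small lookup. This is an illustration only: the word list is the handful of words quoted in this thread, the letter is assumed to be each word's first letter, and the "space" entry is a hypothetical extra. The authoritative mapping is whatever Numen's phrases files define.

```python
# Spelling words quoted in the thread; Numen's real defaults are in its phrases files.
ALPHABET_WORDS = ["change", "drum", "eve", "guest", "hoof", "ice", "traps", "yank"]

# Assumption: each word types its first letter, e.g. "hoof" -> "h".
LETTERS = {word: word[0] for word in ALPHABET_WORDS}

# A non-letter key, for illustration only.
SPECIAL = {"space": " "}


def spell(utterance):
    """Translate a sequence of spoken spelling words into the text they would type."""
    out = []
    for word in utterance.split():
        if word in LETTERS:
            out.append(LETTERS[word])
        elif word in SPECIAL:
            out.append(SPECIAL[word])
        else:
            raise ValueError(f"not a spelling word: {word!r}")
    return "".join(out)


print(spell("hoof eve yank"))                # hey
print(spell("change drum"))                  # cd
print(repr(spell("guest ice traps space")))  # 'git '
```

Short, dissimilar words are used instead of the spelled-out letters themselves because single letter names ("bee", "dee", "ee") are easy for a recognizer to confuse.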
| simultsop wrote:
| I find it counterintuitive, like we have to memorize new
| constants that a programmer defines... But having this with
| English words like "next line" or "page down" or "page up"
| would be a game changer.
| ArchieMaclean wrote:
| I like this a lot. This is built upon Vosk [0], open source voice
| recognition. I must try it for some of my own projects!
|
| [0] https://alphacephei.com/vosk/
| CodexArcana wrote:
| I'm only interested if you have to activate it by saying
| "Hello... Numen..." à la Seinfeld.
| nathias wrote:
| This looks much better than any voice control I've seen so far.
| I wonder if it requires tiles or if you can integrate it with
| other tiling managers.
| cube2222 wrote:
| It's worth mentioning Talon[0] here, which is a system for
| offline voice control as well, with great python-based scripting
| (and also supports eye tracking, though I haven't used it
| myself).
|
| Using your computer or programming with it works like a charm,
| with some interesting and impressive projects based on it coming
| out as well, like Cursorless[1].
|
| There's a great Strange Loop talk[2] demonstrating Talon and the
| actual state of voice coding, which is how I discovered it (hint:
| it's much better than you'd expect, and straightforward to learn
| at that).
|
| [0]: https://talonvoice.com/
|
| [1]: https://github.com/cursorless-dev/cursorless
|
| [2]: https://youtu.be/YKuRkGkf5HU
|
| Disclaimer: not affiliated, just a happy occasional user
| 2Gkashmiri wrote:
| I can go back to Windows 7, which had "Speech Recognition".
| Before that, in the XP days, I dabbled with offline Dragon and
| stuff.
|
| Point is, I've been bugged by this problem.
|
| "I need dictation software that reads back to me what it
| understood and typed." ALL the software assumes you are looking
| at the screen, like Windows 7 (scratch that), and I don't want
| that.
|
| Let me say "I was walking and running besides the train."
| <pause> The response would be "I was walking and besides The
| train.", so I would say "scratch that." Then I would repeat it,
| or ask for help and so on.
|
| Why doesn't such a system exist?
|
| Think of it as a person doing the typing. You dictate a line,
| they read back what you said: okay, next. Otherwise, you fix it
| like this.
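The read-back loop being asked for can be sketched independently of any speech engine. In this minimal version, assuming the recognizer has already produced one string per utterance, every accepted sentence is echoed back and "scratch that" removes the last one. The read-back strings stand in for a text-to-speech call, which is not shown.

```python
def dictate(utterances):
    """Build a document from spoken sentences, reading each one back.

    Saying "scratch that" removes the most recently accepted sentence, so the
    user can repeat it. Returns the final text and the read-back log (the
    strings a text-to-speech engine would speak; hypothetical here).
    """
    sentences = []
    readback = []
    for said in utterances:
        if said.strip().lower() == "scratch that":
            if sentences:
                removed = sentences.pop()
                readback.append(f"Removed: {removed}")
            else:
                readback.append("Nothing to remove.")
        else:
            sentences.append(said)
            readback.append(f"I heard: {said}")
    return " ".join(sentences), readback


text, log = dictate([
    "I was walking and besides The train.",  # the misrecognized attempt
    "scratch that",
    "I was walking and running besides the train.",
])
print(text)  # I was walking and running besides the train.
for line in log:
    print(line)
```

Treating "scratch that" as a command only when it is the whole utterance avoids deleting text when the phrase appears inside an ordinary sentence.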
| pcdoodle wrote:
| It seems SAPI might be removed from the latest versions of
| Windows. It was pretty simple to use in VB6, either in pure
| dictation mode, or you could even load a dictionary of listen
| words for even fewer false positives. Is anyone aware of a
| replacement for offline dictation or dictionary-based
| recognition?
| comfypotato wrote:
| I was hoping for a comparison to Talon. Talon is incredible. I'm
| particularly interested to see whether any project emerges that
| focuses on augmenting the keyboard rather than replacing it in a
| programming context.
| rom-antics wrote:
| You might be interested in Cursorless's experimental keyboard
| mode: https://www.cursorless.org/docs/user/experimental/keyboard/m...
| orbisvicis wrote:
| The Talon demonstration from the last link was inspiring, but
| it works in the exact opposite way from what I would have
| imagined. The code-development examples are command-based, with
| a command to enter phrase mode. I'd have expected that, with
| technology such as tree-sitter and IntelliJ, parsing the syntax
| tree of the current programming language for completions would
| let development happen entirely in phrase mode, with only a few
| commands for handling unknown inputs such as new variable names.
|
| I'm curious if anyone has ever tried implementing the latter,
| or compared the two approaches. I'm sure there would be many
| obstacles I haven't considered.
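One ingredient of the syntax-aware approach is pulling the identifiers out of the current buffer so that spoken words can be matched against them. A minimal sketch, using Python's standard `ast` module as a stand-in for tree-sitter or an IDE's index; the underscore-joining rule in `resolve` is a hypothetical convention, not something any of the tools mentioned is known to do.

```python
import ast


def names_in_buffer(source):
    """Collect identifiers from a Python buffer as a candidate recognition vocabulary.

    Walks the whole module and returns function/class names, argument names,
    and every plain Name node (assigned or loaded, including builtins such as
    `sum`), sorted. Real scoping and cursor position are ignored in this sketch.
    """
    tree = ast.parse(source)
    names = set()
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            names.add(node.name)
        elif isinstance(node, ast.arg):
            names.add(node.arg)
        elif isinstance(node, ast.Name):
            names.add(node.id)
    return sorted(names)


def resolve(spoken, vocabulary):
    """Map a spoken phrase like "tax rate" to a known snake_case identifier, if any."""
    candidate = "_".join(spoken.lower().split())
    return candidate if candidate in vocabulary else None


buffer = '''
def total_price(items, tax_rate):
    subtotal = sum(item.price for item in items)
    return subtotal * (1 + tax_rate)
'''
vocab = names_in_buffer(buffer)
print(vocab)  # ['item', 'items', 'subtotal', 'sum', 'tax_rate', 'total_price']
print(resolve("tax rate", vocab))  # tax_rate
```

Attribute names such as `price` are not `Name` nodes, so they are missed here; a fuller version would also have to handle attributes, imports, and other naming conventions.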
| lunixbochs wrote:
| Fixed commands are fast, precise, and predictable.
|
| Assuming you mean speaking in natural language, that's slower
| to say, and likely less precise and predictable if you want
| to be able to just say "anything" and have a result.
|
| You need a command system either way. If you want to express
| some precise intention, you need to understand what the
| command system will do.
|
| There is a combined "mixed mode" system I've been testing in
| the talon beta where you can use both phrases and commands
| without switching modes.
| unshavedyak wrote:
| Wow, eye tracking is not something I'd thought of... and now I
| want it.
|
| I wonder if we could replace the mouse with eye tracking? I
| wouldn't expect it to be accurate enough, though, given the
| micro-movements that eyes make... and erratic movements in
| general... but I'd love to be wrong.
| orbisvicis wrote:
| Eye tracking is useful if you can, or want to, sit in front of
| a desk. I'm concerned at the lack of diversity in eye-tracking
| manufacturers. Tobii is the only commercial brand I'm aware
| of, or that Talon supports, and its initial setup requires
| Windows (I don't know if recalibration also requires
| Windows).
|
| I haven't used eye tracking but I'd imagine that commands
| could be given in the short time that an on-screen element is
| focused... and the rest of the time the cursor jumps
| erratically.
| russellbeattie wrote:
| I've been researching eye tracking for my own project for the
| past year. I have a Tobii eye tracker which is probably the
| best eye tracking device for consumers currently (or the only
| one really). It's much more accurate than trying to repurpose
| a webcam.
|
| So the problem with eye tracking is what's called the "Midas
| touch" problem. Everything you look at is potentially a
| target. If you were to simply connect your mouse pointer to
| your gaze, for example, any sort of hover effect on a web
| page would be activated simply by glancing at it. [1]
|
| Additionally, our eyes are constantly making small movements
| called saccades [2]. If you track eye movement perfectly, the
| target will wobble all over the screen like mad. The ways to
| alleviate this are to expand the target visually so that the
| small movements are contained within a "bubble", or to delay
| the target slightly so the movements can be smoothed out,
| which naturally causes inaccuracy and latency. [3] There are
| efforts to predict the eyes' movements to give the user the
| impression of lower latency, but it's an imperfect solution.
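Both mitigations can be sketched in a few lines: a dead zone (the "bubble") that ignores small movements, and an exponential moving average that follows larger moves only partially. This is a simplified illustration, not Tobii's or Talon's filter; the radius and smoothing factor are arbitrary values that would need tuning per device.

```python
import math


class GazeSmoother:
    """Smooth raw gaze samples before they drive a pointer.

    - Dead zone: samples within `radius` pixels of the current point are
      treated as jitter and don't move it.
    - Exponential moving average: larger moves are followed only by a fraction
      `alpha` per sample, which smooths the path but adds latency.
    """

    def __init__(self, radius=30.0, alpha=0.3):
        self.radius = radius
        self.alpha = alpha
        self.point = None

    def update(self, x, y):
        if self.point is None:
            self.point = (x, y)
            return self.point
        px, py = self.point
        if math.hypot(x - px, y - py) <= self.radius:
            return self.point  # small movement: stay put
        self.point = (px + self.alpha * (x - px), py + self.alpha * (y - py))
        return self.point


s = GazeSmoother()
print(s.update(100, 100))  # (100, 100)
print(s.update(110, 95))   # (100, 100): inside the bubble
print(s.update(300, 100))  # moves only part of the way toward x=300
```

The two costs named in the comment show up directly: the pointer can stop up to `radius` pixels short of where the eyes are (inaccuracy), and a lower `alpha` means more samples before it arrives (latency).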
|
| Another issue is gaze activation. Computers can't read our
| minds, so systems that require one to stare fixedly at an
| object in order to activate an interface are common. The
| problem with this is both the delay and the effort required.
| You can easily get a headache from the effort of trying to
| fixate your eyes on a target. Eye tracking in VR and AR has
| similar problems.
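Stare-to-activate ("dwell") selection, as described above, reduces to a timer that resets whenever the gaze leaves the target. A minimal sketch, assuming some separate layer has already hit-tested each gaze sample to a target identifier; the 800 ms threshold is an arbitrary example, and that threshold is exactly the delay the comment complains about.

```python
class DwellClicker:
    """Activate a target once the gaze has stayed on it for `dwell_ms`.

    Looking away resets the timer, and a target fires only once per
    continuous fixation. Targets are plain identifiers; mapping gaze points to
    UI elements is assumed to happen elsewhere.
    """

    def __init__(self, dwell_ms=800):
        self.dwell_ms = dwell_ms
        self.target = None
        self.since = 0
        self.fired = False

    def update(self, target, t_ms):
        """Feed the target under the gaze at time t_ms; return a target to activate, or None."""
        if target != self.target:
            self.target = target
            self.since = t_ms
            self.fired = False
            return None
        if target is not None and not self.fired and t_ms - self.since >= self.dwell_ms:
            self.fired = True
            return target
        return None


clicker = DwellClicker(dwell_ms=800)
samples = [("ok", 0), ("ok", 400), ("ok", 800), ("ok", 1200), ("cancel", 1300)]
print([clicker.update(target, t) for target, t in samples])
# [None, None, 'ok', None, None]
```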
|
| There are other forms of activation - if you open your
| iPhone's accessibility menu in the settings, you'll see a
| bunch of options including head nods, facial gestures, eye
| blinks and more. [4]
|
| The future of eye tracking is definitely multimodal. A
| specific gaze target combined with a gesture or hotword is
| the way humans naturally interact with other humans (you look
| at a person, get confirmation through eye contact or a nod,
| and then speak or gesture.) What's amazing is the amount of
| redundant effort being made in this area. Some of this stuff
| has been known for a decade or more. There are tons of
| research papers and thousands of patents to explore that
| cover the topic in great detail. There is very little that
| hasn't already been solved.
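The "look, then speak" pattern can be sketched as: keep a time-stamped log of what the gaze was on, and when a hotword is recognized, apply it to whatever was being looked at when the word was spoken. The gaze log format, the action names, and the availability of a word start time from the recognizer are all assumptions for illustration.

```python
import bisect


def target_at(gaze_log, t_ms):
    """Return the element under the gaze at time t_ms, or None before the first sample.

    gaze_log is a list of (timestamp_ms, target) pairs sorted by time, as a
    hit-testing layer might produce them (assumed, not shown).
    """
    times = [t for t, _ in gaze_log]
    i = bisect.bisect_right(times, t_ms) - 1
    return gaze_log[i][1] if i >= 0 else None


def on_hotword(word, t_ms, gaze_log, actions):
    """Apply the action named by a hotword to the gaze target at the time it was spoken."""
    target = target_at(gaze_log, t_ms)
    action = actions.get(word)
    if target is None or action is None:
        return None
    return action(target)


gaze = [(0, "search box"), (250, "submit button"), (900, "sidebar")]
actions = {
    "click": lambda el: f"clicked {el}",
    "focus": lambda el: f"focused {el}",
}
print(on_hotword("click", 600, gaze, actions))  # clicked submit button
```

Looking up the target at the time the word was spoken, rather than when recognition finishes, matters because the eyes have often moved on by the time the result arrives.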
|
| 1. https://uxdesign.cc/the-midas-touch-effect-the-most-unknown-...
|
| 2. https://en.m.wikipedia.org/wiki/Saccade
|
| 3. https://help.tobii.com/hc/en-us/articles/210245345-How-to-se...
|
| 4. https://support.apple.com/accessibility
| lunixbochs wrote:
| Talon's eye tracking functions as a mouse replacement. Is
| there a specific demo you'd like to see? I can record one.
| 58x14 wrote:
| That Strange Loop talk inspired me to explore a lot of things,
| including my methodology for writing command phrases that are
| phonetically distinct and succinct.
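A crude first check for distinct command phrases is to flag pairs of words whose spellings are only one edit apart. Spelling distance is only a proxy for how similar words sound (a phoneme-level comparison would be better), and the threshold of 2 is an arbitrary example.

```python
from itertools import combinations


def edit_distance(a, b):
    """Levenshtein distance between two strings (insert, delete, substitute each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def too_similar(words, min_distance=2):
    """List pairs of command words whose spellings differ by fewer than min_distance edits."""
    return [(a, b) for a, b in combinations(words, 2)
            if edit_distance(a, b) < min_distance]


print(too_similar(["hoof", "roof", "eve", "yank", "drum"]))  # [('hoof', 'roof')]
```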
|
| Glad to hear Talon is still around! Their Slack has grown and
| they really seem like they have a product now.
| theusus wrote:
| I tried this and the speech recognition is really poor.
| lunixbochs wrote:
| The Talon model is fairly accurate, but it can be confusing
| for new users to use the command system correctly. I posted a
| sibling reply about this, but the most common reason for
| Talon users to complain about the recognition is that they
| are in the strict "command mode" and say things that aren't
| actually commands.
|
| If you encounter what feels like poor recognition in Talon, I
| recommend enabling Save Recordings and zipping+sharing some
| examples on the Slack and asking for advice.
|
| The current command set is definitely harder to learn than a
| system designed for chat/email where "what you say is what
| you get", but it's much more powerful for tasks like
| programming once you learn it.
|
| I'm dubious about what kind of general command accuracy Numen
| is able to get with the Vosk models, as Vosk to my
| understanding is more designed for natural language than
| commands.
| yewenjie wrote:
| Last time I checked Talon's models were very bad at recognizing
| my voice. Does it support better models now, for example
| OpenAI's Whisper?
| caternoster wrote:
| The creator of Talon has tested the Whisper models
| extensively[0].
|
| [0]:
| https://twitter.com/lunixbochs/status/1574848899897884672
| orbisvicis wrote:
| I don't know what type of speech each dataset represents,
| but the Talon results are extremely impressive... I assume
| it wasn't trained on at least some subset (depending on the
| train/test split) of this data?
| lunixbochs wrote:
| A handful of the datasets I tested are fully held out (I
| have reason to believe none of the models have trained on
| them), and talon was trained on none of the dev or test
| data of any of the datasets in question.
|
| Due to Whisper's weakly supervised training on a large
| amount of automatically scraped data and reliance on a
| bigger language model, it's far more likely Whisper had
| seen some of the test data before.
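Comparisons of speech models across datasets like these are commonly reported as word error rate (WER). The linked results aren't reproduced here, so treat the metric as an assumption. A minimal WER sketch: the word-level edit distance between reference and hypothesis, divided by the number of reference words.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Word-level Levenshtein distance, computed one DP row at a time.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (r != h)))
        prev = cur
    return prev[-1] / len(ref)


print(word_error_rate("i was walking beside the train",
                      "i was walking besides the train"))  # one substitution in six words
```

Published scoring usually normalizes case, punctuation, and numbers before comparing, which this sketch does not do.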
| lunixbochs wrote:
| Depending on when that was: in 2018 the free model was the
| macOS speech engine, in 2019 it was a fast but relatively
| weak model, and as of late 2021 it's a much stronger model.
| I'm currently working on the next model series with a lot
| more resources than I had before.
|
| It's also worth saying that if you only tried things out
| briefly, there are a handful of reasons recognition may have
| seemed worse. Talon uses a strict command system by default,
| because that improves precision and speed for trained users,
| but the tradeoff there is it's more confusing for people who
| haven't learned it yet.
|
| For example, Talon isn't in "dictation mode" by default, so
| you need to switch to that if you're trying to write email-
| like text and don't want to prefix your phrases with a
| command like "say".
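The difference between the two modes can be sketched as follows: in a strict command mode, an utterance either matches a known command or starts with "say", and anything else is rejected rather than typed; in dictation mode, everything is typed. This is a simplified illustration, not Talon's implementation, and the command names are hypothetical.

```python
# Hypothetical commands: spoken phrase -> key to press.
COMMANDS = {"go up": "up", "go down": "down", "new line": "enter"}


def interpret(utterance, mode="command"):
    """Return (action, payload) for an utterance under the given mode."""
    if mode == "dictation":
        return ("type", utterance)
    words = utterance.split()
    if words and words[0] == "say":
        return ("type", " ".join(words[1:]))
    if utterance in COMMANDS:
        return ("key", COMMANDS[utterance])
    return ("reject", utterance)  # not a command: nothing happens


print(interpret("go up"))                           # ('key', 'up')
print(interpret("hello there"))                     # ('reject', 'hello there')
print(interpret("say hello there"))                 # ('type', 'hello there')
print(interpret("hello there", mode="dictation"))   # ('type', 'hello there')
```

The rejected case is what a new user experiences as "poor recognition": the words may have been heard correctly, but they weren't a command.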
|
| The timeout system may also be confusing at first. When you
| pause, Talon assumes you were done speaking and tries to run
| whatever you said. You can mitigate this by speaking faster
| or increasing the timeout.
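The pause-based timeout can be sketched as splitting a stream of time-stamped words wherever the gap exceeds a threshold; each resulting group is what would be run as one command. Real systems typically detect silence in the audio rather than in word timestamps, so the input format here is an assumption.

```python
def split_utterances(words, timeout_ms=300):
    """Group (timestamp_ms, word) pairs into utterances, ending one after a long pause."""
    utterances = []
    current = []
    last_t = None
    for t, word in words:
        if last_t is not None and t - last_t > timeout_ms:
            utterances.append(" ".join(current))  # pause: the previous utterance is done
            current = []
        current.append(word)
        last_t = t
    if current:
        utterances.append(" ".join(current))
    return utterances


stream = [(0, "go"), (200, "up"), (900, "go"), (1100, "down")]
print(split_utterances(stream))                   # ['go up', 'go down']
print(split_utterances(stream, timeout_ms=1000))  # ['go up go down']
```

A longer timeout tolerates slower speech but delays every command and can merge separate commands into one utterance, as in the second call.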
|
| The default commands (like the alphabet) may also just not be
| very good for some accents, and that will be the case for any
| speech engine - you will likely need to change some commands
| if they're hard to enunciate in your accent.
|
| I recommend joining the Slack [1] and asking there if you
| want more specific feedback. I definitely want to support
| many accents and even have some users testing Talon with
| other spoken languages.
|
| [1] https://talonvoice.com/chat
| Xevi wrote:
| Impressive, I'm looking forward to seeing more of this project.
| Did you draw inspiration from Talon? There are a lot of
| similarities when it comes to the voice commands.
___________________________________________________________________
(page generated 2023-02-18 23:01 UTC)