[HN Gopher] Launch HN: Aqua Voice (YC W24) - Voice-driven text e...
       ___________________________________________________________________
        
       Launch HN: Aqua Voice (YC W24) - Voice-driven text editor
        
       Hey HN! We're Jack and Finn from Aqua Voice
       (https://withaqua.com/). Aqua is a voice-native document editor
       that combines reliable dictation and natural language commands,
       letting you say things like: "make this a list" or "it's Erin with
       an E" or "add an inline citation here for page 86 of this book".
       Here is a demo: https://youtu.be/qwSAKg1YafM.

       Finn, who is big-time dyslexic, has been using dictation software
       since the sixth grade when his dad set him up on Dragon Dictation.
       He used it through school to write papers, and has been keeping
       his own transcription benchmarks since college. All that time,
       writing with your voice has remained a cumbersome and brittle
       experience that is riddled with pain points.

       Dictation software is still terrible. All the solutions basically
       compete on accuracy (i.e. speech recognition), but none of them
       deal with the fundamentally brittle nature of the text that they
       generate. They don't try to format text correctly and require you
       to learn a bunch of specialized commands, which often are not
       worth it. They're not even close to a voice replacement for a
       keyboard.

       Even post LLM, you are limited to a set of specific commands and
       the most accurate models don't have any commands. Outside of these
       rules, the models have no sense for what is an instruction and
       what is content. You can't say "and format this like an email" or
       "make the last bullet point shorter". Aqua solves this.

       This problem is important to Finn and millions of other people who
       would write with their voice if they could. Initially, we didn't
       think of it as a startup project. It was just something we wanted
       for ourselves. We thought maybe we'd write a novel with it - or
       something. After friends started asking to use the early versions
       of Aqua, it occurred to us that, if we didn't build it, maybe
       nobody would.

       Aqua Voice is a text editor that you talk to like a person.
       Depending on the way that you say it and the context in which
       you're operating, Aqua decides whether to transcribe what you said
       verbatim, execute a command, or subtly modify what you said into
       what you meant to write.

       For example, if you were to dictate: "Gryphons have classic forms
       resembling shield volcanoes," Aqua would output your text
       verbatim. But if you stumble over your words or start a sentence
       over a few times, Aqua is smart enough to figure that out and to
       only take the last version of the sentence.

       The vision is not only to provide a more natural dictation
       experience, but to enable for the first time an AI-writing
       experience that feels natural and collaborative. This requires
       moving away from using LLMs for one-off chat requests and towards
       something that is more like streaming where you are in constant
       contact with the model. Voice is the natural medium for this.

       Aqua is actually 6 models working together to transcribe,
       interpret, and rewrite the document according to your intent.
       Technically, executing a real-time voice application with a
       language model at its core requires complex coordination between
       multiple pieces. We use MoE transcription to outperform what was
       previously thought possible in terms of real-time accuracy. Then
       we sync up with a language model to determine what should be on
       the screen as quickly as possible.

       The model isn't perfect, but it is ready for early adopters and
       we've already been getting feedback from grateful users. For
       example, a historian with carpal tunnel sent us an email he wrote
       using Aqua and said that he is now able to be five times as
       productive as he was previously. We've heard from other people
       with disabilities that prevent them from typing. We've also seen
       good adoption from people who are dyslexic or simply prefer
       talking to typing. It's being used for everything from emails to
       brainstorming to papers to legal briefings.

       While there is much left to do in terms of latency and robustness,
       the best experiences with Aqua are beginning to feel magical. We
       would love for you to try it out and give us feedback, which you
       can do with no account on https://withaqua.com. If you find it
       useful, it's $10/month after a 1000-token free trial. (We want to
       bump the free trial in the future, but we're a small team, and
       running this thing isn't cheap.)

       We'd love to hear your ideas and comments with voice-to-text!
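
The post describes keeping only the last version of a sentence when the speaker restarts it. As a rough illustration of that behavior only, here is a toy, rule-based sketch. It is not how Aqua works (the post says Aqua uses several models for this); the function name `keep_last_restart` and the `anchor_len` parameter are hypothetical, and the heuristic simply looks for the last place where the opening words of the transcript are repeated.

```python
def keep_last_restart(transcript: str, anchor_len: int = 2) -> str:
    """Return the text starting at the last repetition of the opening words.

    Toy heuristic (an assumption for illustration, not Aqua's method):
    if the first `anchor_len` words never recur, the transcript is
    returned unchanged, which corresponds to the "verbatim" case.
    """
    words = transcript.split()
    if len(words) <= anchor_len:
        return " ".join(words)

    # Compare case-insensitively and ignore punctuation attached to words.
    norm = [w.lower().strip(".,!?;:") for w in words]
    anchor = norm[:anchor_len]

    # Scan for later occurrences of the opening words; remember the last one.
    last_start = 0
    for i in range(1, len(norm) - anchor_len + 1):
        if norm[i:i + anchor_len] == anchor:
            last_start = i

    return " ".join(words[last_start:])


clean = "Gryphons have classic forms resembling shield volcanoes"
restarted = "the meeting is the meeting will be the meeting is on Friday"

print(keep_last_restart(clean))
print(keep_last_restart(restarted))
```

A heuristic like this breaks as soon as a sentence legitimately repeats its opening words, which is one reason a model that uses context is a better fit for the problem than fixed rules.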
        
       Author : the_king
       Score  : 310 points
       Date   : 2024-03-26 14:53 UTC (8 hours ago)
        
       | jasonjmcghee wrote:
       | What are your opinions on
       | https://github.com/cursorless-dev/cursorless?
       | 
       | Are you targeting developers?
       | 
       | My understanding was people who are serious about developing via
       | voice use it pretty exclusively.
       | 
       | Like, yeah you need to learn commands, but "are often not worth
       | it" feels like brushing a pretty massive offering under the rug.
       | 
       | Is learning vi / emacs commands not worth it (or shortcuts in
       | another IDE?)
       | 
       | Is there a middle ground?
        
         | the_king wrote:
         | Cursorless is really cool, but we see the ideal computer-voice
         | interaction a little differently.
         | 
         | Our approach is based around understanding intent from speech
         | alone. We think this will be the ideal division of labor
         | between man and machine going forward - let the person think
         | and the machine fit it into the document/file/text. Over time
         | we think this will reduce the number of commands you have to
         | learn to use it to zero.
         | 
         | But our "command-less" approach isn't reliable for every use
         | case yet - and as a fan of voice interfaces I am rooting for
         | Cursorless - it's super sci-fi.
        
       | tomberin wrote:
       | I was impressed with the demo and ready to pay $10, but there's
       | no option to sign up with email :(
        
         | jmcintire1 wrote:
         | Made some tradeoffs for the sake of speed -- email signup will
         | come. We want it too!
        
           | freedomben wrote:
           | Good to hear! I very much dislike using my Google account and
           | other third parties to sign up for accounts.
           | 
           | Do you have any idea of how soon? Not looking for a public
           | commitment to hang you with, just wondering if this is one of
           | those "we're working on it now" (so days) or one of those
           | "it's in the backlog" (months or maybe never depending on
           | priorities).
        
       | iknownthing wrote:
       | Are these your models or a wrapper around model apis?
        
         | the_king wrote:
         | We use our own fusion model in the transcription pipeline for
         | intent understanding from encoded audio, but most of the
         | rewriting tasks like "Turn this into a list" call out to fine-
         | tunes of GPT-4. It's a combination.
         | 
         | The fusion model is similar to the architecture described here:
         | https://arxiv.org/abs/2310.13289
        
       | elektor wrote:
       | Trying out the app on Firefox gets me this error:
       | 
       | NotSupportedError:
       | 
       | AudioContext.createMediaStreamSource:
       | 
       | Connecting AudioNodes from AudioContexts with different sample-
       | rate is currently not supported.
       | 
       | I would add that this really needs to be a native app with
       | ability to use it within Microsoft Word, which itself has a
       | decent voice to text tool built in.
        
         | jmcintire1 wrote:
         | Sorry about the Firefox error! Agreed on the sentiment behind
         | native app -- we plan to get Aqua in as many places as possible
         | asap. For product iteration, you can't beat the speed the
         | browser affords.
        
           | btown wrote:
           | Make an Electron app that simply wraps your website! Just
           | build in best-practices updating of the wrapper as well from
           | day one, in case you want to ship improvements to the wrapper
           | or start to move more things to client side processing.
           | 
           | As a side benefit, you get real estate in people's docks and
           | desktops :)
        
             | freedomben wrote:
             | Would that help with the problem of integration though?
             | What would be absolutely killer would be to emulate a USB
             | HID keyboard or something, which would make it usable with
             | pretty much everything, though there are definitely some
             | security considerations there. Or if there are higher-level
             | APIs to hook into that could work, but I would guess those
             | would also require native function calls.
             | 
             | The way Google's keyboard works on Android, but on my Linux
             | computer (and my Android phone) would be my dream here. I'd
             | pay $10 a month for that for sure.
        
       | 35mm wrote:
       | I tried the demo, it worked well, allowing me to add a line and
       | then delete the first line - a test that Dragon or Apple would
       | have failed.
       | 
       | What does the actual app look like though? Is it only in a
       | browser or can I use this anywhere on my Mac?
        
       | apinstein wrote:
       | This is really great. I imagined such a thing should be created,
       | amazing to see it in reality. It would be great for those of us
       | not limited to exclusively voice to be able to use commands as
       | well, as I still think in some cases doing explicitly what I want
       | for simple things is easier than figuring out how to explain it
       | :)
        
         | the_king wrote:
         | We agree totally; voice only can be ridiculous, for example, if
         | you're spelling out a username or something.
         | 
         | The sandbox doesn't have typing, but the full app does - you
         | can switch between typing and talking seamlessly there.
         | 
         | (written with Aqua)
        
       | nylonstrung wrote:
       | This is very cool. I would immediately buy it if someone ends up
       | making an Obsidian plugin
        
         | tremarley wrote:
         | This would be very effective
        
       | justanotheratom wrote:
       | This is awesome.
       | 
       | Video talks about a Mac App. Where can I get that?
       | 
       | Voice input did not work on Edge browser on Windows, btw.
        
         | the_king wrote:
         | Thanks!
         | 
         | We had to make a bunch of breaking API changes over the last
         | week and the Mac app isn't ready to go on it quite yet, but
         | we'll bring it back as soon as we can, max two weeks, hopefully
         | sooner.
        
       | parentheses wrote:
       | This is very well done!!
        
       | oliviabenson wrote:
       | You're early and this is effectively a demo but just in case this
       | is a blind spot: "token" is an in-the-weeds LLMism that means
       | nothing in the context of transcription. Your costs may be
       | measured in tokens but that's not relevant to customers. Just "A
       | free trial" with no quantifier would be better than 1k tokens.
        
         | the_king wrote:
         | Appreciate the feedback, we'll take a look at that.
        
         | agotterer wrote:
         | This is a great point and a topic I've been thinking about
         | myself. As more LLM services pop up that are subject to
         | token/consumption pricing, what is the right pricing model for
         | consumer based consumption products like this?
        
           | oliviabenson wrote:
           | Price based on value. Pricing is hard, something as simple as
           | per-token is alluring because it doesn't require any thought
           | but it's leaving a lot of money on the table. There's nothing
           | unique about LLMs when it comes to pricing, all common
           | pricing wisdom applies.
        
             | agotterer wrote:
             | That seems challenging to do with a writing/note taking app
             | like this. First, what would the pricing tiers be based on?
             | Word count? That would just be another way of saying token.
             | Number of documents created? That puts you at risk of long
             | unprofitable documents. Google Sheets doesn't really have
             | this problem because the incremental cost of storage is
             | relatively cheap. Tokens on the other hand are not cheap.
             | 
             | How do you price based on value without a corollary to
             | tokens? If you charged $40 for this service then maybe you
             | don't provide enough value for the casual user who does the
             | occasional school report. On the other hand you may be
             | unprofitable for the doctor that decides to dictate all of
             | her interactions every day or the author who dictates an
             | entire book.
        
           | nprateem wrote:
           | Words. Just estimate how many tokens that'd be and talk in
           | words, paragraphs, etc instead.
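
A minimal sketch of the suggestion above: convert a token allowance into an approximate word count before showing it to users. The ratio of 0.75 English words per token is a common rule of thumb and an assumption here, not a figure from the thread; it also assumes the trial's tokens are text tokens. The names `tokens_to_words` and `describe_trial` are hypothetical.

```python
import math

# Assumption: roughly 0.75 English words per token. Real ratios vary by
# tokenizer, language, and the kind of text being written.
WORDS_PER_TOKEN = 0.75


def tokens_to_words(tokens: int, words_per_token: float = WORDS_PER_TOKEN) -> int:
    """Convert a token budget into an approximate word count, rounded down."""
    if tokens < 0:
        raise ValueError("tokens must be non-negative")
    return math.floor(tokens * words_per_token)


def describe_trial(tokens: int) -> str:
    """Phrase a token allowance in user-facing terms."""
    words = tokens_to_words(tokens)
    return f"Free trial: about {words} words of dictation"


print(describe_trial(1000))
```

Rounding down keeps the advertised figure conservative, so users are less likely to hit the limit before the number they were shown.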
        
       | jppope wrote:
       | the signup just failed for me. the console was logging out the
       | token... you might want to fix that
        
         | jmcintire1 wrote:
         | patching now!! good catch.
        
           | jmcintire1 wrote:
           | should be fixed.
        
       | GordonS wrote:
       | I have neuropathy in my arms, so this is something I'm very
       | interested in!
       | 
       | Do I have to use a specific Aqua Voice text editor, or can I use
       | it in apps like JetBrains Rider and Visual Studio Code? If so,
       | are there some kind of plugins that would allow using IDE-
       | specific features? (e.g. "build and run the API project")
        
         | jmcintire1 wrote:
         | Hey! Right now our focus is getting the core tech solid and we
         | can do that much faster if we aren't juggling multiple
         | platforms and plugins (we learned this the hard way), but after
         | that we are going to blitz into as many places as possible.
        
       | hidelooktropic wrote:
       | This was such a well executed demo. A few seconds in and I'm
       | seeing the value. The core of the product is fully explained in
       | just 36 seconds.
       | 
       | It's less about how quickly all that transpires and more about
       | presenting the product in a way that doesn't require a lot of
       | talking around it. Well done.
        
         | matsemann wrote:
         | I agree, very well spent seconds. Straight to the point and
         | immediately obvious what the product is doing and how useful it
         | could be.
         | 
         | My first thought, when reading the headline, was that this
         | could be useful for my coworker that got RSI in both hands and
         | codes using special commands to a mic. But after having watched
         | it I think it can be much more than such a niche product.
        
       | benpacker wrote:
       | This is really great. I was hoping someone would build this:
       | https://bprp.xyz/__site/Looking+for+Collaborators/Better+Loc...
       | 
       | I would really happily pay $10 / month for this, but what I
       | really want is either:
       | 
       | - A Raycast plugin or Desktop app that lets this interact with
       | any editable text area in my environment
       | 
       | - An API that I can pass existing text / context + audio stream
       | to and get back a heartbeat of full document updates. Then, the
       | community can build Obsidian/VSCode/browser plugins for the huge
       | surface area of text entry
       | 
       | Going to give you $10 later this afternoon regardless, and
       | congrats!
        
         | samstave wrote:
         | Take this [TEXT] read it and then let me tell you how to edit
         | it:
         | 
         | > _Certainly - let me grok your text!!... OK - I am ready!_
         | 
         | BLAH BLAH BLAH...
         | 
         | etc
        
       | GordonS wrote:
       | The demo seemed to struggle a bit with my accent (Scottish),
       | getting quite a few words wrong - for example, every time I said
       | "test" it would write "taste". Is this something you can improve
       | going forward?
        
         | umanwizard wrote:
         | https://m.youtube.com/watch?v=NMS2VnDveP8
        
           | GordonS wrote:
           | In the past when I've been in the USA, I've legit had to put
           | on an American accent when calling for taxis and the like!
           | 
           | I don't even have _that_ strong an accent, and I always try
           | my best to enunciate correctly when talking to others _shrug_
        
             | jmcintire1 wrote:
             | I'm getting married in Scotland in December and will
             | presumably want to be able to demo so you can bank on
             | priority support and a hard deadline :)
        
               | GordonS wrote:
               | Lol, excellent :)
        
         | the_king wrote:
         | Sorry about that. We know we need to be better about that and
         | of course add more languages.
         | 
         | A few things to try to maximize your accuracy right now are:
         | 
         | - Don't use AirPods, especially not AirPods Pro. Most built-in
         | laptop mics or EarPods or a gaming headset are perfect. It
         | doesn't need to be podcast quality.
         | 
         | - Correct transcription mistakes as you would a person, then
         | "plow through" and often the error will be corrected as you
         | complete the sentence.
        
       | user_7832 wrote:
       | Congrats on the launch!
       | 
       | I absolutely love the idea, as a fellow neurodivergent who works
       | much better over voice than text. My only feedback is... I'd love
       | to run this with more control. I already run LLMs locally (LM
       | Studio), and I can run something like whisper too. I understand
       | that open-sourcing (or even making the source code available)
       | might go against any commercialization attempt. However, there
       | are some options (Red Hat-esque) where it may be possible to
       | charge for business use and allow local running for free for
       | personal use.
       | 
       | On one hand you've got a solid first-mover advantage in a field
       | where lots of people can benefit from this. However, if someone
       | can bork together several layers of LLM output they might be able
       | to offer competition (and such projects are often open source,
       | albeit sometimes less "polished"). If you offer a good deal you
       | might have a good chance of major success. Best of luck!
        
       | amirhirsch wrote:
       | Congratulations! This is really cool. Maybe your website could
       | just load into the demo? Have a talking avatar that looks like a
       | paperclip with googly eyes to explain how to use it...
       | 
       | edit: I refreshed and then it did load with the blue mic button
        
       | moconnor wrote:
       | You don't say so explicitly, but it'd be good to know what data
       | goes to the cloud - I presume all of it including speech
       | recordings? Or is STT on device? Also what your privacy /
       | retention policies are around this data.
       | 
       | Excellent demo and great-looking product btw!
        
         | geor9e wrote:
         | I just spent 10 seconds trying it. It was able to interpret my
         | intentions and parse out commands from the literal
         | transcription. "bazinga but in all caps and with a j" became
         | "BAZINJA". So at the minimum, it's going through an LLM in the
         | league of llama, which if run locally in browser is slow as
         | molasses on my ancient MacBook. So it's definitely going to the
         | cloud. As a rule of thumb, you should just assume any website
         | you didn't completely code yourself is sending every mouse
         | movement and every text that you type and then backspace,
         | including passwords, to a cloud big data analytics repo via a
         | few javascript listeners.
        
           | blueberrychpstx wrote:
           | That's a hilarious over assumption but point taken
           | 
           | Also I really enjoyed your analysis
        
       | FlamingMoe wrote:
       | First impression: Wow, this is awesome.
       | 
       | So let's say I work in a quiet home office by myself. Could I
       | just have Aqua open throughout the day and give it notes / to-dos
       | without having to click the microphone on/off each time?
        
         | jmcintire1 wrote:
         | Thank you! And yes, the app has a Background mode which is
         | designed for this use case exactly
        
       | rickydroll wrote:
       | I developed an RSI-related injury back in 94/95 and have been
       | using speech recognition ever since. I would love a solution that
       | would let me move off of Windows. I would love a solution
       | allowing me to easily dictate text areas in Firefox, Thunderbird,
       | or VS code. Most important, however, would be the ability to
       | edit/manipulate the text using what Nuance used to call Select-
       | and-Say. The ability to do minor edits, replace sentences with
       | new dictation, etc., is so powerful and makes speech much easier
       | to use than straight captured dictation like most whisper apps.
       | If you can do that, I will be a lifelong customer.
       | 
       | The next most important thing would be the ability to write
       | action routines for grammar. My preference is for Python because
       | it's the easiest target when using ChatGPT to write code.
       | However, I could probably learn to live with other languages
       | (except JavaScript, which I hate). I refer you to Joel Gould's
       | "NatLink" package he wrote for NaturallySpeaking. Here's the
       | original presentation that people built on.
       | https://slideplayer.com/slide/5924729/
       | 
       | Here's a lesson from the past. In the early days of
       | DragonDictate/NaturallySpeaking, when the Bakers ran Dragon
       | Systems, they regularly had employees drop into the local speech
       | recognition user group meetings and talk to us about what worked
       | for us and what failed. They knew that watching us Crips would
       | give them more information about how to build a good speech
       | recognition environment than almost any other user community. We
       | found the corner cases before anybody else. They did some nice
       | things, such as supporting a couple of speech recognition user
       | group conferences with space and employee time.
       | 
       | It seems like Nuance has forgotten those lessons.
       | 
       | Anyway, I was planning on getting work done today, but your
       | announcement shoots that in the head. :-)
       | 
       | [edit] Freaking impressive. It is clear that I should spend more
       | time on this. I can see how my experience of Naturally Speaking
       | limited my view, and you have a much wider view of what the user
       | interface could be.
        
         | zellyn wrote:
         | You should check out cursorless... it may be more directly
         | targeting your use case
        
           | rickydroll wrote:
           | I saw it was based on Talon, but unfortunately, Talon makes
           | things overly complex and focuses the user on the wrong part
           | of the process. The learning curve to get started, especially
           | when writing your action routines, is much higher than it
           | needs to be. See: https://vocola.net/. It's not perfect; it's
           | clumsy, but you can start creating action routines within 5
           | to 10 minutes of reading the documentation. Once you exceed
           | the capabilities of Vocola, you can develop extensions in
           | Python based on what you've learned in Vocola. One could say
           | that Talon is the second system implementation according to
           | Mythical Man Month.
           | 
           | My use case is dictating text into various applications and
           | correcting that text within the text area. If I have to, I
           | can use the dictation box and then paste it into the target
           | application.
           | 
           | When you talk about using speech recognition for creating
           | code, I've been through enough brute-force solutions like
           | Talon to know they are the wrong way because they always
           | focus the user on the wrong thing. When creating code, you
           | should be thinking about the data structure and the
           | environment in which it operates. When you use speech-driven
           | programming systems, you focus on what you have to say to get
           | the syntax you need to make it compile correctly. As a
           | result, you lose your connection to the problem you're trying
           | to solve.
           | 
           | Whether you like it or not, ChatGPT is currently the best
           | solution as long as you never edit the code directly.
        
         | stcredzero wrote:
         | I remember being in a conversation back in 2002 or so, where
         | some Smalltalkers were brainstorming over the idea of
         | controlling the IDE and debugger with voice.
         | 
         | It just so happens, that many of the interfaces one has to deal
         | with are somewhat low bandwidth. (For example, many spend most
         | of their time stepping over, stepping into, or setting
         | breakpoints in a debugger.) Code completion greatly cuts down
         | the number of options to be navigated second to second. It
         | seems like the time has arrived for an interactive voice
         | operated AI pair programmer agent, where the human is taking
         | the "strategic" role.
        
         | phillco wrote:
         | > when the Bakers ran Dragon Systems
         | 
         | For those who don't know what happened next, and why Dragon
         | seemed to stagnate so much in the aughts, the story about how
         | Goldman Sachs helped them sell to essentially Belgian Enron,
         | months before they collapsed, was quite illuminating to me, and
         | sad.
         | 
         | https://archive.ph/Zck6i
        
           | nerpderp82 wrote:
           | Goldman Sachs is such a wonderful model of what is possible
           | via Capitalism. I think they are holding back on what they
           | really could achieve with a little will.
        
           | Aeolun wrote:
           | It's crazy to me they were helped by what were essentially
           | boys right out of college, and they had any faith it would
           | work...
        
         | jmcintire1 wrote:
         | Thank you! We love hearing stories like this.
         | 
         | We want to get Aqua into as many places as possible -- and will
         | go full tilt into that as soon as the core is extremely
         | extremely solid (this is our focus right now).
         | 
         | Great lessons from Dragon Dictation. Would love to learn more
         | about the speech recognition user group meetings! Are those
         | still running? Are you a part of any?
        
           | rickydroll wrote:
           | Unfortunately no. I think they faded out almost 20 years ago.
           | The main problem was that without having someone able to
           | create solutions, the speech recognition user group devolved
           | into a bunch of crips complaining about how fewer and fewer
           | applications work with speech recognition. We knew what was
           | wrong; we knew how to iterate to where NaturallySpeaking
           | should be, but nobody was there to do it.
           | 
           | FWIW, I am fleeing Fusebase, formerly known as Nimbus,
           | because they "pivoted" and messed up my notetaking
           | environment. In the beginning, I went with Nimbus because it
           | was the only notetaking environment that worked with Dragon.
           | After the pivot, not so much. I'm giving Joplin a try. Aqua
           | might work well as an extension to Joplin, especially if
           | there was a WYSIMWYG (what you see is mostly what you get)
           | front-end like Rich Markdown. I'd also look at heynote.
        
         | stevenkkim wrote:
         | On a somewhat unrelated note, I remember Nuance used to be
         | quite litigious, using its deep patent collection to sue
         | startups and competitors. I'm not sure if this is still the
         | case now that they're owned by Microsoft, but you may want to
         | look into that.
        
       | ikliuger wrote:
       | This is super awesome. Do you develop your own models, or is this
       | a wrapper around existing APIs? It would be great to have a way
       | to introduce environment variables like my name, my preferences,
       | and the topics I usually write about. I've actually written this
       | comment using your service. Thank you. Looking forward to seeing
       | what it becomes.
        
       | paviva wrote:
       | Great work, really hope you'll be able to pierce into the medical
       | market eventually. Dragon is still useless to anyone who can
       | touch type.
        
       | justinlloyd wrote:
       | From one dyslexic to another, who never got the option to even
       | use a computer in school or college and instead was forced to
       | write out everything long-hand, thank you so much for this.
       | 
       | I use voice-to-text in the workshop and when taking notes and
       | reviewing a PR. And all the current options are pretty much what
       | you would expect. More focused on accuracy, which is usually
       | quite poor, which, to paraphrase, "It's Erin with an E. Oh for
       | **s sake, Erin. ERIN! E. R. I. N. <pause> N. I said N. Eh-rin.
       | Fine. Whatever." so anything that can improve on that experience
       | will be immensely helpful.
       | 
       | Looking forward to seeing where you go with this, and I hope at
       | some point you make a native desktop application.
        
         | the_king wrote:
         | I think I developed "bad handwriting" partially to hide
         | misspellings--this is necessary in school though.
         | 
         | On pure WER we are state-of-the-art in our testing, but more
         | importantly, mistakes in Aqua are _correctable_.
         | 
         | So you can speak your mind instead of having to wait until you
         | have the perfect sentence and then dictate it.
         | 
         | That said, we know it's not perfect, but we know a few more
         | months of work will have it really solid.
        
       | samstave wrote:
       | _I have goosebumps!_
       | 
       | Jiminy Crickets...
       | 
       | I have SOOO many use cases for your thing.
       | 
       | [edit: what does this mean: https://i.imgur.com/rHQt6ul.png when
       | attempting to demo?]
       | 
       | ---
       | 
       | * I want an agent that I can speak to on Mobile headset as I love
       | to think out loud - and air my thoughts and thought process
       | through talking through my internal dialogue - if this could just
       | capture what I am saying and log it and I can refine thoughts as
       | I go.
       | 
       | For example - I ride a lot. I try to cycle 1000 miles a month if
       | I am doing a solid month - but else - I ride daily and it's a
       | movement meditation. As I ride - I think through things and I
       | speak through thought processes with differing opposing 'experts'
       | in my internal monologue to self-argue through to a solution....
       | 
       | If I could have this record all that, then random epiphanies I
       | think through while on a ride will be captured in a _meaningful_
       | way.
       | 
       | ---
       | 
       | * A meeting-notes-transcriber for whiteboard sessions.
       | 
       | * record everything you say in an interview and be able to review
       | after for self-coaching
       | 
       | * talking through a dish as you wing the ingredients so that you
       | speak out loud what you did (my grandmother was friends with
       | Julia Child - my grandmother taught me to cook and when it came
       | to measurements of things - they always wing it per feel/taste
       | "salt to taste" for example means "eh... whatever")
       | 
       | so to be able to talk through what you're 'winging it with' and it
       | captures it into a salient reproducible recipe (I make a mean
       | chimichurri, sometimes, if I can recall)
       | 
       | * a voice "body cam" for things I may say in situations where I
       | may be too flustered to recall.
       | 
       | * Speak authoring - start telling a story outline so it captures
       | a synopsis that you can further develop
       | 
       | * Speech (like giving a speech) refinement as you can talk
       | through the speech and capture and rework and reiterate etc
       | 
       | and that's just off the top of my head through your demo....
       | 
       | LOVE this.
        
         | jmcintire1 wrote:
         | Thank you, awesome to see how many ideas this inspired! We've
         | thought about a lot of similar things ourselves and will
         | certainly build some of them :)
         | 
         | Sorry about the error you are getting! It's a Firefox thing. We
         | will patch. In the meantime, Chrome/Safari will work.
        
           | samstave wrote:
           | > _how many ideas this inspired!_
           | 
           | I just want to qualify - you did not _inspire_ these ideas.
           | 
           | These are desires sought which have been there for eons...
           | 
           | You are not _inspiring_ them
           | 
           | You have a tool that _ENABLES_ them.
           | 
           | Seek that which already is a flustered pop of ideas waiting
           | for a release valve for such thought.
           | 
           | You are not inspiring - you are enabling that which is
           | already there, think of it as which valve to open - the
           | pressure is mounting upon your dyke.
        
             | jmcintire1 wrote:
             | that's a better way to put it!
        
       | aneeqdhk wrote:
       | Hands down one of the best AI demos I have seen. Last time I got
       | a wow feeling like this was when ChatGPT was released.
        
       | ryanisnan wrote:
       | Wow, great demo! Excited to see this grow.
        
       | rhyme-boss wrote:
       | I use Apple dictation heavily for transcribing interviews. I've
       | tried all the voice-to-text services out there and none have been
       | reliable enough at transcribing an audio file. I've settled on
       | playing audio in my headphones and pausing while I carefully
       | dictate text into a document. If I could upload the audio file,
       | get a first-pass transcription, and then go through and edit /
       | make corrections with voice, that would be awesome.
       | 
       | A difference in error rate from 20-something percent down to less
       | than 5 percent sounds incredible.
        
         | codeptualize wrote:
         | Have you tried openai whisper? Last time I compared it was
         | quite a bit better than all the other options.
        
         | mathisd wrote:
         | Have you tried using Whisper from OpenAI? Aiko [0] has
         | Whisper-v2-large built in and allows for transcription of audio
         | files.
         | 
         | [0] https://apps.apple.com/fr/app/aiko/id1672085276
        
         | hantusk wrote:
         | Check out Descript. It's been awesome when I used it in the
         | past
        
       | hamzakc wrote:
       | Things like this always remind me of this excellent talk:
       | https://youtu.be/8SkdfdXWYaI?si=MFxs7wFdqws0OeCi
       | 
       | Worth a watch.
        
       | michaelbuckbee wrote:
       | Friendly FYI - not sure if this is a skill issue on my part or
       | something that's not possible yet, but I couldn't figure out how
       | to change the audio input. I think when it asked for microphone
       | access (chrome latest, Mac) that it chose the Macbook microphone
       | which won't work as it's docked.
        
       | jdalgetty wrote:
       | Not working for me on firefox/macos
        
         | the_king wrote:
         | Sorry about that, will fix asap. I love firefox.
        
       | tremarley wrote:
       | How much would 1000 tokens give us?
        
         | the_king wrote:
         | It's 1000-1500 words. I know that seems cheap of us but the
         | cost to run the Aqua stack is eye-watering right now. We will
         | increase this amount as we optimize.
        
       | aaroninsf wrote:
       | Infinite details to remark on, but,
       | 
       | NO NOTES.
       | 
       | This is the sort of thing that I forward to people who are
       | skeptical about the disruptive capacity AI has to take long-
       | standing, seemingly intractable problems and "solve" them.
       | 
       | Hats off. Truly inspiring in many senses!
        
       | heaths1 wrote:
       | Awesome demo. A challenge where I work is an extremely acronym-rich
       | lingo. Is your model open to extension or learning in some
       | fashion to accommodate picking up thousands of acronyms? We also
       | can shift into rapid, specialized speaking patterns that I think
       | are quite learnable but that are not really 'out of the box' for
       | normal software products. I would think many industries could
       | feature their own lingos like this.
        
       | aronhegedus wrote:
       | I liked how easy the demo was to play around with! I don't have
       | much use for this product myself, but kudos for making
       | something that clearly works very well!
        
       | hubraumhugo wrote:
       | Dictation software is huge in the healthcare industry. Every
       | doctor uses it, and a solution like yours could likely make their
       | work much more efficient.
       | 
       | Have you explored this market segment?
        
         | gardenhedge wrote:
         | Why do doctors use it?
        
           | lmiller1990 wrote:
           | Not OP but a big part of a doctor's job is clinical notes.
           | Typing is slow, talking is fast. Less time spent taking notes
           | == more time with patient.
        
       | rafram wrote:
       | This is cool! Some feedback:
       | 
       | - As others have said, "1000 tokens" doesn't mean anything to
       | non-technical users and barely means anything to me. Just tell me
       | how many words I can dictate!
       | 
       | - That serif-font LaTeX error rate table is also way too boring.
       | People want something flashy: "Up to 7x fewer errors than macOS
       | dictation" is cool, a comparison table is not.
       | 
       | - Similarly, ".05 Word Error Rate" has to go. Spell out what that
       | means and use percentages.
       | 
       | - "Forgot a name, word, fact, or number? Just ask Aqua to fill it
       | in for you." It would be nice to be able to turn this off, or at
       | least have a clear indication when content that I did not say is
       | inserted into my document. If I'm dictating, I don't usually want
       | anything but the words I say on the page.
        
         | jmcintire1 wrote:
         | Thanks for the feedback! On the last point, you can't see it in
         | the sandbox, but the app has a Strict mode that does what
         | you're looking for
        
       | passion__desire wrote:
       | Just wanted to inform you that your demo video is actually
       | unlisted and invisible to public. I hope that is not intentional.
        
         | the_king wrote:
         | Fixed. Thanks for the heads up.
        
       | freedomben wrote:
       | Some of my thoughts:
       | 
       | 1. This is an amazing idea!
       | 
       | 2. I love that it is browser-based so can work everywhere. Native
       | app would let you integrate more tightly (such as becoming a
       | "keyboard" on the system), but that probably means "a mac app"
       | which doesn't do me any good on Linux. If you could keep the bulk
       | of it in cross-platform tech and just do the small integration
       | part with native code, I think supporting at least "the big
       | three" is doable. I bet if you provided a good API, somebody in
       | the open source world would even do the work for you, on Linux at
       | least.
       | 
       | 3. Would really prefer being able to sign up with my email, and
       | not having to log in with a third party account.
       | 
       | 4. Online-only access is definitely fine for now, but to stay
       | competitive in the future I would keep an eye toward being able
       | to run inference locally so you don't have to be online to use
       | it. This would also be a way for you to reduce costs and offer a
       | cheaper version. If I were you, my long-term goal would be for
       | this to be used by everybody (though that's years down the road).
       | Local inference does complicate monetization, but that can be
       | figured out.
       | 
       | 5. For me to really use this enough to pay out every month, it
       | needs to be relatively easy for me to get the output into
       | whatever app I'm using, whether that is Chrome, Slack, Gmail,
       | Google Docs, Vim, Gedit, or anything else. This is undoubtedly
       | related to item 2 above, but I figured it warranted its own
       | mention as there may be solutions besides browser-based vs.
       | native.
       | 
       | 6. You're gonna have competitors hot on your heels, if they
       | aren't already. Google in particular with GBoard on Android could
       | be absolutely killer. Since it is Android-only, I don't think
       | it's a major competitor now, but if they broadened it, it absolutely
       | could be.
       | 
       | 7. Do you have an exit strategy in mind already? Would you be
       | willing to share anything on that? (I ask because it's relevant
       | because your product could easily become part of my standard
       | workflow, and I'm very conservative about becoming dependent on
       | proprietary products, especially from startups). Please do not go
       | native-only and only release a Mac app. At a minimum, please
       | maintain the web-based version. And please for the love of all
       | that is holy, don't sell to/get acquired by Apple! I want and
       | need your product, and I don't and won't switch platforms (Fedora
       | Linux currently) to get it.
       | 
       | Really amazing idea and great work! It is rare that I see
       | products that I think could actually "change the world" but this
       | one has some potential by changing the way we interact with our
       | computers!
        
       | Centigonal wrote:
       | Great product idea, excellent demo. Fantastic use case for LLMs.
       | Keep it up!
        
       | lxe wrote:
       | Fascinating. Are you still using Whisper in any of these
       | MoExperts to transcribe or do you have something custom? Would
       | love to learn more about the tech.
        
       | youssefabdelm wrote:
       | I feel like I'd much prefer this as an API I can request and get
       | realtime updates from so that I can hook it into any application.
       | Is that on the roadmap?
       | 
       | Also latency seems to be a bit slow, wish it was faster, maybe
       | that's due to traffic now.
        
       | ajolly wrote:
       | I'll certainly go give this a spin later as I use voice to text
       | daily. My first few questions:
       | 
       | How does the dictation accuracy compare to Talon's latest model, or
       | Microsoft's new Voice Access? Or Dragon? You've got a few
       | comparisons already but nothing that I actually use.
       | 
       | What's the latency like?
       | 
       | At least for me a general voice editor isn't useful; give me
       | something that can send text to wherever my mouse is pointing and
       | that's useful. Then make sure it works with Microsoft's Mouse
       | Without Borders, Synergy, Barrier, Input Director etc.
       | 
       | Oh and does it support a user dictionary?
        
         | the_king wrote:
         | We'll be releasing a custom dictionary and templates soon. We
         | are testing them internally now, and they aren't quite reliable
         | enough to release, but we understand how important this is for
         | many workflows.
         | 
         | On accuracy, we benchmark very well against even large async
         | models, with a WER of .05-.06 and when Aqua does make a mistake
         | you can often correct it by just telling it "no it's our side
         | not outside" and it won't mangle the text.
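         | 
         | For readers unfamiliar with the metric, here is a minimal sketch
         | of how word error rate is usually computed: the word-level edit
         | distance (substitutions + deletions + insertions) divided by the
         | number of words in the reference transcript. The function name
         | and the sample sentences are illustrative assumptions, not
         | Aqua's benchmark code; under this standard definition a WER of
         | .05 corresponds to roughly 5 word errors per 100 reference words.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumes the standard WER definition; casing is normalized, punctuation is not.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1        # reference word missing from hypothesis
            insertion = curr[j - 1] + 1   # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: the recognizer hears "outside" instead of "our side".
reference = "it is our side not outside"
hypothesis = "it is outside not outside"
wer = word_error_rate(reference, hypothesis)
print(f"WER = {wer:.3f} ({wer:.1%} of reference words)")
```

         | Reporting the result as a percentage, as in the last line, is
         | one way to make the number readable for non-technical users.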
        
       | tumidpandora wrote:
       | This site can't be reached(?)
        
       | theonething wrote:
       | > Aqua is smart enough to figure that out and to only take the
       | last version of the sentence
       | 
       | I wish Siri, Alexa, et al would do this as well. They seem to
       | expect you to speak perfectly the first time.
        
       | gleb wrote:
       | Tried it. Seemed quite impressive. Two issues:
       | 
       | - it consistently uses the word "two" instead of "to"
       | 
       | - forcing Google OAuth as the only way to sign up is not a good
       | idea. That prevented me from signing up.
        
       | jzellis wrote:
       | I tried it in Firefox on my Android and got this error when I
       | tried to use the demo:
       | 
       | "Error: NotSupportedError: AudioContext.createMediaStreamSource:
       | Connecting AudioNodes from AudioContexts with different sample-
       | rate is currently not supported."
        
         | saint11 wrote:
         | Same here
        
         | mrandish wrote:
         | FYI to the devs... I got the same error on Firefox Win11 x64.
        
         | jmcintire1 wrote:
         | Patching this now!
        
       | feverishaaron wrote:
       | My child is profoundly dyslexic. This kind of tool is a game-
       | changer for him.
        
         | the_king wrote:
         | Hope this can be helpful. We know there are still many kinks to
         | iron out.
         | 
         | On another note, I think once you leave school dyslexia can
         | become a wash or even a net positive in the right setting. I
         | think whatever the brain config is can be a huge unlock for
         | creative thinking - it's not always super helpful in the school
         | context, but can be really asymmetric in tech and probably
         | other industries.
        
       | WheelsAtLarge wrote:
       | WOW!!! Just wow...
       | 
       | When will we PC peeps get to use it?
        
         | the_king wrote:
         | You can use it in the browser right now! But we get it... native
         | is better for voice stuff and we'll be in more places soon.
        
       | samstave wrote:
       | Train this model on this:
       | 
       | >> _Dearest creature in creation, Study English pronunciation. I
       | will teach you in my verse Sounds like corpse, corps, horse, and
       | worse. I will keep you, Suzy, busy, Make your head with heat grow
       | dizzy. Tear in eye, your dress will tear. So shall I! Oh hear my
       | prayer._
       | 
       | >> _Just compare heart, beard, and heard, Dies and diet, lord and
       | word, Sword and sward, retain and Britain. (Mind the latter, how
       | it's written.) Now I surely will not plague you With such words
       | as plaque and ague. But be careful how you speak: Say break and
       | steak, but bleak and streak; Cloven, oven, how and low, Script,
       | receipt, show, poem, and toe._
       | 
       | >> _Hear me say, devoid of trickery, Daughter, laughter, and
       | Terpsichore, Typhoid, measles, topsails, aisles, Exiles, similes,
       | and reviles; Scholar, vicar, and cigar, Solar, mica, war and far;
       | One, anemone, Balmoral, Kitchen, lichen, laundry, laurel;
       | Gertrude, German, wind and mind, Scene, Melpomene, mankind._
       | 
       | >> _Billet does not rhyme with ballet, Bouquet, wallet, mallet,
       | chalet. Blood and flood are not like food, Nor is mould like
       | should and would. Viscous, viscount, load and broad, Toward, to
       | forward, to reward. And your pronunciation's OK When you
       | correctly say croquet, Rounded, wounded, grieve and sieve, Friend
       | and fiend, alive and live._
       | 
       | >> _Ivy, privy, famous; clamour And enamour rhyme with hammer.
       | River, rival, tomb, bomb, comb, Doll and roll and some and home.
       | Stranger does not rhyme with anger, Neither does devour with
       | clangour. Souls but foul, haunt but aunt, Font, front, wont,
       | want, grand, and grant, Shoes, goes, does. Now first say finger,
       | And then singer, ginger, linger, Real, zeal, mauve, gauze, gouge
       | and gauge, Marriage, foliage, mirage, and age._
       | 
       | >> _Query does not rhyme with very, Nor does fury sound like
       | bury. Dost, lost, post and doth, cloth, loth. Job, nob, bosom,
       | transom, oath. Though the differences seem little, We say actual
       | but victual. Refer does not rhyme with deafer. Feoffer does, and
       | zephyr, heifer. Mint, pint, senate and sedate; Dull, bull, and
       | George ate late. Scenic, Arabic, Pacific, Science, conscience,
       | scientific._
       | 
       | >> _Liberty, library, heave and heaven, Rachel, ache, moustache,
       | eleven. We say hallowed, but allowed, People, leopard, towed, but
       | vowed. Mark the differences, moreover, Between mover, cover,
       | clover; Leeches, breeches, wise, precise, Chalice, but police and
       | lice; Camel, constable, unstable, Principle, disciple, label._
       | 
       | >> _Petal, panel, and canal, Wait, surprise, plait, promise, pal.
       | Worm and storm, chaise, chaos, chair, Senator, spectator, mayor.
       | Tour, but our and succour, four. Gas, alas, and Arkansas. Sea,
       | idea, Korea, area, Psalm, Maria, but malaria. Youth, south,
       | southern, cleanse and clean. Doctrine, turpentine, marine._
       | 
       | >> _Compare alien with Italian, Dandelion and battalion. Sally
       | with ally, yea, ye, Eye, I, ay, aye, whey, and key. Say aver, but
       | ever, fever, Neither, leisure, skein, deceiver. Heron, granary,
       | canary. Crevice and device and aerie._
       | 
       | >> _Face, but preface, not efface. Phlegm, phlegmatic, ass,
       | glass, bass. Large, but target, gin, give, verging, Ought, out,
       | joust and scour, scourging. Ear, but earn and wear and tear Do
       | not rhyme with here but ere. Seven is right, but so is even,
       | Hyphen, roughen, nephew Stephen, Monkey, donkey, Turk and jerk,
       | Ask, grasp, wasp, and cork and work._
       | 
       | >> _Pronunciation -- think of Psyche! Is a paling stout and
       | spikey? Won't it make you lose your wits, Writing groats and
       | saying grits? It's a dark abyss or tunnel: Strewn with stones,
       | stowed, solace, gunwale, Islington and Isle of Wight, Housewife,
       | verdict and indict._
       | 
       | >> _Finally, which rhymes with enough -- Though, through, plough,
       | or dough, or cough? Hiccough has the sound of cup. My advice is
       | to give up!!!_
       | 
       | =====
       | 
       | --
       | 
       | I don't have the energy to defend a F up -- but there is a LOT of
       | really cool development happening on HN... from AI to all sorts
       | of SHOW and ASK and a just F-TON of keeping track.
       | 
       | I am not an OCD content-influencer-focused type...
       | 
       | But know --
       | 
       | the VELOCITY of thought that is flowing through HN and human
       | consciousness, as accelerated by our tipping-the-cup on AI, is
       | having IRL consequences on both mentality and reality....
       | 
       | If there is a community for a higher velocity firehose of where
       | we are going share it.
       | 
       | So - we are spewing a firehose of ideas into the quantum future,
       | as unknown-boomerangs
       | 
       | The truth is to understand the boomerangs...
       | 
       | ( to de-vague-lize this: Tesla:
       | 
       | Pre compute an AI token. 3:6:9
       | 
       | This token is a prime reflection of that.
        
         | pablopeniche wrote:
         | lol based, we'll do
        
       | Arctic_fly wrote:
       | I remember hearing about Dragon when I was in elementary school.
       | It's cool to reflect on how far things have progressed in the
       | last decade and a half.
        
       | FloatArtifact wrote:
       | Congratulations on an interesting project. There is a lost
       | opportunity with your natural-language-only approach. The issue
       | is that natural language will never be efficient as an interface.
       | Natural language helps with low domain knowledge. That's the plus
       | side, as it allows the end user to say a variety of phrases to get
       | the desired result. Commands allow for surgical precision and
       | efficiency/less voice strain for the end user. So there needs to
       | be an approach that allows for both elements: natural language and
       | commands. As users develop their own process and workflow they
       | will create actions as commands (high domain knowledge).
       | 
       | Since these commands are created by the end users themselves, they
       | remember them for their specific purposes. These are often
       | high-frequency commands, while low-frequency actions would still
       | leverage the large language model. You have an opportunity here to
       | leverage this workflow. Being able to create commands with a large
       | language model is not something many projects have explored.
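       | 
       | A rough sketch of that hybrid, under stated assumptions:
       | user-defined commands are matched exactly and run directly, and
       | anything else falls through to a natural-language handler. Every
       | name here (make_dispatcher, fake_llm_edit, the sample commands)
       | is hypothetical, and the fallback is a stand-in for a model call,
       | not how Aqua or any other product actually works.

```python
from typing import Callable, Dict


def make_dispatcher(
    commands: Dict[str, Callable[[str], str]],
    natural_language_fallback: Callable[[str, str], str],
) -> Callable[[str, str], str]:
    """Return a function that routes an utterance to a command or to the fallback."""

    def dispatch(utterance: str, document: str) -> str:
        # Normalize case and whitespace so "  Scratch   that " matches "scratch that".
        key = " ".join(utterance.lower().split())
        handler = commands.get(key)
        if handler is not None:
            # High-frequency path: a precise, user-defined command, no model involved.
            return handler(document)
        # Low-frequency path: hand the free-form request to the language model.
        return natural_language_fallback(utterance, document)

    return dispatch


def fake_llm_edit(utterance: str, document: str) -> str:
    # Placeholder for a real model call; here the utterance is simply appended as text.
    return f"{document} {utterance}"


# Commands the user has defined for themselves (illustrative only).
user_commands = {
    "shout": str.upper,
    "scratch that": lambda doc: doc.rsplit(" ", 1)[0],  # drop the last word
}

dispatch = make_dispatcher(user_commands, fake_llm_edit)
print(dispatch("scratch that", "hello world"))
print(dispatch("and goodbye", "hello"))
```

       | The exact-match lookup is the simplest possible choice; a real
       | system would presumably need fuzzier matching and a way to avoid
       | mistaking dictated content for a command.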
        
       | lukko wrote:
       | This is amazing! It's very satisfying to use and the combination
       | of transcription + intent seems like it has huge potential.
       | 
       | I would love to use this in healthcare for dictating patient
       | letters etc. I guess a local model / HIPAA compliance is some way
       | off?
        
       | dharma1 wrote:
       | This is super cool! Should ideally happen at the OS level (some
       | future version of Siri) across whatever apps you're using
        
       | lolpanda wrote:
       | I love this idea. Wish there were a browser extension so I can
       | dictate in my emails.
        
       | whiplash451 wrote:
       | Congrats on the launch. The demo is truly impressive. On my Apple
       | cell phone with a Chrome browser, the latency feels a little
       | sluggish (I am sure you are working on it). Congrats again and
       | all the best!
        
       ___________________________________________________________________
       (page generated 2024-03-26 23:00 UTC)