[HN Gopher] Talk-to-ChatGPT
       ___________________________________________________________________
        
       Talk-to-ChatGPT
        
       Author : indigodaddy
       Score  : 99 points
       Date   : 2023-02-19 17:37 UTC (5 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | mmaia wrote:
       | Awesome. I saw someone asking for this feature to practice other
       | languages just yesterday.
       | 
       | In Firefox, it only supports reading which is already cool.
       | 
       | Here's a CDN script in case someone wants to load it in JS
       | Console:
       | 
       | https://cdn.jsdelivr.net/gh/C-Nedelcu/talk-to-chatgpt@main/c...
       | 
       | Edit: script url
        
       | youssefabdelm wrote:
       | Nobody has done this well enough yet. What's required:
       | 
       | 1. Transcribe your speech using Whisper (in that case you don't
       | have to make an effort to speak clearly so long as you're in a
       | relatively quiet room)
       | 
       | 2. Get a TTS system that actually sounds good (e.g. Descript,
       | Eleven Labs, etc.)
       | 
       | 3. Have RAPID responses like a normal human conversation (mostly
       | on OpenAI's side... so hopefully ChatGPT Plus fixes that)
        
         | ericlewis wrote:
         | Rapid responses are basically the biggest problem. That is
         | quite hard in itself, but also because you can't stream tokens
         | into any TTS system and have it sound good. The more "complete"
         | the corpus, the better it seems to be at using the right sort
         | of pausing and such. So it is more of an "LLM needs to be
         | directly connected to the TTS" type of issue, somehow.
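One common workaround for the streaming problem described above is to buffer streamed tokens and hand the TTS engine whole sentences only. The sketch below is a minimal illustration under that assumption; `createSentenceChunker` and its callback are hypothetical names, not part of Talk-to-ChatGPT or any TTS library, and the sentence-boundary rule is deliberately naive.

```javascript
// Hypothetical helper: buffers streamed LLM tokens and emits whole sentences,
// so a TTS engine receives complete units instead of fragments.
function createSentenceChunker(onSentence) {
  let buffer = "";
  // Naive rule: a sentence ends at ., ! or ? followed by whitespace.
  const boundary = /[.!?]+\s+/g;

  // Feed one streamed token (any string fragment).
  function push(token) {
    buffer += token;
    let lastEnd = 0;
    let match;
    boundary.lastIndex = 0;
    while ((match = boundary.exec(buffer)) !== null) {
      const end = match.index + match[0].length;
      onSentence(buffer.slice(lastEnd, end).trim());
      lastEnd = end;
    }
    // Keep the unfinished remainder for the next token.
    buffer = buffer.slice(lastEnd);
  }

  // Call once the stream ends to emit any trailing text.
  function flush() {
    const rest = buffer.trim();
    buffer = "";
    if (rest) onSentence(rest);
  }

  return { push, flush };
}
```

In a browser, `onSentence` could pass each sentence to `speechSynthesis.speak(new SpeechSynthesisUtterance(sentence))`. Whether sentence-sized chunks are enough for natural pausing depends on the TTS engine.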
        
         | nojs wrote:
         | The bottleneck is currently TTS. The best option is probably
         | Eleven Labs, but response times are unpredictable. GPT response
         | times can be worked around by falling back to a faster model,
         | but you can't do that with TTS because the voice needs to be
         | consistent. It seems like the current state of the art is
         | diffusion models a la DALL-E, see e.g. [1] (the developer, James
         | Betker, now incidentally works for OpenAI). It's nontrivial to
         | turn this into something that works in real-time without a
         | decent budget, though.
         | 
         | Whisper (for transcription) is insanely fast and good.
         | 
         | 1. https://github.com/neonbjb/tortoise-tts
        
       | riskneutral wrote:
       | It would be good to implement an initial command phrase to begin
       | dictating, like "Hey Alexa," "OK Google," or in this case e.g.
       | "Hey GPT"
       | 
       | Also, I feel like it sends the text to ChatGPT too quickly, for
       | me at least. Wish it would wait a bit longer in case I have
       | anything to add. A command phrase to end the sentence might be
       | too much.
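Both suggestions above (a wake phrase, and a longer pause before sending) can be sketched as two small helpers. This is a minimal illustration, not the extension's code: `extractCommand`, `createSilenceBuffer`, the "hey gpt" default, and the 2500 ms delay are all assumptions chosen for the example.

```javascript
// Escapes characters that have a special meaning in a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: returns the text after the wake phrase, or null if
// the transcript does not start with it. Matching ignores case and any
// punctuation between the words, such as the comma in "Hey, GPT".
function extractCommand(transcript, wakePhrase = "hey gpt") {
  const words = wakePhrase.trim().split(/\s+/).map(escapeRegExp);
  const pattern = new RegExp("^\\W*" + words.join("\\W+") + "\\b\\W*", "i");
  const match = transcript.match(pattern);
  return match ? transcript.slice(match[0].length).trim() : null;
}

// Hypothetical helper: collects transcript fragments and submits them only
// after `delayMs` of silence, so a late addition extends the same message.
function createSilenceBuffer(submit, delayMs = 2500) {
  let parts = [];
  let timer = null;

  function flush() {
    clearTimeout(timer);
    timer = null;
    if (parts.length === 0) return;
    const message = parts.join(" ");
    parts = [];
    submit(message);
  }

  function add(fragment) {
    parts.push(fragment.trim());
    clearTimeout(timer); // restart the countdown on every new fragment
    timer = setTimeout(flush, delayMs);
  }

  return { add, flush };
}
```

A longer `delayMs` gives the speaker more time to add to a sentence at the cost of slower replies, so it is probably best exposed as a user setting.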
        
         | basch wrote:
         | Maybe expand GPT to Geppetto.
         | 
         | Yo Geppetto
        
           | ZunarJ5 wrote:
           | No wonder it's full of beans half the time, it also made
           | Pinocchio.
        
       | fnordpiglet wrote:
       | I'm surprised no one has made a chatgpt alexa skill. Although I
       | realize throttling and costs probably stop that.
        
         | lgas wrote:
         | If you google, you will find many, e.g.
         | https://www.chatgptalexa.com/
        
       | ericlewis wrote:
       | everyone is figuring it out :P
       | 
       | https://www.youtube.com/watch?v=ky9L1eGxj_k&t=1s
        
       | de6u99er wrote:
       | Your code says it uses the browser's speech recognition API.
       | 
       | ```
       | // Start speech recognition using the browser's speech recognition API
       | function CN_StartSpeechRecognition() {
       | ```
       | 
       | As far as I know, speech recognition in Chrome, unlike on Android
       | phones, is being done online. That means audio is being sent to
       | Google servers. How does this comply with GDPR?
        
         | jcims wrote:
         | Does it claim to be compliant anywhere? I would assume it
         | isn't.
        
           | itcrowd wrote:
           | GDPR compliance is not optional for services offered in the
           | EU
        
             | shagie wrote:
             | Does every client side javascript project someone builds on
             | GitHub for fun need to be verified if it is GDPR compliant?
             | 
             | And what if the source someone shares is using APIs that
             | aren't GDPR compliant? Can you download it, compile it, run
             | it on your machine, and then say that it was offered to you
             | and so must be GDPR compliant?
             | 
             | I'm not in the EU and not familiar with the particulars of
             | GDPR but that feels like it is stretching for a reason to
             | complain.
        
         | wizzwizz4 wrote:
         | > _As far as I know, speech recognition in Chrome, unlike on
         | Android phones, is being done online._
         | 
         | I'm no expert, but I would imagine this is Google's problem.
         | The website is calling an API that no reasonable person would
         | expect to leak data to Google - in fact, Google _has an
         | implementation that doesn't_ - yet personal data is being
         | leaked, without the user's consent.
         | 
         | At no point does the operator of this website act as controller
         | for the voice data.
        
         | jlaporte wrote:
         | There are a lot of admirable things about the EU, and this is
         | by no means intended as EU bashing.
         | 
         | But could there be a more on the nose example of why the EU
         | lags in tech innovation and entrepreneurship? A solo maker
         | builds a cool little tool as a personal project, open sources
         | it on github, and is immediately attacked for the effort by
         | cosplay compliance regulators. It defies parody.
         | 
         | The commenters also have a cartoon understanding of the GDPR -
         | the author of the Chrome extension is neither a data controller
         | nor a data processor.
         | 
         | I'd ask the commenters that jump on a project like this to
         | introspect a bit and try to understand why their first impulse
         | on seeing someone's effort like this is to try to take it down.
        
           | iamjackg wrote:
           | You raise a very valid point, but one could make an argument
           | that it's _okay_ for innovation and entrepreneurship to slow
           | down, if it's done to protect people's rights and ensure that
           | companies do things "properly."
           | 
           | It seems to be an eternal cyclical process: people come up
           | with something that either sidesteps or has no regulation
           | whatsoever, you get massive growth and innovation, which
           | turns to exploitation, which leads to regulation.
           | 
           | We've seen it in all fields, from tech to pharmaceuticals to
           | big box retail. Is it an acceptable compromise in society to
           | "let things happen" for a while before regulating? Or should
           | we all slow down and think about the consequences before
           | pushing forward?
        
           | de6u99er wrote:
           | The author should at least make users aware that data is
           | being sent to Google servers.
           | 
           | I don't see GDPR as preventing innovation. Quite the
           | contrary, it has enabled European companies, which host their
           | data by default in the EU, to create competitive products.
           | 
           | That being said, I think in Europe there's, compared to the
           | US, more old money controlled by people who don't like to
           | share or lose their wealth.
        
         | iamjackg wrote:
         | Hm, this is actually a very interesting case. If the extension
         | is (I'm assuming) literally just leveraging the API and not
         | storing anything at all, would it be sufficient to let people
         | know it's doing so in order to be compliant with GDPR?
         | 
         | Does this even count as something that would be covered by
         | GDPR? Is it because the data collection is tangential to the
         | "service" being offered?
         | 
         | Chrome's Privacy page [1] doesn't say anything about the API.
         | Other people have also been wondering about the privacy
         | implications of using the Speech API. What an interesting
         | rabbit hole!
         | 
         | 1: https://www.google.com/intl/en/chrome/privacy/
        
         | renewiltord wrote:
         | Why don't you tell us how it violates GDPR? And then you can
         | make an issue on Github to tell the guy to stop offering his
         | thing in the EU.
        
       | zaptrem wrote:
       | Has anyone built a Siri integration/Shortcut for this? I'm
       | referring to the actual ChatGPT model, not via the normal OpenAI
       | API.
        
         | ericHosick wrote:
         | Have not looked into this but someone I know said they got it
         | working: https://www.youtube.com/watch?v=gePhjvKdUro
        
         | LeoPanthera wrote:
         | ChatGPT does not (yet) have an API. If there are any
         | integrations, they would have to resort to screen-scraping.
         | 
         | An API for ChatGPT is apparently coming.
        
           | Nuzzerino wrote:
           | https://github.com/acheong08/ChatGPT
        
             | LeoPanthera wrote:
             | That's really no different to screen scraping. It could
             | break at any moment. And I'm pretty sure it violates the
             | OpenAI TOS too.
        
       | pandominium wrote:
       | rip to the corporate voice assistants!
        
       | xony wrote:
       | [dead]
        
       | bilater wrote:
       | Nice - I think Web Speech APIs are a super powerful tool that a
       | lot of devs would be surprised to learn they get out of the box
       | (I was).
        
       | riskneutral wrote:
       | This is great, thanks!!!
        
       | dddrh wrote:
       | Been noodling on this idea and love to have an
       | example/inspiration. Thanks for sharing.
       | 
       | (Edit) Specifically glad to learn about the web speech APIs. As a
       | hobby programmer by night and Product Manager by day learning
       | about more and more capabilities I can leverage in toy projects
       | is awesome.
       | 
       | I was considering trying out whisperAI but this seems like a
       | better stepping stone for "simpler" starter projects.
        
       | anitakirkovska wrote:
       | [flagged]
        
       | jug wrote:
       | Funny. I was just today thinking of this but on a Raspberry Pi
       | with mic and speakers. Google Bard is surely coming to Google
       | Home but it would be a fun project to get a head start of sorts!
        
       | leobg wrote:
       | Cool. I built something similar for myself. Though it uses the
       | GPT-3 API rather than ChatGPT:
       | 
       | Whisper.cpp for text input. Google WaveNet Voice for text output.
       | One button for the user to start and stop speaking, like Siri.
       | Allows me and my daughter to literally talk to GPT-3 and have
       | real conversations with it.
       | 
       | (Though I'd never let her do that without supervision. Also, she
       | has learned very quickly that she needs to take everything it
       | says with a grain of salt, and that it's important to fact-check
       | its answers.)
       | 
       | I'd be happy to show it, but Whisper is quite CPU intensive. I
       | don't know how to host it so it can handle any meaningful number
       | of concurrent users without breaking the bank. If anyone has
       | suggestions or wants to help, let me know.
        
         | louison11 wrote:
         | Use a cloud service for speech to text, like Google's API, and
         | you should be able to handle a pretty good number of users
         | without breaking the bank. (I believe they have a free tier,
         | then pretty reasonable pricing. You'd just have to set up a
         | rate limiter on your server to make sure nobody's abusing it).
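The rate limiter mentioned above could take many forms. One simple option is a per-client token bucket, sketched below under assumed limits. `createRateLimiter`, its option names, and the default values are hypothetical and not tied to any particular server framework or cloud API.

```javascript
// Hypothetical per-client token bucket: each client may make `capacity`
// requests in a burst, refilled at `refillPerSecond` tokens per second.
// `now` is injectable so the limiter can be exercised with a fake clock.
function createRateLimiter({ capacity = 5, refillPerSecond = 1, now = Date.now } = {}) {
  const buckets = new Map(); // clientId -> { tokens, updatedAt }

  // Returns true if the request is allowed, false if it should be rejected.
  return function allow(clientId) {
    const t = now();
    const bucket = buckets.get(clientId) ?? { tokens: capacity, updatedAt: t };

    // Refill in proportion to the time elapsed, capped at `capacity`.
    const elapsedSeconds = (t - bucket.updatedAt) / 1000;
    bucket.tokens = Math.min(capacity, bucket.tokens + elapsedSeconds * refillPerSecond);
    bucket.updatedAt = t;

    const allowed = bucket.tokens >= 1;
    if (allowed) bucket.tokens -= 1;
    buckets.set(clientId, bucket);
    return allowed;
  };
}
```

A server would call `allow(clientId)` before forwarding audio to the speech-to-text service and reject the request (for example with HTTP 429) when it returns false. An in-memory `Map` only works for a single process; several servers would need shared storage.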
        
         | jsharf wrote:
         | Theoretically, one could compile whisper.cpp to run in the
         | browser using emscripten, maybe made faster with webgl...
         | 
         | I think this would be quite a heavy page load time for a
         | website, but if the model file gets cached, and the user has a
         | decent CPU/GPU, it... could work?
        
       | Animats wrote:
       | This is using Google's text to speech and speech to text, right?
       | Isn't that a paid service?
        
         | russellbeattie wrote:
         | Chrome provides basic speech to text for free via the browser's
         | implementation of the Web Speech API. There is a different
         | Google Cloud version, which is a commercial service that
         | provides a better model for more accuracy and optional data
         | logging. The page below can be used by Safari as well, which
         | sends the data to Apple's servers for processing instead.
         | 
         | https://www.google.com/intl/en/chrome/demos/speech.html
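For readers who have not used the Web Speech API discussed above, here is a minimal sketch of feature detection and a recognition session. It is not the extension's actual code: `getSpeechRecognition`, `startListening`, and the `scope` parameter are names made up for this example, while `SpeechRecognition`, `webkitSpeechRecognition`, and the properties set on the recognizer come from the browser API.

```javascript
// Returns the browser's SpeechRecognition constructor, or null when the
// Web Speech API is unavailable (for example in Node.js).
function getSpeechRecognition(scope = globalThis) {
  return scope.SpeechRecognition || scope.webkitSpeechRecognition || null;
}

// Starts one recognition session and passes final transcripts to onText.
// Returns the recognizer, or null if the API is missing.
function startListening(onText, lang = "en-US", scope = globalThis) {
  const Recognition = getSpeechRecognition(scope);
  if (!Recognition) return null;

  const recognizer = new Recognition();
  recognizer.lang = lang;
  recognizer.continuous = true;      // keep listening across pauses
  recognizer.interimResults = false; // only deliver final results
  recognizer.onresult = (event) => {
    // Only results from resultIndex onward are new in this event.
    for (let i = event.resultIndex; i < event.results.length; i++) {
      if (event.results[i].isFinal) onText(event.results[i][0].transcript);
    }
  };
  recognizer.start();
  return recognizer;
}
```

In a page, `startListening((text) => console.log(text))` prompts for microphone access. As discussed in the thread, the browser may send the audio to a remote service for processing, so users should be told before recording starts.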
        
       ___________________________________________________________________
       (page generated 2023-02-19 23:01 UTC)