[HN Gopher] Talk-to-ChatGPT
___________________________________________________________________
Talk-to-ChatGPT
Author : indigodaddy
Score : 99 points
Date : 2023-02-19 17:37 UTC (5 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| mmaia wrote:
| Awesome. I saw someone asking for this feature to practice other
| languages just yesterday.
|
| In Firefox, it only supports reading, which is already cool.
|
| Here's a CDN script in case someone wants to load it in JS
| Console:
|
| https://cdn.jsdelivr.net/gh/C-Nedelcu/talk-to-chatgpt@main/c...
|
| Edit: script url
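
A minimal sketch of the feature check behind the Firefox remark above: the Web Speech API has two halves, recognition (speech to text) and synthesis (reading text aloud), and a browser may expose only one. The function name `detectSpeechSupport` and the `globalObj` parameter are assumptions made for illustration (pass `window` in a page); this is not code from the extension.

```javascript
// Report which halves of the Web Speech API a browser exposes.
// `globalObj` is a hypothetical parameter so the check can also run
// outside a browser; in a page you would pass `window`.
function detectSpeechSupport(globalObj) {
  // Speech-to-text: standard name, or the webkit-prefixed name used by Chrome.
  const recognition =
    typeof globalObj.SpeechRecognition === "function" ||
    typeof globalObj.webkitSpeechRecognition === "function";
  // Text-to-speech ("reading").
  const synthesis =
    typeof globalObj.speechSynthesis === "object" &&
    globalObj.speechSynthesis !== null;
  return { recognition, synthesis };
}
```

In a page, `detectSpeechSupport(window)` returning `recognition: false` with `synthesis: true` would match the "only supports reading" behaviour described above.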
| youssefabdelm wrote:
| Nobody has done this well enough yet. What's required:
|
| 1. Transcribe your speech using Whisper (in that case you don't
| have to make an effort to speak clearly so long as you're in a
| relatively quiet room)
|
| 2. Get a TTS system that actually sounds good (e.g. Descript,
| Eleven Labs, etc.)
|
| 3. Have RAPID responses like a normal human conversation (mostly
| on OpenAI's side... so hopefully ChatGPT Plus fixes that)
| ericlewis wrote:
| Rapid responses are basically the biggest problem. It is quite
| hard, partly because you can't stream tokens into any TTS
| system and have it sound good. The more "complete" the corpus,
| the better it seems to be at using the right sort of pausing
| and such. So it is more of an "LLM needs to be directly
| connected to the TTS" type of issue, somehow.
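
One common workaround for the token-streaming problem described above is to buffer streamed tokens and hand the TTS engine whole sentences only. The sketch below assumes a hypothetical `onSentence` callback that would forward text to whatever TTS system is in use; the names are illustrative and the sentence boundary rule is deliberately naive (it will split after abbreviations such as "e.g.").

```javascript
// Buffer streamed LLM tokens and release only complete sentences, so the
// TTS engine always gets a full sentence to shape its pausing and intonation.
// `onSentence` is a hypothetical callback that forwards text to a TTS system.
function createSentenceChunker(onSentence) {
  let buffer = "";
  // Naive boundary: ., ! or ? followed by whitespace.
  const boundary = /[.!?]+\s+/;
  return {
    // Call once per streamed token.
    push(token) {
      buffer += token;
      let match;
      while ((match = boundary.exec(buffer)) !== null) {
        const end = match.index + match[0].length;
        const sentence = buffer.slice(0, end).trim();
        if (sentence) onSentence(sentence);
        buffer = buffer.slice(end);
      }
    },
    // Call when the stream ends to emit any trailing partial sentence.
    flush() {
      const rest = buffer.trim();
      buffer = "";
      if (rest) onSentence(rest);
    },
  };
}
```

This trades latency for prosody: speech can start after the first sentence instead of after the full response, but never mid-sentence.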
| nojs wrote:
| The bottleneck is currently TTS. The best option is probably
| Eleven Labs, but response times are unpredictable. GPT response
| times can be worked around by falling back to a faster model,
| but you can't do that with TTS because the voice needs to be
| consistent. It seems like the current state of the art is
| diffusion models à la DALL-E, see e.g. [1] (the developer, James
| Betker, now incidentally works for OpenAI). It's nontrivial to
| turn this into something that works in real-time without a
| decent budget, though.
|
| Whisper (for transcription) is insanely fast and good.
|
| 1. https://github.com/neonbjb/tortoise-tts
| riskneutral wrote:
| It would be good to implement an initial command phrase to begin
| dictating, like "Hey Alexa," "OK Google," or in this case e.g.
| "Hey GPT"
|
| Also, I feel like it sends the text to ChatGPT too quickly, for
| me at least. Wish it would wait a bit longer in case I have
| anything to add. A command phrase to end the sentence might be
| too much.
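
Both suggestions above, a wake phrase and a longer wait before sending, can be sketched as two small pure functions. The names `extractCommand` and `shouldSend`, the default phrase "hey gpt", and the 2500 ms default are assumptions for illustration, not settings taken from the extension.

```javascript
// Return the text spoken after the wake phrase, or null if it is absent.
// Matching is case-insensitive; the default phrase is a hypothetical choice.
function extractCommand(transcript, wakePhrase = "hey gpt") {
  const idx = transcript.toLowerCase().indexOf(wakePhrase.toLowerCase());
  if (idx === -1) return null;
  return transcript
    .slice(idx + wakePhrase.length)
    .replace(/^[\s,.:!]+/, "") // drop punctuation right after the phrase
    .trim();
}

// True once the speaker has been silent for at least `delayMs`.
// A larger delay gives the user more time to add to the sentence.
function shouldSend(lastSpeechAtMs, nowMs, delayMs = 2500) {
  return nowMs - lastSpeechAtMs >= delayMs;
}
```

A caller would record a timestamp each time recognition produces text and only submit the accumulated message once `shouldSend` returns true.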
| basch wrote:
| Maybe expand GPT to Geppetto.
|
| Yo Geppetto
| ZunarJ5 wrote:
| No wonder it's full of beans half the time, it also made
| Pinocchio.
| fnordpiglet wrote:
| I'm surprised no one has made a chatgpt alexa skill. Although I
| realize throttling and costs probably stop that.
| lgas wrote:
| If you google, you will find many, e.g.
| https://www.chatgptalexa.com/
| ericlewis wrote:
| everyone is figuring it out :P
|
| https://www.youtube.com/watch?v=ky9L1eGxj_k&t=1s
| de6u99er wrote:
| Your code says it uses the browser's speech recognition API.
|
| ```
| // Start speech recognition using the browser's speech recognition API
| function CN_StartSpeechRecognition() {
| ```
|
| As far as I know, speech recognition in Chrome, unlike on Android
| phones, is done online, meaning audio is sent to Google
| servers. How does this comply with GDPR?
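
For readers unfamiliar with the API under discussion, here is a minimal sketch of how a page typically starts the browser's speech recognition. It is not the extension's actual `CN_StartSpeechRecognition` implementation; `startRecognition`, `globalObj` (pass `window` in a page) and `onFinal` are hypothetical names. Where the audio is processed is decided by the browser, not by this code, which is the root of the privacy question raised above.

```javascript
// Sketch of wiring up the Web Speech API's recognition half.
// `globalObj` is a hypothetical parameter (pass `window` in a page);
// `onFinal` is called with each finalized transcript.
function startRecognition(globalObj, onFinal, lang = "en-US") {
  const Ctor = globalObj.SpeechRecognition || globalObj.webkitSpeechRecognition;
  if (!Ctor) return null; // this browser exposes no speech recognition

  const recognition = new Ctor();
  recognition.lang = lang;
  recognition.continuous = true;      // keep listening across pauses
  recognition.interimResults = false; // report final results only

  recognition.onresult = (event) => {
    // Only results from resultIndex onward are new in this event.
    for (let i = event.resultIndex; i < event.results.length; i++) {
      const result = event.results[i];
      if (result.isFinal) onFinal(result[0].transcript.trim());
    }
  };

  recognition.start();
  return recognition;
}
```

Because the page only sees transcripts, a project using this API could at least tell users that the browser may send audio to a remote service.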
| jcims wrote:
| Does it claim to be compliant anywhere? I would assume it
| isn't.
| itcrowd wrote:
| GDPR compliance is not optional for services offered in the
| EU.
| shagie wrote:
| Does every client-side JavaScript project someone builds on
| GitHub for fun need to be verified as GDPR compliant?
|
| And what if the source someone shares uses APIs that
| aren't GDPR compliant? Can you download it, compile it,
| run it on your machine, and then say that it was offered to
| you and so must be GDPR compliant?
|
| I'm not in the EU and not familiar with the particulars of
| GDPR but that feels like it is stretching for a reason to
| complain.
| [deleted]
| wizzwizz4 wrote:
| > _As far as I know, speech recognition in Chrome, unlike on
| Android phones, is being done online._
|
| I'm no expert, but I would imagine this is Google's problem.
| The website is calling an API that no reasonable person would
| expect to leak data to Google - in fact, Google _has an
| implementation that doesn't_ - yet personal data is being
| leaked, without the user's consent.
|
| At no point does the operator of this website act as controller
| for the voice data.
| [deleted]
| jlaporte wrote:
| There are a lot of admirable things about the EU, and this is
| by no means intended as EU bashing.
|
| But could there be a more on the nose example of why the EU
| lags in tech innovation and entrepreneurship? A solo maker
| builds a cool little tool as a personal project, open sources
| it on GitHub, and is immediately attacked for the effort by
| cosplay compliance regulators. It defies parody.
|
| The commenters also have a cartoon understanding of the GDPR -
| the author of the Chrome extension is neither a data controller
| nor a data processor.
|
| I'd ask the commenters that jump on a project like this to
| introspect a bit and try to understand why their first impulse
| on seeing someone's effort like this is to try to take it down.
| iamjackg wrote:
| You raise a very valid point, but one could make an argument
| that it's _okay_ for innovation and entrepreneurship to slow
| down, if it's done to protect people's rights and ensure that
| companies do things "properly."
|
| It seems to be an eternal cyclical process: people come up
| with something that either sidesteps or has no regulation
| whatsoever, you get massive growth and innovation, which
| turns to exploitation, which leads to regulation.
|
| We've seen it in all fields, from tech to pharmaceuticals to
| big box retail. Is it an acceptable compromise in society to
| "let things happen" for a while before regulating? Or should
| we all slow down and think about the consequences before
| pushing forward?
| de6u99er wrote:
| The author should at least make users aware that data is
| being sent to Google servers.
|
| I don't see GDPR as preventing innovation. Quite on the
| contrary, it has enabled European companies, which host their
| data by default in the EU, to create competitive products.
|
| That being said, I think Europe has, compared to the US,
| more old money controlled by people who don't like to
| share or lose their wealth.
| iamjackg wrote:
| Hm, this is actually a very interesting case. If the extension
| is (I'm assuming) literally just leveraging the API and not
| storing anything at all, would it be sufficient to let people
| know it's doing so in order to be compliant with GDPR?
|
| Does this even count as something that would be covered by
| GDPR? Is it because the data collection is tangential to the
| "service" being offered?
|
| Chrome's Privacy page [1] doesn't say anything about the API.
| Other people have also been wondering about the privacy
| implications of using the Speech API. What an interesting
| rabbit hole!
|
| 1: https://www.google.com/intl/en/chrome/privacy/
| renewiltord wrote:
| Why don't you tell us how it violates GDPR? And then you can
| make an issue on Github to tell the guy to stop offering his
| thing in the EU.
| zaptrem wrote:
| Has anyone built a Siri integration/Shortcut for this? I'm
| referring to the actual ChatGPT model, not via the normal OpenAI
| API.
| ericHosick wrote:
| Have not looked into this but someone I know said they got it
| working: https://www.youtube.com/watch?v=gePhjvKdUro
| LeoPanthera wrote:
| ChatGPT does not (yet) have an API. Any integrations that
| exist would have to resort to screen-scraping.
|
| An API for ChatGPT is apparently coming.
| Nuzzerino wrote:
| https://github.com/acheong08/ChatGPT
| LeoPanthera wrote:
| That's really no different to screen scraping. It could
| break at any moment. And I'm pretty sure it violates the
| OpenAI TOS too.
| pandominium wrote:
| rip to the corporate voice assistants!
| xony wrote:
| [dead]
| bilater wrote:
| Nice - I think Web Speech APIs are a super powerful tool that a
| lot of devs would be surprised to learn they get out of the box
| (I was).
| riskneutral wrote:
| This is great, thanks!!!
| dddrh wrote:
| Been noodling on this idea and love to have an
| example/inspiration. Thanks for sharing.
|
| (Edit) Specifically glad to learn about the Web Speech APIs. As a
| hobby programmer by night and Product Manager by day, learning
| about more and more capabilities I can leverage in toy projects
| is awesome.
|
| I was considering trying out whisperAI but this seems like a
| better stepping stone for "simpler" starter projects.
| anitakirkovska wrote:
| [flagged]
| jug wrote:
| Funny. I was just today thinking of this but on a Raspberry Pi
| with mic and speakers. Google Bard is surely coming to Google
| Home but it would be a fun project to get a head start of sorts!
| leobg wrote:
| Cool. I built something similar for myself. Though it uses the
| GPT-3 API rather than ChatGPT:
|
| Whisper.cpp for text input. Google WaveNet Voice for text output.
| One button for the user to start and stop speaking, like Siri.
| Allows me and my daughter to literally talk to GPT-3 and have
| real conversations with it.
|
| (Though I'd never let her do that without supervision. Also, she
| has learned very quickly that she needs to take everything it
| says with a grain of salt, and that it's important to fact-check
| its answers.)
|
| I'd be happy to show it, but Whisper is quite CPU intensive. I
| don't know how to host it so it can handle any meaningful number
| of concurrent users without breaking the bank. If anyone has
| suggestions or wants to help, let me know.
| louison11 wrote:
| Use a cloud service for speech to text, like Google's API, and
| you should be able to handle a pretty good number of users
| without breaking the bank. (I believe they have a free tier,
| then pretty reasonable pricing. You'd just have to set up a
| rate limiter on your server to make sure nobody's abusing it).
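
The rate limiter suggested above can be sketched as a fixed-window counter per client. This is one simple technique among several (token buckets and sliding windows are common alternatives); `createRateLimiter`, its option names, and the injected `now` clock are assumptions for illustration. The map is never pruned, so a real server would also need to evict idle clients.

```javascript
// Fixed-window rate limiter: each client may make `limit` requests
// per `windowMs`. `now` is injectable so the logic can be tested
// without waiting for real time to pass.
function createRateLimiter({ limit = 10, windowMs = 60000, now = Date.now } = {}) {
  const windows = new Map(); // clientId -> { start, count }
  return function allow(clientId) {
    const t = now();
    const w = windows.get(clientId);
    // No window yet, or the old window has expired: start a fresh one.
    if (!w || t - w.start >= windowMs) {
      windows.set(clientId, { start: t, count: 1 });
      return true;
    }
    if (w.count < limit) {
      w.count += 1;
      return true;
    }
    return false; // over the limit for this window
  };
}
```

A server would call `allow(clientId)` (keyed by API key or IP address, for example) before forwarding audio to the paid speech-to-text service and reject the request when it returns false.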
| jsharf wrote:
| Theoretically, one could compile whisper.cpp to run in the
| browser using emscripten, maybe made faster with webgl...
|
| I think this would be quite a heavy page load time for a
| website, but if the model file gets cached, and the user has a
| decent CPU/GPU, it... could work?
| Animats wrote:
| This is using Google's text to speech and speech to text, right?
| Isn't that a paid service?
| russellbeattie wrote:
| Chrome provides basic speech to text for free via the browser's
| implementation of the Web Speech API. There is a different
| Google Cloud version, which is a commercial service that
| provides a better model for more accuracy and optional data
| logging. The page below can be used by Safari as well, which
| sends the data to Apple's servers for processing instead.
|
| https://www.google.com/intl/en/chrome/demos/speech.html
___________________________________________________________________
(page generated 2023-02-19 23:01 UTC)