[HN Gopher] Show HN: openai-realtime-embedded-SDK Build AI assis...
___________________________________________________________________
Show HN: openai-realtime-embedded-SDK Build AI assistants on
microcontrollers
Hi HN! This is an SDK for ESP32s (microcontrollers) that runs
against OpenAI's new WebRTC service [0]. My hope is that people
can easily add AI to lots of 'real' devices: wearables, speakers
around the house, toys, etc. You don't have to write any code,
just buy a device and set some env variables. If you have any
feedback/questions I would love to hear them! I hope this kicks
off a generation of new interesting devices. If you aren't
familiar with WebRTC, it can do some magical things. Check out
WebRTC for the Curious [1]; I would love to talk about all the
cool things it can do, too. [0]
https://platform.openai.com/docs/guides/realtime-webrtc [1]
https://webrtcforthecurious.com
Author : Sean-Der
Score : 22 points
Date : 2024-12-18 15:47 UTC (2 days ago)
(HTM) web link (github.com)
| johanam wrote:
| Love this! Excited to give it a try.
| Sean-Der wrote:
| Thank you! If you run into problems shoot me a message. I
| really want to make this easy enough for everyone to build with
| it.
|
| I have talked with incredibly creative developers who are
| hampered by domain-knowledge requirements. I hope to see an
| explosion of cool projects if we get this right :)
| kaycebasques wrote:
| Took a bit of poking to figure out what the use case is. Doesn't
| seem to be mentioned in the README (usage section is empty) or
| the intro above. Looks like the main use case is speech-to-
| speech. Which makes sense since we're talking about embedded
| products, and text-to-speech (for example) wouldn't usually be
| relevant (because most embedded products don't have a keyboard
| interface). Congrats on the launch! Cool to see WebRTC applied
| to the embedded space. Streaming speech-to-speech with WebRTC
| could make a lot of sense.
| Sean-Der wrote:
| Sorry I forgot to put use cases in! Here are the ones I am
| excited about.
|
| * Making a toy. I have had a lot of fun putting a
| silly/sarcastic voice in toys. My 4-year-old thinks it is VERY
| funny.
|
| * Smart Speaker/Assistant. I want to put one in each room. If I
| am in the kitchen it has a prompt to assist with recipes.
|
| I have A LOT more I want to do in the future. The
| microcontrollers I was using can't do video yet, BUT Espressif
| does have newer ESP32 chips that can. When I pull that in, I
| can do smart cameras, and then it gets really fun :)
| kaycebasques wrote:
| "Use case" perhaps wasn't the right word for me to use. Maybe
| "applications" would have been a better word. What this
| enables is speech-to-speech applications in embedded devices.
| (From my quick scan) it doesn't seem to do anything around
| other ML applications that OpenAI could potentially be
| involved in, such as speech-to-text, text-to-speech, or
| computer vision.
|
| But yeah, once I figured out that this enables streaming
| speech-to-speech applications on embedded devices, then it's
| easy to think up use cases.
| jonathan-adly wrote:
| Here is a nice use-case. Put this in a pharmacy - have people hit
| a button, and ask questions about over-the-counter medications.
|
| Really - any physical place where people are easily overwhelmed,
| have something like that would be really nice.
|
| With some work - you can probably even run RAG on the questions
| and answer esoteric things like where the food court in an
| airport or the ATM in a hotel.
| pixelsort wrote:
| Thanks for digging that out. Yes, that makes sense to me as
| someone who made a fully local speech-to-speech prototype with
| Electron, including VAD and AEC. It was responsive but taxing.
| I had to use a mix of specialty models over onnx/wasm in the
| renderer and llama.cpp in the main process. One day, multimodal
| models will just do it all.
___________________________________________________________________
(page generated 2024-12-20 23:00 UTC)