[HN Gopher] Show HN: openai-realtime-embedded-SDK Build AI assis...
___________________________________________________________________
Show HN: openai-realtime-embedded-SDK Build AI assistants on
microcontrollers
Hi HN! This is an SDK for ESP32s (microcontrollers) that runs
against OpenAI's new WebRTC service [0]. My hope is that people
can easily add AI to lots of 'real' devices: wearable devices,
speakers around the house, toys, etc. You don't have to write any
code; just buy a device and set some env variables. If you have
any feedback/questions I would love to hear them! I hope this
kicks off a generation of new, interesting devices. If you aren't
familiar with WebRTC, it can do some magical things. Check out
WebRTC for the Curious [1]; I would love to talk about all the
cool things it can do, too.
[0] https://platform.openai.com/docs/guides/realtime-webrtc
[1] https://webrtcforthecurious.com
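To make the flow concrete, here is a minimal sketch (not the SDK's
actual code) of the signaling step it automates: posting a locally
generated SDP offer to the Realtime endpoint described in [0] and
reading back the SDP answer that bootstraps the WebRTC session. It
uses ESP-IDF's esp_http_client; CONFIG_OPENAI_API_KEY is a
hypothetical Kconfig string standing in for the env variable, the
model name is an assumption, and error handling is trimmed.

    #include <string.h>
    #include "esp_crt_bundle.h"
    #include "esp_http_client.h"

    // Hypothetical sketch: POST an SDP offer, read back the SDP
    // answer. Returns bytes read into `answer`, or -1 on failure.
    static int exchange_sdp(const char *offer, char *answer, int answer_len)
    {
        esp_http_client_config_t cfg = {
            .url = "https://api.openai.com/v1/realtime"
                   "?model=gpt-4o-realtime-preview-2024-12-17",
            .method = HTTP_METHOD_POST,
            .crt_bundle_attach = esp_crt_bundle_attach,  // TLS root certs
        };
        esp_http_client_handle_t client = esp_http_client_init(&cfg);
        esp_http_client_set_header(client, "Authorization",
                                   "Bearer " CONFIG_OPENAI_API_KEY);
        esp_http_client_set_header(client, "Content-Type", "application/sdp");

        int n = -1;
        if (esp_http_client_open(client, strlen(offer)) == ESP_OK &&
            esp_http_client_write(client, offer, strlen(offer)) >= 0 &&
            esp_http_client_fetch_headers(client) >= 0) {
            n = esp_http_client_read_response(client, answer, answer_len);
        }
        esp_http_client_cleanup(client);
        return n;  // the answer SDP then seeds the peer connection
    }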
Author : Sean-Der
Score : 55 points
Date : 2024-12-18 15:47 UTC (3 days ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| johanam wrote:
| Love this! Excited to give it a try.
| Sean-Der wrote:
| Thank you! If you run into problems, shoot me a message. I
| really want to make this easy enough for anyone to build with.
|
| I have talked with incredibly creative developers who are
| hampered by domain-knowledge requirements. I hope to see an
| explosion of cool projects if we get this right :)
| kaycebasques wrote:
| Took a bit of poking to figure out what the use case is. Doesn't
| seem to be mentioned in the README (usage section is empty) or
| the intro above. Looks like the main use case is speech-to-
| speech. Which makes sense since we're talking about embedded
| products, and text-to-speech (for example) wouldn't usually be
| relevant (because most embedded products don't have a keyboard
| interface). Congrats on the launch! Cool to see WebRTC applied
| to the embedded space. Streaming speech-to-speech with WebRTC
| could make a lot of sense.
| Sean-Der wrote:
| Sorry, I forgot to put use cases in! Here are the ones I am
| excited about.
|
| * Making a toy. I have had a lot of fun putting a
| silly/sarcastic voice in toys. My 4-year-old thinks it is VERY
| funny.
|
| * Smart Speaker/Assistant. I want to put one in each room. If I
| am in the kitchen, it has a prompt to assist with recipes
| (sketched below).
|
| I have A LOT more I want to do in the future. The
| microcontrollers I was using can't do video yet, BUT Espressif
| does have newer ESP32s that can. When I pull those in I can do
| smart cameras, and then it gets really fun :)
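|
| The per-room prompt is just a session instruction. As a rough
| sketch, it could be set with the Realtime API's session.update
| event once the data channel is open; send_datachannel_text() is
| a hypothetical stand-in, not this SDK's API:
|
|     #include "cJSON.h"
|
|     // Hypothetical sketch: set this room's system prompt by
|     // sending a session.update event over the data channel.
|     static void set_room_prompt(const char *prompt)
|     {
|         cJSON *evt = cJSON_CreateObject();
|         cJSON *session = cJSON_CreateObject();
|         cJSON_AddStringToObject(evt, "type", "session.update");
|         cJSON_AddStringToObject(session, "instructions", prompt);
|         cJSON_AddItemToObject(evt, "session", session);
|
|         char *json = cJSON_PrintUnformatted(evt);
|         send_datachannel_text(json);  // hypothetical send hook
|         cJSON_free(json);
|         cJSON_Delete(evt);
|     }
|
|     // e.g. set_room_prompt("You are a kitchen assistant; "
|     //                      "help with recipes.");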
| kaycebasques wrote:
| "Use case" perhaps wasn't the right word for me to use. Maybe
| "applications" would have been a better word. What this
| enables is speech-to-speech applications in embedded devices.
| (From my quick scan) it doesn't seem to do anything around
| other ML applications that OpenAI could potentially be
| involved in, such as speech-to-text, text-to-speech, or
| computer vision.
|
| But yeah, once I figured out that this enables streaming
| speech-to-speech applications on embedded devices, then it's
| easy to think up use cases.
| swatcoder wrote:
| It doesn't help that this was posted to HN with the
| "Usages" section of the README left blank. That alone would
| probably have addressed your question. The submission is
| just a little premature.
|
| Beyond that, while it does seem like its primary vision
| is speech-to-speech interfaces, it could easily be
| stretched to do things like sending a templatized text
| prompt constructed from toggle states, sensor readings,
| etc., and (optimistically) asking for a structured
| response that could control lights or servos or whatever
| (sketched below).
|
| Generally, this looks like a very early-stage hobby
| project (the code practices fall short of my expectations
| for good embedded work, being presented as a library would
| be better than as an application, the README needs lots of
| work, etc.), but something more sophisticated isn't too far
| out of reach.
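|
| As a rough sketch of that idea (the pin assignment and the
| JSON shape are hypothetical, not this project's API):
|
|     #include <stdbool.h>
|     #include <stdio.h>
|     #include "cJSON.h"
|     #include "driver/gpio.h"
|
|     // Assumption: a relay on GPIO 5, already configured as output.
|     #define RELAY_GPIO GPIO_NUM_5
|
|     // Template a text prompt from toggle/sensor state...
|     static void build_prompt(char *buf, size_t len,
|                              float temp_c, bool motion)
|     {
|         snprintf(buf, len,
|                  "Temperature is %.1f C, motion %s. Reply only "
|                  "with JSON: {\"light_on\": true|false}.",
|                  temp_c, motion ? "detected" : "not detected");
|     }
|
|     // ...and (optimistically) parse a structured reply into
|     // an actuation.
|     static void apply_reply(const char *json)
|     {
|         cJSON *root = cJSON_Parse(json);
|         if (!root) return;  // model ignored the format; do nothing
|         const cJSON *on = cJSON_GetObjectItem(root, "light_on");
|         if (cJSON_IsBool(on))
|             gpio_set_level(RELAY_GPIO, cJSON_IsTrue(on) ? 1 : 0);
|         cJSON_Delete(root);
|     }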
| Sean-Der wrote:
| I will work on making it better! This was announced
| Tuesday [0], so I still need to give it lots of love.
|
| Even though the README isn't completely done, give it a
| chance; I bet you can have fun with it :)
|
| [0]
| https://youtu.be/14leJ1fg4Pw?t=625&si=aqHm1UAdDEz91TnD
| jonathan-adly wrote:
| Here is a nice use case: put this in a pharmacy, have people hit
| a button, and ask questions about over-the-counter medications.
|
| Really, in any physical place where people are easily
| overwhelmed, having something like that would be really nice.
|
| With some work, you could probably even run RAG on the questions
| and answer esoteric things, like where the food court is in an
| airport or where the ATM is in a hotel.
| pixelsort wrote:
| Thanks for digging that out. Yes, that makes sense to me as
| someone who made a fully local speech-to-speech prototype with
| Electron, including VAD and AEC. It was responsive but taxing.
| I had to use a mix of specialty models over onnx/wasm in the
| renderer and llama.cpp in the main process. One day, multimodal
| models will just do it all.
| swatcoder wrote:
| > Put this in a pharmacy - have people hit a button, and ask
| questions about over-the-counter medications.
|
| Even if _you_ trust OpenAI's models more than your trained,
| certified, and insured pharmacist -- the pharmacists, their
| regulators, and their insurers sure won't!
|
| They've got a century of sunk costs to consider (and maybe even
| some valid concern over the answers a model might give on their
| behalf...)
|
| Don't expect anything like that in a traditional,
| regulated medical setting any time soon.
| dymk wrote:
| The last few doctor's appointments I've had, the clinician
| used a service to record and summarize the visit. It was
| using some sort of STT and an LLM to do so. It's already in
| medical settings.
| swatcoder wrote:
| Transcription and summary is a vastly different thing than
| providing medical advice to patients.
| roland35 wrote:
| Favorited and starred! I wonder if the real power of this could
| be in integrating large, low-cost sensor networks? I think with
| things like video and audio it might make more sense to bump up
| to a single-board Linux computer - but maybe the AI could help
| parse or create notifications based on sensor readings, and push
| events back to the real world (lights, solenoids, etc.).
|
| I think it would help to either have a FreeRTOS example (see the
| sketch below), or if you want to go really crazy, create a
| Zephyr integration! It would be a lot of fun to work on the AI
| and microcontroller combination - what a cool niche!
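|
| A minimal sketch of the shape such a FreeRTOS example might
| take: one task samples a sensor, another forwards readings to
| the model as text. The sensor reads and the model hook are
| hypothetical stubs, not this project's API:
|
|     #include <stdio.h>
|     #include "freertos/FreeRTOS.h"
|     #include "freertos/task.h"
|     #include "freertos/queue.h"
|
|     typedef struct { float lux; float temp_c; } reading_t;
|
|     static QueueHandle_t s_readings;
|
|     static float read_lux(void)    { return 120.0f; }  // stub sensor
|     static float read_temp_c(void) { return 21.5f; }   // stub sensor
|
|     static void sensor_task(void *arg)
|     {
|         for (;;) {
|             reading_t r = { read_lux(), read_temp_c() };
|             xQueueSend(s_readings, &r, portMAX_DELAY);
|             vTaskDelay(pdMS_TO_TICKS(60 * 1000));  // once a minute
|         }
|     }
|
|     static void uplink_task(void *arg)
|     {
|         reading_t r;
|         for (;;) {
|             if (xQueueReceive(s_readings, &r, portMAX_DELAY) == pdTRUE) {
|                 char prompt[96];
|                 snprintf(prompt, sizeof prompt,
|                          "Lux %.0f, temp %.1f C. Notify me if odd.",
|                          r.lux, r.temp_c);
|                 // send_text_to_model(prompt);  // hypothetical hook
|             }
|         }
|     }
|
|     void app_main(void)
|     {
|         s_readings = xQueueCreate(8, sizeof(reading_t));
|         xTaskCreate(sensor_task, "sensor", 2048, NULL, 5, NULL);
|         xTaskCreate(uplink_task, "uplink", 4096, NULL, 5, NULL);
|     }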
| Sean-Der wrote:
| I'm very curious about what an LLM could deduce if you sent in
| lots of sensor data.
|
| I love my Airthings. I don't know if it's actionable, but it
| would be cool to see what conclusions would come up from
| sending CO2 and radon readings in. Could make understanding
| your home a lot easier.
___________________________________________________________________
(page generated 2024-12-21 18:02 UTC)