[HN Gopher] Show HN: openai-realtime-embedded-SDK Build AI assis...
       ___________________________________________________________________
        
       Show HN: openai-realtime-embedded-SDK Build AI assistants on
       microcontrollers
        
       Hi HN! This is an SDK for ESP32s (microcontrollers) that runs
       against OpenAI's new WebRTC service [0]. My hope is that people can
       easily add AI to lots of 'real' devices: wearable devices, speakers
       around the house, toys, etc. You don't have to write any code, just
       buy a device and set some env variables.  If you have any
       feedback/questions I would love to hear them! I hope this kicks off
       a generation of new, interesting devices. If you aren't familiar
       with WebRTC, it can do some magical things. Check out WebRTC for
       the Curious [1]; I would love to talk about all the cool things it
       can do, too.
        
       [0] https://platform.openai.com/docs/guides/realtime-webrtc
        
       [1] https://webrtcforthecurious.com
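        
       To give a feel for what the SDK handles for you, here is a rough
       sketch of just the signaling step: the device creates an SDP offer,
       POSTs it to the Realtime endpoint described in [0], and reads back
       the SDP answer. The sketch uses desktop libcurl rather than the
       on-device HTTP client, and the endpoint/model strings are whatever
       the docs list today, so treat them as placeholders.
        
         /* Sketch only: trade a WebRTC SDP offer for an SDP answer with
          * OpenAI's Realtime endpoint.  Build: cc sdp_exchange.c -lcurl */
         #include <stdio.h>
         #include <stdlib.h>
         #include <curl/curl.h>
        
         /* libcurl write callback: dump the SDP answer to stdout. */
         static size_t on_sdp(char *buf, size_t sz, size_t n, void *u) {
             (void)u;
             return fwrite(buf, 1, sz * n, stdout);
         }
        
         int main(void) {
             const char *key = getenv("OPENAI_API_KEY");
             if (!key) {
                 fprintf(stderr, "set OPENAI_API_KEY\n");
                 return 1;
             }
        
             /* In the real flow this offer comes from the device's WebRTC
              * stack; a truncated placeholder is used here. */
             const char *offer = "v=0\r\no=- 0 0 IN IP4 127.0.0.1\r\n";
        
             char auth[256];
             snprintf(auth, sizeof auth, "Authorization: Bearer %s", key);
        
             CURL *curl = curl_easy_init();
             struct curl_slist *hdrs = NULL;
             hdrs = curl_slist_append(hdrs, auth);
             hdrs = curl_slist_append(hdrs, "Content-Type: application/sdp");
        
             curl_easy_setopt(curl, CURLOPT_URL,
                 "https://api.openai.com/v1/realtime"
                 "?model=gpt-4o-realtime-preview-2024-12-17");
             curl_easy_setopt(curl, CURLOPT_HTTPHEADER, hdrs);
             curl_easy_setopt(curl, CURLOPT_POSTFIELDS, offer);
             curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, on_sdp);
        
             if (curl_easy_perform(curl) != CURLE_OK)
                 fprintf(stderr, "SDP exchange failed\n");
        
             curl_slist_free_all(hdrs);
             curl_easy_cleanup(curl);
             return 0;
         }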
        
       Author : Sean-Der
       Score  : 55 points
       Date   : 2024-12-18 15:47 UTC (3 days ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | johanam wrote:
       | Love this! Excited to give it a try.
        
         | Sean-Der wrote:
          | Thank you! If you run into problems, shoot me a message. I
          | really want to make this easy enough for everyone to build with
          | it.
         | 
         | I have talked with incredibly creative developers that are
         | hampered by domain knowledge requirements. I hope to see an
         | explosion of cool projects if we get this right :)
        
       | kaycebasques wrote:
       | Took a bit of poking to figure out what the use case is. Doesn't
       | seem to be mentioned in the README (usage section is empty) or
       | the intro above. Looks like the main use case is speech-to-
       | speech. Which makes sense since we're talking about embedded
       | products, and text-to-speech (for example) wouldn't usually be
       | relevant (because most embedded products don't have a keyboard
        | interface). Congrats on the launch! Cool to see WebRTC applied to
        | the embedded space. Streaming speech-to-speech with WebRTC could
        | make a lot of sense.
        
         | Sean-Der wrote:
         | Sorry I forgot to put use cases in! Here are the ones I am
         | excited about.
         | 
          | * Making a toy. I have had a lot of fun putting a
          | silly/sarcastic voice in toys. My 4-year-old thinks it is VERY
          | funny.
         | 
         | * Smart Speaker/Assistant. I want to put one in each room. If I
         | am in the kitchen it has a prompt to assist with recipes.
         | 
          | I have A LOT more I want to do in the future. The
          | microcontrollers I was using can't do video yet, BUT there are
          | newer ESP32 chips that can. When I pull that off I can do smart
          | cameras, and then it gets really fun :)
        
           | kaycebasques wrote:
           | "Use case" perhaps wasn't the right word for me to use. Maybe
           | "applications" would have been a better word. What this
           | enables is speech-to-speech applications in embedded devices.
           | (From my quick scan) it doesn't seem to do anything around
           | other ML applications that OpenAI could potentially be
           | involved in, such as speech-to-text, text-to-speech, or
           | computer vision.
           | 
           | But yeah, once I figured out that this enables streaming
           | speech-to-speech applications on embedded devices, then it's
           | easy to think up use cases.
        
             | swatcoder wrote:
             | It doesn't help that this was posted to HN with the
             | "Usages" section of the README left blank. That alone would
             | probably have addressed your question. The submission is
              | just a little premature.
             | 
              | Beyond that, while it does seem like its primary vision is
              | speech-to-speech interfaces, it could easily be stretched
              | to do things like sending a templatized text prompt
              | constructed from toggle states, sensor readings, etc., and
              | (optimistically) asking for a structured response that
              | could control lights or servos or whatever.
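              | 
              | As a rough sketch of that idea (the event names follow
              | the Realtime API docs as I read them; realtime_send() is
              | a hypothetical stand-in for however the SDK writes to
              | the data channel):
              | 
              |   #include <stdio.h>
              | 
              |   /* Hypothetical stand-in for the SDK's data-channel
              |    * send; here it just prints the outgoing event. */
              |   static void realtime_send(const char *json) {
              |       printf("-> %s\n", json);
              |   }
              | 
              |   /* Turn a sensor reading into a text prompt and ask
              |    * the model for a (hopefully structured) response. */
              |   static void report_reading(float temp_c) {
              |       char event[512];
              |       snprintf(event, sizeof event,
              |           "{\"type\":\"conversation.item.create\","
              |           "\"item\":{\"type\":\"message\","
              |           "\"role\":\"user\",\"content\":"
              |           "[{\"type\":\"input_text\",\"text\":"
              |           "\"Kitchen is %.1f C. Reply ON or "
              |           "OFF for the fan.\"}]}}", temp_c);
              |       realtime_send(event);
              |       realtime_send("{\"type\":\"response.create\"}");
              |   }
              | 
              |   int main(void) { report_reading(27.5f); return 0; }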
             | 
              | Generally, this looks like a hobby project at a very early
              | stage (the code practices fall short of my expectations for
              | good embedded work, it would be better presented as a
              | library than as an application, the README needs lots of
              | work, etc.), but something more sophisticated isn't too far
              | out of reach.
        
               | Sean-Der wrote:
                | I will work on making it better! This was announced on
                | Tuesday [0], so I still need to give it lots of love.
                | 
                | Even though the README isn't completely done, give it a
                | chance; I bet you can have fun with it :)
               | 
               | [0]
               | https://youtu.be/14leJ1fg4Pw?t=625&si=aqHm1UAdDEz91TnD
        
       | jonathan-adly wrote:
        | Here is a nice use case. Put this in a pharmacy - have people hit
        | a button and ask questions about over-the-counter medications.
        | 
        | Really, in any physical place where people are easily
        | overwhelmed, having something like that would be really nice.
        | 
        | With some work, you can probably even run RAG on the questions
        | and answer esoteric things like where the food court is in an
        | airport or where the ATM is in a hotel.
        
         | pixelsort wrote:
         | Thanks for digging that out. Yes, that makes sense to me as
         | someone who made a fully local speech-2-speech prototype with
         | Electron, including VAD and AEC. It was responsive but taxing.
         | I had to use a mix of specialty models over onnx/wasm in the
          | renderer and llama.cpp in the main process. One day, multimodal
          | models will just do it all.
        
         | swatcoder wrote:
         | > Put this in a pharmacy - have people hit a button, and ask
         | questions about over-the-counter medications.
         | 
          | Even if _you_ trust OpenAI's models more than your trained,
         | certified, and insured pharmacist -- the pharmacists, their
         | regulators, and their insurers sure won't!
         | 
         | They've got a century of sunk costs to consider (and maybe even
         | some valid concern over the answers a model might give on their
         | behalf...)
         | 
          | Don't be expecting anything like that in a traditional
          | regulated medical setting any time soon.
        
           | dymk wrote:
            | At the last few doctor's appointments I've had, the clinician
            | used a service to record and summarize the visit. It was
            | using some sort of speech-to-text and an LLM to do so. It's
            | already in medical settings.
        
             | swatcoder wrote:
             | Transcription and summary is a vastly different thing than
             | providing medical advice to patients.
        
       | roland35 wrote:
        | Favorited and starred! I wonder if the real power of this could
        | be in integrating large, low-cost sensor networks? I think with
        | things like video and audio it might make more sense to bump up
        | to a single-board Linux computer - but maybe the AI could help
        | parse or create notifications based on sensor readings, and push
        | events back to the real world (lights, solenoids, etc.)
       | 
        | I think it would help to either have a FreeRTOS example or, if
        | you want to go really crazy, create a Zephyr integration! It
        | would be a lot of fun to work on the AI and microcontroller
        | combination - what a cool niche!
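        | 
        | Something like this is roughly what I mean by a FreeRTOS
        | example (just a sketch of the plumbing; sensor_read() and the
        | uplink task stand in for a real driver and for however the SDK
        | ends up exposing the assistant session):
        | 
        |   #include <stdio.h>
        |   #include "freertos/FreeRTOS.h"
        |   #include "freertos/task.h"
        |   #include "freertos/queue.h"
        | 
        |   static QueueHandle_t readings;
        | 
        |   /* Stand-in for a real driver (I2C temp sensor, ADC, etc.). */
        |   static float sensor_read(void) { return 21.5f; }
        | 
        |   static void sensor_task(void *arg) {
        |       (void)arg;
        |       for (;;) {
        |           float value = sensor_read();
        |           xQueueSend(readings, &value, portMAX_DELAY);
        |           vTaskDelay(pdMS_TO_TICKS(60 * 1000)); /* once a minute */
        |       }
        |   }
        | 
        |   static void uplink_task(void *arg) {
        |       (void)arg;
        |       float value;
        |       for (;;) {
        |           if (xQueueReceive(readings, &value, portMAX_DELAY)) {
        |               /* This is where the reading would be handed to
        |                * the assistant session (or batched up first). */
        |               printf("would send reading: %.1f\n", value);
        |           }
        |       }
        |   }
        | 
        |   void app_main(void) {
        |       readings = xQueueCreate(8, sizeof(float));
        |       xTaskCreate(sensor_task, "sensor", 2048, NULL, 5, NULL);
        |       xTaskCreate(uplink_task, "uplink", 4096, NULL, 5, NULL);
        |   }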
        
         | Sean-Der wrote:
          | I'm very curious about what an LLM could deduce if you sent in
          | lots of sensor data.
         | 
          | I love my Airthings. I don't know if it's actionable, but it
          | would be cool to see what conclusions it would come up with
          | from sending CO2 and radon readings in. It could make
          | understanding your home a lot easier.
        
       ___________________________________________________________________
       (page generated 2024-12-21 18:02 UTC)