[HN Gopher] Show HN: openai-realtime-embedded-SDK Build AI assistants on microcontrollers
       ___________________________________________________________________
        
       Show HN: openai-realtime-embedded-SDK Build AI assistants on
       microcontrollers
        
       Hi HN! This is an SDK for ESP32s (microcontrollers) that runs
       against OpenAI's new WebRTC service [0]. My hope is that people
       can easily add AI to lots of 'real' devices: wearables, speakers
       around the house, toys, etc. You don't have to write any code;
       just buy a device and set some env variables.  If you have any
       feedback or questions I would love to hear them! I hope this
       kicks off a generation of new interesting devices. If you aren't
       familiar with WebRTC, it can do some magical things. Check out
       WebRTC for the Curious [1]; I would love to talk about all the
       cool things it can do, too.  [0]
       https://platform.openai.com/docs/guides/realtime-webrtc  [1]
       https://webrtcforthecurious.com
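        
       For the curious, here is a rough sketch of the signaling leg
       the SDK handles for you, assuming the flow described in the
       Realtime WebRTC docs [0]: POST your local SDP offer to the
       endpoint and read the SDP answer back. The model name below is
       just an example from the docs.
        
           #include <string.h>
           #include <stdio.h>
           #include "esp_http_client.h"
        
           /* Trade a local SDP offer for OpenAI's SDP answer, per
            * the Realtime WebRTC docs [0]. The SDK does this for
            * you internally. */
           static esp_err_t exchange_sdp(const char *api_key,
                                         const char *offer,
                                         char *answer, int answer_len)
           {
               esp_http_client_config_t cfg = {
                   .url = "https://api.openai.com/v1/realtime"
                          "?model=gpt-4o-realtime-preview",
                   .method = HTTP_METHOD_POST,
               };
               esp_http_client_handle_t http = esp_http_client_init(&cfg);
        
               char auth[256];
               snprintf(auth, sizeof(auth), "Bearer %s", api_key);
               esp_http_client_set_header(http, "Authorization", auth);
               esp_http_client_set_header(http, "Content-Type",
                                          "application/sdp");
        
               esp_err_t err = esp_http_client_open(http, strlen(offer));
               if (err == ESP_OK) {
                   esp_http_client_write(http, offer, strlen(offer));
                   esp_http_client_fetch_headers(http);
                   int n = esp_http_client_read_response(http, answer,
                                                         answer_len - 1);
                   if (n >= 0) answer[n] = '\0';
               }
               esp_http_client_cleanup(http);
               return err;
           }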
        
       Author : Sean-Der
       Score  : 22 points
       Date   : 2024-12-18 15:47 UTC (2 days ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | johanam wrote:
       | Love this! Excited to give it a try.
        
         | Sean-Der wrote:
         | Thank you! If you run into problems shoot me a message. I
         | really want to make this easy enough for everyone to build with
         | it.
         | 
         | I have talked with incredibly creative developers who are
         | held back by domain-knowledge requirements. I hope to see
         | an explosion of cool projects if we get this right :)
        
       | kaycebasques wrote:
       | Took a bit of poking to figure out what the use case is. It
       | doesn't seem to be mentioned in the README (the usage section
       | is empty) or the intro above. Looks like the main use case is
       | speech-to-speech, which makes sense since we're talking about
       | embedded products, and text-to-speech (for example) usually
       | wouldn't be relevant (because most embedded products don't
       | have a keyboard interface). Congrats on the launch! Cool to
       | see WebRTC applied to the embedded space. Streaming speech-to-
       | speech with WebRTC could make a lot of sense.
        
         | Sean-Der wrote:
         | Sorry I forgot to put use cases in! Here are the ones I am
         | excited about.
         | 
         | * Making a toy. I have had a lot of fun putting a
         | silly/sarcastic voice in toys. My 4-year-old thinks it is
         | VERY funny.
         | 
         | * Smart Speaker/Assistant. I want to put one in each room.
         | If I am in the kitchen it has a prompt to assist with
         | recipes (sketch at the end of this comment).
         | 
         | I have A LOT more I want to do in the future. The
         | microcontrollers I was using can't do video yet, BUT there
         | are newer ESP32s that can. When I pull that off I can do
         | smart cameras, and then it gets really fun :)
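         | 
         | Rough sketch of the per-room prompt idea, assuming the
         | Realtime API's "session.update" event sent over the WebRTC
         | data channel; the ROOM setting and the prompt strings are
         | made up for illustration:
         | 
         |     #include <stdio.h>
         |     #include <stdlib.h>
         |     #include <string.h>
         | 
         |     /* Build a session.update payload whose instructions
         |      * depend on which room the device is in. ROOM is a
         |      * hypothetical build-time setting; the SDK would send
         |      * the resulting JSON over the data channel. */
         |     static int room_instructions(char *buf, size_t len)
         |     {
         |         const char *room = getenv("ROOM");
         |         const char *prompt = "You are a helpful home assistant.";
         |         if (room && strcmp(room, "kitchen") == 0)
         |             prompt = "You are a cooking assistant. "
         |                      "Help me follow recipes step by step.";
         |         return snprintf(buf, len,
         |             "{\"type\":\"session.update\","
         |             "\"session\":{\"instructions\":\"%s\"}}", prompt);
         |     }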
        
           | kaycebasques wrote:
           | "Use case" perhaps wasn't the right word for me to use. Maybe
           | "applications" would have been a better word. What this
           | enables is speech-to-speech applications in embedded devices.
           | (From my quick scan) it doesn't seem to do anything around
           | other ML applications that OpenAI could potentially be
           | involved in, such as speech-to-text, text-to-speech, or
           | computer vision.
           | 
           | But yeah, once I figured out that this enables streaming
           | speech-to-speech applications on embedded devices, then it's
           | easy to think up use cases.
        
       | jonathan-adly wrote:
       | Here is a nice use case. Put this in a pharmacy: let people
       | hit a button and ask questions about over-the-counter
       | medications.
       | 
       | Really, in any physical place where people are easily
       | overwhelmed, having something like that would be really nice.
       | 
       | With some work you could probably even run RAG on the
       | questions and answer esoteric things like where the food
       | court is in an airport or where the ATM is in a hotel.
        
         | pixelsort wrote:
         | Thanks for digging that out. Yes, that makes sense to me as
         | someone who made a fully local speech-to-speech prototype
         | with Electron, including VAD (voice activity detection) and
         | AEC (acoustic echo cancellation). It was responsive but
         | taxing. I had to use a mix of specialty models over
         | onnx/wasm in the renderer and llama.cpp in the main
         | process. One day, multimodal models will just do it all.
        
       ___________________________________________________________________
       (page generated 2024-12-20 23:00 UTC)