hngopher.com

       [HN Gopher] Show HN: open source framework OpenAI uses for Advan...
       ___________________________________________________________________
        
       Show HN: open source framework OpenAI uses for Advanced Voice
        
       Hey HN, we've been working with OpenAI for the past few months on
       the new Realtime API.  The goal is to give everyone access to the
       same stack that underpins Advanced Voice in the ChatGPT app.
        
       Under the hood it works like this:
        
       - A user's speech is captured by a LiveKit client SDK in the
         ChatGPT app
       - Their speech is streamed using WebRTC to OpenAI's voice agent
       - The agent relays the speech prompt over websocket to GPT-4o
       - GPT-4o runs inference and streams speech packets (over
         websocket) back to the agent
       - The agent relays generated speech using WebRTC back to the
         user's device
        
       The Realtime API that OpenAI launched is the websocket interface to
       GPT-4o. This backend framework covers the voice agent portion.
       Besides having additional logic like function calling, the agent
       fundamentally proxies WebRTC to websocket.
        
       The reason for this is that websocket isn't the best choice for
       client-server communication. The vast majority of packet loss
       occurs between a server and client device, and websocket doesn't
       provide programmatic control or intervention in lossy network
       environments like WiFi or cellular. Packet loss leads to higher
       latency and choppy or garbled audio.
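        
       As a rough illustration of the relay described above (not the
       framework's actual code), here is a minimal sketch in which
       asyncio queues stand in for the WebRTC and websocket
       transports. Every name in it (pump, fake_model, run_agent,
       END) is hypothetical, and the "model" just echoes each chunk
       with a prefix instead of running inference.

```python
import asyncio

# Hypothetical stand-ins only: none of these names come from LiveKit or OpenAI.
END = None  # sentinel marking the end of a stream


async def fake_model(ws_in: asyncio.Queue, ws_out: asyncio.Queue) -> None:
    """Stand-in for GPT-4o behind the websocket: one reply per speech chunk."""
    while (chunk := await ws_in.get()) is not END:
        await ws_out.put(b"reply-to:" + chunk)
    await ws_out.put(END)


async def pump(src: asyncio.Queue, dst: asyncio.Queue) -> None:
    """Forward items from one transport to the other until the stream ends."""
    while (item := await src.get()) is not END:
        await dst.put(item)
    await dst.put(END)


async def run_agent(frames: list[bytes]) -> list[bytes]:
    """Relay user frames to the model and return what the user hears back."""
    rtc_in, rtc_out = asyncio.Queue(), asyncio.Queue()  # WebRTC side (user)
    ws_in, ws_out = asyncio.Queue(), asyncio.Queue()    # websocket side (model)

    # Simulate the client SDK capturing speech and sending it to the agent.
    for frame in frames:
        rtc_in.put_nowait(frame)
    rtc_in.put_nowait(END)

    await asyncio.gather(
        pump(rtc_in, ws_in),        # user speech -> model
        fake_model(ws_in, ws_out),  # model generates speech packets
        pump(ws_out, rtc_out),      # generated speech -> user
    )

    # Drain what was delivered back to the user's device.
    heard = []
    while (item := rtc_out.get_nowait()) is not END:
        heard.append(item)
    return heard


if __name__ == "__main__":
    print(asyncio.run(run_agent([b"hel", b"lo"])))
```

       The two pump tasks are the whole "proxy" in this sketch. A real
       agent would also handle things like function calling and lossy
       networks, which are left out here.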
        
       Author : russ
       Score  : 45 points
       Date   : 2024-10-04 17:01 UTC (5 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | mycall wrote:
       | I wonder when Azure OpenAI will get this.
        
       | gastonmorixe wrote:
       | Nice they have many partners on this. I see Azure as well.
       | 
       | There is a common consensus that the new Realtime API is not
       | actually using the same Advanced Voice model / engine - or
       | however it works - since at least the TTS part doesn't seem to be
       | as capable as the one shipped with the official OpenAI app.
       | 
       | Any idea on this?
       | 
       | Source:
       | https://github.com/openai/openai-realtime-api-beta/issues/2
        
       | FanaHOVA wrote:
       | Olivier, Michelle, and Romain gave you guys a shoutout like 3
       | times in our DevDay recap podcast if you need more testimonial
       | quotes :) https://www.latent.space/p/devday-2024
        
       | pj_mukh wrote:
       | Super cool! Didn't realize OpenAI is just using LiveKit.
       | 
       | Does the pricing break down to be the same as having an OpenAI
       | Advanced Voice socket open the whole time? It's like $9/hr!
       | 
       | In theory it would be cheaper to use this without keeping the
       | Advanced Voice socket open the whole time: just use the GPT-4o
       | streaming service [1] whenever inference is needed (pay per
       | token) and use LiveKit's other components to do the rest (TTS,
       | VAD, etc.).
       | 
       | What's the trade off here?
       | 
       | [1]: https://platform.openai.com/docs/api-reference/streaming
        
       | willsmith72 wrote:
       | That was cool, but got up to $1 usage real quick
        
       ___________________________________________________________________
       (page generated 2024-10-04 23:00 UTC)