[HN Gopher] OpenAI DevDay 2024 live blog
       ___________________________________________________________________
        
       OpenAI DevDay 2024 live blog
        
       Author : plurby
       Score  : 163 points
       Date   : 2024-10-01 17:45 UTC (5 hours ago)
        
 (HTM) web link (simonwillison.net)
 (TXT) w3m dump (simonwillison.net)
        
       | bigcat12345678 wrote:
       | Seems mostly standard items so far.
        
       | thenameless7741 wrote:
       | Blog updates:
       | 
       | - Introducing the Realtime API:
       | https://openai.com/index/introducing-the-realtime-api/
       | 
       | - Introducing vision to the fine-tuning API:
       | https://openai.com/index/introducing-vision-to-the-fine-tuni...
       | 
       | - Prompt Caching in the API: https://openai.com/index/api-prompt-
       | caching/
       | 
       | - Model Distillation in the API: https://openai.com/index/api-
       | model-distillation/
       | 
       | Docs updates:
       | 
       | - Realtime API: https://platform.openai.com/docs/guides/realtime
       | 
       | - Vision fine-tuning:
       | https://platform.openai.com/docs/guides/fine-tuning/vision
       | 
       | - Prompt Caching: https://platform.openai.com/docs/guides/prompt-
       | caching
       | 
       | - Model Distillation:
       | https://platform.openai.com/docs/guides/distillation
       | 
       | - Evaluating model performance:
       | https://platform.openai.com/docs/guides/evals
       | 
       | Additional updates from @OpenAIDevs:
       | https://x.com/OpenAIDevs/status/1841175537060102396
       | 
       | - New prompt generator on https://playground.openai.com
       | 
       | - Access to the o1 model is expanded to developers on usage tier
       | 3, and rate limits are increased (to the same limits as GPT-4o)
       | 
       | Additional updates from @OpenAI:
       | https://x.com/OpenAI/status/1841179938642411582
       | 
       | - Advanced Voice is rolling out globally to ChatGPT Enterprise,
        | Edu, and Team users. Free users will get a sneak peek of it
       | (except EU).
        
         | visarga wrote:
         | > Advanced Voice is rolling out globally to ChatGPT Enterprise,
          | Edu, and Team users. Free users will get a sneak peek of it.
         | 
         | So regular paying users from EU are still left out in the cold.
        
           | AlanYx wrote:
           | It's probably stuck in legal limbo in the EU. The recently
           | passed EU AI Act prohibits "AI systems aiming to identify or
           | infer emotions", and Advanced Voice does definitely infer the
           | user's emotions.
           | 
           | (There is an exemption for "AI systems placed on the market
           | strictly for medical or safety reasons, such as systems
           | intended for therapeutical use", but Advanced Voice probably
           | doesn't benefit from that exemption.)
        
             | qwertox wrote:
             | Apparently this prohibition only applies to " _situations
             | related to the workplace and education_ ", and, in this
             | context, " _That prohibition should not cover AI systems
             | placed on the market strictly for medical or safety
             | reasons_ "
             | 
             | So it seems to be possible to use this in a personal
             | context.
             | 
             | https://artificialintelligenceact.eu/recital/44/
             | 
             | > Therefore, the placing on the market, the putting into
             | service, or the use of AI systems intended to be used to
             | detect the emotional state of individuals in _situations
             | related to the workplace and education should be
             | prohibited. That prohibition should not cover AI systems
             | placed on the market strictly for medical or safety
             | reasons_ , such as systems intended for therapeutical use.
        
               | AlanYx wrote:
               | This is true, though it may not make sense commercially
               | for them to offer an API that can't be used for workplace
               | (business) applications or education.
        
               | qwertox wrote:
               | I see what you mean, but I think that "workplace"
               | specifically refers to the context of the workplace, so
               | that an employer cannot use AI to monitor the employees,
               | even if they have been pressured to agree to such a
               | monitoring. I think this is unrelated to "commercially
               | offering services which can detect emotions".
               | 
               | But then I don't get the spirit of that limitation, as it
               | should be just as applicable to TVs listening in on your
               | conversations and trying to infer your emotions. Then
               | again, I guess that for these cases there are other rules
               | in place which prohibit doing this without the explicit
               | consent of the user.
        
               | runako wrote:
               | > I think that
               | 
               | > I think this
               | 
               | > I don't get the spirit of that limitation
               | 
               | > I guess that
               | 
               | In a nutshell, this uncertainty is why firms are going to
               | slow-roll EU rollout of AI and, for designated
               | gatekeepers, other features. Until there is a body of
               | litigated cases to use as reference, companies would be
               | placing themselves on the hook for tremendous fines, not
               | to mention the distraction of the executives.
               | 
               | Which, not making any value judgement here, is the point
               | of these laws. To slow down innovation so that society,
               | government, regulation, can digest new technologies. This
               | is the intended effect, and the laws are working.
        
           | Version467 wrote:
           | Yes, but it works with a vpn and the change in latency isn't
           | big enough to have a noticeable impact on usability.
        
       | hidelooktropic wrote:
       | Any word on increased weekly caps on o1 usage?
        
         | zamadatix wrote:
         | Weekly caps are for standard accounts (not going to be talked
         | about at DevDay). The blog does note RPM changes for the API
         | though:
         | 
         | "10:30 They started with some demos of o1 being used in
         | applications, and announced that the rate limit for o1 doubled
         | to 10000 RPM (from 5000 RPM) - same as GPT-4 now."
        
       | nielsole wrote:
       | > The first big announcement: a realtime API, providing the
       | ability to use WebSockets to implement voice input and output
       | against their models.
       | 
       | I guess this is using their "old" turn-based voice system?
        
         | bcherry wrote:
         | No, it's the same thing as ChatGPT advanced voice. Full speech-
         | to-speech model.
        
           | chrisshroba wrote:
           | Right, see the "Handling interruptions" section here:
           | https://platform.openai.com/docs/guides/realtime/integration
        
       | qwertox wrote:
       | > The Realtime API improves this by streaming audio inputs and
       | outputs directly, enabling more natural conversational
       | experiences. It can also handle interruptions automatically, much
       | like Advanced Voice Mode in ChatGPT.
       | 
       | > Under the hood, the Realtime API lets you create a persistent
       | WebSocket connection to exchange messages with GPT-4o. The API
       | supports function calling(opens in a new window), which makes it
       | possible for voice assistants to respond to user requests by
       | triggering actions or pulling in new context.
       | 
       | -
       | 
        | This sounds really interesting, and I see great use cases for
       | it. However, I'm wondering if the API provides a text
       | transcription of both the input and output so that I can store
       | the data directly in a database without needing to transcribe the
       | audio separately.
       | 
       | -
       | 
       | Edit: Apparently it does.
       | 
       | It sends `conversation.item.input_audio_transcription.completed`
       | [0] events when the input transcription is done (I guess a couple
       | of them in real-time)
       | 
       | and `response.done` [1] with the response text.
       | 
       | [0] https://platform.openai.com/docs/api-reference/realtime-
       | serv...
       | 
       | [1] https://platform.openai.com/docs/api-reference/realtime-
       | serv...
        
         | tough wrote:
          | Saw the Velvet Show HN the other day, could be useful for
          | storing these: https://news.ycombinator.com/item?id=41637550
        
           | BoorishBears wrote:
           | OpenAI just launched the equivalent of Velvet as a full
           | fledged feature today.
           | 
          | But separate from that, you typically want some application-
          | specific storage of the current "conversation" in a very
           | different format than raw request logging.
        
         | bcherry wrote:
         | yes it transcribes inputs automatically, but not in realtime.
         | 
         | outputs are sent in text + audio but you'll get the text very
         | quickly and audio a bit slower, and of course the audio takes
         | time to play back. the text also doesn't currently have timing
          | cues, so it's up to you if you want to try to play it "in sync".
         | if the user interrupts the audio, you need to send back a
         | truncation event so it can roll its own context back, and if
         | you never presented the text to the user you'll need to
         | truncate it there as well to ensure your storage isn't polluted
         | with fragments the user never heard.
        
         | pants2 wrote:
         | It's incredible that people are talking about the downfall of
         | software engineering - now, at many companies, hundreds of call
         | center roles will be replaced by a few engineering roles. With
         | image fine-tuning, now we can replace radiologists with
         | software engineers, etc. etc.
        
       | serjester wrote:
       | The eval platform is a game changer.
       | 
        | It's nice to have a solution from OpenAI, given how much they
       | use a variant of this internally. I've tried like 5 YC startups
       | and I don't think anyone's really solved this.
       | 
        | There's the very real risk of vendor lock-in, but from quickly
        | scanning the docs it seems like a pretty portable implementation.
        
       | ponty_rick wrote:
       | > 11:43 Fields are generated in the same order that you defined
       | them in the schema, even though JSON is supposed to ignore key
       | order. This ensures you can implement things like chain-of-
       | thought by adding those keys in the correct order in your schema
       | design.
       | 
       | Why not use an array of key value pairs if you want to maintain
       | ordering without breaking traditional JSON rules?
       | 
        | [ {"key1": "value1"}, {"key2": "value2"} ]
        
         | YetAnotherNick wrote:
          | I don't think OpenAI models support this pattern. You can only
          | have arrays of a fixed type; basically the keys should be the
          | same. See [1]
         | 
         | [1]: https://platform.openai.com/docs/guides/structured-
         | outputs/s...
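The fixed-shape workaround described above would look something like this as a schema: every array element carries the same `key`/`value` properties, which is the pattern structured outputs can enforce (the exact schema details are a sketch, not copied from the docs):

```python
# JSON Schema for an ordered list of key/value pairs. Order is carried
# by array position rather than by object key order, so no JSON rules
# are bent; every element has the same fixed shape.
ordered_pairs_schema = {
    "type": "object",
    "properties": {
        "pairs": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "key": {"type": "string"},
                    "value": {"type": "string"},
                },
                "required": ["key", "value"],
                "additionalProperties": False,
            },
        }
    },
    "required": ["pairs"],
    "additionalProperties": False,
}
```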
        
         | benatkin wrote:
         | > even though JSON is supposed to ignore key order
         | 
         | Most tools preserve the order. I consider it to be an
         | unofficial feature of JSON at this point. A lot of people think
         | of it as a soft guarantee, but it's a hard guarantee in all the
         | recent JavaScript and python versions. There are some common
         | places where it's lost, like JSONB in Postgres, but it's good
         | to be aware that this unofficial feature is commonly being
         | used.
        
       | superdisk wrote:
       | Holy crud, I figured they would guard this for a long time and I
       | was really salivating to make some stuff with it. The doors are
       | wide open for all sorts of stuff now, Advanced Voice is the first
       | feature since ChatGPT initially came out that really has my jaw
       | on the floor.
        
         | jacooper wrote:
         | Try notebook LM, it's the chatgpt moment for Google's deepmind
        
           | world2vec wrote:
           | I wish I could but not available in UK, IIRC
        
       | minimaxir wrote:
       | From the Realtime API blog post:
       | https://openai.com/index/introducing-the-realtime-api/
       | 
       | > Audio in the Chat Completions API will be released in the
       | coming weeks, as a new model `gpt-4o-audio-preview`. With
       | `gpt-4o-audio-preview`, developers can input text or audio into
       | GPT-4o and receive responses in text, audio, or both.
       | 
       | > The Realtime API uses both text tokens and audio tokens. Text
       | input tokens are priced at $5 per 1M and $20 per 1M output
       | tokens. Audio input is priced at $100 per 1M tokens and output is
       | $200 per 1M tokens. This equates to approximately $0.06 per
       | minute of audio input and $0.24 per minute of audio output. Audio
       | in the Chat Completions API will be the same price.
       | 
       | As usual, OpenAI failed to emphasize the real-game changer
       | feature at their Dev Day: audio output from the standard
       | generation API.
       | 
       | This has severe implications for text-to-speech apps,
       | particularly if the audio output style is as steerable as the
       | gpt-4o voice demos.
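Back-computing from the quoted prices, the per-minute figures imply roughly 600 audio tokens per minute of input and 1,200 per minute of output:

```python
# Implied audio token rates, derived from the quoted prices:
# $100/1M input tokens ~= $0.06/min, $200/1M output tokens ~= $0.24/min.
input_price_per_token = 100 / 1_000_000   # dollars
output_price_per_token = 200 / 1_000_000

input_tokens_per_min = 0.06 / input_price_per_token    # ~600
output_tokens_per_min = 0.24 / output_price_per_token  # ~1200

# Cost of a 10-minute call with roughly even talk time: ~$1.50
cost = 5 * 0.06 + 5 * 0.24
```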
        
         | OutOfHere wrote:
         | > and $0.24 per minute of audio output
         | 
         | That is substantially more expensive than TTS (text-to-speech)
         | which already is quite expensive.
        
           | qwertox wrote:
           | I agree. I'm wondering if it is possible to disable output
           | streaming of audio and just get the text response event.
        
             | colaco wrote:
             | It seems so.
             | 
             | The configuration of the session accepts a parameter
             | (modalities) that could restrict the response only to text.
             | See it in https://platform.openai.com/docs/api-
             | reference/realtime-clie....
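If so, the session update sent over the WebSocket would presumably look something like this (the `modalities` field name comes from the docs linked above; the exact message shape is an assumption):

```json
{
  "type": "session.update",
  "session": {
    "modalities": ["text"]
  }
}
```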
        
               | bcherry wrote:
               | correct - you should also be able to save a lot by
               | skipping their built-in VAD and doing turn detection (if
               | you need it) locally to avoid paying for silent inputs.
        
           | minimaxir wrote:
           | Fair, it wouldn't work well for on-demand generation in an
           | app, but for ad-hoc cases like a voice-over it's not a huge
           | expense.
           | 
           | If OpenAI decides to fully ignore ethics and dive deep into
           | voice cloning, then all bets are off.
        
       | siva7 wrote:
        | I've never seen a company publish consistently groundbreaking
        | features at such speed. I really wonder how their teams work.
        | It's unprecedented in what I've seen in 15 years of software.
        
         | pheeney wrote:
         | I wonder how much they use their own products internally to
         | speed up development and decisions.
        
           | amlib wrote:
           | And I wonder how much they use them externally to influence
           | the online conversations about their own products/company.
        
           | abound wrote:
           | They definitely use their own products internally, perhaps to
           | a fault: While chatting with OpenAI recruiters, I received
           | calendar events with nonsensical DALLE-generated calendar
           | images, and "interview prep" guides that were clearly written
           | by an older GPT model.
        
         | roboboffin wrote:
          | Is it that most models are based on the transformer
          | architecture? And so performance improvements can then be used
          | throughout their different products?
        
         | IdiocyInAction wrote:
         | AFAIK a lot of these ideas are not new (the JSON thing was done
         | with OS models before) and OpenAI is possibly the hottest
         | startup with the most funding this decade (maybe even past two
         | decades?), so I think this is actually all within expectations.
        
           | sk11001 wrote:
           | They're exceptional at executing and delivering, you don't
           | get that just through having more funding.
        
             | jiggawatts wrote:
             | How are they exceptional?
             | 
             | Their web UI was a glitchy mess for over a year. Rollouts
             | of _just data_ is staggered and often delayed. They still
             | can't adhere to a JSON schema accurately, even though
             | others have figured this out ages ago. There are global
             | outages regularly. Etc...
             | 
             | I'm impressed by some aspects of their rapid growth, but
             | these are financial achievements (credit due Sam) more than
             | technical ones.
        
               | closewith wrote:
               | I have a few qualms with this app:
               | 
               | 1. For a Linux user, you can already build such a system
               | yourself quite trivially by getting an FTP account,
               | mounting it locally with curlftpfs, and then using SVN or
               | CVS on the mounted filesystem. From Windows or Mac, this
               | FTP account could be accessed through built-in software.
               | 
               | 2. It doesn't actually replace a USB drive. Most people I
               | know e-mail files to themselves or host them somewhere
               | online to be able to perform presentations, but they
               | still carry a USB drive in case there are connectivity
               | problems. This does not solve the connectivity issue.
               | 
               | 3. It does not seem very "viral" or income-generating. I
               | know this is premature at this point, but without
               | charging users for the service, is it reasonable to
               | expect to make money off of this?
        
               | hobofan wrote:
               | Not sure why you are being downvoted. You are generally
               | right. Most of their new product rollouts were
                | accompanied by huge production instabilities for paying
               | customers. Only in the most recent ones did they manage
               | that better.
               | 
               | > They still can't adhere to a JSON schema accurately
               | 
               | Strict mode for structured output fixes at least this
               | though.
        
             | testfrequency wrote:
             | It's literally just a bunch of ex-stripe employees and data
             | scientists..
        
           | throwup238 wrote:
           | _> OpenAI is possibly the hottest startup with the most
           | funding this decade (maybe even past two decades?)_
           | 
           | It depends on how you define startup but I don't think they
           | will surpass Uber, ByteDance, or SpaceX until this next
           | rumored funding round.
           | 
           | I'm excluding companies that have raised funding post IPO
            | since that's an obvious cutoff for startups. The other cutoff
            | being break-even, in which case Uber has raised well over $20
           | billion.
        
         | nextworddev wrote:
         | GPT 5 is writing their code
        
       | sammyteee wrote:
       | Loving these live updates, keep em coming! Thanks Simon!
        
       | lysecret wrote:
        | Using structured outputs for generative UI is such a cool idea.
        | Does anyone know of some cool web demos related to this?
        
         | jiggawatts wrote:
         | I just had an evil thought: once AIs are fast enough, it would
         | be possible to create a "dynamic" user interface on the fly
         | using an AI. Instead of Java or C# code running in an event
         | loop processing mouse clicks, in principle we could have a chat
         | bot generate the UI elements in a script like WPF or plain HTML
         | and process user mouse and keyboard input events!
         | 
         | If you squint at it, this is what chat bots do now, except with
         | a "terminal" style text UI instead of a GUI or true Web UI.
         | 
         | The first incremental step had already been taken: pretty-
         | printing of maths and code. Interactive components are a
         | logical next step.
         | 
         | It would be a mere afternoon of work to write a web server
          | where the dozens of "controllers" are replaced with a single
         | call to an LLM API that simply sends the previous page HTML and
         | the request HTML with headers and all.
         | 
         |  _"Based on the previous HTML above and the HTTP request below,
         | output the response HTML."_
         | 
         | Just sprinkle on some function calling and a database schema,
         | and the site is done!
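A sketch of that single "controller" idea. The `llm` argument is a stand-in for whatever chat-completions wrapper you'd use; nothing here is a real OpenAI API, and the WSGI-style `app` takes extra parameters for illustration:

```python
def llm_controller(previous_html: str, raw_request: str, llm) -> str:
    """The entire server-side logic: one prompt per request.
    `llm` is any callable mapping a prompt string to a completion string."""
    prompt = (
        "Based on the previous HTML above and the HTTP request below, "
        "output the response HTML.\n\n"
        "--- previous page ---\n" + previous_html + "\n"
        "--- request ---\n" + raw_request + "\n"
    )
    return llm(prompt)


def app(environ, start_response, llm, page_store):
    """Toy WSGI-style app: every route, form post and click lands here."""
    raw_request = environ["REQUEST_METHOD"] + " " + environ["PATH_INFO"]
    html = llm_controller(page_store.get("html", "<html></html>"),
                          raw_request, llm)
    page_store["html"] = html  # remember the page for the next request
    start_response("200 OK", [("Content-Type", "text/html")])
    return [html.encode()]
```

With function calling and a database schema sprinkled on, the model would also handle persistence; latency and cost are the obvious reasons this stays a thought experiment for now.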
        
           | ghthor wrote:
           | That actually sounds pretty entertaining. Especially if there
           | is dynamic user input, like text box input
        
       | og_kalu wrote:
       | Image output for 4o in the API would be very nice but i'm not
       | sure if that's at all in the cards.
       | 
        | Audio output is in the API now, but you lose image input. Why?
        | That's a shame.
        
       | 101008 wrote:
       | I understand the Realtime API voice novelty, and the
        | technological achievement it is, but I don't see it from the
       | product point of view. It looks like one of those startups
       | finding a solution before knowing the problem.
       | 
       | The two examples shown in the DevDay are the things I don't
       | really want to do in the future. I don't want to talk to anybody,
       | and I don't want to wait for their answer in a human form. That's
       | why I order my food through an app or Whatsapp, or why I prefer
       | to buy my tickets online. In the rare case I call to order food,
       | it's because I have a weird question or a weird request (can I
       | pick it up in X minutes? Can you prepare it in a different way?)
       | 
        | I hope we don't start seeing apps using conversations as
        | interfaces, because it would be really horrible (leaving aside
        | the fact that a lot of people don't know how to express
        | themselves, plus different accents, noisy environments, etc.),
        | while clicking or typing works almost the same for everyone (at
        | least it's much more normalized than talking).
        
         | ilaksh wrote:
         | You're right, having a voice conversation for any reason is
         | just so passe these days. They should stop adding microphones
         | to phones and everything. So old-fashioned and inefficient. And
         | who wants to ever have to actually talk to someone or some AI
         | to ask for anything? I'm sure our vocal cords will evolve away
         | soon. They are so primitive. Vestigial organs.
        
           | olafgeibig wrote:
           | You made my day
        
         | com2kid wrote:
         | > I understand the Realtime API voice novelty, and the
          | technological achievement it is, but I don't see it from the
         | product point of view. It looks like one of those startups
         | finding a solution before knowing the problem.
         | 
         | The market for realistic voice agents is huge, but also very
         | fragmented. Customer service is the obvious example, large
         | companies employ tens of thousands of customer service phone
         | agents, and a large # of those calls can be handled, at least
         | in part, with a sufficiently smart voice agent.
         | 
         | Sales is another, just calling back leads and checking in on
         | them. Voice clone the original sales agent, give the AI enough
         | context about previous interactions, and a lot of boring
         | legwork can be handled by AI.
         | 
         | Answering simple questions is another great example,
         | restaurants get slammed with calls during their busiest hours
          | (seriously, getting ahold of restaurant staff during peak hours
         | be literally impossible!) having an AI that can pick up the
         | phone and answer basic questions (what's in certain dishes,
         | what is the current wait time, what is the largest group that
         | can be sat together, etc) is super useful.
         | 
         | A lot of small businesses with only a single employee can
         | benefit from having a voice AI assistant picking up the phone
         | and answering the easy everyday queries and then handing
         | everything else off to the owner.
         | 
         | The key is that these voice AIs should be seamless, you ask
         | your question, they answer, and you ideally don't even know it
         | is an AI.
        
           | axus wrote:
            | And after you're misled by a sales agent, it doesn't make you
           | as angry because it's just an AI.
        
             | 93po wrote:
             | they're definitely going to instruct the AI agents to lie
             | to you, and deliberately waste your time, and be pushier
             | than ever, because it's not costing them anything to have a
             | real human on the line even longer. at least we'll have our
             | own agents to waste their compute in turn
        
               | com2kid wrote:
               | Any company that is that scummy already has sales people
               | working for it who are that scummy and lying non-stop.
               | 
               | The AI isn't changing that equation at all.
        
               | JamesBarney wrote:
               | AI is actually better here.
               | 
                | 1. AI instructions are legible. There is no record of asking
               | John to sell the customer things they don't need. There
               | is a record if the AI does it.
               | 
               | 2. AI interactions are legible. If a sales guy tells you
               | something false on a zoom call, there is no record of it.
               | If the AI does, there is a record.
        
         | bcherry wrote:
         | keep in mind that this is just v1 of the realtime api. they'll
         | add realtime vision/video down the road which can also have
         | wide applications beyond synchronous communication.
        
         | corlinp wrote:
         | One thing I'm really excited for is having this real-time voice
         | model in video game characters. It would be _really_ cool to be
         | able to have conversations with NPCs, and actually have to pick
         | their brain for information about a quest or something.
        
       | modeless wrote:
       | I didn't expect an API for advanced voice so soon. That's pretty
       | great. Here's the thing I was really wondering: Audio is $.06/min
       | in, $.24/min out. Can't wait to try some language learning apps
       | built with this. It'll also be fun for controlling robots.
        
       | N_A_T_E wrote:
       | I just need their API to be faster. 15-30 seconds per request
       | using 4o-mini isn't good enough for responsive applications.
        
         | carlgreene wrote:
         | That is odd. Longest I've experienced in my use of it is a few
         | seconds.
        
         | BoorishBears wrote:
         | You should try Azure: it comes with dedicated capacity which is
         | typically a very expensive "call our sales team" feature with
         | OpenAI
        
         | petesergeant wrote:
         | That doesn't match my experience using it a lot at all
        
       | alach11 wrote:
       | It's pretty amazing that they made prompt caching automatic. It's
       | rare that a company gives a 50% discount without the customer
       | explicitly requesting it! Of course... they might be retaining
       | some margin, judging by their discount being 50% vs. Anthropic's
       | 90%.
        
         | WiSaGaN wrote:
         | This was first done by deepseek. [1]
         | 
         | [1]: https://platform.deepseek.com/api-docs/news/news0802/
        
           | nextworddev wrote:
              | Haven't tried DeepSeek - how do they compare to OpenAI?
        
             | WiSaGaN wrote:
             | They release SOTA open source coding models. [1] Their API
              | is also incredibly cheap due to the novel attention and MoE
             | arch.
             | 
             | [1]: https://aider.chat/docs/leaderboards/
        
       ___________________________________________________________________
       (page generated 2024-10-01 23:01 UTC)