[HN Gopher] OpenAI DevDay 2024 live blog
___________________________________________________________________
OpenAI DevDay 2024 live blog
Author : plurby
Score : 163 points
Date : 2024-10-01 17:45 UTC (5 hours ago)
(HTM) web link (simonwillison.net)
(TXT) w3m dump (simonwillison.net)
| bigcat12345678 wrote:
| Seems mostly standard items so far.
| thenameless7741 wrote:
| Blog updates:
|
| - Introducing the Realtime API:
| https://openai.com/index/introducing-the-realtime-api/
|
| - Introducing vision to the fine-tuning API:
| https://openai.com/index/introducing-vision-to-the-fine-tuni...
|
| - Prompt Caching in the API: https://openai.com/index/api-prompt-
| caching/
|
| - Model Distillation in the API: https://openai.com/index/api-
| model-distillation/
|
| Docs updates:
|
| - Realtime API: https://platform.openai.com/docs/guides/realtime
|
| - Vision fine-tuning:
| https://platform.openai.com/docs/guides/fine-tuning/vision
|
| - Prompt Caching: https://platform.openai.com/docs/guides/prompt-
| caching
|
| - Model Distillation:
| https://platform.openai.com/docs/guides/distillation
|
| - Evaluating model performance:
| https://platform.openai.com/docs/guides/evals
|
| Additional updates from @OpenAIDevs:
| https://x.com/OpenAIDevs/status/1841175537060102396
|
| - New prompt generator on https://playground.openai.com
|
| - Access to the o1 model is expanded to developers on usage tier
| 3, and rate limits are increased (to the same limits as GPT-4o)
|
| Additional updates from @OpenAI:
| https://x.com/OpenAI/status/1841179938642411582
|
| - Advanced Voice is rolling out globally to ChatGPT Enterprise,
| Edu, and Team users. Free users will get a sneak peek of it
| (except EU).
| visarga wrote:
| > Advanced Voice is rolling out globally to ChatGPT Enterprise,
| Edu, and Team users. Free users will get a sneak peek of it.
|
| So regular paying users from EU are still left out in the cold.
| AlanYx wrote:
| It's probably stuck in legal limbo in the EU. The recently
| passed EU AI Act prohibits "AI systems aiming to identify or
| infer emotions", and Advanced Voice does definitely infer the
| user's emotions.
|
| (There is an exemption for "AI systems placed on the market
| strictly for medical or safety reasons, such as systems
| intended for therapeutical use", but Advanced Voice probably
| doesn't benefit from that exemption.)
| qwertox wrote:
| Apparently this prohibition only applies to " _situations
| related to the workplace and education_ ", and, in this
| context, " _That prohibition should not cover AI systems
| placed on the market strictly for medical or safety
| reasons_ "
|
| So it seems to be possible to use this in a personal
| context.
|
| https://artificialintelligenceact.eu/recital/44/
|
| > Therefore, the placing on the market, the putting into
| service, or the use of AI systems intended to be used to
| detect the emotional state of individuals in _situations
| related to the workplace and education should be
| prohibited. That prohibition should not cover AI systems
| placed on the market strictly for medical or safety
| reasons_ , such as systems intended for therapeutical use.
| AlanYx wrote:
| This is true, though it may not make sense commercially
| for them to offer an API that can't be used for workplace
| (business) applications or education.
| qwertox wrote:
| I see what you mean, but I think that "workplace"
| specifically refers to the context of the workplace, so
| that an employer cannot use AI to monitor the employees,
| even if they have been pressured to agree to such
| monitoring. I think this is unrelated to "commercially
| offering services which can detect emotions".
|
| But then I don't get the spirit of that limitation, as it
| should be just as applicable to TVs listening in on your
| conversations and trying to infer your emotions. Then
| again, I guess that for these cases there are other rules
| in place which prohibit doing this without the explicit
| consent of the user.
| runako wrote:
| > I think that
|
| > I think this
|
| > I don't get the spirit of that limitation
|
| > I guess that
|
| In a nutshell, this uncertainty is why firms are going to
| slow-roll EU rollout of AI and, for designated
| gatekeepers, other features. Until there is a body of
| litigated cases to use as reference, companies would be
| placing themselves on the hook for tremendous fines, not
| to mention the distraction of the executives.
|
| Which, not making any value judgement here, is the point
| of these laws. To slow down innovation so that society,
| government, regulation, can digest new technologies. This
| is the intended effect, and the laws are working.
| Version467 wrote:
| Yes, but it works with a VPN, and the change in latency isn't
| big enough to have a noticeable impact on usability.
| hidelooktropic wrote:
| Any word on increased weekly caps on o1 usage?
| zamadatix wrote:
| Weekly caps are for standard accounts (not going to be talked
| about at DevDay). The blog does note RPM changes for the API
| though:
|
| "10:30 They started with some demos of o1 being used in
| applications, and announced that the rate limit for o1 doubled
| to 10000 RPM (from 5000 RPM) - same as GPT-4 now."
| nielsole wrote:
| > The first big announcement: a realtime API, providing the
| ability to use WebSockets to implement voice input and output
| against their models.
|
| I guess this is using their "old" turn-based voice system?
| bcherry wrote:
| No, it's the same thing as ChatGPT advanced voice. Full speech-
| to-speech model.
| chrisshroba wrote:
| Right, see the "Handling interruptions" section here:
| https://platform.openai.com/docs/guides/realtime/integration
| qwertox wrote:
| > The Realtime API improves this by streaming audio inputs and
| outputs directly, enabling more natural conversational
| experiences. It can also handle interruptions automatically, much
| like Advanced Voice Mode in ChatGPT.
|
| > Under the hood, the Realtime API lets you create a persistent
| WebSocket connection to exchange messages with GPT-4o. The API
| supports function calling(opens in a new window), which makes it
| possible for voice assistants to respond to user requests by
| triggering actions or pulling in new context.
|
| -
|
| This sounds really interesting, and I see great use cases for
| it. However, I'm wondering if the API provides a text
| transcription of both the input and output so that I can store
| the data directly in a database without needing to transcribe the
| audio separately.
|
| -
|
| Edit: Apparently it does.
|
| It sends `conversation.item.input_audio_transcription.completed`
| [0] events when the input transcription is done (I guess a couple
| of them in real-time)
|
| and `response.done` [1] with the response text.
|
| [0] https://platform.openai.com/docs/api-reference/realtime-
| serv...
|
| [1] https://platform.openai.com/docs/api-reference/realtime-
| serv...
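|
| A minimal sketch of consuming those two events (assuming the
| Python `websockets` package; model name and event names as in
| the API reference above, and note that newer `websockets`
| releases call the header argument `additional_headers`):
|
|     import asyncio, json, os
|     import websockets
|
|     URL = ("wss://api.openai.com/v1/realtime"
|            "?model=gpt-4o-realtime-preview")
|     TRANSCRIPT_DONE = (
|         "conversation.item.input_audio_transcription.completed"
|     )
|
|     async def main():
|         headers = {
|             "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
|             "OpenAI-Beta": "realtime=v1",
|         }
|         async with websockets.connect(URL, extra_headers=headers) as ws:
|             async for raw in ws:
|                 event = json.loads(raw)
|                 etype = event.get("type")
|                 if etype == TRANSCRIPT_DONE:
|                     # one event per completed input item
|                     print("user said:", event.get("transcript"))
|                 elif etype == "response.done":
|                     # carries the full response, including text output
|                     print("model replied:", event["response"]["output"])
|
|     asyncio.run(main())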
| tough wrote:
| Saw Velvet's Show HN the other day; could be useful for
| storing these: https://news.ycombinator.com/item?id=41637550
| BoorishBears wrote:
| OpenAI just launched the equivalent of Velvet as a full-
| fledged feature today.
|
| But separate from that, you typically want some application-
| specific storage of the current "conversation" in a very
| different format than raw request logging.
| bcherry wrote:
| Yes, it transcribes inputs automatically, but not in realtime.
|
| Outputs are sent in text + audio, but you'll get the text very
| quickly and the audio a bit slower, and of course the audio takes
| time to play back. The text also doesn't currently have timing
| cues, so it's up to you if you want to try to play it "in sync".
| If the user interrupts the audio, you need to send back a
| truncation event so it can roll its own context back, and if
| you never presented the text to the user you'll need to
| truncate it there as well to ensure your storage isn't polluted
| with fragments the user never heard.
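|
| The truncation event looks roughly like this (a sketch; field
| names per the Realtime API reference, values are examples):
|
|     import json
|
|     truncate = {
|         "type": "conversation.item.truncate",
|         "item_id": "item_123",  # assistant message being cut off
|         "content_index": 0,
|         "audio_end_ms": 1500,   # how much audio the user heard
|     }
|     # then: await ws.send(json.dumps(truncate))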
| pants2 wrote:
| It's incredible that people are talking about the downfall of
| software engineering - now, at many companies, hundreds of call
| center roles will be replaced by a few engineering roles. With
| image fine-tuning, now we can replace radiologists with
| software engineers, etc. etc.
| serjester wrote:
| The eval platform is a game changer.
|
| It's nice to have a solution from OpenAI given how much they
| use a variant of this internally. I've tried like 5 YC startups
| and I don't think anyone's really solved this.
|
| There's the very real risk of vendor lock-in but quickly scanning
| the docs seems like it's a pretty portable implementation.
| ponty_rick wrote:
| > 11:43 Fields are generated in the same order that you defined
| them in the schema, even though JSON is supposed to ignore key
| order. This ensures you can implement things like chain-of-
| thought by adding those keys in the correct order in your schema
| design.
|
| Why not use an array of key value pairs if you want to maintain
| ordering without breaking traditional JSON rules?
|
| [ {key1:value1}, {key2:value2} ]
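|
| For reference, the quoted key-ordering behavior looks like this
| in practice (a sketch, assuming the `openai` Python SDK's
| structured-outputs helper):
|
|     from openai import OpenAI
|     from pydantic import BaseModel
|
|     class Answer(BaseModel):
|         reasoning: str  # generated first, in schema order
|         answer: str     # generated after the reasoning
|
|     client = OpenAI()
|     completion = client.beta.chat.completions.parse(
|         model="gpt-4o-2024-08-06",
|         messages=[{"role": "user", "content": "What is 17 * 24?"}],
|         response_format=Answer,
|     )
|     print(completion.choices[0].message.parsed)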
| YetAnotherNick wrote:
| I don't think OpenAI models support this pattern. You can only
| have an array of a fixed type; basically, the keys should be
| the same. See [1]
|
| [1]: https://platform.openai.com/docs/guides/structured-
| outputs/s...
| benatkin wrote:
| > even though JSON is supposed to ignore key order
|
| Most tools preserve the order. I consider it to be an
| unofficial feature of JSON at this point. A lot of people think
| of it as a soft guarantee, but it's a hard guarantee in all the
| recent JavaScript and Python versions. There are some common
| places where it's lost, like JSONB in Postgres, but it's good
| to be aware that this unofficial feature is commonly being
| used.
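|
| For example, in Python (insertion order is guaranteed since
| 3.7):
|
|     import json
|
|     s = '{"reasoning": "...", "answer": "..."}'
|     print(list(json.loads(s)))        # ['reasoning', 'answer']
|     print(json.dumps(json.loads(s)))  # keys round-trip in order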
| superdisk wrote:
| Holy crud, I figured they would guard this for a long time and I
| was really salivating to make some stuff with it. The doors are
| wide open for all sorts of stuff now, Advanced Voice is the first
| feature since ChatGPT initially came out that really has my jaw
| on the floor.
| jacooper wrote:
| Try NotebookLM; it's the ChatGPT moment for Google's DeepMind.
| world2vec wrote:
| I wish I could, but it's not available in the UK, IIRC.
| minimaxir wrote:
| From the Realtime API blog post:
| https://openai.com/index/introducing-the-realtime-api/
|
| > Audio in the Chat Completions API will be released in the
| coming weeks, as a new model `gpt-4o-audio-preview`. With
| `gpt-4o-audio-preview`, developers can input text or audio into
| GPT-4o and receive responses in text, audio, or both.
|
| > The Realtime API uses both text tokens and audio tokens. Text
| input tokens are priced at $5 per 1M and $20 per 1M output
| tokens. Audio input is priced at $100 per 1M tokens and output is
| $200 per 1M tokens. This equates to approximately $0.06 per
| minute of audio input and $0.24 per minute of audio output. Audio
| in the Chat Completions API will be the same price.
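|
| (Back-of-the-envelope check on those numbers: $0.06/min of
| audio input at $100 per 1M tokens implies roughly 600 audio
| tokens per minute, i.e. about 10 tokens per second of audio.)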
|
| As usual, OpenAI failed to emphasize the real game-changer
| feature at their Dev Day: audio output from the standard
| generation API.
|
| This has severe implications for text-to-speech apps,
| particularly if the audio output style is as steerable as the
| gpt-4o voice demos.
| OutOfHere wrote:
| > and $0.24 per minute of audio output
|
| That is substantially more expensive than TTS (text-to-speech)
| which already is quite expensive.
| qwertox wrote:
| I agree. I'm wondering if it is possible to disable output
| streaming of audio and just get the text response event.
| colaco wrote:
| It seems so.
|
| The configuration of the session accepts a parameter
| (modalities) that can restrict the response to text only.
| See it in https://platform.openai.com/docs/api-
| reference/realtime-clie....
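|
| A sketch of what that looks like on the wire (event shape per
| the session docs; sent over the open Realtime WebSocket):
|
|     import json
|
|     session_update = {
|         "type": "session.update",
|         "session": {"modalities": ["text"]},  # text-only responses
|     }
|     # await ws.send(json.dumps(session_update))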
| bcherry wrote:
| Correct - you should also be able to save a lot by
| skipping their built-in VAD and doing turn detection (if
| you need it) locally to avoid paying for silent inputs.
| minimaxir wrote:
| Fair, it wouldn't work well for on-demand generation in an
| app, but for ad-hoc cases like a voice-over it's not a huge
| expense.
|
| If OpenAI decides to fully ignore ethics and dive deep into
| voice cloning, then all bets are off.
| siva7 wrote:
| I've never seen a company publish consistently groundbreaking
| features at such a speed. I really wonder how their teams
| work. It's unprecedented in what I've seen in 15 years of
| software.
| pheeney wrote:
| I wonder how much they use their own products internally to
| speed up development and decisions.
| amlib wrote:
| And I wonder how much they use them externally to influence
| the online conversations about their own products/company.
| abound wrote:
| They definitely use their own products internally, perhaps to
| a fault: While chatting with OpenAI recruiters, I received
| calendar events with nonsensical DALLE-generated calendar
| images, and "interview prep" guides that were clearly written
| by an older GPT model.
| roboboffin wrote:
| Is it that most models are based on the transformer
| architecture, so performance improvements can then be used
| throughout their different products?
| IdiocyInAction wrote:
| AFAIK a lot of these ideas are not new (the JSON thing was done
| with OS models before) and OpenAI is possibly the hottest
| startup with the most funding this decade (maybe even past two
| decades?), so I think this is actually all within expectations.
| sk11001 wrote:
| They're exceptional at executing and delivering, you don't
| get that just through having more funding.
| jiggawatts wrote:
| How are they exceptional?
|
| Their web UI was a glitchy mess for over a year. Rollouts
| of _just data_ are staggered and often delayed. They still
| can't adhere to a JSON schema accurately, even though
| others have figured this out ages ago. There are global
| outages regularly. Etc...
|
| I'm impressed by some aspects of their rapid growth, but
| these are financial achievements (credit due Sam) more than
| technical ones.
| closewith wrote:
| I have a few qualms with this app:
|
| 1. For a Linux user, you can already build such a system
| yourself quite trivially by getting an FTP account,
| mounting it locally with curlftpfs, and then using SVN or
| CVS on the mounted filesystem. From Windows or Mac, this
| FTP account could be accessed through built-in software.
|
| 2. It doesn't actually replace a USB drive. Most people I
| know e-mail files to themselves or host them somewhere
| online to be able to perform presentations, but they
| still carry a USB drive in case there are connectivity
| problems. This does not solve the connectivity issue.
|
| 3. It does not seem very "viral" or income-generating. I
| know this is premature at this point, but without
| charging users for the service, is it reasonable to
| expect to make money off of this?
| hobofan wrote:
| Not sure why you are being downvoted. You are generally
| right. Most of their new product rollouts were
| accompanied by huge production instabilities for paying
| customers. Only in the most recent ones did they manage
| that better.
|
| > They still can't adhere to a JSON schema accurately
|
| Strict mode for structured output fixes at least this
| though.
| testfrequency wrote:
| It's literally just a bunch of ex-Stripe employees and data
| scientists...
| throwup238 wrote:
| _> OpenAI is possibly the hottest startup with the most
| funding this decade (maybe even past two decades?)_
|
| It depends on how you define startup but I don't think they
| will surpass Uber, ByteDance, or SpaceX until this next
| rumored funding round.
|
| I'm excluding companies that have raised funding post IPO
| since that's an obvious cutoff for startups. The other cutoff
| being break-even, in which case Uber has raised well over $20
| billion.
| nextworddev wrote:
| GPT 5 is writing their code
| sammyteee wrote:
| Loving these live updates, keep em coming! Thanks Simon!
| lysecret wrote:
| Using structured outputs for generative UI is such a cool idea.
| Does anyone know of some cool web demos related to this?
| jiggawatts wrote:
| I just had an evil thought: once AIs are fast enough, it would
| be possible to create a "dynamic" user interface on the fly
| using an AI. Instead of Java or C# code running in an event
| loop processing mouse clicks, in principle we could have a chat
| bot generate the UI elements in a script like WPF or plain HTML
| and process user mouse and keyboard input events!
|
| If you squint at it, this is what chat bots do now, except with
| a "terminal" style text UI instead of a GUI or true Web UI.
|
| The first incremental step had already been taken: pretty-
| printing of maths and code. Interactive components are a
| logical next step.
|
| It would be a mere afternoon of work to write a web server
| where the dozens of "controllers" are replaced with a single
| call to an LLM API that simply sends the previous page HTML and
| the request HTML with headers and all.
|
| _"Based on the previous HTML above and the HTTP request below,
| output the response HTML."_
|
| Just sprinkle on some function calling and a database schema,
| and the site is done!
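|
| Something like this toy catch-all server (a sketch, assuming
| the `flask` and `openai` packages; every route becomes one LLM
| call, and a real version would store pages server-side rather
| than in the cookie session):
|
|     from flask import Flask, request, session
|     from openai import OpenAI
|
|     app = Flask(__name__)
|     app.secret_key = "demo"  # per-user "previous page" store
|     client = OpenAI()
|
|     @app.route("/", defaults={"path": ""}, methods=["GET", "POST"])
|     @app.route("/<path:path>", methods=["GET", "POST"])
|     def everything(path):
|         previous = session.get("page", "<html><body>Home</body></html>")
|         prompt = (
|             "Based on the previous HTML above and the HTTP request "
|             f"below, output the response HTML.\n\n{previous}\n\n"
|             f"{request.method} /{path}\n{dict(request.form)}"
|         )
|         html = client.chat.completions.create(
|             model="gpt-4o-mini",
|             messages=[{"role": "user", "content": prompt}],
|         ).choices[0].message.content
|         session["page"] = html
|         return html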
| ghthor wrote:
| That actually sounds pretty entertaining. Especially if there
| is dynamic user input, like text box input
| og_kalu wrote:
| Image output for 4o in the API would be very nice, but I'm not
| sure if that's at all in the cards.
|
| Audio output is in the API now, but you lose image input. Why?
| That's a shame.
| 101008 wrote:
| I understand the Realtime API voice novelty, and the
| technological achievement it is, but I don't see it from the
| product point of view. It looks like one of those startups
| finding a solution before knowing the problem.
|
| The two examples shown in the DevDay are the things I don't
| really want to do in the future. I don't want to talk to anybody,
| and I don't want to wait for their answer in a human form. That's
| why I order my food through an app or Whatsapp, or why I prefer
| to buy my tickets online. In the rare case I call to order food,
| it's because I have a weird question or a weird request (can I
| pick it up in X minutes? Can you prepare it in a different way?)
|
| I hope we don't start seeing apps using conversations as
| interfaces, because it would be really horrible (leaving aside
| the fact that a lot of people don't know how to express
| themselves, plus different accents, noisy environments, etc.),
| while clicking or typing works almost the same for everyone (at
| least it's much more normalized than talking).
| ilaksh wrote:
| You're right, having a voice conversation for any reason is
| just so passe these days. They should stop adding microphones
| to phones and everything. So old-fashioned and inefficient. And
| who wants to ever have to actually talk to someone or some AI
| to ask for anything? I'm sure our vocal cords will evolve away
| soon. They are so primitive. Vestigial organs.
| olafgeibig wrote:
| You made my day
| com2kid wrote:
| > I understand the Realtime API voice novelty, and the
| technological achievement it is, but I don't see it from the
| product point of view. It looks like one of those startups
| finding a solution before knowing the problem.
|
| The market for realistic voice agents is huge, but also very
| fragmented. Customer service is the obvious example, large
| companies employ tens of thousands of customer service phone
| agents, and a large # of those calls can be handled, at least
| in part, with a sufficiently smart voice agent.
|
| Sales is another, just calling back leads and checking in on
| them. Voice clone the original sales agent, give the AI enough
| context about previous interactions, and a lot of boring
| legwork can be handled by AI.
|
| Answering simple questions is another great example:
| restaurants get slammed with calls during their busiest hours
| (seriously, getting ahold of restaurant staff during peak hours
| can be literally impossible!). Having an AI that can pick up
| the phone and answer basic questions (what's in certain dishes,
| what the current wait time is, what the largest group that can
| be seated together is, etc.) is super useful.
|
| A lot of small businesses with only a single employee can
| benefit from having a voice AI assistant picking up the phone
| and answering the easy everyday queries and then handing
| everything else off to the owner.
|
| The key is that these voice AIs should be seamless, you ask
| your question, they answer, and you ideally don't even know it
| is an AI.
| axus wrote:
| And after you're misled by a sales agent, it doesn't make you
| as angry, because it's just an AI.
| 93po wrote:
| They're definitely going to instruct the AI agents to lie
| to you, deliberately waste your time, and be pushier than
| ever, because it costs them nothing to keep a real human on
| the line even longer. At least we'll have our own agents to
| waste their compute in turn.
| com2kid wrote:
| Any company that is that scummy already has sales people
| working for it who are that scummy and lying non-stop.
|
| The AI isn't changing that equation at all.
| JamesBarney wrote:
| AI is actually better here.
|
| 1. AI instructions are legible. There is no record of anyone
| asking John to sell the customer things they don't need.
| There is a record if the AI is instructed to do it.
|
| 2. AI interactions are legible. If a sales guy tells you
| something false on a zoom call, there is no record of it.
| If the AI does, there is a record.
| bcherry wrote:
| Keep in mind that this is just v1 of the Realtime API. They'll
| add realtime vision/video down the road, which can also have
| wide applications beyond synchronous communication.
| corlinp wrote:
| One thing I'm really excited for is having this real-time voice
| model in video game characters. It would be _really_ cool to be
| able to have conversations with NPCs, and actually have to pick
| their brain for information about a quest or something.
| modeless wrote:
| I didn't expect an API for advanced voice so soon. That's pretty
| great. Here's the thing I was really wondering: Audio is $.06/min
| in, $.24/min out. Can't wait to try some language learning apps
| built with this. It'll also be fun for controlling robots.
| N_A_T_E wrote:
| I just need their API to be faster. 15-30 seconds per request
| using 4o-mini isn't good enough for responsive applications.
| carlgreene wrote:
| That is odd. The longest I've experienced in my use of it is a
| few seconds.
| BoorishBears wrote:
| You should try Azure: it comes with dedicated capacity which is
| typically a very expensive "call our sales team" feature with
| OpenAI
| petesergeant wrote:
| That doesn't match my experience at all, and I use it a lot.
| alach11 wrote:
| It's pretty amazing that they made prompt caching automatic. It's
| rare that a company gives a 50% discount without the customer
| explicitly requesting it! Of course... they might be retaining
| some margin, judging by their discount being 50% vs. Anthropic's
| 90%.
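|
| The practical upshot (per the caching docs: the discount
| applies to a repeated prompt prefix) is to keep the static
| part of the prompt first and the per-request part last. A
| sketch, assuming the `openai` package:
|
|     from openai import OpenAI
|
|     client = OpenAI()
|     STATIC_SYSTEM = "You are a support agent. <long, fixed instructions>"
|
|     def answer(question: str) -> str:
|         resp = client.chat.completions.create(
|             model="gpt-4o",
|             messages=[
|                 {"role": "system", "content": STATIC_SYSTEM},  # cacheable
|                 {"role": "user", "content": question},  # varies per call
|             ],
|         )
|         return resp.choices[0].message.content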
| WiSaGaN wrote:
| This was first done by DeepSeek. [1]
|
| [1]: https://platform.deepseek.com/api-docs/news/news0802/
| nextworddev wrote:
| Haven't tried DeepSeek - how do they compare to OpenAI?
| WiSaGaN wrote:
| They release SOTA open-source coding models. [1] Their API
| is also incredibly cheap due to their novel attention and MoE
| architecture.
|
| [1]: https://aider.chat/docs/leaderboards/
___________________________________________________________________
(page generated 2024-10-01 23:01 UTC)