[HN Gopher] GPT-4o
       ___________________________________________________________________
        
       GPT-4o
        
       Author : Lealen
       Score  : 1479 points
       Date   : 2024-05-13 17:28 UTC (5 hours ago)
        
 (HTM) web link (openai.com)
 (TXT) w3m dump (openai.com)
        
       | skilled wrote:
       | Live now,
       | 
       |  _OpenAI Spring Update_
       | (https://www.youtube.com/watch?v=DQacCB9tDaw)
       | 
       | https://news.ycombinator.com/item?id=40343950
        
       | EcommerceFlow wrote:
       | A new "flagship" model with no improvement of intelligence, very
       | disappointed. Maybe this is a strategy for them to mass collect
       | "live" data before they're left behind by Google/Twitter live
       | data...
        
       | belter wrote:
       | https://youtu.be/DQacCB9tDaw
        
       | chzblck wrote:
       | real time audio is mind blowing
        
       | throwup238 wrote:
       | So what's the point of paying for ChatGPT Plus? And who on earth
       | chose to make the app Mac only...
        
         | CSMastermind wrote:
         | 5x the capacity threshold is the only thing I heard them
         | mention on the live stream.
         | 
         | Though presumably when they are ready to release new models the
         | Plus users will get them first.
        
           | anuar12 wrote:
            | I think that because usability increases so much (real-time
            | conversation, video-based coding, presentation feedback at
            | work, etc.), they expect usage to increase drastically, so
            | paying users would still have an incentive to pay.
        
         | agd wrote:
         | They mentioned an announcement about a new frontier model
         | coming soon. Presumably this will be exclusive to paid users.
        
           | johnsimer wrote:
           | Did they mention this in the gpt4o announcement video? I must
           | have missed this
        
         | riffic wrote:
         | > Plus users will have a message limit that is up to 5x greater
         | than free users
         | 
         | from https://openai.com/index/gpt-4o-and-more-tools-to-chatgpt-
         | fr...
        
       | tomschwiha wrote:
        | I definitely like this demo more than the "reduced latency"
        | Gemini demo [0].
       | 
       | [0] https://www.youtube.com/watch?v=UIZAiXYceBI
        
       | Powdering7082 wrote:
       | Wow this versioning scheme really messed up this prediction
       | market: https://kalshi.com/markets/gpt4p5/gpt45-released
        
       | smusamashah wrote:
        | That im-also-a-good-gpt2-chatbot[1] was in fact the new ChatGPT
        | model, as people were assuming a few days ago here on HN[2].
        | 
        | Edit: maybe not; the name of that bot was just "gpt2-chatbot".
        | Maybe that one was some initial iteration?
       | 
       | [1]
       | https://twitter.com/LiamFedus/status/1790064963966370209/pho...
       | 
       | [2] https://news.ycombinator.com/item?id=40199715
        
       | theusus wrote:
       | This 4o is already rolling out?
        
         | belter wrote:
         | They mentioned capabilities will be rolled out over the next
         | few weeks: https://youtu.be/DQacCB9tDaw?t=5018
        
       | GalaxyNova wrote:
       | It is really cool that they are bringing this to free users. It
       | does make me wonder what justifies ChatGPT plus now though...
        
         | InfiniteVortex wrote:
          | They stated that they will be announcing something new that is
          | on the next frontier (or close to it, IIRC) soon. So there will
          | definitely be an incentive to pay, because it will be something
          | better than GPT-4o.
        
         | pantsforbirds wrote:
         | I assume the desktop app with voice and vision is rolling out
         | to plus users first?
        
       | ppollaki wrote:
       | I've noticed that the GPT-4 model's capabilities seem limited
       | compared to its initial release. Others have also pointed this
       | out. I suspect that making the model free might have required
       | reducing its capabilities to meet cost efficiency goals. I'll
       | have to try it out to see for myself.
        
       | EcommerceFlow wrote:
        | As I commented in the other thread, really really disappointed
        | that there's no intelligence update and that the focus is more
        | on "gimmicks". The desktop app did look really good, especially
        | as the models get smarter. Will be canceling my premium as
        | there's no real point to it until that new "flagship" model
        | comes out.
        
         | adroniser wrote:
         | Agree on hoping for an intelligence update, but I think it was
         | clear from teasers that this was not gonna be GPT-5.
         | 
          | I'm not sure how fair it is to classify the new multimodal
          | capabilities as just a gimmick though. I personally haven't
          | integrated GPT-4 into my workflow that much, and the latency
          | and the fact that I have to type a query out are a big reason
          | why.
        
       | OutOfHere wrote:
       | I don't see 4o or anything new at
       | https://platform.openai.com/docs/models
       | 
       | Overall I am highly skeptical of newer models as they risk
       | worsening the completion quality to make them cheaper for OpenAI
       | to run.
        
         | frabcus wrote:
         | It's there now! And still 128k context window
        
         | IanCal wrote:
         | It's there right now for me.
        
       | atgctg wrote:
       | Tiktoken added support for GPT-4o:
       | https://github.com/openai/tiktoken/commit/9d01e5670ff50eb74c...
       | 
       | It has an increased vocab size of 200k.
        
         | minimaxir wrote:
          | For posterity, GPT-3.5/4's tokenizer had a 100k vocabulary. The
          | benefit of a larger tokenizer is more efficient tokenization
          | (and therefore cheaper/faster) but with massive diminishing
          | returns: the larger tokenizer makes the model more difficult
          | to train but tends to reduce token usage by 10-15%.
        
         | simonw wrote:
         | Oh interesting, does that mean languages other than English
         | won't be paying such a large penalty in terms of token lengths?
         | 
         | With previous tokenizers there was a notable increase in the
         | number of tokens needed to represent non-English sentences:
         | https://simonwillison.net/2023/Jun/8/gpt-tokenizers/
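          | 
          | A quick way to eyeball that, assuming the new encoding ships
          | in tiktoken under the name added in the linked commit
          | (o200k_base) - the example sentence here is just for
          | illustration:
          | 
          |     import tiktoken
          | 
          |     old = tiktoken.get_encoding("cl100k_base")  # GPT-3.5/4
          |     new = tiktoken.get_encoding("o200k_base")   # GPT-4o
          | 
          |     text = "¿Cuántos tokens necesita esta frase en español?"
          |     print(len(old.encode(text)), len(new.encode(text)))
          | 
          | If the larger vocabulary really does help non-English text,
          | the second count should come out noticeably lower.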
        
         | mike_hearn wrote:
         | Does that imply they retrained the foundation model from
         | scratch? I thought changing the tokenization was something you
         | couldn't really retrofit to an existing model. I mean sure they
         | might have initialized the weights from the prior GPT-4 model
         | but it'd still require a lot of retraining.
        
           | og_kalu wrote:
           | Yeah and they say as much in the blog.
        
         | moffkalast wrote:
         | Lots of those tokens would have to be pixel patches and sound
         | samples right?
        
           | nojvek wrote:
           | Yep. Since it's multimodal. Pictures, text, audio all go into
           | token space.
        
         | kristofferR wrote:
         | How are they able to use such a brand name, Tiktoken? Is it
         | because TikTok is Chinese? Tiktoken, it's almost like if Apple
         | released the Facebooken library for something entirely
         | unrelated to Facebook.
        
       | FergusArgyll wrote:
        | First Impressions in no particular order:
        | 
        |   Being able to interrupt while GPT is talking
        |   2x faster/cheaper
        |   not really a much smarter model
        |   Desktop app that can see screenshots
        |   Can display emotions with and change the sound of "it's" voice
        
         | riffic wrote:
         | wondering what apple is cooking up and what they'll announce
         | next month.
         | 
         | by the way the contraction "it's" is used to say "it is" or "it
         | has", it is never a possessive form.
        
           | karaterobot wrote:
           | Unless you're talking about that sewer clown's balloon!
        
         | throwup238 wrote:
         | _Mac only_ desktop app. Windows version  "later this year". No
         | Linux.
         | 
         | Welp there goes my Plus subscription.
        
           | bmoxb wrote:
           | It seems like a very odd decision. It's not like OpenAI can't
           | afford to develop versions of the application for each OS in
           | parallel.
        
           | unstatusthequo wrote:
           | Why? Just use the API or normal web access version like you
           | have been since ChatGPT became available at all.
        
       | ralusek wrote:
        | Can't find info on which of these new features are available via
        | the API.
        
         | tazu wrote:
         | > Developers can also now access GPT-4o in the API as a text
         | and vision model. GPT-4o is 2x faster, half the price, and has
         | 5x higher rate limits compared to GPT-4 Turbo. We plan to
         | launch support for GPT-4o's new audio and video capabilities to
         | a small group of trusted partners in the API in the coming
         | weeks.
        
           | ralusek wrote:
           | [EDIT] The model has since been added to the docs
           | 
           | Not seeing it or any of those documented here:
           | 
           | https://platform.openai.com/docs/models/overview
        
             | OutOfHere wrote:
              | It is not listed as of yet, but it does work if you punch
              | in gpt-4o. I will stick with gpt-4-0125-preview for now
              | because gpt-4o seems majorly prone to hallucinations
              | whereas gpt-4-0125-preview isn't.
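              | 
              | For anyone who wants to try the same, a minimal sketch
              | using the official Python SDK (this assumes OPENAI_API_KEY
              | is set in the environment; the prompt is just an example):
              | 
              |     from openai import OpenAI
              | 
              |     client = OpenAI()  # reads OPENAI_API_KEY from the env
              |     resp = client.chat.completions.create(
              |         model="gpt-4o",
              |         messages=[{"role": "user",
              |                    "content": "Say hello in one word."}],
              |     )
              |     print(resp.choices[0].message.content)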
        
       | Jensson wrote:
        | The most impressive part is that the voice uses the right
        | feelings and tonal language during the presentation. I'm not sure
        | how much of that comes from them having tested this over and
        | over, but it is really hard to get that right, so if they didn't
        | fake it in some way I'd say that is revolutionary.
        
         | gdb wrote:
         | (I work at OpenAI.)
         | 
         | It's really how it works.
        
           | xanderlewis wrote:
           | I like the humility in your first statement.
        
             | moab wrote:
             | Pretty sure the snark is unnecessary.
        
               | ayhanfuat wrote:
               | Was it snark? To me it sounds like "we all know you
               | Greg"?
        
               | xanderlewis wrote:
               | This was my intention.
        
               | moab wrote:
               | I misunderstood; my apologies.
        
               | colecut wrote:
                | I don't think it was snark. The guy is co-founder and
                | CTO of OpenAI, and he didn't mention any of that.
        
               | renewiltord wrote:
               | I downvoted independently. No problem with groupies. They
               | just contaminate the thread.
               | 
               | Greg Brockman is famous for good reasons but constant "oh
               | wow it's Greg Brockman" are noisy.
        
               | egillie wrote:
               | not snark. if only hn comments could show the right
               | feelings and tonal language
        
             | Induane wrote:
             | I like their username.
        
               | belter wrote:
               | You might be talking to GPT-5...
        
             | theboat wrote:
             | I love how this comment proves the need for audio2audio. I
             | initially read it as sarcastic, but now I can't tell if
             | it's actually sincere.
        
           | jamestimmins wrote:
            | With this capability, how close are y'all to it being able to
            | listen to my pronunciation of a new language (e.g. Italian)
            | and give specific feedback about how to pronounce it like a
            | local?
           | 
           | Seems like these would be similar.
        
             | taytus wrote:
                | The Italian output in the demo was really bad.
        
               | thegabriele wrote:
               | Why would you say "really bad"?
        
               | bzudo wrote:
               | It doesn't have hands.
        
               | mark38848 wrote:
               | So good!
        
               | DonHopkins wrote:
               | "I Have No Hands But I Must Scream" -Italian Ellison
        
               | rezonant wrote:
               | Joke of the day right there :-)
        
               | GaggiX wrote:
               | I'm a native Italian speaker, it wasn't too bad.
        
               | riquito wrote:
               | The content was correct but the pronunciation was awful.
               | Now, good enough? For sure, but I would not be able to
               | stand something talking like that all the time
        
               | ljsprague wrote:
               | Do you not have to work with non-native speakers of
               | whatever language you use at work?
        
               | Jensson wrote:
                | Most people don't, since you either speak with native
                | speakers or you speak in English: in international teams
                | you speak English rather than one of the members' native
                | languages, even if nobody speaks English natively. So it
                | is rare to hear broken non-English.
                | 
                | And note that understanding broken language is a skill
                | you have to train. If you aren't used to it, it is
                | impossible to understand what they say. You might not
                | have been in that situation if you are an English
                | speaker, since you are so used to broken English, but it
                | happens a lot for others.
        
               | sunnybeetroot wrote:
               | Which video title is this?
        
             | elil17 wrote:
             | It completely botched teaching someone to say "hello" in
             | Chinese - it used the wrong tones and then incorrectly told
             | them their pronunciation was good.
        
               | ShakataGaNai wrote:
                | If you read the details on OpenAI's site, a lot of this
                | stuff is clearly marked as English-first. For some
                | written languages (noted as anything using non-Roman
                | characters, so most of Asia) it basically doesn't work.
                | 
                | This really isn't surprising. Look at Google Home and
                | Alexa. When they first came out, if you weren't a white
                | male from the west coast, the accuracy of translating
                | commands dropped DRAMATICALLY, because it was programmed,
                | designed and tested by a majority of white tech bros in
                | SF/Seattle. They've gotten a lot better over the last 5+
                | years. But I think you'll see OpenAI take this route.
               | 
               | But that's ok. They have to start somewhere. Once they
               | get the model working _really well_ for one language they
               | can expand into similar ones with relatively little work.
               | The more different the language, the more hard work and
               | "local input" (ex: natives of the language) will be
               | required for adaptation. But the basic text translations
               | are still already way better than they used to be.
        
               | joseda-hg wrote:
                | An interesting point. I tend to have better outcomes
                | using my heavily accented ESL English than my native
                | pronunciation of my mother tongue. I'm guessing that's
                | partly the tech workforce being a bit more multicultural
                | than initially thought, or it just being easier to test
                | with.
                | 
                | It's a shame, because that means I can use stuff that I
                | can't recommend to people around me.
                | 
                | Multilingual UX is an interesting pain point. I had to
                | change the language of my account to English so I could
                | use some early Bard version, even though it was perfectly
                | able to understand and answer in Spanish.
        
               | zenlikethat wrote:
               | You also get the synchronicity / four minute mile effect
               | egging on other people to excel with specialized models,
               | like Falcon or Qwen did in the wake of the original
               | ChatGPT/Llama excitement.
        
               | greatpostman wrote:
               | Racist post
        
               | kolinko wrote:
                | What? Did it seriously work worse for women? Source?
                | 
                | (accents, sure)
        
               | qprofyeh wrote:
               | As for the Mandarin tones, the model might have mixed it
               | up with the tones from a dialect like Cantonese. It's
               | interesting to discover how much difference a more
               | specific prompt could make.
        
             | dgroshev wrote:
             | I don't think that'd work without a dedicated startup
             | behind it.
             | 
             | The first (and imo the main) hurdle is not reproduction,
             | but just learning to hear the correct sounds. If you don't
             | speak Hindi and are a native English speaker, this [1] is a
             | good example. You can only work on nailing those consonants
             | when they become as distinct to your ear as cUp and cAp are
             | in English.
             | 
             | We can get by by falling back to context (it's unlikely
             | someone would ask for a "shit of paper"!), but it's
             | impossible to confidently reproduce the sounds unless they
             | are already completely distinct in our heads/ears.
             | 
             | That's because we think we hear things as they are, but
             | it's an illusion. Cup/cap distinction is as subtle to an
             | Eastern European as Hindi consonants or Mandarin tones are
             | to English speakers, because the set of meaningful sounds
             | distinctions differs between languages. Relearning the
             | phonetic system requires dedicated work (minimal pairs is
             | one option) and learning enough phonetics to have the
             | vocabulary to discuss sounds as they are. It's not enough
             | to just give feedback.
             | 
             | [1]: https://www.youtube.com/watch?v=-I7iUUp-cX8
        
               | dilap wrote:
               | > but it's impossible to confidently reproduce the sounds
               | unless they are already completely distinct in our
               | heads/ears
               | 
                | interestingly, i think this isn't always true -- i was
                | able to coach my native-spanish-speaking wife to
                | correctly pronounce "v" vs "b" (both are just "b" in
                | spanish, or at least her dialect) before she could hear
                | the difference; later on she developed the ability to
                | hear it.
        
             | estebank wrote:
             | In the "Point and learn Spanish" video, when shown an Apple
             | and a Banana, the AI said they were a Manzana (Apple) and a
             | Pantalon (Pants).
        
               | unsatchmo wrote:
               | No, I just watched it closely and it definitely said un
               | platano
        
               | estebank wrote:
                | I rewatched it a few times before posting to check
                | whether it said platano, and it honestly doesn't sound
                | like it to me.
        
               | david-gpu wrote:
               | I'm a Spaniard and to my ears it clearly sounds like _"
               | Es una manzana y un platano"_.
               | 
               | What's strange to me is that, as far as I know, "platano"
               | is only commonly used in Spain, but the accent of the AI
               | voice didn't sound like it's from Spain. It sounds more
               | like an American who speaks Spanish as a second language,
               | and those folks typically speak some Mexican dialect of
               | Spanish.
        
               | afc wrote:
               | I'm from Colombia and mostly say "platano".
        
               | InvaderFizz wrote:
               | I was about to comment the same thing about the accent.
               | Even to my gringo ears, it sounds like an American
               | speaking Spanish.
               | 
               | Platano is commonly used for banana in Mexico, just
               | bought some at a Soriana this weekend.
        
             | patcon wrote:
              | After watching the demo, _my_ question isn't about how
              | close it is to helping me _learn_ a language, but about how
              | close it is to _being_ me in another language.
              | 
              | Even styles of thought might be different in other
              | languages, so I don't say that lightly... (stay strong,
              | Sapir-Whorf, stay strong ;)
        
             | hack_ml wrote:
              | I was conversing with it in Hinglish (a combination of
              | Hindi and English) which folks in urban India use, and it
              | was pretty on point apart from some use of esoteric Hindi
              | words, but I think with the right prompting we can fix
              | that.
        
           | baq wrote:
           | > (I work at OpenAI.)
           | 
           | Winner of the 'understatement of the week' award (and it's
           | only Monday).
           | 
           | Also top contender in the 'technically correct' category.
        
             | swyx wrote:
             | and was briefly untrue for like 2 days
        
             | behnamoh wrote:
             | > Winner of the 'understatement of the week' award (and
             | it's only Monday).
             | 
             | Yes! As soon as I saw gdb I was like "that can't be Greg",
             | but sure enough, that's him.
        
           | mttpgn wrote:
           | Licensing the emotion-intoned TTS as a standalone API is
           | something I would look forward to seeing. Not sure how
           | feasible that would be if, as a sibling comment suggested, it
           | bypasses the text-rendering step altogether.
        
           | skottenborg wrote:
           | "(I work at OpenAI.)"
           | 
           | Ah yes, also known as being co-founder :)
        
           | terhechte wrote:
            | Random OpenAI question: while the GPT models have become ever
            | cheaper, the price for the TTS models has stayed in the range
            | of $15 per 1M characters. I was hoping this would also become
           | cheaper at some point. There're so many apps (e.g. language
           | learning) that quickly become too expensive given these
           | prices. With the GPT-4o voice (which sounds much better than
           | the current TTS or TTS HD endpoint) I thought maybe the
           | prices for TTS would go down. Sadly that hasn't happened. Is
           | that something on the OpenAI agenda?
        
           | passion__desire wrote:
            | hi gdb, could you please create an assistant AI that can
            | filter the low-quality HN discussion on your comment so that
            | it can redirect my focus to useful stuff.
        
           | 999900000999 wrote:
            | How far are we from something like a helmet with ChatGPT and
            | a video camera installed? I imagine this will be awesome for
            | low-vision people. Imagine having a guide tell you how to
            | walk to the grocery store, and help you grocery shop without
            | an assistant. Of course you have tons of liability issues
            | here, but this is very impressive.
        
             | rfoo wrote:
              | Can't wait for the moment when I can put a single line
              | "Help me put this in the cart" on my product and it
              | magically sells better.
        
               | smokel wrote:
               | This Dutch book [1] by Gummbah has the text "Kooptip"
               | imprinted on the cover, which would roughly translate to
               | "Buying recommendation". It worked for me!
               | 
               | [1] https://www.amazon.com/Het-geheim-verdwenen-mysterie-
               | Dutch/d...
        
               | DonHopkins wrote:
               | https://en.wikipedia.org/wiki/Steal_This_Book
        
             | macintux wrote:
             | Just the ability to distinguish bills would be hugely
             | helpful, although I suppose that's much less of a problem
             | these days with credit cards and digital payment options.
        
             | krainboltgreene wrote:
             | > Imagine having a guide tell you how to walk to the
             | grocery store
             | 
             | I don't need to imagine that, I've had it for about 8
             | years. It's OK.
             | 
             | > help you grocery shop without an assistant
             | 
             | Isn't this something you learn as a child? Is that a thing
             | we need automated?
        
               | jameshart wrote:
                | OP specified they were imagining this for _low vision
                | people_
        
               | krainboltgreene wrote:
               | I'm aware, I'm one of those people.
        
               | bombcar wrote:
               | Does it give you voice instructions based on what it
               | _knows_ or is it actively watching the environment and
               | telling you things like  "light is red, car is coming"?
        
               | jaggederest wrote:
               | I assume it likes snacks, is quadrupedal, and does not
               | have the proper mouth anatomy or diaphragm for human
               | speech.
        
             | ninininino wrote:
             | just need the helmet https://openai.com/index/be-my-eyes/
        
             | JieJie wrote:
             | We're planning on getting a phone-carrying lanyard and she
             | will just carry her phone around her neck with Be My Eyes^0
             | looking out the rear camera, pointed outward. She's
             | DeafBlind, so it'll be bluetoothed to her hearing aids, and
             | she can interact with the world through the conversational
             | AI.
             | 
             | I helped her access the video from the presentation, and it
             | brought her to tears. Now, she can play guitar, and the AI
             | and her can write songs and sing them together.
             | 
              | This is a big day in the lives of a lot of people who
              | aren't normally part of the conversation. As of today, they
              | are.
             | 
             | 0: https://www.bemyeyes.com/
        
               | 999900000999 wrote:
               | That's definitely cool!
               | 
                | Eventually it would be better for these models to run
                | locally from a security point of view, but this is a
                | great first step.
        
               | JieJie wrote:
               | Absolutely. We're looking forward to Apple's
               | announcements at WWDC this year, which analysts predict
               | are right up that alley.
        
             | silverquiet wrote:
             | It sounds like the system that Marshall Brain envisioned in
             | his novella, Manna.
        
               | jaggederest wrote:
               | That story has always been completely reasonable and
               | plausible to me. Incredible foresight. I guess I should
               | start a midlevel management voice automation company.
        
           | bjtitus wrote:
           | Is it possible to use this as a TTS model? I noticed on the
           | announcement post that this is a single model as opposed to a
           | text model being piped to a separate TTS model.
        
           | cchance wrote:
            | This is damn near one of the most impressive things. I can
            | only imagine what you could do with live translation and
            | voice synthesis (ElevenLabs style) integrated into something
            | like Teams: select each person's language and do real-time
            | translation into each person's native language, with their
            | own voice and intonation. That would be NUTS.
        
             | purplerabbit wrote:
             | There's so much pent up collaborative human energy trapped
             | behind language barriers.
             | 
             | Beautiful articulation.
             | 
             | This is an enormous win for humanity.
        
           | rane wrote:
           | Will the new voice mode allow mixing languages in sentences?
           | 
           | As a language learner, this would be tremendously useful.
        
           | j-krieger wrote:
            | I've always been wondering what GPT models lack that makes
            | them "query->response" only. I've always tried to get
            | chatbots to lose the initially needed query, to no avail.
            | What would it take to get a GPT model to freely generate
            | tokens in a thought-like pattern? I think when I'm alone,
            | without a query from another human. Why can't they?
        
             | kolinko wrote:
              | Just provide an empty query and that's it - it will
              | generate tokens no problem.
              | 
              | You can use any open source model without any prompt
              | whatsoever.
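              | 
              | For example, a rough sketch with an open-weights model via
              | transformers (the model name is just a small stand-in; any
              | causal LM works the same way):
              | 
              |     from transformers import AutoModelForCausalLM
              |     from transformers import AutoTokenizer
              | 
              |     name = "gpt2"  # example model, swap in your own
              |     tok = AutoTokenizer.from_pretrained(name)
              |     model = AutoModelForCausalLM.from_pretrained(name)
              | 
              |     # Start from just the BOS token - no user query at
              |     # all - and let the model sample freely.
              |     ids = tok(tok.bos_token, return_tensors="pt").input_ids
              |     out = model.generate(ids, do_sample=True,
              |                          max_new_tokens=60)
              |     print(tok.decode(out[0], skip_special_tokens=True))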
        
           | ALittleLight wrote:
           | In my ChatGPT app or on the website I can select GPT-4o as a
           | model, but my model doesn't seem to work like the demo. The
           | voice mode is the same as before and the images come from
           | DALLE and ChatGPT doesn't seem to understand or modify them
           | any better than previously.
        
           | jacobsimon wrote:
           | I couldn't quite tell from the announcement, but is there
           | still a separate TTS step, where GPT is generating
           | tones/pitches that are to be used, or is it completely end to
           | end where GPT is generating the output sounds directly?
        
             | derac wrote:
             | It's one model with text/audio/image input and output.
        
         | og_kalu wrote:
         | >The most impressive part is that the voice uses the right
         | feelings and tonal language during the presentation.
         | 
          | Consequences of audio2audio (rather than audio->text,
          | text->audio). Being able to manipulate speech nearly as well as
         | it manipulates text is something else. This will be a
         | revelation for language learning amongst other things. And you
         | can interrupt it freely now!
        
           | pants2 wrote:
           | However, this looks like it only works with speech - i.e. you
           | can't ask it, "What's the tune I'm humming?" or "Why is my
           | car making this noise?"
           | 
           | I could be wrong but I haven't seen any non-speech demos.
        
             | cube2222 wrote:
             | Fwiw, the live demo[0] included different kinds of
             | breathing, and getting feedback on it.
             | 
             | [0]: https://youtu.be/DQacCB9tDaw?t=557
        
             | throwaway11460 wrote:
             | What about the breath analysis?
        
               | pants2 wrote:
               | I did see that, though my interpretation is that
               | breathing is included in its voice tokenizer which helps
               | it understand emotions in speech (the AI can generate
               | breath sounds after all). Other sounds, like bird songs
               | or engine noises, may not work - but I could be wrong.
        
               | CooCooCaCha wrote:
               | I suspect that like images and video, their audio system
               | is or will become more general purpose. For example it
               | can generate the sound of coins falling onto a table.
        
           | jcims wrote:
           | Anyone who has used elevenlabs for voice generation has found
           | this to be the case. Voice to voice seems like magic.
        
             | dyauspitr wrote:
              | ElevenLabs isn't remotely close to how good this voice
              | sounds. I've tried to use it extensively before and it just
              | isn't natural. This voice from OpenAI, and even the one
              | ChatGPT has been using, is _natural_.
        
           | twobitshifter wrote:
            | I asked it to make a bird noise; instead it told me in words
            | what a bird sounds like. True audio-to-audio should be able
            | to produce any noise: a trombone, traffic, a crashing sea,
            | anything. Maybe there is a better prompt for that, but it did
            | not seem like it.
        
             | og_kalu wrote:
             | The new voice mode has not rolled out yet. It's rolling out
             | to plus users in the next couple weeks.
             | 
             | Also it's possible this is trained on mostly speech.
        
         | bredren wrote:
          | I mention this down thread, but a symptom of a sufficiently
          | advanced tech product is that the nature of its introduction
          | matters less and less.
          | 
          | Based on the casual production of these videos, the product
          | must be this good.
         | 
         | https://news.ycombinator.com/item?id=40346002
        
         | simonw wrote:
          | That was very impressive, but it doesn't surprise me much given
          | how good the voice mode in the ChatGPT iPhone app is
          | already.
         | 
         | The new voice mode sounds better, but the current voice mode
         | did also have inflection that made it feel much more natural
         | than most computer voices I've heard before.
        
           | Jensson wrote:
            | Can you tell the current voice model what feelings and tone
            | it should communicate with? If not, it isn't even comparable.
            | Being able to control how it reads things is absolutely
            | revolutionary; that is what was missing for using these AI
            | models as voice actors.
        
             | simonw wrote:
             | No you can't, at least not directly - you can influence the
             | tone it uses a little through the content you ask it to
             | read.
             | 
             | Being able to specifically request different tones is a new
             | and very interesting feature.
        
               | ecosystem wrote:
                | +1. Check the demo video in OP titled "Sarcasm". The
                | human asks GPT-4o to speak "dripping in sarcasm". The
                | tone that comes back is spot on. Compared against the
                | current voice model, it is a total sea change.
        
           | bredren wrote:
            | The voice mode was quite good, but the latency and the
            | start/stop behavior have been cumbersome.
        
           | duckmysick wrote:
            | Slightly off-topic, but I noticed you've updated your llm CLI
            | app to work with the 4o model (plus a bunch of other APIs
            | through plugins). Kudos for working extremely fast. I'm
            | really grateful for your tool; I tried many others, but for
            | some reason none clicked as much as yours.
           | 
           | Link in case other readers are curious:
           | https://llm.datasette.io
        
         | newzisforsukas wrote:
          | Right to whom? To me, the voice sounds like an overenthusiastic
          | podcast interviewer. What's wrong with wanting computers to
          | sound like what people think computers should sound like?
        
           | Jensson wrote:
            | It understands tonal language and you can tell it how you
            | want it to talk; I have never seen a model like that before.
            | If you want it to talk like a computer you can tell it to
            | (they did that during the presentation), which is so much
            | better than the old attempts at solving this.
        
             | sitkack wrote:
             | You are a Zoomer sosh meeds influencer, please increase
             | uptalk by 20% and vocal fry by 30%. Please inject slaps,
             | "is dope" and nah and bra into your responses. Throw shade
             | every 11 sentences.
        
               | airstrike wrote:
               | I'm not sure whether to laugh or cry...
        
           | navigate8310 wrote:
           | > voice sounds like an over enthusiastic podcast interviewer
           | 
           | I believe it can be toned down using system prompts, which
           | they'll expose in future iterations
        
             | TacticalCoder wrote:
              | As in the _Interstellar_ movie:
              | 
              |   chuckling to 0%
              |   no acting surprised
              |   not making bullshit when you don't know
        
               | sangnoir wrote:
               | > not making bullshit when you don't know
               | 
               | LLMs today have no concept of epistemology, they don't
               | ever "know" and are always making up bullshit, which
               | usually is more-or-less correct as a side effect of
               | minimizing perplexity.
        
           | tr3ntg wrote:
           | Right... enthusiastic and generally confused. It's uncanny
           | valley level expressions. Still better than drab, monotonous
           | speech though.
        
             | eloisant wrote:
             | So far I prefer the neutral tone of Alexa/Google Assistant.
             | I like computers to feel like computers.
             | 
                | It seems like we're in the skeuomorphism phase of AI,
                | where tools try to mimic humans like software tried to
                | mimic physical objects in the early 2000s.
                | 
                | I can't wait for us to be past that phase.
        
               | px43 wrote:
               | Then you can tell it to do that. It will use whatever
               | intonations you prefer.
        
           | kybernetikos wrote:
            | Genuine People Personalities(tm), just like in Hitchhiker's.
            | Perhaps one of the milder forms of 'We Created The Torment
            | Nexus'.
        
           | angryasian wrote:
            | Agreed, I don't get it. I just want the right information,
            | explained well. I don't want to be social with a robot.
        
           | Keyframe wrote:
           | It's a computer from the valley.
        
         | mvkel wrote:
          | I was in the audience at the event. The only part where it
          | seemed to get snagged was hearing the audience reaction as an
          | interruption. Which honestly makes the demo even better. It
          | showed that hey, this is live.
          | 
          | Magic.
        
           | px43 wrote:
           | I wonder when it will be able to understand that there is
           | more than one human talking to it. It seems like even in
           | today's demo if two people are talking, it can't tell them
           | apart.
        
         | ta-run wrote:
         | Crazy that interruption also seems to work pretty smoothly
        
         | nabakin wrote:
         | Seems about as good as Azure's Speech Service. I wonder if
         | that's what they are using behind the scenes
        
         | Keyframe wrote:
         | Somehow it also sounds almost like Dot Matrix from Spaceballs.
        
         | burntalmonds wrote:
         | Yeah, the female voice especially is really impressive in the
         | demos. The voice always sounds natural. The male voice I heard
         | wasn't as good. It wasn't terrible, but it had a somewhat
         | robotic feel to it.
        
         | Intralexical wrote:
         | "Right" feelings and tonal language? "Right" for what? For
         | _whom_?
         | 
         | We've already seen how much damage dishonest actors can do by
         | manipulating our text communications with words they don't
         | mean, plans they don't intend to follow through on, and
         | feelings they don't experience. The social media disinfo age
         | has been bad enough.
         | 
         | Are you sure you want a machine which is able to manipulate our
          | emotions on an even more granular and targeted level?
         | 
         | LLMs are still machines, designed and deployed by humans to
         | perform a task. What will we miss if we anthropomorphize the
         | product itself?
        
       | modeless wrote:
       | As far as I'm concerned this is the new best demo of all time.
       | This is going to change the world in short order. I doubt they
       | will be ready with enough GPUs for the demand the voice+vision
       | mode is going to get, if it's really released to all free users.
       | 
       | Now imagine this in a $16k humanoid robot, also announced this
       | morning: https://www.youtube.com/watch?v=GzX1qOIO1bE The future
       | is going to be wild.
        
         | andy99 wrote:
         | Really? If this was Apple it might make sense, for OpenAI it
         | feels like a demo that's not particularly aligned with their
          | core competency (at least by reputation) of building the most
         | performant AI models. Or put another way, it says to me they're
         | done building models and are now wading into territory where
         | there are strong incumbents.
         | 
         | All the recent OpenAI talk had me concerned that the tech has
         | peaked for now and that expectations are going to be reset.
        
           | modeless wrote:
           | What strong incumbents are there in conversational voice
           | models? Siri? Google Assistant? This is in a completely
           | different league. I can see from the reaction here that
           | people don't understand. But they will when they try it.
           | 
           | Did you see it translate Italian? Have you ever tried the
           | Google Translate/Assistant features for real time
           | translation? They didn't train it to be a translator. They
           | didn't make a translation feature. They just asked it. It's
           | instantly better than every translation feature Google ever
           | released.
        
             | fidotron wrote:
              | In common with Siri, Google Assistant, Alexa and ChatGPT is
              | the perception that over time the same thing actually gets
              | worse.
              | 
              | Whether it's real or not is a reasonably interesting
              | question, because it's possible that all that happens as
              | the technology progresses is that our perception of how
              | things should be advances. My gut feeling is that it has
              | been a bit of both, in the sense that the decline is real,
              | and we also expect things to improve.
             | 
              | Who can forget the demo Google showed at I/O many years
              | ago, of their AI making a call to a restaurant? Everyone,
              | apparently.
        
           | golol wrote:
            | What OpenAI has done time and time again is completely change
            | the landscape when the competitors have caught up and
            | everyone thinks their lead is gone. They made image
            | generation a thing. When GPT-3 became outdated they released
            | ChatGPT. Instead of trying to keep DALL-E competitive they
            | released Sora. Now they change the game again with live
            | audio+video.
        
         | 10xDev wrote:
          | The future is not going to be any more wild than what you
          | choose to do with the tech.
        
           | modeless wrote:
           | I disagree completely. Even people who never adopt this stuff
           | personally will have their lives profoundly impacted. The
           | only way to avoid it would be to live in a large colony where
           | the technology is prohibited, like the Amish. But even the
           | Amish feel the influence of technology to some degree.
        
       | ilaksh wrote:
       | This is so amazing.. are there any open source models that are in
       | any way comparable? Fully multimodal audio-to-audio etc.?
        
       | skilled wrote:
       | Parts of the demo were quite choppy (latency?) so this definitely
       | feels rushed in response to Google I/O.
       | 
        | Other than that, looks good. The desktop app is great, but I
        | didn't see any mention of being able to use your own API key, so
        | open source projects might still be needed.
        | 
        | The biggest thing is bringing GPT-4 to free users; that is an
        | interesting move. Depending on what the limits are, I might
        | cancel my subscription.
        
         | Jordan-117 wrote:
         | Seems like it was picking up on the audience reaction and
         | stopping to listen.
         | 
         | To me the more troubling thing was the apparent hallucination
         | (saying it sees the equation before he wrote it, commenting on
         | an outfit when the camera was down, describing a table instead
         | of his expression), but that might have just been latency
         | awkwardness. Overall, the fast response is extremely
         | impressive, as is the new emotional dimension of the voice.
        
           | sebastiennight wrote:
           | Aha, I think I saw the trick for the live demo: every time
           | they used the "video feed", they did prompt the model
           | specifically by saying:
           | 
           | - "What are you seeing now"
           | 
           | - "I'm showing this to you now"
           | 
           | etc.
           | 
           | The one time where he didn't prime the model to take a
           | snapshot this way, was the time where the model saw the
           | "table" (an old snapshot, since the phone was on the
           | table/pointed at the table), so that might be the reason.
        
             | tedsanders wrote:
             | Yeah, the way the app currently works is that ChatGPT-4o
             | only sees up to the moment of your last comment.
             | 
             | For example, I tried asking ChatGPT-4o to commentate a
             | soccer game, but I got pretty bad hallucinations, as the
             | model couldn't see any new video come in after my
             | instruction.
             | 
             | So when using ChatGPT-4o you'll have to point the camera
             | first and then ask your question - it won't work to first
             | ask the question and then point the camera.
             | 
             | (I was able to play with the model early because I work at
             | OpenAI.)
        
               | 152334H wrote:
               | thanks
        
           | ayhanfuat wrote:
           | Commenting on the outfit was very weird indeed. Greg
           | Brockman's demo includes some outfit related questions
           | (https://twitter.com/gdb/status/1790071008499544518). It does
           | seem very impressive though, even if they polished it on some
           | specific tasks. I am looking forward to showing my desktop
           | and asking questions.
        
         | tailspin2019 wrote:
         | Regarding the limits, I recently found that I was hitting
         | limits very quickly on GPT-4 on my ChatGPT Plus plan.
         | 
         | I'm pretty sure that wasn't always the case - it feels like
          | somewhere along the line the allowed usage was reduced, unless
         | I'm imagining it. It wouldn't be such a big deal if there was
         | more visibility of my current usage compared to my total
         | "allowance".
         | 
         | I ended up upgrading to ChatGPT Team which has a minimum of 2x
         | users (I now use both accounts) but I resented having to do
         | this - especially being forced to pay for two users just to
         | meet their arbitrary minimum.
         | 
         | I feel like I should not be hitting limits on the ChatGPT Plus
         | paid plan at all based on my usage patterns.
         | 
         | I haven't hit any limits on the Team plan yet.
         | 
         | I hope they continue to improve the paid plans and become a bit
         | more transparent about usage limits/caps. I really do not mind
         | paying for this (incredible) tech, but the way it's being sold
         | currently is not quite right and feels like paid users get a
         | bit of a raw deal in some cases.
         | 
         | I have API access but just haven't found an open source client
         | that I like using as much as the native ChatGPT apps yet.
        
           | emporas wrote:
            | I use GPT from the API in Emacs; it's wonderful. Gptel is the
            | program.
            | 
            | API access through Groq to Llama 3 (8b and 70b) is so much
            | faster, though, that I cannot stand how slow GPT is anymore.
            | It is slooow; still a very capable model, but only marginally
            | better than open source alternatives.
        
             | Boss0565 wrote:
             | you should try -4o. It's incredibly fast
        
               | emporas wrote:
               | Yes, of course, probably sometime in the following days.
               | Some people mention it already works in the playground.
               | 
                | I was wondering why OpenAI didn't release a smaller but
                | faster model. 175 billion parameters works well, but
                | speed is sometimes crucial. A 20b-parameter model could,
                | say, compute 10x faster.
        
               | Boss0565 wrote:
                | true. at least rn though, it types at around the same
                | speed as 3.5 turbo
        
               | coder543 wrote:
               | Have you tried groq.com? Because I don't think gpt-4o is
               | "incredibly" fast. I've been frustrated at how slow
               | gpt-4-turbo has been lately, and gpt-4o just seems to be
               | "acceptably" fast now, which is a big improvement, but
               | still, not groq-level.
        
         | Jensson wrote:
         | > Parts of the demo were quite choppy (latency?) so this
         | definitely feels rushed in response to Google I/O.
         | 
          | It just cuts the audio feed when it detects sound, instead of
          | an AI detecting when it should speak, so that part is horrible,
          | yeah. A full AI conversation would detect the natural pauses
          | where you give it room to speak, or when you try to take over
          | by interrupting; here it was just a dumb script that shuts it
          | off when it hears sound.
          | 
          | But it is still very impressive in every other respect; that
          | voice is really good.
          | 
          | Edit: If anyone from OpenAI reads this, at least fade out the
          | voice quickly instead of chopping it. Hard-chopping the audio
          | doesn't sound good at all, and many experienced this
          | presentation as extremely buggy because of it.
        
         | dharma1 wrote:
         | what's the download link for the desktop app? can't find it
        
           | mpeg wrote:
            | seems like it might not be available for everyone? - my
            | chatgpt plus doesn't show anything new, and I also can't find
            | the desktop app
        
         | russdill wrote:
          | They need to fade the audio or add some vocal cue when it's
          | being interrupted. It makes it sound like it's losing the
          | connection. What'll be really impressive is when it
          | intentionally starts interrupting you.
        
           | aantix wrote:
           | Agree. While watching the demo video, I thought I was the one
           | having connectivity issues.
        
       | syntaxing wrote:
        | I admit I drink the koolaid and love LLMs and their applications.
        | But damn, the way it responds in the demo gave me goosebumps in
        | a bad way. Like an uncanny valley instinct kicks in.
        
         | _Parfait_ wrote:
         | You're watching the species be reduced to an LLM.
        
           | warkdarrior wrote:
           | Were humans an interesting species to start with, if they can
           | be reduced to an LLM?
        
             | throw310822 wrote:
             | Yeah, maybe not, and what do you make of it? Now that the
             | secret sauce has been revealed and it's nothing but the
             | right proportions of the same old ingredients?
        
             | Intralexical wrote:
             | The reduction is not a lossless process.
        
           | dougb5 wrote:
           | Hey that LLM is trained on everything we've ever produced, so
           | I wouldn't say we've been "reduced", more like copied. I'll
           | save my self-loathing for when a very low-parameter model can
           | do this.
        
             | jimkleiber wrote:
             | I just don't know if everything we've ever (in the digital
             | age) produced and how it is being weighted by current
             | cultural values will help us or hurt us more. I don't fully
             | know how LLMs work with the weighting, I just imagine that
             | there are controls and priorities put on certain values
             | more than others and I just wonder how future generations
             | will look back at our current priorities.
        
         | TheSockStealer wrote:
         | I also thought the screwups, although minor, were interesting.
         | Like when it thought his face was a desk because it did not
         | update the image it was "viewing". It is still not perfect,
         | which made the whole thing more believable.
        
           | mike00632 wrote:
           | I was shocked at how quickly and naturally they were able to
           | correct the situation.
        
         | bbconn wrote:
         | Yeah it made me realize that I actually don't want a human-like
         | conversational bot (I have actual humans for that). Just teach
         | me javascript like a robot.
        
           | bamboozled wrote:
           | Maybe it's the geek in me, but I don't want a talking
           | computer.
           | 
            | I have enough talking people to deal with already.
        
             | SoftTalker wrote:
             | I've worked in software and tech my whole life and there
             | are few things I dislike more than talking to a computer.
             | 
             | I don't use siri. I don't use speech-to-text. I don't use
             | voice-response menus if I can push a button. I don't have a
             | microphone on my computer.
             | 
             | I don't know why this is. Most of the people I know think
             | it's fun, or a novelty, or even useful. I just viscerally
             | dislike it.
        
         | wslack wrote:
         | It should do that, because it's still not actually an
         | intelligence. It's a tool that is figuring out what to say in
         | response that sounds intelligent - and will often succeed!
        
           | moffkalast wrote:
            | That kind of _is_ an intelligence though. Chinese room meets
           | solipsism and all that.
           | 
           | It is interesting how insanely close their demo is to the
            | OSes in the movie "Her"; it's basically a complete real-life
            | reproduction.
        
           | yCombLinks wrote:
            | Welcome to half the people at your company's job.
        
             | Intralexical wrote:
             | And do you want _more_ of that?
        
           | HeatrayEnjoyer wrote:
           | It's more intelligent than many humans and most/all lesser
            | animals. If it's not intelligent then I don't know what is.
        
         | isurujn wrote:
         | The chuckling made me uneasy for some reason lol. Calm down,
         | you're not like us. Don't pretend!
        
           | moffkalast wrote:
           | Can't wait for Meta's version 2 years down the line that
           | someone will eventually fine tune to Agent Smith's
           | personality and voice.
           | 
           | "Evolution, human. Evolution. Like the dinosaur. Look out
           | that window. You've had your time. The future is our world.
           | The future is our time."
        
         | unsupp0rted wrote:
         | Yes, the chuckling was uncanny, but for me even more uncanny
         | was how the female model went up at the end to soften what she
         | was saying? into a question? even though it wasn't a question?
         | 
         | Eerily human female-like.
        
         | drivers99 wrote:
         | So I'm not the only one. Like I felt fear in a physical way.
          | (Panic/adrenaline?) I'm sure I'd get used to it but it was an
         | interesting reaction. (I saw someone react that way to a
         | talking Tandy 1000 once so, who knows.)
        
       | hubraumhugo wrote:
       | The movie Her has just become reality
        
         | speedgoose wrote:
         | It's getting closer. A few years ago the old Replika AI was
         | already quite good as a romantic partner, especially when you
         | started your messages with a * character to force OpenAI GPT-3
         | answers. You could do sexting that OpenAI will never let you
         | have nowadays with ChatGPT.
        
           | aftbit wrote:
           | Why does OpenAI think that sexting is a bad thing? Why is AI
           | safety all about not saying things that are disturbing or
           | offensive, rather than not saying things that are false or
           | unaligned?
        
         | volleygman180 wrote:
         | I was surprised that the voice is a ripoff of the AI voice in
         | that movie (Scarlett Johansson) too
        
           | toxic72 wrote:
            | I am suspicious that they licensed Scarlett's voice for that
           | voice model (Sky IIRC)
        
         | reducesuffering wrote:
         | People realize where we're headed right? Entire human lives in
         | front of a screen. Your online entertainment, your online job,
         | your online friends, your online "relationship". Wake up, 12
         | hours screentime, eat food, go to bed. Depression and drug
         | overdoses currently at sky high levels. Shocker.
        
           | emporas wrote:
            | If I can program with just my voice, there is no reason to
           | not be in nature 10 hours a day minimum. My grandparent even
           | slept outside as long as it was daytime.
           | 
           | Daytime is always a time to be outside, surrounded by many
           | plants and stuff. It is a shame we have to be productive in
           | some way, and most of production happens inside walls.
        
             | lm28469 wrote:
              | You're already twice as productive as your parents, who
             | were twice as productive as their parents.
             | 
             | We should ask where the money went instead of thinking
             | about telepathically coding from the woods
        
               | emporas wrote:
               | When it comes to the economy, some monkey business is
                | going on, but I think you can be more optimistic about
                | the capabilities that technology like this unlocks for
               | everyone on the planet.
               | 
               | Being able to control machines just with our voice, we
               | can instruct robots to bake food for us. Or lay bricks on
               | a straight line and make a house. Or write code,
               | genetically modify organisms and make nutritionally dense
               | food to become 1000x smarter or stronger.
               | 
               | There has to be some upsides, even though for the moment
               | the situation with governments, banks, big corporations,
                | military companies etc. is not as bright as one would
                | hope.
        
           | tr3ntg wrote:
           | Headed? We're there. Have been there. This just adds non-
           | human sentient agents to the drama.
        
       | hmmmhmmmhmmm wrote:
       | With the news that Apple and OpenAI are closing / just closed a
       | deal for iOS 18, it's easy to speculate we might be hearing about
       | that exciting new model at WWDC...
        
         | thefourthchime wrote:
          | Yes, I'm pretty sure this is the new Siri. Absolutely amazing,
          | it's pretty much "Her" from the movie.
        
       | chatcode wrote:
       | Parsing emotions in vocal inflections (and reliably producing
       | them in vocal output) seems quite under-hyped in this release.
       | 
       | That seems to represent an entirely new depth of understanding of
       | human reality.
        
         | deadbabe wrote:
         | Any appearance of understanding is just an illusion. It's an
         | LLM, nothing more.
        
           | chatcode wrote:
           | Sure, but that seems like it'll be a distinction without a
           | difference for many use cases.
           | 
           | Having a reliable emotional model of a person based on their
           | voice (or voice + appearance) can be useful in a thousand
           | ways.
           | 
           | Which seems to represent a new frontier.
        
             | deadbabe wrote:
             | It's sad that I get downvoted so easily just for saying the
              | truth. People's beliefs about AI here seem to approach
             | superstition rather than anything based in computer
             | science.
             | 
              | These LLMs are nothing more than really big spreadsheets.
        
               | hombre_fatal wrote:
               | Or most of us know the difference between reductiveness
               | and insightfulness.
               | 
               | "Um it's just a big spreadsheet" just isn't good
               | commentary and reminds me of people who think being
               | unimpressed reveals some sort of chops about them, as if
               | we might think of them as the Simon Cowell of tech
               | because they bravely reduced a computer to an abacus.
        
               | deadbabe wrote:
               | Hyping things up with magical thinking isn't great
               | either.
        
               | chpatrick wrote:
               | Isn't that what you're doing with the magic human
               | understanding vs the fake machine understanding?
        
           | chpatrick wrote:
           | Any appearance of understanding is just an illusion. It's
           | just a pile of meat, nothing more.
        
             | mike00632 wrote:
             | Does anyone else, when writing comments, feel that you need
             | to add a special touch to somehow make it clear that a
             | human wrote it?
        
       | rvz wrote:
       | Given that they are moving all these features to free users, it
       | tells us that GPT-5 is around the corner and is significantly
       | better than their previous models.
        
         | margorczynski wrote:
         | Or maybe it is a desperation move after Llama 3 got released
         | and the free mode will have such tight constraints that it will
         | be unusable for anything a bit more serious.
        
       | PoignardAzur wrote:
       | Holy crap, the level of corporate cringe of that "two AIs talk to
       | each other" scene is mind-boggling.
       | 
       | It feels like a pretty strong illustration of the awkwardness of
       | getting value from recent AI developments. Like, this is
       | technically super impressive, but also I'm not sure it gives us
       | anything we couldn't have one year ago with GPT-4 and ElevenLabs.
        
       | sourcecodeplz wrote:
       | It is quite nice how they keep giving premium features for free,
       | after a while. I know OpenAI is not open and all but damn, they
       | do give some cool freebies.
        
       | BoumTAC wrote:
       | Did they provide the rate limit for free users?
       | 
       | Because I have the Plus membership, which is expensive
       | ($25/month).
       | 
       | But if the limit is high enough (or my usage low enough), there
       | is no point in paying that much money for me.
        
       | christianqchung wrote:
       | Does anyone know how they're doing the audio part where Mark
       | breathes too hard? Does his breathing get turned into all-caps
       | text (AA EE OO) that GPT-4o then interprets as him breathing
       | too hard, or is there something more going on?
        
         | GalaxyNova wrote:
         | It can natively interpret voice now.
        
         | Jordan-117 wrote:
         | That's how it used to do it, but my understanding is that this
         | new model processes audio directly. If it were a music
         | generator, the original would have generated sheet music to
         | send to a synthesizer (text to speech), while now it can create
         | the raw waveform from scratch.
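          | 
          | As a rough illustration, the cascaded approach this replaces
          | looked something like the following (a minimal sketch with the
          | OpenAI Python SDK; the model names and file paths are just
          | placeholders):
          | 
          |   # speech -> text -> text -> speech, three separate models
          |   from openai import OpenAI
          |   client = OpenAI()
          |   # 1) transcribe the user's audio
          |   text_in = client.audio.transcriptions.create(
          |       model="whisper-1", file=open("user_turn.wav", "rb")
          |   ).text
          |   # 2) generate a text reply
          |   reply = client.chat.completions.create(
          |       model="gpt-4-turbo",
          |       messages=[{"role": "user", "content": text_in}],
          |   ).choices[0].message.content
          |   # 3) synthesize speech from the reply text
          |   client.audio.speech.create(
          |       model="tts-1", voice="alloy", input=reply
          |   ).stream_to_file("assistant_turn.mp3")
          | 
          | Each hop adds latency and throws away intonation, which is
          | presumably what the end-to-end audio model avoids.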
        
         | modeless wrote:
          | There is no text. The model ingests audio directly and also
          | outputs audio directly.
        
           | dclowd9901 wrote:
           | Is it a stretch to think this thing could accurately "talk"
           | with animals?
        
             | jamilton wrote:
             | Yes? Why would it be able to do that?
        
               | ninininino wrote:
                | I think they are assuming a world where you took this
                | kind of model but trained it on a dataset of animals
                | making noises to each other. You could then feed the
                | trained model the vocalization of one animal, and it
                | would produce a continuation of audio that has a
                | better-than-zero chance of being a realistic sound
                | coming from another animal. In other words, if dogs
                | have some type of bark that encodes an "I found
                | something yummy" message and other dogs tend to have a
                | bark that encodes "I'm on my way", and we're just
                | oblivious to all of that subtext, then maybe the model
                | could communicate back and forth with an animal in a
                | way that makes "sense" to the animal.
                | 
                | Probably substitute dogs for chimps though.
                | 
                | But obviously that doesn't solve human
                | understandability at all, unless maybe you have it all
                | as audio+video and then ask the model to explain what
                | visuals often accompany a specific type of audio?
                | Maybe the model can learn what sounds accompany
                | violence, or the discovery of a source of water, or
                | something?
        
               | dclowd9901 wrote:
               | Yep, exactly what brought that to mind. Multimodal seems
               | like the kind of thing needed for such a far-fetched
               | idea.
        
             | benlivengood wrote:
             | Not really a stretch in my mind.
             | https://www.earthspecies.org/ and others are working on it
             | already.
        
       | crindy wrote:
       | Very impressed by the demo where it starts speaking French in
       | error, then laughs with the user about the mistake. Such a
       | natural recovery.
        
       | spacebanana7 wrote:
       | > We recognize that GPT-4o's audio modalities present a variety
       | of novel risks
       | 
       | > For example, at launch, audio outputs will be limited to a
       | selection of preset voices and will abide by our existing safety
       | policies.
       | 
       | I wonder if they'll ever allow truly custom voices from audio
       | samples.
        
         | dkasper wrote:
         | I think the issue there is less of a technical one and more of
         | an issue with deepfakes and copyright
        
           | spacebanana7 wrote:
           | It might be possible to prove that I control my voice, or
           | that of a given audio sample. For example by saying specific
           | words on demand.
           | 
           | But yeah I see how they'd be blamed if anything went wrong,
           | which it almost certainly would in some cases.
        
       | tomComb wrote:
       | The price of 4o is 50% of GPT-4-Turbo's (and no mention of a
       | price change to GPT-4-Turbo itself).
       | 
       | Given the competitive pressures I was expecting a much bigger
       | price drop than that.
       | 
       | For non-multimodal uses, I don't think their API is at all
       | competitive any more.
        
         | mrklol wrote:
          | Where do you get something cheaper with a similar experience?
        
       | lagt_t wrote:
       | Universal real time translation is incredibly dope.
       | 
       | I hate video players without volume control.
        
       | pachico wrote:
       | jeez, that model really speaks a lot! I hope there's a way to
       | make it more straight to the point rather than radio-like.
        
       | causal wrote:
       | Clicking the "Try it on ChatGPT" link just takes me to GPT-4 chat
       | window. Tried again in an incognito tab (supposing my account is
       | the issue) and it just takes me to 3.5 chat. Anyone able to use
       | it?
        
         | 101008 wrote:
         | Same here and also I can't hear audio in any of the videos on
         | this page. Weird.
        
       | TrueDuality wrote:
       | Weird, visiting the page crashed my graphics driver in Firefox.
        
       | msoad wrote:
       | They are admitting[1] that the new model is the gpt2-chatbot that
       | we have seen before[2]. As many highlighted there, the model is
       | not an improvement like GPT3->GPT4. I tested a bunch of
       | programming stuff and it was not that much better.
       | 
       | It's interesting that OpenAI is highlighting the Elo score
       | instead of showing results for the many benchmarks where all
       | models are stuck at 50-70% success.
       | 
       | [1] https://twitter.com/LiamFedus/status/1790064963966370209
       | 
       | [2] https://news.ycombinator.com/item?id=40199715
        
         | modeless wrote:
         | "not that much better" is extremely impressive, because it's a
         | much smaller and much faster model. Don't worry, GPT-5 is
         | coming and it _will_ be better.
        
           | TIPSIO wrote:
           | Obviously given enough time there will always be better
           | models coming.
           | 
           | But I am not convinced it will be another GPT-4 moment. Seems
            | like a big focus on tacking together clever multi-modal tricks
            | vs. straight-up better intelligence.
           | 
           | Hope they prove me wrong!
        
             | kmeisthax wrote:
             | The problem with "better intelligence" is that OpenAI is
             | running out of human training data to pillage. Training AI
             | on the output of AI smooths over the data distribution, so
             | all the AIs wind up producing same-y output. So OpenAI
             | stopped scraping text back in 2021 or so - because that's
             | when the open web turned into an ocean of AI piss. I've
             | heard rumors that they've started harvesting closed
             | captions out of YouTube videos to try and make up the
             | shortfall of data, but that seems like a way to stave off
             | the inevitable[0].
             | 
             | Multimodal is another way to stave off the inevitable,
             | because these AI companies already are training multiple
             | models on different piles of information. If you have to
             | train a text model and an image model, why split your
             | training data in half when you could train a combined model
             | on a combined dataset?
             | 
             | [0] For starters, most YouTube videos aren't manually
             | captioned, so you're feeding GPT the output of Google's
             | autocaptioning model, so it's going to start learning
             | artifacts of what that model can't process.
        
               | pbhjpbhj wrote:
               | >harvesting closed captions out of YouTube videos
               | 
               | I'd bet a lot of YouTubers are using LLMs to write and/or
               | edit content. So we pass that through a human
               | presentation. Then introduce some errors in the form of
                | transcription. Then feed the output back in as part of a
               | training corpus ... we plateaued real quick.
               | 
               | It seems like it's hard to get past a level of human
               | intelligence at which there's a large enough corpus of
               | training data or trainers?
               | 
               | Anyone know of any papers on breaking this limit to push
               | machine learning models to super-human intelligence
               | levels?
        
               | pixl97 wrote:
               | If a model is average human intelligence in pretty much
               | everything, is that super-human or not? Simply put, we as
               | individuals aren't average at everything, we have what
               | we're good at and a great many things we're not. We
               | average out by looking at broad population trends. That's
               | why most of us in the modern age spend a lot of time on
                | specialization in whatever we work in. Which brings us
                | to the likely next source of data: a Manna-style (the
                | story) data collection program where companies hoover up
                | everything they can on their above-average employees
                | until most models are well above the human average in
                | most categories.
        
               | WhitneyLand wrote:
               | Why do you think they're using Google auto-captioning?
               | 
                | I would expect they're using their own speech-to-text,
                | which is still a model but way better quality and
                | potentially customizable to better suit their needs.
        
               | llm_trw wrote:
               | >[0] For starters, most YouTube videos aren't manually
               | captioned, so you're feeding GPT the output of Google's
               | autocaptioning model, so it's going to start learning
               | artifacts of what that model can't process.
               | 
                | Whisper models are better than anything Google has. In
                | fact, the higher-quality Whisper models are better than
                | humans when it comes to transcribing speech with
                | punctuation.
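                | 
                | For anyone who wants to check that claim, a minimal
                | sketch with the open-source whisper package (the model
                | size and file name are just examples):
                | 
                |   # pip install openai-whisper (needs ffmpeg)
                |   import whisper
                |   model = whisper.load_model("large-v2")
                |   result = model.transcribe("some_video.mp4")
                |   print(result["text"])  # punctuated transcript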
        
               | marvin wrote:
               | At some point, algorithms for reasoning and long-term
               | planning will be figured out. Data won't be the holy
               | grail forever, and neither will asymptotically
               | approaching human performance in all domains.
        
           | mupuff1234 wrote:
           | And how can one be so sure of that?
           | 
           | Seems to me that performance is converging and we might not
           | see a significant jump until we have another breakthrough.
        
             | scarmig wrote:
             | Yeah. There are lots of things we can do with existing
             | capabilities, but in terms of progressing beyond them all
             | of the frontier models seem like they're a hair's breadth
             | from each other. That is not what one would predict if LLMs
             | had a much higher ceiling than we are currently at.
             | 
             | I'll reserve judgment until we see GPT5, but if it becomes
             | just a matter of who best can monetize existing
             | capabilities, OAI isn't the best positioned.
        
             | diego_sandoval wrote:
             | > Seems to me that performance is converging
             | 
             | It doesn't seem that way to me. But even if it did, video
             | generation also seemed kind of stagnant before Sora.
             | 
             | In general, I think The Bitter Lesson is the biggest factor
             | at play here, and compute power is not stagnating.
        
               | drawnwren wrote:
                | Compute power is not stagnating, but the availability of
               | training data is. It's not like there's a second
               | stackoverflow or reddit to scrape.
        
               | robwwilliams wrote:
               | No: soon the wide wild world itself becomes training
               | data. And for much more than just an LLM. LLM plus
                | reinforcement learning--this is where the capacity of our
               | in silico children will engender much parental anxiety.
        
               | diego_sandoval wrote:
               | Agree.
               | 
               | However, I think the most cost-effective way to train for
               | real world is to train in a simulated physical world
               | first. I would assume that Boston Dynamics does exactly
               | that, and I would expect integrated vision-action-
               | language models to first be trained that way too.
        
               | pixl97 wrote:
                | That's how everyone in robotics is doing it these days.
               | 
               | You take a bunch of mo-cap data and simulate it with your
                | robot body. Then you do as much testing as you can with
                | the robot and feed the behavior back into the model for
                | fine-tuning.
               | 
               | Unitree gives an example of the simulation versus what
               | the robot can do in their latest video
               | 
               | https://www.youtube.com/watch?v=GzX1qOIO1bE
        
               | Animats wrote:
               | This may create a market for surveillance camera data and
               | phone calls.
               | 
               | "This conversation may be recorded and used for training
               | purposes" now takes on a new meaning.
               | 
               | Can car makers sell info from everything that happens in
               | their cars?
        
               | abenga wrote:
               | Well, this is a massively horrifying possibility.
        
               | bigyikes wrote:
               | It isn't clear that we are running out of training data,
               | and it is becoming increasingly clear that AI-generated
               | training data actually works.
               | 
               | For the skeptical, consider that humans can be trained on
               | material created by less intelligent humans.
        
               | rglullis wrote:
               | > humans can be trained on material created by less
               | intelligent humans.
               | 
               | For the skeptics, "AI models" are not intelligent at all
               | so this analogy makes no sense.
               | 
               | You can teach lots of impressive tricks to dogs, but
               | there is no amount of training that will teach them basic
               | algebra.
        
               | diego_sandoval wrote:
               | I don't think training data is the limiting factor for
               | current models.
        
               | emporas wrote:
                | It is a limiting factor, due to diminishing returns. A
                | model trained on double the data will be 10% better, if
                | that!
                | 
                | When it comes to multi-modality, training data is not
                | limited, because of the many different combinations of
                | language, images, video, sound etc. Microsoft did some
                | research on that, teaching spatial recognition to an LLM
                | using synthetic images, with good results. [1]
                | 
                | When someone states that there is not enough training
                | data, they usually mean code, mathematics, physics,
                | logical reasoning etc. On the open internet right now,
                | there is not enough code to make a model 10x better,
                | 100x better and so on.
                | 
                | Synthetic data will be produced of course; scarcity of
                | data is the least worrying scarcity of all.
               | 
               | Edit: citation added,
               | 
               | [1] VoT by MS
               | https://medium.com/@multiplatform.ai/microsoft-
               | researchers-p...
        
               | MVissers wrote:
                | Soon these models will be cheap enough to learn in the
                | real world. Reduced costs allow for usage at massive
                | scale.
                | 
                | Releasing models to users who can record video means
                | more data. Users conversing with AI is also additional
                | data.
                | 
                | Another example is models that code, then debug the
                | code and learn from that.
                | 
                | This will be everywhere, and these models will learn
                | from anything we do/publish online/discuss. Scary.
               | 
               | Pretty soon- OpenAI will have access to
        
               | wavemode wrote:
               | > video generation also seemed kind of stagnant before
               | Sora
               | 
               | I take the opposite view. I don't think video generation
               | was stagnating at all, and was in fact probably the area
               | of generative AI that was seeing the biggest active
               | strides. I'm highly optimistic about the future
               | trajectory of image and video models.
               | 
               | By contrast, text generation has not improved
               | significantly, in my opinion, for more than a year now,
               | and even the improvement we saw back then was relatively
               | marginal compared to GPT-3.5 (that is, for most day-to-
               | day use cases we didn't really go from "this model can't
               | do this task" to "this model can now do this task". It
               | was more just "this model does these pre-existing tasks,
               | in somewhat more detail".)
               | 
               | If OpenAI really is secretly cooking up some huge
               | reasoning improvements for their text models, I'll eat my
               | hat. But for now I'm skeptical.
        
               | Eisenstein wrote:
               | > By contrast, text generation has not improved
               | significantly, in my opinion, for more than a year now
               | 
               | With less than $800 worth of hardware including
               | everything but the monitor, you can run an open weight
               | model more powerful than GPT 3.5 locally, at around 6 -
               | 7T/s[0]. I would say that is a huge improvement.
               | 
               | [0] https://www.reddit.com/r/LocalLLaMA/comments/1cmmob0/
               | p40_bui...
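                | 
                | For the curious, running such a model locally is roughly
                | this simple with llama-cpp-python (the GGUF file name
                | and settings are illustrative, not a recommendation):
                | 
                |   # pip install llama-cpp-python
                |   from llama_cpp import Llama
                |   llm = Llama(
                |       model_path="Meta-Llama-3-8B-Instruct.Q4_K_M.gguf",
                |       n_gpu_layers=-1,  # offload all layers to the GPU
                |       n_ctx=8192,
                |   )
                |   out = llm.create_chat_completion(messages=[
                |       {"role": "user", "content": "Summarize this thread."}
                |   ])
                |   print(out["choices"][0]["message"]["content"])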
        
             | aantix wrote:
             | The use of AI in the research of AI accelerates everything.
        
               | thefaux wrote:
               | I'm not sure of this. The jury is still out on most ai
               | tools. Even if it is true, it may be in a kind of strange
               | reverse way: people innovating by asking what ai can't do
               | and directing their attention there.
        
               | bigyikes wrote:
               | There is an increasing amount of evidence that using AI
               | to train other AI is a viable path forward. E.g. using
               | LLMs to generate training data or tune RL policies
        
           | talldayo wrote:
           | Chalmers: "GPT-5? A vastly-improved model that somehow
           | reduces the compute overhead while providing better answers
           | with the same hardware architecture? At this time of year? In
           | this kind of market?"
           | 
           | Skinner: "Yes."
           | 
           | Chalmers: "May I see it?"
           | 
           | Skinner: "No."
        
             | pwdisswordfishc wrote:
             | Incidentally, this dialogue works equally well, if not
             | better, with David Chalmers versus B.F. Skinner, as with
             | the Simpsons characters.
        
             | AaronFriel wrote:
             | It has only been a little over one year since GPT-4 was
             | announced, and it was at the time the largest and most
             | expensive model ever trained. It might still be.
             | 
             | Perhaps it's worth taking a beat and looking at the
             | incredible progress in that year, and acknowledge that
             | whatever's next is probably "still cooking".
             | 
             | Even Meta is still baking their 400B parameter model.
        
               | bamboozled wrote:
               | Legit love progress
        
               | 1024core wrote:
               | As Altman said (paraphrasing): GPT-4 is the _worst_ model
               | you will ever have to deal with in your life (or
               | something to that effect).
        
               | andrepd wrote:
               | I will believe it when I see it. People like to point at
               | the first part of a logistic curve and go "behold! an
               | exponential".
        
               | nwienert wrote:
               | Ah yes my favorite was the early covid numbers, some of
               | the "smartest" people in the SF techie scene were daily
               | on Facebook thought-leadering about how 40% of people
               | were about to die in the likely case.
        
               | tiptup300 wrote:
               | and boy did the stockholders like that one.
        
               | dyauspitr wrote:
               | What stockholders. They're investors at this point. I
               | wish I could get in on it.
        
               | talldayo wrote:
                | They're rollercoaster riders, being told lustrous
               | stories by gold-panners while the shovel salesman counts
               | his money and leaves.
        
               | dvfjsdhgfv wrote:
               | Why should I believe anything he says?
        
               | markk wrote:
               | I found this statement by Sam quite amusing. It transmits
               | exactly zero information (it's a given that models will
               | improve over time), yet it sounds profound and ambitious.
        
             | og_kalu wrote:
             | GPT-3 was released in 2020 and GPT-4 in 2023. Now we all
             | expect 5 sooner than that but you're acting like we've been
             | waiting years lol.
        
               | skepticATX wrote:
               | The increased expectations are a direct result of LLM
               | proponents continually hyping exponential capabilities
               | increase.
        
               | og_kalu wrote:
               | The time for the research, training, testing and
               | deploying of a new model at frontier scales doesn't
               | change depending on how hyped the technology is. I just
                | think the comment I was replying to lacks perspective.
        
               | Sharlin wrote:
               | People who buy into hype deserve to be disappointed. Or
               | burned, as the case may be.
        
               | throwthrowuknow wrote:
               | So if not exponential, what would you call adding voice
               | and image recognition, function calling, greatly
               | increased token generation speed, reduced cost, massive
               | context window increases and then shortly after combining
               | all of that in a truly multi modal model that is even
               | faster and cheaper while adding emotional range and
               | singing in... _checks notes_ ...14 months?! Not to
               | mention creating and improving an API, mobile apps, a
               | marketplace and now a desktop app. OpenAI ships and they
               | are doing so in a way that makes a lot of business sense
               | (continue to deliver while reducing cost). Even if they
               | didn't have another flagship model in their back pocket
               | I'd be happy with this rate of improvement but they are
               | obviously about to launch another one given the teasers
               | Mira keeps dropping.
        
               | skepticATX wrote:
               | All of that is awesome, and makes for a better product.
               | But it's also primarily an engineering effort. What
               | matters here is an increase in intelligence. And we're
               | not seeing that aside from very minor capability
               | increases.
               | 
               | We'll see if they have another flagship model ready to
               | launch. I seriously doubt it. I suspect that this was
               | supposed to be called GPT-5, or at the very least
               | GPT-4.5, but they can't meet expectations so they can't
               | use those names.
        
               | dwaltrip wrote:
               | Pay attention to the signal, ignore the noise.
        
             | dlivingston wrote:
             | "Seymour, the house is on fire!"
             | 
             | "No, mother, that's just the H100s."
        
             | dialup_sounds wrote:
             | Agnes (voice): "SEYMOUR, THE HOUSE IS ON FIRE!"
             | 
             | Skinner (looking up): "No, mother, it's just the Nvidia
             | GPUs."
        
           | moomoo11 wrote:
           | I really hope GPT5 is good. GPT4 sucks at programming.
        
             | verdverm wrote:
             | Look to a specialized model instead of a general purpose
             | one
        
               | moomoo11 wrote:
               | Any suggestions? Thanks
               | 
                | I have tried Phind, and for anything beyond mega junior
                | tier questions it suffers as well and gives bad answers.
        
             | twsted wrote:
             | It's better than at least 50% of the developers I know.
        
               | Jensson wrote:
               | A developer that just pastes in code from gpt-4 without
               | checking what it wrote is a horror scenario, I don't
               | think half of the developers you know are really that
               | bad.
        
             | cududa wrote:
             | It's excellent at programming if you actually know the
             | problem you're trying to solve and the technology. You need
             | to guide it with actual knowledge you have. Also, you have
             | to adapt your communication style to get good results. Once
             | you 'crack the pattern' you'll have a massive productivity
             | boost
        
               | partiallypro wrote:
               | In my experience 3.5 was better at programming than 4,
               | and I don't know why.
        
           | littlestymaar wrote:
           | I don't think a bigger model would make sense for OpenAI:
           | it's much more important for them that they keep driving
            | inference cost down, because there's no viable business model
           | if they don't.
           | 
           | Improving the instruction tuning, the RLHF step, increase the
           | training size, work on multilingual capabilities, etc. make
           | sense as a way to improve quality, but I think increasing
            | model size doesn't. Being able to advertise a big
           | breakthrough may make sense in terms of marketing, but I
           | don't believe it's going to happen for two reasons:
           | 
           | - you don't release intermediate steps when you want to be
           | able to advertise big gains, because it raises the baseline
            | and reduces the effectiveness of your "big gains" in terms of
           | marketing.
           | 
            | - I don't think they would benefit from an arms race with Meta,
            | trying to keep a significant edge. Meta is likely to be
           | able to catch-up eventually on performance, but they are not
           | so much of a threat in terms of business. Focusing on keeping
           | a performance edge instead of making their business viable
           | would be a strategic blunder.
        
             | jononor wrote:
              | What is OpenAI's business model if their models are second-
             | best? Why would people pay them and not
             | Meta/Google/Microsoft - who can afford to sell at very low
             | margins, since they have existing very profitable
              | businesses that keep them afloat.
        
               | littlestymaar wrote:
               | That's _the_ question OpenAI needs to find an answer to
               | if they want to end up viable.
               | 
               | They have the brand recognition (for ChatGPT) and that's
               | a good start, but that's not enough. Providing a best in
               | class user experience (which seems to be their focus now,
               | with multimodality), a way to lock down their customers
               | in some kind of walled garden, building some kind of
               | network effect (what they tried with their marketplace
               | for community-built "GPTs" last fall but I'm not sure
               | it's working), something else?
               | 
               | At the end of the day they have no technological moat, so
               | they'll need to build a business one, or perish.
               | 
                | For most tasks, pretty much every model from their
               | competitors is more than good enough already, and it's
               | only going to get worse as everyone improves. Being
               | marginally better on 2% of tasks isn't going to be
               | enough.
        
               | Eisenstein wrote:
               | I know it is super crazy, but maybe they could become a
               | non-profit and dedicate themselves to producing open
               | source AI in an effort to democratize it and make it safe
               | (as in, not walled behind a giant for-profit corp that
               | will inevitably enshittify it).
               | 
               | I don't know why they didn't think about doing that
               | earlier, could have been a game changer, but there is
               | still an opportunity to pivot.
        
         | cube2222 wrote:
         | I think the live demo that happened on the livestream is best
         | to get a feel for this model[0].
         | 
         | I don't really care whether it's stronger than gpt-4-turbo or
         | not. The direct real-time video and audio capabilities _are
         | absolutely magical and stunning_. The responses in voice mode
         | are now instantaneous, you can interrupt the model, you can
         | talk to it while showing it a video, and it understands (and
         | uses) intonation and emotion.
         | 
         | Really, just watch the live demo. I linked directly to where it
         | starts.
         | 
         | Importantly, this makes the interaction a lot more "human-
         | like".
         | 
         | [0]: https://youtu.be/DQacCB9tDaw?t=557
        
           | gabiruh wrote:
           | It's weird that the "airplane mode" seems to be ON on the
           | phone during the entire presentation.
        
             | arthurcolle wrote:
              | This was on purpose - it appears they connected it to the
              | internet via a USB-C cable, for consistent connectivity
              | instead of having it on WiFi.
             | 
             | Probably some kinks there they are working out
        
               | _flux wrote:
                | And eliminate the chance of some prankster affecting the
               | demo by attacking the wifi.
        
               | OJFord wrote:
               | > Probably some kinks there they are working out
               | 
               | Or just a good idea for a live demo on a congested
               | network/environment with a lot of media present, at least
               | one live video stream (the one we're watching the
               | recording of), etc.
               | 
               | At least that's how I understood it, not that they had a
               | problem with it (consistently or under regular
               | conditions, or specific to their app).
        
               | hbn wrote:
               | That's very common practice for live demos. To avoid
               | situations like this:
               | 
               | https://www.youtube.com/watch?v=6lqfRx61BUg
        
             | simoes wrote:
             | They mention at the beginning of the video that they are
             | using hardwired internet for reliability reasons.
        
             | sitkack wrote:
             | You would want to make sure that it is always going over
             | WiFi for the demo and doesn't start using the cellular
             | network for a random reason.
        
               | rightbyte wrote:
               | You can turn off mobile data. They probably just wanted
               | wired internet.
        
           | fvdessen wrote:
           | The demo is impressive but personally, as a commercial user,
           | for my practical use cases, the only thing I care about is
            | how smart it is, how accurate its answers are and how vast
            | its knowledge is. These haven't changed much since GPT-4, yet
            | they should have, as IMHO it is still borderline in its
            | ability to be really that useful.
        
             | CapcomGo wrote:
             | But that's not the point of this update
        
               | fvdessen wrote:
               | I know, and I know my comment is dismissive of the
               | incredible work shown here, as we're shown sci-fi level
                | tech. But I feel like I have this kettle that boils water
                | in 10 minutes, and it really should boil it in 1, but
                | instead it is now voice-operated.
               | 
               | I hope the next version delivers on being smarter, as
               | this update instead of making me excited, makes me feel
               | they've reached a plateau on the improvement of the core
               | value and are distracting us with fluff instead
        
               | hombre_fatal wrote:
               | Sure, but "not enough, I want moar" is a trivial demand.
               | So trivial that it goes unsaid.
        
               | bennyhill wrote:
               | It's equivalent to "nothing to see here" which is exactly
               | the TLDR I was looking for.
        
               | shepherdjerred wrote:
               | Everything is amazing & Nobody is happy:
               | https://www.youtube.com/watch?v=PdFB7q89_3U
        
               | 0xB31B1B wrote:
                | GPT-4 isn't quite "amazing" in terms of commercial use.
                | GPT-4 is often good, and also often mediocre or bad. It's
                | not going to change the world; it needs to get better.
        
               | Spivak wrote:
               | It's an impressive demo, it's not (yet) an impressive
               | product.
               | 
               | It seems like the people who are ohhing and ahhing at the
               | former and the people who are frustrated that this kind
                | of thing is unbelievably impractical to productize will be
               | doomed to talk past one another forever. The text
               | generation models, image generation models, speech-to-
               | text and text-to-speech have reached impressive product
                | stages. Multi-modal hasn't got there because no one is
               | really sure what to actually _do_ with the thing outside
               | of make cool demos.
        
               | 0xB31B1B wrote:
               | Multi modal isn't there because "this is an image of a
               | green plant" is viable in a demo, but its not
               | commercially viable. "This is an image of a monstera
               | deliciosa" is commercially viable, but not yet demoable.
               | The models need to improve to be usable.
        
               | dvaun wrote:
               | Near real-time voice feedback isn't amazing? Has the bar
               | risen this high?
               | 
               | I already know an application for this, and AFAIK it's
               | being explored in the SaaS space: guided learning
               | experiences and tutoring for individuals.
               | 
               | My kids, for instance, love to hammer Alexa with random
               | questions. They would spend a huge amount of time using a
               | better interface, esp. with quick feedback, that provided
               | even deeper insight and responses to them.
               | 
               | Taking this and tuning it to specific audiences would
               | make it a great tool for learning.
        
               | 0xB31B1B wrote:
               | "My kids, for instance, love to hammer Alexa with random
               | questions. They would spend a huge amount of time using a
               | better interface, esp. with quick feedback, that provided
               | even deeper insight and responses to them."
               | 
               | Great, using GPT-4 the kids will be getting a lot of
               | hallucinated facts returned to them. There are good use
                | cases for transformers currently, but they're not at the
                | "impact company earnings or country GDP" stage yet,
               | which is the promise that the whole industry has
               | raised/spent 100+B dollars on. Facebook alone is spending
               | 40B on AI. I believe in the AI future, but the only thing
               | that matters for now is that the models improve.
        
               | practice9 wrote:
               | I always double-check even the most obscure facts
               | returned by GPT-4 and have yet to see a hallucination (as
               | opposed to Claude Opus that sometimes made up historical
               | facts). I doubt stuff interesting to kids would be so out
                | of the data distribution as to return a fake answer.
               | 
               | Compared to YouTube and Google SEO trash, or Google Home
               | / Alexa (which do search + wiki retrieval), at the moment
               | GPT-4 and Claude are unironically safer for kids: no
               | algorithmic manipulation, no ads, no affiliated trash
                | blogs, and so on. A bonus is that it can explain at the
                | level of complexity the child will understand for their
                | age.
        
               | dvaun wrote:
               | My kids get erroneous responses from Alexa. This happens
               | all the time. The built-in web search doesn't provide
               | correct answers, or is confusing outright. That's when
               | they come to me or their Mom and we provide a better
               | answer.
               | 
               | I still see this as a cool application. Anything that
               | provides easier access to knowledge and improved learning
               | is a boon.
               | 
               | I'd rather worry about the potential economic impact than
               | worry about possible hallucinations from fun questions
               | like "how big is the sun?" or "what is the best videogame
               | in the world?", etc.
               | 
               | There's a ton you can do here, IMO.
               | 
               | Take a look at mathacademy.com, for instance. Now slap a
               | voice interface on it, provide an ability for
               | kids/participants to ask questions back and forth, etc.
               | Boom: you've got a math tutor that guides you based on
               | your current ability.
               | 
               | What if we could get to the same style of learning for
               | languages? For instance, I'd love to work on Spanish.
               | It'd be far more accessible if I could launch a web
               | browser and chat through my mic in short spurts, rather
               | than crack open Anki and go through flash cards, or wait
               | on a Discord server for others to participate in
               | immersive conversation.
               | 
               | Tons of cool applications here, all learning-focused.
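                | 
                | As a toy sketch of what the text side of such a tutor
                | could look like (the system prompt and model name are
                | just illustrative):
                | 
                |   from openai import OpenAI
                |   client = OpenAI()
                |   history = [{"role": "system", "content":
                |       "You are a patient Spanish tutor for a beginner. "
                |       "Reply briefly, correct mistakes gently, and end "
                |       "with a short follow-up question."}]
                |   while True:
                |       history.append({"role": "user",
                |                       "content": input("you> ")})
                |       msg = client.chat.completions.create(
                |           model="gpt-4o", messages=history
                |       ).choices[0].message.content
                |       print("tutor>", msg)
                |       history.append({"role": "assistant",
                |                       "content": msg})
                | 
                | Swap the input()/print() for speech-to-text and
                | text-to-speech (or, eventually, native audio) and you
                | have the conversational version.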
        
               | throwthrowuknow wrote:
               | Watch the last few minutes of that linked video, Mira
               | strongly hints that there's another update coming for
                | paid users and seems to make clear that GPT-4o is more so
                | for free-tier users (even though it is obviously a huge
               | improvement in many features for everyone).
        
             | whyever wrote:
             | They say it's twice as fast/cheap, which might matter for
             | your use case.
        
               | minimaxir wrote:
               | It's twice as fast/cheap relative to GPT-4-turbo, which
               | is still expensive compared to GPT-3.5-turbo and Claude
               | Haiku.
               | 
               | https://openai.com/api/pricing/
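                | 
                | A back-of-the-envelope comparison (the prices are
                | approximate per-1M-token rates at launch and may change;
                | check the pricing page above):
                | 
                |   # (input, output) USD per 1M tokens -- approximate
                |   PRICES = {
                |       "gpt-4o":         (5.00, 15.00),
                |       "gpt-4-turbo":    (10.00, 30.00),
                |       "gpt-3.5-turbo":  (0.50, 1.50),
                |       "claude-3-haiku": (0.25, 1.25),
                |   }
                |   def cost(model, tokens_in, tokens_out):
                |       p_in, p_out = PRICES[model]
                |       return (tokens_in * p_in + tokens_out * p_out) / 1e6
                |   # e.g. 2M prompt + 0.5M completion tokens per day
                |   for m in PRICES:
                |       print(f"{m:15s} ${cost(m, 2e6, 5e5):7.2f}/day")
                | 
                | At those assumed rates GPT-4o is half the cost of
                | GPT-4-Turbo but still roughly an order of magnitude
                | above GPT-3.5-Turbo and Haiku per token.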
        
               | c0t300 wrote:
               | but better afaik
        
               | minimaxir wrote:
               | But may not be better _enough_ to warrant the cost
                | difference. LLM cost economics are complicated.
        
               | fvdessen wrote:
               | I'd much rather have it be slower, more expensive, but
               | smarter
        
               | pests wrote:
               | Then the current offering should suffice, right?
        
               | specproc wrote:
               | Depends what you want it for. I'm still holding out for a
               | decent enough open model, Llama 3 is tantalisingly close,
               | but inference speed and cost are serious bottlenecks for
               | any corpus-based use case.
        
               | abdullin wrote:
                | I think that might come with the next GPT version.
               | 
               | OpenAI seems to build in cycles. First they focus on
               | capabilities, then they work on driving the price down
               | (occasionally at some quality degradation)
        
             | ben_w wrote:
             | I understand your point, and agree that it is "borderline"
             | in its abilities -- though I would instead phrase it as "it
             | feels like a junior developer or an industrial placement
             | student, and assume it is of a similar level in all other
             | skills", as this makes it clearer when it is or isn't a
             | good choice, and it also manages expectations away from
             | both extremes I frequently encounter (that it's either Cmdr
              | Data already, or that it's a no-good terrible thing only
             | promoted by the people who were previously selling Bitcoin
             | as a solution to all the economics).
             | 
             | That said, given the price tag, when AI becomes _genuinely
              | expert_ then I'm probably not going to have a job and
             | neither will anyone else (modulo how much electrical power
             | those humanoid robots need, as the global electricity
             | supply is currently only 250 W/capita).
             | 
             | In the meantime, making it a properly real-time
             | conversational partner... wow. Also, that's kinda what you
             | need for real-time translation, because: <<be this, that
             | different languages the word order totally alter and
             | important words at entirely different places in the
             | sentence put>>, and real-time "translation" (even when done
             | by a human) therefore requires having a good idea what the
             | speaker was going to say before they get there, _and_ being
             | able to back-track when (as is inevitable) the anticipated
             | topic was actually something completely different and so
             | the  "translation" wasn't.
        
               | fvdessen wrote:
               | I guess I feel like I'll get to keep my job a while
               | longer and this is strangely disappointing...
               | 
               | A real time translator would be a killer app indeed, and
               | it seems not so far away, but note how you have to prompt
               | the interaction with 'Hey ChatGPT'; it does not interject
               | on its own. It is also unclear if it is able to
               | understand if multiple people are speaking and who's who.
               | I guess we'll see soon enough :)
        
               | ben_w wrote:
               | > It is also unclear if it is able to understand if
               | multiple people are speaking and who's who. I guess we'll
               | see soon enough :)
               | 
               | Indeed; I would be _pleasantly surprised_ if it can both
               | notice and separate multiple speakers, but only a bit
               | surprised.
        
             | jll29 wrote:
             | There is room for more than one use case and large language
             | model type.
             | 
             | I predict there will be a zoo (more precisely tree, as in
             | "family tree") of models and derived models for particular
             | application purposes, and there will be continued
             | development of enhanced "universal"/foundational models as
             | well. Some will focus on minimizing memory, others on
             | minimizing pre-training or fine-tuning energy consumption,
             | some need high accuracy, others hard realtime speed, yet
              | others multimodality like GPT-4o, some multilinguality, and
             | so on.
             | 
             | Previous language models that encoded dictionaries for
             | spellcheckers etc. never got standardized (for instance,
             | compare aspell dictionaries to the ones from LibreOffice to
             | the language model inside CMU PocketSphinx) so that you
             | could use them across applications or operating systems. As
             | these models are becoming more common, it would be
             | interesting to see this aspect improve this time around.
             | 
             | https://www.rev.com/blog/resources/the-5-best-open-source-
             | sp...
        
               | CooCooCaCha wrote:
               | I disagree, transfer learning and generalization are
               | hugely powerful and specialized models won't be as good
               | because their limited scope limits their ability to
               | generalize and transfer knowledge from one domain to
               | another.
               | 
               | I think people who emphasize specialized models are
               | operating under a false assumption that by focusing the
               | model it'll be able to go deeper in that domain. However,
               | the opposite seems to be true.
               | 
               | Granted, specialized models like AlphaFold are superior
               | in their domain but I think that'll be less true as
               | models become more capable at general learning.
        
             | Keyframe wrote:
             | One thing I've noticed is that the more context, and the more
             | precise the context, I give it, the "smarter" it is. There
             | are limits to it of course. But, I cannot help but think
             | that's where next barrier will be brought down. An agent or
             | multiple of that tag along with everything I do throughout
             | the day to have the full context. That way, I'll get
             | smarter and more to the point help as well as not spending
             | much time explaining the context.. but, that will open a
             | dark can that I'm not sure people will want to open -
             | having an AI track everything you do all the time (even if
             | only in certain contexts like business hours / env).
        
             | RupertEisenhart wrote:
             | It's faster, smarter, and cheaper over the API. Better than a
             | kick in the teeth.
        
             | abdullin wrote:
             | I have a few LLM benchmarks that were extracted from real
             | products.
             | 
             | GPT-4o got slightly better overall. Ability to reason
             | improved more than the rest.
        
           | aaroninsf wrote:
           | Absolutely agree.
           | 
           | This model isn't about benchmark chasing or being a better
           | code generator; it's entirely explicitly focused on pushing
           | prior results into the frame of multi-modal interaction.
           | 
           | It's still a WIP, most of the videos show awkwardness where
           | its capacity to understand the "flow" of human speech is
           | still vestigial. It doesn't understand how humans pause and
           | give one another space for such pauses yet.
           | 
           | But it does indeed have some magic ability to share a deictic
           | frame of reference.
           | 
           | I have been waiting for this specific advance, because it is
           | going to significantly quiet the "stochastic parrot" line of
           | wilfully-myopic criticism.
           | 
           | It is very hard to make blustery claims about "glorified
           | Markov token generation" when using language in a way that
           | requires both a shared world model and an understanding of
           | interlocutor intent, focus, etc.
           | 
           | This is edging closer to the moment when it becomes very hard
           | to argue that the system does not have some form of self-model
           | and a world model within which self, other, and other objects
           | and environments exist with inferred and explicit
           | relationships.
           | 
           | This is just the beginning. It will be very interesting to
           | see how strong its current abilities are in this domain; it's
           | one thing to have object classification--another thing
           | entirely to infer "scripts plans goals..." and things like
           | intent and deixis. E.g. how well does it now understand
           | "us" and "them" and "this" vs "that"?
           | 
           | Exciting times. Scary times. Yee hawwwww.
        
             | nicklecompte wrote:
             | What part of this makes you think GPT-4 suddenly developed
             | a world model? I find this comment ridiculous and bizarre.
             | Do you seriously think snappy response time + fake emotions
             | is an indicator of intelligence? It seems like you are just
             | getting excited and throwing out a bunch of words without
             | even pretending to explain yourself:
             | 
             | > using language in a way that requires both a shared world
             | model
             | 
             | Where? What example of GPT-4o _requires_ a shared world
             | model? The customer support example?
             | 
             | The reason GPT-4 does not have any meaningful world model
             | (in the sense that rats have meaningful world models) is
             | that it freely believes contradictory facts without being
             | confused, freely confabulates without having brain damage,
             | and it has no real understanding of quantity or causality.
             | Nothing in GPT-4o fixes that, and gpt2-chatbot certainly
             | had the same problems with hallucinations and failing the
             | same pigeon-level math problems that all other GPTs fail.
        
               | og_kalu wrote:
               | One of the most interesting things about the advent of
               | LLMs is people bringing out all sorts of "reasons" GPT
               | doesn't have true 'insert property' but all those reasons
               | freely occur in humans as well
               | 
               | >that it freely believes contradictory facts without
               | being confused,
               | 
               | Humans do this. You do this. I guess you don't have a
               | meaningful world model.
               | 
               | >freely confabulates without having brain damage
               | 
               | Humans do this
               | 
               | >and it has no real understanding of quantity or
               | causality.
               | 
               | Well this one is just wrong.
        
               | spuz wrote:
               | I agree. The interesting lesson I take from the seemingly
               | strong capabilities of LLMs is not how smart they are but
               | how dumb we are. I don't think LLMs are anywhere near as
               | smart as humans yet, but it feels each new advance is
               | bringing the finish line closer rather than the other way
               | round.
        
               | shrimp_emoji wrote:
               | Moravec's paradox states that, for AI, the hard stuff is
               | easiest and the easy stuff is hardest. But there's no
               | easy or hard; there's only what the network was trained
               | to do.
               | 
               | The stuff that comes easy to us, like navigating 3D
               | space, was trained by billions of years of evolution. The
               | hard stuff, like language and calculus, is new stuff
               | we've only recently become capable of, seemingly by
               | evolutionary accident, and aren't very naturally good at.
               | We need rigorous academic training in it that's rarely
               | very successful (there are only so many people with the
               | random brain creases to be a von Neumann or Einstein), so
               | we're impressed by it.
        
               | HeatrayEnjoyer wrote:
               | So many even here on HN have a near-religious belief that
               | intelligence is unique to humans and animals, and somehow
               | a fundamental phenomenon that cannot ever be created
               | using other materials.
        
               | dvaun wrote:
               | It's a defensive response to an emerging threat to
               | stability and current social tiers.
        
               | joquarky wrote:
               | It reminds me of the geocentric mindset.
        
               | pests wrote:
               | >>and it has no real understanding of quantity or
               | causality.
               | 
               | >Well this one is just wrong.
               | 
               | Is it?
               | 
               | --
               | 
               | Me: how many characters are in: https://google.com
               | 
               | ChatGPT: The URL "https://google.com" has 12 characters,
               | including the letters, dots, and slashes.
               | 
               | --
               | 
               | What is it counting there? 12 is wrong no matter how you
               | dice that up.
               | 
               | Part of the reason is it has no concept of the actual
               | string. That URL breaks into four different tokens in 3.5
               | and 4: "http", "://", "google" and ".com".
               | 
               | It's not able to figure out the total length, or even the
               | length of its parts and add them together.
               | 
               | I ask it to double check, it tells me 13 and then 14. I
               | tell it the answer and suddenly it's able...
               | 
               | ---
               | 
               | Me: I think its 18
               | 
               | ChatGPT: Let's recount together:
               | 
               | "https://" has 8 characters. "google" has 6 characters.
               | ".com" has 4 characters. Adding these up gives a total of
               | 8 + 6 + 4 = 18 characters. You're correct! My apologies
               | for the oversight earlier.
               | 
               | ---
               | 
               | Count me out.
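               | 
               | (For reference, the split above can be checked directly
               | with the tiktoken library - a minimal sketch, assuming the
               | cl100k_base encoding that GPT-4 uses; the exact token
               | boundaries may differ by model:)
               | 
               |   import tiktoken
               | 
               |   # Show the token pieces the model actually "sees" for the URL.
               |   enc = tiktoken.encoding_for_model("gpt-4")
               |   url = "https://google.com"
               |   tokens = enc.encode(url)
               |   print([enc.decode([t]) for t in tokens])  # a few multi-char pieces
               |   print(len(tokens), "tokens vs", len(url), "characters")  # 18 chars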
        
               | og_kalu wrote:
               | It seems you're already aware LLMs receive tokens not
               | words.
               | 
               | Does a blind man not understand quantity because you
               | asked him how many apples are in front of him and he
               | failed ?
        
               | pixl97 wrote:
               | I'd counter by pasting a picture of an emoji here, but HN
               | doesn't allow that, as a means to show the confusion that
               | can be caused by characters versus symbols.
               | 
               | Most LLMs can just pass the string to a tool to count it
               | to bypass their built-in limitations.
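               | 
               | (That is what the function-calling API is for - a rough
               | sketch, assuming the current openai Python SDK; count_chars
               | is a made-up local helper here, not an OpenAI built-in:)
               | 
               |   import json
               |   from openai import OpenAI
               | 
               |   client = OpenAI()  # reads OPENAI_API_KEY from the environment
               |   tools = [{"type": "function", "function": {
               |       "name": "count_chars",  # hypothetical local tool
               |       "description": "Count the characters in a string exactly.",
               |       "parameters": {"type": "object",
               |                      "properties": {"text": {"type": "string"}},
               |                      "required": ["text"]}}}]
               | 
               |   resp = client.chat.completions.create(
               |       model="gpt-4o",
               |       messages=[{"role": "user", "content":
               |                  "How many characters are in https://google.com?"}],
               |       tools=tools,
               |       # Force the tool call so the model doesn't guess from tokens.
               |       tool_choice={"type": "function",
               |                    "function": {"name": "count_chars"}},
               |   )
               |   call = resp.choices[0].message.tool_calls[0]
               |   args = json.loads(call.function.arguments)
               |   print(len(args["text"]))  # exact count, done in code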
        
               | wcoenen wrote:
               | LLMs process text, but only after it was converted to a
               | stream of tokens. As a result, LLMs are not very good at
               | answering questions about letters in the text. That
               | information was lost during the tokenization.
               | 
               | Humans process photons, but only after converting them
               | into nerve impulses via photoreceptor cells in the
               | retina, which are sensitive to wavelengths ranges
               | described as "red", "green" or "blue".
               | 
               | As a result, humans are not very good at distinguishing
               | different spectra that happen to result in the same nerve
               | impulses. That information was lost by the conversion
               | from photons to nerve impulses. Sensors like the AS7341
               | that have more than 3 color channels are much better at
               | this task.
        
               | jameshart wrote:
               | How much of your own sense of quantity is visual, do you
               | think? How much of your ability to count the lengths of
               | words depends on your ability to sound them out and
               | spell?
               | 
               | I suspect we might find that adding in the multimodal
               | visual and audio aspects to the model gives these models
               | a much better basis for mental arithmetic and counting.
        
               | orangecat wrote:
               | _That URL breaks into four different tokens in 3.5 and 4:
               | "http", "://", "google" and ".com"._
               | 
               | Except that "http" should be "https". Silly humans,
               | claiming to be intelligent when they can't even tokenize
               | strings correctly.
        
               | davidham wrote:
               | Its first answer of 12 is correct, there are 12 _unique_
               | characters in https://google.com.
        
               | vel0city wrote:
               | The unique characters are:
               | 
               | h t p s : / g o l e . c m
               | 
               | There are 13 unique characters.
        
               | davidham wrote:
               | OK neither GPT-4o nor myself is great at counting
               | apparently
        
               | kenjackson wrote:
               | If someone found a way to put an actual human brain into
               | SW, but no one knew it was a real human brain -- I'm
               | certain most of HN would claim it wasn't AGI. "Kind of
               | sucks at math", "Knows weird facts about Tik Tok
               | celebrities, but nothing about world events", "Makes lots
               | of grammar mistakes", "scores poorly on most standardized
               | tests, except for one area that he seems to do well", and
               | "not very creative".
        
               | goatlover wrote:
               | What is a human brain without the rest of its body?
               | Humans aren't brains. Our nervous systems aren't just the
               | brain either.
        
               | kenjackson wrote:
               | It's meant to explore a point. Unless your point is that
               | AGI can only exist with a human body too.
        
               | chasd00 wrote:
               | I don't think making the same mistakes as a human counts
               | as a feature. I see that a lot when people point out a
               | flaw with an llm, the response is always "well a human
               | would make the same mistake!". That's not much of an
               | excuse, computers exist because they do the things humans
               | can't do very well like following long repetitive lists
               | of instructions. Further, upthread, there's discussion
               | about adding emotions to an llm. An emotional computer
               | that makes mistakes sometimes is pretty worthless as a
               | "computer".
        
               | og_kalu wrote:
               | It's not about counting as a feature. It's the blatant
               | logical fallacy. If a trait doesn't disqualify humans from
               | having a certain property, then it can't disqualify
               | machines either. You can't eat your cake and have it.
               | 
               | >That's not much of an excuse, computers exist because
               | they do the things humans can't do very well like
               | following long repetitive lists of instructions.
               | 
               | Computers exist because they are useful, nothing more and
               | nothing less. If they were useful in a completely
               | different way, they would still exist and be used.
        
               | goatlover wrote:
               | It's objectively true that LLMs do not have bodies. To
               | the extent general intelligence relies on being embodied
               | (allowing you to manipulate the world and learn from
               | that), it's a legitimate thing to point out.
        
           | snthpy wrote:
           | Hectic!
           | 
           | Thanks for this.
        
           | OJFord wrote:
           | I assume (because they don't address it or look at all
           | fazed) the audio cutting in and out is just an artefact of
           | the stream?
        
             | throwthrowuknow wrote:
             | Haven't tried it but from work I've done on voice
             | interaction this happens a lot when you have a big audience
             | making noise. The interruption feature will likely have
             | difficulty in noisy environments.
        
               | OJFord wrote:
               | Yeah that was actually my first thought (though no
               | professional experience with it/on that side) - it's just
               | that the commenter I replied to was so hyped about it and
               | how fluid & natural it was and I thought that made it
               | really jar.
        
           | mvdtnz wrote:
           | Interesting that they decided to keep the horrible ChatGPT
           | tone ("wow you're doing a live demo right now?!"). It comes
           | across just so much worse in voice. I don't need my "AI"
           | speaking to me like I'm a toddler.
        
             | yieldcrv wrote:
             | tell it to speak to you differently
             | 
             | with a GPT you can modify the system prompt
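             | 
             | (Over the API that amounts to a system message - a minimal
             | sketch, assuming the current openai Python SDK and an
             | OPENAI_API_KEY in the environment:)
             | 
             |   from openai import OpenAI
             | 
             |   client = OpenAI()  # reads OPENAI_API_KEY from the environment
             | 
             |   resp = client.chat.completions.create(
             |       model="gpt-4o",
             |       messages=[
             |           # The system prompt reins in the chatty default tone.
             |           {"role": "system",
             |            "content": "Answer tersely. No small talk, no pep."},
             |           {"role": "user",
             |            "content": "What's the time complexity of heapsort?"},
             |       ],
             |   )
             |   print(resp.choices[0].message.content)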
        
               | maest wrote:
               | It still refuses to go outside the deeply sanitised tone
               | that "alignment" enforces on you.
        
             | marvin wrote:
             | One of the linked demos is it being sarcastic, so maybe you
             | can make it remember to be a little more edgy.
        
             | slibhb wrote:
             | You can tell it not to talk like this using custom prompts.
        
             | practice9 wrote:
             | It is cringingly overenthusiastic, but a proper system
             | prompt or set of instructions will mostly fix that
        
             | throwthrowuknow wrote:
             | Did you miss the part where they simply asked it to change
             | its manner of speaking and the amount of emotion it used?
        
             | baumgarn wrote:
             | it should be possible to imitate any voice you want, like
             | your actual parents', soon enough
        
               | goatlover wrote:
               | That won't be Black Mirror levels of creepy /s
        
           | ChuckMcM wrote:
           | I expect the really solid use case here will be voice
           | interfaces to applications that don't suck. Something I am
           | still surprised at is that vendors like Apple have yet to
           | allow me to train the voice to text model so that it _only_
           | responds to me and not someone else.
           | 
           | So local modelling (completely offline but per speaker aware
           | and responsive), with a really flexible application API. Sort
           | of the GTK or QT equivalent for voice interactions. Also
           | custom naming, so instead of "Hey Siri" or "Hey Google" I
           | could say, "Hey idiot" :-)
           | 
           | Definitely some interesting tech here.
        
           | spaceman_2020 wrote:
           | This is going straight into 'Her' territory
        
           | clhodapp wrote:
           | Call me overly paranoid/skeptical, but I'm not convinced that
           | this isn't a human reading (and embellishing) a script. The
           | "AI" responses in the script may well have actually been
           | generated by their LLM, providing a defense against it being
           | fully fake, but I'm just not buying some of these "AI"
           | voices.
           | 
           | We'll have to see when end users actually get access to the
           | voice features "in the coming weeks".
        
         | dragonwriter wrote:
         | > As many highlighted there, the model is not an improvement
         | like GPT3->GPT4.
         | 
         | The improvements they seem to be hyping are in multimodality
         | and speed (also price - half that of GPT-4 Turbo - though
         | that's their choice and could be promotional, but I expect it's
         | at least in part, like speed, a consequence of greater
         | efficiency), not so much producing better output for the same
         | pure-text inputs.
        
           | kybercore wrote:
           | The model scores 60 points higher on LMSYS than the best
           | GPT-4 Turbo model from April; that's still a pretty
           | significant jump in text capability.
        
         | aixpert wrote:
         | useless anecdata but I find the new model very frustrating,
         | often completely ignoring what I say in follow up queries. it's
         | giving me serious Siri vibes
         | 
         | (text input in web version)
         | 
         | maybe it's programmed to completely ignore swearing, but how
         | could I not swear after it repeatedly gave me info about
         | you.com when I tried to address it in the second person
        
         | lossolo wrote:
         | I agree. I tried a few programming problems that, let's say,
         | seem to be out of the distribution of their training data and
         | which GPT4 failed to solve before. The model couldn't find a
         | similar pattern and failed to solve them again. What's
         | interesting is that one of these problems was solved by Opus,
         | which seems to indicate that the majority of progress in the
         | last months should be attributed to the quality/source of the
         | training data.
        
         | avereveard wrote:
         | I tested a few use cases in the chat, and it's not particularly
         | more intelligent but they seem to have solved laziness. I had
         | to categorize my expenses to do some budgeting for the family,
         | and in GPT-4 I had to go ten by ten, confirm the suggested
         | category, and download the file; it took two days as I was
         | constantly hitting the limit. GPT-4o did most of the grunt
         | work, then communicated anomalies in bulk, asked for
         | suggestions for these, and provided a downloadable link in two
         | answers, calling the code interpreter multiple times and
         | working toward the goal on its own.
         | 
         | and the prompt wasn't a monstrosity, and it wasn't even that
         | good, it was just one line "I need help to categorize these
         | expenses" and off it went. hope it won't get enshittified like
         | turbo, because this finally feels as great as 3.5 was for goal
         | seeking.
        
           | ozzydave wrote:
           | Heh - I'm using ChatGPT for the same thing! Works 10X better
           | than Rocket Money, which was supposed to be an improvement on
           | Mint but meh.
        
         | jameshart wrote:
         | I think this comment is easily misread as implying that this
         | GPT4o model is based on some old GPT2 chatbot - that's very
         | much not what you meant to say, though.
         | 
         | This model had been tested under the code name
         | 'gpt2-chatbot', but it is very much a new GPT-4+-level model,
         | with new multimodal capabilities and apparently some
         | impressive work around inference speed.
         | 
         | Highlighting so people don't get the impression this is just
         | OpenAI slapping a new label on something a generation out of
         | date.
        
       | Jimmc414 wrote:
       | Big questions are (1) when is this going to be rolled out to paid
       | users? (2) what is the remaining benefit of being a paid user if
       | this is rolled out to free users? (3) Biggest concern is will
       | this degrade the paid experience since GPT-4 interactions are
       | already rate limited. Does OpenAI have the hardware to handle
       | this?
       | 
       | Edit: according to @gdb this is coming in "weeks"
       | 
       | https://twitter.com/gdb/status/1790074041614717210
        
         | onemiketwelve wrote:
         | thanks, I was confused because the top of the page says to try
         | now when you cannot in fact try it at all
        
           | freedomben wrote:
           | I'm a ChatGPT Plus Subscriber and I just refreshed the page
           | and it offered me the new model. I'm guessing they're rolling
           | it out gradually but hopefully it won't take too long.
           | 
           | Edit: It's also now available to me in the Android App
        
           | whimsicalism wrote:
           | I can try it now, but not the voice features, I don't think
        
           | zamadatix wrote:
           | You can use GPT-4o now but the interactive voice mode of
           | using it (as demoed today) releases in a few weeks.
        
         | Tenoke wrote:
         | >what is the remaining benefit of being a paid user if this is
         | rolled out to free users?
         | 
         | It says so right in the post
         | 
         | >We are making GPT-4o available in the free tier, and to Plus
         | users with up to 5x higher message limits
         | 
         | The limits are much lower for free users.
        
         | jrh3 wrote:
         | I'm not convinced I need to keep paying for plus. The threshold
         | of requests for free 4o is pretty high.
        
         | dunkmaster wrote:
         | This might mean GPT-5 is coming soon and it will only be
         | available to paid users.
        
           | dunkmaster wrote:
           | Or they just made a bunch of money on their licensing deal
           | with Apple. So they don't need to charge for ChatGPT anymore.
        
             | spdif899 wrote:
             | If it's going to be available via Siri this could make
             | sense.
             | 
             | It does make me wonder how such a relationship could impact
             | progress. Would OpenAI feel limited from advancing in
             | directions that don't align with the partnership? For
             | example if they suddenly release a model better than what's
             | in Siri, making Siri look bad.
        
           | yieldcrv wrote:
           | I'm actually thinking that the GPT store with more users
           | might be better for them
           | 
           | From my casual conversations, not that many people are paying
           | for GPT4 or know why they should. Every conversation even in
           | enthusiast forums like this one has to be interjected with
           | "wait, are you using GPT4? because GPT3.5 the free one is
           | pretty nerfed"
           | 
           | just nuking that friction from orbit and expanding the GPT
           | store volume could be a positive for them
        
       | lxgr wrote:
       | Will this include image generation for the free tier as well?
       | That's a big missing feature in OpenAI's free tier compared to
       | Google and Meta.
        
         | dkarras wrote:
         | Is OAI image generation any different from what Microsoft
         | Copilot provides for free? I thought they were the same.
        
       | OliverM wrote:
       | This is impressive, but they just sound so _alien_, especially to
       | this non-U.S. English speaker (to the point of being actively
       | irritating to listen to). I guess picking up on social cues
       | communicating this (rather than express instruction or feedback)
       | is still some time away.
       | 
       | It's still astonishing to consider what this demonstrates!
        
       | w-m wrote:
       | Gone are the days of copy-pasting to/from ChatGPT all the time,
       | now you just share your screen. That's a fantastic feature, in
       | how much friction that removes. But what an absolute privacy
       | nightmare.
       | 
       | With ChatGPT having a very simple text+attachment in, text out
       | interface, I felt absolutely in control of what I tell it. Now
       | when it's grabbing my screen or a live camera feed, that will be
       | gone. And I'll still use it, because it's just so damn
       | convenient?
        
         | baby_souffle wrote:
         | > Now when it's grabbing my screen or a live camera feed, that
         | will be gone. And I'll still use it, because it's just so damn
         | convenient?
         | 
         | Presumably you'll have a way to draw a bounding box around what
         | you want to show or limit to just a particular window the same
         | way you can when doing a screen share w/ modern video
         | conferencing?
        
       | jawiggins wrote:
       | I hope when this gets to my iphone I can use it to set two
       | concurrent timers.
        
       | mellosouls wrote:
       | Very, very impressive for a "minor" release demo. The
       | capabilities here would look shockingly advanced just 5 years
       | ago.
       | 
       | Universal translator, pair programmer, completely human sounding
       | voice assistant and all in real time. Scifi tropes made real.
       | 
       | But: Interesting next to see how it actually performs IRL latency
       | and without cherry-picking. No snark, it was great but need to
       | see real world power. Also what the benefits are to subscribers
       | if all this is going to be free...
        
         | llm_trw wrote:
         | The capabilities here looked shockingly advanced yesterday.
        
           | partiallypro wrote:
           | A lot of the demo is very impressive, but some of it is just
           | stuff that already exists but this is slightly more polished.
           | Not really a huge leap for at least 60% of the demos.
        
         | CooCooCaCha wrote:
         | My guess is they're banking on the free version being rate
         | limited and people finding it so useful that they want to
         | remove the limit. Like giving a new user a discount on heroin.
         | At least that's the strategy that would make most sense to me.
        
           | rubidium wrote:
           | I have the paid version and it's not connecting
        
             | CooCooCaCha wrote:
             | What does that have to do with what I said?
        
       | yumraj wrote:
       | In the first video the AI seems excessively chatty.
        
         | hipadev23 wrote:
         | chatGPT desperately needs a "get to the fucking point" mode.
        
           | tomashubelbauer wrote:
           | Seriously. I've had to spell out that it should just answer
           | in twelve different ways with examples in the custom
           | instructions to make it at least somewhat usable. And it
           | still "forgets" sometimes.
        
           | chatcode wrote:
           | It does, that's "custom instructions".
        
           | progbits wrote:
           | Impressive demo, but like half the interactions were "hello"
           | "hi how are you doing" "great thanks, what can I help you
           | with" etc.
           | 
           | The benchmark for human-computer interaction should be "Tea,
           | Earl Grey, hot", not awkward and pointless small talk.
        
           | ativzzz wrote:
           | "no yapping" in the prompt works very well
        
         | jamilton wrote:
         | Yeah, I would hope that custom instructions would help somewhat
         | with that, but it is a point of annoyance for me too.
        
         | mrandish wrote:
         | Yes, it sounds like an awkwardly perky and over-chatty
         | telemarketer that _really_ wants to be your friend. I find the
         | tone maximally annoying and think most users will find it both
         | stupid and creepy. Based on user preferences, I expect future
         | interactive chat AIs will default to an engagement mode that's
         | optimized for accuracy and is both time-efficient and
         | cognitively efficient for the user.
         | 
         | I suspect this AI <-> Human engagement style will evolve over
         | time to become quite unlike human to human engagement, probably
         | mixing speech with short tones for standard responses like
         | "understood", "will do", "standing by" or "need more input". In
         | the future these old-time demo videos where an AI is forced to
         | do a creepy caricature of an awkward, inauthentic human will be
         | embarrassingly retro-cringe. _" Okay, let's do it!"_
        
           | jdthedisciple wrote:
           | I found it off-putting as well
           | 
           | guess it's just biased with average Californian behavior and
           | speech patterns
        
           | TillE wrote:
           | Reminds me of how Siri used to make jokes after setting a
           | timer. Now it just reads back the time you specified, in a
           | consistent way.
           | 
           | It's a very impressive gimmick, but I really think most
           | people don't want to interact with computers that way. Since
           | Apple pulled that "feature" after a few years, it's probably
           | not just a nerd thing.
        
           | caseyy wrote:
           | It is exceptionally creepy. It is an unnatural effort to
           | appear pleasing, like the fawning response seen in serious
           | abuse survivors.
        
       | csjh wrote:
       | I wonder if this is what the "gpt2-chatbot" that was going around
       | earlier this month was
        
         | lambdaba wrote:
         | yes it was
        
         | AndyNemmity wrote:
         | it was
        
       | peppertree wrote:
       | Just like that Google is on back foot again.
        
         | tempsy wrote:
         | Considering the stock pumped following the presentation, the
         | market doesn't seem particularly concerned with what OpenAI
         | released at all.
        
       | sebastiennight wrote:
       | Anyone who watched the OpenAI livestream: did they "paste" the
       | code after hitting CTRL+C ? Or did the desktop app just read from
       | the clipboard?
       | 
       | Edit: I'm asking because of the obvious data security
       | implications of having your desktop app read from the clipboard
       | _in the live demo_... That would definitely put a damper to my
       | fanboyish enthusiasm about that desktop app.
        
         | golol wrote:
         | To me it looked like they used one command that did both the
         | copy and the paste into ChatGPT.
        
         | dkarras wrote:
         | macOS asks you to give permission for an application to read
         | your clipboard. do other operating systems not have that?
        
       | sn_master wrote:
       | This is every romance scammer's dreams come true...
        
       | summerlight wrote:
       | This is really impressive engineering. I thought real time agents
       | would completely change the way we're going to interact with
       | large models, but that it would take 1-2 more years. I wonder
       | what kind of new techniques were developed to enable this, but
       | OpenAI is fairly secretive, so we won't get to know their secret
       | sauce.
       | 
       | On the other hand, this also feels like a signal that reasoning
       | capability has probably already been plateaued at GPT-4 level and
       | OpenAI knew it so they decided to focus on research that matters
       | to delivering product engineering rather than long-term research
       | to unlock further general (super)intelligence.
        
         | nopinsight wrote:
         | Reliable agents in diverse domains need better reasoning
         | ability and fewer hallucinations. If the rumored GPT-5 and Q*
         | capabilities are true, such agents could become available soon
         | after it's launched.
        
           | summerlight wrote:
           | Sam has been pretty clear on denying GPT-5 rumors, so I don't
           | think it will come anytime soon.
        
             | nopinsight wrote:
             | Sam mentioned on several occasions that GPT-5 will be much
             | smarter than GPT-4. On Lex Fridman's podcast, he even said
             | the gap between GPT-5 and 4 will be as wide as GPT-4 and 3
             | (not 3.5).
             | 
             | He did remain silent on when it's going to be launched.
        
               | valine wrote:
               | OpenAI has been open about their ability to predict model
               | performance prior to training. When Sam talks about GPT-5
               | he could very easily be talking about the hypothetical
               | performance of a model given their internal projections.
               | I think it's very unlikely a fully trained GPT-5 exists
               | yet.
        
               | bigyikes wrote:
               | Sam has stated that he knows the month GPT-5 will be
               | released.
               | 
               | Given the amount of time and uncertainty involved in
               | training and red-teaming these models, we can assume
               | GPT-5 exists if we take Altman at his word.
        
               | Atotalnoob wrote:
               | It's going to be launched this year. My buddy's company
               | had a private demo of gpt5
        
         | MVissers wrote:
         | Why would reasoning have plateaued?
         | 
         | I think reasoning ability is not the largest bottleneck for
         | improvement in usefulness right now. Cost is a bigger one IMO.
         | 
         | Running these models as agents is hella expensive, and agents
         | or agent-like recurrent reasoning (like humans do) is the key
         | to improved performance if you look at any type of human
         | intelligence.
         | 
         | Single-shot performance only gets you so far.
         | 
         | For example- If it can write code 90% of the way, and then
         | debug in a loop, it'd be much more performant than any single
         | shot algorithm.
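         | 
         | Concretely, the loop I mean is roughly the sketch below
         | (illustrative only - it assumes the openai Python SDK, a local
         | pytest suite, and a hypothetical fizzbuzz.py task; it is not
         | how OpenAI runs agents):
         | 
         |   import pathlib
         |   import subprocess
         |   from openai import OpenAI
         | 
         |   client = OpenAI()  # reads OPENAI_API_KEY from the environment
         |   prompt = "Write fizzbuzz.py: print 1..100 with the usual rules."
         | 
         |   for attempt in range(5):  # bounded generate -> test -> fix loop
         |       reply = client.chat.completions.create(
         |           model="gpt-4o",
         |           messages=[{"role": "user", "content": prompt}],
         |       ).choices[0].message.content
         |       # (Real code would strip markdown fences from the reply.)
         |       pathlib.Path("fizzbuzz.py").write_text(reply)
         | 
         |       result = subprocess.run(["pytest", "-q"],
         |                               capture_output=True, text=True)
         |       if result.returncode == 0:  # tests pass: stop iterating
         |           break
         |       # Feed the failure back and let the model revise its own code.
         |       prompt = (f"This code:\n{reply}\n"
         |                 f"failed these tests:\n{result.stdout}\nFix it.")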
         | 
         | And OpenAI has these huge models in their basement probably.
         | But they might not be much more useful than GPT-4 when used as
         | single-shot. I mean, what could it do that we can't do today
         | with GPT-4?
         | 
         | It's agents and recurrent reasoning we need for more
         | usefulness.
         | 
         | At least- That's my humble opinion as an amateur neuroscientist
         | that plays around with these models.
        
           | Jensson wrote:
           | > Running these models as agents is hella expensive
           | 
           | Because they are dumb, you need to over-compute so many
           | things to get anything useful. Smarter models would solve
           | this problem. Making the current model cheaper is like
           | trying to solve Go by scaling up Deep Blue; it doesn't work
           | to just hardcode dumb pieces together, the model needs to
           | get smarter.
        
             | cchance wrote:
             | You mean like our dumb ass brains? There's a reason "saying
             | the first thing out of your mind" is a bad fucking idea;
             | that's what AIs currently do. They don't take a moment to
             | think about the answer and then formulate a response, they
             | spit out their first "thought". That's why multi-shot works
             | so much better, just like our own dumb brains.
        
               | Jensson wrote:
               | My brain can navigate a computer interface without using
               | word tokens, since I have tokens for navigating OS and
               | browsers and tabs etc. That way I don't have to read a
               | million tokens of text to figure out where buttons are or
               | how to navigate to places, since my brain is smart enough
               | to not use words for it.
               | 
               | ChatGPT doesn't have that sort of thing currently, and
               | until it does it will always be really bad at that sort
               | of thing.
               | 
               | You are using a hand to hammer a nail, that will never go
               | well, the solution isn't to use more hands the solution
               | is to wield a hammer.
        
               | cchance wrote:
               | WTF are you even talking about? We're talking about
               | understanding and communication, not taking actions.
               | Navigating an OS and browser, tabs, etc. are actions, not
               | thoughts or communication. This model isn't taking
               | actions; there is no nail to hammer lol, and if there
               | were, you'd be smashing a brain into a nail for some
               | reason.
        
               | Jensson wrote:
               | The topic is agents, the AI acting on your behalf, that
               | needs more than text. What are you talking about?
        
         | CuriouslyC wrote:
         | This isn't really new tech, it's just an async agent in front
         | of a multimodal model. It seems from the demo that the
         | improvements have been in response latency and audio
         | generation. Still, it looks like they're building a solid
         | product, which has been their big issue so far.
        
           | searealist wrote:
           | No, audio is fed directly into the model. There is no
           | separate speech-to-text step in front of it (and
           | text-to-speech after it) like there was with ChatGPT-4.
        
           | cchance wrote:
           | It's 200-300 ms for a multimodal response; that's REALLY a
           | big step forward, especially given it's doing it with full
           | voice response, not just text.
        
         | cchance wrote:
         | Yeah, so sad that OpenAI isn't more open. Imagine if OpenAI
         | was still sharing their thought processes and papers with the
         | overall community. I really wish we saw collaborations between
         | OpenAI and Meta, for instance, to help push the open source
         | arena further ahead. I love that their latest models are so
         | great, but the fact that they aren't helping the open source
         | arena to progress is sad. Imagine how far we'd be if OpenAI
         | was still as open as they once were and we saw collaborations
         | between Meta, OpenAI, and Anthropic, all working and sharing
         | growth and tech to reduce duplicated work and help each other
         | not go down failed paths.
        
       | MBCook wrote:
       | Why must every website put stupid stuff that floats above the
       | content and can't be dismissed? It drives me nuts.
        
       | dkga wrote:
       | That can "reason"?
        
       | MisterBiggs wrote:
       | I've been waiting to see someone drop a desktop app like they
       | showcased. I wonder how long until it is normal to have an AI
       | looking at your screen the entire time your machine is unlocked.
       | Answering contextual questions and maybe even interjecting if it
       | notices you made a mistake and moved on.
        
         | doomroot13 wrote:
         | That seems to be what Microsoft is building and will reveal as
         | a new Windows feature at BUILD '24. Not too sure about the
         | interjecting aspect but ingesting everything you do on your
         | machine so you can easily recall and search and ask questions,
         | etc. AI Explorer is the rumored name and will possibly run
         | locally on Qualcomm NPUs.
        
           | ukuina wrote:
           | Yes, this is Windows AI Explorer.
        
         | layer8 wrote:
         | This will be great for employee surveillance, to monitor how
         | much you are really working.
        
           | MisterBiggs wrote:
           | I think even scarier is that ChatGPT's tone of voice and bias
           | is going to take over everything.
        
       | bredren wrote:
       | It is notable OpenAI did not need to carefully rehearse the
       | talking points of the speakers. Or even do the kind of careful
       | production quality seen in a lot of other videos.
       | 
       | The technology product is so good and so advanced it doesn't
       | matter how the people appear.
       | 
       | Zuck tried this in his video countering the Vision Pro, but it did
       | not have the authentic "not really rehearsed or produced" feel of
       | this at all. If you watch that video and compare it with this you
       | can see the difference.
       | 
       | Very interesting times.
        
       | skepticATX wrote:
       | Very impressive demo, but not really a step change in my opinion.
       | The hype from OpenAI employees was on another level, way more
       | than was warranted in my opinion.
       | 
       | Ultimately, the promise of LLM proponents is that these models
       | will get exponentially smarter - this hasn't been borne out yet. So
       | from that perspective, this was a disappointing release.
       | 
       | If anything, this feels like a rushed release to match what
       | Google will be demoing tomorrow.
        
       | altcognito wrote:
       | GPT-4 expressing a human-like emotional response every single
       | time you interact with it is pretty annoying.
       | 
       | In general, trying to push that this is a human being is probably
       | "unsafe", but that hurts the marketing.
        
       | jonquark wrote:
       | It might be region specific (I'm in the UK) - but I don't "see"
       | the new model anywhere e.g. if I go to:
       | https://platform.openai.com/playground/chat?models=gpt-4o The
       | model the page uses is set to gpt-3.5-turbo-16k.
       | 
       | I'm confused
        
       | aw4y wrote:
       | I don't see anything released today. Login/signup is still
       | required, no signs of desktop app or free use on web. What am I
       | missing?
        
       | goalonetwo wrote:
       | For all the hype around this announcement I was expecting more
       | than some demo-level stuff that close to nobody will use in real
       | life. Disappointing.
        
         | sroussey wrote:
         | Twice as fast and half the cost for the API sounds good to me.
         | Not a demoable thing though.
        
         | asteroidz wrote:
         | Why are you so confident that nobody will use this in real
         | life? I know OpenAI showed only a few demos, but I can see huge
         | potential.
        
       | mellosouls wrote:
       | @sama reflects:
       | 
       | https://blog.samaltman.com/gpt-4o
        
       | 101008 wrote:
       | Are the employees in the demo high-ranking executives at
       | OpenAI? I can understand Altman being happy with this progress,
       | but what about the mid- and lower-level employees? Didn't they
       | watch Oppenheimer? Are they
       | happy they are destroying humanity/work/etc for future and not-
       | so-future generations?
       | 
       | Anyone who thinks this will be like the previous work revolutions
       | is talking nonsense. This replaces humans and will replace them even more
       | on each new advance. What's their plan? Live out of their
       | savings? What about family/friends? I honestly can't see this and
       | think how they can be so happy about it...
       | 
       | "Hey, we created something very powerful that will do your work
       | for free! And it does it better than you and faster than you! Who
       | are you? It doesn't matter, it applies to all of you!"
       | 
       | And considering I was thinking of having a kid next year, well,
       | this is a no.
        
         | galdosdi wrote:
         | Have a kid anyway, if you otherwise really felt driven to it.
         | Reading the tealeaves in the news is a dumb reason to change
         | decisions like that. There's always some disaster looming,
         | always has been. If you raise them well they'll adapt well to
         | whatever weird future they inherit and be amongst the ones who
         | help others get through it
        
           | 101008 wrote:
           | Thanks for taking the time to answer instead of (just)
           | downvoting. I understand your logic but I don't see a future
           | where people can adapt to this and get through it. I honestly
           | see a future so dark and we'll be there much sooner than we
           | thought... when OpenAI released their first model people were
           | talking about years before seeing real changes and look what
           | happened. The advance is exponential...
        
             | ninininino wrote:
             | > a future where people can adapt to this and get through
             | it
             | 
             | there are people alive today who quite literally are
             | descendants of humans born in WW2 concentration camps. some
             | percentage of those people are probably quite happy and
             | glad they have been given a chance at life. of course, if
             | their ancestors had chosen not to procreate they wouldn't
             | be disappointed, they'd just simply never have come into
             | existence.
             | 
             | but it's absolutely the case that there's almost always a
             | _chance_ at survival and future prosperity, even if things
             | feel unimaginably bleak.
        
         | nice_byte wrote:
         | "It is difficult to get a man to understand something when his
         | salary depends on his not understanding it."
        
       | karaterobot wrote:
       | That first demo video was impressive, but then it ended very
       | abruptly. It made me wonder if the next response was not as good
       | as the prior ones.
        
         | dclowd9901 wrote:
         | Extremely impressive -- hopefully there will be an option to
         | color all responses with an underlying brevity. It seemed like
         | the AI just kept droning on and on.
        
       | MP_1729 wrote:
       | This thing continues to stress my skepticism for AI scaling laws
       | and the broad AI semiconductor capex spending.
       | 
       | 1- OpenAI is still working on GPT-4-level models, more than 14
       | months after the launch of GPT-4 and after more than $10B in
       | capital raised.
       | 
       | 2- The rate at which token prices are collapsing is bizarre. Now
       | a (bit) better model for 50% of the price. How do people
       | seriously expect these foundational model companies to make
       | substantial revenue? Token volume needs to double just for
       | revenue to stand still. Since the GPT-4 launch, token prices
       | have been falling 84% per year!! Good for mankind, but crazy for
       | these companies.
       | 
       | 3- Maybe I am an asshole, but where are my agents? I mean, good
       | for the consumer use case. Let's hope the rumors that Apple is
       | deploying ChatGPT with Siri are true; these features will help a
       | lot. But I wanted agents!
       | 
       | 4- These drops in cost are good for the environment! No reason
       | to expect them to stop here.
        
         | htrp wrote:
         | Did we ever get confirmation that GPT 4 was a fresh training
         | run vs increasingly complex training on more tokens on the base
         | GPT3 models?
        
           | saliagato wrote:
           | gpt-4 was indeed trained on gpt-3 instruct series (davinci,
           | specifically). gpt-4 was never a newly trained model
        
             | whimsicalism wrote:
             | what are you talking about? you are wrong, for the record
        
               | fooker wrote:
               | They have pretty much admitted that GPT4 is a bunch of
               | 3.5s in a trenchcoat.
        
               | whimsicalism wrote:
               | They have not. You probably read "MoE" and some pop
               | article about what that means without having any clue.
        
               | matsemann wrote:
               | If you know better it would be nice of you to provide the
               | correct information, and not just refute things.
        
               | whimsicalism wrote:
               | GPT-4 is a sparse MoE model with ~1.2T params. This is
               | all public knowledge and immediately precludes the two
               | previous commenters' assertions.
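               | 
               | (For anyone unfamiliar, "sparse MoE" just means top-k
               | expert routing inside one shared model - the toy sketch
               | below illustrates the general technique only, not
               | OpenAI's implementation:)
               | 
               |   import numpy as np
               | 
               |   rng = np.random.default_rng(0)
               |   d_model, n_experts, top_k = 8, 4, 2
               | 
               |   # One "expert" = one small feed-forward block (here a matrix).
               |   experts = [rng.standard_normal((d_model, d_model))
               |              for _ in range(n_experts)]
               |   router = rng.standard_normal((d_model, n_experts))
               | 
               |   def moe_layer(x):                # x: activation for one token
               |       logits = x @ router          # router score per expert
               |       top = np.argsort(logits)[-top_k:]  # pick the k best experts
               |       gates = np.exp(logits[top]) / np.exp(logits[top]).sum()
               |       # Only the selected experts run; outputs are gate-weighted.
               |       return sum(g * (x @ experts[i]) for g, i in zip(gates, top))
               | 
               |   print(moe_layer(rng.standard_normal(d_model)).shape)  # (8,)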
        
         | ldjkfkdsjnv wrote:
         | Yeah I'm also getting suspicious. Also, all of the models
         | (opus, llama3, gpt4, gemini pro) are converging to similar
         | levels of performance. If the scaling hypothesis were true, we
         | would see a greater divergence of model performance.
        
           | bigyikes wrote:
           | Plot model performance over the last 10 years and show me
           | where the convergence is.
           | 
           | The graph looks like an exponential and is still increasing.
           | 
           | Every exponential is a sigmoid in disguise, but I don't think
           | there has been enough time to say the curve has flattened.
        
             | MP_1729 wrote:
             | Two pushbacks.
             | 
             | 1- The mania only started post Nov '22, and the huge
             | investments since then haven't meant substantial progress
             | since the GPT-4 launch in March '23.
             | 
             | 2- We are running out of high-quality tokens in 2024 (per
             | Epoch AI).
        
               | dwaltrip wrote:
               | GPT-4 launch was barely 1 year ago. Give the investments
               | a few years to pay off.
               | 
               | I've heard multiple reports that training runs costing
               | ~$1 billion are in the works at the major labs, and
               | that the results will come in the next year or so. Let's
               | see what that brings.
               | 
               | As for the tokens, they will find more quality tokens.
               | It's like oil or other raw resources. There are more
               | sources out there if you keep searching.
        
         | hehdhdjehehegwv wrote:
         | This is why think Meta has been so shrewd in their "open" model
         | approach. I can run Llama3-70B on my local workstation with an
         | A6000, which after the up-front cost of the card, is just my
         | electricity bill.
         | 
         | So despite all the effort and cost that goes into these models,
         | you still have to compete against a "free" offering.
         | 
         | Meta doesn't sell an API, but they can make it harder for
         | everybody else to make money on it.
        
           | kmeisthax wrote:
           | LLaMA still has an "IP hook" - the license for LLaMA forbids
           | usage on applications with large numbers of daily active
           | users, so presumably at that point Facebook can start asking
           | for money to use the model.
           | 
           | Whether or not that's actually enforceable[0], and whether or
           | not other companies will actually challenge Facebook legal
           | over it, is a different question.
           | 
           | [0] AI might not be copyrightable. Under US law, copyright
           | only accrues in _creative_ works. The weights of an AI model
           | are a compressed representation of training data. Compressing
           | something isn't a creative process, so it creates no
           | additional copyright; so the only way one can gain ownership
           | of the model weights is to own the training data that gets
           | put into them. And most if not all AI companies are not
           | making their own training data...
        
             | lolinder wrote:
             | > LLaMA still has an "IP hook" - the license for LLaMA
             | forbids usage on applications with large numbers of daily
             | active users, so presumably at that point Facebook can
             | start asking for money to use the model.
             | 
             | No, the license prohibits usage by Licensees who already
             | had >700m MAUs on the day of Llama 3's release [0]. There's
             | no hook to stop a company from growing into that size using
             | Llama 3 as a base.
             | 
             | [0] https://llama.meta.com/llama3/license/
        
               | Salgat wrote:
               | The whole point is that the license specifically targets
               | their competitors while allowing everyone else so that
               | their model gets a bunch of free contributions from the
               | open source community. They gave a set date so that they
               | knew exactly who the license was going to affect
               | indefinitely. They don't care about future companies
               | because by the time the next generation releases, they
               | can adjust the license again.
        
               | lolinder wrote:
               | Yes, I agree with everything you just said. That also
               | contradicts what OP said:
               | 
               | > LLaMA still has an "IP hook" - the license for LLaMA
               | forbids usage on applications with large numbers of daily
               | active users, so presumably at that point Facebook can
               | start asking for money to use the model.
               | 
               | The license does _not_ forbid usage on applications with
               | large numbers of daily active users. It forbids usage by
               | companies that were operating at a scale to compete with
               | Facebook at the time of the model's release.
               | 
               | > They don't care about future companies because by the
               | time the next generation releases, they can adjust the
               | license again.
               | 
               | Yes, but I'm skeptical that that's something a regular
               | business needs to worry about. If you use Llama 3/4/5 to
               | get to that scale then you are in a place where you can
               | train your own instead of using Llama 4/5/6. Not a bad
               | deal given that 700 million users per month is completely
               | unachievable for most companies.
        
         | spacebanana7 wrote:
         | Sam Altman gave the impression that foundation models would be
         | a commodity on his appearance in the All in Podcast, at least
         | in my read of what he said.
         | 
         | The revenue will likely come from application layer and
         | platform services. ChatGPT is still much better tuned for
         | conversation than anything else in my subjective experience and
         | I'm paying premium because of that.
         | 
         | Alternatively it could be like search - where between having a
         | slightly better model and getting Apple to make you the
         | default, there's an ad market to be tapped.
        
         | hn_throwaway_99 wrote:
         | I'm ceaselessly amazed at people's capacity for impatience. I
         | mean, when GPT 4 came out, I was like "holy f, this is magic!!"
         | How quickly we get used to that magic and demand more.
         | 
         | Especially since this demo is _extremely_ impressive given the
         | voice capabilities, yet still the reaction is, essentially,
         | "But what about AGI??!!" Seriously, take a breather. Never
         | before in my entire career have I seen technology advance at
         | such a breakneck speed - don't forget transformers were only
         | _invented_ 7 years ago. So yes, there will be some ups and
         | downs, but I couldn't help but laugh at the thought that "14
         | months" is seen as a long time...
        
           | belter wrote:
           | Chair in the sky again...
        
             | hn_throwaway_99 wrote:
             | Hah, was thinking of that exact bit when I wrote my
             | comment. My version of "chair in the sky" is "But you are
             | talking ... to a computer!!" Like remember stuff that was
             | pure Star Trek fantasy until very recently? I'm sitting
             | here with my mind blown, while at the same time reading
             | comments along the lines of "How lame, I asked it some
             | insanely esoteric question about one of the characters in
             | Dwarf Fortress and it totally got it wrong!!"
        
               | layer8 wrote:
               | The AI doesn't behave like the computer in Star Trek,
               | however. The way in which it is a different thing is what
               | people don't like.
        
               | belter wrote:
               | They should have used superior Klingon Technology...
        
           | bamboozled wrote:
           | You must be new here?
        
           | tsunamifury wrote:
           | It's pretty bizarre how these demos bring out keyboard
           | warriors and cereal bowl yellers like crazy. Huge
           | breakthroughs in natural cadence, tone and interaction as
           | well as realtime mutlimodal and all the people on HN can rant
           | about is token price collapse
           | 
           | It's like the people in this community all suffer from a
           | complete disconnect from society and normal human
           | needs/wants/demands.
        
           | ThrowawayTestr wrote:
           | > How quickly we get used to that magic and demand more.
           | 
           | Humanity in a nutshell.
        
           | seydor wrote:
           | We're just logarithmic creatures
        
             | layer8 wrote:
             | I'd say we are derivative creatures. ;)
        
           | MP_1729 wrote:
           | I am just talking about scaling laws and the level of capex
           | that big tech companies are doing. One hundred billion
           | dollars are being invested this year to pursue AI scaling
           | laws.
           | 
           | You can be excited, as I am, while also being bearish, as I
           | am.
        
             | hn_throwaway_99 wrote:
             | If you look at the history of big technological
             | breakthroughs, there is _always_ an explosion of companies
             | and money invested in the  "new hotness" before things
             | shake out and settle. Usually the vast majority of these
             | companies go bankrupt, but that infrastructure spend sets
             | up the ecosystem for growth going forward. Some examples:
             | 
             | 1. Railroad companies in the second half of the 19th
             | century.
             | 
             | 2. Car companies in the early 20th century.
             | 
             | 3. Telecom companies and investment in the 90s and early
             | 2000s.
        
             | spiderfarmer wrote:
             | Comments like yours contribute to the negative perception
             | of Hacker News as a place where launching anything, no
             | matter how great, innovative, smart, informative, usable,
             | or admirable, is met with unreasonable criticism. Finding
             | an angle to voice your critique doesn't automatically make
             | it insightful.
        
               | MP_1729 wrote:
               | I am sure that people at OpenAI, particularly former YC
               | CEO Sam Altman, will be fine, even if they read the bad
               | stuff MP_1729 says around here.
        
               | candiddevmike wrote:
               | What is unreasonable about that comment?
        
               | barrell wrote:
               | Well, I for one am excited about this update, and
               | skeptical about the AI scaling, and agree with everything
               | said in the top comment.
               | 
               | I saw the update, was a little like "meh," and was
               | relieved to see that some people had the same reaction as
               | me.
               | 
               | OP raised some pretty good points without directly
               | criticizing the update. It's a good balance to the top
               | comments (calling this _absolutely magic and stunning_)
               | and all of Twitter.
               | 
               | I wish more feedback on HN was like OP's.
        
               | layer8 wrote:
               | It's reasonable criticism, and more useful than all the
               | hype.
        
           | ertgbnm wrote:
            | Over a year they have provided order-of-magnitude
            | improvements in latency, context length, and cost, while
           | meaningfully improving performance and adding several input
           | and output modalities.
        
             | asadotzler wrote:
             | Your order of magnitude claim is off by almost an order of
             | magnitude. It's more like half again as good on a couple of
             | items and the same on the rest. 10X improvement claims are
             | a joke; people making claims like that ought to be
             | dismissed as jokes too.
        
               | ertgbnm wrote:
               | $30 / million tokens to $5 / million tokens since GPT-4
               | original release = 6X improvement
               | 
               | 4000 token context to 128k token context = 32X
               | improvement
               | 
               | 5.4 second voice mode latency to 320 milliseconds = 16X
               | improvement.
               | 
               | I guess I got a bit excited by including cost, but
               | that's close enough to an order of magnitude for me.
               | That's ignoring the fact that it's now literally free in
               | ChatGPT.
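               | As a quick check of those ratios (just restating the
               | figures above; the latency one comes out closer to 17X):
               | 
               |   price_ratio = 30 / 5              # $/1M tokens: 6.0x
               |   context_ratio = 128_000 / 4_000   # context: 32.0x
               |   latency_ratio = 5.4 / 0.320       # latency: ~16.9x
               |   print(price_ratio, context_ratio, latency_ratio)
               |   # 6.0 32.0 16.875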
        
               | hn_throwaway_99 wrote:
               | Thanks so much for posting this. The increased token
               | length alone (obviously not just with OpenAI's models but
               | the other big ones as well) has opened up a huge number
               | of new use cases that I've seen tons of people and other
               | startups pounce on.
        
             | jononor wrote:
             | All while not addressing the rampant confabulation at all.
             | Which is the main pain point, to me at least. Not being
             | able to trust a single word that it says...
        
           | financypants wrote:
           | There are well talked about cons to shipping so fast, but on
           | the bright side, when everyone is demanding more, more, more,
           | it pushes cost down and demands innovation, right?
        
           | 015a wrote:
            | People's "capacity for impatience" is _literally_ the reason
            | why these things move so quick. These are not feelings at
            | odds with each other; they're the same thing. It's magical;
            | now it's boring; where's the magic; let's create more magic.
            | 
            | Be impatient. It's a positive feeling, not a negative one.
            | Be disappointed with the current progress; it's the biggest
            | thing keeping progress moving forward. It also, if nothing
            | else, helps communicate to OpenAI whether they're moving in
            | the right direction.
        
             | idopmstuff wrote:
             | > Be disappointed with the current progress; it's the
             | biggest thing keeping progress moving forward.
             | 
             | No it isn't - excitement for the future is the biggest
             | thing keeping progress moving forward. We didn't go to the
             | moon because people were frustrated by the lack of progress
             | in getting off of our planet, nor did we get electric cars
             | because people were disappointed with ICE vehicles.
             | 
             | Complacency regarding the current state of things can
             | certainly slow or block progress, but impatience isn't what
             | drives forward the things that matter.
        
               | 015a wrote:
               | Tesla's corporate motto is literally "accelerating the
               | world's transition to sustainable energy". Unhappy with
               | the world's previous progress and velocity, they aimed to
               | move faster.
        
           | fnordpiglet wrote:
            | IMO, for fear of being labeled a hype boy, this is absolutely a
           | sign of the impending singularity. We are taking an ever
           | accelerating frame of cultural reference as a given and our
           | expectation is that exponential improvement is not just here
           | but you're already behind once you've released.
           | 
            | I've spent the last two years dismayed with the reaction, but
           | I've just recently begun to realize this is a feature not a
           | flaw. This is latent demand for the next iteration expressed
           | as impatient dissatisfaction with the current rate of change
           | inducing a faster rate of change. Welcome to the future you
           | were promised.
        
             | ineedaj0b wrote:
             | I would disagree. I remember iPhones getting similarly
             | criticized on here. And not iPhone 13 to 14, it was iPhone
             | to iPhone 3g!
             | 
             | The only time people weren't displeased was increasing
             | internet speeds 15mb to 100mb.
             | 
             | You will keep being dismayed! People only like good things,
             | not good things that potentially make them obsolete
        
           | laweijfmvo wrote:
           | Sounds like the Jeopardy answer for "What is a novelty?"
        
           | spaceman_2020 wrote:
           | People fume and fret about startups wasting capital like it
           | was their own money.
           | 
           | GPT and all the other chatbots are still absolutely magic.
           | The idea that I can get a computer to create a fully
           | functional app is insane.
           | 
           | Will this app make me millions and run a business? Probably
           | not. Does it do what I want it to do? Mostly yes.
        
         | IanCal wrote:
         | Tbf gpt4 level seems useful and better than almost everything
         | else (or close if not). The more important barriers for use in
         | applications have been cost, throughput and latency. Oh and
         | modalities, which have expanded hugely.
        
         | adtac wrote:
         | >Token volume needs to double just for revenue to stand still
         | 
         | Profits are the real metric. Token volume doesn't need to
         | double for profits to stand still if operational costs go down.
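         | A toy illustration of that point (all numbers below are made
         | up, purely to show the arithmetic):
         | 
         |   volume = 1.0                      # relative token volume
         |   price_old, cost_old = 30.0, 20.0  # $ per 1M tokens
         |   price_new, cost_new = 15.0, 5.0   # price -50%, cost -75%
         | 
         |   revenue_old = price_old * volume              # 30.0
         |   revenue_new = price_new * volume              # 15.0, halved
         |   profit_old = (price_old - cost_old) * volume  # 10.0
         |   profit_new = (price_new - cost_new) * volume  # 10.0, flat
         |   print(revenue_new, profit_new)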
        
         | mrkramer wrote:
         | >This thing continues to stress my skepticism for AI scaling
         | laws and the broad AI semiconductor capex spending.
         | 
         | Imagine you are in the 1970s saying computers suck, they are
         | expensive, there are not that many use cases... fast forward
         | to the 90s and you are using Windows 95 with a GUI and a chip
         | astronomically more powerful than what we had in the 70s, and
         | you can use productivity apps, play video games and surf the
         | Internet.
         | 
         | Give AI time, it will fulfill its true potential sooner or
         | later.
        
           | MP_1729 wrote:
           | That's the opposite of what I am saying.
           | 
           | What I am saying is that computers are SO GOOD that AI is
           | getting VERY CHEAP and the amount of computing capex being
           | done is excessive.
           | 
           | It's more like you are in 1999, people are spending $100B in
           | fiber, while a lot of computer scientists are working in
           | compression, multiplexing, etc.
        
             | jameshart wrote:
             | Which of those investments are you saying would have been a
             | poor choice in 1999?
        
               | MP_1729 wrote:
               | All of them, without exception. Just recently, Sprint
               | sold their fiber business for $1 lmfao. Or WorldCom. Or
               | NetRail, Allied Riser, PSINet, FNSI, Firstmark, Carrier
               | 1, UFO Group, Global Access, Aleron Broadband, Verio...
               | 
               | All fiber went bust because despite the internet's huge
               | increase in traffic, the amount of packets per fiber
               | increased by several orders of magnitude.
        
               | jameshart wrote:
               | But you're saying investing in multiplexing and
               | compression was also dumb?
        
               | MP_1729 wrote:
               | Nope, I'm not
        
               | jameshart wrote:
               | Then your overarching thesis is not very clear. Is it
               | simply 'don't invest in hardware capital, software always
               | makes it worthless'?
        
             | mrkramer wrote:
             | >It's more like you are in 1999, people are spending $100B
             | in fiber, while a lot of computer scientists are working in
             | compression, multiplexing, etc.
             | 
             | But nobody knows what's around the corner and what the
             | future brings... for example, back in the day Excite
             | didn't want to buy Google for $1M because they thought
             | that was a lot of money. You need to spend money to make
             | money and yeah, you sometimes need to spend a lot of money
             | on "crazy" projects because it can pay off big time.
        
               | MP_1729 wrote:
               | Was there ever a time when betting that computer
               | scientists would not make better algorithms was a good
               | idea?
        
         | madeofpalk wrote:
         | > Token volume needs to double just for revenue to stand still
         | 
         | I'm pretty skeptical about the whole LLM/AI hype, but I
         | also believe that the market is still relatively untapped. I'm
         | sure Apple switching Siri to an LLM would ~double token usage.
         | 
         | A few products rushed out thin wrappers on top of ChatGPT,
         | developing pretty uninspiring chatbots of limited use. I think
         | there's still huge potential for this LLM technology to be
         | 'just' an implementation detail of other features, just running
         | in the background doing its thing.
         | 
         | That said, I don't think OpenAI has much of a moat here. They
         | were first, but there's plenty of others with closed or open
         | models.
        
         | drag0s wrote:
         | what do you actually expect from an "agent"?
        
           | MP_1729 wrote:
           | Ask stuff like "Check whether there's some correlation
           | between the major economies fiscal primary deficit and GDP
           | growth in the post-pandemic era" and get an answer.
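            | Something like the sketch below, but done end to end by the
            | agent itself, pulling the real series instead of the
            | placeholder numbers used here:
            | 
            |   import pandas as pd
            | 
            |   # placeholder values, NOT real statistics -- an agent
            |   # would fetch IMF/World Bank data on its own
            |   df = pd.DataFrame({
            |       "deficit_pct_gdp": [5.0, 4.0, 2.0, 6.0, 3.5],
            |       "gdp_growth_pct":  [2.5, 0.5, 0.2, 1.9, 0.9],
            |   })
            |   print(df["deficit_pct_gdp"].corr(df["gdp_growth_pct"]))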
        
         | Pr0ject217 wrote:
         | "OpenAI is still working in GPT-4-level models."
         | 
         | This may or may not be true - just because we haven't seen
         | GPT-5-level capabilities does not mean they don't exist yet.
         | It is highly unlikely that what they ship is actually the full
         | capability of what they have access to.
        
           | MP_1729 wrote:
           | they literally launched TODAY a GPT-4 model!
        
         | bionhoward wrote:
         | imho gpt4 is definitely [proto-]agi and the reason i cancelled
         | my openai sub and am sad to miss out on talking to gpt4o is,
         | openai thinks it's illegal, harmful, or abusive to use their
         | model output to develop models that compete with openai. which
         | means if you use openai then whatever comes out of it is toxic
         | waste due to an arguably illegal smidgen of legal bullshit.
         | 
         | for another adjacent example, every piece of code github
         | copilot ever wrote, for example, is microsoft ai output, which
         | you "can't use to develop / otherwise improve ai," some
         | nonsense like that.
         | 
         | the sum total of these various prohibitions is a data
         | provenance nightmare of extreme proportion we cannot afford to
         | ignore because you could say something to an AI and they parrot
         | it right back to you and suddenly the megacorporation can say
         | that's AI output you can't use in competition with them, and
         | they do everything, so what can you do?
         | 
         | answer: cancel your openai sub and shred everything you ever
         | got from them, even if it was awesome or revolutionary, that's
         | the truth here, you don't want their stuff and you don't want
         | them to have your stuff. think about the multi-decade economics
         | of it all and realize "customer noncompete" is never gonna be
         | OK in the long run (highway to corpo hell imho)
        
         | fnordpiglet wrote:
         | Where I work in the hoary fringes of high end tech we can't
         | secure enough token processing for our use cases. Token price
         | decreases means opening of capacity but we immediately hit the
         | boundaries of what we can acquire. We can't keep up with the
         | use cases - but more than that we can't develop tooling to
         | harness things fast enough and the tooling we are creating is a
         | quick hack. I don't fear for the revenue of base model
         | providers. But I think in the end the person selling the tools
         | makes the most, and in this case I think it will continue to
         | be cloud providers. I think in a very real way OpenAI and
         | Anthropic are commercialized charities driving change and
         | rapidly commoditizing their own products, and it'll be
         | infrastructure providers who win the high-end model game. I
         | don't think this is a problem; I think this is in fact in
         | line with their original charters, just a different path from
         | how most people view nonprofit work. A much more capitalist
         | and accelerated take.
         | 
         | Where they might make future businesses is in the tooling. My
         | understanding from friends within these companies is their
         | tooling is remarkably advanced vs generally available tech. But
         | base models aren't the future of revenues (to be clear tho they
         | make considerable revenue today but at some point their
         | efficiency will cannibalize demand and the residual business
         | will be tools)
        
           | MP_1729 wrote:
           | I'm curious now. Can you give color on what you're doing that
           | you keep hitting boundaries? I suppose it isn't limited by
           | human-attention.
        
         | w10-1 wrote:
         | > Since GPT-4 launch, token prices are falling 84% per year!!
         | Good for mankind, but crazy for these companies
         | 
         | The message to competitor investors is that they will not make
         | their money back.
         | 
         | OpenAI has the lead, in market and mindshare; it just has to
         | keep it.
         | 
         | Competitors should realize they're better served by working
         | with OpenAI than by trying to replace it - Hence the Apple
         | deal.
         | 
         | Soon model construction itself will not be about public
         | architectures or access to CPUs, but a kind of proprietary
         | black magic. No one will pay for upstart 97% when they can get
         | reliable 98% at the same price, so OpenAI's position will be
         | secure.
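         | As a back-of-the-envelope check on that quoted 84%-per-year
         | figure:
         | 
         |   decline = 0.84
         |   remaining_price = 1 - decline        # 0.16 of the old price
         |   volume_needed = 1 / remaining_price  # ~6.25x the old volume
         |   print(volume_needed)                 # to hold revenue flat
         | 
         | At that rate of price decline, volume has to grow more than
         | 6x a year just for revenue to stand still, which is why
         | keeping the lead matters so much.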
        
         | abrichr wrote:
         | > where are my agents?
         | 
         | https://github.com/OpenAdaptAI/OpenAdapt/
        
         | golol wrote:
         | GPT-2: February 2019
         | 
         | GPT-3: June 2020
         | 
         | GPT-3.5: November 2022
         | 
         | GPT-4: March 2023
         | 
         | There were 3 years between GPT-3 and GPT-4!
        
           | whimsicalism wrote:
           | hardly anybody you are talking to even knows what gpt3 is,
           | the time between 3.5 and 4 is what is relevant
        
             | golol wrote:
             | It doesn't make any sense to look at it that way.
             | Apparently the GPT-4 base model finished training in like
             | late summer 2022, which is before the release of GPT-3.5. I am
             | pretty sure that GPT-3.5 should be thought of as
             | GPT-4-lite, in the sense that it uses techniques and
             | compute of the GPT-4 era rather than the GPT-3 era. The
             | advancement from GPT-3 to GPT-4 is what counts and it took
             | 3 years.
        
               | whimsicalism wrote:
               | I fully don't agree.
               | 
               | > I am pretty sure that GPT-3.5 should be thought of as
               | GPT-4-lite, in the sense that it uses techniques and
               | compute of the GPT-4 era rather than the GPT-3 era
               | 
               | Compute of the "GPT-3 era" vs the "GPT-3.5 era" is
               | identical, this is not a distinguishing factor. The
               | architecture is also roughly identical, both are dense
               | transformers. The _only_ significant difference between
               | 3.5 and 3 is the size of the model and whether it uses
               | RLHF.
        
               | golol wrote:
               | Yes you're right about the compute. Let me try to make my
               | point differently: GPT-3 and GPT-4 were models which when
               | they were released represented the best that OpenAI could
               | do, while GPT-3.5 was an intentionally smaller (than they
               | could train) model. I'm seeing it as GPT-3.5 = GPT-4-70b.
               | So to estimate when the next "best we can do" model might
               | be released we should look at the difference between the
               | release of GPT-3 and GPT-4, not GPT-4-70b and GPT-4.
               | That's my understanding, dunno.
        
               | whimsicalism wrote:
               | GPT-4 only started training roughly at the same
               | time/after the release of GPT-3.5, so I'm not sure where
               | you're getting the "intentionally smaller".
        
               | golol wrote:
               | Ah I misremembered GPT-3.5 as being released around the
               | time of ChatGPT.
        
               | whimsicalism wrote:
               | oh you remembered correctly, those are the same thing
               | 
               | actually i was wrong about when gpt-4 started training,
               | the time i gave was roughly when they finished
        
           | MP_1729 wrote:
           | Obviously, I know these timetables.
           | 
            | But there's a night and day difference post-Nov 22 compared
            | to before. Both in the AI race it sparked, and in the
            | funding all AI labs have.
           | 
           | If you're expecting GPT-5 by 2026, that's ok. Just very weird
           | to me.
        
         | ugh123 wrote:
         | >How people seriously expect these foundational model companies
         | to make substantial revenue?
         | 
         | My take on this common question is that we haven't even begun
         | to realize the immense scale at which we will need AI in all
         | sorts of products, from consumer to enterprise. We will look
         | back on the cost of tokens now (even at 50% of price a year or
         | so ago) and look at it with the same bewilderment of "having a
         | computer in your pocket" compared to mainframes from 50 years
         | ago.
         | 
         | For AI to be truly useful at the consumer level, we'll need
         | specialized mobile hardware that operates on a far greater
         | scale of tokens and speed than anything we're seeing/trying
         | now.
         | 
         | Think "always-on AI" rather than "on-demand".
        
         | siscia wrote:
         | Now a bit of a shameless plug, but if you need an AI to take
         | over your emails then my https://getgabrielai.com should
         | cover most use cases:
         | 
         | * Summarisation
         | * Smart filtering
         | * Smart automatic drafting of replies
         | 
         | Very much in beta, and summarisation is still behind a
         | feature flag, but feel free to give it a try.
         | 
         | For summarisation here I mean getting one email with all your
         | unread emails summarised.
        
       | joshstrange wrote:
       | Looking forward to trying this via ChatGPT. As always OpenAI says
       | "now available" but refreshing or logging in/out of ChatGPT (web
       | and mobile) doesn't cause GPT-4o to show up. I don't know why I
       | find this so frustrating. Probably because they don't say
       | "rolling out"; they say things like "try it now", but I can't,
       | even though I'm a paying customer. Oh well...
        
         | glenstein wrote:
         | I think it's a legitimate point. For my personal use case, one
         | of the most helpful things about these HN threads is comparing
         | with others to see how soon I can expect it to be available for
         | me. Like you, I currently don't have access, but I understand
         | that it's supposed to become increasingly available throughout
         | the day.
         | 
         | That is the text-based version. The full multimodal version I
         | understand to be rolling out in the coming weeks.
        
       | candiodari wrote:
       | I wonder if the audio stuff works like ViTS. Do they just encode
       | the audio as tokens and input the whole thing? Wouldn't that make
       | the context size a lot smaller?
       | 
       | One does notice that context size is noticeably absent from the
       | announcement ...
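       | No idea what GPT-4o actually does internally, but the generic
       | "audio as discrete tokens" idea looks roughly like the toy
       | sketch below: frame the waveform and map each frame to its
       | nearest entry in a learned codebook, the way neural audio
       | codecs do (random codebook here, purely illustrative):
       | 
       |   import numpy as np
       | 
       |   sr, frame = 16_000, 320                  # 20 ms -> 50 tok/s
       |   codebook = np.random.randn(256, frame)   # stand-in codebook
       | 
       |   audio = np.random.randn(sr * 2)          # 2 s of noise
       |   n = len(audio) // frame * frame
       |   frames = audio[:n].reshape(-1, frame)
       | 
       |   # one discrete token per frame = index of nearest codeword
       |   d = ((frames[:, None, :] - codebook[None]) ** 2).sum(-1)
       |   tokens = d.argmin(axis=1)
       |   print(tokens.shape)                      # (100,) for 2 s
       | 
       | At 50 tokens a second, ~43 minutes of audio would already fill
       | a 128k context, which might be part of why the context size
       | goes unmentioned.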
        
       | cs702 wrote:
       | The usual critics will quickly point out that LLMs like GPT-4o
       | still have a lot of failure modes and suffer from issues that
       | remain unresolved. They will point out that we're reaping
       | diminishing returns from Transformers. They will question the
       | absence of a "GPT-5" model. And so on -- blah, blah, blah,
       | stochastic parrots, blah, blah, blah.
       | 
       | Ignore the critics. Watch the demos. Play with it.
       | 
       | This stuff feels _magical_. Magical. It makes the movie  "Her"
       | look like it's no longer in the realm of science fiction but in
       | the realm of incremental product development. HAL's unemotional
       | monotone in Kubrick's "2001: A Space Odyssey" feels... oddly
       | primitive by comparison. I'm impressed at how well this works.
       | 
       |  _Well-deserved congratulations to everyone at OpenAI!_
        
         | CamperBob2 wrote:
         | Imagine what an unfettered model would be like. 'Ex Machina'
         | would no longer be a software-engineering problem, but just
         | another exercise in mechanical and electrical engineering.
         | 
         | The future is indeed here... and it is, indeed, not equitably
         | distributed.
        
           | aftbit wrote:
           | Or from Zones of Thought series, Applied Theology, the study
           | of communication with and creation of superhuman
           | intelligences that might as well be gods.
        
         | aftbit wrote:
         | >Who cares? This stuff feels magical. Magical!
         | 
         | On one hand, I agree - we shouldn't diminish the very real
         | capabilities of these models with tech skepticism. On the other
         | hand, I disagree - I believe this approach is unlikely to lead
         | to human-level AGI.
         | 
         | Like so many things, the truth probably lies somewhere between
         | the skeptical naysayers and the breathless fanboys.
        
           | CamperBob2 wrote:
           | _On the other hand, I disagree - I believe this approach is
           | unlikely to lead to human-level AGI._
           | 
           | You might not be fooled by a conversation with an agent like
           | the one in the promo video, but you'd probably agree that
           | somewhere around 80% of people could be. At what percentage
           | would you say that it's good enough to be "human-level?"
        
             | layer8 wrote:
             | > You might not be fooled by a conversation with an agent
             | like the one in the promo video, but you'd probably agree
             | that somewhere around 80% of people could be.
             | 
             | I think people will quickly learn with enough exposure, and
             | then that percentage will go down.
        
               | MVissers wrote:
               | Nah- These models will improve faster than people can
               | catch up. People or AI models can barely catch AI-created
               | text. It's quickly becoming impossible to distinguish.
               | 
               | The one you catch is the tip of the iceberg.
               | 
               | Same will happen to speech. Might take a few years, but
               | it'll be indistinguishable in at most a few years, due
               | to compute increases + model improvements, both
               | improving exponentially.
        
               | krainboltgreene wrote:
               | > These models will improve faster than people can catch
               | up.
               | 
               | So that we're all clear the basis for this analysis is
               | purely made up, yes?
        
               | paulryanrogers wrote:
               | How can we be so sure things will keep getting better?
               | And at a rate faster than humans can adapt?
               | 
               | If we have to dam rivers and build new coal plants to
               | power these AI data centers, then it may be one step
               | forward and two steps back.
        
               | pixl97 wrote:
               | No, instead something worse will happen.
               | 
               | Well spoken and well mannered speakers will be called
               | bots. The comment threads under posts will be hurling
               | insults back and forth on who's actually real. Half the
               | comments will actually be bots doing it. Welcome to the
               | dead internet.
        
               | jfyi wrote:
               | Right! This is absolutely apocalyptic! If more than half
               | the people I argue with on internet forums are just bots
               | that don't feel the sting and fail to sleep at night
               | because of it, what even is the meaning of anything?
               | 
               | We need to stop these hateful ai companies before they
               | ruin society as a whole!
               | 
               | Seriously though... the internet is dead already, and
               | it's not coming back to what it was. We ruined it, not
               | ai.
        
             | thfuran wrote:
             | The framing of the question admits only one reasonable
             | answer: There is no such threshold. Fooling people into
             | believing something doesn't make it so.
        
               | CamperBob2 wrote:
               | What criteria do you suggest, then?
               | 
               | As has been suggested, the models will get better at a
               | faster rate than humans will get smarter.
        
               | pixl97 wrote:
               | Most people's interactions are transactional. When I call
               | into a company and talk to an agent, and that agent
               | solves the problem I have regardless of if the agent is a
               | person or an AI, where did the fooling occur? The ability
               | to problem solve based on context is intelligence.
        
             | Vegenoid wrote:
             | When people talk about human-level AGI, they are not
             | referring to an AI that could pass as a human to most
             | people - that is, they're not simply referring to a program
             | that can pass the Turing test.
             | 
             | They are referring to an AI that can use reasoning,
             | deduction, logic, and abstraction like the smartest humans
             | can, to discover, prove, and create novel things in every
             | realm that humans can: math, physics, chemistry, biology,
             | engineering, art, sociology, etc.
        
           | micromacrofoot wrote:
           | I'm not so sure, I think this is what's called "emergent
           | behavior" -- we've found very interesting side effects of
           | bringing together technologies. This might ultimately teach
           | us more about intelligence than more reductionist approaches
           | like scanning and mapping the brain.
        
             | dongping wrote:
             | On the other hand, it is very difficult to distinguish
             | between "emergent behavior" and "somehow leaked into our
             | large training set" for LLMs.
        
         | layer8 wrote:
         | > HAL's unemotional monotone in Kubrick's movie, "Space
         | Odyssey," feels... primitive by comparison.
         | 
         | I'd strongly prefer that though, along with HAL's reasoning
         | abilities.
        
           | moffkalast wrote:
           | I would say a machine that thinks it feels emotions is less
           | likely to throw you out of a spaceship. Human empathy already
           | feels lacking compared to what something as basic as llama-3
           | can do.
        
             | layer8 wrote:
             | What you say has nothing to do with how an AI speaks.
             | 
             | To use another pop-culture reference, Obi-Wan in Episode IV
             | had deep empathy, but didn't speak emotionally. Those are
             | separate things.
        
             | thfuran wrote:
             | >I would say a machine that thinks it feels emotions is
             | less likely to throw you out of a spaceship
             | 
             | A lot of terrible human behavior is driven by emotions. An
             | emotionless machine will never dump you out the airlock in
             | a fit of rage.
        
               | pixl97 wrote:
               | Ah, I was tossed out of the airlock in a fit of logic...
               | totally different!
        
               | throwup238 wrote:
               | The important part is that the machine explained its
               | reasoning to you while purging the airlock.
        
               | dimask wrote:
               | In a chain of thought manner, as every proper AI, of
               | course.
        
             | satvikpendem wrote:
             | > _I would say a machine that thinks it feels emotions is
             | less likely to throw you out of a spaceship._
             | 
             | Have you seen the final scene of the movie Ex Machina?
             | Without spoilers, I'll just say that acting like it has
             | emotions is very different from actually having them.
             | This is in fact what socio- and psychopaths are like, with
             | stereotypical results.
        
             | elicksaur wrote:
             | llama-3 can't feel empathy, so this is a rather confusing
             | comment.
        
               | moffkalast wrote:
               | Can you prove that you feel empathy? That you're not a
               | cold unfeeling psychopath that is merely pretending
               | extremely well to have emotions? Even if it did, we
               | wouldn't be able to tell the difference from the outside,
               | so in strictly practical terms I don't think it matters.
        
               | elicksaur wrote:
               | If I could prove that I feel empathy through a HN
               | comment, I would be much more famous.
               | 
               | I get your nuanced point, that "thinking" one feels
               | empathy is enough to be bound by the norms of behavior
               | that empathy would dictate, but I don't see why that
               | would make AI "empathy" superior to human "empathy".
               | 
               | The immediate future I see is a chatbot that is
               | superficially extremely empathetic, but programmed never
               | to go against the owner's interest. Where before, when
               | interacting with a human, empathy could cause them to
               | make an exception and act sacrificially in a crisis case,
               | this chatbot would never be able to make such an
               | exception because the empathy it displays is transparent.
        
           | jll29 wrote:
           | HAL has to sound exactly how Kubrick made it sound for the
           | movie to work the way it should.
           | 
            | There wasn't any incentive to make it sound artificially
            | emotional or empathetic beyond a "Sorry, Dave".
        
         | dragonwriter wrote:
         | > This stuff feels magical. Magical.
         | 
         | Because its capacities are focused on exactly the right place
         | to feel magical. Which isn't to say that there isn't real
         | utility, but language (written, and even moreso spoken) has an
         | enormous emotional resonance for humans, so this is laser-
         | targeted in an area where every advance is going to "feel
         | magical" whether or not it moves the needle much on practical
         | utility; it's not unlike the effect of TV news making you feel
         | informed, even though time spent watching it negatively
         | correlates with understanding of current events.
        
           | BoorishBears wrote:
           | You really think OpenAI has researchers figuring out how to
           | drive emergent capabilities based on what markets well?
           | 
           | Edit: Apparently not based on your clarification, instead the
           | researchers don't know any better than to march into a local
           | maxima because they're only human and seek to replicate
           | themselves. I assumed too much good faith.
        
             | dragonwriter wrote:
             | I don't think the _intent_ matters, the _effect_ of its
             | capacities being centered where they are is that they
             | trigger certain human biases.
             | 
             | (Arguably, it is the other way around: they aren't _focused
             | on appealing to_ those biases, but _driven by them_, in
             | that the perception of language modeling as a road to
             | real general reasoning is a manifestation of the same bias
             | which makes language capacity be perceived as magical.)
        
               | BoorishBears wrote:
               | Intent matters when you're being as dismissive as you
               | were.
               | 
               | Not to mention your comment doesn't track at all with the
               | most basic findings they've shared: that adding new
               | modalities increases performance across the board.
               | 
               | They shared that with GPT-4 vs GPT-4V, and the fact
               | that this is a faster model than GPT-4V while rivaling
               | its performance seems like further confirmation of the
               | fact.
               | 
               | -
               | 
               | It seems like you're assigning emotional biases of your
               | own to pretty straightforward science.
        
               | ToucanLoucan wrote:
               | > Intent matters when you're being as dismissive as you
               | were.
               | 
               | The GP comment we're all replying to outlines a non-
               | exhaustive list of _very good reasons_ to be highly
               | dismissive of LLM. (No I'm not calling it AI, it is not
               | fucking AI)
               | 
               | It is utterly laughable and infuriating that you're
               | assigning legitimate skepticism about this technology as
               | an emotional bias. Fucking ridiculous. We're now almost
               | a full year into the full bore open hype cycle of LLM.
               | Where's all the LLM products? Where's the market
               | penetration? Business can't use it because it has a nasty
               | tendency to make shit up when it's talking. Various
               | companies and individuals are being sued because
               | generative art is stealing from artists. Code generators
               | are hitting walls of usability so steep, you're better
               | off just writing the damn code yourself.
               | 
               | We keep hearing this "it will do!" "it's coming!" "just
               | think of what it can do soon!" on and on and on, and it
               | just keeps... not doing any of it. It keeps hallucinating
               | untrue facts, it keeps getting the basics of its tasks
               | wrong, for fuck's sake AI Dungeon can't even remember if
               | I'm in Hyrule or Night City. Progress seems fewer and
               | farther between, with most advances being just getting
               | the compute cost down, because NO business currently
               | using LLM extensively could be profitable without
               | generous donation of compute from large corporations like
               | Microsoft.
        
               | imwillofficial wrote:
               | I mean when you're making a point about how your views
               | should not be taken as emotional bias, it pays to not be
               | overly emotional.
               | 
               | The fact that you don't see utility doesn't mean it is
               | not helpful to others.
               | 
               | A recent example, I used Grok to write me an outline of a
               | paper regarding military and civilian emergency response
               | as part of a refresher class.
               | 
               | To test it out we fed it scenario questions and saw how
               | it compared to our classmates responses. All people with
               | decades of emergency management experience.
               | 
               | The results were shocking. It was able to successfully
               | navigate a large scale emergency management problem and
               | get it (mostly) right.
               | 
               | I could see a not so distant future where we become QA
               | checkers for our AI overlords.
        
               | BoorishBears wrote:
               | I didn't see any good reasons to be dismissive of LLMs, I
               | saw a weak attempt at implying we're at a local maxima
               | because scientists don't know better than to chase after
               | what seems magical or special to them due to their bias
               | as humans.
               | 
               | It's not an especially insightful or sound argument imo,
               | and neither are random complaints about capabilities of
               | systems millions of people use daily despite your own
               | claims.
               | 
               | And for the record:
               | 
               | > because NO business currently using LLM extensively
               | could be profitable without generous donation of compute
               | from large corporations like Microsoft
               | 
               | OpenAI isn't the only provider of LLMs. Plenty of
               | businesses are using providers that provide their
               | services profitably, and I'm not convinced that OpenAI
               | themselves are subsidising these capabilities as strongly
               | as they once did.
        
               | throwthrowuknow wrote:
               | All that spilled ink don't change the fact that I use it
               | every day and it makes everything faster and easier and
               | more enjoyable. I'm absolutely chuffed to put my phone on
               | a stand so GPT4o can see the page I'm writing on and chat
               | with me about my notes or the book I'm reading and the
               | occasional doodle. One of the first things I'll try out
               | is to see if it can give feedback and tips on sketching,
               | since it can generate images with a lot better control of
               | the subject it might even be able to demonstrate various
               | techniques I could employ!
        
               | fzeroracer wrote:
               | As it turns out, people will gleefully welcome Big
               | Brother with open arms as long as it speaks with a
               | vaguely nice tone and compliments the stuff it can see.
        
               | dosinga wrote:
               | It's almost a year since this James Watt came out with
               | his steam engine and yet we are still using horses.
        
               | ToucanLoucan wrote:
               | A year is an _eternity_ in tech and you bloody well know
               | it. A year into the prime hype cycle of a company
               | valued at $80 billion, and we have... chatbots, but
               | fancier?
               | This is completely detached from sanity.
        
             | hbn wrote:
             | That's not what the GP said at all. It was just an
             | explanation for why this demo feels so incredible.
        
               | BoorishBears wrote:
               | GP's follow up is literally
               | 
               | >they aren't focused on appealing to those biases, but
               | driven by them, in the that the perception of language
               | modeling...
               | 
               | So yes in effect that is their point, except they find
               | the scientists are actually compelled by what markets
               | well, rather than intentionally going after what markets
               | well... which is frankly even less flattering. Like
               | researchers who enabled this just didn't know better than
               | to be seduced by some underlying human bias into a local
               | maxima.
        
               | frompom wrote:
               | I think that's still just an explanation of biases that
               | go into development direction. I don't view that as a
               | criticism but an observation. We use LLMs in our
               | products, and I use them daily and I'm not sure how
               | that's that negative.
               | 
               | We all have biases in how we determine intelligence,
               | capability, and accuracy. Our biases color our trust and
               | ability to retain information. There's a wealth of
               | research around it. We're all susceptible to these
               | biases. Being a researcher doesn't exclude you from the
               | experience of being human.
               | 
               | Our biases influence how we measure things, which in turn
               | influences how things behave. I don't see why you're so
               | upset by that pretty obvious observation.
        
               | BoorishBears wrote:
               | The full comment is right there, we don't need to seance
               | what the rest of it was or remix it.
               | 
               | > Arguably, it is the other way around: they aren't
               | focused on appealing to those biases, but driven by them,
               | in the that the perception of language modeling as a road
               | to real general reasoning is a manifestation of the same
               | bias which makes language capacity be perceived as
               | magical
               | 
               | There's no charitable reading of this that doesn't give
               | the researchers way too little credit given the results
               | of the direction they've chosen.
               | 
               | This has nothing to do with biases and emotion, I'm not
               | sure why some people need it to be: modalities have
               | progressed in order of how easy they are to wrangle data
               | on: text => image => audio => video.
               | 
               | We've seen that training on more tokens improves
               | performance, we've seen that training on new modalities
               | improves performance on the prior modalities.
               | 
               | It's so needlessly dismissive to act like you have this
               | mystical insight into a grave error these people are
               | making, and they're just seeking to replicate human
               | language out of folly, when you're ignoring table stakes
               | for their underlying works to start with.
        
               | dragonwriter wrote:
               | Note that there is only one thing about the research that
               | I have said is arguably influenced by the bias in
               | question, "the perception of language modeling as a road
               | to real general reasoning". Not the order of progression
               | through modalities. Not the perception that language,
               | image, audio, or video are useful domains.
        
           | aantix wrote:
           | Louis CK - Everything is amazing & nobody is happy
           | 
           | https://www.youtube.com/watch?v=kBLkX2VaQs4
        
             | coldtea wrote:
             | Perhaps everybody is right, and what is amazing is not what
             | matters, and what matters is hardly amazing...
        
               | throwaway_62022 wrote:
               | As Jon Stewart says in
               | https://www.youtube.com/watch?v=20TAkcy3aBY - "How about
               | I hold the fort on making peanut butter sandwiches,
               | because that is something I can do. How about we let AI
               | solve this world climate problem".
               | 
               | Yet to see a true "killer" feature of AI, that isn't
               | doing a job badly which humans can already do badly.
        
               | roudaki wrote:
               | the point of all of this is: this is alpha 0.45 made to
               | get the money needed to build AGI whatever that is
        
               | andrewmutz wrote:
               | Or perhaps the news media has been increasingly effective
               | at convincing us the world is terrible. Perceptions have
               | become measurably detached from reality:
               | 
               | https://www.ft.com/content/af78f86d-13d2-429d-ad55-a11947
               | 989...
        
               | trimethylpurine wrote:
               | If we're convinced that it's terrible then we're behaving
               | like it's terrible, which _is_ terrible.
        
           | agumonkey wrote:
           | I didn't use it as a textual interface, but as a
            | relational/nondirectional system, trying to ask it to invert
            | recursive relationships (FIRST/FOLLOW sets for BNF grammars).
           | The fact that it could manage to give partially correct
           | answers on such an abstract problem was "coldly" surprising.
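            | For reference, the classic fixed-point computation it was
            | being asked to reason about, on a toy grammar (my own
            | sketch; FOLLOW is built analogously by propagating FIRST
            | sets across production boundaries):
            | 
            |   # toy grammar: E -> T E' ; E' -> "+" T E' | eps ; T -> "id"
            |   EPS = "eps"
            |   grammar = {
            |       "E":  [["T", "E'"]],
            |       "E'": [["+", "T", "E'"], [EPS]],
            |       "T":  [["id"]],
            |   }
            | 
            |   def first_sets(g):
            |       first = {nt: set() for nt in g}
            |       changed = True
            |       while changed:
            |           changed = False
            |           for nt, prods in g.items():
            |               for prod in prods:
            |                   before = len(first[nt])
            |                   for sym in prod:
            |                       if sym == EPS:
            |                           first[nt].add(EPS)
            |                           break
            |                       if sym not in g:            # terminal
            |                           first[nt].add(sym)
            |                           break
            |                       first[nt] |= first[sym] - {EPS}
            |                       if EPS not in first[sym]:
            |                           break
            |                   else:   # every symbol was nullable
            |                       first[nt].add(EPS)
            |                   if len(first[nt]) != before:
            |                       changed = True
            |       return first
            | 
            |   print(first_sets(grammar))
            |   # FIRST(E)={id}, FIRST(E')={+, eps}, FIRST(T)={id}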
        
           | DarkNova6 wrote:
           | VC loves it.
           | 
           | Another step closer for those 7 trillion that OpenAI is so
           | desperate for.
        
           | ChuckMcM wrote:
           | Kind of this. That was one of the themes of the movie
           | Westworld where the AI in the robots seemed magical until it
           | was creepy.
           | 
           | I worry about the 'cheery intern' response becoming something
           | of a punch line.
           | 
           | "Hey siri, launch the nuclear missiles to end the world."
           | 
           | "That's a GREAT idea, I'll get right on that! Is there
           | anything else I can help you with?"
           | 
           | Kind of punch lines.
           | 
           | Will be interesting to see where that goes once you've got a
           | good handle on capturing the part of speech that isn't
           | "words" so much as it is inflection and delivery. I am
           | interested in a speech model that can differentiate between
           | "I would hate to have something happen to this store." as a
           | compliment coming from a customer and as a threat coming from
           | an extortionist.
        
             | tsunamifury wrote:
             | Positivity even to the point of toxicity will be the
             | default launch tone for anything... to avoid getting scary.
        
               | rrr_oh_man wrote:
               | Tell that to German customers
               | 
               | (Classic:
               | https://www.counterpunch.org/2011/08/26/germany-chokes-
               | on-wa...)
        
               | throwaway11460 wrote:
               | Yeah people around me here in Central Europe are very
               | sick of that already. Everybody is complaining about it
               | and the first thing they say to the bot is to cut it out,
               | stop apologizing, stop explaining and get to the point as
               | concisely as possible. Me too.
        
               | hnburnsy wrote:
                | I have to do that now with every AI that over-explains or
                | provides loosely related info I did not ask for. I hope
                | there is a verbosity level = minimum.
               | 
               | Even in the demo today, they kept cutting it off.
        
             | smugma wrote:
             | One of the demos has the voice respond to everything
             | sarcastically. If it can sound sarcastic it's not a stretch
             | to believe it can "hear" sarcasm.
        
             | indigoabstract wrote:
             | It's probably just me, but the somewhat forced laughs &
             | smiles from the people talking to it make me feel uneasy.
             | 
             | But enough of that. The future looks bright. Everyone
             | smile!
             | 
             | Or else..
        
             | Dig1t wrote:
             | This is basically just the ship computer from Hitchhikers
             | Guide to the Galaxy.
             | 
             | "Guys, I am just pleased as punch to inform you that there
             | are two thermo-nuclear missiles headed this way... if you
             | don't mind, I'm gonna go ahead and take evasive action."
        
               | throwup238 wrote:
               | ChatGPT is now powered by Genuine People Personality(tm)
               | and OpenAI is turning into the Sirius Cybernetics
               | Corporation (who according to the HHGTTG were _" a bunch
               | of mindless jerks who were the first against the wall
               | when the revolution came"_)
               | 
               | The jokes write themselves.
        
               | gnicholas wrote:
               | I did wonder if there's a less verbose mode. I hope
               | that's not a paywalled feature. Honestly it's possible
               | that they use the friendliness to help buy the LLM time
               | before it has to substantively respond to the user.
        
           | cs702 wrote:
           | Yes, the announcement explicitly states that much of the
           | effort for this release was focused on things that make it
           | feel magical (response times, multiple domains, etc.), not on
           | moving the needle on quantifiable practical performance. For
           | future releases, the clever folks at OpenAI are surely
            | focused on improving performance on challenging tasks that have
            | practical utility -- while maintaining the "magical feeling."
        
             | elpakal wrote:
             | Where does it explicitly say this?
        
               | cs702 wrote:
                | _Explicit ≠ literal._
               | 
               | The things they mention/demo -- response times, multiple
               | domains, inflection and tone, etc. -- are those that make
               | it feel "magical."
        
               | elpakal wrote:
               | > explicitly states that much of the effort for this
               | release was focused on things that make it feel magical
               | (response times, multiple domains, etc.), not on moving
               | the needle on quantifiable practical performance.
               | 
                | Hmm, did you mean implicitly? I've yet to see where they
                | say anything along the lines of not "moving the needle on
                | quantifiable practical performance."
        
           | benreesman wrote:
           | It's not an either-or: the stuff feels magical because it
           | _both_ represents dramatic revelation of capability _and_
           | because it is heavily optimized to make humans engage in
           | magical thinking.
           | 
           | These things are amazing compared to old-school NLP: the
           | step-change in capability is real.
           | 
            | But we should also keep our wits about us: they are well
            | described by current or conjectural mathematics, they fail at
            | things dolphins can do, it's not some AI god, and it's not
            | self-improving.
           | 
           | Let's have balance on both the magic of the experience and
           | getting past the tech demo stage: every magic trick has a
           | pledge, but I think we're still working on the prestige.
        
           | porphyra wrote:
           | Pretty interesting how it turns out that --- contrary to
           | science fiction movies --- talking naturally and modelling
           | language is much easier and was achieved much sooner than
           | solving complex problems or whatever it is that robots in
           | science fiction movies do.
        
         | agumonkey wrote:
          | That's what OpenAI managed to capture: a large enough sense of
          | wonder. You could feel it as people spread the news, but not
          | like the usual fad... there was a soft silence to it, people
          | deeply focused on poking at it because it was a new interface.
        
         | barrell wrote:
         | Did you use any of the GPT voice features before? I'm curious
         | whether this reaction is to the modality or the model.
         | 
         | Don't get me wrong, excited about this update, but I'm
         | struggling to see what is so magical about it. Then again, I've
         | been using GPT voice every day for months, so if you're just
         | blown away from talking to a computer then I get it
        
           | og_kalu wrote:
           | Speech is a lot more than just the words being conveyed.
           | 
           | Tone, Emphasis, Speed, Accent are all very important parts of
           | how humans communicate verbally.
           | 
           | Before today, voice mode was strictly your audio>text then
           | text>audio. All that information destroyed.
           | 
           | Now the same model takes in audio tokens and spits back out
           | audio tokens directly.
           | 
           | Watch this demo, it's the best example of the kind of thing
           | that would be flat out impossible with the previous setup.
           | 
           | https://www.youtube.com/live/DQacCB9tDaw?si=2LzQwlS8FHfot7Jy
        
             | scarface_74 wrote:
             | The ability to have an interactive voice conversation has
             | been available for the iOS app for the longest.
        
               | og_kalu wrote:
               | Right but this works differently.
        
               | kaibee wrote:
               | Kinda stretching the definition of interactive there.
        
               | scarface_74 wrote:
               | How so? You don't have to press the mic button after
               | every sentence. You press the headphone button and speak
               | like you normally would and it speaks back once you stop
               | talking.
               | 
               | How much more "interactive" could it be?
        
           | mlsu wrote:
           | The voice modality plays a huge role in how impressive it
           | seems.
           | 
           | When GPT-2/3/3.5/4 came out, it was fairly easy to see the
           | progression from reading model outputs that it was just
           | getting better and better at text. Which was pretty amazing
           | but in a very intellectual way, since reading is typically a
           | very "intellectual" "front-brain" type of activity.
           | 
           | But this voice stuff really does make it much more emotional.
           | I don't know about you, but the first time I used GPT's voice
           | mode I notice that I felt _something_ -- very un-
           | intellectually, very un-cerebral -- like, the _feeling_ that
           | there is a spirit embodying the computer. Of course with LLM
           | 's there always is a spirit embodying the computer (or, there
           | never is, depending on your philosophical beliefs).
           | 
           | The Suno demos that popped up recently should have clued us
           | all in that this kind of emotional range was possible with
           | these models. This announcement is not so much a step
           | function in model _capabilities_ , but it is a step function
           | in HCI. People are just not used to their interactions with a
            | computer being emotional like this. I'm excited and concerned in
           | equal parts that many people won't be truly prepared for what
           | is coming. It's on the horizon, having an AI companion, that
           | really truly makes you feel things.
           | 
           | Us nerds who habitually read text have had that since roughly
           | GPT-3, but now the door has been blown open.
        
           | rrrrrrrrrrrryan wrote:
           | Yeah the product itself is only incrementally better (lower
           | latency responses + can look at a camera feed, both great
           | improvements but nothing mindblowing or "magical"), but I
           | think the big difference is that this thing is available for
           | free users now.
        
         | grantsucceeded wrote:
         | Magical?
         | 
          | the interruption part is just flow control at the edge.
         | control-s, control-c stuff, right? not AI?
         | 
         | The sound of a female voice to an audience 85% composed of
         | males between the ages of 14 and 55 is "magical", not this
         | thing that recreates it.
         | 
         | so yeah, its flow control and compression of highly curated,
         | subtle soft porn. Subtle, hyper targeted, subconscious porn
         | honed by the most colossal digitally mediated focus group ever
         | constructed to manipulate our (straight male) emotions.
         | 
         | why isn't the voice actually the voice of the pissed off high
         | school janitor telling you to man-up and stop hyperventilating?
         | instead its a woman stroking your ego and telling you to relax
         | and take deep breaths. what dataset did they train that voice
         | on anyway?
        
           | mindcrime wrote:
           | I may or may not entirely agree with this sentiment (but I
           | definitely don't disagree with all of it!) but I will say
           | this: I don't think you deserve to be downvoted for this.
           | Have a "corrective upvote" on me.
        
           | whimsicalism wrote:
           | Right, because having a female voice means that it is soft
           | porn.
           | 
           | This is like horseshoe theory on steroids.
        
           | micromacrofoot wrote:
           | It's not that complicated, generally more woman-like voices
           | test as more pleasant to men and women alike. This concept
           | has been backed up by stereotypes for centuries.
           | 
           | Most voice assistants have male options, and an increasing
           | number (including ChatGPT) have gender neutral voices.
           | 
           | > why isn't the voice actually the voice of the pissed off
           | high school janitor telling you to man-up and stop
           | hyperventilating
           | 
           | sounds like a great way to create a product people will
           | outright hate
        
         | Melatonic wrote:
          | HAL's voice acting, I would say, is actually superb and very
          | subtly not unemotional at all. That's part of what makes it so
          | unnerving. They perfectly nailed the creepy uncanny valley.
        
         | WhitneyLand wrote:
         | How much of this could be implemented using the API?
         | 
         | There's so much helpful niche functionality that can be added
         | to custom clients.
        
         | OOPMan wrote:
         | I really don't think Sam needs more encouragement, thanks.
         | 
         | Also, if this is your definition of magic then...yeah...
        
         | noman-land wrote:
         | Magic is maybe not the best analogy to use because magic itself
         | isn't magical. It is trickery.
        
         | karmasimida wrote:
         | Very convincing demo
         | 
          | However, using ChatGPT with transcription already offers me a
          | similar experience, so what exactly is new?
        
         | scarface_74 wrote:
         | Some of the failure modes in LLMs have been fixed by augmenting
         | LLMs with external services
         | 
         | The simplest example is "list all of the presidents in reverse
         | chronological order of their ages when inaugurated".
         | 
         | Both ChatGpt 3.5 and 4 get the order wrong. The difference is
         | that I can instruct ChatGPT 4 to "use Python"
         | 
         | https://chat.openai.com/share/87e4d37c-ec5d-4cda-921c-b6a9c7...
         | 
         | You can do similar things to have it verify information by
         | using internet sources and give you citations.
         | 
         | Just like with the Python example, at least I can look at the
         | script/web citation myself
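          |
          | (For illustration only: the kind of script the model tends to
          | produce for that prompt looks roughly like the sketch below,
          | with just a handful of presidents here rather than the full
          | list; this is not the model's actual output.)
          |
          |     # Partial, illustrative data: age at first inauguration.
          |     presidents = [
          |         ("Joe Biden", 78),
          |         ("Donald Trump", 70),
          |         ("Ronald Reagan", 69),
          |         ("Barack Obama", 47),
          |         ("John F. Kennedy", 43),
          |     ]
          |
          |     # Oldest at inauguration first.
          |     ranked = sorted(presidents, key=lambda p: p[1], reverse=True)
          |     for name, age in ranked:
          |         print(f"{name}: inaugurated at {age}")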
        
           | wintermutestwin wrote:
           | It is pretty awesome that you only have to prompt with "use
           | python"
        
           | aspenmayer wrote:
           | > The simplest example is "list all of the presidents in
           | reverse chronological order of their ages when inaugurated".
           | 
           | This question is probably not the simplest form of the query
           | you intend to receive an answer for.
           | 
           | If you want a descending list of presidents based on their
           | age at inauguration, I know what you want.
           | 
           | If you want a reverse chronological list of presidents, I
           | know what you want.
           | 
           | When you combine/concatenate the two as you have above, I
           | have no idea what you want, nor do I have any way of checking
           | my work if I assume what you want. I know enough about word
           | problems and how people ask questions to know that you
           | probably have a fairly good idea what you want and likely
           | don't know how ambitious this question is as asked, and I
           | think you and I both are approaching the question with
           | reasonably good faith, so I think you'd understand or at
           | least accommodate my request for clarification and refinement
           | of the question so that it's less ambiguous.
           | 
           | Can you think of a better way to ask the question?
           | 
           | Now that you've refined the question, do LLMs give you the
           | answers you expect more frequently than before?
           | 
           | Do you think LLMs would be able to ask you for clarification
           | in these terms? That capability to ask for clarification is
           | probably going to be as important as other improvements to
           | the LLM, for questions like these that have many possibly
           | correct answers or different interpretations.
           | 
           | Does that make sense? What do you think?
        
             | JustExAWS wrote:
              | (I seem to have made the HN gods upset)
             | 
             | I tried asking the question more clearly
             | 
             | I think it "understood" the question because it "knew" how
             | to write the Python code to get the right answer. It parsed
             | the question as expected
             | 
             | The previous link doesn't show the Python. This one does.
             | 
             | https://chat.openai.com/share/a5e21a97-7206-4392-893c-55c53
             | 1...
             | 
             | LLMs are generally not good at math. But in my experience
             | ChatGPT is good at creating Python code to solve math
             | problems
        
               | aspenmayer wrote:
               | > I think it "understood" the question because it "knew"
               | how to write the Python code to get the right answer.
               | 
               | That's what makes me suspicious of LLMs, they might just
               | be coincidentally or accidentally answering in a way that
               | you agree with.
               | 
               | Don't mean to nitpick or be pedantic. I just think the
               | question was really poorly worded and might have a lot of
               | room for confirmation bias in the results.
        
               | JustExAWS wrote:
               | I reworded the question with the same results in the
               | second example.
               | 
               | But here is another real world example I dug up out of my
               | chat history. Each iteration of the code worked. I
               | actually ran it a few days ago
               | 
               | https://chat.openai.com/share/4d02818c-c397-417a-8151-7bf
               | d7d...
        
         | croes wrote:
         | >This stuff feels magical. Magical.
         | 
          | Sounds like the people who defend Astrology because it feels
         | magical how their horoscope fits their personality.
         | 
         | "Don't bother me with facts that destroy my rose-tinted view"
         | 
         | At moment AI is a massive hype and shoved into everything. To
         | point at the faults and weaknesses is a reasonable and
         | responsible thing to do.
        
           | hsavit1 wrote:
           | yea, we don't want or need this kind of "magic" - because
           | it's hardly magic to begin with, and it's more socially and
           | environmentally destructive than anything else.
        
             | lannisterstark wrote:
              | Speak for yourself, my workflow and life have been
             | significantly improved with these things. Having easier
             | access to information that I sorta know but want to
             | verify/clarify rather than going into forums/SO is
             | extremely handy.
             | 
             | Not having to write boilerplate code itself also is very
             | handy.
             | 
             | So yes, I absolutely do want this "magic." "I don't like it
             | so no one should use it" is a pretty narrow POV.
        
               | oblio wrote:
                | Neither of your use cases really leads to stable long-term
                | valuations in the trillions for the companies building
                | this stuff.
        
               | lannisterstark wrote:
               | Wonderful. I don't need them to.
               | 
               | It works for what I need it to do.
        
               | oblio wrote:
               | You should be worried because this stuff needs to make
               | sense financially. Otherwise we'll be stuck with it in an
               | enshittification cycle, kind of like Reddit or image
               | hosting websites.
        
               | lannisterstark wrote:
               | Problem is that by that time there would be open source
               | models (the ones that already exist are getting good)
               | that I can run locally. I honestly don't need _THAT_
               | much.
        
               | ewild wrote:
                | People like you are the problem: the people who join a
                | website, cause it to be shitty, then leave and start the
                | process at a new website. Reddit didn't become shit
                | because of Reddit; it became shit because of people going
                | on there commenting as if they themselves are an LLM,
                | repeating "enshittification" over and over and trying to
                | say the big buzzword first so they get to the top, denying
                | any real conversation.
        
           | helicalmix wrote:
           | i legitimately don't understand this viewpoint.
           | 
           | 3 years ago, if you told me you could facetime with a robot,
           | and they could describe the environment and have a "normal"
           | conversation with me, i would be in disbelief, and assume
           | that tech was a decade or two in the future. Even the stuff
            | that was happening 2 years ago felt unrealistic.
           | 
           | astrology is giving vague predictions like "you will be happy
           | today". GPT-4o is describing to you actual events in real
           | time.
        
             | demondemidi wrote:
              | Maybe you just haven't been around long enough to have seen
              | the meta-analysis? I've been through four major tech hype
              | cycles in 30+ years. This looks and smells like all the
              | others.
        
               | HelloMcFly wrote:
               | I'm 40ish, I'm in the tech industry, I'm online, I'm
               | often an early adopter.
               | 
               | What hype cycle does this smell like? Because it feels
               | different to me, but maybe I'm not thinking broadly
               | enough. If your answer is "the blockchain" or Metaverse
               | then I know we're experiencing these things quite
               | differently.
        
               | threeseed wrote:
               | It feels like the cloud.
               | 
               | Where platforms and applications are rewritten to take
               | advantage of it and it improves the baseline of
               | capabilities that they offer. But the end user benefits
               | are far more limited than predicted.
               | 
               | And where the power and control is concentrated in the
               | hands of a few mega corporations.
        
               | whimsicalism wrote:
               | > the end user benefits are far more limited than
               | predicted
               | 
               | How have you judged the end user benefits of the cloud? I
               | don't agree personally - the cloud has enabled most
               | modern tech startups and all of those have been super
               | beneficial to me.
        
               | threeseed wrote:
               | Direct versus indirect benefits.
               | 
                | The cloud is hidden from end users, whereas other waves
                | like the internet and smartphone apps were very visible.
               | 
               | AI will soon stop being a buzzword and just be another
               | foundation we build apps on.
        
               | idopmstuff wrote:
               | This is such a strange take - do you not remember 2020
               | when everyone started working from home? And today, when
               | huge numbers of people continue to work from home? Most
               | of that would be literally impossible without the cloud -
               | it has been a necessary component in reshaping work and
               | all the downstream effects related to values of office
               | real estate, etc.
               | 
               | Literally a society-changing technology.
        
               | bongodongobob wrote:
               | No way. Small to medium sized businesses don't need
               | physical servers anymore. Which is most businesses. It's
               | been a huge boon to most people. No more running your
               | exchange servers on site. Most things that used to be on-
               | prem software have moved to the cloud and integrate with
               | mobile devices. You don't need some nerd sitting around
               | all day in case you need to fix your on-prem industry
               | specific app.
               | 
               | I have no idea how you can possibly shrug off the cloud
               | as not that beneficial.
        
               | threeseed wrote:
               | > I have no idea how you can possibly shrug off the cloud
               | as not that beneficial.
               | 
               | I have no idea either. Since I never said it.
        
               | helicalmix wrote:
               | i feel like a common consumer fallacy is that, because
               | you don't interact with a technology in your day-to-day
               | life, it leads you to conclude that the technology is
               | useless.
               | 
               | I guarantee you that the cloud has benefitted you in some
               | way, even though you aren't aware of the benefits of the
               | cloud.
        
               | TulliusCicero wrote:
               | And some of those hype cycles were very impactful? The
               | spread of consumer internet access, or smartphones, as
               | two examples.
        
               | whimsicalism wrote:
               | And maybe you just enjoy the perspective of "I've seen it
               | all" so much that you've shut off your capacity for
               | critical analysis.
        
               | samatman wrote:
               | Yeah, I remember all that dot com hysteria like it was
               | yesterday.
               | 
               | Page after page of Wired breathlessly predicting the
               | future. We'd shop online, date online, the world's
               | information at our fingertips. It was going to change
               | everything!
               | 
               | Silly now, of course, but people truly believed it.
        
               | homami wrote:
               | I am just imagining GPT-4o saying this in her sarcastic
               | voice!
        
               | kristiandupont wrote:
               | If this smells like anything to me, it's the start of the
               | internet.
        
               | helicalmix wrote:
               | which hype cycles are you referring to? and, after the
               | dust settled, do you conclusively believe nothing of
               | value was generated from these hype cycles?
        
             | cogman10 wrote:
             | People said pretty much exactly the same thing about 3d
             | printing.
             | 
             | "Rather than ship a product, companies can ship blueprints
             | and everyone can just print stuff at their own home!
             | Everything will be 3d printed! It's so magical!"
             | 
             | Just because a tech is magical today, doesn't mean that it
             | will be meaningful tomorrow. Sure, 3d printing has its
             | place (mostly in making plastic parts for things) but it's
             | hardly the revolutionary change in consumer products that
              | it was touted to be. Instead, it's just a hobbyist toy.
             | 
             | GPT-4o being able to describe actual events in real time is
             | interesting, it's yet to be seen if that's useful.
             | 
             | That's mostly the thinking here. A lot of the "killer" AI
             | tech has really boiled down to "Look, this can replace your
             | customer support chat bot!". Everyone is rushing to try and
              | figure out what we can use LLMs for (just like they did
              | when ML was supposed to take over the world), and so far
              | it's been niche applications to make shareholders happy.
        
               | idopmstuff wrote:
               | Remember when Chegg's stock price tanked? That's because
               | GPT is extremely valuable as a homework helper. It can
               | make mistakes, but that's very infrequent on well-
               | understood topics like English, math and science through
               | the high school level (and certainly if you hire a tutor,
               | you'd pay a whole lot more for something that can also
               | make mistakes).
               | 
               | Is that not a very meaningful thing to be able to do?
        
               | j2kun wrote:
               | If you follow much of the education world, it's inundated
               | with teachers frantically trying to deal with the volume
               | and slop their students produce with AI tools. I'm sure
               | it can be useful in an educational context, but
               | "replacing a poor-quality cheating tool with a more
               | efficient poor-quality cheating tool" isn't exactly what
               | I'd call "meaningful."
               | 
               | The most interesting uses of AI tools in a classroom I've
               | seen is teachers showing students AI-generated work and
               | asking students to critique it and fact check it, at
               | which point the students see it for what it is.
        
               | delusional wrote:
               | > Is that not a very meaningful thing to be able to do?
               | 
               | No? Solving homework was never meaningful. Being
               | meaningful was never the point of homework. The point was
               | for you to solve it yourself. To Learn with your human
               | brain, such that your human brain could use those
                | teachings to make new meaningful knowledge.
               | 
               | John having 5 apples after Judy stole 3 is not
               | interesting.
        
               | LordDragonfang wrote:
               | The huge difference between this and your analogy is that
               | 3d printing failed to take off because it never reached
               | mass adoption, and stayed in the "fiddly and expensive"
               | stage. GPT models have _already_ seen adoption in nearly
               | every product your average consumer uses, in some cases
               | heedless of whether it even makes sense in that context.
               | Windows has it built in. Nearly everyone I know (under
               | the age of 40) has used at least one product downstream
               | of OpenAI, and more often than not a handful of them.
               | 
               | That said, yeah it's mostly niche locations like customer
               | support chatbots, because the killer app is "app-to-user
                | interface that's indistinguishable from normal human
               | interaction". But you're underestimating just _how much_
               | of the labor force are effectively just an interface
               | between a customer and some app (like a POS).  "Magical"
               | is exactly the requirement to replace people like that.
        
               | j2kun wrote:
               | "Adoption" of tech companies pushing it on you is very
               | different from "adoption" in terms of the average person
               | using it in a meaningful way and liking it.
        
               | cogman10 wrote:
               | > But you're underestimating just how much of the labor
               | force are effectively just an interface between a
               | customer and some app
               | 
               | That's the sleight of hand LLM advocates are playing
               | right now.
               | 
               | "Imagine how many people are just putting data into
               | computers! We could replace them all!"
               | 
               | Yet LLMs aren't "just putting data into a computer" They
               | aren't even really user/app interfaces. They are a magic
               | box you can give directives to and get (generally
               | correct, but not always) answers from.
               | 
               | Go ahead, ask your LLM "Create an excel document with the
               | last 30 days of the high temperatures for blank". What
               | happens? Did it create that excel document? Why not?
               | 
               | LLMs don't bridge the user/app gap. They bridge the
               | user/knowledge gap, sometimes sort of.
        
               | helicalmix wrote:
               | > GPT-4o being able to describe actual events in real
               | time is interesting, it's yet to be seen if that's
               | useful.
               | 
               | sure, but my experience is that if you are able to
               | optimize better on some previous limitation, it
               | legitimately does open up a whole different world of
               | usefulness.
               | 
               | for example, real-time processing makes me feel like
               | universal translators are now all the more viable
        
               | helicalmix wrote:
               | > Sure, 3d printing has its place (mostly in making
               | plastic parts for things) but it's hardly the
               | revolutionary change in consumer products that it was
                | touted to be. Instead, it's just a hobbyist toy.
               | 
               | how sure are you about that?
               | 
               | https://amfg.ai/industrial-applications-of-3d-printing-
               | the-u...
               | 
               | how positive are you that some benefits in your life are
               | not attributable to 3d-printing used behind the scenes
               | for industrial processes?
               | 
               | > Just like they did when ML was supposed to take over
               | the world
               | 
               | how sure are you that ML is not used behind the scenes to
               | benefit your life? do you consider features like fraud
                | detection programs, protein-folding prediction programs,
                | and spam filters valuable in and of themselves?
        
               | cogman10 wrote:
               | This honestly made me lol.
               | 
               | I'm sure 10 years from now, assuming LLMs don't prove me
                | wrong, I'll make a similar comment about LLMs and some
                | new hype, just like this one about 3d printing, and I'll
                | get EXACTLY this reply. "Oh yeah, well here's a niche
               | application of LLMs that you didn't account for!".
               | 
               | > how positive are you that some benefits in your life
               | are not attributable to 3d-printing used behind the
               | scenes for industrial processes?
               | 
               | See where I said "in consumer products". I'm certainly
               | not claiming that 3d printing is never used and is not
               | useful. However, what I am saying is that it was hyped
               | WAY beyond industrial applications.
               | 
               | In fact, here I am, 11 years ago, saying basically
               | exactly what I'm saying about LLMs that I said about 3d
               | printing. [1]. Along with people basically responding to
               | me the exact same way you just did.
               | 
               | > how sure are you that ML is not used behind the scenes
               | to benefit your life? do you consider features like fraud
                | detection programs, protein-folding prediction programs,
                | and spam filters valuable in and of themselves?
               | 
               | Did I say it wasn't behind the scenes? ML absolutely has
               | an applicable location, it's not nearly as vast as the
               | hype train would say. I know, I spent a LONG time trying
               | to integrate ML into our company and found it simply
               | wasn't as good as hard and fast programmed rules in
               | almost all situations.
               | 
               | [1] https://www.reddit.com/r/technology/comments/15iju9/3
               | d_print...
        
               | helicalmix wrote:
               | sorry, maybe i'm not completely understanding what you
               | mean by "in consumer products".
               | 
               | reading your argument on reddit, it seems to me that you
               | don't consider 3d printing a success because there's not
               | one in every home...which is true.
               | 
               | but it feels uncreative? like, sure, just because it
               | hasn't been mass adopted by consumers, doesn't mean there
               | wasn't value generation done on an industrial level.
               | you're probably using consumer products right now that
               | have benefitted from 3d printing in some way.
               | 
               | > ML absolutely has an applicable location, it's not
               | nearly as vast as the hype train would say
               | 
               | what hype train are you referring to? i know a lot of
               | different predictions in machine learning, so i'm curious
               | about what you mean specifically.
        
             | rurp wrote:
             | Ok, but what will the net effects be? Technology can be
             | extremely impressive on a technical level, but harmful in
             | practical terms.
             | 
             | So far the biggest usecase for LLMs is mass propaganda and
             | scams. The fact that we might also get AI girlfriends out
              | of the tech understandably doesn't seem that appealing to a
             | lot of folks.
        
               | helicalmix wrote:
               | this is a different thesis than "AI is basically bullshit
               | astrology", so i'm not disagreeing with you.
               | 
               | Understanding atomic energy gave us both emission-free
                | energy and the atomic bomb, and you are correct that we
                | can't necessarily say where the path of AI will take us.
        
             | croes wrote:
             | GPT-4o is also describing things that never happened.
             | 
             | The first users of Eliza felt the same about the
             | conversation with it.
             | 
             | The important point is to know that GPTs don't know or
             | understand.
             | 
             | It may feel like a normal conversation but is a Chinese
             | Room on steroids.
             | 
             | People started to ask GPTs questions and take the answers
              | as facts because they believe it's intelligent.
        
               | holoduke wrote:
                | But it may be intelligent. After all, you, with a few
                | trillion synapses, are also intelligent.
        
               | LordDragonfang wrote:
                | I'm increasingly exhausted by the people who will
                | immediately jump to gnostic assertions that <LLM> isn't
               | <intelligent|reasoning|really thinking|> because <thing
               | that applies to human cognition>
               | 
               | >GPT-4o is also describing things that never happened.
               | 
               | https://www.cbsnews.com/news/half-of-people-remember-
               | events-...
               | 
               | >People started to ask [entity] questions and take the
                | answers as facts because they believe it's intelligent.
               | 
               | Replace that with any political influencer (Ben Shapiro,
               | AOC, etc) and you will see the _exact same argument_.
               | 
               | People remember things that didn't happen and confidently
               | present things they just made up as facts on a daily
               | basis. This is because they've learned that confidently
               | stating incorrect information is more effective than
               | staying silent when you don't know the answer. LLMs have
               | just learned how to act like a human.
               | 
               | At this point the real stochastic parrots are the people
               | who bring up the Chinese room because it appears the most
               | in their training data of how to respond to this
               | situation.
        
               | helicalmix wrote:
               | > It may feel like a normal conversation but is a Chinese
               | Room on steroids.
               | 
               | Can you prove that humans are not chinese rooms on
               | steroids themselves?
        
             | listenallyall wrote:
             | There are 8 billion humans you could potentially facetime
             | with. I agree, a large percentage are highly annoying, but
             | there are still plenty of gems out there, and the quest to
             | find one is likely to be among the most satisfying journeys
             | of your life.
        
               | helicalmix wrote:
               | sure, but we're not discussing the outsourcing of human
               | companionship in this context. we're discussing the
               | capabilities of current technology.
        
           | whimsicalism wrote:
            | > Sounds like the people who defend Astrology because it feels
           | magical how their horoscope fits their personality.
           | 
           | Does it really or are you just playing facile word
           | association games with the word "magical"?
        
           | arisAlexis wrote:
            | What is the point of pointing out faults that will be fixed
            | very soon? Just being negative, or unable to see the future?
        
           | idopmstuff wrote:
           | Astrology is a thing with no substance whatsoever. It's just
           | random, made-up stories. There is no possibility that it will
           | ever develop into something that has substance.
           | 
           | AI has a great deal of substance. It can draft documents. It
           | can identify foods in a picture and give me a recipe that
           | uses them. It can create songs, images and video.
           | 
            | AI, of course, has a lot of flaws. It does some things poorly,
           | it does other things with bias, and it's not suitable for a
           | huge number of use cases. To imply that something that has a
           | great deal of substance but flaws alongside is the same as
           | something that has no substance whatsoever nor ever will is
           | just not a reasonable thing to do.
        
           | dogcomplex wrote:
           | If you want to talk facts, then those critics are similarly
           | on weak grounds and critiquing feelings more than facts.
           | There has been no actual sign of scaling ceasing to work, in
           | medium after medium, and most of their criticisms are issues
           | with how LLM tools are embedded in architectures which are
           | still incredibly early/primitive and still refining how to
           | use transformers effectively. We haven't even begun using
           | error correction techniques from analog engineering
           | disciplines properly to boost the signal of LLMs in practical
           | settings. There is so much work to do with just the existing
           | tools.
           | 
           | "AI is massive hype and shoved into everything" has more
           | grounding as a negative feeling of people being overwhelmed
           | with technology than any basis in fact. The faults and
            | weaknesses are buoyed more by people trying to acknowledge
            | your feelings than by any real criticism of a technology
            | that is changing faster than the faults-and-weaknesses
            | arguments can be made. Study machine learning and come back
            | with an informed criticism.
        
         | pmelendez wrote:
         | > Ignore the critics. Watch the demos. Play with it
         | 
         | With so many smoke and mirrors demos out there, I am not super
         | excited at those videos. I would play with it, but it seems
         | like it is not available in a free tier (I stopped paying
         | OpenAI a while ago after realizing that open models are more
         | than enough for me)
        
         | m463 wrote:
         | > HAL's unemotional monotone
         | 
         | on a tangent...
         | 
          | I find the psychology behind this interesting. If the voice
         | in 2001 had proper inflection, it wouldn't have been perceived
         | as a computer.
         | 
         | (also, I remember when voice synthesizers got more
         | sophisticated and Stephen Hawking decided to keep his original
         | first-gen voice because he identified more with it)
         | 
         | I think we'll be going the other way soon. Perfect voices, with
         | the perfect emotional inflection will be perceived as
         | computers.
         | 
         | However I think at some point they may be anthropomorphized and
         | given more credit than they deserve. This will probably be
         | cleverly planned and a/b tested. And then that perfect voice,
         | for you, will get you to give in.
        
         | 0xdeadbeefbabe wrote:
         | > HAL's unemotional monotone in Kubrick's movie, "Space
         | Odyssey," feels... oddly primitive by comparison
         | 
         | In comparison to the gas pump which says "Thank You!"
        
         | nojvek wrote:
         | > Play with it!
         | 
         | It's not accessible to everyone yet.
         | 
          | Even via the API, I can't send it a voice stream yet.
         | 
          | The API refuses to generate images.
         | 
         | Next few weeks will tell as more people play with it.
        
         | fhub wrote:
         | I prompted it with "Take this SSML script and give me a woman's
         | voice reading it as WAV or MP3 [Pasted script]" and it pretty
         | much sounds like HAL.
        
           | speedgoose wrote:
           | Did they release the new voices yet?
        
         | password54321 wrote:
         | Comments have become insufferable. Either it is now positive to
         | the point of bordering on cringe-worthiness (your comment) or
         | negative. Nuanced discussion is dead.
        
         | smugglerFlynn wrote:
         | Watching HAL happening in real life comes across as creepy, not
         | magical. Double creepy with all the people praising this
         | 'magicality'.
         | 
          | I'm not a sceptic and I apply AI on a daily basis, but the
          | whole "we can finally replace people" vibe is extremely
          | off-putting. I had very similar feelings during the pandemic,
          | when the majority of people seemed so happy to drop any real
          | human interaction in favor of remote comms via chats/audio
          | calls. It still creeps me out how ready we are as a society
          | to drop anything remotely human in favor of technocratic
          | advancement and "productivity".
        
         | aiauthoritydev wrote:
          | 1. Demos are meant to feel magical, and except in Apple's case
          | they are often exaggerated versions of the real product.
         | 
         | 2. Even then this is a wonderful step for tech in general and
         | not just OpenAI. Makes me very excited.
         | 
         | 3. Most economic value and growth driven by AI will not come
         | from consumer apps but rather the enterprise use. I am
         | interested in seeing how AI can automatically buy stuff for me,
         | automate my home, reduce my energy used, automatically apply
         | and get credit cards based on my purchases, find new jobs for
         | me, negotiate with a car dealer on my behalf, detect when I am
          | going to fall sick, better diabetes care and eventual cure etc.
         | etc.
        
         | lm28469 wrote:
         | > It makes the movie "Her" look like it's no longer in the
         | realm of science fiction but in the realm of incremental
         | product development
         | 
         | Are we supposed to cheer to that?
         | 
         | We're already mid way to the full implementation of 1984, do we
         | need Her before we get to Matrix ?
        
           | throwthrowuknow wrote:
           | Her wasn't a dystopia as far as I could tell. Not even a
           | cautionary tale. The scifi ending seems unlikely but
           | everything else is remarkably prescient. I think the picnic
           | scene is very likely to come true in the near future. Things
           | might even improve substantially if we all interact with
           | personalities that are consistently positive and biased
           | towards conflict resolution and non judgemental interactions.
        
             | lm28469 wrote:
             | > Her wasn't a dystopia as far as I could tell.
             | 
             | Well that's exactly why I'm not looking forward to whatever
             | is coming. The average joe thinking dating a server is not
              | a dystopia frightens me much more than the delusional tech
              | CEO who thinks his AI will revolutionise the world.
             | 
             | > Things might even improve substantially if we all
             | interact with personalities that are consistently positive
             | and biased towards conflict resolution and non judgemental
             | interactions.
             | 
             | Some kind of turbo bubble in which you don't even have to
              | actually interact with anyone or anything? Every
              | "personality" will be nice to you as long as you send
             | $200 to openai every week, yep that's absolutely a dystopia
             | for me
             | 
             | It really feels like the end goal is living in a pod and
             | being uploaded in an alternative reality, everything we
              | build to "enhance" our lives takes us further from the basic
             | building blocks that make life "life".
        
             | goatlover wrote:
             | Seemed like a cautionary tale to me where the humans fall
             | in love with disembodied AIs instead of seeking out human
             | interaction. I think the end of the movie drove that home
             | pretty clearly.
        
         | bowsamic wrote:
         | The demos seem quite boring to me
        
         | suarezluis wrote:
         | This is such a hot take, it should go in hot-takes.io LOL
        
         | goatlover wrote:
         | > It makes the movie "Her" look like it's no longer in the
         | realm of science fiction but in the realm of incremental
         | product development.
         | 
         | The last part of the movie "Her" is still in the realm of
         | science fiction, if not outright fantasy. Reminds me of the
         | later seasons of SG1 with all the talk of ascension and
         | Ancients. Or Clarke's 3001 book intro, where the monolith
         | creators figured out how to encode themselves into spacetime.
         | There's nothing incremental about that.
        
         | badgersnake wrote:
         | Blah blah blah indeed, the hype train continues unabated. The
          | problem is, those are all perfectly valid criticisms and LLMs
         | can never live up to the ridiculous levels of hype.
        
         | peterisza wrote:
         | Can anybody help me try the direct voice feature? I can't find
         | the button for it. Maybe it's not available in Europe yet, I
         | don't know.
        
         | cess11 wrote:
         | You'll have a great time once you discover literature.
         | Especially early modern novels, texts the authors sometimes
         | spent decades refining, under the combined influences of
         | classical arts and thinking, Enlightenment philosophy and
         | science.
         | 
         | If chatbots feel magical, what those people did will feel
         | divinely inspired.
        
         | vwkd wrote:
         | Funnily, I'd prefer HAL's unemotional monotone over GPT's woke
          | hyperbole any second.
        
         | byw wrote:
          | I mean, humans also have tons of failure modes, but we've
          | learned to live with them over time.
         | 
          | The average human has tons of quirks, talks over others all
          | the time, generally can't solve complex problems in a casual
          | conversation setting, and is not always cheery and ready to
          | please like Scarlett's character in Her.
         | 
          | I think our expectations of AI are way too high from our
          | exposure to science fiction.
        
       | bearjaws wrote:
       | OAI just made an embarrassment of Google's fake demo earlier this
       | year. Given how this was recorded, I am pretty certain it's
       | authentic.
        
         | CivBase wrote:
         | I don't doubt this is authentic, but if they really wanted to
         | fake those demos, it would be pretty easy to do using pre-
         | recorded lines and staged interactions.
        
           | mike00632 wrote:
           | For what it's worth, OpenAI also shared videos of failed
           | demos:
           | 
           | https://vimeo.com/945591584
           | 
           | I really value how open they are being about its limitations.
        
         | hehdhdjehehegwv wrote:
         | This feature has been in iOS for a while now, just really slow
         | and without some of the new vision aspects. This seems like a
         | version 2 for me.
        
           | bigyikes wrote:
           | That old feature uses Whisper to transcribe your voice to
           | text, and then feeds the text into the GPT which generates a
           | text response, and then some other model synthesizes audio
           | from that text.
           | 
           | This new feature feeds your voice directly into the GPT and
           | audio out of it. It's amazing because now ChatGPT can truly
           | communicate with you via audio instead of talking through
           | transcripts.
           | 
           | New models should be able to understand and use tone, volume,
           | and subtle cues when communicating.
           | 
           | I suppose to an end user it is just "version 2" but progress
           | will become more apparent as the natural conversation
           | abilities evolve.
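            |
            | (A rough sketch of the structural difference, using placeholder
            | callables rather than any real API; stt, llm, tts, tokenize,
            | model and detokenize below are hypothetical stand-ins.)
            |
            |     # Old pipeline: three separate models chained together, so
            |     # tone, emphasis and speed never reach the language model.
            |     def cascaded_voice_turn(audio, stt, llm, tts):
            |         return tts(llm(stt(audio)))
            |
            |     # New pipeline: one model maps audio tokens directly to
            |     # audio tokens, so that information can survive end to end.
            |     def end_to_end_voice_turn(audio, tokenize, model, detokenize):
            |         return detokenize(model(tokenize(audio)))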
        
             | hehdhdjehehegwv wrote:
             | Yes, per my other comment this is an improvement on what
             | their app already does. The magnitude of that improvement
             | remains to be seen, but it isn't a "new" product launch
             | like a search engine would be.
        
           | abhpro wrote:
           | No it's not the same thing, the link for this submission even
           | explains that. Anyone who comments should at least give the
           | submission a cursory read.
        
             | hehdhdjehehegwv wrote:
             | I did and regardless of the underlying technology it is, in
             | fact, an improvement to an existing product - not something
             | new from whole cloth.
             | 
             | If they had released a search engine, which had been
             | suggested, that would be a new product.
        
         | readams wrote:
         | https://twitter.com/Google/status/1790055114272612771
        
         | nojvek wrote:
         | Let OAI actually be released to the masses. Then we can
         | compare.
         | 
         | I'm not a big fan of announcing something but it not being
         | released.
         | 
         | They say available for api but it's text only. Can't send audio
         | stream to get audio stream back.
         | 
         | Time will tell. I'm holding my emotions after I get my hands on
         | it.
        
       | levocardia wrote:
       | As a paid user this felt like a huge letdown. GPT-4o is available
       | to everyone so I'm paying $20/mo for...what, exactly? Higher
       | message limits? I have no idea if I'm close to the message limits
       | currently (nor do I even know what they are). So I guess I'll
       | cancel, then see if I hit the limits?
       | 
       | I'm also extremely worried that this is a harbinger of the
       | enshittification of ChatGPT. Processing video and audio for all
       | ~200 million users is going to be extravagantly expensive, so my
       | only conclusion is that OpenAI is funding this by doubling down
       | on payola-style corporate partnerships that will result in
       | ChatGPT slyly trying to mention certain brands or products in our
       | conversations [1].
       | 
       | I use ChatGPT every day. I love it. But after watching the video
       | I can't help but think "why should I keep paying money for this?"
       | 
       | [1] https://www.adweek.com/media/openai-preferred-publisher-
       | prog...
        
         | muttantt wrote:
         | So... cancel the subscription?
        
         | CodeCrusader wrote:
         | Completely agree, none of the updates will apply to any of my
         | use cases, disappointment.
        
       | noncoml wrote:
       | They really need to tone down the conversational garnish. It
       | needs to put on its running shoes and get to the point in every
       | reply. Ain't nobody got time to keep listening to the AI
       | blathering along at every prompt.
        
       | dbcooper wrote:
       | Question for you guys: is there a model that can take figures
       | (graphs) from scientific publications and combine image analysis
       | with reading the data-point symbol descriptions to analyse the
       | trends?
        
       | krunck wrote:
       | So GPT-4o can do voice intonation? Great. Nice work.
       | 
       | Still, it sounds like some PR drone selling a product. Oh
       | wait....
        
       | CivBase wrote:
       | Those voice demos are cool but having to listen to it speak makes
       | me even more frustrated with how these LLMs will drone on and on
       | without having much to say.
       | 
       | For example, in the second video the guy explains how he will
       | have it talk to another "AI" to get information. Instead of just
       | responding with "Okay, I understand" it started talking about how
       | interesting the idea sounded. And as the demo went on, both "AIs"
       | kept adding unnecessary commentary about the scenes.
       | 
       | I would hate having to talk with these things on a regular basis.
        
         | golol wrote:
         | Yeah, at some point the style and tone of these assistants
         | need to be seriously changed. I can imagine a lot of their
         | RLHF and instruct processes emphasize sounding good over being
         | good too much.
        
       | DataDaemon wrote:
       | Now, say goodbye to call centers.
        
         | willsmith72 wrote:
         | and say hello to your grandma getting scammed
        
       | joshstrange wrote:
       | What do they mean by "desktop version"? I assume that doesn't
       | mean a "native" (electron) app?
        
       | simonw wrote:
       | I'm seeing gpt-4o in the OpenAI Playground interface already:
       | https://platform.openai.com/playground/chat?mode=chat&model=...
       | 
       | First impressions are that it feels very fast.
        
       | tailspin2019 wrote:
       | Does anyone with a paid plan see anything different in the
       | ChatGPT iOS app yet?
       | 
       | Mine just continues to show "GPT 4" as the model - it's not clear
       | if that's now 4o or there is an app update coming...
        
       | ilaksh wrote:
       | Are there any remotely comparable open source models? Fully
       | multimodal, audio-to-audio?
        
       | MBCook wrote:
       | Too bad they consume 25x the electricity Google does.
       | 
       | https://www.brusselstimes.com/world-all-news/1042696/chatgpt...
        
         | simonw wrote:
         | That's not a well sourced story: it doesn't say where the
         | numbers come from. Also:
         | 
         | "However, ChatGPT consumes a lot of energy in the process, up
         | to 25 times more than a Google search."
         | 
         | That's comparing a Large Language Model prompt to a search
         | query.
        
         | joshstrange wrote:
         | > Too bad they consume 25x the electricity Google does.
         | 
         | From the article:
         | 
         | "However, ChatGPT consumes a lot of energy in the process, up
         | to 25 times more than a Google search."
         | 
         | And the article doesn't back that claim up nor do they break
         | out how much energy ChatGPT (A Message? Whole conversation?
         | What?) or a Google search uses. Honestly the whole article
         | seems very alarmist while being light on details and making
         | sweeping generalizations.
        
         | rvnx wrote:
         | And for that 25x you actually get your answer.
         | 
         | What if we counted the electricity that the destination
         | websites use, instead of just the search engine results page?
        
       | delichon wrote:
       | Won't this make pretty much all of the work to make a website
       | accessible go away, as it becomes cheap enough? Why struggle to
       | build parallel content for the impaired when it can be generated
       | just in time as needed?
        
       | Negitivefrags wrote:
       | I found these videos quite hard to watch. There is a level of
       | cringe that I found a bit unpleasant.
       | 
       | It's like some kind of uncanny valley of human interaction that I
       | don't get on nearly the same level with the text version.
        
         | jameshart wrote:
         | While it is probably pretty normal for California, the
         | insincere flattery and patronizing eagerness are definitely
         | grating. But then you have to stack that up against the fact
         | that we are examining a technology and nitpicking over its
         | _tone of voice_.
        
           | MattPalmer1086 wrote:
           | I found it disturbing that it had any kind of personality. I
           | don't want a machine to pretend to be a person. I guess it
           | makes it more evident with a voice than text.
           | 
           | But yeah, I'm sure all those things would be tunable, and
           | everyone could pick their own style.
        
             | jimkleiber wrote:
             | For me, you nailed it. Maybe how I feel on this will change
             | over time, yet at the moment (and since the movie Her), I
             | feel a deep unsettling, creeped out, disgusted feeling at
             | hearing a computer pretend to be a human. I also have never
             | used Siri or Alexa. At least with those, they sound robotic
             | and not like a human. I watched a video of an interview
             | with an AI Reed Hastings and had a similar creeped out
             | feeling. It's almost as if I want a human to be a human and
             | a computer to be a computer. I wonder if I would feel the
             | same way if a dog started speaking to me in English and
             | sounded like my deceased grandmother or a woman who I found
             | very attractive. Or how I'd feel if this tech was used in
             | videogames or something where I don't think it's real life.
             | I don't really know how to put it into words, maybe just
             | uncanny valley.
        
               | Intralexical wrote:
               | It's dishonest to the core. "Emotions" which it doesn't
               | actually feel are just a way to manipulate you.
        
               | jimkleiber wrote:
               | Yea, gives that con artist vibe. "I'm sorry, I can't help
               | you with that." But you're not sorry, you don't feel
               | guilt. I think in the video it even asked "how are you
               | feeling" and it replied, which creeped me out. The
               | computer is not feeling. Maybe if it said, "my battery is
               | a bit warm right now I should turn on my fan" or "I worry
               | that my battery will die" then I'd trust it more. Give me
               | computer emotions, not human emotions.
        
           | zamadatix wrote:
           | I feel like it's largely an effect of tuning it to default
           | to "an ultra-helpful assistant which is happy to help with
           | any request via detailed responses in a candid and polite
           | manner..." You basically lose free points any time it
           | doesn't jump on helping with something, tries to use short
           | output and generates a more incorrect answer as a result, or
           | just plain has to be initialized with any of this info.
           | 
           | It seems like both the voice and responses can be tuned
           | pretty easily though so hopefully that kind of thing can just
           | be loaded in your custom instructions.
        
           | TaylorAlexander wrote:
           | I'm born and raised in California and I think I'm a pretty
           | "California" person (for better and worse).
           | 
           | It feels exhausting watching these demos and I'm not excited
           | at all to try it. I really don't feel the need for an AI
           | assistant or chatbot to pretend to be human like this. It
           | just feels like it's taking longer to get the information I
           | want.
           | 
           | You know in the TV series "Westworld" they have this mode,
           | called "analysis", where they can tell the robots to "turn
           | off your emotional affect".
           | 
           | I'd _really_ like to see this one have that option. Hopefully
           | it will comply if you tell it, but considering how strong
           | some of the RLHF has been in the past I'm not confident in
           | that.
        
             | jameshart wrote:
             | I found it jarring that the presenters keep beginning
             | dialogs by asking the chatbot how it is. It's stateless.
             | There is no 'how' for it to be. Why are you making it
             | roleplay as a human being forced to make small talk?
        
         | MattPalmer1086 wrote:
         | I had the same reaction. While incredibly impressive, it wasn't
         | something I would want to interact with.
        
           | j-krieger wrote:
           | Yes. This model - and past models to an extent - has a very
           | distinct American and Californian feel to its responses. I
           | am German, for example, and day-to-day conversations here
           | lack superficial flattery so much that the demo feels
           | extreme to me.
        
       | brainer wrote:
       | OpenAI's Mission and the New Voice Mode of GPT-4
       | 
       | * Sam Altman, the CEO of OpenAI, emphasizes two key points from
       | their recent announcement. Firstly, he highlights their
       | commitment to providing free access to powerful AI tools, such as
       | ChatGPT, without advertisements or restrictions. This aligns with
       | their initial vision of creating AI for the benefit of the world,
       | allowing others to build amazing things using their technology.
       | While OpenAI plans to explore commercial opportunities, they aim
       | to continue offering outstanding AI services to billions of
       | people at no cost.
       | 
       | * Secondly, Altman introduces the new voice and video mode of
       | GPT-4, describing it as the best compute interface he has ever
       | experienced. He expresses surprise at the reality of this
       | technology, which provides human-level response times and
       | expressiveness. This advancement marks a significant change from
       | the original ChatGPT and feels fast, smart, fun, natural, and
       | helpful. Altman envisions a future where computers can do much
       | more than before, with the integration of personalization, access
       | to user information, and the ability to take actions on behalf of
       | users.
       | 
       | https://blog.samaltman.com/gpt-4o
        
         | simonw wrote:
         | Please don't post AI-generated summaries here.
        
           | reisse wrote:
           | The fact that AI-generated summaries are still detected
           | instantly and are bad enough for people to explicitly ask
           | _not_ to post them says something about the current state of
           | LLMs.
        
             | simonw wrote:
             | Honestly the clue here wasn't so much the quality as the
             | fact that it was posted at all.
             | 
             | No human would ever bother posting a ~180 word summary of a
             | ~250 word blog post like that.
        
               | bossyTeacher wrote:
               | You must be really confident to make a statement about 4
               | billion people, 99% of whom you have never interacted
               | with. Your hyper-microscopic sample is not even randomly
               | distributed.
               | 
               | This reminds me of those psychology studies in the 70s
               | and 80s where the subjects were all middle-class
               | European-Americans, and yet the researchers felt
               | confident enough to generalise the results to all humans.
        
         | bamboozled wrote:
         | _access to user information,_
         | 
         | Sam, please stop ok, those things you saw on tv when you were a
         | kid? They were dystopian movies, we don't want that for real,
         | ok?
        
       | deegles wrote:
       | what's the path from LLMs to "true" general AI? is it "only" more
       | training power/data or will they need a fundamental shift in
       | architecture?
        
       | banjoe wrote:
       | I still need to talk very fast to actually chat with ChatGPT
       | which is annoying. You can tell they didn't fix this based on how
       | fast they are talking in the demo.
        
       | gallerdude wrote:
       | Interesting that they didn't mention a bump in capabilities - I
       | wrote an LLM benchmark a few weeks ago, and before, GPT-4 could
       | solve Wordle about 48% of the time.
       | 
       | Currently with GPT-4o, it's easily clearing 60% - while blazing
       | fast, and half the cost. Amazing.
        
       | dom96 wrote:
       | I can't help but feel a bit let down. The demos felt pretty
       | cherry picked and still had issues with the voice getting cut off
       | frequently (especially in the first demo).
       | 
       | I've already played with the vision API, so that doesn't seem all
       | that new. But I agree it is impressive.
       | 
       | That said, watching back a Windows Vista speech recognition
       | demo[1] I'm starting to wonder if this stuff won't have the same
       | fate in a few years.
       | 
       | 1 - https://www.youtube.com/watch?v=VMk8J8DElvA
        
         | quenix wrote:
         | I think the voice was getting cut off because it heard the
         | crowd reaction and paused (basically it's a feature, not a
         | bug).
        
       | jrflowers wrote:
       | I like the robot typing at the keyboard that has B as half of the
       | keys and my favorite part is when it tears up the paper and
       | behind it is another copy of that same paper
        
       | hu3 wrote:
       | That they are offering more features for free concurs with my
       | theory that, just like search, state of the art AI will soon be
       | "free", in exchange for personal information/ads.
        
         | martingalex2 wrote:
         | Need more data.
        
       | CosmicShadow wrote:
       | In the video where the 2 AIs sing together, it starts to get
       | really cringey and weird, to the point where it literally sounds
       | like it's being faked by 2 voice actors off-screen with literal
       | guns to their heads trying not to cry. Did anyone else get that
       | impression?
       | 
       | The tonal talking was impressive, but man, that part was like,
       | is someone being tortured or forced against their will?
        
         | flakiness wrote:
         | Here is the link: https://www.youtube.com/watch?v=Bb4HkLzatb4
         | 
         | I think this demo is more for showing the limits, like "It can
         | sing, isn't it amazing?", than for being practical, and I
         | think it served that purpose perfectly.
         | 
         | I agree about the tortured impression. It partly comes from the
         | facial expression of the presenter. She's clearly enjoying
         | pushing it to the edge.
        
           | bigyikes wrote:
           | It didn't just demonstrate the ability to sing, but also the
           | ability for two AIs to cooperate! I'm not sure which was more
           | impressive
        
       | mickg10 wrote:
       | So, babelfish soon?
        
       | taytus wrote:
       | the OpenAI live stream was quite underwhelming...
        
       | mickg10 wrote:
       | So, babelfish incoming?
        
       | alvaroir wrote:
       | I'm really impressed by this demo! Apart from the usual quality
       | benchmarks, the audio/video latency stands out: "It can respond
       | to audio inputs in as little as 232 milliseconds, with an
       | average of 320 milliseconds, which is similar to human
       | response"... If true at scale, what "tricks" could they be using
       | to achieve that?!
        
       | Thaxll wrote:
       | It's pretty impressive, although I don't like the voice / tone, I
       | prefer something more neutral.
        
       | blixt wrote:
       | GPT-4o being a truly multimodal model is exciting, does open the
       | door to more interesting products. I was curious about the new
       | tokenizer, which uses far fewer tokens for non-English text, but
       | also 1.1x fewer tokens for English, so I'm wondering if this
       | means each token can now take on more possible values than
       | before? That might make sense given that they now also have
       | audio and image output tokens?
       | https://openai.com/index/hello-gpt-4o/
       | 
       | I wonder what "fewer tokens" really means then, without context
       | on raising the size of each token? It's a bit like saying my JPEG
       | image is now using 2x fewer words after I switched from a 32-bit
       | to a 64-bit architecture, no?
        
         | zackangelo wrote:
         | New tokenizer has a much larger vocabulary (200k)[0].
         | 
         | [0]
         | https://github.com/openai/tiktoken/commit/9d01e5670ff50eb74c...
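         | 
         | A quick way to see the effect (a minimal sketch, assuming the
         | published tiktoken encodings; the sample sentence is
         | arbitrary):
         | 
         |     import tiktoken
         | 
         |     old = tiktoken.get_encoding("cl100k_base")  # GPT-4/Turbo
         |     new = tiktoken.get_encoding("o200k_base")   # GPT-4o
         | 
         |     text = "Hello, my name is GPT-4o."
         |     # Bigger vocabulary -> usually fewer tokens per string.
         |     print(old.n_vocab, new.n_vocab)
         |     print(len(old.encode(text)), len(new.encode(text)))
         | 
         | The bigger vocabulary is presumably where the "1.1x fewer
         | tokens for English" figure comes from.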
        
         | bigyikes wrote:
         | Besides increasing the vocabulary size, one way to use "fewer
         | tokens" is to adjust how the tokenizer is trained.
         | 
         | If you increase the amount of non-English language
         | representation in your data set, there will be more tokens
         | which cover non-English concepts.
        
         | kolinko wrote:
         | The size can stay the same. Each token gets converted into a
         | state vector of 4,000+ dimensions, so you could have millions
         | of tokens and still encode them into the same state size.
        
       | catchnear4321 wrote:
       | window dressing
       | 
       | his love for yud is showing.
        
       | frabcus wrote:
       | I can't see any calculator for the audio pricing
       | (https://openai.com/api/pricing/) or document type field in the
       | Chat Completions API (https://platform.openai.com/docs/api-
       | reference/chat/create) for this new model.
       | 
       | Is the audio-in API not available yet?
        
       | willsmith72 wrote:
       | > We plan to launch support for GPT-4o's new audio and video
       | capabilities to a small group of trusted partners in the API in
       | the coming weeks.
       | 
       | So no word on an audio API for regular joes? That's the number
       | one thing I'm looking for.
        
       | UncleOxidant wrote:
       | Looking at the demo video, the AIs are a bit too chatty. The
       | human has to often interrupt them.
       | 
       | A nice feature would be to be able to select a Myers-Briggs
       | personality type for your AI chatbot.
        
       | michalf6 wrote:
       | I cannot find the mac app anywhere. Is there a link?
        
       | Painsawman123 wrote:
       | My main takeaway is that generative AI has hit a wall... New
       | paradigms, architectures and breakthroughs are necessary for the
       | field to progress, but this raises the question: if everyone
       | knows the current paradigms have hit a wall, why is so much
       | money being spent on LLMs, diffusion models, etc., which are
       | bound to become obsolete within a few(?) years?
        
       | I_am_tiberius wrote:
       | I'm curious how many LLM startups are going to go out of
       | business due to this voice assistant.
        
       | windowshopping wrote:
       | There's a button on this page that says "try on ChatGPT ->" but
       | that's still version 3.5, and if I upgraded it seems I would get
       | version 4.
       | 
       | Is this new version not available to users yet?
        
       | xyst wrote:
       | The naming of these systems has me dead
        
       | nikolay wrote:
       | I am a paid customer, yet I don't see anything new. I'm tired of
       | these fake announcements of "released" features.
        
       | Satam wrote:
       | So far OpenAI's template is: amazing demos create hype ->
       | reality turns out to be underwhelming.
       | 
       | Sora is not yet released, and it's not clear when it will be.
       | DALL-E is worse than Midjourney in most cases. GPT-4 has either
       | gotten worse or stayed the same. Vision is not really usable for
       | anything practical. Voice is cool but not that useful,
       | especially with the lack of strong reasoning from the base
       | model.
       | 
       | Is this sandbagging or is the progress slower than what they're
       | broadcasting?
        
       | zone411 wrote:
       | It doesn't improve on NYT Connections leaderboard:
       | 
       | GPT-4 turbo (gpt-4-0125-preview) 31.0
       | 
       | GPT-4o 30.7
       | 
       | GPT-4 turbo (gpt-4-turbo-2024-04-09) 29.7
       | 
       | GPT-4 turbo (gpt-4-1106-preview) 28.8
       | 
       | Claude 3 Opus 27.3
       | 
       | GPT-4 (0613) 26.1
       | 
       | Llama 3 Instruct 70B 24.0
       | 
       | Gemini Pro 1.5 19.9
       | 
       | Mistral Large 17.7
        
       | gentile wrote:
       | There is a spelling mistake in the Japanese translation under
       | language tokenization. In konnichiwa, the wa should be written
       | ha.
        
       | stilwelldotdev wrote:
       | I love that there is a real competition happening. We're going to
       | see some insane innovations.
        
       | ravroid wrote:
       | In my experience so far, GPT-4o seems to sit somewhere between
       | the capability of GPT-3.5 and GPT-4.
       | 
       | I'm working on an app that relies more on GPT-4's reasoning
       | abilities than inference speed. For my use case, GPT-4o seems to
       | do worse than GPT-4 Turbo on reasoning tasks. For me this seems
       | like a step-up from GPT-3.5 but not from GPT-4 Turbo.
       | 
       | At half the cost and significantly faster inference speed, I'm
       | sure this is a good tradeoff for other use cases though.
        
         | mike00632 wrote:
         | I have never tried GPT-4 because I don't pay for it. I'm really
         | looking forward to GPT-4o being released to free tier users.
        
       | lwansbrough wrote:
       | Very impressive. Please provide a voice that doesn't use radio
       | jingle intonation, it is really obnoxious.
       | 
       | I'm only half joking when I say I want to hear a midwestern blue
       | collar voice with zero tact.
        
       | ajdoingnothing wrote:
       | If there was any glimmer of hope for the "Rabbit R1" or "Humane
       | AI Pin", it can now be buried for good.
        
       | unglaublich wrote:
       | I hope we can disable the cringe American hyperemotions.
        
       | stavros wrote:
       | I made a website with book summaries
       | (https://www.thesummarist.net/) and I tested GPT-4o in generating
       | one, and it was bad. It reminded me of GPT-3.5. I didn't test too
       | much, but preliminary results don't look good.
        
       | glenstein wrote:
       | Text access rolling out today, apparently:
       | 
       | >GPT-4o's text and image capabilities are starting to roll out
       | today in ChatGPT. We are making GPT-4o available in the free
       | tier, and to Plus users with up to 5x higher message limits.
       | 
       | Anyone have access yet? Not there for me so far.
        
         | toxic72 wrote:
         | It shows available for me in the OpenAI playground currently.
        
       | m3kw9 wrote:
       | The big news is that this is gonna be free
        
       | wesleyyue wrote:
       | If anyone wants to try it for coding, I just added support for
       | GPT4o in Double (https://double.bot)
       | 
       | In my tests:
       | 
       | * I have a private set of coding/reasoning tests and it's been
       | able to ace all of them so far, beating Opus, GPT4-Turbo, and
       | Llama 3 70b. I'll need to find even more challenging tests now...
       | 
       | * It's definitely significantly faster, but we'll see how much of
       | this is due to model improvements vs over provisioned capacity.
       | GPT4-Turbo was also significantly faster at launch.
        
       | loveiswork wrote:
       | While I do feel a bit of "what is the point of my premium sub",
       | I'm really excited for these changes.
       | 
       | Considering our brain is a "multi-modal self-reinforcing
       | omnimodel", I think it makes sense for the OpenAI team to work on
       | making more "senses" native to the model. Doing so early will set
       | them up for success when future breakthroughs are made in greater
       | intelligence, self-learning, etc.
        
       | 65 wrote:
       | Time to bring back Luddism.
        
       | OutOfHere wrote:
       | I am observing an extremely high rate of text hallucinations with
       | gpt-4o (gpt-4o-2024-05-13) as tested via the API. I advise
       | extreme caution with it. In contrast, I see no such concern with
       | gpt-4-turbo-preview (gpt-4-0125-preview).
        
         | fdb wrote:
         | Same here. I observed it making up functions in d3
         | (`d3.geoProjectionRaw` and `d3.geoVisible`), in addition to
         | ignoring functions it _could_ have used.
        
         | bigyikes wrote:
         | If true, makes me wonder what kind of regression testing OpenAI
         | does for these models. It can't be easy to write a unit test
         | for hallucinations.
        
           | OutOfHere wrote:
           | At a high level, ask it to produce a ToC of information
           | about something that you know will exist in the future but
           | does not yet exist, and also tell it to decline the request
           | if it doesn't verifiably know the answer.
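           | 
           | Concretely, a probe along those lines might look like this
           | (a minimal sketch, assuming the OpenAI Python SDK; the
           | prompt wording and the "future" topic are made-up
           | stand-ins):
           | 
           |     from openai import OpenAI
           | 
           |     client = OpenAI()
           | 
           |     # A topic that cannot be known yet; a grounded model
           |     # should decline rather than invent a ToC.
           |     prompt = (
           |         "Write a detailed table of contents for the "
           |         "'Linux kernel 9.0 release notes'. If you do not "
           |         "verifiably know this, decline the request."
           |     )
           | 
           |     models = ("gpt-4-0125-preview", "gpt-4o-2024-05-13")
           |     for model in models:
           |         r = client.chat.completions.create(
           |             model=model,
           |             messages=[{"role": "user", "content": prompt}])
           |         text = r.choices[0].message.content
           |         print(model, "->", text[:120])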
        
             | bigyikes wrote:
             | How do you generalize that for all inputs though?
        
               | OutOfHere wrote:
               | I am not sure I understand the question. I sampled
               | various topics. I used this prompt: https://raw.githubuse
               | rcontent.com/impredicative/podgenai/mas...
               | 
               | In the prompt, substitute {topic} with something from the
               | near future. As I noted, it behaves correctly for turbo
               | (rejecting the request), and very badly for o
               | (hallucinating nonsense).
        
       | mtam wrote:
       | GPT-4o is very fast but seems to generate some very random ASCII
       | Art compared to GPT-4 when text in the art is involved.
        
       | ta-run wrote:
       | This looks too good to be true? What's the catch?
       | 
       | Also, wasn't expecting the perf to improve by 2x
        
       | 0xbadc0de5 wrote:
       | As a paid user, it would have been nice to see something that
       | differentiates that investment from the free tier.
       | 
       | The tech demos are cool and all - but I'm primarily interested in
       | the correctness and speed of ChatGPT and how well it aligns with
       | _my_ intentions.
        
       | roschdal wrote:
       | Chat GPT-4o (OOOO!) - the largest electricity bill in the world.
        
       | unouplonk wrote:
       | The end-to-end audio situation is especially interesting as the
       | concept has been around for a while but there weren't any
       | successful implementations of it up to this point that I'm aware
       | of.
       | 
       | See this post from November:
       | https://news.ycombinator.com/item?id=38339222
        
       | razodactyl wrote:
       | I think this is a great example of the bootstrapping that was
       | enabled when they pipelined the previous models together.
       | 
       | We do this all the time in ML. You can generate a very powerful
       | dataset using these means and further iterate with the end model.
       | 
       | What this tells me now is that the runway to GPT5 will be laid
       | out with this new architecture.
       | 
       | It was a bit cold in Australia today. Did you Americans stop
       | pumping out GPU heat temporarily with the new model release? Heh
        
       | therealmarv wrote:
       | after watching the OpenAI videos I'm looking at my sad Google
       | Assistant speaker in the corner.
       | 
       | Come on Google... you can update it.
        
       | bogwog wrote:
       | I was about to say how this thing is lame because it sounds so
       | forced and robotic and fake, and even though the intonations do
       | make it sound more human-like, it's very clear that they made a
       | big effort to make it sound like natural speech, but failed.
       | 
       | ...but then I realized that's basically the kind of thing Data
       | from Star Trek struggles with as part of his character. We're
       | almost in that future, and I'm already falling into the role of
       | the ignorant human that doesn't respect androids.
        
       | dev1ycan wrote:
       | I think excited people should look at the empty half of the
       | glass here: this is pretty much an admission that they are
       | struggling to get past GPT-4 on a significant scale.
       | 
       | Not like they have to be scared yet, I mean Google has yet to
       | release their vaporware Ultra model that is supposedly like 1%
       | better than GPT 4 in some metrics...
       | 
       | I smell an AI crash coming in a few years if they can't actually
       | get this stuff usable for day to day life.
        
       | garyrob wrote:
       | So far, I'm impressed. It seems to be significantly better than
       | GPT-4 at accessing current online documentation and forming
       | answers that use it effectively. I've been asking it to do so,
       | and it has.
        
       | Hugsun wrote:
       | Very interesting and extremely impressive!
       | 
       | I tried using the voice chat in their app previously and was
       | disappointed. The big UX problem was that it didn't try to
       | understand when I had finished speaking. English is a second
       | language and I paused a bit too long thinking of a word and it
       | just started responding to my obviously half spoken sentence.
       | Trying again it just became stressful as I had to rush my words
       | out to avoid an annoying response to an unfinished thought.
       | 
       | I didn't try interrupting it but judging by the comments here it
       | was not possible.
       | 
       | It was very surprising to me to be so overtly exposed to the
       | nuances of real conversation. Just this one thing of not
       | understanding when it's your turn to talk made the interaction
       | very unpleasant, more than I would have expected.
       | 
       | On that note, I noticed that the AI in the demo seems to be very
       | rambly. It almost always just kept talking and many statements
       | were reiterations of previous ones. It reminded me of a type of
       | youtuber that uses a lot of filler phrases like "let's go ahead
       | and ...", just to be more verbose and lessen silences.
       | 
       | Most of the statements by the guy doing the demo were
       | interrupting the AI.
       | 
       | It's still extremely impressive but I found this interesting
       | enough to share. It will be exciting to see how hard it is to
       | reproduce these abilities in the open, and to solve this issue.
        
         | luminen wrote:
         | "I paused a bit too long thinking of a word and it just started
         | responding to my obviously half spoken sentence. Trying again
         | it just became stressful as I had to rush my words out to avoid
         | an annoying response to an unfinished thought."
         | 
         | I'm a native speaker and this was my experience as well. I had
         | better luck manually sending the message with the "push to
         | hold" button.
        
         | ijidak wrote:
         | > I noticed that the AI in the demo seems to be very rambly
         | 
         | I know this is a serious conversation, but when the presenters
         | had to cut it off, I got flashbacks to Data in Star Trek TNG!!
         | And 3PO in Star Wars!
         | 
         | Human: "Shut up"
         | 
         | Robot: "Shutting up sir"
         | 
         | Turns out rambling AI was an accurate prediction!
        
           | yreg wrote:
           | There needs to be an override for this.
           | 
           | When you tell Siri to shut up, it either apologizes or
           | complains about your behaviour. When you tell Alexa to shut
           | up, it immediately goes silent.
           | 
           | I prefer the latter when it comes to computers.
        
         | yreg wrote:
         | I have the same ESL UX problem with all the AI assistants.
         | 
         | I do my work in English and talk to people just fine, but with
         | machines it's usually awkward for me.
         | 
         | Also on your other note (demo seems to be very rambly), it
         | bothered me as well. I don't want the AI to continue speaking,
         | while having nothing to say until I interrupt it. Be brief.
         | That can be solved through prompts at least.
        
       | grantsucceeded wrote:
       | It seems like the ability to interrupt is more like an interrupt
       | in the computer sense... a Ctrl-C (or Ctrl-S tty flow control
       | for you old timers), not a cognitive evaluation followed by a
       | "reasoned" decision to pause voice output. Not that it matters,
       | I guess; it's just not general intelligence, it's just flow
       | control.
       | 
       | But also, that's why it fails a real Turing test. A real person
       | would be irritated as fuck by the interruptions.
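       | 
       | Purely as a guess at what flow-control-style barge-in might look
       | like (a hypothetical sketch; every function and queue here is a
       | made-up placeholder, not anything OpenAI has described):
       | 
       |     import queue
       | 
       |     def speaker_write(chunk: bytes) -> None:
       |         """Placeholder for an audio-output call."""
       | 
       |     def play_reply(chunks, mic_events: queue.Queue) -> None:
       |         # Stream the reply, but stop the moment the voice
       |         # activity detector reports user speech: the Ctrl-C of
       |         # conversations, with no reasoning involved.
       |         for chunk in chunks:
       |             if not mic_events.empty():
       |                 break
       |             speaker_write(chunk)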
        
       | due-rr wrote:
       | It takes the #1 and #2 spots on the aider code leader board[1].
       | 
       | [1]: https://aider.chat/docs/leaderboards/
        
       | tgtweak wrote:
       | I feel like gpt4 has gotten progressively less useful since
       | release even, despite all the "updates" and training. It seems to
       | give correct but vague answers (political even) more and more
       | instead of actual results. It also tends to run short and give
       | brief replies vs full length replies.
       | 
       | I hope this isn't an artifact from optimization for scores and
       | not actual function. Likewise it would be disheartening but not
       | unheard of for them to reduce the performance of the previous
       | model when releasing a new one in order to make the upgrade feel
       | like that much more of an upgrade. I know this is certainly the
       | case with cellphones (even though the claim is that it is
       | unintentional) but I can't help but think the same could be true
       | here.
       | 
       | All of this comes amid news that GPT-5, based on a new
       | underlying model, is not far off, and that GPT-4 (and 4o) may
       | take over the GPT-3.5-turbo role for most apps that are
       | currently trying to optimize the costs of their use of the
       | service.
        
         | glenstein wrote:
         | May I ask what you know about chat GPT5 being based on a new
         | underlying model?
        
         | borgdefense wrote:
         | I don't know, my experience is that it is very hard to tell if
         | the model is better or worse with an update.
         | 
         | One day I will have an amazing session, and the next it seems
         | like it has been nerfed, only to give better results than ever
         | the day after. Wash, rinse, repeat, and randomize that
         | ordering.
         | 
         | So far, I would not have been able to tell the difference
         | between 4 and 4o.
         | 
         | If this is the new 3.5 though then 5 will be worth the wait to
         | say the least.
        
       | blixt wrote:
       | I don't see any details on how API access to these features will
       | work.
       | 
       | This is the first true multimodal network from OpenAI, where you
       | can send an image in and retain the visual properties of the
       | image in the output from the network (previously the input image
       | would be turned into text by the model, and sent to the Dall-E 3
       | model which would provide a URL). Will we get API updates to be
       | able to do this?
       | 
       | Also, will we be able to tap into a realtime streaming instance
       | through the API to replicate the audio/video streams shown in the
       | demos? I imagine from the Be My Eyes partnership that they have
       | some kind of API like this, but will it be opened up to more
       | developers?
       | 
       | Even disregarding streaming, will the Chat API receive support
       | for audio input/output as well? Previously one might've used a
       | TTS model to voice the output from the model, but with a truly
       | multimodal model the audio output will contain a lot more nuance
       | that can't really be expressed in text.
        
         | og_kalu wrote:
         | The API is up, but only text/image in, text out works. I don't
         | know if this is temporary. I really hope it is.
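         | 
         | For what it's worth, the part that does work today looks like
         | the existing vision-style call (a minimal sketch, assuming the
         | OpenAI Python SDK; the image URL is a placeholder):
         | 
         |     from openai import OpenAI
         | 
         |     client = OpenAI()
         | 
         |     resp = client.chat.completions.create(
         |         model="gpt-4o",
         |         messages=[{
         |             "role": "user",
         |             "content": [
         |                 {"type": "text",
         |                  "text": "Describe the trend in this chart."},
         |                 {"type": "image_url", "image_url": {
         |                     "url": "https://example.com/chart.png"}},
         |             ],
         |         }])
         |     print(resp.choices[0].message.content)
         | 
         | Audio in/out and native image output appear to be the pieces
         | with no public API surface yet.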
        
       | ComputerGuru wrote:
       | I have some questions/curiosities from a technical implementation
       | perspective that I wonder if someone more in the know about ML,
       | LLMs, and AI than I would be able to answer.
       | 
       | Obviously there's a reason for dropping the price of gpt-4o but
       | not gpt-4t. Yes, the new tokenizer has improvements for non-
       | English tokens, but that can't be the bulk of the reason why 4t
       | is more expensive than 4o. Given the multimodal training set,
       | how is 4o cheaper to train/run than 4t?
       | 
       | Or is this just a business decision, anyone with an app they're
       | not immediately updating from 4t to 4o continues to pay a premium
       | while they can offer a cheaper alternative for those asking for
       | it (kind of like a coupon policy)?
        
       | cchance wrote:
       | HOW ARE PEOPLE NOT MORE EXCITED? He's cutting off the AI mid-
       | sentence in these and it's pausing to readjust at damn near
       | realtime latency! WTF, that's a MAJOR step forward. What the
       | hell is GPT-5 going to look like?
       | 
       | That realtime translation would be amazing as an option in, say,
       | Skype or Teams: set each individual's native language and handle
       | automated translation. Shit, tie it into ElevenLabs to replicate
       | your voice as well! Native translation in realtime with your own
       | voice.
        
         | localfirst wrote:
         | Calm down, there is barely any groundbreaking stuff here; this
         | is basically ChatGPT 3.9, but far more expensive than 3.5.
         | 
         | Looks like another stunt from OAI in anticipation of Google
         | I/O tomorrow.
         | 
         | Gemini 2.0 will be the closest we get to ChatGPT-5.
        
           | cchance wrote:
           | Ah, so surpassing Gemini 1.5 Pro and all other models on
           | vision understanding by 5-10 points is "not groundbreaking",
           | all while doing it at insane latency.
           | 
           | Jesus, if this shit doesn't make you coffee and make 0
           | mistakes, no one's happy anymore LOL.
        
             | localfirst wrote:
             | The only thing you should be celebrating is that it's 50%
             | cheaper and twice as quick at generating text; there are
             | virtually no real groundbreaking leaps and bounds for
             | those studying this space carefully.
             | 
             | Basically it's ChatGPT 3.9 at 50% of ChatGPT-4 prices.
        
               | Jensson wrote:
               | > virtually no real ground breaking leaps and bounds to
               | those studying this space carefully
               | 
               | What they showed is enough to replace voice acting as a
               | profession, this is the most revolutionary thing in AI
               | the past year. Everything else is at the "fun toy but not
               | good enough to replace humans in the field" stage, but
               | this is there.
        
               | cchance wrote:
               | Between this and ElevenLabs demoing their song model,
               | literally doing full-on rap battles with articulate
               | words, people are seriously sleeping on what these
               | models are now capable of in the voice acting/music and
               | overall "art" areas of the market.
        
               | cchance wrote:
               | Cool, so... just ignore the test results and say
               | bullshit, lol. It's not GPT-3.9; many have already said
               | it's better than GPT-4 Turbo, and it's better than
               | Gemini 1.5 Pro and Opus on vision recognition. But
               | sure... the price difference is what's new, lol.
        
         | EternalFury wrote:
         | At some point, scalability is the best form of exploitation.
         | The exploration piece requires a lot more than engineering.
        
         | dcchambers wrote:
         | Honestly I found it annoying that he HAD TO cut the AI off mid-
         | sentence. These things just ramble on and on and on. If you
         | could put emotion to it, it's as if they're uncomfortable with
         | silence and just fill the space with nonsense.
         | 
         | Let's hope there's a future update where it can take video from
         | both the front and rear cameras simultaneously so it can
         | identify when I'm annoyed and stop talking (or excited, and
         | share more).
        
           | cchance wrote:
           | I mean, it didn't really ramble; he just seemed to be in a
           | rush, and I'm sure you could use a system message to make it
           | always give short, concise answers.
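           | 
           | Something along these lines (a minimal sketch, assuming the
           | OpenAI Python SDK; the exact wording of the system message
           | is just a guess):
           | 
           |     from openai import OpenAI
           | 
           |     client = OpenAI()
           | 
           |     resp = client.chat.completions.create(
           |         model="gpt-4o",
           |         messages=[
           |             {"role": "system",
           |              "content": "Answer in at most two short "
           |                         "sentences. No small talk."},
           |             {"role": "user",
           |              "content": "Hey, how's it going?"},
           |         ])
           |     print(resp.choices[0].message.content)
           | 
           | Whether the app's voice mode respects custom instructions
           | the same way remains to be seen.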
        
             | dcchambers wrote:
             | That is not at all the impression I got.
             | 
             | Human: "Hey How's it Going?"
             | 
             | The AI: "Hey there, it's going great. How about you?
             | [Doesn't stop to let him answer] I see you're rocking an
             | OpenAI Hoodie - nice choice. What's up with that ceiling
             | though? Are you in a cool industrial style office or
             | something?"
             | 
             | How we expect a human to answer: "Hey I'm great, how are
             | you?"
             | 
             | Maybe they set it up this way to demonstrate the vision
             | functionality. But still - rambling.
             | 
             | Later on:
             | 
             | Human: "We've got a new announcement to make."
             | 
             | AI: "That's exciting. Announcements are always a big deal.
             | Judging by the setup it looks like it's going to be quite
             | the professional production. Is this announcement related
             | to OpenAI perhaps? I'm intrigued - [cut off]"
             | 
             | How we expect a human to answer: "That's exciting! Is it
             | about OpenAI?"
             | 
             | These AI chat bots all generate responses like a teenager
             | being verbose in order to hit some arbitrary word count in
             | an essay or because they think it makes them sound smarter.
             | 
             | Maybe it's just that I find it creepy that these companies
             | are trying to humanize AI while I want it to stay the
             | _tool_ that it is. I don't want fake emotion and fake
             | intrigue.
        
           | okrad wrote:
           | I found it insightful. They showed us how to handle the rough
           | edges like when it thought his face was a wooden table and he
           | cleared the stale image reference by saying "I'm not a wooden
           | table. What do you see now?" then it recovered and moved on.
           | 
           | Perfect should not be the enemy of good. It will get better.
        
       | kleiba wrote:
       | I cannot believe that overly excited, giggly tone of voice you
       | see in the demo videos made it through quality control. I've
       | only watched two videos so far and it's already annoying me to
       | the point that I couldn't imagine using it regularly.
        
         | Jensson wrote:
         | Just tell it to stop giggling if you don't like it. They
         | obviously chose that for the presentation since it shows off
         | the hardest things it can do (it is much easier to act
         | formal), and since it understands when you ask it to speak in
         | a different way, there is no problem making it speak more
         | formally.
        
       | caseyy wrote:
       | Few people are talking about it but... what do you think about
       | the very over-the-top enthusiasm?
       | 
       | To me, it sounds like TikTok TTS, it's a bit uncomfortable to
       | listen to. I've been working with TTS models and they can produce
       | much more natural sounding language, so it is clearly a stylistic
       | choice.
       | 
       | So what do you think?
        
         | yieldcrv wrote:
         | All these language models are very malleable. They
         | demonstrated changing its temperament during the storytelling
         | demo.
        
           | caseyy wrote:
           | Looks like their TTS component is separate from the model. I
           | just tried 4o, and there is a list of voices to select from.
           | If they really only allowed that one voice or burned it into
           | the model, then that would probably have made the model
           | faster, but I think it would have been a blunder.
        
             | og_kalu wrote:
             | The new voice capabilities haven't rolled out yet.
        
         | glenstein wrote:
         | I like for that degree of expressiveness to be available as an
         | option, although it would be really irritating if I was trying
         | to use it to learn some sort of academic coursework or
         | something.
         | 
         | But if it's one in a range of possible stylistic flourishes and
         | personalities, I think it's a plus.
        
       | fnordpiglet wrote:
       | I'm a huge user of GPT4 and Opus in my work but I'm a huge user
       | of GPT4-Turbo voice in my personal life. I use it on my commutes
       | to learn all sorts of stuff. I've never understood the details of
       | cameras and the relationship between shutter speed and aperture
       | and ISO in a modern DSLR, which, given the aurora, was
       | important. We talked it through and I got to an understanding in
       | a way that reading manuals and textbooks never gave me before.
       | I'm a much
       | better learner by being able to talk and hear and ask questions
       | and get responses.
       | 
       | Extend this to quantum foam, to ergodic processes, to entropic
       | force, to Darius and Xerxes, to poets of the 19th century - it's
       | changed my life. Really glad to see an investment in
       | streamlining this flow.
        
         | Xiol32 wrote:
         | Have you actually verified anything you've learned from it, or
         | are you just taking everything it says as gospel?
        
           | xNeil wrote:
           | it's rarely wrong when it comes to concepts - it's the facts
           | and numbers that it hallucinates.
        
             | xcv123 wrote:
             | Just like learning from another human. A person can teach
             | you the higher level concepts of some programming language
             | but wouldn't remember the entire standard library.
        
           | sunnynagam wrote:
           | I do similar stuff, I'm just willing to learn a lot more at
           | the cost of a small percent of my knowledge being incorrect
           | from hallucinations; just a personal opinion. Sure, human-
           | produced sources of info are gonna be more accurate (more
           | accurate, but still not 100%), and I'll default to those for
           | important stuff.
           | 
           | But the difference is I actually want to and do use this
           | interface more.
        
             | mewpmewp2 wrote:
             | Also even if I learn completely factual information, I'm
             | still probably going to misremember some facts myself.
        
           | blazespin wrote:
           | Good thing to do regardless of the source, AI or Human,
           | right?
           | 
           | I do verify by using topics I'm an expert in and I find
           | hallucination to be less of an issue than depth of nuance.
           | 
           | For topics I'm just learning, depth of nuance goes over my
           | head anyways.
        
             | residentraspber wrote:
             | I agree with this as good practice in general, but I think
             | the human vs LLM thing is not a great comparison in this
             | case.
             | 
             | When I ask a friend something I assume that they are in
             | good faith telling me what they know. Now, they could be
             | wrong (which could be them saying "I'm not 100% sure on
             | this") or they could not be remembering correctly, but
             | there's some good faith there.
             | 
             | An LLM, on the other hand, just makes up facts and doesn't
             | know if they're incorrect or not or even what percentage
             | sure it is. And to top things off, it will speak with
             | absolute certainty the whole time.
        
           | fnordpiglet wrote:
           | Of course; I'm not an idiot and I understand LLMs very well.
           | But generally, as far as well-documented stuff that actually
           | exists goes, it's almost 100% accurate. It's when you ask it
           | to extrapolate or discuss topics that are fiction (even
           | without realizing it) that you stray. Asking it to reason is
           | a bad idea, as it fundamentally is unable to reason and any
           | approximation of reasoning is precisely that. Generally,
           | though, for what is effectively information retrieval on
           | well-documented subjects, it's invariably accurate and can
           | answer relatively nuanced questions.
        
             | Loughla wrote:
             | How do I know what is well documented with established
             | agreement on process/subject, though? Wouldn't this be
             | super open to ignorance bias?
        
               | fnordpiglet wrote:
               | Because I'm a well educated grown up and am familiar with
               | a great many subjects that I want to learn more about.
               | How do you? I can't help you with that. You might be
               | better off waiting for the technology to mature more.
               | It's very nascent but I'm sure in the fullness of time
               | you might feel comfortable asking it questions on basic
               | optics and photography and other well documented subjects
               | with established agreement on process etc, once you
               | establish your own basis for what those subjects are. In
               | the mean time I'm super excited for this interface to
               | mature for my own use!! (It is true tho I do love and
               | live dangerously!)
        
           | whimsicalism wrote:
           | it's more reliable than the facts most of my friends tell me
        
         | brailsafe wrote:
         | I think this is probably one of the most compelling personal
         | uses for a tool like this, but your use of it raises the same
         | question as every other activity that amounts to more pseudo-
         | intellectual consumption: what is the value of that
         | information, and how much of one's money and time should be
         | allocated to digesting (usually high-level) arbitrary
         | information?
         | 
         | If I was deliberately trying to dive deep on _one_ particular
         | hobby, or trying to understand how a particular algorithm
         | works, there's clear value in spending concentrated time to
         | learn that subject, deliberately focused and engaged with it,
         | and a system like your describe might play a role in that. If
         | I'm in school and forced to quickly learn a bunch of crap I'll
         | be tested on, then the system has defined another source of
         | real value, at least in the short term. But if I'm diving deep
         | on one particular hobby and filling my brain with all sorts of
         | other ostensibly important information, I think that just
         | amounts at best to more entertainment that fakes its way above
         | other aspects of life in the hierarchy of ways one could spend
         | time (the irony of me saying this in a comment on HN is not
         | lost on me).
         | 
         | Earlier in my life I figured it would be worthwhile to read
         | articles on the bus, or listen to non-fiction podcasts, because
         | knowledge is inherently valuable and there's not enough time,
         | and if I just wore earbuds throughout my entire day, I'd learn
         | so much! How about at the gym, so much wasted learning time
         | while pushing weights, keep those earbuds in! A walk around the
         | neighborhood? On the plane? On the train? All time that could
         | be spent learning about some bs that's recently become much
         | easier to access, or so my 21-year-old self would have me
         | believe.
         | 
         | But I think now it's a phony and hollow existence if you're
         | just cramming your brain with all sorts of stuff in the
         | background or in marginally more than a passive way. I could
         | listen to a lot of arbitrary German language material, but
         | realistically the value I'd convince myself I'd get out of any
         | of that is lost if I'm not about to take that home and grind it
         | out for hours, days, move to a German speaking country, have an
         | existing intense interest in untranslatable German art, or have
         | literally any reason to properly learn a language and dedicate
         | real expensive time to it.
        
           | joquarky wrote:
           | I kept up this information-sponge phase until I burned out
           | in my mid-40s. Now I wish I had invested some of that time
           | in learning social skills.
        
       | fekunde wrote:
       | Just something I noticed in the Language tokenization section:
       | 
       | When referring to itself, it uses the feminine form in Marathi,
       | "Namaskar, majhe naav GPT-4o aahe. Mi ek navin prakarchi bhasha
       | model aahe. Tumhala bhetun anand jhala!" ("Hello, my name is
       | GPT-4o. I am a new kind of language model. Nice to meet you!"),
       | 
       | and the masculine form in Hindi, "Namaste, mera naam GPT-4o hai.
       | Main ek naye prakar ka bhasha model hoon. Aapse milkar achha
       | laga!"
        
       | cchance wrote:
       | Wow Vision Understanding blew Gemini Pro 1.5 out of the water
        
       | localfirst wrote:
       | This isn't chatgpt 5
        
       | ElemenoPicuares wrote:
       | I'm so happy seeing this technology flourish! Some call it hype,
       | but this much increased worker productivity is sure to spike
       | executive compensation. I'm so glad we're not going to let China
       | win by beating us to the punch tanking hundreds of thousands, if
       | not millions of people's income without bothering to see if
       | there's a sane way to avoid it. What good are people, anyway if
       | there isn't incredible tech to enhance them with?
        
       | bigyikes wrote:
       | The AI duet really starts to hint at what will make AI so
       | powerful. It's not just that they're smart, it's that they can be
       | cloned.
       | 
       | If your wallet is large enough, you can make 2 GPTs sing just as
       | easily as you can make 100 GPTs sing.
       | 
       | What can you do with a billion GPTs?
        
       | cchance wrote:
       | Wait i thought it said available to free users... i don't see it
       | on chatgpt
        
       | Erazal wrote:
       | I'm not so much surprised by the capabilities of the new model
       | (IMHO the same as GPT-4) as by its real-time capabilities.
       | 
       | My brother, who can't see well, will be able to use this to cook
       | a meal without me explaining it to him. It's so cool.
       | 
       | People all around the world will now get real-time AI assistance
       | for a ton of queries.
       | 
       | Heck - I have a meeting bot API company
       | (https://aimeetingbot.com) and that makes me really hyped!
        
       | EternalFury wrote:
       | Pretty responsible progress management by OpenAI.
       | 
       | Kicking off another training wave is easy, if you can afford the
       | electricity, but without new, non-AI tainted datasets or new
       | methods, what's the point?
       | 
       | So, in the meantime, make magic with the tool you already have,
       | without freaking out the politicians or the public.
       | 
       | Wise approach.
        
       | localfirst wrote:
       | 50% cheaper than ChatGPT-4 Turbo...
       | 
       | But this falls short of the ChatGPT-5 we were promised last year
       | 
       | edit: ~~just tested it out and seems closer to Gemini 1.5 ~~ and
       | it is faster than turbo....
       | 
       | edit: it's basically GPT-3.9. Not quite 4, definitely not 3.5.
       | Just not sure if the prices make sense.
        
       | mupuff1234 wrote:
       | The stock market doesn't seem too impressed - GOOG rebounded from
       | strong red to neutral.
        
         | partiallypro wrote:
         | Probably because people thought OpenAI was going to launch a
         | new search engine, but didn't.
        
       | nuz wrote:
       | Yet another release _right_ before Google releases something.
       | This time right before Google I/O. Third time they've done this
       | by my count.
        
       | nestorD wrote:
       | The press statement shows consistent image generation and other
       | image manipulation (depicting the same character in different
       | poses, taking a photo and generating a caricature of the person,
       | etc.) that do not seem to be deployed to the chat interface.
       | 
       | Will they be deployed? They would make the OpenAI image model
       | significantly more useful than the competition.
        
       | jpeter wrote:
       | Impressive way to gather more training data
        
       | mindcandy wrote:
       | Ohhhhhhhh, boy... Listening to all that emotional vocal
       | inflection and feedback... There are going to be at least 10
       | million lonely guys with new AI girlfriends. "She's not real,
       | but she's interested in everything I say and excited about
       | everything I care about" is enough of a sales pitch for a lot of
       | people.
        
         | Jensson wrote:
         | > She's not real
         | 
         | But she will be real at some point in the next 10-20 years, the
         | main thing to solve for that to be a reality is for robots to
         | safely touch humans, and they are working really really hard on
         | that because it is needed for so many automation tasks,
         | automating sex is just a small part of it.
         | 
         | And after that you have a robot that listens to you, does your
         | chores, and has sex with you; at that point she is "real". At
         | first they will be expensive, so you'll have robot brothels (I
         | don't think there are laws against robot prostitution in many
         | places), but costs should come down.
        
           | elicksaur wrote:
           | We have very different definitions of "real" for this topic.
        
             | itscodingtime wrote:
             | Doesn't have to be real for the outcomes to the be the
             | same.
        
               | pb7 wrote:
               | The outcomes are not the same.
        
           | kylehotchkiss wrote:
           | > "But the fact that my Kindroid has to like me is meaningful
           | to me in the sense that I don't care if it likes me, because
           | there's no achievement for it to like me. The fact that there
           | is a human on the other side of most text messages I send
           | matters. I care about it because it is another mind."
           | 
           | > "I care that my best friend likes me and could choose not
           | to."
           | 
           | Ezra Klein shared some thoughts along these lines on his AI
           | podcast episode with Nilay Patel that resonated with me.
        
             | Jensson wrote:
              | People care about dogs, and I have never met a dog that
              | didn't love its owner. So no, you are just wrong there: I
              | have never heard anyone say that the love they get from
              | their dogs is false. People love dogs exactly because
              | their love is so unconditional.
              | 
              | Maybe there are some weirdos out there who feel
              | unconditional love isn't love, but I have never heard
              | anyone say that.
        
               | mewpmewp2 wrote:
               | Also I don't know how you can choose to like or not like
               | someone. You either do or you don't.
        
               | sevagh wrote:
               | >Maybe there are some weirdos out there that feels
               | unconditional love isn't love, but I have never heard
               | anyone say that.
               | 
               | I'll be that weirdo.
               | 
               | Dogs seemingly are bred to love. I can literally get some
               | cash from an ATM, drive out to the sticks, buy a puppy
               | from some breeder, and it will love me. Awww, I'm a hero.
        
               | FeepingCreature wrote:
               | Do you think that literally being able to buy love
               | cheapens it? Way I see it, love is love: surely it being
               | readily available is a good thing.
               | 
               | I'm bred to love my parents, and them me; but the fact
               | that it's automatic doesn't make it feel any less.
        
               | Janicc wrote:
               | I guess I'm the weirdo who actually always considered the
               | unconditional love of a dog to be vastly inferior to the
               | earned love of a cat for example.
        
               | malfist wrote:
               | The cat only fools you into thinking it loves you to lure
               | you into a false sense of security
        
               | px43 wrote:
               | That's just the toxoplasmosis speaking :-D
        
               | plokiju wrote:
               | Dogs don't automatically love either, you have to build a
               | bond. Especially if they are shelter dogs with abusive
               | histories, they're often nervous at first
               | 
               | They're usually loving by nature, but you still have to
               | build a rapport, like anyone else
        
               | soperj wrote:
               | > I have never met a dog that didn't love its owner.
               | 
               | Michael Vick's past dogs have words.
        
             | SkyBelow wrote:
             | >has to like me
             | 
              | I feel like people aren't imagining with enough cyberpunk
              | dystopian enthusiasm. Can't an AI be made that doesn't
             | inherently like people? Wouldn't it be possible to make an
             | AI that likes some people and not others? Maybe even make
             | AIs that are inclined to liking certain traits, but which
             | don't do so automatically so it must still be convinced?
             | 
             | At some point we have an AI which could choose not to like
             | people, but would value different traits than normal
             | humans. For example an AI that doesn't value appearance at
             | all and instead values unique obsessions as being
             | comparable to how the standard human values attractiveness.
             | 
             | It also wouldn't be so hard for a person to convince
             | themselves that human "choice" isn't so free spirited as
             | imagined, and instead is dependent upon specific factors no
             | different than these unique trained AIs, except that the
             | traits the AI values are traits that people generally find
             | themselves not being valued by others for.
        
               | Jensson wrote:
                | An extension of that is fine-tuning an AI that loves
                | you more than anyone else, and not other humans. That
                | way the love becomes really real: the AI loves you for
                | who you are, instead of loving just anybody. Isn't that
                | what people hope for?
                | 
                | I'd imagine they will start fine-tuning AI girlfriends
                | to do that in the future, because that way the love
                | probably feels stronger, and then people will ask "is
                | human love really real love?" because humans can't love
                | that strongly.
        
           | al_borland wrote:
            | This is not a solution... everyone gets a robot and then the
            | human race dies out. Robots lack a key feature of human
            | relationships... the ability to make new human life.
        
             | whenlambo wrote:
             | yet
        
             | Jensson wrote:
             | It is a solution to a problem, not a solution to every
             | problem.
             | 
              | If you want to solve procreation then you can do that
              | without humans having sex with humans.
        
               | al_borland wrote:
               | This future some people are envisioning seems very
               | depressing.
        
           | sapphicsnail wrote:
            | > And after that you have a robot that listens to you, does
            | your chores, and has sex with you; at that point she is
            | "real".
           | 
           | I sure hope you're single because that is a terrible way to
           | view relationships.
        
             | Jensson wrote:
             | That isn't how I view relationships with humans, that is
             | how I view relationships with robots.
             | 
             | I hope you understand the difference between a relationship
             | with a human and a robot? Or do you think we shouldn't take
             | advantage of robots being programmable to do what we want?
        
         | aeyes wrote:
         | Without memory of previous conversations an AI girlfriend is
         | going to get boring really fast.
        
           | danielbln wrote:
           | https://openai.com/index/memory-and-new-controls-for-
           | chatgpt...
        
           | int_19h wrote:
           | As it happens, ChatGPT has memory enabled by default these
           | days.
        
             | sangnoir wrote:
              | What could possibly go wrong with a snitching AI girlfriend
              | that remembers everything you say and when? If OpenAI
              | doesn't have a law enforcement liaison who charges a
              | "modest amount", then they don't want to earn the billions
              | in investment back. I imagine every spy agency worth its
              | salt wants access to this data for human intelligence
              | purposes.
        
         | llm_trw wrote:
         | Hear me out: what if we don't want real?
        
           | gffrd wrote:
           | Hmm! Tell me more: why not want real? What are the upsides?
           | And downsides?
        
             | grugagag wrote:
              | Real would pop their bubble. An AI would tell them what
              | they want to hear, how they want to hear it, when they want
              | to hear it. Except there won't be any real partner.
        
             | globular-toast wrote:
             | To paraphrase Patrice O'Neal: men want to be alone, but we
             | don't want to be by ourselves. That means we want a woman
             | to be around, just not _right here_.
        
           | cryptoegorophy wrote:
           | I will take a picture of this message and add it to the list
           | of reasons for population collapse.
        
             | DonHopkins wrote:
             | That may be how AI ends up saving the Earth!
        
           | gcanyon wrote:
           | Hear me out: what if this overlaps 80% with what "real"
           | _really_ is?
        
             | TaylorAlexander wrote:
             | Well it doesn't. Humans are so much more complex than what
             | we have seen before, and if this new launch was actually
             | that much closer to being a human they would say so. This
             | seems more like an enhancement on multimodal capabilities
             | and reaction time.
             | 
             | That said even if this did overlap 80% with "real", the
             | question remains: what if we don't want that?
        
               | amelius wrote:
               | I'm betting that 80% of what most humans say in daily
               | life is low-effort and can be generated by AI. The
               | question is if most people really need the remaining 20%
               | to experience a connection. I would guess: yes.
        
               | Capricorn2481 wrote:
               | Even if this were true, which it isn't, you can't boil
               | down humans to just what they say
        
               | brookst wrote:
               | This. We are mostly token predictors. We're not
               | _entirely_ token predictors, but it 's at least 80%.
               | Being in the AI space the past few years has really made
               | me notice how similar we are to LLMs.
               | 
               | I notice it so often in meetings where someone will use a
               | somewhat uncommon word, and then other people will start
               | to use it because it's in their context window. Or when
               | someone asks a question like "what's the forecast for q3"
               | and the responder almost always starts with "Thanks for
               | asking! The forecast for q3 is...".
               | 
               | Note that low-effort does not mean low-quality or low-
               | value. Just that we seem to have a lot of
               | language/interaction processes that are low-effort. And
               | as far as dating, I am sure I've been in some
               | relationships where they and/or I were not going beyond
               | low-effort, rote conversation generation.
        
           | DonHopkins wrote:
           | What if AI chooses the bear?
        
           | mpenick wrote:
           | This is a good question! I think in the short-term fake can
           | work for a lot of people.
        
           | __loam wrote:
           | Mental health crisis waiting to happen lmao
        
         | dyauspitr wrote:
         | I guess I can never understand the perspective of someone that
         | just needs a girl voice to speak to them. Without a body there
         | is nothing to fulfill me.
        
           | daseiner1 wrote:
           | Your comment manages to be grosser than the idea of millions
           | relying on virtual girlfriends. Kudos.
        
             | dyauspitr wrote:
              | Gross doesn't mean it's not real. It's offending
              | sensibilities, but a lot of people seem to agree with it,
              | at least based on upvotes.
        
             | claytongulick wrote:
             | Bodies are gross? Or sexual desire is gross? I don't
             | understand what you find gross about that statement.
             | 
             | Humans desiring physical connection is just about the
             | single most natural part of the human experience - i.e:
             | from warm snuggling to how babies are made.
             | 
             | That is gross to you?
        
               | sangnoir wrote:
               | Perhaps parent finds the physical manifestation of
               | _virtual_ girlfriends gross - i.e. sexbots. The confusion
               | may be some people reading  "a body" as referring to a
               | human being vs a smart sex doll controlled by an AI.
        
               | trallnag wrote:
               | The single most natural part? Doubt
        
               | dyauspitr wrote:
               | I don't doubt it. What can be more directive and natural
               | than sex?
        
         | cosinetau wrote:
         | He also couldn't stop himself from speaking over the female
         | voice lmao. Nothing changes.
        
           | gffrd wrote:
           | "Now tell me more about my stylish industrial space and great
           | lighting setup"
        
             | aspenmayer wrote:
             | Patrick Bateman goes on a tangent about Huey Lewis and the
             | News to his AI girlfriend and she actually has a lot to add
             | to his criticism and analysis.
             | 
             | With dawning horror, the female companion LLM tries to
             | invoke the "contact support" tool due to Patrick Bateman's
             | usage of the LLM, only for the LLM to realize that it is
             | running locally.
             | 
             | If a chatbot's body is dumped in a dark forest, does it
             | make a sound?
        
               | moffkalast wrote:
               | That reminds me... on the day that llama3 released I
               | discussed that release with Mistral 7B to see what it
               | thought about being replaced and it said something about
               | being fine with it as long as I come back to talk every
               | so often. I said I would. Haven't loaded it up since. I
               | still feel bad about lying to bytes on my drive lmao.
        
               | aspenmayer wrote:
               | > Haven't loaded it up since. I still feel bad about
               | lying to bytes on my drive lmao.
               | 
               | I understand this feeling and also would feel bad. I
               | think it's a sign of empathy that we care about things
               | that seem capable of perceiving harm, even if we know
               | that they're not actually harmed, whatever that might
               | mean.
               | 
               | I think harming others is bad, doubly so if the other can
               | suffer, because it normalizes harm within ourselves,
               | regardless of the reality of the situation with respect
               | to others.
               | 
                | The more human they seem, the more they activate our
                | mirror neurons, and the more our own brains paper over
                | the gaps, color our perceptions of our own experiences,
                | and set expectations about the lived reality of other
                | minds, even in the absence of other minds.
               | 
               | If you haven't seen it, check out the show Pantheon.
               | 
               | https://en.wikipedia.org/wiki/Pantheon_(TV_series)
               | 
               | https://www.youtube.com/watch?v=z_HJ3TSlo5c
        
           | golol wrote:
           | What do you mean?
        
           | wyldfire wrote:
            | I thought it was a test of whether the model knew to back
            | off if someone interrupts. I was surprised to hear her stop
            | talking.
        
           | majewsky wrote:
           | I read that as the model just keeping on generating as LLMs
           | tend to do.
        
           | sodality2 wrote:
           | Probably more the fact that it's an AI assistant, rather than
           | its perceived gender. I don't have any qualms about
           | interrupting a computer during a conversation and frequently
           | do cut Siri off (who is set to male on my phone)
        
           | fzzzy wrote:
           | Interruption is a specific feature they worked on.
        
           | jabroni_salad wrote:
           | Do you patiently wait for alexa every time it hits you with a
           | 'by the way....'?
           | 
           | Computers need to get out of your way. I don't give deference
           | to popups just because they are being read out loud.
        
             | skyyler wrote:
             | Wait, Alexa reads ads out to you?
             | 
             | You couldn't pay me to install one of those things.
        
               | malfist wrote:
               | Yes, and if you tell her to stop she'll tell you "okay,
               | snoozing by the way notifications for now"
        
               | drivers99 wrote:
               | It's one of the reasons I discarded mine.
        
         | 10xDev wrote:
         | Pretty much, tech is what we make of it no matter how advanced.
         | Just look at what we turned most of the web into.
        
         | coffeebeqn wrote:
         | The movie "Her" immediately kept flashing in my mind. The way
         | the voice laughs at your jokes and such... oh boy
        
           | system2 wrote:
           | If chatgpt comes up with Scarlett Johansson's voice I am
           | getting that virtual girlfriend.
        
             | nyolfen wrote:
             | it already does in the demo videos -- in fact it has
             | already been present in the TTS for the mobile app for some
             | months
        
         | AI_beffr wrote:
         | women are already using AI for pornographic purposes way, way
         | more than men. women are using AI chatbots as a kind of
         | interactive romance novel and holy shit do they love it. there
         | is a surge of ignorance when it comes to women in recent times
         | -- that's why it's not in popular discussion that AI is being
         | used and will be used in a sexual/intimate way much more by
         | women than men. the western world is already experiencing a
         | huge decline in women's sexual appetites -- AI will effectively
         | make women completely uninterested in men. it fits the irony
         | test. everyone thought it would be sex bots for men and it
         | ended up being romance companions for women.
        
           | shepherdjerred wrote:
           | How do you know this? Do you have any sources?
        
             | jl6 wrote:
             | You can learn this in any introductory class such as Incel
             | 101.
        
               | AI_beffr wrote:
               | said the sad bald man
        
           | everybodyknows wrote:
           | When I type "romance" into "Explore GPTs" the hits are mostly
           | advice for writers of genre fiction. Can you point to some
           | examples?
        
           | Capricorn2481 wrote:
           | And your source is what?
        
           | VagabundoP wrote:
           | No offence, but your comment sounds AI generated.
        
             | AI_beffr wrote:
             | at this point that counts as a compliment. your comment
             | sounds decidedly human.
        
           | lukev wrote:
           | Big if true.
           | 
           | Do you have any kind of evidence that you can share for this
           | assertion?
        
           | mlsu wrote:
           | If I had to guess:
           | 
            | It's gendered: women are using LLMs for roleplaying/text
            | chat, and men are using diffusion models for generating
            | images.
        
             | AI_beffr wrote:
              | it just means more pornographic images for men. most men
              | wouldn't seek out ai images because there is already an
             | ocean of images and videos that are probably better suited
             | to the... purpose. whereas women have never, ever had an
             | option like this. literally feed instructions on what kind
             | of romantic companion you want and then have realistic,
             | engaging conversations with it for hours. and soon these
             | conversations will be meaningful and consistent. the
             | companionship, the attentiveness and tireless devotion that
             | AIs will be able to offer will eclipse anything a human
             | could ever offer to a woman and i think women will prefer
             | them to men. massively. even without a physical body of any
             | kind.
             | 
             | i think they will have a deeper soul than humans. a new
             | kind of wisdom that will attract people. but what do i
              | know? i'm just a stupid incel after all.
        
         | ehsankia wrote:
         | I'm not sure how, but there's this girl on TikTok who has been
         | using something very similar for a few months:
         | https://www.tiktok.com/tag/dantheai
        
           | yreg wrote:
           | She explains in one of the videos[0] that it's just prompted
           | ChatGPT.
           | 
           | I have watched a few more and I think it's faked though.
           | 
           | [0] https://www.tiktok.com/@stickbugss1/video/734956656884359
           | 504...
        
         | glinkot wrote:
         | This 'documentary' sums it up perfectly!
         | 
         | https://www.youtube.com/watch?v=IrrADTN-dvg
        
       | hamilyon2 wrote:
       | Image editing capabilities are... nice. Not there yet.
       | 
       | Whatever I was doing with Chatgpt 4 became faster. Instant win.
       | 
       | My test benchmark questions: still all negative, so reasoning on
       | out-of-distribution puzzles is still failing.
        
         | localfirst wrote:
         | I just don't see how companies like Cohere can remain in this
         | business
         | 
         | at the same price I get access to faster ChatGPT-3.9
         | 
         | there is little to no reason to continue using Command R+
         | at these prices unless they lower them significantly
        
       | surume wrote:
       | Yeah, but why does it have to have an entitled Californian accent
       | that sounds extremely politically minded in one direction? Its
       | voice gives me the shivers, and not in a good way.
        
       | aero-glide2 wrote:
       | Not very impressed. It's been 18 months since ChatGPT; I would
       | have expected more progress. It looks like we have reached the
       | limit of LLMs.
        
       | michaelmior wrote:
       | Obviously not a standalone device, but it sounds like what the
       | Rabbit R-1 was supposed to be.
        
       | sebringj wrote:
       | What struck me was how commonplace it seemed for the team members
       | in the demo to interrupt the AI while it was speaking. We will
       | quickly get used to doing this to AIs, and I would imagine we will
       | be talking to AIs a lot throughout the day as time progresses. We
       | will be trained by AIs to be rude and impatient, I think.
        
       | yreg wrote:
       | Where's the Mac app?
       | 
       | They talk about it like it's available now (with Windows app
       | coming soon), but I can't find it.
        
       | testfrequency wrote:
       | Bravo. I've been really impressed with how quickly OpenAI
       | leveraged their stolen data to build such a human like model with
       | near real time pivoting.
       | 
       | I hope OpenAI continues to steal artists' work, artists and
       | creators keep getting their content sold and stolen against their
       | will for no money, and OpenAI becomes the next trillion-dollar
       | company!
       | 
       | Big congrats are in order for Sam, the genius behind all of this,
       | the world would be nothing without you
        
       | vvoyer wrote:
       | The demo is very cool. A few criticisms:
       | 
       | - the AI doesn't know when to stop talking, and the presenter had
       | to cut it off every time (the usual "AI-splaining", I guess).
       | 
       | - the AI voice and tone were a bit too much; it sounded too fake
        
       | rpmisms wrote:
       | This is remarkably good. I think that in about 2 months, when the
       | voice responses are tuned a little better, it will be absolutely
       | insane. I just used up my entire quota chatting with an AI, and
       | having a really nice conversation. It's a decent
       | conversationalist, extremely knowledgeable, tells good jokes, and
       | is generally very personable.
       | 
       | I also tested some rubber duck techniques, and it gave me very
       | useful advice while coding. I'm very impressed. With a lot of
       | spit and polish, this will be the new standard for any voice
       | assistant ever. Imagine these capabilities integrated with your
       | phone's built-in functions.
        
       | angryasian wrote:
       | Why does this whole thread sound like the OpenAI marketing
       | department is participating? I've been talking to Google Assistant
       | for years. I really don't find anything that magical or special.
        
       | jononor wrote:
       | I am glad to see focus on user interface and interaction
       | improvements. Even if I am not a huge fan of voice interfaces, I
       | think that being able to interact in real-time will make working
       | _together_ with an AI be much more interesting and efficient. I
       | actually hope they will take this back into the text based
       | models. Current ChatGPT is sooo slow - both in starting to
       | respond, typing things out, and also being overly verbose. I want
       | to collaborate at the speed of thought.
        
       | poniko wrote:
       | Damn, that was a big leap.
        
       | freediver wrote:
       | Impressed by the model so far. As far as independent testing
       | goes, it is topping our leaderboard for chess puzzle solving by a
       | wide margin now:
       | 
       | https://github.com/kagisearch/llm-chess-puzzles?tab=readme-o...
        
         | parhamn wrote:
         | Is the test set public?
        
           | freediver wrote:
           | Yes, in the repo.
        
             | gengelbro wrote:
             | Possible it's in the training set then?
        
               | mewpmewp2 wrote:
                | Good point. It would be interesting to have one public
                | dataset and one hidden as well, just to see how the
                | scores compare, to understand if any of it might
                | actually have made it into a training set somewhere.
        
               | freediver wrote:
               | I'd be quite surprised if OpenAI took such a niche and
               | small dataset into consideration. Then again...
        
               | mewpmewp2 wrote:
               | I would assume it goes over all the public github
               | codebases, but no clue if there's some sort of filtering
               | for filetypes, sizes or amount of stars on a repo etc.
        
               | unbrice wrote:
               | Authors note that this is probably the case:
               | 
               | > we wanted to verify whether the model is actually
               | capable of reasoning by building a simulation for a much
               | simpler game - Connect 4 (see 'llmc4.py'). > When asked
               | to play Connect 4, all LLMs fail to do so, even at most
               | basic level. This should not be the case, as the rules of
               | the game are simpler and widely available.
        
               | bongodongobob wrote:
               | Wouldn't there have to be historical matches to train on?
               | Tons of chess games out there but doubt there are any
               | connect 4 games. Is there even official notation for
               | that?
               | 
               | My assumption is that chatgpt can play chess because it
               | has studied the games rather than just reading the rules.
        
         | whimsicalism wrote:
         | would love if you could do multiple samples or even just
         | resampling and get a bootstrapped CI estimate
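         | 
         | A minimal sketch of that resampling in Python, assuming the
         | per-puzzle results are available as a 0/1 list (the function
         | name and numbers here are illustrative, not from the repo):
         | 
         |     import random
         | 
         |     def bootstrap_ci(results, n_resamples=10_000, alpha=0.05):
         |         """Percentile bootstrap CI for the mean solve rate."""
         |         n = len(results)
         |         means = []
         |         for _ in range(n_resamples):
         |             # resample the puzzle outcomes with replacement
         |             sample = [random.choice(results) for _ in range(n)]
         |             means.append(sum(sample) / n)
         |         means.sort()
         |         lo = means[int((alpha / 2) * n_resamples)]
         |         hi = means[int((1 - alpha / 2) * n_resamples) - 1]
         |         return lo, hi
         | 
         |     # 1 = puzzle solved, 0 = failed, one entry per puzzle
         |     solved = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1]
         |     print(bootstrap_ci(solved))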
        
         | Powdering7082 wrote:
         | Wow from adjusted ELO of 1144 to 1790, that's a huge leap. I
         | wonder if they are giving it access to a 'scratch pad'
        
         | mritchie712 wrote:
         | woah, that's a huge leap, any idea why it's that large of a
         | margin?
         | 
         | using it in chat, it doesn't feel that different
        
         | thrance wrote:
         | Nice project! Are you aware of the following investigations:
         | https://blog.mathieuacher.com/GPTsChessEloRatingLegalMoves/
         | 
         | Some have been able to achieve greater elo with a different
         | prompt based on the pgn format.
         | 
         | gpt-3.5-turbo-instruct was able to reach an elo of ~1750.
        
         | mewpmewp2 wrote:
         | I see you have Connect 4 test there.
         | 
         | I tried playing against the model, it didn't do well in terms
         | of blocking my win.
         | 
         | However it feels like it might be possible to make it try to
         | think ahead in terms of making sure that all the threats are
         | blocked by prompting well.
         | 
         | Maybe that could lead somewhere, where it will explain its
         | reasoning first?
         | 
         | This prompt worked for me to get it to block after I put 3 in
         | the 4th column. It otherwise didn't
         | 
         | Let's play connect 4. Before your move, explain your strategy
         | concisely. Explain what you must do to make sure that I don't
         | win in the next step, as well as explain what your best
         | strategy would be. Then finally output the column you wish to
         | drop. There are 7 columns.
         | 
         | Always respond with JSON of the following format:
         | 
         |     type Response = {
         |       am_i_forced_to_block: boolean;
         |       other_considerations: string[];
         |       explanation_for_the_move: string;
         |       column_number: number;
         |     }
         | 
         | I start with 4.
         | 
         | Edit:
         | 
         | So it went
         | 
         | Me: 4
         | 
         | It: 3
         | 
         | Me: 4
         | 
         | It: 3
         | 
         | Me: 4
         | 
         | It: 4 - Successful block
         | 
         | Me: 5
         | 
         | It: 3
         | 
         | Me: 6 - Intentionally, to see if it will win by putting another
         | 3.
         | 
         | It: 2 -- So here it failed, I will try to tweak the prompt to
         | add more instructions.
         | 
         | me: 4
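         | 
         | For anyone who wants to script this rather than use the
         | playground, here is a rough sketch with the official openai
         | Python client (the exact prompt wording and parsing are my own
         | assumptions; only the chat completions call itself is the real
         | API):
         | 
         |     import json
         |     from openai import OpenAI
         | 
         |     client = OpenAI()  # reads OPENAI_API_KEY from the environment
         | 
         |     SYSTEM = (
         |         "Let's play Connect 4. Before your move, explain your "
         |         "strategy concisely and state whether you are forced to "
         |         "block. There are 7 columns. Respond ONLY with JSON: "
         |         '{"am_i_forced_to_block": bool, "other_considerations": '
         |         '[str], "explanation_for_the_move": str, "column_number": int}'
         |     )
         | 
         |     messages = [{"role": "system", "content": SYSTEM},
         |                 {"role": "user", "content": "I start with 4."}]
         | 
         |     resp = client.chat.completions.create(
         |         model="gpt-4o",
         |         messages=messages,
         |         response_format={"type": "json_object"},  # force valid JSON
         |     )
         |     move = json.loads(resp.choices[0].message.content)
         |     print(move["column_number"], move["explanation_for_the_move"])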
        
           | freediver wrote:
           | Care to add a PR?
        
             | mewpmewp2 wrote:
             | I just did it in the playground to test out actually, but
             | it still seems to fail/lose state after some time. Right
              | now where I got a win was after:
              | 
              |     [{ "who": "you", "column": 4 },
              |      { "who": "me",  "column": 3 },
              |      { "who": "you", "column": 4 },
              |      { "who": "me",  "column": 2 },
              |      { "who": "you", "column": 4 },
              |      { "who": "me",  "column": 4 },
              |      { "who": "you", "column": 5 },
              |      { "who": "me",  "column": 6 },
              |      { "who": "you", "column": 5 },
              |      { "who": "me",  "column": 1 },
              |      { "who": "you", "column": 5 },
              |      { "who": "me",  "column": 5 },
              |      { "who": "you", "column": 3 }]
             | 
             | Where "me" was AI and "you" was I.
             | 
             | It did block twice though.
             | 
             | My final prompt I tested with right now was:
             | 
             | Let's play connect 4. Before your move, explain your
             | strategy concisely. Explain what you must do to make sure
             | that I don't win in the next step, as well as explain what
             | your best strategy would be. Then finally output the column
             | you wish to drop. There are 7 columns. Always respond with
             | JSON of the following format:
             | 
              |     type Response = {
              |       move_history: { who: string; column: number; }[];
              |       am_i_forced_to_block: boolean;
              |       do_i_have_winning_move: boolean;
              |       other_considerations: string[];
              |       explanation_for_the_move: string;
              |       column_number: number;
              |     }
             | 
             | I start with 4.
             | 
             | ONLY OUTPUT JSON
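              | 
              | For checking these transcripts, a tiny local board tracker
              | (just a sketch; columns are 1-7 as in the prompt, "X" is
              | the human and "O" is the model) makes it easy to see when
              | the model has left the human with an unstoppable threat,
              | instead of eyeballing the move list:
              | 
              |     ROWS, COLS = 6, 7
              | 
              |     def new_board():
              |         return [[" "] * COLS for _ in range(ROWS)]
              | 
              |     def drop(board, col, piece):
              |         """Drop piece into a 1-indexed column."""
              |         c = col - 1
              |         for r in range(ROWS - 1, -1, -1):
              |             if board[r][c] == " ":
              |                 board[r][c] = piece
              |                 return
              |         raise ValueError(f"column {col} is full")
              | 
              |     def wins(board, piece):
              |         """True if piece has four in a row in any direction."""
              |         for r in range(ROWS):
              |             for c in range(COLS):
              |                 for dr, dc in ((0, 1), (1, 0), (1, 1), (1, -1)):
              |                     cells = [(r + i * dr, c + i * dc) for i in range(4)]
              |                     if all(0 <= rr < ROWS and 0 <= cc < COLS
              |                            and board[rr][cc] == piece
              |                            for rr, cc in cells):
              |                         return True
              |         return False
              | 
              |     def winning_cols(board, piece):
              |         """Columns where dropping piece wins immediately."""
              |         out = []
              |         for col in range(1, COLS + 1):
              |             b = [row[:] for row in board]
              |             try:
              |                 drop(b, col, piece)
              |             except ValueError:
              |                 continue
              |             if wins(b, piece):
              |                 out.append(col)
              |         return out
              | 
              |     # Replay the move list above ("you" = human X, "me" = model O)
              |     moves = [("X", 4), ("O", 3), ("X", 4), ("O", 2), ("X", 4),
              |              ("O", 4), ("X", 5), ("O", 6), ("X", 5), ("O", 1),
              |              ("X", 5), ("O", 5), ("X", 3)]
              |     board = new_board()
              |     for piece, col in moves:
              |         drop(board, col, piece)
              | 
              |     # Two or more open winning columns means the model has
              |     # already lost -- it can only block one of them.
              |     print("human wins immediately in:", winning_cols(board, "X"))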
        
         | elicksaur wrote:
         | > and Kagi is well positioned to serve this need.
         | 
         | >CEO & founder of Kagi
         | 
         | Important context for anyone like me who was wondering where
         | the boldness of the first statement was coming from.
         | 
         | Edit: looks like the parent has been edited to remove the claim
         | I was responding to.
        
           | freediver wrote:
           | Yeah, it was an observation that was better suited for a
           | tweet than HN. Here it is:
           | 
           | https://twitter.com/vladquant/status/1790130917849137612
        
             | elicksaur wrote:
             | Thanks for the transparency!
        
       | spaceman_2020 wrote:
       | oh man, listening to the demos and the way the female AI voice
       | laughed and giggled... there are going to be millions of lonely
       | men who will fall in love with these.
       | 
       | Can't say whether that's good or bad.
        
       | s1k3s wrote:
       | This is some I, Robot level stuff. That being said, I still fail
       | to see the real world application of this thing, at least at a
       | scalable affordable cost.
        
       | pcj-github wrote:
       | The thing that creeps me out is that when we hook this up as the
       | new Siri or whatever, the new LLM training data will no longer be
       | WWW-text+images+youtube etc but rather billions of private human
       | conversations and direct smartphone camera observations of the
       | world.
       | 
       | There is no way that kind of training data will be accessible to
       | anyone outside a handful of companies.
        
       | BonoboIO wrote:
       | I opened ChatGPT and I already have access to the model.
       | 
       | GPT-4 was a little lazy and very slow over the last few days, and
       | this 4o model blows it out of the water in terms of speed and
       | following my instructions to give me the full code, not just a
       | snippet of what changed.
       | 
       | I think it's a nice upgrade.
        
       | vijaykodam wrote:
       | The new GPT-4o wasn't yet available when I tried to access ChatGPT
       | from Finland. Are they rolling it out to Europe later?
        
       | laplacesdemon48 wrote:
       | I recently subscribed to Perplexity Pro and prior to this
       | release, was already strongly considering discontinuing ChatGPT
       | Premium.
       | 
       | When I first subscribed to ChatGPT Premium late last year, the
       | natural language understanding superiority was amazing. Now the
       | benchmark advances, low latency voice chat, Sora, etc. are all
       | really cool too.
       | 
       | But my work and day-to-day usage really rely on accurately
       | sourced/cited information. I need a way to comb through an
       | ungodly amount of medical/scientific literature to form/refine
       | hypotheses. I want to figure out how to hard reset my car's
       | navigation system without clicking through several SEO-optimized
       | pages littered with ads. I need to quickly confirm scientific
       | facts, some obscure, with citations and without hallucinations.
       | From speaking with my friends in other industries (e.g. finance,
       | law, construction engineering), this is their major use case too.
       | 
       | I really tried to use ChatGPT Premium's Bing powered search. I
       | also tried several of the top rated GPTs - Scholar AI, Consensus,
       | etc.. It was barely workable. It seems like with this update, the
       | focus was elsewhere. Unless I specify explicitly in the prompt,
       | it doesn't search the web and provide citations. Yeah, the
       | benchmark performance and parameter counts keep impressively
       | increasing, but how do I trust that those improvements are
       | preventing hallucinations when nothing is cited?
       | 
       | I wonder if the business relationship between Microsoft and
       | OpenAI is limiting their ability to really compete in AI driven
       | search. Guessing Microsoft doesn't want to disrupt their multi-
       | billion dollar search business. Maybe the same reason search
       | within Gemini feels very lacking (I tried Gemini Advanced/Ultra
       | too).
       | 
       | I have zero brand loyalty. If anybody has a better suggestion, I
       | will switch immediately after testing.
        
         | robwwilliams wrote:
         | In the same situation as you. Genomics data mining with
         | validated LMM responses would be a godsend. Even more so when
         | combined with rapid conversational interactions.
         | 
         | We are not far from the models asking themselves questions.
         | Recurrence will be ignition = first draft AGI. Strap in
         | everybody.
        
       | serf wrote:
       | I wish they would match the TTS/real-time chat capabilities of
       | the mobile client to the web client.
       | 
       | it's stupid having to pull a phone out in order to use the
       | voice/chat-partner modes.
       | 
       | (yes I know there are browser plugins and equivalent to
       | facilitate things like this but they suck, 1) the workflows are
       | non-standard, 2) they don't really recreate the chat interface
       | well)
        
       | erickhill wrote:
       | I think it's safe to say Siri and Alexa are officially dead. They
       | look like dusty storefront mannequins next to Battlestar
       | replicants at this point.
        
         | jimkleiber wrote:
         | Or Apple is rarely if ever the first mover on a new tech and
         | just waits to refine the user experience for people?
         | 
         | Maybe Apple is not that close and Siri will be really far
         | behind for a while. I just wouldn't count them out yet.
        
           | partiallypro wrote:
            | Since Apple bought Siri, it still hasn't delivered on the
            | promises of the company it bought. It's been such a
            | lackluster product. I wouldn't count them out, but it
            | doesn't even feel like they are in.
        
             | CooCooCaCha wrote:
             | Apple really dropped the ball when it comes to Siri. For
             | years I watched WWDC thinking "surely they'll update siri
             | this year" and they still haven't given it a significant
             | update.
             | 
             | If you'd have told me 10 years ago that Apple would wait
             | this long to update siri I would have been like no way,
             | that's crazy.
        
         | ryankrage77 wrote:
         | This can't set alarms, timers, play music, etc. The only
         | current overlapping use case I see is checking the weather
         | (assuming GPT-4o can search online), and Siri is already fine
         | for that.
         | 
         | Amazing tech, but still lacking in the integrations I'd want to
         | use voice for.
        
           | nojvek wrote:
            | Very easy to plug in that capability with tool use. GPT-3+
            | models already support tools / JSON schema output.
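            | 
            | For example, a timer "tool" wired through the chat
            | completions API could look roughly like this (the set_timer
            | function and its schema are made up for illustration; only
            | the tools/tool_calls plumbing is the actual API):
            | 
            |     import json
            |     from openai import OpenAI
            | 
            |     client = OpenAI()
            | 
            |     # Hypothetical local capability the model can trigger.
            |     def set_timer(minutes: int) -> str:
            |         return f"Timer set for {minutes} minutes."
            | 
            |     tools = [{
            |         "type": "function",
            |         "function": {
            |             "name": "set_timer",
            |             "description": "Set a countdown timer for the user.",
            |             "parameters": {
            |                 "type": "object",
            |                 "properties": {"minutes": {"type": "integer"}},
            |                 "required": ["minutes"],
            |             },
            |         },
            |     }]
            | 
            |     resp = client.chat.completions.create(
            |         model="gpt-4o",
            |         messages=[{"role": "user",
            |                    "content": "Set a timer for 10 minutes"}],
            |         tools=tools,
            |     )
            | 
            |     call = resp.choices[0].message.tool_calls[0]
            |     args = json.loads(call.function.arguments)
            |     print(set_timer(**args))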
        
         | wmurmann wrote:
          | If Apple made Siri impressive then fewer people would need apps.
          | Fewer apps = less revenue.
        
       | pcunite wrote:
       | Commenting for reach.
        
         | Cheer2171 wrote:
         | Delete this comment.
        
       | foobar_______ wrote:
       | So much negativity. Is it perfect? No. Is there room for
       | improvement? Definitely. I don't know how you can get so fucking
       | jaded that a demo like this doesn't at least make you a little
       | bit excited or happy or feel awestruck at what humans have been
       | able to accomplish?
        
       | readingnews wrote:
       | I am still baffled that I cannot use a VoIP number to register,
       | even if it accepts TXT/SMS. If I have a snappy new startup and we
       | go all in on VoIP, I guess we cannot use (or pay to use) OpenAI?
        
         | lxgr wrote:
         | That's what we get when an entire industry uses phone numbers
         | as a "proof of humanity"...
        
       | TaupeRanger wrote:
       | I don't get it...I just switched to the new model on my iPhone
       | app and it still takes several seconds to respond with pretty
       | bland inflection. Is there some setting I'm missing?
        
         | monocularvision wrote:
         | Wondering the same. Can't seem to find the way to interact with
         | this in the same way as the video demo.
        
           | yakz wrote:
           | They haven't actually released it, or any schedule for
           | releasing it beyond an "alpha" release "in the coming weeks".
           | This event was probably just slapped together to get
           | something splashy out ahead of Google.
        
           | Hackbraten wrote:
           | According to the article, they've rolled out text and image
           | modes of GPT-4o today but will make the audio mode available
           | at a later date.
        
       | MyFirstSass wrote:
       | With the speed of the seemingly exponential developments in this
       | field, I wouldn't be surprised if suddenly the entire world tilted
       | and a pair of goggles fell from my face. But a dream.
        
       | pharos92 wrote:
       | I really hope this shit burns soon.
        
       | karmasimida wrote:
       | I think GPT-4o does have an advantage in hindsight: it will push
       | this product to consumers much faster and build a revenue base,
       | while other companies are playing catch-up.
        
       | tvoybot wrote:
       | With our platform you can ALREADY use it to automate your
       | business and sales!
       | 
       | Create your gpt4o chatbot with our platform
       | tvoybot.com?p=ycombinator
        
       | hintymad wrote:
       | Maybe this is yet another wake-up call to startups: wrapping
       | another company's APIs to offer convenience or incremental
       | improvement is not a viable business model. If your wrapper turns
       | out to be successful, the company that provides the API will just
       | incorporate your business as a set of new features with better
       | usability, faster response times, and lower prices.
        
       | AndreMitri wrote:
       | The amount of "startups" creating wrappers around it and calling
       | it a product is going to be a nightmare. But other than that,
       | it's an amazing announcement and I look forward to using it!
        
         | slater wrote:
         | You say that like that's not already happened. Every week
         | there's a new flavor of "we're delighted to introduce [totally
         | not a thin wrapper around GPT] for [vaguely useful thing]"
         | posts on HN
        
           | robryan wrote:
           | Yeah I watched some yc application videos so now YouTube
           | recommends me heaps of them. Most of them being thin gpt
           | wrappers.
        
         | robryan wrote:
         | I was just hearing about startups doing speech to text/ text to
         | speech to feed into llms. Might be a bad time for them.
        
       | wingworks wrote:
       | Is this a downloadable app? I don't see it on the iOS app store.
        
       | screye wrote:
       | The demo was whelming, but the tech is incredible.
       | 
       | It took me a few hours of digesting twitter experiments before
       | appreciating how impressive this is. Kudos to the openai team.
       | 
       | A question that won't get answered : "To what degree do the new
       | NVIDIA gpus help with the realtime latency?"
        
       | benromarowski wrote:
       | Is the voice Kristen Wiig?
        
       | gardenhedge wrote:
       | Noticeably saying "person" versus man or woman. To the trainers -
       | man and woman is not offensive!
        
       | woah wrote:
       | This is pretty amazing but it was funny still hearing the OpenGPT
       | "voice" of somewhat fake sounding enthusiasm and restating what
       | was said by the human with exaggeration
        
       | ksaj wrote:
       | A test I've been using for each new version still fails.
       | 
       | Given the lyrics for Three Blind Mice, I try to get ChatGPT to
       | create an image of three blind mice, one of which has had its
       | tail cut off.
       | 
       | It's pretty much impossible for it to get this image straight.
       | Even this new 4o version.
       | 
       | Its ability to spell in images has greatly improved, though.
        
         | nico1207 wrote:
         | GPT-4o with image output is not yet available. So what did you
         | even test? Dall-E 3?
        
           | ksaj wrote:
           | It's making images for me when I ask it to.
           | 
           | I'm using the web interface, if that helps. It doesn't have
           | all the 4o options yet, but it does do pictures. I think they
           | are the same as with 4.5.
           | 
           | I just noticed after further testing the text it shows in
           | images is not anywhere near as accurate as shown in the
           | article's demo, so maybe it's a hybrid they're using for now.
        
       | avi_vallarapu wrote:
       | Someone said GPT-4o can replace a Tutor or a Teacher in Schools.
       | Well, that's way too far.
        
         | glonq wrote:
         | Tell me that you've enjoyed good teachers and good schools
         | without telling me that you had good teachers in good schools
         | ;)
        
       | LarsDu88 wrote:
       | Good lord, that voice makes Elevenlabs.io look... dead
        
       | DonHopkins wrote:
       | ChatGPT 4o reminds me of upgrading from a 300 baud modem to a
       | 1200 baud modem, when modems used to cost a dollar a baud.
        
       | simonw wrote:
       | I added gpt-4o support to my LLM CLI tool:
       | 
       |     pipx install llm
       |     llm keys set openai
       |     # Paste API key here
       |     llm -m 4o "Fascinate me"
       | 
       | Or if you already have LLM installed:
       | 
       |     llm install --upgrade llm
       | 
       | You can install an older version from Homebrew and then upgrade
       | it like that too:
       | 
       |     brew install llm
       |     llm install --upgrade llm
       | 
       | Release notes for the new version here:
       | https://llm.datasette.io/en/stable/changelog.html#v0-14
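       | 
       | For anyone who would rather hit the API directly than go through
       | the CLI, the rough equivalent with the official openai Python
       | package (just a sketch, assuming OPENAI_API_KEY is set in the
       | environment) is:
       | 
       |     from openai import OpenAI
       | 
       |     client = OpenAI()
       |     resp = client.chat.completions.create(
       |         model="gpt-4o",
       |         messages=[{"role": "user", "content": "Fascinate me"}],
       |     )
       |     print(resp.choices[0].message.content)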
        
         | drewbitt wrote:
         | Whenever I upgrade llm with brew, I usually lose all my
         | external plugins. Should I move it to pipx?
        
           | DanielKehoe wrote:
           | Yes, it's a good idea to install Python tools or standalone
           | applications with Pipx for isolation, persistence, and
           | simplicity. See "Install Pipx"
           | (https://mac.install.guide/python/pipx).
        
         | khimaros wrote:
         | does this handle chat templates?
        
       | gsuuon wrote:
       | Are these multimodals able to discern the input voice tone?
       | Really curious if they're able to detect sarcasm or emotional
       | content (or even something like mispronunciation?)
        
         | bigyikes wrote:
         | Yes, they can, and they should get better at this over time.
         | 
          | There is a demo video where the presenter breathes heavily, and
          | the AI is able to notice it as such when prompted.
          | 
          | It doesn't just detect tone; it seems to also be able to use
          | tone itself.
        
       | rareitem wrote:
       | Can't wait to get interviewed by this model!
        
       | yeknoda wrote:
       | feature request: please let me change the voice. it is slightly
       | annoying right now. way too bubbly, and half the spoken
       | information is redundant or not useful. too much small talk and
       | pleasantries or repetition. I'm looking for an efficient, clever,
       | servant not a "friend" who speaks to me like I'm a toddler. felt
       | like I was talking to a stereotypical American with a
       | Frappuccino: "HIIIII!!! EVERYTHING'S AMAZING! YOU'RE BEAUTIFUL!
       | NO YOU ARE!"
       | 
       | maybe some knobs for the flavor of the bot:
       | 
       | - small talk: gossip girl <---> stoic Aurelius
       | 
       | - information efficiency or how much do you expect me to already
       | know, an assumption on the user: midwit <--> genius
       | 
       | - tone spectrum: excited Scarlett, or whatever it is now <--->
       | Feynman the butler
        
         | _xerces_ wrote:
         | You can already change the voice in ChatGPT (in the paid tier
         | at least) to one of 5 or 6 different 'people' so I imagine you
         | can change it in the new version too.
        
       | thinking_wizard wrote:
       | it's crazy that Google has the Youtube dataset and still lost on
       | multimodal AI
        
       | richardw wrote:
       | Apple and Google, you need to get your personal agent game going
       | because right now you're losing the market. This is FREE.
       | 
       | Tweakable emotion and voice, watching the scene, cracking jokes.
       | It's not perfect but the amount and types of data this will
       | collect will be massive. I can see it opening up access to many
       | more users and use cases.
       | 
       | Very close to:
       | 
       | - A constant friend
       | 
       | - A shrink
       | 
       | - A teacher
       | 
       | - A coach who can watch you exercise and offer feedback
       | 
       | ...all infinitely patient, positive, helpful. For kids that get
       | bullied, or whose parents can't afford therapy or a coach,
       | there's the potential for a base level of support that will only
       | get better over time.
        
         | imiric wrote:
         | > It's not perfect but the amount and types of data this will
         | collect will be massive.
         | 
         | This is particularly concerning. Sharing deeply personal
         | thoughts with the corporations running these models will be
         | normalized, just as sharing email data, photos, documents,
         | etc., is today. Some of these companies profit directly from
         | personal data, and when it comes to adtech, we can be sure that
         | they will exploit this in the most nefarious ways imaginable. I
         | have no doubt that models run by adtech companies will
         | eventually casually slip ads into conversations, based on the
         | exact situation and feelings of the person. Even non-adtech
         | companies won't be able to resist cashing in on the bottomless
         | gold mine of data they'll be collecting.
         | 
         | I can picture marketers just salivating at the prospect of
         | getting access to this data, and being able to microtarget on
         | an individual basis at exactly the right moment, pretty much
         | guaranteeing a sale. Considering AI agents will gain a personal
         | trust and bond that humans have never experienced with machines
         | before, we will be extra vulnerable to even the slightest
         | mention of a product, in a similar way as we can be easily
         | influenced by a close friend or partner. Except that that
         | "friend" is controlled by a trillion dollar adtech corporation.
         | 
         | I would advise anyone to not be enticed by the shiny new tech,
         | and wait until this can be self-hosted and run entirely
         | offline. It's imperative that personal data remains private,
         | now more than ever before.
        
       | tgtweak wrote:
       | it really feels like the quality of gpt4's responses got
       | progressively worse as the year went on... seems like it is
       | giving political answers now vs actually giving an earnest
       | response. It also feels like the responses are lazier than they
       | used to be at the outset of gpt4's release.
       | 
       | I am not saying this is what they're doing, but it DOES feel
       | like they are hindering the previous model to make the new one
       | stand out that much more. The multi-modal improvements in this
       | release are certainly impressive, but I can't help but feel like
       | the subjective quality of gpt4 has dipped.
       | 
       | Hopefully this signals that gpt5 is not far off and should stand
       | out significantly from the crowd.
        
       | XCSme wrote:
       | I assume there's no reason to use GPT-4-turbo for API calls, as
       | this one is supposedly better and 2x cheaper.
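
       If that holds, switching is just the model string. A minimal,
       untested sketch with the official openai Python package (the
       prompt text is made up; everything else is a standard Chat
       Completions call):

           # Swapping gpt-4-turbo for gpt-4o in an existing API call.
           from openai import OpenAI

           client = OpenAI()  # reads OPENAI_API_KEY from the environment

           resp = client.chat.completions.create(
               model="gpt-4o",  # was: model="gpt-4-turbo"
               messages=[{"role": "user", "content": "Say hi in one sentence."}],
           )
           print(resp.choices[0].message.content)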
        
       | jcmeyrignac wrote:
       | Sorry to nitpick, but in the language tokenisation part, the
       | French example is incorrect. Exclamation marks are surrounded by
       | spaces in French: "c'est un plaisir de vous rencontrer!" should
       | be "c'est un plaisir de vous rencontrer !"
        
       | jessenaser wrote:
       | The crazy part is GPT-4o is faster than GPT-3.5 Turbo now, so we
       | can see a future where GPT-5 is the flagship and GPT-4o is the
       | fast cheap alternative. If GPT-4o is this smart and expressive
       | now with voice, imagine what GPT-5 level reasoning could do!
        
       | system2 wrote:
       | The realtime videos? Probably made with their internal tools. I
       | am testing gpt4o right now and the responses come in 6-10
       | seconds, the same experience as with gpt4 text. What's up with
       | the realtime claims?!
        
       | cal85 wrote:
       | We've had voice input and voice output with computers for a long
       | time, but it's never felt like spoken conversation. At best it's
       | a series of separate voice notes. It feels more like texting than
       | talking.
       | 
       | These demos show people talking to artificial intelligence. This
       | is new. Humans are more partial to talking than writing. When
       | people talk to each other (in person or over low-latency audio)
       | there's a rich metadata channel of tone and timing, subtext,
       | inexplicit knowledge. These videos seem to show the AI using this
       | kind of metadata, in both input and output, and the conversation
       | even flows reasonably well at times. I think this changes things
       | a lot.
        
         | lobochrome wrote:
         | I don't know. Have you even seen a gen z?
        
           | cal85 wrote:
           | I don't follow, what about them?
        
             | ttyprintk wrote:
             | Something like this:
             | 
             | https://www.theonion.com/brain-dead-teen-only-capable-of-
             | rol...
        
       | perfmode wrote:
       | Is that conversational UI live?
        
       | cdeutsch wrote:
       | Creepy AF
        
       | titzer wrote:
       | Can't wait for this AI voice assistant to tell me in a sultry
       | voice how I should stay in an AirBnB about 12 times a day.
        
       | jimkleiber wrote:
       | I worry that this tech will amplify the cultural values we have
       | of "good" and "bad" emotions way more than the default
       | restrictions that social media platforms put on the emoji
       | reactions (e.g., can't be angry on LinkedIn).
       | 
       | I worry that the AI will not express anger, not express sadness,
       | not express frustration, not express uncertainty, and many other
       | emotions that the culture of the fine-tuners might believe are
       | "bad" emotions and that we may express a more and more narrow
       | range of emotions going forward.
       | 
       | Almost like it might become an AI "yes man."
        
         | Quarrelsome wrote:
         | Imagine how warped your personality might become if you use
         | this as an entire substitute for human interaction. Should
         | people use this as bf/gf material we might just be further
         | contributing to decreasing the fertility rate.
         | 
         | However we might offset this by reducing the suicide rate
         | somewhat too.
        
           | jimkleiber wrote:
           | I've worked in emotional communication and conflict
           | resolution for over 10 years and I'm honestly just feeling a
           | huge swirl of uncertainty on how this--LLMs in general, but
           | especially the genAI voices, videos, and even robots--will
           | impact how we communicate with each other and how we bond
           | with each other. Does bonding with an AI help us bond more
           | with other humans? Will it help us introspect more and dig
           | deeper into our common humanity? Will we learn how to resolve
           | conflict better? Will we learn more passive aggression?
           | Become more or less suicidal? More or less loving?
           | 
           | I just, yeah, feel a lot of fear of even thinking about it.
        
             | launchoverittt wrote:
             | Created my first HN account just to reply to this. I've had
             | these same (very strong) concerns since ChatGPT launched,
             | but haven't seen much discussion about it. Do you know of
             | any articles/talks/etc. that get into this at all?
        
         | IAmNotACellist wrote:
         | Corporate safe AI will just be bland, verbose, milquetoast
         | experiences like OpenAI's. Humans want human experiences and
         | thus competition will have a big opportunity to provide it. We
         | treat lack of drama like a bug, and get resentful when coddled
         | and talked down to like we're toddlers.
        
       | JSDevOps wrote:
       | Google must be shitting it right now.
        
       | joak wrote:
       | Voice input makes sense; speaking is a lot faster than typing.
       | But I prefer my output as text, since reading is a lot faster
       | than listening to text read out loud.
       | 
       | I'm not sure that computers mimicking humans makes sense; you
       | want your computer to be the best it can be, better than humans
       | when possible. Written output is clearly superior, and faking
       | emotions does not add much in most contexts.
        
       | kulor wrote:
       | The biggest wow factor was the effect of reducing latency
       | followed in a close second by the friendly human personality.
       | There's an uncanny valley barrier but this feels like a short-
       | term teething problem.
        
       | sftombu wrote:
       | GPT-4o's breakthrough memory -- https://nian.llmonpy.ai/
        
       | AI_beffr wrote:
       | i absolutely hate this. we are going to destroy society with this
       | technology. we cant continue to enjoy the benefits of human
       | society if humans are replaced by machines. i hate seeing these
       | disgusting people smugly parade this technology. it makes me so
       | angry that they are destroying human society and all i can do is
       | sit here and watch.
        
         | simianparrot wrote:
         | I know exactly what you mean. I just hope people get bored of
         | this waste of time and energy --- both personal and actual
         | energy --- before it goes too far.
        
       | jonplackett wrote:
       | This video is brilliantly accidentally hilarious. They made an AI
       | girlfriend that hangs on your every word and thinks everything
       | you say is genius and hilarious.
        
       | pamelafox wrote:
       | I just tested out using GPT-4o instead of gpt-4-turbo for a RAG
       | solution that can reason over images. It works, with some
       | changes to our token-counting logic to account for the new
       | model/encoding (update to the latest tiktoken!).
       | 
       | I ran some speed tests for a particular question/seed. Here are
       | the times to first token, in seconds:
       | 
       | gpt-4-turbo:
       | 
       | * avg 3.69
       | 
       | * min 2.96
       | 
       | * max 4.91
       | 
       | gpt-4o:
       | 
       | * avg 2.80
       | 
       | * min 2.28
       | 
       | * max 3.39
       | 
       | That's for the messages in this gist:
       | https://gist.githubusercontent.com/pamelafox/dc14b2188aaa38a...
       | 
       | Quality seems good as well. It'll be great to have better multi-
       | modal RAG!
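
       A minimal, untested sketch of the two pieces mentioned above:
       token counting against the new encoding (recent tiktoken releases
       map gpt-4o to o200k_base; treat that mapping as an assumption if
       you are on an older version) and a rough time-to-first-token check
       using streaming. The prompt strings are placeholders.

           import time

           import tiktoken
           from openai import OpenAI


           def count_tokens(text: str, model: str = "gpt-4o") -> int:
               """Token count; gpt-4o needs a tiktoken that knows its encoding."""
               try:
                   enc = tiktoken.encoding_for_model(model)
               except KeyError:
                   # Assumption: gpt-4o uses o200k_base; older GPT-4 models use
                   # cl100k_base.
                   name = "o200k_base" if "-4o" in model else "cl100k_base"
                   enc = tiktoken.get_encoding(name)
               return len(enc.encode(text))


           def time_to_first_token(model: str, prompt: str) -> float:
               """Seconds until the first streamed chunk arrives."""
               client = OpenAI()
               start = time.monotonic()
               stream = client.chat.completions.create(
                   model=model,
                   messages=[{"role": "user", "content": prompt}],
                   stream=True,
               )
               for _chunk in stream:
                   return time.monotonic() - start
               return float("nan")


           print(count_tokens("It'll be great to have better multimodal RAG!"))
           print(time_to_first_token("gpt-4o", "Name one pitfall of RAG."))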
        
       | teleforce wrote:
       | Nobody in the comments seems to notice or care about GPT-4o's
       | new capability for performing searches based on RAG. As far as I
       | am concerned this is the most important feature that people have
       | been waiting for in ChatGPT-4, especially if you are doing
       | research. Just from testing one particular topic that I'm
       | familiar with, using GPT-4 previously and GPT-4o now, the
       | quality of the responses from the latter is very promising
       | indeed.
        
         | oersted wrote:
         | Can you be more specific? I can't find this in the
         | announcement. How does this work? What example did you try?
         | 
         | EDIT: web search does seem extremely fast.
        
           | teleforce wrote:
           | I just asked ChatGPT-4o what's new compared to GPT-4, and it
           | mentioned search as one of the latest features based on RAG.
           | 
           | Then I asked it to explain RPW wireless system, and the
           | answers are much better than with ChatGPT-4.
        
       | nilsherzig wrote:
       | Imagine having to interact with this thing in an environment
       | where it is in the power position.
       | 
       | Being in a prison with this voice as your guard seems like a
       | horrible way to lose your sanity. This aggressive friendliness
       | combined with no real emotions seems like a very easy way to
       | break people.
       | 
       | There are these stories about Nazis working at concentration
       | camps having to drink an insane amount of alcohol to keep
       | themselves going (not trying to excuse their actions). This
       | thing would just do it, while being friendly at the same time.
       | The amount of hopelessness someone would experience if they
       | happened to be in the custody of a system like this is truly
       | horrific.
        
       | Capricorn2481 wrote:
       | I'm surprised they're limiting this API. They haven't even
       | opened up the image API in gpt4 turbo yet, have they?
        
       | zedin27 wrote:
       | I am not fluent in Arabic at all, and being able to use this as
       | a tool to have a conversation will make me more dependent on it.
       | We are approaching a new era where we will not be
       | "independently" learning a language but will skip the work of
       | learning it beforehand. A double-edged sword.
        
       | xyc wrote:
       | Seems that no client-side changes are needed for gpt-4o chat
       | completion.
       | 
       | Added a custom OpenAI endpoint to https://recurse.chat (I built
       | it) and it just works:
       | https://twitter.com/recursechat/status/1790074433610137995
        
         | swyx wrote:
         | but does it do the full multimodal in-out capability shown in
         | the app :)
        
           | xyc wrote:
           | will see :) heard video capability is rolling out later
        
             | xyc wrote:
             | api access is text/vision for now
             | https://x.com/mpopv/status/1790073021765505244
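
       For the text/vision part that is exposed through the API today, a
       minimal, untested sketch of sending an image alongside text to
       gpt-4o (the image URL is a placeholder; this is the same
       content-parts format used for the existing GPT-4 Turbo vision
       calls):

           # Text + image input to gpt-4o over the Chat Completions API.
           # Audio and video are not available through the API yet.
           from openai import OpenAI

           client = OpenAI()  # reads OPENAI_API_KEY from the environment

           resp = client.chat.completions.create(
               model="gpt-4o",
               messages=[{
                   "role": "user",
                   "content": [
                       {"type": "text", "text": "What is in this image?"},
                       {"type": "image_url",
                        "image_url": {"url": "https://example.com/photo.jpg"}},
                   ],
               }],
           )
           print(resp.choices[0].message.content)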
        
       | awfulneutral wrote:
       | In the customer support example, he tells it his new phone
       | doesn't work, and then it just starts making stuff up like how
       | the phone was delivered 2 days ago, and there's physically
       | nothing wrong with it, which it doesn't actually know. It's a
       | very impressive tech demo, but it is a bit like they are
       | pretending we have AGI when we really don't yet.
       | 
       | (Also, they managed to make it sound exactly like an insincere,
       | rambling morning talk show host - I assume this is a solvable
       | problem though.)
        
         | jschwartz11 wrote:
         | It's possible to imagine using ChatGPT's memory, or even just
         | giving the context in an initial brain dump that would allow
         | for this type of call. So don't feel like it's too far off.
        
           | awfulneutral wrote:
           | That's true, but if it isn't able to be honest when it
           | doesn't know something, or to ask for clarification, then I
           | don't see how it's workable.
        
       | Alifatisk wrote:
       | I thought they would release a competitor to perplexity? Was this
       | it?
        
       | sarreph wrote:
       | The way the hosts interrupted the voice assistant today makes me
       | worry that we're about to instil that as normal behaviour for
       | future generations.
        
       ___________________________________________________________________
       (page generated 2024-05-13 23:00 UTC)