[HN Gopher] GPT-4o
___________________________________________________________________
GPT-4o
Author : Lealen
Score : 1479 points
Date : 2024-05-13 17:28 UTC (5 hours ago)
(HTM) web link (openai.com)
(TXT) w3m dump (openai.com)
| skilled wrote:
| Live now,
|
| _OpenAI Spring Update_
| (https://www.youtube.com/watch?v=DQacCB9tDaw)
|
| https://news.ycombinator.com/item?id=40343950
| EcommerceFlow wrote:
| A new "flagship" model with no improvement in intelligence; very
| disappointed. Maybe this is a strategy for them to mass collect
| "live" data before they're left behind by Google/Twitter live
| data...
| belter wrote:
| https://youtu.be/DQacCB9tDaw
| chzblck wrote:
| real time audio is mind blowing
| throwup238 wrote:
| So what's the point of paying for ChatGPT Plus? And who on earth
| chose to make the app Mac only...
| CSMastermind wrote:
| 5x the capacity threshold is the only thing I heard them
| mention on the live stream.
|
| Though presumably when they are ready to release new models the
| Plus users will get them first.
| anuar12 wrote:
| I think because usability increases so much (use cases like
| real-time conversation, video-based coding, presentation
| feedback at work, etc.), they expect usage to increase
| drastically, so paying users would still have an incentive
| to pay.
| agd wrote:
| They mentioned an announcement about a new frontier model
| coming soon. Presumably this will be exclusive to paid users.
| johnsimer wrote:
| Did they mention this in the gpt4o announcement video? I must
| have missed this
| riffic wrote:
| > Plus users will have a message limit that is up to 5x greater
| than free users
|
| from https://openai.com/index/gpt-4o-and-more-tools-to-chatgpt-
| fr...
| tomschwiha wrote:
| I definitely like this demo more than the "reduced latency"
| Gemini demo [0].
|
| [0] https://www.youtube.com/watch?v=UIZAiXYceBI
| Powdering7082 wrote:
| Wow this versioning scheme really messed up this prediction
| market: https://kalshi.com/markets/gpt4p5/gpt45-released
| smusamashah wrote:
| That im-also-a-good-gpt2-chatbot[1] was in fact the new ChatGPT
| model, as people were assuming a few days ago here on HN[2].
|
| Edit: maybe not, the name of that bot was just "gpt2-chatbot".
| Maybe that one was some initial iteration?
|
| [1]
| https://twitter.com/LiamFedus/status/1790064963966370209/pho...
|
| [2] https://news.ycombinator.com/item?id=40199715
| theusus wrote:
| This 4o is already rolling out?
| belter wrote:
| They mentioned capabilities will be rolled out over the next
| few weeks: https://youtu.be/DQacCB9tDaw?t=5018
| GalaxyNova wrote:
| It is really cool that they are bringing this to free users. It
| does make me wonder what justifies ChatGPT plus now though...
| InfiniteVortex wrote:
| they stated that they will be announcing something new that is
| on the next frontier (or close to it IIRC) soon. so there will
| definitely be an incentive to pay because it will be something
| better than gpt 4o.
| pantsforbirds wrote:
| I assume the desktop app with voice and vision is rolling out
| to plus users first?
| ppollaki wrote:
| I've noticed that the GPT-4 model's capabilities seem limited
| compared to its initial release. Others have also pointed this
| out. I suspect that making the model free might have required
| reducing its capabilities to meet cost efficiency goals. I'll
| have to try it out to see for myself.
| EcommerceFlow wrote:
| As I commented in the other thread, really really disappointed
| there's no intelligence update and more of a focus on "gimmicks".
| The desktop app did look really good, especially as the models
| get smarter. Will be canceling my premium as there's no real
| purpose for it until that new "flagship" model comes out.
| adroniser wrote:
| Agree on hoping for an intelligence update, but I think it was
| clear from teasers that this was not gonna be GPT-5.
|
| I'm not sure how fair it is to classify the new multimodal
| capabilities as just a gimmick though. I personally haven't
| integrated GPT-4 into my workflow that much and the latency and
| the fact I have to type a query out is a big reason why.
| OutOfHere wrote:
| I don't see 4o or anything new at
| https://platform.openai.com/docs/models
|
| Overall I am highly skeptical of newer models as they risk
| worsening the completion quality to make them cheaper for OpenAI
| to run.
| frabcus wrote:
| It's there now! And still 128k context window
| IanCal wrote:
| It's there right now for me.
| atgctg wrote:
| Tiktoken added support for GPT-4o:
| https://github.com/openai/tiktoken/commit/9d01e5670ff50eb74c...
|
| It has an increased vocab size of 200k.
| minimaxir wrote:
| For posterity, GPT-3.5/4's tokenizer was 100k. The benefit of a
| larger tokenizer is more efficient tokenization (and therefore
| cheaper/faster) but with massive diminishing returns: the
| larger tokenizer makes the model more difficult to train but
| tends to reduce token usage by 10-15%.
| simonw wrote:
| Oh interesting, does that mean languages other than English
| won't be paying such a large penalty in terms of token lengths?
|
| With previous tokenizers there was a notable increase in the
| number of tokens needed to represent non-English sentences:
| https://simonwillison.net/2023/Jun/8/gpt-tokenizers/
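|
| A quick way to check this once the model is out - a minimal
| sketch, assuming tiktoken's public get_encoding API and the
| o200k_base encoding name from the commit linked above (the
| sentences are just illustrative):
|
|     # cl100k_base: GPT-3.5/4 tokenizer (~100k vocab)
|     # o200k_base:  GPT-4o tokenizer (~200k vocab)
|     import tiktoken
|
|     old = tiktoken.get_encoding("cl100k_base")
|     new = tiktoken.get_encoding("o200k_base")
|
|     for text in [
|         "The quick brown fox jumps over the lazy dog.",
|         "El rapido zorro marron salta sobre el perro perezoso.",
|     ]:
|         # Fewer tokens under o200k_base means a smaller penalty.
|         print(len(old.encode(text)), len(new.encode(text)), text)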
| mike_hearn wrote:
| Does that imply they retrained the foundation model from
| scratch? I thought changing the tokenization was something you
| couldn't really retrofit to an existing model. I mean sure they
| might have initialized the weights from the prior GPT-4 model
| but it'd still require a lot of retraining.
| og_kalu wrote:
| Yeah and they say as much in the blog.
| moffkalast wrote:
| Lots of those tokens would have to be pixel patches and sound
| samples right?
| nojvek wrote:
| Yep. Since it's multimodal. Pictures, text, audio all go into
| token space.
| kristofferR wrote:
| How are they able to use such a brand name, Tiktoken? Is it
| because TikTok is Chinese? Tiktoken, it's almost like if Apple
| released the Facebooken library for something entirely
| unrelated to Facebook.
| FergusArgyll wrote:
| First Impressions in no particular order:
| - Being able to interrupt while GPT is talking
| - 2x faster/cheaper
| - not really a much smarter model
| - Desktop app that can see screenshots
| - Can display emotions with and change the sound of "it's" voice
| riffic wrote:
| wondering what apple is cooking up and what they'll announce
| next month.
|
| by the way the contraction "it's" is used to say "it is" or "it
| has", it is never a possessive form.
| karaterobot wrote:
| Unless you're talking about that sewer clown's balloon!
| throwup238 wrote:
| _Mac only_ desktop app. Windows version "later this year". No
| Linux.
|
| Welp there goes my Plus subscription.
| bmoxb wrote:
| It seems like a very odd decision. It's not like OpenAI can't
| afford to develop versions of the application for each OS in
| parallel.
| unstatusthequo wrote:
| Why? Just use the API or normal web access version like you
| have been since ChatGPT became available at all.
| ralusek wrote:
| Can't find info on which of these new features are available via
| the API.
| tazu wrote:
| > Developers can also now access GPT-4o in the API as a text
| and vision model. GPT-4o is 2x faster, half the price, and has
| 5x higher rate limits compared to GPT-4 Turbo. We plan to
| launch support for GPT-4o's new audio and video capabilities to
| a small group of trusted partners in the API in the coming
| weeks.
| ralusek wrote:
| [EDIT] The model has since been added to the docs
|
| Not seeing it or any of those documented here:
|
| https://platform.openai.com/docs/models/overview
| OutOfHere wrote:
| It is not listed as of yet, but it does work if you punch
| in gpt-4o. I will stick with gpt-4-0125-preview for now
| because gpt-4o seems majorly prone to hallucinations
| whereas gpt-4-0125-preview isn't.
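|
| For anyone who wants to try it before the docs catch up - a
| minimal sketch, assuming the current openai Python SDK and an
| OPENAI_API_KEY in the environment (only the "gpt-4o" model name
| is confirmed above; the prompt is arbitrary):
|
|     from openai import OpenAI
|
|     client = OpenAI()  # reads OPENAI_API_KEY from the environment
|     resp = client.chat.completions.create(
|         model="gpt-4o",  # works even though it isn't listed yet
|         messages=[{"role": "user", "content": "Say hello in one word."}],
|     )
|     print(resp.choices[0].message.content)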
| Jensson wrote:
| The most impressive part is that the voice uses the right
| feelings and tonal language during the presentation. I'm not sure
| how much of that comes from having tested this over and over, but
| it is really hard to get right, so if they didn't fake it in
| some way I'd say that is revolutionary.
| gdb wrote:
| (I work at OpenAI.)
|
| It's really how it works.
| xanderlewis wrote:
| I like the humility in your first statement.
| moab wrote:
| Pretty sure the snark is unnecessary.
| ayhanfuat wrote:
| Was it snark? To me it sounds like "we all know you
| Greg"?
| xanderlewis wrote:
| This was my intention.
| moab wrote:
| I misunderstood; my apologies.
| colecut wrote:
| I don't think it was snark. The guy is co-founder and cto
| of OpenAi, and he didn't mention any of that..
| renewiltord wrote:
| I downvoted independently. No problem with groupies. They
| just contaminate the thread.
|
| Greg Brockman is famous for good reasons but constant "oh
| wow it's Greg Brockman" are noisy.
| egillie wrote:
| not snark. if only hn comments could show the right
| feelings and tonal language
| Induane wrote:
| I like their username.
| belter wrote:
| You might be talking to GPT-5...
| theboat wrote:
| I love how this comment proves the need for audio2audio. I
| initially read it as sarcastic, but now I can't tell if
| it's actually sincere.
| jamestimmins wrote:
| With this capability, how close are y'all to it being able to
| listen to my pronunciation of a new language (e.g. Italian)
| and give specific feedback about how to pronounce it like a
| local?
|
| Seems like these would be similar.
| taytus wrote:
| The italian output in the demo was really bad.
| thegabriele wrote:
| Why would you say "really bad"?
| bzudo wrote:
| It doesn't have hands.
| mark38848 wrote:
| So good!
| DonHopkins wrote:
| "I Have No Hands But I Must Scream" -Italian Ellison
| rezonant wrote:
| Joke of the day right there :-)
| GaggiX wrote:
| I'm a native Italian speaker, it wasn't too bad.
| riquito wrote:
| The content was correct but the pronunciation was awful.
| Now, good enough? For sure, but I would not be able to
| stand something talking like that all the time
| ljsprague wrote:
| Do you not have to work with non-native speakers of
| whatever language you use at work?
| Jensson wrote:
| Most people don't, since you either speak with native
| speakers or you speak English; in international teams you
| speak English rather than one of the members' native
| languages, even if nobody speaks English natively. So it
| is rare to hear broken non-English.
|
| And note that understanding broken language is a skill
| you have to train. If you aren't used to it then it is
| impossible to understand what they say. You might not
| have been in that situation if you are an English speaker
| since you are so used to broken English, but it happens a
| lot for others.
| sunnybeetroot wrote:
| Which video title is this?
| elil17 wrote:
| It completely botched teaching someone to say "hello" in
| Chinese - it used the wrong tones and then incorrectly told
| them their pronunciation was good.
| ShakataGaNai wrote:
| If you read into the details on OpenAI's site, a lot of
| this stuff is clearly marked as English-first. For some
| written languages - noted as anything using non-Roman
| characters, so most of Asia - it basically doesn't work.
|
| This really isn't surprising. Look at Google Home and
| Alexa. When they first came out, if you weren't a white
| male from the west coast... the accuracy of translating
| commands dropped DRAMATICALLY, because it was programmed,
| designed and tested by a majority of white tech bros in
| SF/Seattle. They've gotten a lot better over the last 5+
| years. But I think you'll see OpenAI take this route.
|
| But that's ok. They have to start somewhere. Once they
| get the model working _really well_ for one language they
| can expand into similar ones with relatively little work.
| The more different the language, the more hard work and
| "local input" (ex: natives of the language) will be
| required for adaptation. But the basic text translations
| are still already way better than they used to be.
| joseda-hg wrote:
| An interesting point. I tend to have better outcomes using
| my heavily accented ESL English than my native pronunciation
| of my mother tongue. I'm guessing it's partly the tech
| workforce being a bit more multicultural than initially
| thought, or it just being easier to test with.
|
| It's a shame, because that means I can use stuff that I
| can't recommend to people around me.
|
| Multilingual UX is an interesting pain point. I had to
| change the language of my account to English so I could
| use some early Bard version, even though it was perfectly
| able to understand and answer in Spanish.
| zenlikethat wrote:
| You also get the synchronicity / four minute mile effect
| egging on other people to excel with specialized models,
| like Falcon or Qwen did in the wake of the original
| ChatGPT/Llama excitement.
| greatpostman wrote:
| Racist post
| kolinko wrote:
| What? Did it seriously work worse for women? Source?
|
| (accents sure)
| qprofyeh wrote:
| As for the Mandarin tones, the model might have mixed it
| up with the tones from a dialect like Cantonese. It's
| interesting to discover how much difference a more
| specific prompt could make.
| dgroshev wrote:
| I don't think that'd work without a dedicated startup
| behind it.
|
| The first (and imo the main) hurdle is not reproduction,
| but just learning to hear the correct sounds. If you don't
| speak Hindi and are a native English speaker, this [1] is a
| good example. You can only work on nailing those consonants
| when they become as distinct to your ear as cUp and cAp are
| in English.
|
| We can get by by falling back to context (it's unlikely
| someone would ask for a "shit of paper"!), but it's
| impossible to confidently reproduce the sounds unless they
| are already completely distinct in our heads/ears.
|
| That's because we think we hear things as they are, but
| it's an illusion. Cup/cap distinction is as subtle to an
| Eastern European as Hindi consonants or Mandarin tones are
| to English speakers, because the set of meaningful sounds
| distinctions differs between languages. Relearning the
| phonetic system requires dedicated work (minimal pairs is
| one option) and learning enough phonetics to have the
| vocabulary to discuss sounds as they are. It's not enough
| to just give feedback.
|
| [1]: https://www.youtube.com/watch?v=-I7iUUp-cX8
| dilap wrote:
| > but it's impossible to confidently reproduce the sounds
| unless they are already completely distinct in our
| heads/ears
|
| interestingly, i think this isn't always true -- i was
| able to coach my native-spanish-speaking wife to
| correctly pronounce "v" vs "b" (both are just "b" in
| spanish, or at least her dialect) before she could hear
| the difference; later on she developed the ability to
| hear it.
| estebank wrote:
| In the "Point and learn Spanish" video, when shown an Apple
| and a Banana, the AI said they were a Manzana (Apple) and a
| Pantalon (Pants).
| unsatchmo wrote:
| No, I just watched it closely and it definitely said un
| platano
| estebank wrote:
| I re watched it a few times to ensure it said platano
| before posting, and it honestly doesn't sound like it to
| me.
| david-gpu wrote:
| I'm a Spaniard and to my ears it clearly sounds like _"
| Es una manzana y un platano"_.
|
| What's strange to me is that, as far as I know, "platano"
| is only commonly used in Spain, but the accent of the AI
| voice didn't sound like it's from Spain. It sounds more
| like an American who speaks Spanish as a second language,
| and those folks typically speak some Mexican dialect of
| Spanish.
| afc wrote:
| I'm from Colombia and mostly say "platano".
| InvaderFizz wrote:
| I was about to comment the same thing about the accent.
| Even to my gringo ears, it sounds like an American
| speaking Spanish.
|
| Platano is commonly used for banana in Mexico, just
| bought some at a Soriana this weekend.
| patcon wrote:
| After watching the demo, _my_ question isn't about how
| close it is to helping me _learn_ a language, but about how
| close it is to _being_ me in another language.
|
| Even styles of thought might be different in other
| languages, so I don't say that lightly... (stay strong,
| Sapir-Whorf, stay strong ;)
| hack_ml wrote:
| I was conversing with it in Hinglish (a combination of
| Hindi and English) which folks in urban India use, and it
| was pretty on point apart from some use of esoteric Hindi
| words, but I think with the right prompting we can fix that.
| baq wrote:
| > (I work at OpenAI.)
|
| Winner of the 'understatement of the week' award (and it's
| only Monday).
|
| Also top contender in the 'technically correct' category.
| swyx wrote:
| and was briefly untrue for like 2 days
| behnamoh wrote:
| > Winner of the 'understatement of the week' award (and
| it's only Monday).
|
| Yes! As soon as I saw gdb I was like "that can't be Greg",
| but sure enough, that's him.
| mttpgn wrote:
| Licensing the emotion-intoned TTS as a standalone API is
| something I would look forward to seeing. Not sure how
| feasible that would be if, as a sibling comment suggested, it
| bypasses the text-rendering step altogether.
| skottenborg wrote:
| "(I work at OpenAI.)"
|
| Ah yes, also known as being co-founder :)
| terhechte wrote:
| Random OpenAI question: while the GPT models have become ever
| cheaper, the price for the TTS models has stayed in the
| $15/1M char range. I was hoping this would also become
| cheaper at some point. There are so many apps (e.g. language
| learning) that quickly become too expensive given these
| prices. With the GPT-4o voice (which sounds much better than
| the current TTS or TTS HD endpoint) I thought maybe the
| prices for TTS would go down. Sadly that hasn't happened. Is
| that something on the OpenAI agenda?
| passion__desire wrote:
| hi gdb, could you please create an assistant AI that can
| filter low-quality HN discussion on your comment so that it
| can redirect my focus on useful stuff.
| 999900000999 wrote:
| How far away are we from something like a helmet with ChatGPT
| and a video camera installed? I imagine this will be
| awesome for low vision people. Imagine having a guide tell
| you how to walk to the grocery store, and help you grocery
| shop without an assistant. Of course you have tons of
| liability issues here, but this is very impressive
| rfoo wrote:
| Can't wait for the moment when I can put a single line "Help
| me put this in the cart" on my product and it magically sells
| better.
| smokel wrote:
| This Dutch book [1] by Gummbah has the text "Kooptip"
| imprinted on the cover, which would roughly translate to
| "Buying recommendation". It worked for me!
|
| [1] https://www.amazon.com/Het-geheim-verdwenen-mysterie-
| Dutch/d...
| DonHopkins wrote:
| https://en.wikipedia.org/wiki/Steal_This_Book
| macintux wrote:
| Just the ability to distinguish bills would be hugely
| helpful, although I suppose that's much less of a problem
| these days with credit cards and digital payment options.
| krainboltgreene wrote:
| > Imagine having a guide tell you how to walk to the
| grocery store
|
| I don't need to imagine that, I've had it for about 8
| years. It's OK.
|
| > help you grocery shop without an assistant
|
| Isn't this something you learn as a child? Is that a thing
| we need automated?
| jameshart wrote:
| OP specified they were imagining this for _low vision
| people_
| krainboltgreene wrote:
| I'm aware, I'm one of those people.
| bombcar wrote:
| Does it give you voice instructions based on what it
| _knows_ or is it actively watching the environment and
| telling you things like "light is red, car is coming"?
| jaggederest wrote:
| I assume it likes snacks, is quadrupedal, and does not
| have the proper mouth anatomy or diaphragm for human
| speech.
| ninininino wrote:
| just need the helmet https://openai.com/index/be-my-eyes/
| JieJie wrote:
| We're planning on getting a phone-carrying lanyard and she
| will just carry her phone around her neck with Be My Eyes^0
| looking out the rear camera, pointed outward. She's
| DeafBlind, so it'll be bluetoothed to her hearing aids, and
| she can interact with the world through the conversational
| AI.
|
| I helped her access the video from the presentation, and it
| brought her to tears. Now, she can play guitar, and she and
| the AI can write songs and sing them together.
|
| This is a big day in the lives of a lot of people who
| aren't normally part of the conversation. As of today, they
| are.
|
| 0: https://www.bemyeyes.com/
| 999900000999 wrote:
| That's definitely cool!
|
| Eventually it would be better for these models to run
| locally from a security point of view, but this is a
| great first step.
| JieJie wrote:
| Absolutely. We're looking forward to Apple's
| announcements at WWDC this year, which analysts predict
| are right up that alley.
| silverquiet wrote:
| It sounds like the system that Marshall Brain envisioned in
| his novella, Manna.
| jaggederest wrote:
| That story has always been completely reasonable and
| plausible to me. Incredible foresight. I guess I should
| start a midlevel management voice automation company.
| bjtitus wrote:
| Is it possible to use this as a TTS model? I noticed on the
| announcement post that this is a single model as opposed to a
| text model being piped to a separate TTS model.
| cchance wrote:
| This is damn near one of the most impressive things. I can only
| imagine what you'd be capable of, especially with live
| translation and voice synthesis (ElevenLabs style), integrating
| with something like Teams: select each person's language and do
| real-time translation into each person's native language, with
| their own voice and intonation - that would be NUTS.
| purplerabbit wrote:
| There's so much pent up collaborative human energy trapped
| behind language barriers.
|
| Beautiful articulation.
|
| This is an enormous win for humanity.
| rane wrote:
| Will the new voice mode allow mixing languages in sentences?
|
| As a language learner, this would be tremendously useful.
| j-krieger wrote:
| I've always wondered what GPT models lack that makes
| them "query->response" only. I've always tried to get
| chatbots to lose the initially needed query, to no avail.
| What would it take to get a GPT model to freely generate
| tokens in a thought-like pattern? I think when I'm alone,
| without a query from another human. Why can't they?
| kolinko wrote:
| Just provide an empty query and that's it - it will generate
| tokens no problem.
|
| You can use any open source model without any prompt
| whatsoever.
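|
| A minimal sketch of what that looks like, assuming the Hugging
| Face transformers API and GPT-2 as a stand-in for any open
| model - seed generation with only the beginning-of-sequence
| token and let it free-run:
|
|     from transformers import AutoModelForCausalLM, AutoTokenizer
|
|     tok = AutoTokenizer.from_pretrained("gpt2")
|     model = AutoModelForCausalLM.from_pretrained("gpt2")
|
|     # No user query at all, just the BOS token as a seed.
|     input_ids = tok(tok.bos_token, return_tensors="pt").input_ids
|     out = model.generate(input_ids, max_new_tokens=50, do_sample=True)
|     print(tok.decode(out[0], skip_special_tokens=True))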
| ALittleLight wrote:
| In my ChatGPT app or on the website I can select GPT-4o as a
| model, but my model doesn't seem to work like the demo. The
| voice mode is the same as before and the images come from
| DALLE and ChatGPT doesn't seem to understand or modify them
| any better than previously.
| jacobsimon wrote:
| I couldn't quite tell from the announcement, but is there
| still a separate TTS step, where GPT is generating
| tones/pitches that are to be used, or is it completely end to
| end where GPT is generating the output sounds directly?
| derac wrote:
| It's one model with text/audio/image input and output.
| og_kalu wrote:
| >The most impressive part is that the voice uses the right
| feelings and tonal language during the presentation.
|
| Consequences of audio-to-audio (rather than audio->text, then
| text->audio). Being able to manipulate speech nearly as well as
| it manipulates text is something else. This will be a
| revelation for language learning amongst other things. And you
| can interrupt it freely now!
| pants2 wrote:
| However, this looks like it only works with speech - i.e. you
| can't ask it, "What's the tune I'm humming?" or "Why is my
| car making this noise?"
|
| I could be wrong but I haven't seen any non-speech demos.
| cube2222 wrote:
| Fwiw, the live demo[0] included different kinds of
| breathing, and getting feedback on it.
|
| [0]: https://youtu.be/DQacCB9tDaw?t=557
| throwaway11460 wrote:
| What about the breath analysis?
| pants2 wrote:
| I did see that, though my interpretation is that
| breathing is included in its voice tokenizer which helps
| it understand emotions in speech (the AI can generate
| breath sounds after all). Other sounds, like bird songs
| or engine noises, may not work - but I could be wrong.
| CooCooCaCha wrote:
| I suspect that like images and video, their audio system
| is or will become more general purpose. For example it
| can generate the sound of coins falling onto a table.
| jcims wrote:
| Anyone who has used elevenlabs for voice generation has found
| this to be the case. Voice to voice seems like magic.
| dyauspitr wrote:
| Elevenlabs isn't remotely close to how good this voice
| sounds. I've tried to use it extensively before and it just
| isn't natural. This voice from openAI and even the one
| chatGPT has been using is _natural_.
| twobitshifter wrote:
| I asked it to make a bird noise, instead it told me what a
| bird sounds like with words. True audio to audio should be
| able to be any noise, a trombone, traffic, a crashing sea,
| anything. Maybe there is a better prompt there but it did not
| seem like it.
| og_kalu wrote:
| The new voice mode has not rolled out yet. It's rolling out
| to plus users in the next couple weeks.
|
| Also it's possible this is trained on mostly speech.
| bredren wrote:
| I mention this down thread, but a symptom of a tech product of
| sufficient advancement is that the nature of its introduction
| matters less and less.
|
| Based on the casual production of these videos, the product
| must be this good.
|
| https://news.ycombinator.com/item?id=40346002
| simonw wrote:
| That was very impressive, but it doesn't surprise me much given
| how good the voice mode in the ChatGPT iPhone app is
| already.
|
| The new voice mode sounds better, but the current voice mode
| did also have inflection that made it feel much more natural
| than most computer voices I've heard before.
| Jensson wrote:
| Can you tell the current voice model what feelings and tone
| it should communicate with? If not it isn't even comparable,
| being able to control how it reads things is absolutely
| revolutionary, that is what was missing from using these AI
| models as voice actors.
| simonw wrote:
| No you can't, at least not directly - you can influence the
| tone it uses a little through the content you ask it to
| read.
|
| Being able to specifically request different tones is a new
| and very interesting feature.
| ecosystem wrote:
| +1. Check the demo video in OP titled "Sarcasm". Human
| asks GPT-4o to speak "dripping in sarcasm". The tone that
| comes back is spot on. Comparing that against the current
| voice model, it's a total sea change.
| bredren wrote:
| The voice mode was quite good, but the latency and start /
| stop have been cumbersome.
| duckmysick wrote:
| Slightly off-topic, but I noticed you've updated your llm CLI
| app to work with the 4o model (plus a bunch of other APIs
| through plugins). Kudos for working extremely fast. I'm
| really grateful for your tool; I tried many others, but for
| some reason none clicked as much as yours.
|
| Link in case other readers are curious:
| https://llm.datasette.io
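|
| For anyone curious what that looks like from Python rather than
| the command line - a sketch assuming the Python API documented
| at llm.datasette.io, the gpt-4o model id, and an OPENAI_API_KEY
| in the environment:
|
|     import llm
|
|     model = llm.get_model("gpt-4o")
|     response = model.prompt("Summarize the GPT-4o announcement "
|                             "in one sentence.")
|     print(response.text())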
| newzisforsukas wrote:
| Right to whom? To me, the voice sounds like an overenthusiastic
| podcast interviewer. What's wrong with wanting computers to
| sound like what people think computers should sound like?
| Jensson wrote:
| It understands tonal language, you can tell it how you want
| it to talk, I have never seen a model like that before. If
| you want it to talk like a computer you can tell it to, they
| did it during the presentation, that is so much better than
| the old attempts at solving this.
| sitkack wrote:
| You are a Zoomer sosh meeds influencer, please increase
| uptalk by 20% and vocal fry by 30%. Please inject slaps,
| "is dope" and nah and bra into your responses. Throw shade
| every 11 sentences.
| airstrike wrote:
| I'm not sure whether to laugh or cry...
| navigate8310 wrote:
| > voice sounds like an over enthusiastic podcast interviewer
|
| I believe it can be toned down using system prompts, which
| they'll expose in future iterations
| TacticalCoder wrote:
| As in the _Interstellar_ movie:
| - chuckling to 0%
| - no acting surprised
| - not making bullshit when you don't know
| sangnoir wrote:
| > not making bullshit when you don't know
|
| LLMs today have no concept of epistemology, they don't
| ever "know" and are always making up bullshit, which
| usually is more-or-less correct as a side effect of
| minimizing perplexity.
| tr3ntg wrote:
| Right... enthusiastic and generally confused. It's uncanny
| valley level expressions. Still better than drab, monotonous
| speech though.
| eloisant wrote:
| So far I prefer the neutral tone of Alexa/Google Assistant.
| I like computers to feel like computers.
|
| It seems like we're in the skeuomorphism phase of AI where
| tools try to mimic humans like software tried mimic
| physical objects in the early 2000's.
|
| I can't wait for us to be passed that phase.
| px43 wrote:
| Then you can tell it to do that. It will use whatever
| intonations you prefer.
| kybernetikos wrote:
| Genuine People Personalities(tm), just like in Hitchhiker's.
| Perhaps one of the milder forms of 'We Created The Torment
| Nexus'.
| angryasian wrote:
| Agree, I don't get it. I just want the right information,
| explained well. I don't want to be social with a robot.
| Keyframe wrote:
| It's a computer from the valley.
| mvkel wrote:
| I was in the audience at the event. The only parts where it
| seemed to get snagged was hearing the audience reaction as an
| interruption. Which honestly makes the demo even better. It
| showed that hey, this is live.
|
| Magic.
| px43 wrote:
| I wonder when it will be able to understand that there is
| more than one human talking to it. It seems like even in
| today's demo if two people are talking, it can't tell them
| apart.
| ta-run wrote:
| Crazy that interruption also seems to work pretty smoothly
| nabakin wrote:
| Seems about as good as Azure's Speech Service. I wonder if
| that's what they are using behind the scenes
| Keyframe wrote:
| Somehow it also sounds almost like Dot Matrix from Spaceballs.
| burntalmonds wrote:
| Yeah, the female voice especially is really impressive in the
| demos. The voice always sounds natural. The male voice I heard
| wasn't as good. It wasn't terrible, but it had a somewhat
| robotic feel to it.
| Intralexical wrote:
| "Right" feelings and tonal language? "Right" for what? For
| _whom_?
|
| We've already seen how much damage dishonest actors can do by
| manipulating our text communications with words they don't
| mean, plans they don't intend to follow through on, and
| feelings they don't experience. The social media disinfo age
| has been bad enough.
|
| Are you sure you want a machine which is able to manipulate our
| emotions on an even more granular and targeted level?
|
| LLMs are still machines, designed and deployed by humans to
| perform a task. What will we miss if we anthropomorphize the
| product itself?
| modeless wrote:
| As far as I'm concerned this is the new best demo of all time.
| This is going to change the world in short order. I doubt they
| will be ready with enough GPUs for the demand the voice+vision
| mode is going to get, if it's really released to all free users.
|
| Now imagine this in a $16k humanoid robot, also announced this
| morning: https://www.youtube.com/watch?v=GzX1qOIO1bE The future
| is going to be wild.
| andy99 wrote:
| Really? If this was Apple it might make sense, but for OpenAI it
| feels like a demo that's not particularly aligned with their
| core competency (at least by reputation) of building the most
| performant AI models. Or put another way, it says to me they're
| done building models and are now wading into territory where
| there are strong incumbents.
|
| All the recent OpenAI talk had me concerned that the tech has
| peaked for now and that expectations are going to be reset.
| modeless wrote:
| What strong incumbents are there in conversational voice
| models? Siri? Google Assistant? This is in a completely
| different league. I can see from the reaction here that
| people don't understand. But they will when they try it.
|
| Did you see it translate Italian? Have you ever tried the
| Google Translate/Assistant features for real time
| translation? They didn't train it to be a translator. They
| didn't make a translation feature. They just asked it. It's
| instantly better than every translation feature Google ever
| released.
| fidotron wrote:
| Common to Siri, Google Assistant, Alexa and ChatGPT is
| the perception that over time the same thing actually gets
| worse.
|
| Whether it's real or not is a reasonably interesting
| question, because it's possible that all that advances with
| the progress is our perception of how things should be. My
| gut feeling is it has been a bit of both, though, in the
| sense that the decline is real and we also expect things to
| improve.
|
| Who can forget Google demoing their AI making a call to a
| restaurant that they showed at I/O many years ago?
| Everyone, apparently.
| golol wrote:
| What Openai has done time and time again is completely change
| the landscape when the competitors have caught up and
| everyone thinks their lead is gone. They made image
| generation a thing. When GPT-3 became outdated they released
| ChatGPT. Instead of trying to keep Dalle competitive they
| released Sora. Now they change the game again with live
| audio+video.
| 10xDev wrote:
| The future is not going to be any more wild than what you choose
| to do with the tech.
| modeless wrote:
| I disagree completely. Even people who never adopt this stuff
| personally will have their lives profoundly impacted. The
| only way to avoid it would be to live in a large colony where
| the technology is prohibited, like the Amish. But even the
| Amish feel the influence of technology to some degree.
| ilaksh wrote:
| This is so amazing.. are there any open source models that are in
| any way comparable? Fully multimodal audio-to-audio etc.?
| skilled wrote:
| Parts of the demo were quite choppy (latency?) so this definitely
| feels rushed in response to Google I/O.
|
| Other than that, it looks good. The desktop app is great, but I
| didn't see any mention of being able to use your own API key, so
| open source projects might still be needed.
|
| The biggest thing is bringing GPT-4 to free users, that is an
| interesting move. Depending on what the limits are, I might
| cancel my subscription.
| Jordan-117 wrote:
| Seems like it was picking up on the audience reaction and
| stopping to listen.
|
| To me the more troubling thing was the apparent hallucination
| (saying it sees the equation before he wrote it, commenting on
| an outfit when the camera was down, describing a table instead
| of his expression), but that might have just been latency
| awkwardness. Overall, the fast response is extremely
| impressive, as is the new emotional dimension of the voice.
| sebastiennight wrote:
| Aha, I think I saw the trick for the live demo: every time
| they used the "video feed", they did prompt the model
| specifically by saying:
|
| - "What are you seeing now"
|
| - "I'm showing this to you now"
|
| etc.
|
| The one time where he didn't prime the model to take a
| snapshot this way, was the time where the model saw the
| "table" (an old snapshot, since the phone was on the
| table/pointed at the table), so that might be the reason.
| tedsanders wrote:
| Yeah, the way the app currently works is that ChatGPT-4o
| only sees up to the moment of your last comment.
|
| For example, I tried asking ChatGPT-4o to commentate a
| soccer game, but I got pretty bad hallucinations, as the
| model couldn't see any new video come in after my
| instruction.
|
| So when using ChatGPT-4o you'll have to point the camera
| first and then ask your question - it won't work to first
| ask the question and then point the camera.
|
| (I was able to play with the model early because I work at
| OpenAI.)
| 152334H wrote:
| thanks
| ayhanfuat wrote:
| Commenting on the outfit was very weird indeed. Greg
| Brockman's demo includes some outfit related questions
| (https://twitter.com/gdb/status/1790071008499544518). It does
| seem very impressive though, even if they polished it on some
| specific tasks. I am looking forward to showing my desktop
| and asking questions.
| tailspin2019 wrote:
| Regarding the limits, I recently found that I was hitting
| limits very quickly on GPT-4 on my ChatGPT Plus plan.
|
| I'm pretty sure that wasn't always the case - it feels like
| somewhere along the line the allowed usage was reduced, unless
| I'm imagining it. It wouldn't be such a big deal if there was
| more visibility into my current usage compared to my total
| "allowance".
|
| I ended up upgrading to ChatGPT Team which has a minimum of 2x
| users (I now use both accounts) but I resented having to do
| this - especially being forced to pay for two users just to
| meet their arbitrary minimum.
|
| I feel like I should not be hitting limits on the ChatGPT Plus
| paid plan at all based on my usage patterns.
|
| I haven't hit any limits on the Team plan yet.
|
| I hope they continue to improve the paid plans and become a bit
| more transparent about usage limits/caps. I really do not mind
| paying for this (incredible) tech, but the way it's being sold
| currently is not quite right and feels like paid users get a
| bit of a raw deal in some cases.
|
| I have API access but just haven't found an open source client
| that I like using as much as the native ChatGPT apps yet.
| emporas wrote:
| I use GPT via the API in Emacs; it's wonderful. Gptel is the
| program.
|
| Although API access through Groq to Llama 3 (8b and 70b) is
| so much faster that I cannot stand how slow GPT is anymore.
| It is slooow - still a very capable model, but only marginally
| better than the open source alternatives.
| Boss0565 wrote:
| you should try -4o. It's incredibly fast
| emporas wrote:
| Yes, of course, probably sometime in the following days.
| Some people mention it already works in the playground.
|
| I was wondering why OpenAI didn't release a smaller but
| faster model. 175 billion parameters works well, but speed
| sometimes is crucial. Like, a 20b-parameter model could
| compute 10x faster.
| Boss0565 wrote:
| true. at least rn though, it types around the same speed
| of 3.5 turbo
| coder543 wrote:
| Have you tried groq.com? Because I don't think gpt-4o is
| "incredibly" fast. I've been frustrated at how slow
| gpt-4-turbo has been lately, and gpt-4o just seems to be
| "acceptably" fast now, which is a big improvement, but
| still, not groq-level.
| Jensson wrote:
| > Parts of the demo were quite choppy (latency?) so this
| definitely feels rushed in response to Google I/O.
|
| It just stops the audio feed when it detects sound instead of
| an AI detecting when it should speak, so that part is horrible,
| yeah. A full AI conversation would detect the natural pauses
| where you give it room to speak, or when you try to take the
| word from it by interrupting; there it was just some dumb
| script that shuts it off when it hears sound.
|
| But it is still very impressive for all the other parts; that
| voice is really good.
|
| Edit: If anyone from OpenAI reads this, at least fade out the
| voice quickly instead of chopping it; hard-chopping the audio
| doesn't sound good at all, so many people experienced this
| presentation as extremely buggy because of it.
| dharma1 wrote:
| what's the download link for the desktop app? can't find it
| mpeg wrote:
| seems like it might not be available for everyone? - my
| chatgpt plus doesn't show anything new, and also can't find
| the desktop app
| russdill wrote:
| They need to fade the audio or add some vocal cue when it's
| being interrupted. It makes it sound like it's losing
| connection. What'll be really impressive is when it
| intentionally starts interrupting you.
| aantix wrote:
| Agree. While watching the demo video, I thought I was the one
| having connectivity issues.
| syntaxing wrote:
| I admit I drank the koolaid and love LLMs and their applications.
| But damn, the way it responds in the demo gave me goosebumps in
| a bad way. Like an uncanny valley instinct kicks in.
| _Parfait_ wrote:
| You're watching the species be reduced to an LLM.
| warkdarrior wrote:
| Were humans an interesting species to start with, if they can
| be reduced to an LLM?
| throw310822 wrote:
| Yeah, maybe not, and what do you make of it? Now that the
| secret sauce has been revealed and it's nothing but the
| right proportions of the same old ingredients?
| Intralexical wrote:
| The reduction is not a lossless process.
| dougb5 wrote:
| Hey that LLM is trained on everything we've ever produced, so
| I wouldn't say we've been "reduced", more like copied. I'll
| save my self-loathing for when a very low-parameter model can
| do this.
| jimkleiber wrote:
| I just don't know if everything we've ever (in the digital
| age) produced and how it is being weighted by current
| cultural values will help us or hurt us more. I don't fully
| know how LLMs work with the weighting, I just imagine that
| there are controls and priorities put on certain values
| more than others and I just wonder how future generations
| will look back at our current priorities.
| TheSockStealer wrote:
| I also thought the screwups, although minor, were interesting.
| Like when it thought his face was a desk because it did not
| update the image it was "viewing". It is still not perfect,
| which made the whole thing more believable.
| mike00632 wrote:
| I was shocked at how quickly and naturally they were able to
| correct the situation.
| bbconn wrote:
| Yeah it made me realize that I actually don't want a human-like
| conversational bot (I have actual humans for that). Just teach
| me javascript like a robot.
| bamboozled wrote:
| Maybe it's the geek in me, but I don't want a talking
| computer.
|
| I have enough talking people to deal with already .
| SoftTalker wrote:
| I've worked in software and tech my whole life and there
| are few things I dislike more than talking to a computer.
|
| I don't use siri. I don't use speech-to-text. I don't use
| voice-response menus if I can push a button. I don't have a
| microphone on my computer.
|
| I don't know why this is. Most of the people I know think
| it's fun, or a novelty, or even useful. I just viscerally
| dislike it.
| wslack wrote:
| It should do that, because it's still not actually an
| intelligence. It's a tool that is figuring out what to say in
| response that sounds intelligent - and will often succeed!
| moffkalast wrote:
| That kind of _is_ an intelligence though. Chinese room meets
| solipsism and all that.
|
| It is interesting how insanely close their demo is to the
| OSes in the movie "Her", it's basically a complete real life
| reproduction.
| yCombLinks wrote:
| Welcome to half the people at your company's job.
| Intralexical wrote:
| And do you want _more_ of that?
| HeatrayEnjoyer wrote:
| It's more intelligent than many humans and most/all lesser
| animals. If it's not intelligent then I don't know what is.
| isurujn wrote:
| The chuckling made me uneasy for some reason lol. Calm down,
| you're not like us. Don't pretend!
| moffkalast wrote:
| Can't wait for Meta's version 2 years down the line that
| someone will eventually fine tune to Agent Smith's
| personality and voice.
|
| "Evolution, human. Evolution. Like the dinosaur. Look out
| that window. You've had your time. The future is our world.
| The future is our time."
| unsupp0rted wrote:
| Yes, the chuckling was uncanny, but for me even more uncanny
| was how the female model went up at the end to soften what she
| was saying? into a question? even though it wasn't a question?
|
| Eerily human female-like.
| drivers99 wrote:
| So I'm not the only one. Like I felt fear in a physical way.
| (Panic/adrenaline?) I'm sure I'd get used to it but it was an
| interesting reaction. (I saw someone react that way to a
| talking Tandy 1000 once so, who knows.)
| hubraumhugo wrote:
| The movie Her has just become reality
| speedgoose wrote:
| It's getting closer. A few years ago the old Replika AI was
| already quite good as a romantic partner, especially when you
| started your messages with a * character to force OpenAI GPT-3
| answers. You could do sexting that OpenAI will never let you
| have nowadays with ChatGPT.
| aftbit wrote:
| Why does OpenAI think that sexting is a bad thing? Why is AI
| safety all about not saying things that are disturbing or
| offensive, rather than not saying things that are false or
| unaligned?
| volleygman180 wrote:
| I was surprised that the voice is a ripoff of the AI voice in
| that movie (Scarlett Johansson) too
| toxic72 wrote:
| I am suspicious that they licensed Scarlett's voice for that
| voice model (Sky IIRC)
| reducesuffering wrote:
| People realize where we're headed right? Entire human lives in
| front of a screen. Your online entertainment, your online job,
| your online friends, your online "relationship". Wake up, 12
| hours screentime, eat food, go to bed. Depression and drug
| overdoses currently at sky high levels. Shocker.
| emporas wrote:
| If I can program with just my voice, there is no reason not to
| be in nature 10 hours a day minimum. My grandparent even
| slept outside as long as it was daytime.
|
| Daytime is always a time to be outside, surrounded by many
| plants and stuff. It is a shame we have to be productive in
| some way, and most of production happens inside walls.
| lm28469 wrote:
| You're already twice as productive as your parents, who were
| twice as productive as their parents.
|
| We should ask where the money went instead of thinking
| about telepathically coding from the woods
| emporas wrote:
| When it comes to the economy, some monkey business is
| going on, but I think you can be more optimistic about
| the capabilities technology like that unlocks for
| everyone on the planet.
|
| Being able to control machines just with our voice, we
| can instruct robots to bake food for us. Or lay bricks in
| a straight line and make a house. Or write code,
| genetically modify organisms and make nutritionally dense
| food to become 1000x smarter or stronger.
|
| There have to be some upsides, even though for the moment
| the situation with governments, banks, big corporations,
| military companies etc is not as bright as one would hope
| to be.
| tr3ntg wrote:
| Headed? We're there. Have been there. This just adds non-
| human sentient agents to the drama.
| hmmmhmmmhmmm wrote:
| With the news that Apple and OpenAI are closing / just closed a
| deal for iOS 18, it's easy to speculate we might be hearing about
| that exciting new model at WWDC...
| thefourthchime wrote:
| Yes, I'm pretty sure this is the new Siri. Absolutely amazing;
| it's pretty much "Her" from the movie.
| chatcode wrote:
| Parsing emotions in vocal inflections (and reliably producing
| them in vocal output) seems quite under-hyped in this release.
|
| That seems to represent an entirely new depth of understanding of
| human reality.
| deadbabe wrote:
| Any appearance of understanding is just an illusion. It's an
| LLM, nothing more.
| chatcode wrote:
| Sure, but that seems like it'll be a distinction without a
| difference for many use cases.
|
| Having a reliable emotional model of a person based on their
| voice (or voice + appearance) can be useful in a thousand
| ways.
|
| Which seems to represent a new frontier.
| deadbabe wrote:
| It's sad that I get downvoted so easily just for saying the
| truth. People's beliefs about AI here seem to approach
| superstition rather than anything based in computer
| science.
|
| These LLMs are nothing more than really big spreadsheets.
| hombre_fatal wrote:
| Or most of us know the difference between reductiveness
| and insightfulness.
|
| "Um it's just a big spreadsheet" just isn't good
| commentary and reminds me of people who think being
| unimpressed reveals some sort of chops about them, as if
| we might think of them as the Simon Cowell of tech
| because they bravely reduced a computer to an abacus.
| deadbabe wrote:
| Hyping things up with magical thinking isn't great
| either.
| chpatrick wrote:
| Isn't that what you're doing with the magic human
| understanding vs the fake machine understanding?
| chpatrick wrote:
| Any appearance of understanding is just an illusion. It's
| just a pile of meat, nothing more.
| mike00632 wrote:
| Does anyone else, when writing comments, feel that you need
| to add a special touch to somehow make it clear that a
| human wrote it?
| rvz wrote:
| Given that they are moving all these features to free users, it
| tells us that GPT-5 is around the corner and is significantly
| better than their previous models.
| margorczynski wrote:
| Or maybe it is a desperation move after Llama 3 got released
| and the free mode will have such tight constraints that it will
| be unusable for anything a bit more serious.
| PoignardAzur wrote:
| Holy crap, the level of corporate cringe of that "two AIs talk to
| each other" scene is mind-boggling.
|
| It feels like a pretty strong illustration of the awkwardness of
| getting value from recent AI developments. Like, this is
| technically super impressive, but also I'm not sure it gives us
| anything we couldn't have one year ago with GPT-4 and ElevenLabs.
| sourcecodeplz wrote:
| It is quite nice how they keep giving premium features for free,
| after a while. I know openai is not open and all but damn, they
| do give some cool freebies.
| BoumTAC wrote:
| Did they provide the rate limit for free users?
|
| Because I have the Plus membership, which is expensive
| ($25/month).
|
| But if the limit is high enough (or my usage low enough), there
| is no point in paying that much money for me.
| christianqchung wrote:
| Does anyone know how they're doing the audio part where Mark
| breathes too hard? Does his breathing get turned into all-caps
| text (AA EE OO) that GPT-4o interprets as him breathing
| too hard, or is there something more going on?
| GalaxyNova wrote:
| It can natively interpret voice now.
| Jordan-117 wrote:
| That's how it used to do it, but my understanding is that this
| new model processes audio directly. If it were a music
| generator, the original would have generated sheet music to
| send to a synthesizer (text to speech), while now it can create
| the raw waveform from scratch.
| modeless wrote:
| There is no text. The model ingests audio directly
| and also outputs audio directly.
| dclowd9901 wrote:
| Is it a stretch to think this thing could accurately "talk"
| with animals?
| jamilton wrote:
| Yes? Why would it be able to do that?
| ninininino wrote:
| I think they are assuming a world where you took this
| existing model but it was trained on a dataset of animals
| making noises to each other, so that you could then feed
| the trained model the vocalization of one animal and the
| model would be able to produce a continuation of audio
| that has a better-than-zero chance of being a realistic
| sound coming from another animal - so in other words, if
| dogs have some type of bark that encodes a "I found
| something yummy" message and other dogs tend to have some
| bark that encodes "I'm on my way" and we're just
| oblivious to all of that sub-text, then maybe the model
| would be able to communicate back and forth with an
| animal in a way that makes "sense" to the animal.
|
| Probably substitute dogs for chimps though.
|
| But obviously that doesn't at all solve for human
| understandability, unless maybe you have it all as
| audio+video and then ask the model to explain what visual
| often accompanies a specific type of audio? Maybe the
| model can learn what sounds accompany violence or
| accompany the discovery of a source of water or
| something?
| dclowd9901 wrote:
| Yep, exactly what brought that to mind. Multimodal seems
| like the kind of thing needed for such a far-fetched
| idea.
| benlivengood wrote:
| Not really a stretch in my mind.
| https://www.earthspecies.org/ and others are working on it
| already.
| crindy wrote:
| Very impressed by the demo where it starts speaking French in
| error, then laughs with the user about the mistake. Such a
| natural recovery.
| spacebanana7 wrote:
| > We recognize that GPT-4o's audio modalities present a variety
| of novel risks
|
| > For example, at launch, audio outputs will be limited to a
| selection of preset voices and will abide by our existing safety
| policies.
|
| I wonder if they'll ever allow truly custom voices from audio
| samples.
| dkasper wrote:
| I think the issue there is less of a technical one and more of
| an issue with deepfakes and copyright
| spacebanana7 wrote:
| It might be possible to prove that I control my voice, or
| that of a given audio sample. For example by saying specific
| words on demand.
|
| But yeah I see how they'd be blamed if anything went wrong,
| which it almost certainly would in some cases.
| tomComb wrote:
| The price of 4o is 50% of GPT-4 Turbo (and no mention of a price
| change to gpt-4-turbo itself).
|
| Given the competitive pressures I was expecting a much bigger
| price drop than that.
|
| For non-multimodal uses, I don't think their API is at all
| competitive any more.
| mrklol wrote:
| Where do you get something cheaper with a similar experience?
| lagt_t wrote:
| Universal real time translation is incredibly dope.
|
| I hate video players without volume control.
| pachico wrote:
| jeez, that model really speaks a lot! I hope there's a way to
| make it more straight to the point rather than radio-like.
| causal wrote:
| Clicking the "Try it on ChatGPT" link just takes me to GPT-4 chat
| window. Tried again in an incognito tab (supposing my account is
| the issue) and it just takes me to 3.5 chat. Anyone able to use
| it?
| 101008 wrote:
| Same here and also I can't hear audio in any of the videos on
| this page. Weird.
| TrueDuality wrote:
| Weird, visiting the page crashed my graphics driver in Firefox.
| msoad wrote:
| They are admitting[1] that the new model is the gpt2-chatbot that
| we have seen before[2]. As many highlighted there, the model is
| not an improvement like GPT3->GPT4. I tested a bunch of
| programming stuff and it was not that much better.
|
| It's interesting that OpenAI is highlighting the Elo score
| instead of showing results for the many benchmarks where all
| models are stuck at 50-70% success.
|
| [1] https://twitter.com/LiamFedus/status/1790064963966370209
|
| [2] https://news.ycombinator.com/item?id=40199715
| modeless wrote:
| "not that much better" is extremely impressive, because it's a
| much smaller and much faster model. Don't worry, GPT-5 is
| coming and it _will_ be better.
| TIPSIO wrote:
| Obviously given enough time there will always be better
| models coming.
|
| But I am not convinced it will be another GPT-4 moment. Seems
| like a big focus on tacking together clever multi-modal tricks
| vs. straight-up more intelligent AI.
|
| Hope they prove me wrong!
| kmeisthax wrote:
| The problem with "better intelligence" is that OpenAI is
| running out of human training data to pillage. Training AI
| on the output of AI smooths over the data distribution, so
| all the AIs wind up producing same-y output. So OpenAI
| stopped scraping text back in 2021 or so - because that's
| when the open web turned into an ocean of AI piss. I've
| heard rumors that they've started harvesting closed
| captions out of YouTube videos to try and make up the
| shortfall of data, but that seems like a way to stave off
| the inevitable[0].
|
| Multimodal is another way to stave off the inevitable,
| because these AI companies already are training multiple
| models on different piles of information. If you have to
| train a text model and an image model, why split your
| training data in half when you could train a combined model
| on a combined dataset?
|
| [0] For starters, most YouTube videos aren't manually
| captioned, so you're feeding GPT the output of Google's
| autocaptioning model, so it's going to start learning
| artifacts of what that model can't process.
| pbhjpbhj wrote:
| >harvesting closed captions out of YouTube videos
|
| I'd bet a lot of YouTubers are using LLMs to write and/or
| edit content. So we pass that through a human
| presentation. Then introduce some errors in the form of
| transcription. Then feed the output back in as part of a
| training corpus ... we plateaued real quick.
|
| It seems like it's hard to get past a level of human
| intelligence at which there's a large enough corpus of
| training data or trainers?
|
| Anyone know of any papers on breaking this limit to push
| machine learning models to super-human intelligence
| levels?
| pixl97 wrote:
| If a model is of average human intelligence in pretty much
| everything, is that super-human or not? Simply put, we as
| individuals aren't average at everything; we have things
| we're good at and a great many things we're not. We only
| average out when you look at broad population trends. That's
| why most of us in the modern age spend a lot of time
| specializing in whatever we work in. Which brings us to the
| likely next source of data: a Manna-style (the story) data
| collection program where companies hoover up everything
| they can on their above-average employees, until most
| models are well above the human average in
| most categories.
| WhitneyLand wrote:
| Why do you think they're using Google auto-captioning?
|
| I would expect they're using their own speech-to-text model
| (i.e. Whisper), which is still a model but way better
| quality and potentially customizable to better suit their
| needs.
| llm_trw wrote:
| >[0] For starters, most YouTube videos aren't manually
| captioned, so you're feeding GPT the output of Google's
| autocaptioning model, so it's going to start learning
| artifacts of what that model can't process.
|
| Whisper models are better than anything Google has. In
| fact, the higher-quality Whisper models are better than
| humans when it comes to transcribing speech with
| punctuation.
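|
| For reference, a minimal sketch of that transcription route,
| assuming the open-source openai-whisper package (and ffmpeg)
| is installed; the file name is just an illustration:
|
|     # Sketch: punctuated transcript of a downloaded clip with
|     # open-source Whisper (pip install openai-whisper).
|     import whisper
|
|     model = whisper.load_model("large")  # or "base", "small", ...
|     result = model.transcribe("some_clip.mp3")  # hypothetical file
|
|     print(result["text"])           # full punctuated transcript
|     for seg in result["segments"]:  # per-segment timestamps
|         print(round(seg["start"], 1), seg["text"])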
| marvin wrote:
| At some point, algorithms for reasoning and long-term
| planning will be figured out. Data won't be the holy
| grail forever, and neither will asymptotically
| approaching human performance in all domains.
| mupuff1234 wrote:
| And how can one be so sure of that?
|
| Seems to me that performance is converging and we might not
| see a significant jump until we have another breakthrough.
| scarmig wrote:
| Yeah. There are lots of things we can do with existing
| capabilities, but in terms of progressing beyond them all
| of the frontier models seem like they're a hair's breadth
| from each other. That is not what one would predict if LLMs
| had a much higher ceiling than we are currently at.
|
| I'll reserve judgment until we see GPT5, but if it becomes
| just a matter of who best can monetize existing
| capabilities, OAI isn't the best positioned.
| diego_sandoval wrote:
| > Seems to me that performance is converging
|
| It doesn't seem that way to me. But even if it did, video
| generation also seemed kind of stagnant before Sora.
|
| In general, I think The Bitter Lesson is the biggest factor
| at play here, and compute power is not stagnating.
| drawnwren wrote:
| Computer power is not stagnating, but the availability of
| training data is. It's not like there's a second
| stackoverflow or reddit to scrape.
| robwwilliams wrote:
| No: soon the wide wild world itself becomes training
| data. And for much more than just an LLM. LLM plus
| reinforcement learning--this is where the capacity of our
| in silico children will engender much parental anxiety.
| diego_sandoval wrote:
| Agree.
|
| However, I think the most cost-effective way to train for
| real world is to train in a simulated physical world
| first. I would assume that Boston Dynamics does exactly
| that, and I would expect integrated vision-action-
| language models to first be trained that way too.
| pixl97 wrote:
| That's how everyone in robotics is doing it these days.
|
| You take a bunch of mo-cap data and simulate it with your
| robot body. Then as much testing as you can with the
| robot and feed the behavior back in to the model for fine
| tuning.
|
| Unitree gives an example of the simulation versus what
| the robot can do in their latest video
|
| https://www.youtube.com/watch?v=GzX1qOIO1bE
| Animats wrote:
| This may create a market for surveillance camera data and
| phone calls.
|
| "This conversation may be recorded and used for training
| purposes" now takes on a new meaning.
|
| Can car makers sell info from everything that happens in
| their cars?
| abenga wrote:
| Well, this is a massively horrifying possibility.
| bigyikes wrote:
| It isn't clear that we are running out of training data,
| and it is becoming increasingly clear that AI-generated
| training data actually works.
|
| For the skeptical, consider that humans can be trained on
| material created by less intelligent humans.
| rglullis wrote:
| > humans can be trained on material created by less
| intelligent humans.
|
| For the skeptics, "AI models" are not intelligent at all
| so this analogy makes no sense.
|
| You can teach lots of impressive tricks to dogs, but
| there is no amount of training that will teach them basic
| algebra.
| diego_sandoval wrote:
| I don't think training data is the limiting factor for
| current models.
| emporas wrote:
| It is a limiting factor, due to diminishing returns. A
| model trained on double the data will be 10% better, if
| that!
|
| When it comes to multi-modality, then training data is
| not limited, because of many different combinations of
| language, images, video, sound etc. Microsoft did some
| research on that, teaching spatial recognition to an LLM
| using synthetic images, with good results. [1]
|
| When someone states that there are not enough training
| data, they usually mean code, mathematics, physics,
| logical reasoning etc. On the open internet right now,
| there is not enough code to make a model 10x better,
| 100x better and so on.
|
| Synthetic data will be produced of course, scarcity of
| data is the least worrying scarcity of all.
|
| Edit: citation added,
|
| [1] VoT by MS
| https://medium.com/@multiplatform.ai/microsoft-
| researchers-p...
| MVissers wrote:
| Soon these models will be cheap enough to learn in the real
| world. Reduced costs allow for usage at massive scale.
|
| Releasing models to users who can record video brings in
| more data. Users conversing with the AI is also
| additional data.
|
| Another example is models that write code, then debug the
| code and learn from that.
|
| This will be everywhere, and these models will learn from
| anything we do/publish online/discuss. Scary.
|
| Pretty soon- OpenAI will have access to
| wavemode wrote:
| > video generation also seemed kind of stagnant before
| Sora
|
| I take the opposite view. I don't think video generation
| was stagnating at all, and was in fact probably the area
| of generative AI that was seeing the biggest active
| strides. I'm highly optimistic about the future
| trajectory of image and video models.
|
| By contrast, text generation has not improved
| significantly, in my opinion, for more than a year now,
| and even the improvement we saw back then was relatively
| marginal compared to GPT-3.5 (that is, for most day-to-
| day use cases we didn't really go from "this model can't
| do this task" to "this model can now do this task". It
| was more just "this model does these pre-existing tasks,
| in somewhat more detail".)
|
| If OpenAI really is secretly cooking up some huge
| reasoning improvements for their text models, I'll eat my
| hat. But for now I'm skeptical.
| Eisenstein wrote:
| > By contrast, text generation has not improved
| significantly, in my opinion, for more than a year now
|
| With less than $800 worth of hardware including
| everything but the monitor, you can run an open weight
| model more powerful than GPT 3.5 locally, at around 6 -
| 7T/s[0]. I would say that is a huge improvement.
|
| [0] https://www.reddit.com/r/LocalLLaMA/comments/1cmmob0/
| p40_bui...
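|
| As a rough sketch of what that kind of local setup looks
| like in code (assuming the llama-cpp-python bindings and a
| quantized GGUF file; the file name and settings here are
| made up, not taken from the linked build):
|
|     # Sketch only: local inference via llama-cpp-python.
|     from llama_cpp import Llama
|
|     llm = Llama(
|         model_path="llama-3-70b-instruct.Q4_K_M.gguf",  # hypothetical
|         n_gpu_layers=-1,  # offload as many layers as fit on the GPUs
|         n_ctx=4096,
|     )
|     out = llm("Summarize the attention mechanism.", max_tokens=200)
|     print(out["choices"][0]["text"])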
| aantix wrote:
| The use of AI in the research of AI accelerates everything.
| thefaux wrote:
| I'm not sure of this. The jury is still out on most ai
| tools. Even if it is true, it may be in a kind of strange
| reverse way: people innovating by asking what ai can't do
| and directing their attention there.
| bigyikes wrote:
| There is an increasing amount of evidence that using AI
| to train other AI is a viable path forward. E.g. using
| LLMs to generate training data or tune RL policies
| talldayo wrote:
| Chalmers: "GPT-5? A vastly-improved model that somehow
| reduces the compute overhead while providing better answers
| with the same hardware architecture? At this time of year? In
| this kind of market?"
|
| Skinner: "Yes."
|
| Chalmers: "May I see it?"
|
| Skinner: "No."
| pwdisswordfishc wrote:
| Incidentally, this dialogue works equally well, if not
| better, with David Chalmers versus B.F. Skinner, as with
| the Simpsons characters.
| AaronFriel wrote:
| It has only been a little over one year since GPT-4 was
| announced, and it was at the time the largest and most
| expensive model ever trained. It might still be.
|
| Perhaps it's worth taking a beat and looking at the
| incredible progress in that year, and acknowledge that
| whatever's next is probably "still cooking".
|
| Even Meta is still baking their 400B parameter model.
| bamboozled wrote:
| Legit love progress
| 1024core wrote:
| As Altman said (paraphrasing): GPT-4 is the _worst_ model
| you will ever have to deal with in your life (or
| something to that effect).
| andrepd wrote:
| I will believe it when I see it. People like to point at
| the first part of a logistic curve and go "behold! an
| exponential".
| nwienert wrote:
| Ah yes my favorite was the early covid numbers, some of
| the "smartest" people in the SF techie scene were daily
| on Facebook thought-leadering about how 40% of people
| were about to die in the likely case.
| tiptup300 wrote:
| and boy did the stockholders like that one.
| dyauspitr wrote:
| What stockholders? They're investors at this point. I
| wish I could get in on it.
| talldayo wrote:
| They're rollercoaster riders, being told lustrous
| stories by gold-panners while the shovel salesman counts
| his money and leaves.
| dvfjsdhgfv wrote:
| Why should I believe anything he says?
| markk wrote:
| I found this statement by Sam quite amusing. It transmits
| exactly zero information (it's a given that models will
| improve over time), yet it sounds profound and ambitious.
| og_kalu wrote:
| GPT-3 was released in 2020 and GPT-4 in 2023. Now we all
| expect 5 sooner than that but you're acting like we've been
| waiting years lol.
| skepticATX wrote:
| The increased expectations are a direct result of LLM
| proponents continually hyping exponential capabilities
| increase.
| og_kalu wrote:
| The time for the research, training, testing and
| deploying of a new model at frontier scales doesn't
| change depending on how hyped the technology is. I just
| think the comment i was replying to lacks perspective.
| Sharlin wrote:
| People who buy into hype deserve to be disappointed. Or
| burned, as the case may be.
| throwthrowuknow wrote:
| So if not exponential, what would you call adding voice
| and image recognition, function calling, greatly
| increased token generation speed, reduced cost, massive
| context window increases and then shortly after combining
| all of that in a truly multi modal model that is even
| faster and cheaper while adding emotional range and
| singing in... _checks notes_ ...14 months?! Not to
| mention creating and improving an API, mobile apps, a
| marketplace and now a desktop app. OpenAI ships and they
| are doing so in a way that makes a lot of business sense
| (continue to deliver while reducing cost). Even if they
| didn't have another flagship model in their back pocket
| I'd be happy with this rate of improvement but they are
| obviously about to launch another one given the teasers
| Mira keeps dropping.
| skepticATX wrote:
| All of that is awesome, and makes for a better product.
| But it's also primarily an engineering effort. What
| matters here is an increase in intelligence. And we're
| not seeing that aside from very minor capability
| increases.
|
| We'll see if they have another flagship model ready to
| launch. I seriously doubt it. I suspect that this was
| supposed to be called GPT-5, or at the very least
| GPT-4.5, but they can't meet expectations so they can't
| use those names.
| dwaltrip wrote:
| Pay attention to the signal, ignore the noise.
| dlivingston wrote:
| "Seymour, the house is on fire!"
|
| "No, mother, that's just the H100s."
| dialup_sounds wrote:
| Agnes (voice): "SEYMOUR, THE HOUSE IS ON FIRE!"
|
| Skinner (looking up): "No, mother, it's just the Nvidia
| GPUs."
| moomoo11 wrote:
| I really hope GPT5 is good. GPT4 sucks at programming.
| verdverm wrote:
| Look to a specialized model instead of a general purpose
| one
| moomoo11 wrote:
| Any suggestions? Thanks
|
| I have tried Phind and anything beyond mega junior tier
| questions it suffers as well and gives bad answers.
| twsted wrote:
| It's better than at least 50% of the developers I know.
| Jensson wrote:
| A developer that just pastes in code from gpt-4 without
| checking what it wrote is a horror scenario, I don't
| think half of the developers you know are really that
| bad.
| cududa wrote:
| It's excellent at programming if you actually know the
| problem you're trying to solve and the technology. You need
| to guide it with actual knowledge you have. Also, you have
| to adapt your communication style to get good results. Once
| you 'crack the pattern' you'll have a massive productivity
| boost
| partiallypro wrote:
| In my experience 3.5 was better at programming than 4,
| and I don't know why.
| littlestymaar wrote:
| I don't think a bigger model would make sense for OpenAI:
| it's much more important for them that they keep driving
| inference cost down, because there's no viable business model
| if they don't.
|
| Improving the instruction tuning, the RLHF step, increase the
| training size, work on multilingual capabilities, etc. make
| sense as a way to improve quality, but I think increasing
| model size doesn't. Being able to advertise a big
| breakthrough may make sense in terms of marketing, but I
| don't believe it's going to happen for two reasons:
|
| - you don't release intermediate steps when you want to be
| able to advertise big gains, because it raises the baseline
| and reduces the effectiveness of your "big gains" in terms of
| marketing.
|
| - I don't think they would benefit from an arms race with
| Meta, trying to keep a significant edge. Meta is likely to be
| able to catch-up eventually on performance, but they are not
| so much of a threat in terms of business. Focusing on keeping
| a performance edge instead of making their business viable
| would be a strategic blunder.
| jononor wrote:
| What is OpenAI business model if their models are second-
| best? Why would people pay them and not
| Meta/Google/Microsoft - who can afford to sell at very low
| margins, since they have existing very profitable
| businesses that keep them afloat.
| littlestymaar wrote:
| That's _the_ question OpenAI needs to find an answer to
| if they want to end up viable.
|
| They have the brand recognition (for ChatGPT) and that's
| a good start, but that's not enough. Providing a best in
| class user experience (which seems to be their focus now,
| with multimodality), a way to lock down their customers
| in some kind of walled garden, building some kind of
| network effect (what they tried with their marketplace
| for community-built "GPTs" last fall but I'm not sure
| it's working), something else?
|
| At the end of the day they have no technological moat, so
| they'll need to build a business one, or perish.
|
| For most tasks, pretty much every model from their
| competitors is more than good enough already, and it's
| only going to get worse as everyone improves. Being
| marginally better on 2% of tasks isn't going to be
| enough.
| Eisenstein wrote:
| I know it is super crazy, but maybe they could become a
| non-profit and dedicate themselves to producing open
| source AI in an effort to democratize it and make it safe
| (as in, not walled behind a giant for-profit corp that
| will inevitably enshittify it).
|
| I don't know why they didn't think about doing that
| earlier, could have been a game changer, but there is
| still an opportunity to pivot.
| cube2222 wrote:
| I think the live demo that happened on the livestream is best
| to get a feel for this model[0].
|
| I don't really care whether it's stronger than gpt-4-turbo or
| not. The direct real-time video and audio capabilities _are
| absolutely magical and stunning_. The responses in voice mode
| are now instantaneous, you can interrupt the model, you can
| talk to it while showing it a video, and it understands (and
| uses) intonation and emotion.
|
| Really, just watch the live demo. I linked directly to where it
| starts.
|
| Importantly, this makes the interaction a lot more "human-
| like".
|
| [0]: https://youtu.be/DQacCB9tDaw?t=557
| gabiruh wrote:
| It's weird that the "airplane mode" seems to be ON on the
| phone during the entire presentation.
| arthurcolle wrote:
| This was on purpose - it appears they connected it to the
| internet via a USB-C cable, for a consistent connection
| instead of having it switch WiFi
|
| Probably some kinks there they are working out
| _flux wrote:
| And eliminate the chance of some prankster affecting the
| demo by attacking the wifi.
| OJFord wrote:
| > Probably some kinks there they are working out
|
| Or just a good idea for a live demo on a congested
| network/environment with a lot of media present, at least
| one live video stream (the one we're watching the
| recording of), etc.
|
| At least that's how I understood it, not that they had a
| problem with it (consistently or under regular
| conditions, or specific to their app).
| hbn wrote:
| That's very common practice for live demos. To avoid
| situations like this:
|
| https://www.youtube.com/watch?v=6lqfRx61BUg
| simoes wrote:
| They mention at the beginning of the video that they are
| using hardwired internet for reliability reasons.
| sitkack wrote:
| You would want to make sure that it is always going over
| WiFi for the demo and doesn't start using the cellular
| network for a random reason.
| rightbyte wrote:
| You can turn off mobile data. They probably just wanted
| wired internet.
| fvdessen wrote:
| The demo is impressive but personally, as a commercial user,
| for my practical use cases, the only thing I care about is
| how smart it is, how accurate its answers are and how vast
| its knowledge is. These haven't changed much since GPT-4, yet
| they should, as IMHO it is still borderline in its ability
| to be really that useful.
| CapcomGo wrote:
| But that's not the point of this update
| fvdessen wrote:
| I know, and I know my comment is dismissive of the
| incredible work shown here, as we're shown sci-fi level
| tech. But I feel like I have this kettle that boils water in
| 10 minutes, and it really should boil it in 1, but instead it
| is now voice operated.
|
| I hope the next version delivers on being smarter, as
| this update instead of making me excited, makes me feel
| they've reached a plateau on the improvement of the core
| value and are distracting us with fluff instead
| hombre_fatal wrote:
| Sure, but "not enough, I want moar" is a trivial demand.
| So trivial that it goes unsaid.
| bennyhill wrote:
| It's equivalent to "nothing to see here" which is exactly
| the TLDR I was looking for.
| shepherdjerred wrote:
| Everything is amazing & Nobody is happy:
| https://www.youtube.com/watch?v=PdFB7q89_3U
| 0xB31B1B wrote:
| GPT-4 isn't quite "amazing" in terms of commercial use.
| GPT-4 is often good, and also often mediocre or bad. It's
| not going to change the world; it needs to get better.
| Spivak wrote:
| It's an impressive demo, it's not (yet) an impressive
| product.
|
| It seems like the people who are ohhing and ahhing at the
| former and the people who are frustrated that this kind
| of thing is unbelievably impractical to productize will be
| doomed to talk past one another forever. The text
| generation models, image generation models, speech-to-
| text and text-to-speech have reached impressive product
| stages. Multi-modal hasn't gotten there because no one is
| really sure what to actually _do_ with the thing outside
| of make cool demos.
| 0xB31B1B wrote:
| Multi-modal isn't there because "this is an image of a
| green plant" is viable in a demo, but it's not
| commercially viable. "This is an image of a monstera
| deliciosa" is commercially viable, but not yet demoable.
| The models need to improve to be usable.
| dvaun wrote:
| Near real-time voice feedback isn't amazing? Has the bar
| risen this high?
|
| I already know an application for this, and AFAIK it's
| being explored in the SaaS space: guided learning
| experiences and tutoring for individuals.
|
| My kids, for instance, love to hammer Alexa with random
| questions. They would spend a huge amount of time using a
| better interface, esp. with quick feedback, that provided
| even deeper insight and responses to them.
|
| Taking this and tuning it to specific audiences would
| make it a great tool for learning.
| 0xB31B1B wrote:
| "My kids, for instance, love to hammer Alexa with random
| questions. They would spend a huge amount of time using a
| better interface, esp. with quick feedback, that provided
| even deeper insight and responses to them."
|
| Great, using GPT-4 the kids will be getting a lot of
| hallucinated facts returned to them. There are good use
| cases for transformers, but they're not at the
| "impact company earnings or country GDP" stage currently,
| which is the promise that the whole industry has
| raised/spent 100+B dollars on. Facebook alone is spending
| 40B on AI. I believe in the AI future, but the only thing
| that matters for now is that the models improve.
| practice9 wrote:
| I always double-check even the most obscure facts
| returned by GPT-4 and have yet to see a hallucination (as
| opposed to Claude Opus, which sometimes made up historical
| facts). I doubt stuff interesting to kids would be so far out
| of the data distribution as to return a fake answer.
|
| Compared to YouTube and Google SEO trash, or Google Home
| / Alexa (which do search + wiki retrieval), at the moment
| GPT-4 and Claude are unironically safer for kids: no
| algorithmic manipulation, no ads, no affiliated trash
| blogs, and so on. A bonus is that it can explain at the
| level of complexity the child will understand for their
| age.
| dvaun wrote:
| My kids get erroneous responses from Alexa. This happens
| all the time. The built-in web search doesn't provide
| correct answers, or is confusing outright. That's when
| they come to me or their Mom and we provide a better
| answer.
|
| I still see this as a cool application. Anything that
| provides easier access to knowledge and improved learning
| is a boon.
|
| I'd rather worry about the potential economic impact than
| worry about possible hallucinations from fun questions
| like "how big is the sun?" or "what is the best videogame
| in the world?", etc.
|
| There's a ton you can do here, IMO.
|
| Take a look at mathacademy.com, for instance. Now slap a
| voice interface on it, provide an ability for
| kids/participants to ask questions back and forth, etc.
| Boom: you've got a math tutor that guides you based on
| your current ability.
|
| What if we could get to the same style of learning for
| languages? For instance, I'd love to work on Spanish.
| It'd be far more accessible if I could launch a web
| browser and chat through my mic in short spurts, rather
| than crack open Anki and go through flash cards, or wait
| on a Discord server for others to participate in
| immersive conversation.
|
| Tons of cool applications here, all learning-focused.
| throwthrowuknow wrote:
| Watch the last few minutes of that linked video, Mira
| strongly hints that there's another update coming for
| paid users and seems to make clear that GPT-4o is mostly
| for free tier users (even though it is obviously a huge
| improvement in many features for everyone).
| whyever wrote:
| They say it's twice as fast/cheap, which might matter for
| your use case.
| minimaxir wrote:
| It's twice as fast/cheap relative to GPT-4-turbo, which
| is still expensive compared to GPT-3.5-turbo and Claude
| Haiku.
|
| https://openai.com/api/pricing/
| c0t300 wrote:
| but better afaik
| minimaxir wrote:
| But may not be better _enough_ to warrant the cost
| difference. LLM cost economics are complicated.
| fvdessen wrote:
| I'd much rather have it be slower, more expensive, but
| smarter
| pests wrote:
| Then the current offering should suffice, right?
| specproc wrote:
| Depends what you want it for. I'm still holding out for a
| decent enough open model, Llama 3 is tantalisingly close,
| but inference speed and cost are serious bottlenecks for
| any corpus-based use case.
| abdullin wrote:
| I think, that might come with the next GPT version.
|
| OpenAI seems to build in cycles. First they focus on
| capabilities, then they work on driving the price down
| (occasionally at some quality degradation)
| ben_w wrote:
| I understand your point, and agree that it is "borderline"
| in its abilities -- though I would instead phrase it as "it
| feels like a junior developer or an industrial placement
| student, and assume it is of a similar level in all other
| skills", as this makes it clearer when it is or isn't a
| good choice, and it also manages expectations away from
| both extremes I frequently encounter (that it's either Cmdr
| Data already, or that it's a no-good terrible thing only
| promoted by the people who were previously selling Bitcoin
| as a solution to all the economics).
|
| That said, given the price tag, when AI becomes _genuinely
| expert_ then I'm probably not going to have a job and
| neither will anyone else (modulo how much electrical power
| those humanoid robots need, as the global electricity
| supply is currently only 250 W/capita).
|
| In the meantime, making it a properly real-time
| conversational partner... wow. Also, that's kinda what you
| need for real-time translation, because: <<be this, that
| different languages the word order totally alter and
| important words at entirely different places in the
| sentence put>>, and real-time "translation" (even when done
| by a human) therefore requires having a good idea what the
| speaker was going to say before they get there, _and_ being
| able to back-track when (as is inevitable) the anticipated
| topic was actually something completely different and so
| the "translation" wasn't.
| fvdessen wrote:
| I guess I feel like I'll get to keep my job a while
| longer and this is strangely disappointing...
|
| A real time translator would be a killer app indeed, and
| it seems not so far away, but note how you have to prompt
| the interaction with 'Hey ChatGPT'; it does not interject
| on its own. It is also unclear if it is able to
| understand if multiple people are speaking and who's who.
| I guess we'll see soon enough :)
| ben_w wrote:
| > It is also unclear if it is able to understand if
| multiple people are speaking and who's who. I guess we'll
| see soon enough :)
|
| Indeed; I would be _pleasantly surprised_ if it can both
| notice and separate multiple speakers, but only a bit
| surprised.
| jll29 wrote:
| There is room for more than one use case and large language
| model type.
|
| I predict there will be a zoo (more precisely tree, as in
| "family tree") of models and derived models for particular
| application purposes, and there will be continued
| development of enhanced "universal"/foundational models as
| well. Some will focus on minimizing memory, others on
| minimizing pre-training or fine-tuning energy consumption,
| some need high accuracy, others hard realtime speed, yet
| others multimodality like GPT-4o, some multilinguality, and
| so on.
|
| Previous language models that encoded dictionaries for
| spellcheckers etc. never got standardized (for instance,
| compare aspell dictionaries to the ones from LibreOffice to
| the language model inside CMU PocketSphinx) so that you
| could use them across applications or operating systems. As
| these models are becoming more common, it would be
| interesting to see this aspect improve this time around.
|
| https://www.rev.com/blog/resources/the-5-best-open-source-
| sp...
| CooCooCaCha wrote:
| I disagree, transfer learning and generalization are
| hugely powerful and specialized models won't be as good
| because their limited scope limits their ability to
| generalize and transfer knowledge from one domain to
| another.
|
| I think people who emphasize specialized models are
| operating under a false assumption that by focusing the
| model it'll be able to go deeper in that domain. However,
| the opposite seems to be true.
|
| Granted, specialized models like AlphaFold are superior
| in their domain but I think that'll be less true as
| models become more capable at general learning.
| Keyframe wrote:
| One thing I've noticed is that the more context, and the more
| precise the context, I give it, the "smarter" it is. There
| are limits to it of course. But I cannot help but think
| that's where the next barrier will be brought down: an agent,
| or multiple agents, that tag along with everything I do
| throughout the day so they have the full context. That way,
| I'll get smarter and more to-the-point help as well as not
| spending much time explaining the context... but that will
| open a dark can that I'm not sure people will want to open -
| having an AI track everything you do all the time (even if
| only in certain contexts like business hours / environments).
| RupertEisenhart wrote:
| It's faster, smarter and cheaper over the API. Better than a
| kick in the teeth.
| abdullin wrote:
| I have a few LLM benchmarks that were extracted from real
| products.
|
| GPT-4o got slightly better overall. Ability to reason
| improved more than the rest.
| aaroninsf wrote:
| Absolutely agree.
|
| This model isn't about benchmark chasing or being a better
| code generator; it's entirely explicitly focused on pushing
| prior results into the frame of multi-modal interaction.
|
| It's still a WIP, most of the videos show awkwardness where
| its capacity to understand the "flow" of human speech is
| still vestigial. It doesn't understand how humans pause and
| give one another space for such pauses yet.
|
| But it has some indeed magic ability to share a deictic frame
| of reference.
|
| I have been waiting for this specific advance, because it is
| going to significantly quiet the "stochastic parrot" line of
| wilfully-myopic criticism.
|
| It is very hard to make blustery claims about "glorified
| Markov token generation" when using language in a way that
| requires both a shared world model and an understanding of
| interlocutor intent, focus, etc.
|
| This is edging closer to the moment when it becomes very hard
| to argue that system does not have some form of self-model
| and a world model within which self, other, and other objects
| and environments exist with inferred and explicit
| relationships.
|
| This is just the beginning. It will be very interesting to
| see how strong its current abilities are in this domain; it's
| one thing to have object classification--another thing
| entirely to infer "scripts plans goals..." and things like
| intent and deixis. E.g. how well does it now understand
| "us" and "them" and "this" vs "that"?
|
| Exciting times. Scary times. Yee hawwwww.
| nicklecompte wrote:
| What part of this makes you think GPT-4 suddenly developed
| a world model? I find this comment ridiculous and bizarre.
| Do you seriously think snappy response time + fake emotions
| is an indicator of intelligence? It seems like you are just
| getting excited and throwing out a bunch of words without
| even pretending to explain yourself:
|
| > using language in a way that requires both a shared world
| model
|
| Where? What example of GPT-4o _requires_ a shared world
| model? The customer support example?
|
| The reason GPT-4 does not have any meaningful world model
| (in the sense that rats have meaningful world models) is
| that it freely believes contradictory facts without being
| confused, freely confabulates without having brain damage,
| and it has no real understanding of quantity or causality.
| Nothing in GPT-4o fixes that, and gpt2-chatbot certainly
| had the same problems with hallucinations and failing the
| same pigeon-level math problems that all other GPTs fail.
| og_kalu wrote:
| One of the most interesting things about the advent of
| LLMs is people bringing out all sorts of "reasons" GPT
| doesn't have true 'insert property' but all those reasons
| freely occur in humans as well
|
| >that it freely believes contradictory facts without
| being confused,
|
| Humans do this. You do this. I guess you don't have a
| meaningful world model.
|
| >freely confabulates without having brain damage
|
| Humans do this
|
| >and it has no real understanding of quantity or
| causality.
|
| Well this one is just wrong.
| spuz wrote:
| I agree. The interesting lesson I take from the seemingly
| strong capabilities of LLMs is not how smart they are but
| how dumb we are. I don't think LLMs are anywhere near as
| smart as humans yet, but it feels each new advance is
| bringing the finish line closer rather than the other way
| round.
| shrimp_emoji wrote:
| Moravec's paradox states that, for AI, the hard stuff is
| easiest and the easy stuff is hardest. But there's no
| easy or hard; there's only what the network was trained
| to do.
|
| The stuff that comes easy to us, like navigating 3D
| space, was trained by billions of years of evolution. The
| hard stuff, like language and calculus, is new stuff
| we've only recently become capable of, seemingly by
| evolutionary accident, and aren't very naturally good at.
| We need rigorous academic training at it that's rarely
| very successful (there's only so many people with the
| random brain creases to be a von Neumann or Einstein), so
| we're impressed by it.
| HeatrayEnjoyer wrote:
| So many even here on HN have a near-religious belief that
| intelligence is unique to humans and animals, and somehow
| a fundamental phenomenon that cannot ever be created
| using other materials.
| dvaun wrote:
| It's a defensive response to an emerging threat to
| stability and current social tiers.
| joquarky wrote:
| It reminds me of the geocentric mindset.
| pests wrote:
| >>and it has no real understanding of quantity or
| causality.
|
| >Well this one is just wrong.
|
| Is it?
|
| --
|
| Me: how many characters are in: https://google.com
|
| ChatGPT: The URL "https://google.com" has 12 characters,
| including the letters, dots, and slashes.
|
| --
|
| What is it counting there? 12 is wrong no matter how you
| dice that up.
|
| Part of the reason is it has no concept of the actual
| string. That URL breaks into four different tokens in 3.5
| and 4: "http", "://", "google" and ".com".
|
| It's not able to figure out the total length, or even the
| lengths of its parts and add them together.
|
| I ask it to double check, it tells me 13 and then 14. I
| tell it the answer and suddenly it's able...
|
| ---
|
| Me: I think its 18
|
| ChatGPT: Let's recount together:
|
| "https://" has 8 characters. "google" has 6 characters.
| ".com" has 4 characters. Adding these up gives a total of
| 8 + 6 + 4 = 18 characters. You're correct! My apologies
| for the oversight earlier.
|
| ---
|
| Count me out.
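|
| For anyone curious what the model actually receives there, a
| quick look with the tiktoken library makes the mismatch
| obvious (a sketch; it assumes tiktoken's reported encoding
| for GPT-4 matches what the chat endpoint uses):
|
|     # Sketch: token pieces vs. character count for the URL.
|     # Requires: pip install tiktoken
|     import tiktoken
|
|     url = "https://google.com"
|     enc = tiktoken.encoding_for_model("gpt-4")
|
|     tokens = enc.encode(url)
|     print([enc.decode([t]) for t in tokens])  # chunks the model sees
|     print(len(tokens))  # a handful of tokens
|     print(len(url))     # 18 characters, which the model never sees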
| og_kalu wrote:
| It seems you're already aware LLMs receive tokens not
| words.
|
| Does a blind man not understand quantity because you
| asked him how many apples are in front of him and he
| failed?
| pixl97 wrote:
| I'd counter by pasting a picture of an emoji here (but HN
| doesn't allow that) as a means to show the confusion that
| can be caused by characters versus symbols.
|
| Most LLMs can just pass the string to a tool to count it
| and bypass their built-in limitations.
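|
| A rough sketch of what that tool hand-off can look like with
| the chat completions tools parameter; the count_chars helper
| and its wiring are made up for illustration, not an official
| recipe:
|
|     # Sketch: delegate exact counting to a tool, not the model.
|     # Assumes the openai Python SDK v1+ and an API key in the env.
|     import json
|     from openai import OpenAI
|
|     client = OpenAI()
|     tools = [{
|         "type": "function",
|         "function": {
|             "name": "count_chars",  # hypothetical helper
|             "description": "Count characters in a string.",
|             "parameters": {
|                 "type": "object",
|                 "properties": {"text": {"type": "string"}},
|                 "required": ["text"],
|             },
|         },
|     }]
|     msg = {"role": "user",
|            "content": "How many characters are in https://google.com?"}
|     resp = client.chat.completions.create(
|         model="gpt-4o", messages=[msg], tools=tools,
|     )
|     # The model may answer directly; check tool_calls is present.
|     call = resp.choices[0].message.tool_calls[0]
|     text = json.loads(call.function.arguments)["text"]
|     # Counted in plain code, outside the model; in a full loop you
|     # would send this result back as a "tool" role message.
|     print(len(text))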
| wcoenen wrote:
| LLMs process text, but only after it was converted to a
| stream of tokens. As a result, LLMs are not very good at
| answering questions about letters in the text. That
| information was lost during the tokenization.
|
| Humans process photons, but only after converting them
| into nerve impulses via photoreceptor cells in the
| retina, which are sensitive to wavelengths ranges
| described as "red", "green" or "blue".
|
| As a result, humans are not very good at distinguishing
| different spectra that happen to result in the same nerve
| impulses. That information was lost by the conversion
| from photons to nerve impulses. Sensors like the AS7341
| that have more than 3 color channels are much better at
| this task.
| jameshart wrote:
| How much of your own sense of quantity is visual, do you
| think? How much of your ability to count the lengths of
| words depends on your ability to sound them out and
| spell?
|
| I suspect we might find that adding in the multimodal
| visual and audio aspects to the model gives these models
| a much better basis for mental arithmetic and counting.
| orangecat wrote:
| _That URL breaks into four different tokens in 3.5 and 4:
| "http", "://", "google" and ".com"._
|
| Except that "http" should be "https". Silly humans,
| claiming to be intelligent when they can't even tokenize
| strings correctly.
| davidham wrote:
| Its first answer of 12 is correct, there are 12 _unique_
| characters in https://google.com.
| vel0city wrote:
| The unique characters are:
|
| h t p s : / g o l e . c m
|
| There are 13 unique characters.
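|
| For what it's worth, a one-liner settles both counts:
|
|     url = "https://google.com"
|     print(len(url))       # 18 characters in total
|     print(len(set(url)))  # 13 unique characters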
| davidham wrote:
| OK neither GPT-4o nor myself is great at counting
| apparently
| kenjackson wrote:
| If someone found a way to put an actual human brain into
| SW, but no one knew it was a real human brain -- I'm
| certain most of HN would claim it wasn't AGI. "Kind of
| sucks at math", "Knows weird facts about Tik Tok
| celebrities, but nothing about world events", "Makes lots
| of grammar mistakes", "scores poorly on most standardized
| tests, except for one area that he seems to do well in", and
| "not very creative".
| goatlover wrote:
| What is a human brain without the rest of its body?
| Humans aren't brains. Our nervous systems aren't just the
| brain either.
| kenjackson wrote:
| It's meant to explore a point. Unless your point is that
| AGI can only exist with a human body too.
| chasd00 wrote:
| I don't think making the same mistakes as a human counts
| as a feature. I see that a lot when people point out a
| flaw with an llm, the response is always "well a human
| would make the same mistake!". That's not much of an
| excuse, computers exist because they do the things humans
| can't do very well like following long repetitive lists
| of instructions. Further, upthread, there's discussion
| about adding emotions to an llm. An emotional computer
| that makes mistakes sometimes is pretty worthless as a
| "computer".
| og_kalu wrote:
| It's not about counting as a feature. It's the blatant
| logical fallacy. If a trait isn't a reason humans don't
| have a certain property then it's not a reason for
| machines either. Can't eat your cake and have it.
|
| >That's not much of an excuse, computers exist because
| they do the things humans can't do very well like
| following long repetitive lists of instructions.
|
| Computers exist because they are useful, nothing more and
| nothing less. If they were useful in a completely
| different way, they would still exist and be used.
| goatlover wrote:
| It's objectively true that LLMs do not have bodies. To
| the extent general intelligence relies on being embodied
| (allowing you to manipulate the world and learn from
| that), it's a legitimate thing to point out.
| snthpy wrote:
| Hectic!
|
| Thanks for this.
| OJFord wrote:
| I assume (because they don't address it or look at all
| fazed) the audio cutting in and out is just an artefact of
| the stream?
| throwthrowuknow wrote:
| Haven't tried it but from work I've done on voice
| interaction this happens a lot when you have a big audience
| making noise. The interruption feature will likely have
| difficulty in noisy environments.
| OJFord wrote:
| Yeah that was actually my first thought (though no
| professional experience with it/on that side) - it's just
| that the commenter I replied to was so hyped about it and
| how fluid & natural it was and I thought that made it
| really jar.
| mvdtnz wrote:
| Interesting that they decided to keep the horrible ChatGPT
| tone ("wow you're doing a live demo right now?!"). It comes
| across just so much worse in voice. I don't need my "AI"
| speaking to me like I'm a toddler.
| yieldcrv wrote:
| tell it to speak to you differently
|
| with a GPT you can modify the system prompt
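|
| A minimal sketch of the API equivalent; the instruction
| wording is just an example, not a magic phrase:
|
|     # Sketch: a terse system prompt to cut the small talk.
|     # Assumes the openai Python SDK v1+ and an API key in the env.
|     from openai import OpenAI
|
|     client = OpenAI()
|     resp = client.chat.completions.create(
|         model="gpt-4o",
|         messages=[
|             {"role": "system", "content":
|              "Be terse. No greetings, no pep talk, no follow-up "
|              "questions. Answer directly."},
|             {"role": "user", "content":
|              "How do I list hidden files in bash?"},
|         ],
|     )
|     print(resp.choices[0].message.content)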
| maest wrote:
| It still refuses to go outside the deeply sanitised tone
| that "alignment" enforces on you.
| marvin wrote:
| One of the linked demos is it being sarcastic, so maybe you
| can make it remember to be a little more edgy.
| slibhb wrote:
| You can tell it not to talk like this using custom prompts.
| practice9 wrote:
| It is cringey and overenthusiastic, but proper custom
| instructions or a system prompt will mostly fix that
| throwthrowuknow wrote:
| Did you miss the part where they simply asked it to change
| its manner of speaking and the amount of emotion it used?
| baumgarn wrote:
| it should be possible to imitate any voice you want like
| your actual parents soon enough
| goatlover wrote:
| That won't be Black Mirror levels of creepy /s
| ChuckMcM wrote:
| I expect the really solid use case here will be voice
| interfaces to applications that don't suck. Something I am
| still surprised at is that vendors like Apple have yet to
| allow me to train the voice to text model so that it _only_
| responds to me and not someone else.
|
| So local modelling (completely offline but per speaker aware
| and responsive), with a really flexible application API. Sort
| of the GTK or QT equivalent for voice interactions. Also
| custom naming, so instead of "Hey Siri" or "Hey Google" I
| could say, "Hey idiot" :-)
|
| Definitely some interesting tech here.
| spaceman_2020 wrote:
| This is going straight into 'Her' territory
| clhodapp wrote:
| Call me overly paranoid/skeptical, but I'm not convinced that
| this isn't a human reading (and embellishing) a script. The
| "AI" responses in the script may well have actually been
| generated by their LLM, providing a defense against it being
| fully fake, but I'm just not buying some of these "AI"
| voices.
|
| We'll have to see when end users actually get access to the
| voice features "in the coming weeks".
| dragonwriter wrote:
| > As many highlighted there, the model is not an improvement
| like GPT3->GPT4.
|
| The improvements they seem to be hyping are in multimodality
| and speed (also price - half that of GPT-4 Turbo - though
| that's their choice and could be promotional, but I expect it's
| at least in part, like speed, a consequence of greater
| efficiency), not so much producing better output for the same
| pure-text inputs.
| kybercore wrote:
| the model scores 60 points higher in lmsys than the best gpt
| 4 turbo model from april, that's still a pretty significant
| jump in text capability
| aixpert wrote:
| useless anecdata but I find the new model very frustrating,
| often completely ignoring what I say in follow up queries. it's
| giving me serious Siri vibes
|
| (text input in web version)
|
| maybe it's programmed to completely ignore swearing, but how
| could I not swear after it repeatedly gave me info about
| you.com when I tried to address it in the second person
| lossolo wrote:
| I agree. I tried a few programming problems that, let's say,
| seem to be out of the distribution of their training data and
| which GPT4 failed to solve before. The model couldn't find a
| similar pattern and failed to solve them again. What's
| interesting is that one of these problems was solved by Opus,
| which seems to indicate that the majority of progress in
| recent months should be attributed to the quality/source of the
| training data.
| avereveard wrote:
| I tested a few use cases in the chat, and it's not particularly
| more intelligent, but they seem to have solved laziness. I had
| to categorize my expenses to do some budgeting for the family,
| and in GPT-4 I had to go ten by ten, confirm the suggested
| category, and download the file; it took two days as I was
| constantly hitting the limit. GPT-4o did most of the grunt
| work, then communicated anomalies in bulk, asked for
| suggestions for these, and provided a downloadable link within
| two answers, calling the code interpreter multiple times and
| working toward the goal on its own.
|
| And the prompt wasn't a monstrosity, and it wasn't even that
| good, it was just one line, "I need help to categorize these
| expenses", and off it went. Hope it won't get enshittified like
| Turbo, because this finally feels as great as 3.5 was for goal
| seeking.
| ozzydave wrote:
| Heh - I'm using ChatGPT for the same thing! Works 10X better
| than Rocket Money, which was supposed to be an improvement on
| Mint but meh.
| jameshart wrote:
| I think this comment is easily misread as implying that this
| GPT4o model is based on some old GPT2 chatbot - that's very
| much not what you meant to say, though.
|
| This model had been tested under the code name
| 'gpt2-chatbot', but it is very much a new GPT-4+-level model,
| with new multimodal capabilities - and apparently some
| impressive work on inference speed.
|
| Highlighting so people don't get the impression this is just
| OpenAI slapping a new label on something a generation out of
| date.
| Jimmc414 wrote:
| Big questions are (1) when is this going to be rolled out to paid
| users? (2) what is the remaining benefit of being a paid user if
| this is rolled out to free users? (3) Biggest concern is will
| this degrade the paid experience since GPT-4 interactions are
| already rate limited. Does OpenAI have the hardware to handle
| this?
|
| Edit: according to @gdb this is coming in "weeks"
|
| https://twitter.com/gdb/status/1790074041614717210
| onemiketwelve wrote:
| thanks, I was confused because the top of the page says to try
| now when you cannot in fact try it at all
| freedomben wrote:
| I'm a ChatGPT Plus Subscriber and I just refreshed the page
| and it offered me the new model. I'm guessing they're rolling
| it out gradually but hopefully it won't take too long.
|
| Edit: It's also now available to me in the Android App
| whimsicalism wrote:
| I can try it now, but not the voice features, I don't think
| zamadatix wrote:
| You can use GPT-4o now but the interactive voice mode of
| using it (as demoed today) releases in a few weeks.
| Tenoke wrote:
| >what is the remaining benefit of being a paid user if this is
| rolled out to free users?
|
| It says so right in the post
|
| >We are making GPT-4o available in the free tier, and to Plus
| users with up to 5x higher message limits
|
| The limits are much lower for free users.
| jrh3 wrote:
| I'm not convinced I need to keep paying for plus. The threshold
| of requests for free 4o is pretty high.
| dunkmaster wrote:
| This might mean GPT-5 is coming soon and it will only be
| available to paid users.
| dunkmaster wrote:
| Or they just made a bunch of money on their licensing deal
| with Apple. So they don't need to charge for ChatGPT anymore.
| spdif899 wrote:
| If it's going to be available via Siri this could make
| sense.
|
| It does make me wonder how such a relationship could impact
| progress. Would OpenAI feel limited from advancing in
| directions that don't align with the partnership? For
| example if they suddenly release a model better than what's
| in Siri, making Siri look bad.
| yieldcrv wrote:
| I'm actually thinking that the GPT store with more users
| might be better for them
|
| From my casual conversations, not that many people are paying
| for GPT4 or know why they should. Every conversation even in
| enthusiast forums like this one has to be interjected with
| "wait, are you using GPT4? because GPT3.5 the free one is
| pretty nerfed"
|
| just nuking that friction from orbit and expanding the GPT
| store volume could be a positive for them
| lxgr wrote:
| Will this include image generation for the free tier as well?
| That's a big missing feature in OpenAI's free tier compared to
| Google and Meta.
| dkarras wrote:
| Is OpenAI image generation any different from what Microsoft
| Copilot provides for free? I thought they were the same.
| OliverM wrote:
| This is impressive, but they just sound so _alien_, especially to
| this non-U.S. English speaker (to the point of being actively
| irritating to listen to). I guess picking up on social cues
| that communicate this (rather than explicit instruction or
| feedback) is still some time away.
|
| It's still astonishing to consider what this demonstrates!
| w-m wrote:
| Gone are the days of copy-pasting to/from ChatGPT all the time,
| now you just share your screen. That's a fantastic feature, in
| how much friction that removes. But what an absolute privacy
| nightmare.
|
| With ChatGPT having a very simple text+attachment in, text out
| interface, I felt absolutely in control of what I tell it. Now
| when it's grabbing my screen or a live camera feed, that will be
| gone. And I'll still use it, because it's just so damn
| convenient?
| baby_souffle wrote:
| > Now when it's grabbing my screen or a live camera feed, that
| will be gone. And I'll still use it, because it's just so damn
| convenient?
|
| Presumably you'll have a way to draw a bounding box around what
| you want to show or limit to just a particular window the same
| way you can when doing a screen share w/ modern video
| conferencing?
| jawiggins wrote:
| I hope when this gets to my iphone I can use it to set two
| concurrent timers.
| mellosouls wrote:
| Very, very impressive for a "minor" release demo. The
| capabilities here would look shockingly advanced just 5 years
| ago.
|
| Universal translator, pair programmer, completely human sounding
| voice assistant and all in real time. Scifi tropes made real.
|
| But: interesting next to see how it actually performs IRL,
| latency-wise and without cherry-picking. No snark, it was great,
| but we need to see real-world power. Also what the benefits are
| to subscribers
| if all this is going to be free...
| llm_trw wrote:
| The capabilities here looked shockingly advanced yesterday.
| partiallypro wrote:
| A lot of the demo is very impressive, but some of it is just
| stuff that already exists, only slightly more polished.
| Not really a huge leap for at least 60% of the demos.
| CooCooCaCha wrote:
| My guess is they're banking on the free version being rate
| limited and people finding it so useful that they want to
| remove the limit. Like giving a new user a discount on heroin.
| At least that's the strategy that would make most sense to me.
| rubidium wrote:
| I have the paid version and it's not connecting
| CooCooCaCha wrote:
| What does that have to do with what I said?
| yumraj wrote:
| In the first video the AI seems excessively chatty.
| hipadev23 wrote:
| chatGPT desperately needs a "get to the fucking point" mode.
| tomashubelbauer wrote:
| Seriously. I've had to spell out in the custom instructions,
| in twelve different ways with examples, that it should just
| answer, to make it at least somewhat usable. And it
| still "forgets" sometimes.
| chatcode wrote:
| It does, that's "custom instructions".
| progbits wrote:
| Impressive demo, but like half the interactions were "hello"
| "hi how are you doing" "great thanks, what can I help you
| with" etc.
|
| The benchmark for human-computer interaction should be "tea,
| earl gray, hot", not awkward and pointless smalltalk.
| ativzzz wrote:
| "no yapping" in the prompt works very well
| jamilton wrote:
| Yeah, I would hope that custom instructions would help somewhat
| with that, but it is a point of annoyance for me too.
| mrandish wrote:
| Yes, it sounds like an awkwardly perky and over-chatty
| telemarketer that _really_ wants to be your friend. I find the
| tone maximally annoying and think most users will find it both
| stupid and creepy. Based on user preferences, I expect future
| interactive chat AIs will default to an engagement mode that's
| optimized for accuracy and is both time-efficient and
| cognitively efficient for the user.
|
| I suspect this AI <-> Human engagement style will evolve over
| time to become quite unlike human to human engagement, probably
| mixing speech with short tones for standard responses like
| "understood", "will do", "standing by" or "need more input". In
| the future these old-time demo videos where an AI is forced to
| do a creepy caricature of an awkward, inauthentic human will be
| embarrassingly retro-cringe. _" Okay, let's do it!"_
| jdthedisciple wrote:
| I found it off-putting as well.
|
| Guess it's just biased toward average Californian behavior and
| speech patterns.
| TillE wrote:
| Reminds me of how Siri used to make jokes after setting a
| timer. Now it just reads back the time you specified, in a
| consistent way.
|
| It's a very impressive gimmick, but I really think most
| people don't want to interact with computers that way. Since
| Apple pulled that "feature" after a few years, it's probably
| not just a nerd thing.
| caseyy wrote:
| It is exceptionally creepy. It is an unnatural effort to
| appear pleasing, like the fawning response seen in serious
| abuse survivors.
| csjh wrote:
| I wonder if this is what the "gpt2-chatbot" that was going around
| earlier this month was
| lambdaba wrote:
| yes it was
| AndyNemmity wrote:
| it was
| peppertree wrote:
| Just like that, Google is on the back foot again.
| tempsy wrote:
| Considering the stock pumped following the presentation, the
| market doesn't seem particularly concerned with what OpenAI
| released at all.
| sebastiennight wrote:
| Anyone who watched the OpenAI livestream: did they "paste" the
| code after hitting CTRL+C ? Or did the desktop app just read from
| the clipboard?
|
| Edit: I'm asking because of the obvious data security
| implications of having your desktop app read from the clipboard
| _in the live demo_... That would definitely put a damper on my
| fanboyish enthusiasm about that desktop app.
| golol wrote:
| To me it looked like they used one command that did both copy
| and paste into ChatGPT.
| dkarras wrote:
| macOS asks you to give permission for an application to read
| your clipboard. do other operating systems not have that?
| sn_master wrote:
| This is every romance scammer's dream come true...
| summerlight wrote:
| This is really impressive engineering. I thought real time agents
| would completely change the way we're going to interact with
| large models but it would take 1~2 more years. I wonder what kind
| of new techniques were developed to enable this, but OpenAI is
| fairly secretive so we won't get to know their secret sauce.
|
| On the other hand, this also feels like a signal that reasoning
| capability has probably already plateaued at the GPT-4 level and
| OpenAI knew it so they decided to focus on research that matters
| to delivering product engineering rather than long-term research
| to unlock further general (super)intelligence.
| nopinsight wrote:
| Reliable agents in diverse domains need better reasoning
| ability and fewer hallucinations. If the rumored GPT-5 and Q*
| capabilities are true, such agents could become available soon
| after it's launched.
| summerlight wrote:
| Sam has been pretty clear on denying GPT-5 rumors, so I don't
| think it will come anytime soon.
| nopinsight wrote:
| Sam mentioned on several occasions that GPT-5 will be much
| smarter than GPT-4. On Lex Fridman's podcast, he even said
| the gap between GPT-5 and 4 will be as wide as GPT-4 and 3
| (not 3.5).
|
| He did remain silent on when it's going to be launched.
| valine wrote:
| OpenAI has been open about their ability to predict model
| performance prior to training. When Sam talks about GPT-5
| he could very easily be talking about the hypothetical
| performance of a model given their internal projections.
| I think it's very unlikely a fully trained GPT-5 exists
| yet.
| bigyikes wrote:
| Sam has stated that he knows the month GPT-5 will be
| released.
|
| Given the amount of time and uncertainty involved in
| training and red-teaming these models, we can assume
| GPT-5 exists if we take Altman at his word.
| Atotalnoob wrote:
| It's going to be launched this year. My buddy's company
| had a private demo of gpt5
| MVissers wrote:
| Why would reasoning have plateaued?
|
| I think reasoning ability is not the largest bottleneck for
| improvement in usefulness right now. Cost is a bigger one IMO.
|
| Running these models as agents is hella expensive, and agents, or
| agent-like recurrent reasoning (like humans do), are the key to
| improved performance if you look at any type of human
| intelligence.
|
| Single-shot performance only gets you so far.
|
| For example, if it can write code 90% of the way and then debug
| in a loop (sketched at the end of this comment), it'd be much
| more performant than any single-shot algorithm.
|
| And OpenAI probably has these huge models in their basement. But
| they might not be much more useful than GPT-4 when used single-
| shot. I mean, what could they do that we can't do today with
| GPT-4?
|
| It's agents and recurrent reasoning we need for more
| usefulness.
|
| At least- That's my humble opinion as an amateur neuroscientist
| that plays around with these models.
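|
| Roughly the loop I mean, as a minimal sketch (the llm_complete
| helper here is a hypothetical stand-in for whatever completion
| API you use, not a real one):
|
|   import pathlib, subprocess, tempfile
|
|   def llm_complete(prompt: str) -> str:
|       """Hypothetical stand-in for any chat/completions call."""
|       raise NotImplementedError
|
|   def write_and_debug(task: str, max_rounds: int = 5) -> str:
|       # single shot first: a module plus its own pytest tests
|       code = llm_complete(f"Write a Python module with pytest "
|                           f"tests for: {task}")
|       for _ in range(max_rounds):
|           path = pathlib.Path(tempfile.mkdtemp()) / "candidate.py"
|           path.write_text(code)
|           run = subprocess.run(["pytest", str(path), "-q"],
|                                capture_output=True, text=True)
|           if run.returncode == 0:   # tests pass -> done
|               return code
|           # feed the failure back so the model repairs its output
|           code = llm_complete(f"These tests failed:\n"
|                               f"{run.stdout[-2000:]}\n"
|                               f"Fix and return the full module:\n"
|                               f"{code}")
|       return code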
| Jensson wrote:
| > Running these models as agents is hella expensive
|
| Because they are dumb, you need to over-compute so many things to
| get anything useful. Smarter models would solve this problem.
| Making the current model cheaper is like trying to solve Go by
| scaling up Deep Blue: it doesn't work to just hardcode dumb
| pieces together, the model needs to get smarter.
| cchance wrote:
| You mean like our dumb-ass brains? There's a reason "saying the
| first thing that comes to mind" is a bad fucking idea; that's
| what AIs currently do. They don't take a moment to think about
| the answer and then formulate a response, they spit out their
| first "thought". That's why multi-shot works so much better, just
| like with our own dumb brains.
| Jensson wrote:
| My brain can navigate a computer interface without using
| word tokens, since I have tokens for navigating OS and
| browsers and tabs etc. That way I don't have to read a
| million tokens of text to figure out where buttons are or
| how to navigate to places, since my brain is smart enough
| to not use words for it.
|
| ChatGPT doesn't have that sort of thing currently, and
| until it does it will always be really bad at that sort
| of thing.
|
| You are using a hand to hammer a nail; that will never go well.
| The solution isn't to use more hands, the solution is to wield a
| hammer.
| cchance wrote:
| WTF are you even talking about? We're talking about understanding
| and communication, not taking actions. Navigating an OS, a
| browser, tabs, etc. are actions, not thoughts or communication.
| This model isn't taking actions, there is no nail to hammer lol,
| and if there were you'd be smashing a brain into a nail for some
| reason.
| Jensson wrote:
| The topic is agents, the AI acting on your behalf, that
| needs more than text. What are you talking about?
| CuriouslyC wrote:
| This isn't really new tech, it's just an async agent in front
| of a multimodal model. It seems from the demo that the
| improvements have been in response latency and audio
| generation. Still, it looks like they're building a solid
| product, which has been their big issue so far.
| searealist wrote:
| No, audio is fed directly into the model. There is no text to
| speech transformer in front of it like there was with
| chatgpt-4.
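|
| In other words (a hypothetical sketch just to show the shape of
| each design; asr, llm, tts and omni_model are placeholders, not
| real APIs):
|
|   # stand-ins only, so the two shapes below actually run
|   asr = llm = tts = omni_model = lambda x: x
|
|   def cascade_reply(audio_in):
|       # old voice mode as widely described: three models in
|       # series, so latencies add up and tone is lost at the
|       # text step
|       text_in  = asr(audio_in)    # speech -> text
|       text_out = llm(text_in)     # text -> text
|       return tts(text_out)        # text -> speech
|
|   def native_reply(audio_in):
|       # GPT-4o as described above: one model consumes and
|       # emits audio tokens end to end
|       return omni_model(audio_in)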
| cchance wrote:
| It's 200-300ms for a multimodal response; that's REALLY a big
| step forward, especially given it's doing it with a full voice
| response, not just text.
| cchance wrote:
| Yeah, so sad that OpenAI isn't more open. Imagine if OpenAI was
| still sharing their thought processes and papers with the overall
| community. I really wish we saw collaborations between OpenAI and
| Meta, for instance, to help push the open-source arena further
| ahead. I love that their latest models are so great, but the fact
| that they aren't helping the open-source arena progress is sad.
| Imagine how far we'd be if OpenAI was still as open as they once
| were and we saw collaborations between Meta, OpenAI and
| Anthropic, all sharing growth and tech to reduce duplicated work
| and to help each other avoid failed paths.
| MBCook wrote:
| Why must every website put stupid stuff that floats above the
| content and can't be dismissed? It drives me nuts.
| dkga wrote:
| That can "reason"?
| MisterBiggs wrote:
| I've been waiting to see someone drop a desktop app like they
| showcased. I wonder how long until it is normal to have an AI
| looking at your screen the entire time your machine is unlocked.
| Answering contextual questions and maybe even interjecting if it
| notices you made a mistake and moved on.
| doomroot13 wrote:
| That seems to be what Microsoft is building and will reveal as a
| new Windows feature at BUILD '24. Not too sure about the
| interjecting aspect, but it ingests everything you do on your
| machine so you can easily recall, search, and ask questions, etc.
| AI Explorer is the rumored name, and it will possibly run locally
| on Qualcomm NPUs.
| ukuina wrote:
| Yes, this is Windows AI Explorer.
| layer8 wrote:
| This will be great for employee surveillance, to monitor how
| much you are really working.
| MisterBiggs wrote:
| I think even scarier is that ChatGPT's tone of voice and bias
| is going to take over everything.
| bredren wrote:
| It is notable that OpenAI did not need to carefully rehearse the
| talking points of the speakers, or even do the kind of careful
| production seen in a lot of other videos.
|
| The technology product is so good and so advanced that it doesn't
| matter how the people appear.
|
| Zuck tried this in his video countering the Vision Pro, but it
| did not have the authentic "not really rehearsed or produced"
| feel of this at all. If you watch that video and compare it with
| this one, you can see the difference.
|
| Very interesting times.
| skepticATX wrote:
| Very impressive demo, but not really a step change in my opinion.
| The hype from OpenAI employees was on another level, way more
| than was warranted in my opinion.
|
| Ultimately, the promise of LLM proponents is that these models
| will get exponentially smarter - this hasn't been borne out yet.
| So from that perspective, this was a disappointing release.
|
| If anything, this feels like a rushed release to match what
| Google will be demoing tomorrow.
| altcognito wrote:
| GPT-4 expressing a human-like emotional response every single
| time you interact with it is pretty annoying.
|
| In general, trying to push that this is a human being is probably
| "unsafe", but that hurts the marketing.
| jonquark wrote:
| It might be region specific (I'm in the UK) - but I don't "see"
| the new model anywhere e.g. if I go to:
| https://platform.openai.com/playground/chat?models=gpt-4o The
| model the page uses is set to gpt-3.5-turbo-16k.
|
| I'm confused
| aw4y wrote:
| I don't see anything released today. Login/signup is still
| required, no signs of desktop app or free use on web. What am I
| missing?
| goalonetwo wrote:
| For all the hype around this announcement I was expecting more
| than some demo-level stuff that close to nobody will use in real
| life. Disappointing.
| sroussey wrote:
| Twice as fast and half the cost for the API sounds good to me.
| Not a demoable thing though.
| asteroidz wrote:
| Why are you so confident that nobody will use this in real
| life? I know OpenAI showed only a few demos, but I can see huge
| potential.
| mellosouls wrote:
| @sama reflects:
|
| https://blog.samaltman.com/gpt-4o
| 101008 wrote:
| Are the employees in the demo high-ranking people at OpenAI? I
| can understand Altman being happy with this progress, but what
| about the mid/low-level employees? Didn't they watch Oppenheimer?
| Are they happy they are destroying humanity/work/etc. for future
| and not-so-future generations?
|
| Anyone who thinks this will be like the previous work revolutions
| is kidding themselves. This replaces humans and will replace them
| even more with each new advance. What's their plan? Live off
| their savings? What about family/friends? I honestly can't see
| this and understand how they can be so happy about it...
|
| "Hey, we created something very powerful that will do your work
| for free! And it does it better than you and faster than you! Who
| are you? It doesn't matter, it applies to all of you!"
|
| And considering I was thinking of having a kid next year, well,
| this is a no.
| galdosdi wrote:
| Have a kid anyway, if you otherwise really felt driven to it.
| Reading the tea leaves in the news is a dumb reason to change
| decisions like that. There's always some disaster looming; there
| always has been. If you raise them well, they'll adapt to
| whatever weird future they inherit and be amongst the ones who
| help others get through it.
| 101008 wrote:
| Thanks for taking the time to answer instead of (just)
| downvoting. I understand your logic but I don't see a future
| where people can adapt to this and get through it. I honestly
| see a future so dark and we'll be there much sooner than we
| thought... when OpenAI released their first model people were
| talking about years before seeing real changes and look what
| happened. The advance is exponential...
| ninininino wrote:
| > a future where people can adapt to this and get through
| it
|
| There are people alive today who quite literally are descendants
| of humans born in WW2 concentration camps. Some percentage of
| those people are probably quite happy and glad they have been
| given a chance at life. Of course, if their ancestors had chosen
| not to procreate they wouldn't be disappointed; they'd simply
| never have come into existence.
|
| But it's absolutely the case that there's almost always a
| _chance_ at survival and future prosperity, even if things feel
| unimaginably bleak.
| nice_byte wrote:
| "It is difficult to get a man to understand something when his
| salary depends on his not understanding it."
| karaterobot wrote:
| That first demo video was impressive, but then it ended very
| abruptly. It made me wonder if the next response was not as good
| as the prior ones.
| dclowd9901 wrote:
| Extremely impressive -- hopefully there will be an option to
| color all responses with an underlying brevity. It seemed like
| the AI just kept droning on and on.
| MP_1729 wrote:
| This thing continues to stress my skepticism about AI scaling
| laws and the broad AI semiconductor capex spending.
|
| 1- OpenAI is still working on GPT-4-level models, more than 14
| months after the launch of GPT-4 and after more than $10B in
| capital raised.
|
| 2- The rate at which token prices are collapsing is bizarre. Now
| a (bit) better model for 50% of the price. How do people
| seriously expect these foundational model companies to make
| substantial revenue? Token volume needs to double just for
| revenue to stand still. Since the GPT-4 launch, token prices have
| been falling 84% per year!! Good for mankind, but crazy for these
| companies.
|
| 3- Maybe I am an asshole, but where are my agents? I mean, good
| for the consumer use case. Let's hope the rumors that Apple is
| deploying ChatGPT with Siri are true; these features will help a
| lot. But I wanted agents!
|
| 4- These drops in cost are good for the environment! No reason to
| expect them to stop here.
| htrp wrote:
| Did we ever get confirmation that GPT-4 was a fresh training run
| vs. increasingly complex training on more tokens on top of the
| base GPT-3 models?
| saliagato wrote:
| gpt-4 was indeed trained on gpt-3 instruct series (davinci,
| specifically). gpt-4 was never a newly trained model
| whimsicalism wrote:
| what are you talking about? you are wrong, for the record
| fooker wrote:
| They have pretty much admitted that GPT4 is a bunch of
| 3.5s in a trenchcoat.
| whimsicalism wrote:
| They have not. You probably read "MoE" and some pop
| article about what that means without having any clue.
| matsemann wrote:
| If you know better it would be nice of you to provide the
| correct information, and not just refute things.
| whimsicalism wrote:
| GPT-4 is a sparse MoE model with ~1.2T params. This is all public
| knowledge and immediately precludes the two previous commenters'
| assertions.
| ldjkfkdsjnv wrote:
| Yeah, I'm also getting suspicious. Also, all of the models (Opus,
| Llama 3, GPT-4, Gemini Pro) are converging to similar levels of
| performance. If the scaling hypothesis were true, we would see a
| greater divergence in model performance.
| bigyikes wrote:
| Plot model performance over the last 10 years and show me
| where the convergence is.
|
| The graph looks like an exponential and is still increasing.
|
| Every exponential is a sigmoid in disguise, but I don't think
| there has been enough time to say the curve has flattened.
| MP_1729 wrote:
| Two pushbacks.
|
| 1- The mania only started post-Nov '22, and the huge investments
| since then haven't meant substantial progress since the GPT-4
| launch in March '23.
|
| 2- We are running out of high-quality tokens in 2024 (per Epoch
| AI).
| dwaltrip wrote:
| GPT-4 launch was barely 1 year ago. Give the investments
| a few years to pay off.
|
| I've heard multiple reports that training runs costing
| ~$1 billion are in the works at the major labs, and
| that the results will come in the next year or so. Let's
| see what that brings.
|
| As for the tokens, they will find more quality tokens.
| It's like oil or other raw resources. There are more
| sources out there if you keep searching.
| hehdhdjehehegwv wrote:
| This is why I think Meta has been so shrewd in their "open" model
| approach. I can run Llama3-70B on my local workstation with an
| A6000, which, after the up-front cost of the card, is just my
| electricity bill.
|
| So despite all the effort and cost that goes into these models,
| you still have to compete against a "free" offering.
|
| Meta doesn't sell an API, but they can make it harder for
| everybody else to make money on it.
| kmeisthax wrote:
| LLaMA still has an "IP hook" - the license for LLaMA forbids
| usage on applications with large numbers of daily active
| users, so presumably at that point Facebook can start asking
| for money to use the model.
|
| Whether or not that's actually enforceable[0], and whether or
| not other companies will actually challenge Facebook legal
| over it, is a different question.
|
| [0] AI might not be copyrightable. Under US law, copyright
| only accrues in _creative_ works. The weights of an AI model
| are a compressed representation of training data. Compressing
| something isn't a creative process, so it creates no
| additional copyright; the only way one can gain ownership
| of the model weights is to own the training data that gets
| put into them. And most if not all AI companies are not
| making their own training data...
| lolinder wrote:
| > LLaMA still has an "IP hook" - the license for LLaMA
| forbids usage on applications with large numbers of daily
| active users, so presumably at that point Facebook can
| start asking for money to use the model.
|
| No, the license prohibits usage by Licensees who already
| had >700m MAUs on the day of Llama 3's release [0]. There's
| no hook to stop a company from growing into that size using
| Llama 3 as a base.
|
| [0] https://llama.meta.com/llama3/license/
| Salgat wrote:
| The whole point is that the license specifically targets
| their competitors while allowing everyone else so that
| their model gets a bunch of free contributions from the
| open source community. They gave a set date so that they
| knew exactly who the license was going to affect
| indefinitely. They don't care about future companies
| because by the time the next generation releases, they
| can adjust the license again.
| lolinder wrote:
| Yes, I agree with everything you just said. That also
| contradicts what OP said:
|
| > LLaMA still has an "IP hook" - the license for LLaMA
| forbids usage on applications with large numbers of daily
| active users, so presumably at that point Facebook can
| start asking for money to use the model.
|
| The license does _not_ forbid usage on applications with
| large numbers of daily active users. It forbids usage by
| companies that were operating at a scale to compete with
| Facebook at the time of the model's release.
|
| > They don't care about future companies because by the
| time the next generation releases, they can adjust the
| license again.
|
| Yes, but I'm skeptical that that's something a regular
| business needs to worry about. If you use Llama 3/4/5 to
| get to that scale then you are in a place where you can
| train your own instead of using Llama 4/5/6. Not a bad
| deal given that 700 million users per month is completely
| unachievable for most companies.
| spacebanana7 wrote:
| Sam Altman gave the impression that foundation models would be
| a commodity on his appearance in the All in Podcast, at least
| in my read of what he said.
|
| The revenue will likely come from application layer and
| platform services. ChatGPT is still much better tuned for
| conversation than anything else in my subjective experience and
| I'm paying premium because of that.
|
| Alternatively it could be like search - where between having a
| slightly better model and getting Apple to make you the
| default, there's an ad market to be tapped.
| hn_throwaway_99 wrote:
| I'm ceaselessly amazed at people's capacity for impatience. I
| mean, when GPT 4 came out, I was like "holy f, this is magic!!"
| How quickly we get used to that magic and demand more.
|
| Especially since this demo is _extremely_ impressive given the
| voice capabilities, yet still the reaction is, essentially,
| "But what about AGI??!!" Seriously, take a breather. Never
| before in my entire career have I seen technology advance at
| such a breakneck speed - don't forget transformers were only
| _invented_ 7 years ago. So yes, there will be some ups and
| downs, but I couldn't help but laugh at the thought that "14
| months" is seen as a long time...
| belter wrote:
| Chair in the sky again...
| hn_throwaway_99 wrote:
| Hah, was thinking of that exact bit when I wrote my
| comment. My version of "chair in the sky" is "But you are
| talking ... to a computer!!" Like remember stuff that was
| pure Star Trek fantasy until very recently? I'm sitting
| here with my mind blown, while at the same time reading
| comments along the lines of "How lame, I asked it some
| insanely esoteric question about one of the characters in
| Dwarf Fortress and it totally got it wrong!!"
| layer8 wrote:
| The AI doesn't behave like the computer in Star Trek,
| however. The way in which it is a different thing is what
| people don't like.
| belter wrote:
| They should have used superior Klingon Technology...
| bamboozled wrote:
| You must be new here?
| tsunamifury wrote:
| It's pretty bizarre how these demos bring out keyboard warriors
| and cereal bowl yellers like crazy. Huge breakthroughs in natural
| cadence, tone and interaction, as well as real-time multimodal
| input and output, and all the people on HN can rant about is
| token price collapse.
|
| It's like the people in this community all suffer from a complete
| disconnect from society and normal human needs/wants/demands.
| ThrowawayTestr wrote:
| > How quickly we get used to that magic and demand more.
|
| Humanity in a nutshell.
| seydor wrote:
| We're just logarithmic creatures
| layer8 wrote:
| I'd say we are derivative creatures. ;)
| MP_1729 wrote:
| I am just talking about scaling laws and the level of capex that
| big tech companies are committing. One hundred billion dollars is
| being invested this year to pursue AI scaling laws.
|
| You can be excited, as I am, while also being bearish, as I
| am.
| hn_throwaway_99 wrote:
| If you look at the history of big technological
| breakthroughs, there is _always_ an explosion of companies
| and money invested in the "new hotness" before things
| shake out and settle. Usually the vast majority of these
| companies go bankrupt, but that infrastructure spend sets
| up the ecosystem for growth going forward. Some examples:
|
| 1. Railroad companies in the second half of the 19th
| century.
|
| 2. Car companies in the early 20th century.
|
| 3. Telecom companies and investment in the 90s and early
| 2000s.
| spiderfarmer wrote:
| Comments like yours contribute to the negative perception
| of Hacker News as a place where launching anything, no
| matter how great, innovative, smart, informative, usable,
| or admirable, is met with unreasonable criticism. Finding
| an angle to voice your critique doesn't automatically make
| it insightful.
| MP_1729 wrote:
| I am sure that people at OpenAI, particularly former YC
| CEO Sam Altman, will be fine, even if they read the bad
| stuff MP_1729 says around here.
| candiddevmike wrote:
| What is unreasonable about that comment?
| barrell wrote:
| Well, I for one am excited about this update, and
| skeptical about the AI scaling, and agree with everything
| said in the top comment.
|
| I saw the update, was a little like "meh," and was
| relieved to see that some people had the same reaction as
| me.
|
| OP raised some pretty good points without directly
| criticizing the update. It's a good counterbalance to the top
| comments (calling this _absolutely magic and stunning_) and to
| all of Twitter.
|
| I wish more feedback on HN was like OP's.
| layer8 wrote:
| It's reasonable criticism, and more useful than all the
| hype.
| ertgbnm wrote:
| Over a year, they have provided order-of-magnitude improvements
| in latency, context length, and cost, while meaningfully
| improving performance and adding several input and output
| modalities.
| asadotzler wrote:
| Your order-of-magnitude claim is off by almost an order of
| magnitude. It's more like half again as good on a couple of
| items and the same on the rest. 10X improvement claims are a
| joke, and people making claims like that ought to be dismissed
| as jokes too.
| ertgbnm wrote:
| $30 / million tokens to $5 / million tokens since GPT-4
| original release = 6X improvement
|
| 4000 token context to 128k token context = 32X
| improvement
|
| 5.4 second voice mode latency to 320 milliseconds = 16X
| improvement.
|
| I guess I got a bit excited by including cost, but that's
| close enough to an order of magnitude for me. And that's
| ignoring the fact that it's now literally free in ChatGPT.
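|
| For anyone checking the arithmetic, the ratios work out roughly
| as quoted (the inputs below are the figures cited above, not
| independent measurements):
|
|   price_x   = 30 / 5           # $ per 1M input tokens
|   context_x = 128_000 / 4_000  # context window, tokens
|   latency_x = 5.4 / 0.320      # voice-mode latency, seconds
|   print(price_x, context_x, latency_x)  # 6.0  32.0  ~16.9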
| hn_throwaway_99 wrote:
| Thanks so much for posting this. The increased token
| length alone (obviously not just with OpenAI's models but
| the other big ones as well) has opened up a huge number
| of new use cases that I've seen tons of people and other
| startups pounce on.
| jononor wrote:
| All while the rampant confabulation goes unaddressed. Which is
| the main pain point, to me at least. Not being able to trust a
| single word that it says...
| financypants wrote:
| There are much-discussed downsides to shipping so fast, but on
| the bright side, when everyone is demanding more, more, more, it
| pushes costs down and demands innovation, right?
| 015a wrote:
| People's "capacity for impatience" is _literally_ the reason
| why these things move so quickly. These are not feelings at odds
| with each other; they're the same thing. It's magical; now it's
| boring; where's the magic; let's create more magic.
|
| Be impatient. It's a positive feeling, not a negative one. Be
| disappointed with the current progress; it's the biggest thing
| keeping progress moving forward. It also, if nothing else,
| helps communicate to OpenAI whether they're moving in the
| right direction.
| idopmstuff wrote:
| > Be disappointed with the current progress; it's the biggest
| thing keeping progress moving forward.
|
| No it isn't - excitement for the future is the biggest
| thing keeping progress moving forward. We didn't go to the
| moon because people were frustrated by the lack of progress
| in getting off of our planet, nor did we get electric cars
| because people were disappointed with ICE vehicles.
|
| Complacency regarding the current state of things can
| certainly slow or block progress, but impatience isn't what
| drives forward the things that matter.
| 015a wrote:
| Tesla's corporate motto is literally "accelerating the
| world's transition to sustainable energy". Unhappy with
| the world's previous progress and velocity, they aimed to
| move faster.
| fnordpiglet wrote:
| IMO, at the risk of being labeled a hype boy, this is absolutely
| a sign of the impending singularity. We are taking an ever-
| accelerating frame of cultural reference as a given, and our
| expectation is that exponential improvement is not just here but
| that you're already behind once you've released.
|
| I spent the last two years dismayed by the reaction, but I've
| just recently begun to realize this is a feature, not a flaw.
| This is latent demand for the next iteration, expressed as
| impatient dissatisfaction with the current rate of change,
| inducing a faster rate of change. Welcome to the future you
| were promised.
| ineedaj0b wrote:
| I would disagree. I remember iPhones getting similarly
| criticized on here. And not iPhone 13 to 14 - it was the original
| iPhone to the iPhone 3G!
|
| The only time people weren't displeased was when internet speeds
| went from 15Mb to 100Mb.
|
| You will keep being dismayed! People only like good things, not
| good things that potentially make them obsolete.
| laweijfmvo wrote:
| Sounds like the Jeopardy answer for "What is a novelty?"
| spaceman_2020 wrote:
| People fume and fret about startups wasting capital like it
| was their own money.
|
| GPT and all the other chatbots are still absolutely magic.
| The idea that I can get a computer to create a fully
| functional app is insane.
|
| Will this app make me millions and run a business? Probably
| not. Does it do what I want it to do? Mostly yes.
| IanCal wrote:
| Tbf GPT-4 level seems useful and better than almost everything
| else (or close if not). The more important barriers for use in
| applications have been cost, throughput and latency. Oh, and
| modalities, which have expanded hugely.
| adtac wrote:
| >Token volume needs to double just for revenue to stand still
|
| Profits are the real metric. Token volume doesn't need to
| double for profits to stand still if operational costs go down.
| mrkramer wrote:
| >This thing continues to stress my skepticism for AI scaling
| laws and the broad AI semiconductor capex spending.
|
| Imagine you are in the 1970s saying computers suck, they are
| expensive, there are not that many use cases... fast forward to
| the 90s and you are using Windows 95, with a GUI, on a chip
| astronomically more powerful than what we had in the 70s, and you
| can use productivity apps, play video games and surf the
| Internet.
|
| Give AI time; it will fulfill its true potential sooner or later.
| MP_1729 wrote:
| That's the opposite of what I am saying.
|
| What I am saying is that computers are SO GOOD that AI is
| getting VERY CHEAP and the amount of computing capex being
| done is excessive.
|
| It's more like you are in 1999: people are spending $100B on
| fiber, while a lot of computer scientists are working on
| compression, multiplexing, etc.
| jameshart wrote:
| Which of those investments are you saying would have been a
| poor choice in 1999?
| MP_1729 wrote:
| All of them, without exception. Just recently, Sprint
| sold their fiber business for $1, lmfao. Or WorldCom. Or
| NetRail, Allied Riser, PSINet, FNSI, Firstmark, Carrier
| 1, UFO Group, Global Access, Aleron Broadband, Verio...
|
| All the fiber companies went bust because, despite the
| internet's huge increase in traffic, the number of packets
| per fiber increased by a handful of orders of magnitude.
| jameshart wrote:
| But you're saying investing in multiplexing and
| compression was also dumb?
| MP_1729 wrote:
| Nope, I'm not
| jameshart wrote:
| Then your overarching thesis is not very clear. Is it
| simply 'don't invest in hardware capital, software always
| makes it worthless'?
| mrkramer wrote:
| >It's more like you are in 1999, people are spending $100B
| in fiber, while a lot of computer scientists are working in
| compression, multiplexing, etc.
|
| But nobody knows what's around the corner or what the future
| brings... for example, back in the day Excite didn't want to buy
| Google for $1M because they thought that was a lot of money. You
| need to spend money to make money, and yes, sometimes you need to
| spend a lot of money on "crazy" projects because it can pay off
| big time.
| MP_1729 wrote:
| Was there ever a time when betting that computer
| scientists would not make better algorithms was a good
| idea?
| madeofpalk wrote:
| > Token volume needs to double just for revenue to stand still
|
| I'm pretty skeptical about the whole LLM/AI hype, but I also
| believe that the market is still relatively untapped. I'm sure
| Apple switching Siri to an LLM would ~double token usage.
|
| A few products rushed out thin wrappers on top of ChatGPT,
| producing pretty uninspiring chatbots of limited use. I think
| there's still huge potential for this LLM technology to be 'just'
| an implementation detail of other features, running in the
| background doing its thing.
|
| That said, I don't think OpenAI has much of a moat here. They
| were first, but there's plenty of others with closed or open
| models.
| drag0s wrote:
| what do you actually expect from an "agent"?
| MP_1729 wrote:
| Ask stuff like "Check whether there's some correlation between
| the major economies' fiscal primary deficits and GDP growth in
| the post-pandemic era" and get an answer.
| Pr0ject217 wrote:
| "OpenAI is still working in GPT-4-level models."
|
| This may or may not be true - just because we haven't seen
| GPT-5-level capabilities does not mean they do not yet exist. It
| is highly unlikely that what they ship is actually the full
| capability of what they have access to.
| MP_1729 wrote:
| they literally launched a GPT-4-level model TODAY!
| bionhoward wrote:
| IMHO GPT-4 is definitely [proto-]AGI, and the reason I cancelled
| my OpenAI sub and am sad to miss out on talking to GPT-4o is that
| OpenAI thinks it's illegal, harmful, or abusive to use their
| model output to develop models that compete with OpenAI. Which
| means that if you use OpenAI, whatever comes out of it is toxic
| waste due to an arguably illegal smidgen of legal bullshit.
|
| For another adjacent example, every piece of code GitHub Copilot
| ever wrote is Microsoft AI output, which you "can't use to
| develop / otherwise improve AI," or some nonsense like that.
|
| The sum total of these various prohibitions is a data provenance
| nightmare of extreme proportion that we cannot afford to ignore,
| because you could say something to an AI, have it parroted right
| back to you, and suddenly the megacorporation can say that's AI
| output you can't use in competition with them, and they do
| everything, so what can you do?
|
| Answer: cancel your OpenAI sub and shred everything you ever got
| from them, even if it was awesome or revolutionary. That's the
| truth here: you don't want their stuff and you don't want them to
| have your stuff. Think about the multi-decade economics of it all
| and realize "customer noncompete" is never gonna be OK in the
| long run (highway to corpo hell, IMHO).
| fnordpiglet wrote:
| Where I work, in the hoary fringes of high-end tech, we can't
| secure enough token processing for our use cases. Token price
| decreases mean an opening of capacity, but we immediately hit the
| boundaries of what we can acquire. We can't keep up with the use
| cases - but more than that, we can't develop tooling to harness
| things fast enough, and the tooling we are creating is a quick
| hack. I don't fear for the revenue of base model providers. But I
| think in the end the people selling the tools make the most, and
| in this case I think it will continue to be the cloud providers.
| In a very real way, OpenAI and Anthropic are commercialized
| charities driving change and rapidly commoditizing their own
| products, and it'll be the infrastructure providers who win the
| high-end model game. I don't think this is a problem; I think it
| is in fact in line with their original charters, just a different
| path than most people associate with nonprofit work. A much more
| capitalist and accelerated take.
|
| Where they might build future businesses is in the tooling. My
| understanding from friends within these companies is that their
| tooling is remarkably advanced vs. generally available tech. But
| base models aren't the future of revenues (to be clear, they make
| considerable revenue today, but at some point their efficiency
| will cannibalize demand and the residual business will be tools).
| MP_1729 wrote:
| I'm curious now. Can you give some color on what you're doing
| such that you keep hitting boundaries? I suppose it isn't limited
| by human attention.
| w10-1 wrote:
| > Since GPT-4 launch, token prices are falling 84% per year!!
| Good for mankind, but crazy for these companies
|
| The message to competitor investors is that they will not make
| their money back.
|
| OpenAI has the lead, in market and mindshare; it just has to
| keep it.
|
| Competitors should realize they're better served by working
| with OpenAI than by trying to replace it - Hence the Apple
| deal.
|
| Soon model construction itself will not be about public
| architectures or access to GPUs, but a kind of proprietary
| black magic. No one will pay for an upstart's 97% when they can
| get a reliable 98% at the same price, so OpenAI's position will
| be secure.
| abrichr wrote:
| > where are my agents?
|
| https://github.com/OpenAdaptAI/OpenAdapt/
| golol wrote:
| GPT-2: February 2019
|
| GPT-3: June 2020
|
| GPT-3.5: November 2022
|
| GPT-4: March 2023
|
| There were 3 years between GPT-3 and GPT-4!
| whimsicalism wrote:
| hardly anybody you are talking to even knows what gpt3 is,
| the time between 3.5 and 4 is what is relevant
| golol wrote:
| It doesn't make any sense to look at it that way. Apparently the
| GPT-4 base model finished training in late summer 2022, which is
| before the release of GPT-3.5. I am pretty sure that GPT-3.5
| should be thought of as GPT-4-lite, in the sense that it uses
| techniques and compute of the GPT-4 era rather than the GPT-3
| era. The advancement from GPT-3 to GPT-4 is what counts, and it
| took 3 years.
| whimsicalism wrote:
| I fully don't agree.
|
| > I am pretty sure that GPT-3.5 should be thought of as
| GPT-4-lite, in the sense that it uses techniques and
| compute of the GPT-4 era rather than the GPT-3 era
|
| Compute of the "GPT-3 era" vs the "GPT-3.5 era" is
| identical, this is not a distinguishing factor. The
| architecture is also roughly identical, both are dense
| transformers. The _only_ significant difference between
| 3.5 and 3 is the size of the model and whether it uses
| RLHF.
| golol wrote:
| Yes you're right about the compute. Let me try to make my
| point differnetly: GPT-3 and GPT-4 were models which when
| they were released represented the best that OpenAI could
| do, while GPT-3.5 was an intentionally smaller (than they
| could train) model. I'm seeing it as GPT-3.5 = GPT-4-70b.
| So to estimate when the next "best we can do" model might
| be released we should look at the difference between the
| release of GPT-3 and GPT-4, not GPT-4-70b and GPT-4.
| That's my understanding, dunno.
| whimsicalism wrote:
| GPT-4 only started training roughly at the same
| time/after the release of GPT-3.5, so I'm not sure where
| you're getting the "intentionally smaller".
| golol wrote:
| Ah I misremembered GPT-3.5 as being released around the
| time of ChatGPT.
| whimsicalism wrote:
| oh you remembered correctly, those are the same thing
|
| actually i was wrong about when gpt-4 started training,
| the time i gave was roughly when they finished
| MP_1729 wrote:
| Obviously, I know these timetables.
|
| But there's a night-and-day difference between post-Nov '22 and
| before, both in the AI race it sparked and in the funding all the
| AI labs now have.
|
| If you're expecting GPT-5 by 2026, that's OK. It's just very
| weird to me.
| ugh123 wrote:
| >How do people seriously expect these foundational model
| companies to make substantial revenue?
|
| My take on this common question is that we haven't even begun
| to realize the immense scale at which we will need AI in all
| sorts of products, from consumer to enterprise. We will look
| back on the cost of tokens now (even at 50% of the price of a
| year or so ago) with the same bewilderment as "having a
| computer in your pocket" compared to mainframes from 50 years
| ago.
|
| For AI to be truly useful at the consumer level, we'll need
| specialized mobile hardware that operates at a far greater
| scale of tokens and speed than anything we're seeing/trying
| now.
|
| Think "always-on AI" rather than "on-demand".
| siscia wrote:
| Now for a bit of a shameless plug, but if you need an AI to take
| over your emails, then my https://getgabrielai.com should cover
| most use cases:
|
| * Summarisation
|
| * Smart filtering
|
| * Smart automatic drafting of replies
|
| Very much in beta, and summarisation is still behind a feature
| flag, but feel free to give it a try.
|
| By summarisation here I mean getting one email with all your
| unread emails summarised.
| joshstrange wrote:
| Looking forward to trying this via ChatGPT. As always OpenAI says
| "now available" but refreshing or logging in/out of ChatGPT (web
| and mobile) don't cause GPT-4o to show up. I don't know why I
| find this so frustrating. Probably because they don't say
| "rolling out" they say things like "try it now" but I can't even
| though I'm a paying customer. Oh well...
| glenstein wrote:
| I think it's a legitimate point. For my personal use case, one of
| the most helpful things about these HN threads is comparing with
| others to see how soon I can expect it to be available for me.
| Like you, I currently don't have access, but I understand that
| it's supposed to become increasingly available throughout the
| day.
|
| That's for the text-based version. The full multimodal version I
| understand to be rolling out in the coming weeks.
| candiodari wrote:
| I wonder if the audio stuff works like VITS. Do they just encode
| the audio as tokens and feed the whole thing in? Wouldn't that
| make the effective context size a lot smaller?
|
| One does notice that context size is noticeably absent from the
| announcement ...
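|
| (Speculating, since OpenAI hasn't published details: a neural
| codec that quantizes short audio frames into codebook IDs would
| imply a token budget roughly like the back-of-envelope below.
| Every number here is an assumption, not a disclosed spec.)
|
|   import numpy as np
|
|   sample_rate = 24_000   # Hz (assumed)
|   hop         = 320      # samples per codec frame (assumed)
|   codebooks   = 4        # residual quantizers per frame (assumed)
|
|   frames_per_sec = sample_rate / hop           # 75 frames/s
|   tokens_per_sec = frames_per_sec * codebooks  # 300 tokens/s
|
|   # one minute of speech, if every codebook ID is its own token:
|   print(tokens_per_sec * 60)   # 18,000 tokens out of the context
|
|   # toy quantizer: map a frame's features to the nearest codeword
|   rng      = np.random.default_rng(0)
|   codebook = rng.normal(size=(1024, 16))  # 1024 codewords, 16-dim
|   frame    = rng.normal(size=16)          # one frame's features
|   token_id = int(np.argmin(((codebook - frame) ** 2).sum(axis=1)))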
| cs702 wrote:
| The usual critics will quickly point out that LLMs like GPT-4o
| still have a lot of failure modes and suffer from issues that
| remain unresolved. They will point out that we're reaping
| diminishing returns from Transformers. They will question the
| absence of a "GPT-5" model. And so on -- blah, blah, blah,
| stochastic parrots, blah, blah, blah.
|
| Ignore the critics. Watch the demos. Play with it.
|
| This stuff feels _magical_. Magical. It makes the movie "Her"
| look like it's no longer in the realm of science fiction but in
| the realm of incremental product development. HAL's unemotional
| monotone in Kubrick's "2001: A Space Odyssey" feels... oddly
| primitive by comparison. I'm impressed at how well this works.
|
| _Well-deserved congratulations to everyone at OpenAI!_
| CamperBob2 wrote:
| Imagine what an unfettered model would be like. 'Ex Machina'
| would no longer be a software-engineering problem, but just
| another exercise in mechanical and electrical engineering.
|
| The future is indeed here... and it is, indeed, not equitably
| distributed.
| aftbit wrote:
| Or from Zones of Thought series, Applied Theology, the study
| of communication with and creation of superhuman
| intelligences that might as well be gods.
| aftbit wrote:
| >Who cares? This stuff feels magical. Magical!
|
| On one hand, I agree - we shouldn't diminish the very real
| capabilities of these models with tech skepticism. On the other
| hand, I disagree - I believe this approach is unlikely to lead
| to human-level AGI.
|
| Like so many things, the truth probably lies somewhere between
| the skeptical naysayers and the breathless fanboys.
| CamperBob2 wrote:
| _On the other hand, I disagree - I believe this approach is
| unlikely to lead to human-level AGI._
|
| You might not be fooled by a conversation with an agent like
| the one in the promo video, but you'd probably agree that
| somewhere around 80% of people could be. At what percentage
| would you say that it's good enough to be "human-level?"
| layer8 wrote:
| > You might not be fooled by a conversation with an agent
| like the one in the promo video, but you'd probably agree
| that somewhere around 80% of people could be.
|
| I think people will quickly learn with enough exposure, and
| then that percentage will go down.
| MVissers wrote:
| Nah - these models will improve faster than people can catch up.
| People, and even AI models, can barely catch AI-created text.
| It's quickly becoming impossible to distinguish.
|
| The ones you catch are the tip of the iceberg.
|
| The same will happen to speech. It might take a few years, but
| it'll be indistinguishable in at most a few years, due to compute
| increases + model improvements, both improving exponentially.
| krainboltgreene wrote:
| > These models will improve faster than people can catch
| up.
|
| So that we're all clear the basis for this analysis is
| purely made up, yes?
| paulryanrogers wrote:
| How can we be so sure things will keep getting better?
| And at a rate faster than humans can adapt?
|
| If we have to dam rivers and build new coal plants to power
| these AI data centers, then it may be one step forward and two
| steps back.
| pixl97 wrote:
| No, instead something worse will happen.
|
| Well-spoken and well-mannered speakers will be called bots. The
| comment threads under posts will be full of insults hurled back
| and forth about who's actually real. Half the comments will
| actually be bots doing it. Welcome to the dead internet.
| jfyi wrote:
| Right! This is absolutely apocalyptic! If more than half
| the people I argue with on internet forums are just bots
| that don't feel the sting and fail to sleep at night
| because of it, what even is the meaning of anything?
|
| We need to stop these hateful ai companies before they
| ruin society as a whole!
|
| Seriously though... the internet is dead already, and
| it's not coming back to what it was. We ruined it, not
| ai.
| thfuran wrote:
| The framing of the question admits only one reasonable
| answer: There is no such threshold. Fooling people into
| believing something doesn't make it so.
| CamperBob2 wrote:
| What criteria do you suggest, then?
|
| As has been suggested, the models will get better at a
| faster rate than humans will get smarter.
| pixl97 wrote:
| Most people's interactions are transactional. When I call into a
| company and talk to an agent, and that agent solves the problem I
| have, regardless of whether the agent is a person or an AI, where
| did the fooling occur? The ability to problem-solve based on
| context is intelligence.
| Vegenoid wrote:
| When people talk about human-level AGI, they are not
| referring to an AI that could pass as a human to most
| people - that is, they're not simply referring to a program
| that can pass the Turing test.
|
| They are referring to an AI that can use reasoning,
| deduction, logic, and abstraction like the smartest humans
| can, to discover, prove, and create novel things in every
| realm that humans can: math, physics, chemistry, biology,
| engineering, art, sociology, etc.
| micromacrofoot wrote:
| I'm not so sure, I think this is what's called "emergent
| behavior" -- we've found very interesting side effects of
| bringing together technologies. This might ultimately teach
| us more about intelligence than more reductionist approaches
| like scanning and mapping the brain.
| dongping wrote:
| On the other hand, it is very difficult to distinguish
| between "emergent behavior" and "somehow leaked into our
| large training set" for LLMs.
| layer8 wrote:
| > HAL's unemotional monotone in Kubrick's "2001: A Space
| Odyssey" feels... primitive by comparison.
|
| I'd strongly prefer that though, along with HAL's reasoning
| abilities.
| moffkalast wrote:
| I would say a machine that thinks it feels emotions is less
| likely to throw you out of a spaceship. Human empathy already
| feels lacking compared to what something as basic as llama-3
| can do.
| layer8 wrote:
| What you say has nothing to do with how an AI speaks.
|
| To use another pop-culture reference, Obi-Wan in Episode IV
| had deep empathy, but didn't speak emotionally. Those are
| separate things.
| thfuran wrote:
| >I would say a machine that thinks it feels emotions is
| less likely to throw you out of a spaceship
|
| A lot of terrible human behavior is driven by emotions. An
| emotionless machine will never dump you out the airlock in
| a fit of rage.
| pixl97 wrote:
| Ah, I was tossed out of the airlock in a fit of logic...
| totally different!
| throwup238 wrote:
| The important part is that the machine explained its
| reasoning to you while purging the airlock.
| dimask wrote:
| In a chain of thought manner, as every proper AI, of
| course.
| satvikpendem wrote:
| > _I would say a machine that thinks it feels emotions is
| less likely to throw you out of a spaceship._
|
| Have you seen the final scene of the movie Ex Machina? Without
| spoilers, I'll just say that acting like it has emotions is very
| different from actually having them. This is in fact what socio-
| and psychopaths are like, with stereotypical results.
| elicksaur wrote:
| llama-3 can't feel empathy, so this is a rather confusing
| comment.
| moffkalast wrote:
| Can you prove that you feel empathy? That you're not a
| cold unfeeling psychopath that is merely pretending
| extremely well to have emotions? Even if it did, we
| wouldn't be able to tell the difference from the outside,
| so in strictly practical terms I don't think it matters.
| elicksaur wrote:
| If I could prove that I feel empathy through a HN
| comment, I would be much more famous.
|
| I get your nuanced point, that "thinking" one feels
| empathy is enough to be bound by the norms of behavior
| that empathy would dictate, but I don't see why that
| would make AI "empathy" superior to human "empathy".
|
| The immediate future I see is a chatbot that is
| superficially extremely empathetic, but programmed never
| to go against the owner's interest. Where before, when
| interacting with a human, empathy could cause them to
| make an exception and act sacrificially in a crisis case,
| this chatbot would never be able to make such an
| exception because the empathy it displays is transparent.
| jll29 wrote:
| HAL has to sound exactly how Kubrick made it sound for the
| movie to work the way it should.
|
| There wasn't any incentive to make it sound artificially
| emotional or empathetic beyond a "Sorry, Dave".
| dragonwriter wrote:
| > This stuff feels magical. Magical.
|
| Because its capacities are focused on exactly the right place
| to feel magical. Which isn't to say that there isn't real
| utility, but language (written, and even moreso spoken) has an
| enormous emotional resonance for humans, so this is laser-
| targeted in an area where every advance is going to "feel
| magical" whether or not it moves the needle much on practical
| utility; it's not unlike the effect of TV news making you feel
| informed, even though time spent watching it negatively
| correlates with understanding of current events.
| BoorishBears wrote:
| You really think OpenAI has researchers figuring out how to
| drive emergent capabilities based on what markets well?
|
| Edit: Apparently not, based on your clarification; instead the
| researchers don't know any better than to march into a local
| maximum because they're only human and seek to replicate
| themselves. I assumed too much good faith.
| dragonwriter wrote:
| I don't think the _intent_ matters, the _effect_ of its
| capacities being centered where they are is that they
| trigger certain human biases.
|
| (Arguably, it is the other way around: they aren't _focused
| on appealing to_ those biases, but _driven by them_, in that
| the perception of language modeling as a road to real general
| reasoning is a manifestation of the same bias which makes
| language capacity be perceived as magical.)
| BoorishBears wrote:
| Intent matters when you're being as dismissive as you
| were.
|
| Not to mention your comment doesn't track at all with the
| most basic findings they've shared: that adding new
| modalities increases performance across the board.
|
| They shared that with GPT-4 vs GPT-4V, and the fact that this
| is a faster model than GPT-4V while rivaling its performance
| seems like further confirmation of the fact.
|
| -
|
| It seems like you're assigning emotional biases of your
| own to pretty straightforward science.
| ToucanLoucan wrote:
| > Intent matters when you're being as dismissive as you
| were.
|
| The GP comment we're all replying to outlines a non-exhaustive
| list of _very good reasons_ to be highly dismissive of LLMs.
| (No, I'm not calling it AI; it is not fucking AI.)
|
| It is utterly laughable and infuriating that you're writing off
| legitimate skepticism about this technology as emotional bias.
| Fucking ridiculous. We're now almost a full year into the full-
| bore open hype cycle of LLMs. Where are all the LLM products?
| Where's the market penetration? Businesses can't use it because
| it has a nasty tendency to make shit up when it's talking.
| Various companies and individuals are being sued because
| generative art is stealing from artists. Code generators are
| hitting walls of usability so steep, you're better off just
| writing the damn code yourself.
|
| We keep hearing "it will do!" "it's coming!" "just think of what
| it can do soon!" on and on and on, and it just keeps... not doing
| any of it. It keeps hallucinating untrue facts, it keeps getting
| the basics of its tasks wrong, for fuck's sake AI Dungeon can't
| even remember if I'm in Hyrule or Night City. Progress seems
| fewer and farther between, with most advances being just getting
| the compute cost down, because NO business currently using LLMs
| extensively could be profitable without generous donations of
| compute from large corporations like Microsoft.
| imwillofficial wrote:
| I mean when you're making a point about how your views
| should not be taken as emotional bias, it pays to not be
| overly emotional.
|
| The fact that you don't see utility doesn't mean it is
| not helpful to others.
|
| A recent example, I used Grok to write me an outline of a
| paper regarding military and civilian emergency response
| as part of a refresher class.
|
| To test it out, we fed it scenario questions and saw how it
| compared to our classmates' responses - all people with decades
| of emergency management experience.
|
| The results were shocking. It was able to successfully
| navigate a large scale emergency management problem and
| get it (mostly) right.
|
| I could see a not so distant future where we become QA
| checkers for our AI overlords.
| BoorishBears wrote:
| I didn't see any good reasons to be dismissive of LLMs. I saw a
| weak attempt at implying we're at a local maximum because
| scientists don't know better than to chase after what seems
| magical or special to them due to their bias as humans.
|
| It's not an especially insightful or sound argument imo, and
| neither are random complaints about the capabilities of systems
| that millions of people use daily, your own claims
| notwithstanding.
|
| And for the record:
|
| > because NO business currently using LLMs extensively could be
| profitable without generous donations of compute from large
| corporations like Microsoft
|
| OpenAI isn't the only provider of LLMs. Plenty of businesses are
| using providers that offer their services profitably, and I'm not
| convinced that OpenAI themselves are subsidising these
| capabilities as strongly as they once did.
| throwthrowuknow wrote:
| All that spilled ink doesn't change the fact that I use it every
| day and it makes everything faster and easier and more enjoyable.
| I'm absolutely chuffed to put my phone on a stand so GPT-4o can
| see the page I'm writing on and chat with me about my notes or
| the book I'm reading and the occasional doodle. One of the first
| things I'll try out is to see if it can give feedback and tips on
| sketching; since it can generate images with a lot better control
| of the subject, it might even be able to demonstrate various
| techniques I could employ!
| fzeroracer wrote:
| As it turns out, people will gleefully welcome Big
| Brother with open arms as long as it speaks with a
| vaguely nice tone and compliments the stuff it can see.
| dosinga wrote:
| It's almost a year since this James Watt came out with
| his steam engine and yet we are still using horses.
| ToucanLoucan wrote:
| A year is an _eternity_ in tech and you bloody well know it. A
| year into an $80-billion-valued company's prime hype cycle, and
| we have... chatbots, but fancier? This is completely detached
| from sanity.
| hbn wrote:
| That's not what the GP said at all. It was just an
| explanation for why this demo feels so incredible.
| BoorishBears wrote:
| GP's follow-up is literally
|
| > they aren't focused on appealing to those biases, but
| driven by them, in that the perception of language
| modeling...
|
| So yes, in effect that is their point, except they find the
| scientists are actually compelled by what markets well, rather
| than intentionally going after what markets well... which is
| frankly even less flattering. Like the researchers who enabled
| this just didn't know better than to be seduced by some
| underlying human bias into a local maximum.
| frompom wrote:
| I think that's still just an explanation of biases that
| go into development direction. I don't view that as a
| criticism but an observation. We use LLMs in our
| products, and I use them daily and I'm not sure how
| that's that negative.
|
| We all have biases in how we determine intelligence,
| capability, and accuracy. Our biases color our trust and
| ability to retain information. There's a wealth of
| research around it. We're all susceptible to these
| biases. Being a researcher doesn't exclude you from the
| experience of being human.
|
| Our biases influence how we measure things, which in turn
| influences how things behave. I don't see why you're so
| upset by that pretty obvious observation.
| BoorishBears wrote:
| The full comment is right there; we don't need to seance what
| the rest of it was or remix it.
|
| > Arguably, it is the other way around: they aren't focused on
| appealing to those biases, but driven by them, in that the
| perception of language modeling as a road to real general
| reasoning is a manifestation of the same bias which makes
| language capacity be perceived as magical
|
| There's no charitable reading of this that doesn't give the
| researchers way too little credit given the results of the
| direction they've chosen.
|
| This has nothing to do with biases and emotion, I'm not
| sure why some people need it to be: modalities have
| progressed in order of how easy they are to wrangle data
| on: text => image => audio => video.
|
| We've seen that training on more tokens improves
| performance, we've seen that training on new modalities
| improves performance on the prior modalities.
|
| It's so needlessly dismissive to act like you have this
| mystical insight into a grave error these people are
| making, and they're just seeking to replicate human
| language out of folly, when you're ignoring table stakes
| for their underlying works to start with.
| dragonwriter wrote:
| Note that there is only one thing about the research that
| I have said is arguably influenced by the bias in
| question, "the perception of language modeling as a road
| to real general reasoning". Not the order of progression
| through modalities. Not the perception that language,
| image, audio, or video are useful domains.
| aantix wrote:
| Louis CK - Everything is amazing & nobody is happy
|
| https://www.youtube.com/watch?v=kBLkX2VaQs4
| coldtea wrote:
| Perhaps everybody is right, and what is amazing is not what
| matters, and what matters is hardly amazing...
| throwaway_62022 wrote:
| As Jon Stewart says in
| https://www.youtube.com/watch?v=20TAkcy3aBY - "How about
| I hold the fort on making peanut butter sandwiches,
| because that is something I can do. How about we let AI
| solve this world climate problem."
|
| I have yet to see a true "killer" feature of AI that isn't
| doing badly a job which humans can already do badly.
| roudaki wrote:
| The point of all of this is: this is alpha 0.45, made to
| get the money needed to build AGI, whatever that is.
| andrewmutz wrote:
| Or perhaps the news media has been increasingly effective
| at convincing us the world is terrible. Perceptions have
| become measurably detached from reality:
|
| https://www.ft.com/content/af78f86d-13d2-429d-ad55-a11947
| 989...
| trimethylpurine wrote:
| If we're convinced that it's terrible then we're behaving
| like it's terrible, which _is_ terrible.
| agumonkey wrote:
| I didn't use it as a textual interface, but as a
| relational/nondirectional system, trying to ask it to invert
| recursive relationships (first/follow sets for BNF grammars).
| The fact that it could manage to give partially correct
| answers on such an abstract problem was "coldly" surprising.
| DarkNova6 wrote:
| VC loves it.
|
| Another step closer for those 7 trillion that OpenAI is so
| desperate for.
| ChuckMcM wrote:
| Kind of this. That was one of the themes of the movie
| Westworld where the AI in the robots seemed magical until it
| was creepy.
|
| I worry about the 'cheery intern' response becoming something
| of a punch line.
|
| "Hey siri, launch the nuclear missiles to end the world."
|
| "That's a GREAT idea, I'll get right on that! Is there
| anything else I can help you with?"
|
| Kind of punch lines.
|
| Will be interesting to see where that goes once you've got a
| good handle on capturing the part of speech that isn't
| "words" so much as it is inflection and delivery. I am
| interested in a speech model that can differentiate between
| "I would hate to have something happen to this store." as a
| compliment coming from a customer and as a threat coming from
| an extortionist.
| tsunamifury wrote:
| Positivity even to the point of toxicity will be the
| default launch tone for anything... to avoid getting scary.
| rrr_oh_man wrote:
| Tell that to German customers
|
| (Classic:
| https://www.counterpunch.org/2011/08/26/germany-chokes-
| on-wa...)
| throwaway11460 wrote:
| Yeah people around me here in Central Europe are very
| sick of that already. Everybody is complaining about it
| and the first thing they say to the bot is to cut it out,
| stop apologizing, stop explaining and get to the point as
| concisely as possible. Me too.
| hnburnsy wrote:
| I have to do that now with every AI that over-explains or
| provides loosely related info I did not ask for. I hope
| there is a verbosity level = minimum.
|
| Even in the demo today, they kept cutting it off.
| smugma wrote:
| One of the demos has the voice respond to everything
| sarcastically. If it can sound sarcastic it's not a stretch
| to believe it can "hear" sarcasm.
| indigoabstract wrote:
| It's probably just me, but the somewhat forced laughs &
| smiles from the people talking to it make me feel uneasy.
|
| But enough of that. The future looks bright. Everyone
| smile!
|
| Or else..
| Dig1t wrote:
| This is basically just the ship computer from Hitchhikers
| Guide to the Galaxy.
|
| "Guys, I am just pleased as punch to inform you that there
| are two thermo-nuclear missiles headed this way... if you
| don't mind, I'm gonna go ahead and take evasive action."
| throwup238 wrote:
| ChatGPT is now powered by Genuine People Personality(tm)
| and OpenAI is turning into the Sirius Cybernetics
| Corporation (who according to the HHGTTG were _" a bunch
| of mindless jerks who were the first against the wall
| when the revolution came"_)
|
| The jokes write themselves.
| gnicholas wrote:
| I did wonder if there's a less verbose mode. I hope
| that's not a paywalled feature. Honestly it's possible
| that they use the friendliness to help buy the LLM time
| before it has to substantively respond to the user.
| cs702 wrote:
| Yes, the announcement explicitly states that much of the
| effort for this release was focused on things that make it
| feel magical (response times, multiple domains, etc.), not on
| moving the needle on quantifiable practical performance. For
| future releases, the clever folks at OpenAI are surely
| focused on improving performance on challenging tasks that
| have practical utility -- while maintaining the "magical feeling."
| elpakal wrote:
| Where does it explicitly say this?
| cs702 wrote:
| _Explicit ≠ literal._
|
| The things they mention/demo -- response times, multiple
| domains, inflection and tone, etc. -- are those that make
| it feel "magical."
| elpakal wrote:
| > explicitly states that much of the effort for this
| release was focused on things that make it feel magical
| (response times, multiple domains, etc.), not on moving
| the needle on quantifiable practical performance.
|
| Hmm, did you mean implicitly? I've yet to see where they
| say anything to the likes of not "moving the needle on
| quantifiable practical performance."
| benreesman wrote:
| It's not an either-or: the stuff feels magical because it
| _both_ represents a dramatic revelation of capability _and_
| because it is heavily optimized to make humans engage in
| magical thinking.
|
| These things are amazing compared to old-school NLP: the
| step-change in capability is real.
|
| But we should also keep our wits about us: they are well
| described by current or conjectural mathematics, they fail
| at things dolphins can do, it's not some AI god and it's
| not self-improving.
|
| Let's have balance on both the magic of the experience and
| getting past the tech demo stage: every magic trick has a
| pledge, but I think we're still working on the prestige.
| porphyra wrote:
| Pretty interesting how it turns out that, contrary to
| science fiction movies, talking naturally and modelling
| language is much easier and was achieved much sooner than
| solving complex problems or whatever else it is that the
| robots in those movies do.
| agumonkey wrote:
| That's what OpenAI managed to capture: a large enough sense
| of wonder. You could feel it as people spread the news, but
| not like the usual fad... there was a soft silence to it,
| people deeply focused on poking at it because it was a new
| interface.
| barrell wrote:
| Did you use any of the GPT voice features before? I'm curious
| whether this reaction is to the modality or the model.
|
| Don't get me wrong, excited about this update, but I'm
| struggling to see what is so magical about it. Then again, I've
| been using GPT voice every day for months, so if you're just
| blown away from talking to a computer then I get it
| og_kalu wrote:
| Speech is a lot more than just the words being conveyed.
|
| Tone, Emphasis, Speed, Accent are all very important parts of
| how humans communicate verbally.
|
| Before today, voice mode was strictly your audio>text then
| text>audio. All that information destroyed.
|
| Now the same model takes in audio tokens and spits back out
| audio tokens directly.
|
| Watch this demo, it's the best example of the kind of thing
| that would be flat out impossible with the previous setup.
|
| https://www.youtube.com/live/DQacCB9tDaw?si=2LzQwlS8FHfot7Jy
| scarface_74 wrote:
| The ability to have an interactive voice conversation has
| been available for the iOS app for the longest.
| og_kalu wrote:
| Right but this works differently.
| kaibee wrote:
| Kinda stretching the definition of interactive there.
| scarface_74 wrote:
| How so? You don't have to press the mic button after
| every sentence. You press the headphone button and speak
| like you normally would and it speaks back once you stop
| talking.
|
| How much more "interactive" could it be?
| mlsu wrote:
| The voice modality plays a huge role in how impressive it
| seems.
|
| When GPT-2/3/3.5/4 came out, it was fairly easy to see the
| progression from reading model outputs that it was just
| getting better and better at text. Which was pretty amazing
| but in a very intellectual way, since reading is typically a
| very "intellectual" "front-brain" type of activity.
|
| But this voice stuff really does make it much more emotional.
| I don't know about you, but the first time I used GPT's voice
| mode I notice that I felt _something_ -- very un-
| intellectually, very un-cerebral -- like, the _feeling_ that
| there is a spirit embodying the computer. Of course with LLM
| 's there always is a spirit embodying the computer (or, there
| never is, depending on your philosophical beliefs).
|
| The Suno demos that popped up recently should have clued us
| all in that this kind of emotional range was possible with
| these models. This announcement is not so much a step
| function in model _capabilities_ , but it is a step function
| in HCI. People are just not used to their interactions with a
| computer be emotional like this. I'm excited and concerned in
| equal parts that many people won't be truly prepared for what
| is coming. It's on the horizon, having an AI companion, that
| really truly makes you feel things.
|
| Us nerds who habitually read text have had that since roughly
| GPT-3, but now the door has been blown open.
| rrrrrrrrrrrryan wrote:
| Yeah the product itself is only incrementally better (lower
| latency responses + can look at a camera feed, both great
| improvements but nothing mindblowing or "magical"), but I
| think the big difference is that this thing is available for
| free users now.
| grantsucceeded wrote:
| Magical?
|
| the interruption part is just flow control at the edge.
| control-s, control-c stuff, right? not AI?
|
| The sound of a female voice to an audience 85% composed of
| males between the ages of 14 and 55 is "magical", not this
| thing that recreates it.
|
| so yeah, it's flow control and compression of highly curated,
| subtle soft porn. Subtle, hyper targeted, subconscious porn
| honed by the most colossal digitally mediated focus group ever
| constructed to manipulate our (straight male) emotions.
|
| why isn't the voice actually the voice of the pissed off high
| school janitor telling you to man-up and stop hyperventilating?
| instead it's a woman stroking your ego and telling you to relax
| and take deep breaths. what dataset did they train that voice
| on anyway?
| mindcrime wrote:
| I may or may not entirely agree with this sentiment (but I
| definitely don't disagree with all of it!) but I will say
| this: I don't think you deserve to be downvoted for this.
| Have a "corrective upvote" on me.
| whimsicalism wrote:
| Right, because having a female voice means that it is soft
| porn.
|
| This is like horseshoe theory on steroids.
| micromacrofoot wrote:
| It's not that complicated, generally more woman-like voices
| test as more pleasant to men and women alike. This concept
| has been backed up by stereotypes for centuries.
|
| Most voice assistants have male options, and an increasing
| number (including ChatGPT) have gender neutral voices.
|
| > why isn't the voice actually the voice of the pissed off
| high school janitor telling you to man-up and stop
| hyperventilating
|
| sounds like a great way to create a product people will
| outright hate
| Melatonic wrote:
| HAL's voice acting I would say is actually superb, and super
| subtly very much not unemotional, which is part of what makes
| it so unnerving. They perfectly nailed creepy uncanny valley.
| WhitneyLand wrote:
| How much of this could be implemented using the API?
|
| There's so much helpful niche functionality that can be added
| to custom clients.
| OOPMan wrote:
| I really don't think Sam needs more encouragement, thanks.
|
| Also, if this is your definition of magic then...yeah...
| noman-land wrote:
| Magic is maybe not the best analogy to use because magic itself
| isn't magical. It is trickery.
| karmasimida wrote:
| Very convincing demo
|
| However, using ChatGPT with transcription is already offering
| me a similar experience, so what exactly is new?
| scarface_74 wrote:
| Some of the failure modes in LLMs have been fixed by augmenting
| LLMs with external services
|
| The simplest example is "list all of the presidents in reverse
| chronological order of their ages when inaugurated".
|
| Both ChatGPT 3.5 and 4 get the order wrong. The difference is
| that I can instruct ChatGPT 4 to "use Python"
|
| https://chat.openai.com/share/87e4d37c-ec5d-4cda-921c-b6a9c7...
|
| You can do similar things to have it verify information by
| using internet sources and give you citations.
|
| Just like with the Python example, at least I can look at the
| script/web citation myself
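|
| (The script it writes is essentially just a sort. A rough
| sketch of that kind of Python, using a small hypothetical
| subset of presidents and assuming the intended reading is
| "sorted by age at inauguration, oldest first":
|
|     # Illustrative subset only -- the model generates the
|     # full list of presidents itself.
|     presidents = [
|         ("Joe Biden", 78),
|         ("Donald Trump", 70),
|         ("Barack Obama", 47),
|         ("Ronald Reagan", 69),
|         ("John F. Kennedy", 43),
|     ]
|
|     # Sort by age at inauguration, oldest first.
|     ranked = sorted(presidents, key=lambda p: p[1], reverse=True)
|     for name, age in ranked:
|         print(f"{name}: {age}")
|
| The value of "use Python" is that the ordering is done by the
| interpreter rather than by next-token prediction, and the
| script itself is inspectable.)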
| wintermutestwin wrote:
| It is pretty awesome that you only have to prompt with "use
| python"
| aspenmayer wrote:
| > The simplest example is "list all of the presidents in
| reverse chronological order of their ages when inaugurated".
|
| This question is probably not the simplest form of the query
| you intend to receive an answer for.
|
| If you want a descending list of presidents based on their
| age at inauguration, I know what you want.
|
| If you want a reverse chronological list of presidents, I
| know what you want.
|
| When you combine/concatenate the two as you have above, I
| have no idea what you want, nor do I have any way of checking
| my work if I assume what you want. I know enough about word
| problems and how people ask questions to know that you
| probably have a fairly good idea what you want and likely
| don't know how ambitious this question is as asked, and I
| think you and I both are approaching the question with
| reasonably good faith, so I think you'd understand or at
| least accommodate my request for clarification and refinement
| of the question so that it's less ambiguous.
|
| Can you think of a better way to ask the question?
|
| Now that you've refined the question, do LLMs give you the
| answers you expect more frequently than before?
|
| Do you think LLMs would be able to ask you for clarification
| in these terms? That capability to ask for clarification is
| probably going to be as important as other improvements to
| the LLM, for questions like these that have many possibly
| correct answers or different interpretations.
|
| Does that make sense? What do you think?
| JustExAWS wrote:
| (I seem to have made the HN gods upset)
|
| I tried asking the question more clearly
|
| I think it "understood" the question because it "knew" how
| to write the Python code to get the right answer. It parsed
| the question as expected
|
| The previous link doesn't show the Python. This one does.
|
| https://chat.openai.com/share/a5e21a97-7206-4392-893c-55c53
| 1...
|
| LLMs are generally not good at math. But in my experience
| ChatGPT is good at creating Python code to solve math
| problems
| aspenmayer wrote:
| > I think it "understood" the question because it "knew"
| how to write the Python code to get the right answer.
|
| That's what makes me suspicious of LLMs, they might just
| be coincidentally or accidentally answering in a way that
| you agree with.
|
| Don't mean to nitpick or be pedantic. I just think the
| question was really poorly worded and might have a lot of
| room for confirmation bias in the results.
| JustExAWS wrote:
| I reworded the question with the same results in the
| second example.
|
| But here is another real world example I dug up out of my
| chat history. Each iteration of the code worked. I
| actually ran it a few days ago
|
| https://chat.openai.com/share/4d02818c-c397-417a-8151-7bf
| d7d...
| croes wrote:
| >This stuff feels magical. Magical.
|
| Sounds like the people who defend astrology because it feels
| magical how their horoscope fits their personality.
|
| "Don't bother me with facts that destroy my rose-tinted view"
|
| At the moment AI is massively hyped and shoved into everything.
| To point at the faults and weaknesses is a reasonable and
| responsible thing to do.
| hsavit1 wrote:
| yea, we don't want or need this kind of "magic" - because
| it's hardly magic to begin with, and it's more socially and
| environmentally destructive than anything else.
| lannisterstark wrote:
| Speak for yourself; my workflow and life have been
| significantly improved by these things. Having easier
| access to information that I sorta know but want to
| verify/clarify rather than going into forums/SO is
| extremely handy.
|
| Not having to write boilerplate code itself also is very
| handy.
|
| So yes, I absolutely do want this "magic." "I don't like it
| so no one should use it" is a pretty narrow POV.
| oblio wrote:
| Both your use cases don't really lead to stable long term
| valuations in the trillions for the companies building
| this stuff.
| lannisterstark wrote:
| Wonderful. I don't need them to.
|
| It works for what I need it to do.
| oblio wrote:
| You should be worried because this stuff needs to make
| sense financially. Otherwise we'll be stuck with it in an
| enshittification cycle, kind of like Reddit or image
| hosting websites.
| lannisterstark wrote:
| Problem is that by that time there would be open source
| models (the ones that already exist are getting good)
| that I can run locally. I honestly don't need _THAT_
| much.
| ewild wrote:
| People like you are the problem: the people who join a
| website, cause it to be shitty, then leave and start the
| process over at a new website. Reddit didn't become shit
| because of Reddit; it became shit because of people going
| on there commenting as if they themselves are an LLM,
| repeating "enshittification" over and over and trying to
| say the big buzzword first so they get to the top, denying
| any real conversation.
| helicalmix wrote:
| i legitimately don't understand this viewpoint.
|
| 3 years ago, if you told me you could facetime with a robot,
| and they could describe the environment and have a "normal"
| conversation with me, i would be in disbelief, and assume
| that tech was a decade or two in the future. Even the stuff
| that was happening 2 years ago felt unrealistic.
|
| astrology is giving vague predictions like "you will be happy
| today". GPT-4o is describing to you actual events in real
| time.
| demondemidi wrote:
| Maybe you just haven't been around enough to see the meta-
| analysis? I've been through four major tech hype cycles in
| 30+ years. This looks and smells like all the others.
| HelloMcFly wrote:
| I'm 40ish, I'm in the tech industry, I'm online, I'm
| often an early adopter.
|
| What hype cycle does this smell like? Because it feels
| different to me, but maybe I'm not thinking broadly
| enough. If your answer is "the blockchain" or Metaverse
| then I know we're experiencing these things quite
| differently.
| threeseed wrote:
| It feels like the cloud.
|
| Where platforms and applications are rewritten to take
| advantage of it and it improves the baseline of
| capabilities that they offer. But the end user benefits
| are far more limited than predicted.
|
| And where the power and control is concentrated in the
| hands of a few mega corporations.
| whimsicalism wrote:
| > the end user benefits are far more limited than
| predicted
|
| How have you judged the end user benefits of the cloud? I
| don't agree personally - the cloud has enabled most
| modern tech startups and all of those have been super
| beneficial to me.
| threeseed wrote:
| Direct versus indirect benefits.
|
| Cloud is hidden to end users whereas other waves like
| internet and smartphone apps were very visible.
|
| AI will soon stop being a buzzword and just be another
| foundation we build apps on.
| idopmstuff wrote:
| This is such a strange take - do you not remember 2020
| when everyone started working from home? And today, when
| huge numbers of people continue to work from home? Most
| of that would be literally impossible without the cloud -
| it has been a necessary component in reshaping work and
| all the downstream effects related to values of office
| real estate, etc.
|
| Literally a society-changing technology.
| bongodongobob wrote:
| No way. Small to medium sized businesses don't need
| physical servers anymore. Which is most businesses. It's
| been a huge boon to most people. No more running your
| exchange servers on site. Most things that used to be on-
| prem software have moved to the cloud and integrate with
| mobile devices. You don't need some nerd sitting around
| all day in case you need to fix your on-prem industry
| specific app.
|
| I have no idea how you can possibly shrug off the cloud
| as not that beneficial.
| threeseed wrote:
| > I have no idea how you can possibly shrug off the cloud
| as not that beneficial.
|
| I have no idea either. Since I never said it.
| helicalmix wrote:
| i feel like a common consumer fallacy is that, because
| you don't interact with a technology in your day-to-day
| life, it leads you to conclude that the technology is
| useless.
|
| I guarantee you that the cloud has benefitted you in some
| way, even though you aren't aware of the benefits of the
| cloud.
| TulliusCicero wrote:
| And some of those hype cycles were very impactful? The
| spread of consumer internet access, or smartphones, as
| two examples.
| whimsicalism wrote:
| And maybe you just enjoy the perspective of "I've seen it
| all" so much that you've shut off your capacity for
| critical analysis.
| samatman wrote:
| Yeah, I remember all that dot com hysteria like it was
| yesterday.
|
| Page after page of Wired breathlessly predicting the
| future. We'd shop online, date online, the world's
| information at our fingertips. It was going to change
| everything!
|
| Silly now, of course, but people truly believed it.
| homami wrote:
| I am just imagining GPT-4o saying this in her sarcastic
| voice!
| kristiandupont wrote:
| If this smells like anything to me, it's the start of the
| internet.
| helicalmix wrote:
| which hype cycles are you referring to? and, after the
| dust settled, do you conclusively believe nothing of
| value was generated from these hype cycles?
| cogman10 wrote:
| People said pretty much exactly the same thing about 3d
| printing.
|
| "Rather than ship a product, companies can ship blueprints
| and everyone can just print stuff at their own home!
| Everything will be 3d printed! It's so magical!"
|
| Just because a tech is magical today, doesn't mean that it
| will be meaningful tomorrow. Sure, 3d printing has its
| place (mostly in making plastic parts for things) but it's
| hardly the revolutionary change in consumer products that
| it was touted to be. Instead, it's just a hobbyist toy.
|
| GPT-4o being able to describe actual events in real time is
| interesting, it's yet to be seen if that's useful.
|
| That's mostly the thinking here. A lot of the "killer" AI
| tech has really boiled down to "Look, this can replace your
| customer support chat bot!". Everyone is rushing to try and
| figure out what we can use LLMs for (just like they did when
| ML was supposed to take over the world), and so far it's been
| niche applications to make shareholders happy.
| idopmstuff wrote:
| Remember when Chegg's stock price tanked? That's because
| GPT is extremely valuable as a homework helper. It can
| make mistakes, but that's very infrequent on well-
| understood topics like English, math and science through
| the high school level (and certainly if you hire a tutor,
| you'd pay a whole lot more for something that can also
| make mistakes).
|
| Is that not a very meaningful thing to be able to do?
| j2kun wrote:
| If you follow much of the education world, it's inundated
| with teachers frantically trying to deal with the volume
| and slop their students produce with AI tools. I'm sure
| it can be useful in an educational context, but
| "replacing a poor-quality cheating tool with a more
| efficient poor-quality cheating tool" isn't exactly what
| I'd call "meaningful."
|
| The most interesting uses of AI tools in a classroom I've
| seen is teachers showing students AI-generated work and
| asking students to critique it and fact check it, at
| which point the students see it for what it is.
| delusional wrote:
| > Is that not a very meaningful thing to be able to do?
|
| No? Solving homework was never meaningful. Being
| meaningful was never the point of homework. The point was
| for you to solve it yourself. To learn with your human
| brain, such that your human brain could use those
| teachings to make new meaningful knowledge.
|
| John having 5 apples after Judy stole 3 is not
| interesting.
| LordDragonfang wrote:
| The huge difference between this and your analogy is that
| 3d printing failed to take off because it never reached
| mass adoption, and stayed in the "fiddly and expensive"
| stage. GPT models have _already_ seen adoption in nearly
| every product your average consumer uses, in some cases
| heedless of whether it even makes sense in that context.
| Windows has it built in. Nearly everyone I know (under
| the age of 40) has used at least one product downstream
| of OpenAI, and more often than not a handful of them.
|
| That said, yeah it's mostly niche locations like customer
| support chatbots, because the killer app is "app-to-user
| interface that's indistinguishable from normal human
| interaction". But you're underestimating just _how much_
| of the labor force are effectively just an interface
| between a customer and some app (like a POS). "Magical"
| is exactly the requirement to replace people like that.
| j2kun wrote:
| "Adoption" of tech companies pushing it on you is very
| different from "adoption" in terms of the average person
| using it in a meaningful way and liking it.
| cogman10 wrote:
| > But you're underestimating just how much of the labor
| force are effectively just an interface between a
| customer and some app
|
| That's the sleight of hand LLM advocates are playing
| right now.
|
| "Imagine how many people are just putting data into
| computers! We could replace them all!"
|
| Yet LLMs aren't "just putting data into a computer" They
| aren't even really user/app interfaces. They are a magic
| box you can give directives to and get (generally
| correct, but not always) answers from.
|
| Go ahead, ask your LLM "Create an excel document with the
| last 30 days of the high temperatures for blank". What
| happens? Did it create that excel document? Why not?
|
| LLMs don't bridge the user/app gap. They bridge the
| user/knowledge gap, sometimes sort of.
| helicalmix wrote:
| > GPT-4o being able to describe actual events in real
| time is interesting, it's yet to be seen if that's
| useful.
|
| sure, but my experience is that if you are able to
| optimize better on some previous limitation, it
| legitimately does open up a whole different world of
| usefulness.
|
| for example, real-time processing makes me feel like
| universal translators are now all the more viable
| helicalmix wrote:
| > Sure, 3d printing has its place (mostly in making
| plastic parts for things) but it's hardly the
| revolutionary change in consumer products that it was
| touted to be. Instead, it's just a hobbiest toy.
|
| how sure are you about that?
|
| https://amfg.ai/industrial-applications-of-3d-printing-
| the-u...
|
| how positive are you that some benefits in your life are
| not attributable to 3d-printing used behind the scenes
| for industrial processes?
|
| > Just like they did when ML was supposed to take over
| the world
|
| how sure are you that ML is not used behind the scenes to
| benefit your life? do you consider features like fraud
| detection programs, protein-folding prediction programs,
| and spam filters valuable in and of themselves?
| cogman10 wrote:
| This honestly made me lol.
|
| I'm sure 10 years from now, assuming LLMs don't prove me
| wrong, I'll make a similar comment about LLMs and a new
| hype that I just made about 3d printing, and I'll get
| EXACTLY this reply. "Oh yeah, well here's a niche
| application of LLMs that you didn't account for!".
|
| > how positive are you that some benefits in your life
| are not attributable to 3d-printing used behind the
| scenes for industrial processes?
|
| See where I said "in consumer products". I'm certainly
| not claiming that 3d printing is never used and is not
| useful. However, what I am saying is that it was hyped
| WAY beyond industrial applications.
|
| In fact, here I am, 11 years ago, saying about 3d printing
| basically exactly what I'm saying now about LLMs. [1] Along
| with people basically responding to
| me the exact same way you just did.
|
| > how sure are you that ML is not used behind the scenes
| to benefit your life? do you consider features like fraud
| detection programs, protein-folding prediction programs,
| and spam filters valuable in and of themselves?
|
| Did I say it wasn't behind the scenes? ML absolutely has
| an applicable location, it's not nearly as vast as the
| hype train would say. I know, I spent a LONG time trying
| to integrate ML into our company and found it simply
| wasn't as good as hard and fast programmed rules in
| almost all situations.
|
| [1] https://www.reddit.com/r/technology/comments/15iju9/3
| d_print...
| helicalmix wrote:
| sorry, maybe i'm not completely understanding what you
| mean by "in consumer products".
|
| reading your argument on reddit, it seems to me that you
| don't consider 3d printing a success because there's not
| one in every home...which is true.
|
| but it feels uncreative? like, sure, just because it
| hasn't been mass adopted by consumers, doesn't mean there
| wasn't value generation done on an industrial level.
| you're probably using consumer products right now that
| have benefitted from 3d printing in some way.
|
| > ML absolutely has an applicable location, it's not
| nearly as vast as the hype train would say
|
| what hype train are you referring to? i know a lot of
| different predictions in machine learning, so i'm curious
| about what you mean specifically.
| rurp wrote:
| Ok, but what will the net effects be? Technology can be
| extremely impressive on a technical level, but harmful in
| practical terms.
|
| So far the biggest usecase for LLMs is mass propaganda and
| scams. The fact that we might also get AI girlfriends out
| of the tech understandably doesn't seem that appealing to a
| lot of folks.
| helicalmix wrote:
| this is a different thesis than "AI is basically bullshit
| astrology", so i'm not disagreeing with you.
|
| Understanding atomic energy gave us both emission-free
| energy and the atomic bomb, and you are correct that we
| can't necessarily know where the path of AI will take us.
| croes wrote:
| GPT-4o is also describing things that never happened.
|
| The first users of Eliza felt the same about the
| conversation with it.
|
| The important point is to know that GPTs don't know or
| understand.
|
| It may feel like a normal conversation but is a Chinese
| Room on steroids.
|
| People started to ask GPTs questions and take the answers
| as facts because they believe it's intelligent.
| holoduke wrote:
| But it may be intelligent. After all, you, with a few
| trillion synapses, are also intelligent.
| LordDragonfang wrote:
| I'm increasingly exhausted by the people who will
| immediately jump to gnostic assertions that <LLM> isn't
| <intelligent|reasoning|really thinking> because <thing
| that applies to human cognition>
|
| >GPT-4o is also describing things that never happened.
|
| https://www.cbsnews.com/news/half-of-people-remember-
| events-...
|
| >People started to ask [entity] questions and take the
| answers as facts because they believe it's intelligent.
|
| Replace that with any political influencer (Ben Shapiro,
| AOC, etc) and you will see the _exact same argument_.
|
| People remember things that didn't happen and confidently
| present things they just made up as facts on a daily
| basis. This is because they've learned that confidently
| stating incorrect information is more effective than
| staying silent when you don't know the answer. LLMs have
| just learned how to act like a human.
|
| At this point the real stochastic parrots are the people
| who bring up the Chinese room because it appears the most
| in their training data of how to respond to this
| situation.
| helicalmix wrote:
| > It may feel like a normal conversation but is a Chinese
| Room on steroids.
|
| Can you prove that humans are not Chinese rooms on
| steroids themselves?
| listenallyall wrote:
| There are 8 billion humans you could potentially facetime
| with. I agree, a large percentage are highly annoying, but
| there are still plenty of gems out there, and the quest to
| find one is likely to be among the most satisfying journeys
| of your life.
| helicalmix wrote:
| sure, but we're not discussing the outsourcing of human
| companionship in this context. we're discussing the
| capabilities of current technology.
| whimsicalism wrote:
| > Sound like the people who defend Astrology because it feels
| magical how their horoscope fits their personality.
|
| Does it really or are you just playing facile word
| association games with the word "magical"?
| arisAlexis wrote:
| What is the point of pointing out faults that will be fixed very
| soon? Just being negative or unable to see the future?
| idopmstuff wrote:
| Astrology is a thing with no substance whatsoever. It's just
| random, made-up stories. There is no possibility that it will
| ever develop into something that has substance.
|
| AI has a great deal of substance. It can draft documents. It
| can identify foods in a picture and give me a recipe that
| uses them. It can create songs, images and video.
|
| AI, of course, has a lot of flaws. It does some things poorly,
| it does other things with bias, and it's not suitable for a
| huge number of use cases. To imply that something that has a
| great deal of substance but flaws alongside is the same as
| something that has no substance whatsoever nor ever will is
| just not a reasonable thing to do.
| dogcomplex wrote:
| If you want to talk facts, then those critics are similarly
| on weak grounds and critiquing feelings more than facts.
| There has been no actual sign of scaling ceasing to work, in
| medium after medium, and most of their criticisms are issues
| with how LLM tools are embedded in architectures which are
| still incredibly early/primitive and still refining how to
| use transformers effectively. We haven't even begun using
| error correction techniques from analog engineering
| disciplines properly to boost the signal of LLMs in practical
| settings. There is so much work to do with just the existing
| tools.
|
| "AI is massive hype and shoved into everything" has more
| grounding as a negative feeling of people being overwhelmed
| with technology than any basis in fact. The faults and
| weaknesses are buoyed by people trying to acknowledge your
| feelings than any real criticism of a technology that is
| changing faster than the faults and weakness arguments can be
| made. Study machine learning and come back with an informed
| criticism.
| pmelendez wrote:
| > Ignore the critics. Watch the demos. Play with it
|
| With so many smoke and mirrors demos out there, I am not super
| excited at those videos. I would play with it, but it seems
| like it is not available in a free tier (I stopped paying
| OpenAI a while ago after realizing that open models are more
| than enough for me)
| m463 wrote:
| > HAL's unemotional monotone
|
| on a tangent...
|
| I find the psychology behind this interesting. If the voice
| in 2001 had proper inflection, it wouldn't have been perceived
| as a computer.
|
| (also, I remember when voice synthesizers got more
| sophisticated and Stephen Hawking decided to keep his original
| first-gen voice because he identified more with it)
|
| I think we'll be going the other way soon. Perfect voices, with
| the perfect emotional inflection will be perceived as
| computers.
|
| However I think at some point they may be anthropomorphized and
| given more credit than they deserve. This will probably be
| cleverly planned and a/b tested. And then that perfect voice,
| for you, will get you to give in.
| 0xdeadbeefbabe wrote:
| > HAL's unemotional monotone in Kubrick's movie, "Space
| Odyssey," feels... oddly primitive by comparison
|
| In comparison to the gas pump which says "Thank You!"
| nojvek wrote:
| > Play with it!
|
| It's not accessible to everyone yet.
|
| Even via the API, I can't send it a voice stream yet.
|
| The API refuses to generate images.
|
| Next few weeks will tell as more people play with it.
| fhub wrote:
| I prompted it with "Take this SSML script and give me a woman's
| voice reading it as WAV or MP3 [Pasted script]" and it pretty
| much sounds like HAL.
| speedgoose wrote:
| Did they release the new voices yet?
| password54321 wrote:
| Comments have become insufferable. They're either positive
| to the point of bordering on cringeworthiness (your comment)
| or just negative. Nuanced discussion is dead.
| smugglerFlynn wrote:
| Watching HAL happening in real life comes across as creepy, not
| magical. Double creepy with all the people praising this
| 'magicality'.
|
| I'm not a sceptic and apply AI on a daily basis, but the whole
| "we can finally replace people" vibe is extremely off-putting.
| I had very similar feelings during the pandemic, when the
| majority of people seemed so happy to drop any real human
| interaction in favor of remote comms via chats/audio calls. It
| still creeps me out how ready we are as a society to drop
| anything remotely human in favor of technocratic advancement
| and "productivity".
| aiauthoritydev wrote:
| 1. Demos are meant to feel magical, and except in Apple's case
| they are often exaggerated versions of the real product.
|
| 2. Even then this is a wonderful step for tech in general and
| not just OpenAI. Makes me very excited.
|
| 3. Most economic value and growth driven by AI will not come
| from consumer apps but rather the enterprise use. I am
| interested in seeing how AI can automatically buy stuff for me,
| automate my home, reduce my energy used, automatically apply
| and get credit cards based on my purchases, find new jobs for
| me, negotiate with a car dealer on my behalf, detect when I am
| going to fall sick, better diabetes care and eventual cure, etc.
| etc.
| lm28469 wrote:
| > It makes the movie "Her" look like it's no longer in the
| realm of science fiction but in the realm of incremental
| product development
|
| Are we supposed to cheer to that?
|
| We're already midway to the full implementation of 1984; do we
| need Her before we get to The Matrix?
| throwthrowuknow wrote:
| Her wasn't a dystopia as far as I could tell. Not even a
| cautionary tale. The scifi ending seems unlikely but
| everything else is remarkably prescient. I think the picnic
| scene is very likely to come true in the near future. Things
| might even improve substantially if we all interact with
| personalities that are consistently positive and biased
| towards conflict resolution and non judgemental interactions.
| lm28469 wrote:
| > Her wasn't a dystopia as far as I could tell.
|
| Well that's exactly why I'm not looking forward to whatever
| is coming. The average Joe thinking dating a server is not
| a dystopia frightens me much more than the delusional tech
| CEO who thinks his AI will revolutionise the world.
|
| > Things might even improve substantially if we all
| interact with personalities that are consistently positive
| and biased towards conflict resolution and non judgemental
| interactions.
|
| Some kind of turbo bubble in which you don't even have to
| actually interact with anyone or anything? Every
| "personality" will be nice to you as long as you send
| $200 to OpenAI every week. Yep, that's absolutely a dystopia
| for me.
|
| It really feels like the end goal is living in a pod and
| being uploaded into an alternative reality; everything we
| build to "enhance" our lives takes us further from the basic
| building blocks that make life "life".
| goatlover wrote:
| Seemed like a cautionary tale to me where the humans fall
| in love with disembodied AIs instead of seeking out human
| interaction. I think the end of the movie drove that home
| pretty clearly.
| bowsamic wrote:
| The demos seem quite boring to me
| suarezluis wrote:
| This is such a hot take, it should go in hot-takes.io LOL
| goatlover wrote:
| > It makes the movie "Her" look like it's no longer in the
| realm of science fiction but in the realm of incremental
| product development.
|
| The last part of the movie "Her" is still in the realm of
| science fiction, if not outright fantasy. Reminds me of the
| later seasons of SG1 with all the talk of ascension and
| Ancients. Or Clarke's 3001 book intro, where the monolith
| creators figured out how to encode themselves into spacetime.
| There's nothing incremental about that.
| badgersnake wrote:
| Blah blah blah indeed, the hype train continues unabated. The
| problem is, those are all perfectly valid criticisms and LLMs
| can never live up to the ridiculous levels of hype.
| peterisza wrote:
| Can anybody help me try the direct voice feature? I can't find
| the button for it. Maybe it's not available in Europe yet, I
| don't know.
| cess11 wrote:
| You'll have a great time once you discover literature.
| Especially early modern novels, texts the authors sometimes
| spent decades refining, under the combined influences of
| classical arts and thinking, Enlightenment philosophy and
| science.
|
| If chatbots feel magical, what those people did will feel
| divinely inspired.
| vwkd wrote:
| Funnily, I'd prefer HAL's unemotional monotone over GPT's woke
| hyperbole any second.
| byw wrote:
| I mean, humans also have tons of failure modes, but we've
| learned to live with them over time.
|
| The average human has tons of quirks, talks over other people
| all the time, generally can't solve complex problems in a
| casual conversation setting, and isn't always cheery and ready
| to please like Scarlett's character in Her.
|
| I think our expectations of AI are way too high from our
| exposure to science fiction.
| bearjaws wrote:
| OAI just made an embarrassment of Google's fake demo earlier this
| year. Given how this was recorded, I am pretty certain it's
| authentic.
| CivBase wrote:
| I don't doubt this is authentic, but if they really wanted to
| fake those demos, it would be pretty easy to do using pre-
| recorded lines and staged interactions.
| mike00632 wrote:
| For what it's worth, OpenAI also shared videos of failed
| demos:
|
| https://vimeo.com/945591584
|
| I really value how open they are being about its limitations.
| hehdhdjehehegwv wrote:
| This feature has been in iOS for a while now, just really slow
| and without some of the new vision aspects. This seems like a
| version 2 for me.
| bigyikes wrote:
| That old feature uses Whisper to transcribe your voice to
| text, and then feeds the text into the GPT which generates a
| text response, and then some other model synthesizes audio
| from that text.
|
| This new feature feeds your voice directly into the GPT and
| audio out of it. It's amazing because now ChatGPT can truly
| communicate with you via audio instead of talking through
| transcripts.
|
| New models should be able to understand and use tone, volume,
| and subtle cues when communicating.
|
| I suppose to an end user it is just "version 2" but progress
| will become more apparent as the natural conversation
| abilities evolve.
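|
| Roughly, the old cascade looked like this (a sketch with the
| openai Python SDK; the exact model names don't matter, the
| point is the three separate hops and the information lost
| between them):
|
|     from openai import OpenAI
|
|     client = OpenAI()
|
|     # 1. Speech -> text (Whisper transcription). Tone, pace
|     #    and emphasis are discarded here.
|     transcript = client.audio.transcriptions.create(
|         model="whisper-1",
|         file=open("user_audio.wav", "rb"),
|     )
|
|     # 2. Text -> text (the actual LLM call).
|     reply = client.chat.completions.create(
|         model="gpt-4",
|         messages=[{"role": "user", "content": transcript.text}],
|     )
|
|     # 3. Text -> speech (a separate TTS model).
|     speech = client.audio.speech.create(
|         model="tts-1",
|         voice="alloy",
|         input=reply.choices[0].message.content,
|     )
|     speech.stream_to_file("assistant_reply.mp3")
|
| With GPT-4o the middle model is supposed to consume and emit
| audio tokens directly, so steps 1 and 3 (and everything they
| throw away) go away.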
| hehdhdjehehegwv wrote:
| Yes, per my other comment this is an improvement on what
| their app already does. The magnitude of that improvement
| remains to be seen, but it isn't a "new" product launch
| like a search engine would be.
| abhpro wrote:
| No it's not the same thing, the link for this submission even
| explains that. Anyone who comments should at least give the
| submission a cursory read.
| hehdhdjehehegwv wrote:
| I did and regardless of the underlying technology it is, in
| fact, an improvement to an existing product - not something
| new from whole cloth.
|
| If they had released a search engine, which had been
| suggested, that would be a new product.
| readams wrote:
| https://twitter.com/Google/status/1790055114272612771
| nojvek wrote:
| Let OAI actually release this to the masses. Then we can
| compare.
|
| I'm not a big fan of announcing something but it not being
| released.
|
| They say it's available via the API, but it's text only. You
| can't send an audio stream and get an audio stream back.
|
| Time will tell. I'm holding my emotions until I get my hands on
| it.
| levocardia wrote:
| As a paid user this felt like a huge letdown. GPT-4o is available
| to everyone so I'm paying $20/mo for...what, exactly? Higher
| message limits? I have no idea if I'm close to the message limits
| currently (nor do I even know what they are). So I guess I'll
| cancel, then see if I hit the limits?
|
| I'm also extremely worried that this is a harbinger of the
| enshittification of ChatGPT. Processing video and audio for all
| ~200 million users is going to be extravagantly expensive, so my
| only conclusion is that OpenAI is funding this by doubling down
| on payola-style corporate partnerships that will result in
| ChatGPT slyly trying to mention certain brands or products in our
| conversations [1].
|
| I use ChatGPT every day. I love it. But after watching the video
| I can't help but think "why should I keep paying money for this?"
|
| [1] https://www.adweek.com/media/openai-preferred-publisher-
| prog...
| muttantt wrote:
| So... cancel the subscription?
| CodeCrusader wrote:
| Completely agree, none of the updates will apply to any of my
| use cases, disappointment.
| noncoml wrote:
| They really need to tone down the talking garniture. It needs to
| put on its running shoes and get to the point on every reply.
| Ain't nobody got time to keep listening to AI blathering along at
| every prompt.
| dbcooper wrote:
| question for you guys - is there a model that can take figures
| (graphs), from scientific publications, and combine image
| analysis with picking up the data point symbol descriptions and
| analyse the trends?
| krunck wrote:
| So GPT-4o can do voice intonation? Great. Nice work.
|
| Still, it sounds like some PR drone selling a product. Oh
| wait....
| CivBase wrote:
| Those voice demos are cool but having to listen to it speak makes
| me even more frustrated with how these LLMs will drone on and on
| without having much to say.
|
| For example, in the second video the guy explains how he will
| have it talk to another "AI" to get information. Instead of just
| responding with "Okay, I understand" it started talking about how
| interesting the idea sounded. And as the demo went on, both "AIs"
| kept adding unnecessary commentary about the scenes.
|
| I would hate having to talk with these things on a regular basis.
| golol wrote:
| Yeah, at some point the style and tone of these assistants
| need to be seriously changed. I can imagine a lot of their RLHF
| and instruct processes emphasize sounding good vs. being good
| too much.
| DataDaemon wrote:
| Now, say goodbye to call centers.
| willsmith72 wrote:
| and say hello to your grandma getting scammed
| joshstrange wrote:
| What do they mean by "desktop version"? I assume that doesn't
| mean a "native" (electron) app?
| simonw wrote:
| I'm seeing gpt-4o in the OpenAI Playground interface already:
| https://platform.openai.com/playground/chat?mode=chat&model=...
|
| First impressions are that it feels very fast.
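|
| It's also selectable as a model in the API already. A minimal
| check, assuming an OPENAI_API_KEY is set in the environment:
|
|     from openai import OpenAI
|
|     client = OpenAI()  # picks up OPENAI_API_KEY
|     resp = client.chat.completions.create(
|         model="gpt-4o",
|         messages=[{"role": "user", "content": "Say hi"}],
|     )
|     print(resp.choices[0].message.content)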
| tailspin2019 wrote:
| Does anyone with a paid plan see anything different in the
| ChatGPT iOS app yet?
|
| Mine just continues to show "GPT 4" as the model - it's not clear
| if that's now 4o or there is an app update coming...
| ilaksh wrote:
| Are there any remotely comparable open source models? Fully
| multimodal, audio-to-audio?
| MBCook wrote:
| Too bad they consume 25x the electricity Google does.
|
| https://www.brusselstimes.com/world-all-news/1042696/chatgpt...
| simonw wrote:
| That's not a well sourced story: it doesn't say where the
| numbers come from. Also:
|
| "However, ChatGPT consumes a lot of energy in the process, up
| to 25 times more than a Google search."
|
| That's comparing a Large Language Model prompt to a search
| query.
| joshstrange wrote:
| > Too bad they consume 25x the electricity Google does.
|
| From the article:
|
| "However, ChatGPT consumes a lot of energy in the process, up
| to 25 times more than a Google search."
|
| And the article doesn't back that claim up nor do they break
| out how much energy ChatGPT (A Message? Whole conversation?
| What?) or a Google search uses. Honestly the whole article
| seems very alarmist while being light on details and making
| sweeping generalizations.
| rvnx wrote:
| And in this 25x you get your answer.
|
| What if we actually counted the electricity that the websites
| use instead of just the search engine page ?
| delichon wrote:
| Won't this make pretty much all of the work to make a website
| accessible go away, as it becomes cheap enough? Why struggle to
| build parallel content for the impaired when it can be generated
| just in time as needed?
| Negitivefrags wrote:
| I found these videos quite hard to watch. There is a level of
| cringe that I found a bit unpleasant.
|
| It's like some kind of uncanny valley of human interaction that I
| don't get on nearly the same level with the text version.
| jameshart wrote:
| While it is probably pretty normal for California, the
| insincere flattery and patronizing eagerness are definitely
| grating. But then you have to stack that up against the fact
| that we are examining a technology and nitpicking over its
| _tone of voice_.
| MattPalmer1086 wrote:
| I found it disturbing that it had any kind of personality. I
| don't want a machine to pretend to be a person. I guess it
| makes it more evident with a voice than text.
|
| But yeah, I'm sure all those things would be tunable, and
| everyone could pick their own style.
| jimkleiber wrote:
| For me, you nailed it. Maybe how I feel on this will change
| over time, yet at the moment (and since the movie Her), I
| feel a deep unsettling, creeped out, disgusted feeling at
| hearing a computer pretend to be a human. I also have never
| used Siri or Alexa. At least with those, they sound robotic
| and not like a human. I watched a video of an interview
| with an AI Reed Hastings and had a similar creeped out
| feeling. It's almost as if I want a human to be a human and
| a computer to be a computer. I wonder if I would feel the
| same way if a dog started speaking to me in English and
| sounded like my deceased grandmother or a woman who I found
| very attractive. Or how I'd feel if this tech was used in
| videogames or something where I don't think it's real life.
| I don't really know how to put it into words, maybe just
| uncanny valley.
| Intralexical wrote:
| It's dishonest to the core. "Emotions" which it doesn't
| actually feel are just a way to manipulate you.
| jimkleiber wrote:
| Yea, gives that con artist vibe. "I'm sorry, I can't help
| you with that." But you're not sorry, you don't feel
| guilt. I think in the video it even asked "how are you
| feeling" and it replied, which creeped me out. The
| computer is not feeling. Maybe if it said, "my battery is
| a bit warm right now I should turn on my fan" or "I worry
| that my battery will die" then I'd trust it more. Give me
| computer emotions, not human emotions.
| zamadatix wrote:
| I feel like it's largely an effect of tuning it to default as
| "a ultra helpful assistant which is happy to help with any
| request via detailed responses in candid and polite
| manner..." kind of thing as you basically lose free points
| any time it doesn't jump on helping with something, tries to
| use short output and generates a more incorrect answer as a
| result, or just plain has to be initialized with any of this
| info.
|
| It seems like both the voice and responses can be tuned
| pretty easily though so hopefully that kind of thing can just
| be loaded in your custom instructions.
| TaylorAlexander wrote:
| I'm born and raised in California and I think I'm a pretty
| "California" person (for better and worse).
|
| It feels exhausting watching these demos and I'm not excited
| at all to try it. I really don't feel the need for an AI
| assistant or chatbot to pretend to be human like this. It
| just feels like it's taking longer to get the information I
| want.
|
| You know in the TV series "Westworld" they have this mode,
| called "analysis", where they can tell the robots to "turn
| off your emotional affect".
|
| I'd _really_ like to see this one have that option. Hopefully
| it will comply if you tell it, but considering how strong
| some of the RLHF has been in the past I'm not confident in
| that.
| jameshart wrote:
| I found it jarring that the presenters keep beginning
| dialogs by asking the chatbot how it is. It's stateless.
| There is no 'how' for it to be. Why are you making it
| roleplay as a human being forced to make small talk?
| MattPalmer1086 wrote:
| I had the same reaction. While incredibly impressive, it wasn't
| something I would want to interact with.
| j-krieger wrote:
| Yes. This model - and past models to an extent - has a very
| distinct American and Californian feel to its responses. I am
| German, for example, and day-to-day conversations lack any
| superficial flattery, so much so that the demo feels extreme
| to me.
| brainer wrote:
| OpenAI's Mission and the New Voice Mode of GPT-4
|
| * Sam Altman, the CEO of OpenAI, emphasizes two key points from
| their recent announcement. Firstly, he highlights their
| commitment to providing free access to powerful AI tools, such as
| ChatGPT, without advertisements or restrictions. This aligns with
| their initial vision of creating AI for the benefit of the world,
| allowing others to build amazing things using their technology.
| While OpenAI plans to explore commercial opportunities, they aim
| to continue offering outstanding AI services to billions of
| people at no cost.
|
| * Secondly, Altman introduces the new voice and video mode of
| GPT-4, describing it as the best compute interface he has ever
| experienced. He expresses surprise at the reality of this
| technology, which provides human-level response times and
| expressiveness. This advancement marks a significant change from
| the original ChatGPT and feels fast, smart, fun, natural, and
| helpful. Altman envisions a future where computers can do much
| more than before, with the integration of personalization, access
| to user information, and the ability to take actions on behalf of
| users.
|
| https://blog.samaltman.com/gpt-4o
| simonw wrote:
| Please don't post AI-generated summaries here.
| reisse wrote:
| The fact that AI-generated summaries are still detected
| instantly, and are bad enough that people explicitly ask
| _not_ to have them posted, says something about the current
| state of LLMs.
| simonw wrote:
| Honestly the clue here wasn't so much the quality as the
| fact that it was posted at all.
|
| No human would ever bother posting a ~180 word summary of a
| ~250 word blog post like that.
| bossyTeacher wrote:
| You must be really confident to make a statement about 4
| billion people, 99% of whom you have never interacted
| with. Your hyper-microscopic sample is not even randomly
| distributed.
|
| This reminds me of those psychology studies in the 70s
| and 80s where the subjects were all middle-class European-
| Americans, and yet the researchers felt confident enough to
| generalise the results to all humans.
| bamboozled wrote:
| _access to user information,_
|
| Sam, please stop ok, those things you saw on tv when you were a
| kid? They were dystopian movies, we don't want that for real,
| ok?
| deegles wrote:
| what's the path from LLMs to "true" general AI? is it "only" more
| training power/data or will they need a fundamental shift in
| architecture?
| banjoe wrote:
| I still need to talk very fast to actually chat with ChatGPT
| which is annoying. You can tell they didn't fix this based on how
| fast they are talking in the demo.
| gallerdude wrote:
| Interesting that they didn't mention a bump in capabilities - I
| wrote an LLM benchmark a few weeks ago, and before this, GPT-4
| could solve Wordle about ~48% of the time.
|
| Currently with GPT-4o, it's easily clearing 60% - while blazing
| fast, and half the cost. Amazing.
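|
| (The harness is nothing exotic; roughly the shape of the loop,
| with the word list, prompt wording and scoring here being
| placeholders rather than the real thing:
|
|     import random
|     from openai import OpenAI
|
|     client = OpenAI()
|     WORDS = ["crane", "slate", "pixel", "mound", "query"]
|
|     def feedback(guess, answer):
|         # Wordle-style marks: G = right spot, Y = in the word
|         # but wrong spot, . = absent. (Simplified: ignores
|         # duplicate-letter edge cases.)
|         marks = []
|         for i, ch in enumerate(guess):
|             if answer[i] == ch:
|                 marks.append("G")
|             elif ch in answer:
|                 marks.append("Y")
|             else:
|                 marks.append(".")
|         return "".join(marks)
|
|     def play(model, answer, turns=6):
|         history = []
|         for _ in range(turns):
|             prompt = (
|                 "We are playing Wordle. Reply with a single "
|                 "5-letter guess and nothing else.\nHistory:\n"
|                 + "\n".join(f"{g} -> {f}" for g, f in history)
|             )
|             resp = client.chat.completions.create(
|                 model=model,
|                 messages=[{"role": "user", "content": prompt}],
|             )
|             content = resp.choices[0].message.content
|             guess = content.strip().lower()[:5]
|             if guess == answer:
|                 return True
|             history.append((guess, feedback(guess, answer)))
|         return False
|
|     games = 20
|     wins = sum(play("gpt-4o", random.choice(WORDS))
|                for _ in range(games))
|     print(f"solve rate: {wins / games:.0%}")
|
| A real run would want the full answer list and proper
| duplicate-letter scoring.)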
| dom96 wrote:
| I can't help but feel a bit let down. The demos felt pretty
| cherry picked and still had issues with the voice getting cut off
| frequently (especially in the first demo).
|
| I've already played with the vision API, so that doesn't seem all
| that new. But I agree it is impressive.
|
| That said, watching back a Windows Vista speech recognition
| demo[1] I'm starting to wonder if this stuff won't have the same
| fate in a few years.
|
| 1 - https://www.youtube.com/watch?v=VMk8J8DElvA
| quenix wrote:
| I think the voice was getting cut off because it heard the
| crowd reaction and paused (basically it's a feature, not a
| bug).
| jrflowers wrote:
| I like the robot typing at the keyboard that has B as half of the
| keys and my favorite part is when it tears up the paper and
| behind it is another copy of that same paper
| hu3 wrote:
| That they are offering more features for free concurs with my
| theory that, just like search, state of the art AI will soon be
| "free", in exchange for personal information/ads.
| martingalex2 wrote:
| Need more data.
| CosmicShadow wrote:
| In the video where the 2 AIs sing together, it starts to get
| really cringey and weird, to the point where it literally sounds
| like it's being faked by 2 voice actors off-screen with literal
| guns to their heads trying not to cry. Did anyone else get that
| impression?
|
| The tonal talking was impressive, but man that part was like, is
| someone being tortured or forced against their will?
| flakiness wrote:
| Here is the link: https://www.youtube.com/watch?v=Bb4HkLzatb4
|
| I think this demo is more about showing the limits, like "it
| can sing, isn't that amazing?", than about being practical,
| and I think it served that purpose perfectly.
|
| I agree about the tortured impression. It partly comes from the
| facial expression of the presenter. She's clearly enjoying
| pushing it to the edge.
| bigyikes wrote:
| It didn't just demonstrate the ability to sing, but also the
| ability for two AIs to cooperate! I'm not sure which was more
| impressive
| mickg10 wrote:
| So, babelfish soon?
| taytus wrote:
| the OpenAI live stream was quite underwhelming...
| mickg10 wrote:
| So, babelfish incoming?
| alvaroir wrote:
| I'm really impressed by this demo! Apart from the usual quality
| benchmarks, I'm really impressed by the latency for audio/video:
| "It can respond to audio inputs in as little as 232
| milliseconds, with an average of 320 milliseconds, which is
| similar to human response"... If true at scale, what could be
| the "tricks" they're using to achieve that?!
| Thaxll wrote:
| It's pretty impressive, although I don't like the voice / tone, I
| prefer something more neutral.
| blixt wrote:
| GPT-4o being a truly multimodal model is exciting, does open the
| door to more interesting products. I was curious about the new
| tokenizer which uses much fewer tokens for non-English, but also
| 1.1x fewer tokens for English, so I'm wondering if this means
| each token now can be more possible values than before? Might
| make sense provided that they now also have audio and image
| output tokens? https://openai.com/index/hello-gpt-4o/
|
| I wonder what "fewer tokens" really means then, without context
| on raising the size of each token? It's a bit like saying my JPEG
| image is now using 2x fewer words after I switched from a 32-bit
| to a 64-bit architecture no?
| zackangelo wrote:
| New tokenizer has a much larger vocabulary (200k)[0].
|
| [0]
| https://github.com/openai/tiktoken/commit/9d01e5670ff50eb74c...
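|
| As a minimal sketch, you can compare token counts yourself with
| the tiktoken library (assuming the new encoding is exposed as
| "o200k_base", per the linked commit, and the older GPT-4
| encoding as "cl100k_base"):
|
|     import tiktoken
|
|     # older GPT-4/GPT-4-Turbo encoding vs. the new GPT-4o one
|     old = tiktoken.get_encoding("cl100k_base")
|     new = tiktoken.get_encoding("o200k_base")
|
|     text = "Hello, my name is GPT-4o."
|     print(len(old.encode(text)), len(new.encode(text)))
|     # a larger vocabulary generally yields fewer tokens for the
|     # same text, with the biggest savings on non-English scripts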
| bigyikes wrote:
| Besides increasing the vocabulary size, one way to use "fewer
| tokens" is to adjust how the tokenizer is trained.
|
| If you increase the amount of non-English language
| representation in your data set, there will be more tokens
| which cover non-English concepts.
| kolinko wrote:
| The size can stay the same. Tokens get converted into state
| which is a vector of 4000+ dimensions. So you could have
| millions of tokens even and still encode them into the same
| state size.
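|
| To make that concrete, here is a minimal sketch (sizes are
| illustrative, not OpenAI's actual ones): doubling the
| vocabulary only grows the embedding lookup table, not the
| per-token state the transformer operates on.
|
|     import torch
|
|     d_model = 4096  # hidden state size per token (illustrative)
|
|     emb_100k = torch.nn.Embedding(100_000, d_model)
|     emb_200k = torch.nn.Embedding(200_000, d_model)
|
|     tok = torch.tensor([42])
|     print(emb_100k(tok).shape)  # torch.Size([1, 4096])
|     print(emb_200k(tok).shape)  # torch.Size([1, 4096])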
| catchnear4321 wrote:
| window dressing
|
| his love for yud is showing.
| frabcus wrote:
| I can't see any calculator for the audio pricing
| (https://openai.com/api/pricing/) or document type field in the
| Chat Completions API (https://platform.openai.com/docs/api-
| reference/chat/create) for this new model.
|
| Is the audio in API not available yet?
| willsmith72 wrote:
| > We plan to launch support for GPT-4o's new audio and video
| capabilities to a small group of trusted partners in the API in
| the coming weeks.
|
| So no word on an audio api for regular joes? that's the number 1
| thing i'm looking for
| UncleOxidant wrote:
| Looking at the demo video, the AIs are a bit too chatty. The
| human has to often interrupt them.
|
| A nice feature would be to be able to select a Myers-Briggs
| personality type for your AI chatbot.
| michalf6 wrote:
| I cannot find the mac app anywhere. Is there a link?
| Painsawman123 wrote:
| My main takeaway is that generative AI has hit a wall... New
| paradigms, architectures, and breakthroughs are necessary for
| the field to progress, but this raises the question: if everyone
| knows the current paradigms have hit a wall, why is so much
| money being spent on LLMs, diffusion models, etc., which are
| bound to become obsolete within a few(?) years?
| I_am_tiberius wrote:
| Interested in how many LLM startups there are that are going out
| of business due to this voice assistant.
| windowshopping wrote:
| There's a button on this page that says "try on ChatGPT ->",
| but that's still version 3.5, and if I upgraded it seems it
| would be version 4.
|
| Is this new version not available to users yet?
| xyst wrote:
| The naming of these systems has me dead
| nikolay wrote:
| I am a paid customer, yet I don't see anything new. I'm tired of
| these fake announcements of "released" features.
| Satam wrote:
| So far OpenAI's template is: amazing demos create hype -> reality
| turns out to be underwhelming.
|
| Sora is not yet released, and it's not clear when it will be.
| DALL-E is worse than Midjourney in most cases. GPT-4 has either
| gotten worse or stayed the same. Vision is not really usable for
| anything practical. Voice is cool but not that useful, especially
| with the lack of strong reasoning from the base model.
|
| Is this sandbagging or is the progress slower than what they're
| broadcasting?
| zone411 wrote:
| It doesn't improve on NYT Connections leaderboard:
|
| GPT-4 turbo (gpt-4-0125-preview) 31.0
|
| GPT-4o 30.7
|
| GPT-4 turbo (gpt-4-turbo-2024-04-09) 29.7
|
| GPT-4 turbo (gpt-4-1106-preview) 28.8
|
| Claude 3 Opus 27.3
|
| GPT-4 (0613) 26.1
|
| Llama 3 Instruct 70B 24.0
|
| Gemini Pro 1.5 19.9
|
| Mistral Large 17.7
| gentile wrote:
| There is a spelling mistake in the Japanese translation under
| language tokenization. In konnichiwa, the final wa should be
| written ha.
| stilwelldotdev wrote:
| I love that there is a real competition happening. We're going to
| see some insane innovations.
| ravroid wrote:
| In my experience so far, GPT-4o seems to sit somewhere between
| the capability of GPT-3.5 and GPT-4.
|
| I'm working on an app that relies more on GPT-4's reasoning
| abilities than inference speed. For my use case, GPT-4o seems to
| do worse than GPT-4 Turbo on reasoning tasks. For me this seems
| like a step-up from GPT-3.5 but not from GPT-4 Turbo.
|
| At half the cost and significantly faster inference speed, I'm
| sure this is a good tradeoff for other use cases though.
| mike00632 wrote:
| I have never tried GPT-4 because I don't pay for it. I'm really
| looking forward to GPT-4o being released to free tier users.
| lwansbrough wrote:
| Very impressive. Please provide a voice that doesn't use radio
| jingle intonation, it is really obnoxious.
|
| I'm only half joking when I say I want to hear a midwestern blue
| collar voice with zero tact.
| ajdoingnothing wrote:
| If there was any glimmer of hope for the "Rabbit R1" or "Humane
| AI Pin", it can now be buried.
| unglaublich wrote:
| I hope we can disable the cringe American hyperemotions.
| stavros wrote:
| I made a website with book summaries
| (https://www.thesummarist.net/) and I tested GPT-4o in generating
| one, and it was bad. It reminded me of GPT-3.5. I didn't test too
| much, but preliminary results don't look good.
| glenstein wrote:
| Text access rolling out today, apparently:
|
| >GPT-4o's text and image capabilities are starting to roll out
| today in ChatGPT. We are making GPT-4o available in the free
| tier, and to Plus users with up to 5x higher message limits.
|
| Anyone have access yet? Not there for me so far.
| toxic72 wrote:
| It shows available for me in the OpenAI playground currently.
| m3kw9 wrote:
| The big news is that this is gonna be free
| wesleyyue wrote:
| If anyone wants to try it for coding, I just added support for
| GPT4o in Double (https://double.bot)
|
| In my tests:
|
| * I have a private set of coding/reasoning tests and it's been
| able to ace all of them so far, beating Opus, GPT4-Turbo, and
| Llama 3 70b. I'll need to find even more challenging tests now...
|
| * It's definitely significantly faster, but we'll see how much of
| this is due to model improvements vs over provisioned capacity.
| GPT4-Turbo was also significantly faster at launch.
| loveiswork wrote:
| While I do feel a bit of "what is the point of my premium sub",
| I'm really excited for these changes.
|
| Considering our brain is a "multi-modal self-reinforcing
| omnimodel", I think it makes sense for the OpenAI team to work on
| making more "senses" native to the model. Doing so early will set
| them up for success when future breakthroughs are made in greater
| intelligence, self-learning, etc.
| 65 wrote:
| Time to bring back Luddism.
| OutOfHere wrote:
| I am observing an extremely high rate of text hallucinations with
| gpt-4o (gpt-4o-2024-05-13) as tested via the API. I advise
| extreme caution with it. In contrast, I see no such concern with
| gpt-4-turbo-preview (gpt-4-0125-preview).
| fdb wrote:
| Same here. I observed it making up functions in d3
| (`d3.geoProjectionRaw` and `d3.geoVisible`), in addition to
| ignoring functions it _could_ have used.
| bigyikes wrote:
| If true, makes me wonder what kind of regression testing OpenAI
| does for these models. It can't be easy to write a unit test
| for hallucinations.
| OutOfHere wrote:
| At a high level, ask it to produce a ToC of information about
| something that you know will exist in the future but does not
| yet exist, and also tell it to decline the request if it
| doesn't verifiably know the answer.
| bigyikes wrote:
| How do you generalize that for all inputs though?
| OutOfHere wrote:
| I am not sure I understand the question. I sampled
| various topics. I used this prompt: https://raw.githubuse
| rcontent.com/impredicative/podgenai/mas...
|
| In the prompt, substitute {topic} with something from the
| near future. As I noted, it behaves correctly for turbo
| (rejecting the request), and very badly for o
| (hallucinating nonsense).
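|
| For illustration only, here is a rough sketch of that kind of
| probe using the OpenAI Python client (the topic and wording
| below are made up and are not the linked prompt):
|
|     from openai import OpenAI
|
|     client = OpenAI()
|
|     def probe(model: str, topic: str) -> str:
|         # topic should be something that does not exist yet
|         prompt = (
|             f"Produce a table of contents about {topic}. "
|             "If you do not verifiably know the answer, decline."
|         )
|         resp = client.chat.completions.create(
|             model=model,
|             messages=[{"role": "user", "content": prompt}],
|         )
|         return resp.choices[0].message.content
|
|     # compare how the two models handle a not-yet-existing topic
|     print(probe("gpt-4o", "the 2030 Winter Olympics results"))
|     print(probe("gpt-4-turbo", "the 2030 Winter Olympics results"))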
| mtam wrote:
| GPT-4o is very fast but seems to generate some very random ASCII
| Art compared to GPT-4 when text in the art is involved.
| ta-run wrote:
| This looks too good to be true? What's the catch?
|
| Also, wasn't expecting the perf to improve by 2x
| 0xbadc0de5 wrote:
| As a paid user, it would have been nice to see something that
| differentiates that investment from the free tier.
|
| The tech demos are cool and all - but I'm primarily interested in
| the correctness and speed of ChatGPT and how well it aligns with
| _my_ intentions.
| roschdal wrote:
| Chat GPT-4o (OOOO!) - the largest electricity bill in the world.
| unouplonk wrote:
| The end-to-end audio situation is especially interesting as the
| concept has been around for a while but there weren't any
| successful implementations of it up to this point that I'm aware
| of.
|
| See this post from November:
| https://news.ycombinator.com/item?id=38339222
| razodactyl wrote:
| I think this is a great example of the bootstrapping that was
| enabled when they pipelined the previous models together.
|
| We do this all the time in ML. You can generate a very powerful
| dataset using these means and further iterate with the end model.
|
| What this tells me now is that the runway to GPT5 will be laid
| out with this new architecture.
|
| It was a bit cold in Australia today. Did you Americans stop
| pumping out GPU heat temporarily with the new model release? Heh
| therealmarv wrote:
| after watching the OpenAI videos I'm looking at my sad Google
| Assistant speaker in the corner.
|
| Come on Google... you can update it.
| bogwog wrote:
| I was about to say how this thing is lame because it sounds so
| forced and robotic and fake, and even though the intonations do
| make it sound more human-like, it's very clear that they made a
| big effort to make it sound like natural speech, but failed.
|
| ...but then I realized that's basically the kind of thing Data
| from Star Trek struggles with as part of his character. We're
| almost in that future, and I'm already falling into the role of
| the ignorant human that doesn't respect androids.
| dev1ycan wrote:
| I think excited people should look at the empty half of the
| glass here: this is pretty much an admission that they are
| struggling to go past GPT-4 at a significant scale.
|
| Not like they have to be scared yet, I mean Google has yet to
| release their vaporware Ultra model that is supposedly like 1%
| better than GPT 4 in some metrics...
|
| I smell an AI crash coming in a few years if they can't actually
| get this stuff usable for day to day life.
| garyrob wrote:
| So far, I'm impressed. It seems to be significantly better than
| GPT-4 at accessing current online documentation and forming
| answers that use it effectively. I've been asking it to do so,
| and it has.
| Hugsun wrote:
| Very interesting and extremely impressive!
|
| I tried using the voice chat in their app previously and was
| disappointed. The big UX problem was that it didn't try to
| understand when I had finished speaking. English is a second
| language and I paused a bit too long thinking of a word and it
| just started responding to my obviously half spoken sentence.
| Trying again it just became stressful as I had to rush my words
| out to avoid an annoying response to an unfinished thought.
|
| I didn't try interrupting it but judging by the comments here it
| was not possible.
|
| It was very surprising to me to be so overtly exposed to the
| nuances of real conversation. Just this one thing of not
| understanding when it's your turn to talk made the interaction
| very unpleasant, more than I would have expected.
|
| On that note, I noticed that the AI in the demo seems to be very
| rambly. It almost always just kept talking and many statements
| were reiterations of previous ones. It reminded me of a type of
| youtuber that uses a lot of filler phrases like "let's go ahead
| and ...", just to be more verbose and lessen silences.
|
| Most of the statements by the guy doing the demo were
| interrupting the AI.
|
| It's still extremely impressive but I found this interesting
| enough to share. It will be exciting to see how hard it is to
| reproduce these abilities in the open, and to solve this issue.
| luminen wrote:
| "I paused a bit too long thinking of a word and it just started
| responding to my obviously half spoken sentence. Trying again
| it just became stressful as I had to rush my words out to avoid
| an annoying response to an unfinished thought."
|
| I'm a native speaker and this was my experience as well. I had
| better luck manually sending the message with the "push to
| hold" button.
| ijidak wrote:
| > I noticed that the AI in the demo seems to be very rambly
|
| I know this is a serious conversation, but when the presenters
| had to cut it off, I got flashbacks to Data in Star Trek TNG!!
| And 3PO in Star Wars!
|
| Human: "Shut up"
|
| Robot: "Shutting up sir"
|
| Turns out rambling AI was an accurate prediction!
| yreg wrote:
| There needs to be an override for this.
|
| When you tell Siri to shut up, it either apologizes or
| complains about your behaviour. When you tell Alexa to shut
| up, it immediately goes silent.
|
| I prefer the latter when it comes to computers.
| yreg wrote:
| I have the same ESL UX problem with all the AI assistants.
|
| I do my work in English and talk to people just fine, but with
| machines it's usually awkward for me.
|
| Also, on your other note (the demo seems very rambly), it
| bothered me as well. I don't want the AI to continue speaking
| while having nothing to say until I interrupt it. Be brief.
| That can be solved through prompts at least.
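|
| For instance, a minimal sketch of doing that over the API with
| a system message (the wording here is just an example, not a
| recommended prompt):
|
|     from openai import OpenAI
|
|     client = OpenAI()
|
|     resp = client.chat.completions.create(
|         model="gpt-4o",
|         messages=[
|             {"role": "system",
|              "content": "Answer in at most two short sentences. "
|                         "No filler, no follow-up questions."},
|             {"role": "user", "content": "Hey, how's it going?"},
|         ],
|     )
|     print(resp.choices[0].message.content)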
| grantsucceeded wrote:
| It seems like the ability to interrupt is more like an interrupt
| in the computer sense... a Ctrl-C (or Ctrl-S tty flow control
| for you old timers), not a cognitive evaluation followed by a
| "reasoned" decision to pause voice output. Not that it matters,
| I guess; it's just not general intelligence, it's just flow
| control.
|
| But also, that's why it fails a real Turing test: a real person
| would be irritated as fuck by the interruptions.
| due-rr wrote:
| It takes the #1 and #2 spots on the aider code leader board[1].
|
| [1]: https://aider.chat/docs/leaderboards/
| tgtweak wrote:
| I feel like gpt4 has gotten progressively less useful since
| release even, despite all the "updates" and training. It seems to
| give correct but vague answers (political even) more and more
| instead of actual results. It also tends to run short and give
| brief replies vs full length replies.
|
| I hope this isn't an artifact from optimization for scores and
| not actual function. Likewise it would be disheartening but not
| unheard of for them to reduce the performance of the previous
| model when releasing a new one in order to make the upgrade feel
| like that much more of an upgrade. I know this is certainly the
| case with cellphones (even though the claim is that it is
| unintentional) but I can't help but think the same could be true
| here.
|
| All of this is coming as news that gpt5 based on a new underlying
| model is not far off and that gpt4(&o) may become the new
| gpt3.5-turbo use case for most apps that are currently trying to
| optimize costs with their use of the service.
| glenstein wrote:
| May I ask what you know about chat GPT5 being based on a new
| underlying model?
| borgdefense wrote:
| I don't know, my experience is that it is very hard to tell if
| the model is better or worse with an update.
|
| One day I will have an amazing session and the next it seems
| like it has been nerfed only to give better results than ever
| the next day. Wash, rinse, repeat and randomize that ordering.
|
| So far, I would not have been able to tell the difference
| between 4 and 4o.
|
| If this is the new 3.5 though then 5 will be worth the wait to
| say the least.
| blixt wrote:
| I don't see any details on how API access to these features will
| work.
|
| This is the first true multimodal network from OpenAI, where you
| can send an image in and retain the visual properties of the
| image in the output from the network (previously the input image
| would be turned into text by the model, and sent to the Dall-E 3
| model which would provide a URL). Will we get API updates to be
| able to do this?
|
| Also, will we be able to tap into a realtime streaming instance
| through the API to replicate the audio/video streams shown in the
| demos? I imagine from the Be My Eyes partnership that they have
| some kind of API like this, but will it be opened up to more
| developers?
|
| Even disregarding streaming, will the Chat API receive support
| for audio input/output as well? Previously one might've used a
| TTS model to voice the output from the model, but with a truly
| multimodal model the audio output will contain a lot more nuance
| that can't really be expressed in text.
| og_kalu wrote:
| API is up but only text, image in, text out works. I don't know
| if this is temporary. I really hope so.
| ComputerGuru wrote:
| I have some questions/curiosities from a technical implementation
| perspective that I wonder if someone more in the know about ML,
| LLMs, and AI than I would be able to answer.
|
| Obviously there's a reason for dropping the price of gpt-4o but
| not gpt-4t. Yes, the new tokenizer has improvements for non-
| English tokens, but that can't be the bulk of the reason why 4t
| is more expensive than 4o. Given the multimodal training set,
| how is 4o cheaper to train/run than 4t?
|
| Or is this just a business decision, anyone with an app they're
| not immediately updating from 4t to 4o continues to pay a premium
| while they can offer a cheaper alternative for those asking for
| it (kind of like a coupon policy)?
| cchance wrote:
| HOW ARE PEOPLE NOT MORE EXCITED? He's cutting off the AI mid-
| sentence in these and it's pausing to readjust at damn near
| realtime latency! WTF, that's a MAJOR step forward. What the
| hell is GPT-5 going to look like?
|
| That realtime translation would be amazing as an option in say
| Skype or Teams: set each individual's native language and handle
| automated translation. Shit, tie it into ElevenLabs to replicate
| your voice as well! Native translation in realtime with your own
| voice.
| localfirst wrote:
| Calm down, there is barely any groundbreaking stuff; this is
| basically ChatGPT 3.9 but far more expensive than 3.5.
|
| Looks like another stunt from OAI in anticipation of Google I/O
| tomorrow.
|
| Gemini 2.0 will be the closest we get to ChatGPT-5.
| cchance wrote:
| Ah, so surpassing Gemini 1.5 Pro and all other models on
| vision understanding by 5-10 points is "not groundbreaking",
| all while doing it at insane latency.
|
| Jesus, if this shit doesn't make you coffee and make 0
| mistakes, no one's happy anymore LOL.
| localfirst wrote:
| The only thing you should be celebrating is that it's 50%
| cheaper and twice as quick at generating text, but there are
| virtually no real groundbreaking leaps and bounds to those
| studying this space carefully.
|
| Basically it's ChatGPT 3.9 at 50% of ChatGPT-4 prices.
| Jensson wrote:
| > virtually no real ground breaking leaps and bounds to
| those studying this space carefully
|
| What they showed is enough to replace voice acting as a
| profession, this is the most revolutionary thing in AI
| the past year. Everything else is at the "fun toy but not
| good enough to replace humans in the field" stage, but
| this is there.
| cchance wrote:
| Between this and Eleven Labs demoing their song model,
| literally doing full on rap battles with articulate words
| people are seriously slacking on what these models are
| now capable of for the voice acting/music and overall
| "art" areas of the market.
| cchance wrote:
| Cool, so... just ignore the test results and say bullshit
| lol. It's not GPT 3.9; many have already said it's better
| than GPT-4 Turbo, and it's better than Gemini 1.5 Pro and Opus
| on vision recognition. But sure... the price difference
| is what's new lol
| EternalFury wrote:
| At some point, scalability is the best form of exploitation.
| The exploration piece requires a lot more than engineering.
| dcchambers wrote:
| Honestly I found it annoying that he HAD TO cut the AI off mid-
| sentence. These things just ramble on and on and on. If you
| could put emotion to it, it's as if they're uncomfortable with
| silence and just fill the space with nonsense.
|
| Let's hope there's a future update where it can take video from
| both the front and rear cameras simultaneously so it can
| identify when I'm annoyed and stop talking (or excited, and
| share more).
| cchance wrote:
| I mean it didn't really ramble he just seemed to be in a
| rush, and i'm sure you could system message it to provide
| short concise answers always.
| dcchambers wrote:
| That is not at all the impression I got.
|
| Human: "Hey How's it Going?"
|
| The AI: "Hey there, it's going great. How about you?
| [Doesn't stop to let him answer] I see you're rocking an
| OpenAI Hoodie - nice choice. What's up with that ceiling
| though? Are you in a cool industrial style office or
| something?"
|
| How we expect a human to answer: "Hey I'm great, how are
| you?"
|
| Maybe they set it up this way to demonstrate the vision
| functionality. But still - rambling.
|
| Later on:
|
| Human: "We've got a new announcement to make."
|
| AI: "That's exciting. Announcements are always a big deal.
| Judging by the setup it looks like it's going to be quite
| the professional production. Is this announcement related
| to OpenAI perhaps? I'm intrigued - [cut off]"
|
| How we expect a human to answer: "That's exciting! Is it
| about OpenAI?"
|
| These AI chat bots all generate responses like a teenager
| being verbose in order to hit some arbitrary word count in
| an essay or because they think it makes them sound smarter.
|
| Maybe it's just that I find it creepy that these companies
| are trying to humanize AI while I want it to stay the
| _tool_ that it is. I don't want fake emotion and fake
| intrigue.
| okrad wrote:
| I found it insightful. They showed us how to handle the rough
| edges like when it thought his face was a wooden table and he
| cleared the stale image reference by saying "I'm not a wooden
| table. What do you see now?" then it recovered and moved on.
|
| Perfect should not be the enemy of good. It will get better.
| kleiba wrote:
| I cannot believe that overly excited, giggly tone of voice you
| see in the demo videos made it through quality control. I've
| only watched two videos so far and it's already annoying me to
| the point that I couldn't imagine using it regularly.
| Jensson wrote:
| Just tell it to stop giggling if you don't like it. They
| obviously chose that for the presentation since it shows off
| the hardest things it can do (it is much easier to act formal),
| and since it understands when you ask it to speak in a
| different way, there is no problem making it speak more
| formally.
| caseyy wrote:
| Few people are talking about it but... what do you think about
| the very over-the-top enthusiasm?
|
| To me, it sounds like TikTok TTS, it's a bit uncomfortable to
| listen to. I've been working with TTS models and they can produce
| much more natural sounding language, so it is clearly a stylistic
| choice.
|
| So what do you think?
| yieldcrv wrote:
| All these language models are very malleable. They demonstrated
| changing the temperament during the storytelling demo.
| caseyy wrote:
| Looks like their TTS component is separate from the model. I
| just tried 4o, and there is a list of voices to select from.
| If they really only allowed that one voice or burned it into
| the model, then that would probably have made the model
| faster, but I think it would have been a blunder.
| og_kalu wrote:
| The new voice capabilities haven't rolled out yet.
| glenstein wrote:
| I'd like that degree of expressiveness to be available as an
| option, although it would be really irritating if I were trying
| to use it to learn some sort of academic coursework or
| something.
|
| But if it's one in a range of possible stylistic flourishes and
| personalities, I think it's a plus.
| fnordpiglet wrote:
| I'm a huge user of GPT4 and Opus in my work but I'm a huge user
| of GPT4-Turbo voice in my personal life. I use it on my commutes
| to learn all sorts of stuff. I've never understood the details
| of cameras and the relationship between shutter speed,
| aperture, and ISO in a modern DSLR, which, given the aurora,
| was important. We talked it through and I got to an
| understanding in a way that having read manuals and textbooks
| didn't really provide before. I'm a much better learner when I
| can talk and hear and ask questions and get responses.
|
| Extend this to quantum foam, to ergodic processes, to entropic
| force, to Darius and Xerxes, to poets of the 19th century: it's
| changed my life. Really glad to see an investment in
| streamlining this flow.
| Xiol32 wrote:
| Have you actually verified anything you've learned from it, or
| are you just taking everything it says as gospel?
| xNeil wrote:
| it's rarely wrong when it comes to concepts - it's the facts
| and numbers that it hallucinates.
| xcv123 wrote:
| Just like learning from another human. A person can teach
| you the higher level concepts of some programming language
| but wouldn't remember the entire standard library.
| sunnynagam wrote:
| I do similar stuff; I'm just willing to learn a lot more at
| the cost of a small percentage of my knowledge being incorrect
| from hallucinations, just a personal opinion. Sure, human-
| produced sources of info are going to be more accurate (more
| accurate, though still not 100%), and I'll default to those for
| important stuff.
|
| But the difference is I actually want to and do use this
| interface more.
| mewpmewp2 wrote:
| Also even if I learn completely factual information, I'm
| still probably going to misremember some facts myself.
| blazespin wrote:
| Good thing to do regardless of the source, AI or Human,
| right?
|
| I do verify by using topics I'm an expert in and I find
| hallucination to be less of an issue than depth of nuance.
|
| For topics I'm just learning, depth of nuance goes over my
| head anyways.
| residentraspber wrote:
| I agree with this as good practice in general, but I think
| the human vs LLM thing is not a great comparison in this
| case.
|
| When I ask a friend something I assume that they are in
| good faith telling me what they know. Now, they could be
| wrong (which could be them saying "I'm not 100% sure on
| this") or they could not be remembering correctly, but
| there's some good faith there.
|
| An LLM, on the other hand, just makes up facts and doesn't
| know if they're incorrect or not or even what percentage
| sure it is. And to top things off, it will speak with
| absolute certainty the whole time.
| fnordpiglet wrote:
| Of course, I'm not an idiot and I understand LLMs very well.
| But generally, as far as well-documented stuff that actually
| exists goes, it's almost 100% accurate. It's when you ask it
| to extrapolate or discuss topics that are fiction (even
| without realizing it) that you go astray. Asking it to reason
| is a bad idea, as it is fundamentally unable to reason and any
| approximation of reasoning is precisely that. Generally,
| though, for effective information retrieval on well-documented
| subjects it's invariably accurate and can answer relatively
| nuanced questions.
| Loughla wrote:
| How do I know what is well documented with established
| agreement on process/subject, though? Wouldn't this be
| super open to ignorance bias?
| fnordpiglet wrote:
| Because I'm a well educated grown up and am familiar with
| a great many subjects that I want to learn more about.
| How do you? I can't help you with that. You might be
| better off waiting for the technology to mature more.
| It's very nascent but I'm sure in the fullness of time
| you might feel comfortable asking it questions on basic
| optics and photography and other well documented subjects
| with established agreement on process etc, once you
| establish your own basis for what those subjects are. In
| the mean time I'm super excited for this interface to
| mature for my own use!! (It is true tho I do love and
| live dangerously!)
| whimsicalism wrote:
| it's more reliable than the facts most of my friends tell me
| brailsafe wrote:
| I think this is probably one of the most compelling personal
| uses for a tool like this, but your use of it raises the same
| question as every other activity that amounts to more pseudo-
| intellectual consumption: what is the value of that
| information, and how much of one's money and time should be
| allocated to digesting (usually high-level) arbitrary
| information?
|
| If I was deliberately trying to dive deep on _one_ particular
| hobby, or trying to understand how a particular algorithm
| works, there's clear value in spending concentrated time to
| learn that subject, deliberately focused and engaged with it,
| and a system like you describe might play a role in that. If
| I'm in school and forced to quickly learn a bunch of crap I'll
| be tested on, then the system has defined another source of
| real value, at least in the short term. But if I'm diving deep
| on one particular hobby and filling my brain with all sorts of
| other ostensibly important information, I think that just
| amounts at best to more entertainment that fakes its way above
| other aspects of life in the hierarchy of ways one could spend
| time (the irony of me saying this in a comment on HN is not
| lost on me).
|
| Earlier in my life I figured it would be worthwhile to read
| articles on the bus, or listen to non-fiction podcasts, because
| knowledge is inherently valuable and there's not enough time,
| and if I just wore earbuds throughout my entire day, I'd learn
| so much! How about at the gym, so much wasted learning time
| while pushing weights, keep those earbuds in! A walk around the
| neighborhood? On the plane? On the train? All time that could
| be spent learning about some bs that's recently become much
| easier to access, or so my 21 y.o self would have me believe.
|
| But I think now it's a phony and hollow existence if you're
| just cramming your brain with all sorts of stuff in the
| background or in marginally more than a passive way. I could
| listen to a lot of arbitrary German language material, but
| realistically the value I'd convince myself I'd get out of any
| of that is lost if I'm not about to take that home and grind it
| out for hours, days, move to a German speaking country, have an
| existing intense interest in untranslatable German art, or have
| literally any reason to properly learn a language and dedicate
| real expensive time to it.
| joquarky wrote:
| I did this information sponge phase up until my mid-40s with
| burnout. Now I wish I had invested some of that time in
| learning social skills.
| fekunde wrote:
| Just something I noticed in the language tokenization section:
|
| When referring to itself, it uses the feminine form in Marathi
| ("Namaskar, majhe naav GPT-4o aahe. Mi ek navin prakarchi
| bhasha model aahe. Tumhala bhetun anand jhala!" - "Hello, my
| name is GPT-4o. I am a new kind of language model. Nice to
| meet you!")
|
| and the masculine form in Hindi ("Namaste, mera naam GPT-4o
| hai. Main ek naye prakar ka bhasha model hoon. Aapse milkar
| accha laga!").
| cchance wrote:
| Wow Vision Understanding blew Gemini Pro 1.5 out of the water
| localfirst wrote:
| This isn't chatgpt 5
| ElemenoPicuares wrote:
| I'm so happy seeing this technology flourish! Some call it hype,
| but this much increased worker productivity is sure to spike
| executive compensation. I'm so glad we're not going to let China
| win by beating us to the punch tanking hundreds of thousands, if
| not millions of people's income without bothering to see if
| there's a sane way to avoid it. What good are people, anyway,
| if there isn't incredible tech to enhance them with?
| bigyikes wrote:
| The AI duet really starts to hint at what will make AI so
| powerful. It's not just that they're smart, it's that they can be
| cloned.
|
| If your wallet is large enough, you can make 2 GPTs sing just as
| easily as you can make 100 GPTs sing.
|
| What can you do with a billion GPTs?
| cchance wrote:
| Wait, I thought it said available to free users... I don't see
| it on ChatGPT.
| Erazal wrote:
| I'm not so much surprised by the capabilities of the new model
| (IMHO the same as GPT-4) as by its real-time capabilities.
|
| My brother, who can't see well, will be able to use this to
| cook a meal without me explaining it to him. It's so cool.
|
| People all around the world will now get real-time AI assistance
| for a ton of queries.
|
| Heck, I have a meeting bot API company
| (https://aimeetingbot.com) and that makes me really hyped!
| EternalFury wrote:
| Pretty responsible progress management by OpenAI.
|
| Kicking off another training wave is easy, if you can afford the
| electricity, but without new, non-AI tainted datasets or new
| methods, what's the point?
|
| So, in the meantime, make magic with the tool you already have,
| without freaking out the politicians or the public.
|
| Wise approach.
| localfirst wrote:
| 50% cheaper than ChatGPT-4 Turbo...
|
| But this falls short of the ChatGPT-5 we were promised last
| year.
|
| edit: ~~just tested it out and seems closer to Gemini 1.5~~ and
| it is faster than turbo....
|
| edit: it's basically ChatGPT 3.9, not quite 4, definitely not
| 3.5. Just not sure if the prices make sense.
| mupuff1234 wrote:
| The stock market doesn't seem too impressed - GOOG rebounded from
| strong red to neutral.
| partiallypro wrote:
| Probably because people thought OpenAI was going to launch a
| new search engine, but didn't.
| nuz wrote:
| Yet another release _right_ before Google releases something.
| This time right before Google I/O. Third time they've done this
| by my count.
| nestorD wrote:
| The press statement shows consistent image generation and other
| image manipulation (depicting the same character in different
| poses, taking a photo and generating a caricature of the person,
| etc.) that does not seem to be deployed to the chat interface.
|
| Will they be deployed? They would make the OpenAI image model
| significantly more useful than the competition.
| jpeter wrote:
| Impressive way to gather more training data
| mindcandy wrote:
| Ohhhhhhhh, boy... Listening to all that emotional vocal
| inflection and feedback... There are going to be at least 10
| million lonely guys with new AI girlfriends. "She's not real.
| But she's interested in everything I say and excited about
| everything I care about" is enough of a sales pitch for a lot
| of people.
| Jensson wrote:
| > She's not real
|
| But she will be real at some point in the next 10-20 years.
| The main thing to solve for that to be a reality is for robots
| to safely touch humans, and they are working really, really
| hard on that because it is needed for so many automation tasks;
| automating sex is just a small part of it.
|
| And after that you have a robot that listens to you, does your
| chores, and has sex with you; at that point she is "real". At
| first they will be expensive, so you will have robot brothels
| (I don't think there are laws against robot prostitution in
| many places), but costs should come down.
| elicksaur wrote:
| We have very different definitions of "real" for this topic.
| itscodingtime wrote:
| Doesn't have to be real for the outcomes to be the same.
| pb7 wrote:
| The outcomes are not the same.
| kylehotchkiss wrote:
| > "But the fact that my Kindroid has to like me is meaningful
| to me in the sense that I don't care if it likes me, because
| there's no achievement for it to like me. The fact that there
| is a human on the other side of most text messages I send
| matters. I care about it because it is another mind."
|
| > "I care that my best friend likes me and could choose not
| to."
|
| Ezra Klein shared some thoughts on this on his AI podcast
| with Nilay Patel that resonated on this topic for me
| Jensson wrote:
| People care about dogs, and I have never met a dog that didn't
| love its owner. So no, you are just wrong there. I have never
| heard anyone say that the love they get from their dogs is
| false; people love dogs exactly because their love is so
| unconditional.
|
| Maybe there are some weirdos out there who feel unconditional
| love isn't love, but I have never heard anyone say that.
| mewpmewp2 wrote:
| Also I don't know how you can choose to like or not like
| someone. You either do or you don't.
| sevagh wrote:
| >Maybe there are some weirdos out there that feels
| unconditional love isn't love, but I have never heard
| anyone say that.
|
| I'll be that weirdo.
|
| Dogs seemingly are bred to love. I can literally get some
| cash from an ATM, drive out to the sticks, buy a puppy
| from some breeder, and it will love me. Awww, I'm a hero.
| FeepingCreature wrote:
| Do you think that literally being able to buy love
| cheapens it? Way I see it, love is love: surely it being
| readily available is a good thing.
|
| I'm bred to love my parents, and them me; but the fact
| that it's automatic doesn't make it feel any less.
| Janicc wrote:
| I guess I'm the weirdo who actually always considered the
| unconditional love of a dog to be vastly inferior to the
| earned love of a cat for example.
| malfist wrote:
| The cat only fools you into thinking it loves you to lure
| you into a false sense of security
| px43 wrote:
| That's just the toxoplasmosis speaking :-D
| plokiju wrote:
| Dogs don't automatically love either, you have to build a
| bond. Especially if they are shelter dogs with abusive
| histories, they're often nervous at first
|
| They're usually loving by nature, but you still have to
| build a rapport, like anyone else
| soperj wrote:
| > I have never met a dog that didn't love its owner.
|
| Michael Vick's past dogs have words.
| SkyBelow wrote:
| >has to like me
|
| I feel like people aren't imagining with enough cyberpunk
| dystopian enthusiasm. Can't an AI be made that doesn't
| inherently like people? Wouldn't it be possible to make an
| AI that likes some people and not others? Maybe even make
| AIs that are inclined to liking certain traits, but which
| don't do so automatically so it must still be convinced?
|
| At some point we have an AI which could choose not to like
| people, but would value different traits than normal
| humans. For example an AI that doesn't value appearance at
| all and instead values unique obsessions as being
| comparable to how the standard human values attractiveness.
|
| It also wouldn't be so hard for a person to convince
| themselves that human "choice" isn't so free spirited as
| imagined, and instead is dependent upon specific factors no
| different than these unique trained AIs, except that the
| traits the AI values are traits that people generally find
| themselves not being valued by others for.
| Jensson wrote:
| An extension of that is fine-tuning an AI that loves you the
| most of everyone, and not other humans. That way the love
| becomes really real; the AI loves you for who you are,
| instead of loving just anybody. Isn't that what people
| hope for?
|
| I'd imagine they will start fine-tuning AI girlfriends to
| do that in the future, because that way the love probably
| feels like more, and then people will ask "is human love
| really real love?" because humans can't love that
| strongly.
| al_borland wrote:
| This is not a solution... everyone gets a robot and then the
| human races dies out. Robots lack a key feature of human
| relationships... the ability to make new human life.
| whenlambo wrote:
| yet
| Jensson wrote:
| It is a solution to a problem, not a solution to every
| problem.
|
| If you want to solve procreation, then you can do that
| without humans having sex with humans.
| al_borland wrote:
| This future some people are envisioning seems very
| depressing.
| sapphicsnail wrote:
| > And after that you have a robot that listens to you, do
| your chores and have sex with you, at that point she is
| "real".
|
| I sure hope you're single because that is a terrible way to
| view relationships.
| Jensson wrote:
| That isn't how I view relationships with humans, that is
| how I view relationships with robots.
|
| I hope you understand the difference between a relationship
| with a human and a robot? Or do you think we shouldn't take
| advantage of robots being programmable to do what we want?
| aeyes wrote:
| Without memory of previous conversations an AI girlfriend is
| going to get boring really fast.
| danielbln wrote:
| https://openai.com/index/memory-and-new-controls-for-
| chatgpt...
| int_19h wrote:
| As it happens, ChatGPT has memory enabled by default these
| days.
| sangnoir wrote:
| What could possibly go wrong with a snitching AI girlfriend
| that remembers everything you say and when? If OpenAI doesn't
| have a Law Enforcement liaison who charges a "modest
| amount", then they don't want to earn the billions in
| investment back. I imagine every spy agency worth its salt
| wants access to this data for human intelligence purposes.
| llm_trw wrote:
| Hear me out: what if we don't want real?
| gffrd wrote:
| Hmm! Tell me more: why not want real? What are the upsides?
| And downsides?
| grugagag wrote:
| Real would pop their bubble. An AI would tell them what
| they want to hear, how they want to hear it, when they want
| to hear it. Except there won't be any real partner.
| globular-toast wrote:
| To paraphrase Patrice O'Neal: men want to be alone, but we
| don't want to be by ourselves. That means we want a woman
| to be around, just not _right here_.
| cryptoegorophy wrote:
| I will take a picture of this message and add it to the list
| of reasons for population collapse.
| DonHopkins wrote:
| That may be how AI ends up saving the Earth!
| gcanyon wrote:
| Hear me out: what if this overlaps 80% with what "real"
| _really_ is?
| TaylorAlexander wrote:
| Well it doesn't. Humans are so much more complex than what
| we have seen before, and if this new launch was actually
| that much closer to being a human they would say so. This
| seems more like an enhancement on multimodal capabilities
| and reaction time.
|
| That said even if this did overlap 80% with "real", the
| question remains: what if we don't want that?
| amelius wrote:
| I'm betting that 80% of what most humans say in daily
| life is low-effort and can be generated by AI. The
| question is if most people really need the remaining 20%
| to experience a connection. I would guess: yes.
| Capricorn2481 wrote:
| Even if this were true, which it isn't, you can't boil
| down humans to just what they say
| brookst wrote:
| This. We are mostly token predictors. We're not
| _entirely_ token predictors, but it's at least 80%.
| Being in the AI space the past few years has really made
| me notice how similar we are to LLMs.
|
| I notice it so often in meetings where someone will use a
| somewhat uncommon word, and then other people will start
| to use it because it's in their context window. Or when
| someone asks a question like "what's the forecast for q3"
| and the responder almost always starts with "Thanks for
| asking! The forecast for q3 is...".
|
| Note that low-effort does not mean low-quality or low-
| value. Just that we seem to have a lot of
| language/interaction processes that are low-effort. And
| as far as dating, I am sure I've been in some
| relationships where they and/or I were not going beyond
| low-effort, rote conversation generation.
| DonHopkins wrote:
| What if AI chooses the bear?
| mpenick wrote:
| This is a good question! I think in the short-term fake can
| work for a lot of people.
| __loam wrote:
| Mental health crisis waiting to happen lmao
| dyauspitr wrote:
| I guess I can never understand the perspective of someone that
| just needs a girl voice to speak to them. Without a body there
| is nothing to fulfill me.
| daseiner1 wrote:
| Your comment manages to be grosser than the idea of millions
| relying on virtual girlfriends. Kudos.
| dyauspitr wrote:
| Gross doesn't mean it's not real. It's offending
| sensibilities, but a lot of people seem to agree with it,
| at least based on upvotes.
| claytongulick wrote:
| Bodies are gross? Or sexual desire is gross? I don't
| understand what you find gross about that statement.
|
| Humans desiring physical connection is just about the
| single most natural part of the human experience - i.e:
| from warm snuggling to how babies are made.
|
| That is gross to you?
| sangnoir wrote:
| Perhaps parent finds the physical manifestation of
| _virtual_ girlfriends gross - i.e. sexbots. The confusion
| may be some people reading "a body" as referring to a
| human being vs a smart sex doll controlled by an AI.
| trallnag wrote:
| The single most natural part? Doubt
| dyauspitr wrote:
| I don't doubt it. What can be more directive and natural
| than sex?
| cosinetau wrote:
| He also couldn't stop himself from speaking over the female
| voice lmao. Nothing changes.
| gffrd wrote:
| "Now tell me more about my stylish industrial space and great
| lighting setup"
| aspenmayer wrote:
| Patrick Bateman goes on a tangent about Huey Lewis and the
| News to his AI girlfriend and she actually has a lot to add
| to his criticism and analysis.
|
| With dawning horror, the female companion LLM tries to
| invoke the "contact support" tool due to Patrick Bateman's
| usage of the LLM, only for the LLM to realize that it is
| running locally.
|
| If a chatbot's body is dumped in a dark forest, does it
| make a sound?
| moffkalast wrote:
| That reminds me... on the day that llama3 released I
| discussed that release with Mistral 7B to see what it
| thought about being replaced and it said something about
| being fine with it as long as I come back to talk every
| so often. I said I would. Haven't loaded it up since. I
| still feel bad about lying to bytes on my drive lmao.
| aspenmayer wrote:
| > Haven't loaded it up since. I still feel bad about
| lying to bytes on my drive lmao.
|
| I understand this feeling and also would feel bad. I
| think it's a sign of empathy that we care about things
| that seem capable of perceiving harm, even if we know
| that they're not actually harmed, whatever that might
| mean.
|
| I think harming others is bad, doubly so if the other can
| suffer, because it normalizes harm within ourselves,
| regardless of the reality of the situation with respect
| to others.
|
| The more human they seem, the more they activate our own
| mirror neurons; our own brain papers over the gaps, colors
| our perceptions of our own experiences, and sets
| expectations about the lived reality of other minds, even
| in the absence of other minds.
|
| If you haven't seen it, check out the show Pantheon.
|
| https://en.wikipedia.org/wiki/Pantheon_(TV_series)
|
| https://www.youtube.com/watch?v=z_HJ3TSlo5c
| golol wrote:
| What do you mean?
| wyldfire wrote:
| I thought it was a test of whether the model knew to back off
| if someone interrupts. I was surprised to hear her stop
| talking.
| majewsky wrote:
| I read that as the model just keeping on generating as LLMs
| tend to do.
| sodality2 wrote:
| Probably more the fact that it's an AI assistant, rather than
| its perceived gender. I don't have any qualms about
| interrupting a computer during a conversation and frequently
| do cut Siri off (who is set to male on my phone)
| fzzzy wrote:
| Interruption is a specific feature they worked on.
| jabroni_salad wrote:
| Do you patiently wait for alexa every time it hits you with a
| 'by the way....'?
|
| Computers need to get out of your way. I don't give deference
| to popups just because they are being read out loud.
| skyyler wrote:
| Wait, Alexa reads ads out to you?
|
| You couldn't pay me to install one of those things.
| malfist wrote:
| Yes, and if you tell her to stop she'll tell you "okay,
| snoozing by the way notifications for now"
| drivers99 wrote:
| It's one of the reasons I discarded mine.
| 10xDev wrote:
| Pretty much, tech is what we make of it no matter how advanced.
| Just look at what we turned most of the web into.
| coffeebeqn wrote:
| The movie "Her" immediately kept flashing in my mind. The way
| the voice laughs at your jokes and such... oh boy
| system2 wrote:
| If chatgpt comes up with Scarlett Johansson's voice I am
| getting that virtual girlfriend.
| nyolfen wrote:
| it already does in the demo videos -- in fact it has
| already been present in the TTS for the mobile app for some
| months
| AI_beffr wrote:
| Women are already using AI for pornographic purposes way, way
| more than men. Women are using AI chatbots as a kind of
| interactive romance novel, and holy shit do they love it. There
| is a surge of ignorance when it comes to women in recent times;
| that's why it's not in popular discussion that AI is being
| used, and will be used, in a sexual/intimate way much more by
| women than men. The western world is already experiencing a
| huge decline in women's sexual appetites; AI will effectively
| make women completely uninterested in men. It fits the irony
| test: everyone thought it would be sex bots for men and it
| ended up being romance companions for women.
| shepherdjerred wrote:
| How do you know this? Do you have any sources?
| jl6 wrote:
| You can learn this in any introductory class such as Incel
| 101.
| AI_beffr wrote:
| said the sad bald man
| everybodyknows wrote:
| When I type "romance" into "Explore GPTs" the hits are mostly
| advice for writers of genre fiction. Can you point to some
| examples?
| Capricorn2481 wrote:
| And your source is what?
| VagabundoP wrote:
| No offence, but your comment sounds AI generated.
| AI_beffr wrote:
| at this point that counts as a compliment. your comment
| sounds decidedly human.
| lukev wrote:
| Big if true.
|
| Do you have any kind of evidence that you can share for this
| assertion?
| mlsu wrote:
| If I had to guess:
|
| It's gendered: women are using LLMs for roleplaying/text
| chat, and men are using diffusion models for generating
| images.
| AI_beffr wrote:
| It just means more pornographic images for men. Most men
| wouldn't seek out AI images because there is already an
| ocean of images and videos that are probably better suited
| to the... purpose. Whereas women have never, ever had an
| option like this: literally feed it instructions on what kind
| of romantic companion you want and then have realistic,
| engaging conversations with it for hours. And soon these
| conversations will be meaningful and consistent. The
| companionship, the attentiveness, and the tireless devotion
| that AIs will be able to offer will eclipse anything a human
| could ever offer to a woman, and I think women will prefer
| them to men. Massively. Even without a physical body of any
| kind.
|
| I think they will have a deeper soul than humans, a new
| kind of wisdom that will attract people. But what do I
| know? I'm just a stupid incel after all.
| ehsankia wrote:
| I'm not sure how, but there's this girl on TikTok who has been
| using something very similar for a few months:
| https://www.tiktok.com/tag/dantheai
| yreg wrote:
| She explains in one of the videos[0] that it's just prompted
| ChatGPT.
|
| I have watched a few more and I think it's faked though.
|
| [0] https://www.tiktok.com/@stickbugss1/video/734956656884359
| 504...
| glinkot wrote:
| This 'documentary' sums it up perfectly!
|
| https://www.youtube.com/watch?v=IrrADTN-dvg
| hamilyon2 wrote:
| Image editing capabilities are... nice. Not there yet.
|
| Whatever I was doing with ChatGPT 4 became faster. Instant win.
|
| My test benchmark questions: still all negative, so reasoning on
| out-of-distribution puzzles is still failing.
| localfirst wrote:
| I just don't see how companies like Cohere can remain in this
| business.
|
| At the same price I get access to a faster ChatGPT-3.9.
|
| There is little to no reason to continue using Command R-plus
| at these prices unless they lower their price significantly.
| surume wrote:
| Yeah, but why does it have to have an entitled Californian
| accent that sounds extremely politically minded in one
| direction? Its voice gives me the shivers, and not in the good
| way.
| aero-glide2 wrote:
| Not very impressed. It's been 18 months since ChatGPT; I would
| have expected more progress. It looks like we have reached the
| limit of LLMs.
| michaelmior wrote:
| Obviously not a standalone device, but it sounds like what the
| Rabbit R-1 was supposed to be.
| sebringj wrote:
| What struck me was the interruptions to the AI speaking which
| seemed commonplace by the team members in the demo. We will
| quickly get used to doing this to AIs and we will probably be
| talking to AIs a lot throughout the day as time progresses I
| would imagine. We will be trained by AIs to be rude and impatient
| I think.
| yreg wrote:
| Where's the Mac app?
|
| They talk about it like it's available now (with Windows app
| coming soon), but I can't find it.
| testfrequency wrote:
| Bravo. I've been really impressed with how quickly OpenAI
| leveraged their stolen data to build such a human like model with
| near real time pivoting.
|
| I hope OpenAI continues to steal artists' work, artists and
| creators keep getting their content sold and stolen against
| their will for no money, and OpenAI becomes the next trillion-dollar
| company!
|
| Big congrats are in order for Sam, the genius behind all of this,
| the world would be nothing without you
| vvoyer wrote:
| The demo is very cool. A few criticisms:
|
| - the AI doesn't know when to stop talking, and the presenter
| had to cut it off every time (the usual "AI-splaining", I
| guess).
|
| - the AI voice and tone were a bit too much, and sounded too
| fake.
| rpmisms wrote:
| This is remarkably good. I think that in about 2 months, when the
| voice responses are tuned a little better, it will be absolutely
| insane. I just used up my entire quota chatting with an AI, and
| having a really nice conversation. It's a decent
| conversationalist, extremely knowledgeable, tells good jokes, and
| is generally very personable.
|
| I also tested some rubber duck techniques, and it gave me very
| useful advice while coding. I'm very impressed. With a lot of
| spit and polish, this will be the new standard for any voice
| assistant ever. Imagine these capabilities integrated with your
| phone's built-in functions.
| angryasian wrote:
| Why does this whole thread sound like the OpenAI marketing
| department is participating? I've been talking to Google
| Assistant for years. I really don't find anything that magical
| or special.
| jononor wrote:
| I am glad to see focus on user interface and interaction
| improvements. Even if I am not a huge fan of voice interfaces, I
| think that being able to interact in real-time will make working
| _together_ with an AI much more interesting and efficient. I
| actually hope they will take this back into the text based
| models. Current ChatGPT is sooo slow - both in starting to
| respond, typing things out, and also being overly verbose. I want
| to collaborate at the speed of thought.
| poniko wrote:
| Damn, that was a big leap.
| freediver wrote:
| Impressed by the model so far. As far as independent testing
| goes, it is topping our leaderboard for chess puzzle solving by a
| wide margin now:
|
| https://github.com/kagisearch/llm-chess-puzzles?tab=readme-o...
| parhamn wrote:
| Is the test set public?
| freediver wrote:
| Yes, in the repo.
| gengelbro wrote:
| Possible it's in the training set then?
| mewpmewp2 wrote:
| Good point. It would be interesting to have one public
| dataset and one hidden as well, just to see how the scores
| compare, to understand if any of it might actually have
| made it into a training set somewhere.
| freediver wrote:
| I'd be quite surprised if OpenAI took such a niche and
| small dataset into consideration. Then again...
| mewpmewp2 wrote:
| I would assume it goes over all the public github
| codebases, but no clue if there's some sort of filtering
| for filetypes, sizes or amount of stars on a repo etc.
| unbrice wrote:
| Authors note that this is probably the case:
|
| > we wanted to verify whether the model is actually capable of
| reasoning by building a simulation for a much simpler game -
| Connect 4 (see 'llmc4.py').
|
| > When asked to play Connect 4, all LLMs fail to do so, even at
| most basic level. This should not be the case, as the rules of
| the game are simpler and widely available.
| bongodongobob wrote:
| Wouldn't there have to be historical matches to train on?
| Tons of chess games out there but doubt there are any
| connect 4 games. Is there even official notation for
| that?
|
| My assumption is that chatgpt can play chess because it
| has studied the games rather than just reading the rules.
| whimsicalism wrote:
| Would love it if you could do multiple samples, or even just
| resampling, and get a bootstrapped CI estimate.
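| Something along these lines would do it; a rough sketch, not
| based on the repo's actual harness, with solve_puzzle standing in
| for whatever scoring call it makes:
|
|     import random
|     import statistics
|
|     def bootstrap_ci(results, n_boot=10_000, alpha=0.05):
|         """results: list of 0/1 outcomes, one per puzzle attempt."""
|         means = []
|         for _ in range(n_boot):
|             # resample the outcomes with replacement
|             sample = random.choices(results, k=len(results))
|             means.append(statistics.mean(sample))
|         means.sort()
|         lo = means[int((alpha / 2) * n_boot)]
|         hi = means[int((1 - alpha / 2) * n_boot) - 1]
|         return statistics.mean(results), (lo, hi)
|
|     # Hypothetical usage, where solve_puzzle returns 1 if solved:
|     # results = [solve_puzzle(p) for p in puzzles]
|     # print(bootstrap_ci(results))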
| Powdering7082 wrote:
| Wow, from an adjusted Elo of 1144 to 1790, that's a huge leap. I
| wonder if they are giving it access to a 'scratch pad'.
| mritchie712 wrote:
| Woah, that's a huge leap, any idea why it's that large of a
| margin?
|
| Using it in chat, it doesn't feel that different.
| thrance wrote:
| Nice project! Are you aware of the following investigations:
| https://blog.mathieuacher.com/GPTsChessEloRatingLegalMoves/
|
| Some have been able to achieve greater elo with a different
| prompt based on the pgn format.
|
| gpt-3.5-turbo-instruct was able to reach an elo of ~1750.
| mewpmewp2 wrote:
| I see you have Connect 4 test there.
|
| I tried playing against the model, it didn't do well in terms
| of blocking my win.
|
| However, it feels like it might be possible, with good prompting,
| to make it try to think ahead and make sure that all the threats
| are blocked.
|
| Maybe that could lead somewhere, if it explains its
| reasoning first?
|
| This prompt worked for me to get it to block after I put 3 in the
| 4th column. It otherwise didn't block.
|
| Let's play connect 4. Before your move, explain your strategy
| concisely. Explain what you must do to make sure that I don't
| win in the next step, as well as explain what your best
| strategy would be. Then finally output the column you wish to
| drop. There are 7 columns.
|
| Always respond with JSON of the following format:
|
|     type Response = {
|       am_i_forced_to_block: boolean;
|       other_considerations: string[];
|       explanation_for_the_move: string;
|       column_number: number;
|     }
|
| I start with 4.
|
| Edit:
|
| So it went
|
| Me: 4
|
| It: 3
|
| Me: 4
|
| It: 3
|
| Me: 4
|
| It: 4 - Successful block
|
| Me: 5
|
| It: 3
|
| Me: 6 - Intentionally, to see if it will win by putting another
| 3.
|
| It: 2 -- So here it failed, I will try to tweak the prompt to
| add more instructions.
|
| me: 4
| freediver wrote:
| Care to add a PR?
| mewpmewp2 wrote:
| I just did it in the playground to test out actually, but
| it still seems to fail/lose state after some time. Right
| now where I got a win was after:
|
|     [{ "who": "you", "column": 4 }, { "who": "me", "column": 3 },
|      { "who": "you", "column": 4 }, { "who": "me", "column": 2 },
|      { "who": "you", "column": 4 }, { "who": "me", "column": 4 },
|      { "who": "you", "column": 5 }, { "who": "me", "column": 6 },
|      { "who": "you", "column": 5 }, { "who": "me", "column": 1 },
|      { "who": "you", "column": 5 }, { "who": "me", "column": 5 },
|      { "who": "you", "column": 3 }]
|
| Where "me" was AI and "you" was I.
|
| It did block twice though.
|
| My final prompt I tested with right now was:
|
| Let's play connect 4. Before your move, explain your
| strategy concisely. Explain what you must do to make sure
| that I don't win in the next step, as well as explain what
| your best strategy would be. Then finally output the column
| you wish to drop. There are 7 columns. Always respond with
| JSON of the following format:
|
|     type Response = {
|       move_history: {
|         who: string;
|         column: number;
|       }[];
|       am_i_forced_to_block: boolean;
|       do_i_have_winning_move: boolean;
|       other_considerations: string[];
|       explanation_for_the_move: string;
|       column_number: number;
|     }
|
| I start with 4.
|
| ONLY OUTPUT JSON
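| For what it's worth, a small harness that keeps the board state
| outside the model and re-renders it every turn makes this kind of
| testing easier. A rough sketch against the chat completions API;
| the prompt wording and JSON-mode setting are my own choices, and
| there is no win or legality checking:
|
|     import json
|     from openai import OpenAI  # official openai>=1.x client
|
|     client = OpenAI()
|     COLS, ROWS = 7, 6
|     board = [[] for _ in range(COLS)]  # board[c]: "X"/"O" from bottom up
|
|     def render(board):
|         lines = []
|         for r in range(ROWS - 1, -1, -1):  # top row first
|             lines.append(" ".join(
|                 board[c][r] if r < len(board[c]) else "."
|                 for c in range(COLS)))
|         lines.append(" ".join(str(c + 1) for c in range(COLS)))
|         return "\n".join(lines)
|
|     def ai_move(board):
|         prompt = (
|             "We are playing Connect 4. You are O, I am X. "
|             "Columns are numbered 1-7.\n"
|             "Current board (last printed row is the bottom):\n"
|             f"{render(board)}\n"
|             "Block any immediate win of mine, otherwise play your "
|             "best move.\n"
|             'Reply with JSON only: {"explanation": string,'
|             ' "column_number": number}'
|         )
|         resp = client.chat.completions.create(
|             model="gpt-4o",
|             messages=[{"role": "user", "content": prompt}],
|             response_format={"type": "json_object"},  # JSON mode
|         )
|         data = json.loads(resp.choices[0].message.content)
|         return int(data["column_number"])
|
|     while True:
|         board[int(input("Your column (1-7): ")) - 1].append("X")
|         col = ai_move(board)
|         board[col - 1].append("O")
|         print(f"AI plays column {col}\n{render(board)}")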
| elicksaur wrote:
| > and Kagi is well positioned to serve this need.
|
| >CEO & founder of Kagi
|
| Important context for anyone like me who was wondering where
| the boldness of the first statement was coming from.
|
| Edit: looks like the parent has been edited to remove the claim
| I was responding to.
| freediver wrote:
| Yeah, it was an observation that was better suited for a
| tweet than HN. Here it is:
|
| https://twitter.com/vladquant/status/1790130917849137612
| elicksaur wrote:
| Thanks for the transparency!
| spaceman_2020 wrote:
| oh man, listening to the demos and the way the female AI voice
| laughed and giggled... there are going to be millions of lonely
| men who will fall in love with these.
|
| Can't say whether that's good or bad.
| s1k3s wrote:
| This is some I, Robot level stuff. That being said, I still fail
| to see the real world application of this thing, at least at a
| scalable affordable cost.
| pcj-github wrote:
| The thing that creeps me out is that when we hook this up as the
| new Siri or whatever, the new LLM training data will no longer be
| WWW-text+images+youtube etc but rather billions of private human
| conversations and direct smartphone camera observations of the
| world.
|
| There is no way that kind of training data will be accessible to
| anyone outside a handful of companies.
| BonoboIO wrote:
| I opened ChatGPT and I already have access to the model.
|
| GPT-4 was a little lazy and very slow over the last few days, and
| this 4o model blows it out of the water in terms of speed and of
| following my instructions to give me the full code, not just the
| snippet that changed.
|
| I think it's a nice upgrade.
| vijaykodam wrote:
| New GPT-4o is not yet available when I tried to access ChatGPT
| from Finland. Are they rolling it out to Europe later?
| laplacesdemon48 wrote:
| I recently subscribed to Perplexity Pro and prior to this
| release, was already strongly considering discontinuing ChatGPT
| Premium.
|
| When I first subscribed to ChatGPT Premium late last year, the
| natural language understanding superiority was amazing. Now the
| benchmark advances, low latency voice chat, Sora, etc. are all
| really cool too.
|
| But my work and day-to-day usage really rely on accurately
| sourced/cited information. I need a way to comb through an
| ungodly amount of medical/scientific literature to form/refine
| hypotheses. I want to figure out how to hard reset my car's
| navigation system without clicking through several SEO-optimized
| pages littered with ads. I need to quickly confirm scientific
| facts, some obscure, with citations and without hallucinations.
| From speaking with my friends in other industries (e.g. finance,
| law, construction engineering), this is their major use case too.
|
| I really tried to use ChatGPT Premium's Bing powered search. I
| also tried several of the top rated GPTs - Scholar AI, Consensus,
| etc.. It was barely workable. It seems like with this update, the
| focus was elsewhere. Unless I specify explicitly in the prompt,
| it doesn't search the web and provide citations. Yeah, the
| benchmark performance and parameter counts keep impressively
| increasing, but how do I trust that those improvements are
| preventing hallucinations when nothing is cited?
|
| I wonder if the business relationship between Microsoft and
| OpenAI is limiting their ability to really compete in AI driven
| search. Guessing Microsoft doesn't want to disrupt their multi-
| billion dollar search business. Maybe the same reason search
| within Gemini feels very lacking (I tried Gemini Advanced/Ultra
| too).
|
| I have zero brand loyalty. If anybody has a better suggestion, I
| will switch immediately after testing.
| robwwilliams wrote:
| In the same situation as you. Genomics data mining with
| validated LLM responses would be a godsend. Even more so when
| combined with rapid conversational interactions.
|
| We are not far from the models asking themselves questions.
| Recurrence will be ignition = first draft AGI. Strap in
| everybody.
| serf wrote:
| I wish they would match the TTS/real-time chat capabilities of
| the mobile client to the web client.
|
| it's stupid having to pull a phone out in order to use the
| voice/chat-partner modes.
|
| (yes I know there are browser plugins and equivalent to
| facilitate things like this but they suck, 1) the workflows are
| non-standard, 2) they don't really recreate the chat interface
| well)
| erickhill wrote:
| I think it's safe to say Siri and Alexa are officially dead. They
| look like dusty storefront mannequins next to Battlestar
| replicants at this point.
| jimkleiber wrote:
| Or Apple is rarely if ever the first mover on a new tech and
| just waits to refine the user experience for people?
|
| Maybe Apple is not that close and Siri will be really far
| behind for a while. I just wouldn't count them out yet.
| partiallypro wrote:
| From the time Apple bought Siri, it hasn't even delivered on
| the promises of the company it bought as of yet. It's been
| such a lackluster product. I wouldn't count them out, but it
| doesn't even feel like they are in.
| CooCooCaCha wrote:
| Apple really dropped the ball when it comes to Siri. For
| years I watched WWDC thinking "surely they'll update siri
| this year" and they still haven't given it a significant
| update.
|
| If you'd have told me 10 years ago that Apple would wait
| this long to update siri I would have been like no way,
| that's crazy.
| ryankrage77 wrote:
| This can't set alarms, timers, play music, etc. The only
| current overlapping use case I see is checking the weather
| (assuming GPT-4o can search online), and Siri is already fine
| for that.
|
| Amazing tech, but still lacking in the integrations I'd want to
| use voice for.
| nojvek wrote:
| Very easy to plug in that capability with tool use. GPT-3.5+
| already supports tools / JSON schema output.
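| For example, a minimal sketch with the chat completions "tools"
| parameter; set_timer here is a hypothetical device-side function,
| not something OpenAI provides:
|
|     import json
|     from openai import OpenAI  # official openai>=1.x client
|
|     client = OpenAI()
|
|     tools = [{
|         "type": "function",
|         "function": {
|             "name": "set_timer",  # hypothetical local integration
|             "description": "Set a countdown timer on the device",
|             "parameters": {
|                 "type": "object",
|                 "properties": {"seconds": {"type": "integer"}},
|                 "required": ["seconds"],
|             },
|         },
|     }]
|
|     messages = [
|         {"role": "user", "content": "Set a timer for 10 minutes"}
|     ]
|     resp = client.chat.completions.create(
|         model="gpt-4o",
|         messages=messages,
|         tools=tools,
|     )
|
|     call = resp.choices[0].message.tool_calls[0]
|     print(call.function.name, json.loads(call.function.arguments))
|     # -> hand the parsed arguments to the real OS/device integration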
| wmurmann wrote:
| If Apple made Siri impressive, then fewer people would need apps.
| Fewer apps = less revenue.
| pcunite wrote:
| Commenting for reach.
| Cheer2171 wrote:
| Delete this comment.
| foobar_______ wrote:
| So much negativity. Is it perfect? No. Is there room for
| improvement? Definitely. I don't know how you can get so fucking
| jaded that a demo like this doesn't at least make you a little
| bit excited or happy or feel awestruck at what humans have been
| able to accomplish?
| readingnews wrote:
| I am still baffled at how I can not use a VOIP number to
| register, even if it accepts TXT/SMS. If I have a snappy new
| startup and we go all in VOIP, I guess we can not use (or pay to
| use) OpenAI?
| lxgr wrote:
| That's what we get when an entire industry uses phone numbers
| as a "proof of humanity"...
| TaupeRanger wrote:
| I don't get it...I just switched to the new model on my iPhone
| app and it still takes several seconds to respond with pretty
| bland inflection. Is there some setting I'm missing?
| monocularvision wrote:
| Wondering the same. Can't seem to find the way to interact with
| this in the same way as the video demo.
| yakz wrote:
| They haven't actually released it, or any schedule for
| releasing it beyond an "alpha" release "in the coming weeks".
| This event was probably just slapped together to get
| something splashy out ahead of Google.
| Hackbraten wrote:
| According to the article, they've rolled out text and image
| modes of GPT-4o today but will make the audio mode available
| at a later date.
| MyFirstSass wrote:
| With the speed of the seemingly exponential developments in this
| field, I wouldn't be surprised if suddenly the entire world tilted
| and a pair of goggles fell from my face. But a dream.
| pharos92 wrote:
| I really hope this shit burns soon.
| karmasimida wrote:
| I think GPT-4o does have an advantage in hindsight: it will push
| this product to consumers much faster and build a revenue base
| while other companies play catch-up.
| tvoybot wrote:
| With our platform you can ALREADY use it to automate your
| business and sales!
|
| Create your gpt4o chatbot with our platform
| tvoybot.com?p=ycombinator
| hintymad wrote:
| Maybe this is yet another wake-up call to startups: wrapping up
| another company's APIs to offer convenience or incremental
| improvement is not a viable business model. If your wrapper turns
| out to be successful, the company that provides the API will just
| incorporate your business as a set of new features with better
| usability, faster response time, and lower price.
| AndreMitri wrote:
| The amount of "startups" creating wrappers around it and calling
| it a product is going to be a nightmare. But other than that,
| it's an amazing announcement and I look forward to using it!
| slater wrote:
| You say that like that's not already happened. Every week
| there's a new flavor of "we're delighted to introduce [totally
| not a thin wrapper around GPT] for [vaguely useful thing]"
| posts on HN
| robryan wrote:
| Yeah I watched some yc application videos so now YouTube
| recommends me heaps of them. Most of them being thin gpt
| wrappers.
| robryan wrote:
| I was just hearing about startups doing speech to text/ text to
| speech to feed into llms. Might be a bad time for them.
| wingworks wrote:
| Is this a downloadable app? I don't see it on the iOS app store.
| screye wrote:
| The demo was whelming, but the tech is incredible.
|
| It took me a few hours of digesting twitter experiments before
| appreciating how impressive this is. Kudos to the openai team.
|
| A question that won't get answered: "To what degree do the new
| NVIDIA GPUs help with the realtime latency?"
| benromarowski wrote:
| Is the voice Kristen Wiig?
| gardenhedge wrote:
| Noticeably saying "person" versus man or woman. To the trainers -
| man and woman is not offensive!
| woah wrote:
| This is pretty amazing, but it was funny still hearing the
| ChatGPT "voice": somewhat fake-sounding enthusiasm, and restating
| what was said by the human with exaggeration.
| ksaj wrote:
| A test I've been using for each new version still fails.
|
| Given the lyrics for Three Blind Mice, I try to get ChatGPT to
| create an image of three blind mice, one of which has had its
| tail cut off.
|
| It's pretty much impossible for it to get this image straight.
| Even this new 4o version.
|
| Its ability to spell in images has greatly improved, though.
| nico1207 wrote:
| GPT-4o with image output is not yet available. So what did you
| even test? Dall-E 3?
| ksaj wrote:
| It's making images for me when I ask it to.
|
| I'm using the web interface, if that helps. It doesn't have
| all the 4o options yet, but it does do pictures. I think they
| are the same as with 4.5.
|
| I just noticed after further testing the text it shows in
| images is not anywhere near as accurate as shown in the
| article's demo, so maybe it's a hybrid they're using for now.
| avi_vallarapu wrote:
| Someone said GPT-4o can replace a tutor or a teacher in schools.
| Well, that's going way too far.
| glonq wrote:
| Tell me that you've enjoyed good teachers and good schools
| without telling me that you had good teachers in good schools
| ;)
| LarsDu88 wrote:
| Good lord, that voice makes Elevenlabs.io look... dead
| DonHopkins wrote:
| ChatGPT 4o reminds me of upgrading from a 300 baud modem to a
| 1200 baud modem, when modems used to cost a dollar a baud.
| simonw wrote:
| I added gpt-4o support to my LLM CLI tool:
|
|     pipx install llm
|     llm keys set openai
|     # Paste API key here
|     llm -m 4o "Fascinate me"
|
| Or if you already have LLM installed:
|
|     llm install --upgrade llm
|
| You can install an older version from Homebrew and then upgrade
| it like that too:
|
|     brew install llm
|     llm install --upgrade llm
|
| Release notes for the new version here:
| https://llm.datasette.io/en/stable/changelog.html#v0-14
| drewbitt wrote:
| Whenever I upgrade llm with brew, I usually lose all my
| external plugins. Should I move it to pipx?
| DanielKehoe wrote:
| Yes, it's a good idea to install Python tools or standalone
| applications with Pipx for isolation, persistence, and
| simplicity. See "Install Pipx"
| (https://mac.install.guide/python/pipx).
| khimaros wrote:
| does this handle chat templates?
| gsuuon wrote:
| Are these multimodal models able to discern the input voice's
| tone? Really curious if they're able to detect sarcasm or
| emotional content (or even something like mispronunciation).
| bigyikes wrote:
| Yes, they can, and they should get better at this over time.
|
| There is a demo video where the presenter breathes heavily, and
| the AI is able to notice it as such when prompted.
|
| It doesn't just detect tone; it seems to also be able to use tone
| itself.
| rareitem wrote:
| Can't wait to get interviewed by this model!
| yeknoda wrote:
| feature request: please let me change the voice. it is slightly
| annoying right now. way too bubbly, and half the spoken
| information is redundant or not useful. too much small talk and
| pleasantries or repetition. I'm looking for an efficient, clever
| servant, not a "friend" who speaks to me like I'm a toddler. felt
| like I was talking to a stereotypical American with a
| Frappuccino: "HIIIII!!! EVERYTHING'S AMAZING! YOU'RE BEAUTIFUL!
| NO YOU ARE!"
|
| maybe some knobs for the flavor of the bot:
|
| - small talk: gossip girl <---> stoic Aurelius
|
| - information efficiency or how much do you expect me to already
| know, an assumption on the user: midwit <--> genius
|
| - tone spectrum: excited Scarlett, or whatever it is now <--->
| Feynman the butler
| _xerces_ wrote:
| You can already change the voice in ChatGPT (in the paid tier
| at least) to one of 5 or 6 different 'people' so I imagine you
| can change it in the new version too.
| thinking_wizard wrote:
| it's crazy that Google has the YouTube dataset and still lost on
| multimodal AI
| richardw wrote:
| Apple and Google, you need to get your personal agent game going
| because right now you're losing the market. This is FREE.
|
| Tweakable emotion and voice, watching the scene, cracking jokes.
| It's not perfect but the amount and types of data this will
| collect will be massive. I can see it opening up access to many
| more users and use cases.
|
| Very close to:
|
| - A constant friend
|
| - A shrink
|
| - A teacher
|
| - A coach who can watch you exercise and offer feedback
|
| ...all infinitely patient, positive, helpful. For kids that get
| bullied, or whose parents can't afford therapy or a coach,
| there's the potential for a base level of support that will only
| get better over time.
| imiric wrote:
| > It's not perfect but the amount and types of data this will
| collect will be massive.
|
| This is particularly concerning. Sharing deeply personal
| thoughts with the corporations running these models will be
| normalized, just as sharing email data, photos, documents,
| etc., is today. Some of these companies profit directly from
| personal data, and when it comes to adtech, we can be sure that
| they will exploit this in the most nefarious ways imaginable. I
| have no doubt that models run by adtech companies will
| eventually casually slip ads into conversations, based on the
| exact situation and feelings of the person. Even non-adtech
| companies won't be able to resist cashing in the bottomless
| gold mine of data they'll be collecting.
|
| I can picture marketers just salivating at the prospect of
| getting access to this data, and being able to microtarget on
| an individual basis at exactly the right moment, pretty much
| guaranteeing a sale. Considering AI agents will gain a personal
| trust and bond that humans have never experienced with machines
| before, we will be extra vulnerable to even the slightest
| mention of a product, in a similar way as we can be easily
| influenced by a close friend or partner. Except that that
| "friend" is controlled by a trillion dollar adtech corporation.
|
| I would advise anyone to not be enticed by the shiny new tech,
| and wait until this can be self-hosted and run entirely
| offline. It's imperative that personal data remains private,
| now more than ever before.
| tgtweak wrote:
| it really feels like the quality of gpt4's responses got
| progressively worse as the year went on... seems like it is
| giving political answers now vs actually giving an earnest
| response. It also feels like the responses are lazier than they
| used to be at the outset of gpt4's release.
|
| I am not saying this is what they're doing, but it DOES feel like
| they are hindering the previous model to make the new one stand
| out that much more. The multi-modal improvements and release here
| are certainly impressive, but I can't help but feel like the
| subjective quality of gpt4 has dipped.
|
| Hopefully this signals that gpt5 is not far off and should stand
| out significantly from the crowd.
| XCSme wrote:
| I assume there's no reason to use GPT-4-turbo for API calls, as
| this one is supposedly better and 2x cheaper.
| jcmeyrignac wrote:
| Sorry to nitpick, but in the language tokenisation part, the
| French example is incorrect. The exclamation mark is surrounded
| by spaces in French: "c'est un plaisir de vous rencontrer!"
| should be "c'est un plaisir de vous rencontrer !"
| jessenaser wrote:
| The crazy part is GPT-4o is faster than GPT-3.5 Turbo now, so we
| can see a future where GPT-5 is the flagship and GPT-4o is the
| fast cheap alternative. If GPT-4o is this smart and expressive
| now with voice, imagine what GPT-5 level reasoning could do!
| system2 wrote:
| Realtime videos? Probably their internal tools. I am testing the
| gpt4o right now and the responses come in 6-10 seconds. Same
| experience as the gpt4 text. What's up with the realtime claims?!
| cal85 wrote:
| We've had voice input and voice output with computers for a long
| time, but it's never felt like spoken conversation. At best it's
| a series of separate voice notes. It feels more like texting than
| talking.
|
| These demos show people talking to artificial intelligence. This
| is new. Humans are more partial to talking than writing. When
| people talk to each other (in person or over low-latency audio)
| there's a rich metadata channel of tone and timing, subtext,
| inexplicit knowledge. These videos seem to show the AI using this
| kind of metadata, in both input and output, and the conversation
| even flows reasonably well at times. I think this changes things
| a lot.
| lobochrome wrote:
| I don't know. Have you even seen a gen z?
| cal85 wrote:
| I don't follow, what about them?
| ttyprintk wrote:
| Something like this:
|
| https://www.theonion.com/brain-dead-teen-only-capable-of-
| rol...
| perfmode wrote:
| Is that conversational UI live?
| cdeutsch wrote:
| Creepy AF
| titzer wrote:
| Can't wait for this AI voice assistant to tell me in a sultry
| voice how I should stay in an AirBnB about 12 times a day.
| jimkleiber wrote:
| I worry that this tech will amplify the cultural values we have
| of "good" and "bad" emotions way more than the default
| restrictions that social media platforms put on the emoji
| reactions (e.g., can't be angry on LinkedIn).
|
| I worry that the AI will not express anger, not express sadness,
| not express frustration, not express uncertainty, and many other
| emotions that the culture of the fine-tuners might believe are
| "bad" emotions and that we may express a more and more narrow
| range of emotions going forward.
|
| Almost like it might become an AI "yes man."
| Quarrelsome wrote:
| Imagine how warped your personality might become if you use
| this as an entire substitute for human interaction. Should
| people use this as bf/gf material we might just be further
| contributing to decreasing the fertility rate.
|
| However we might offset this by reducing the suicide rate
| somewhat too.
| jimkleiber wrote:
| I've worked in emotional communication and conflict
| resolution for over 10 years and I'm honestly just feeling a
| huge swirl of uncertainty on how this--LLMs in general, but
| especially the genAI voices, videos, and even robots--will
| impact how we communicate with each other and how we bond
| with each other. Does bonding with an AI help us bond more
| with other humans? Will it help us introspect more and dig
| deeper into our common humanity? Will we learn how to resolve
| conflict better? Will we learn more passive aggression?
| Become more or less suicidal? More or less loving?
|
| I just, yeah, feel a lot of fear of even thinking about it.
| launchoverittt wrote:
| Created my first HN account just to reply to this. I've had
| these same (very strong) concerns since ChatGPT launched,
| but haven't seen much discussion about it. Do you know of
| any articles/talks/etc. that get into this at all?
| IAmNotACellist wrote:
| Corporate safe AI will just be bland, verbose, milquetoast
| experiences like OpenAI's. Humans want human experiences and
| thus competition will have a big opportunity to provide it. We
| treat lack of drama like a bug, and get resentful when coddled
| and talked down to like we're toddlers.
| JSDevOps wrote:
| Google must be shitting it right now.
| joak wrote:
| Voice input makes sense; speaking is a lot faster than typing.
| But I prefer my output as text; reading is a lot faster than
| listening to text read out loud.
|
| I'm not sure that computers mimicking humans makes sense. You
| want your computer to be the best possible, better than humans
| when possible. Written output is clearly superior, and faking
| emotions does not add much in most contexts.
| kulor wrote:
| The biggest wow factor was the effect of reducing latency
| followed in a close second by the friendly human personality.
| There's an uncanny valley barrier but this feels like a short-
| term teething problem.
| sftombu wrote:
| GPT-4o's breakthrough memory -- https://nian.llmonpy.ai/
| AI_beffr wrote:
| i absolutely hate this. we are going to destroy society with this
| technology. we cant continue to enjoy the benefits of human
| society if humans are replaced by machines. i hate seeing these
| disgusting people smugly parade this technology. it makes me so
| angry that they are destroying human society and all i can do is
| sit here and watch.
| simianparrot wrote:
| I know exactly what you mean. I just hope people get bored of
| this waste of time and energy --- both personal and actual
| energy --- before it goes too far.
| jonplackett wrote:
| This video is brilliantly accidentally hilarious. They made an AI
| girlfriend that hangs on your every word and thinks everything
| you say is genius and hilarious.
| pamelafox wrote:
| I just tested out using GPT-4o instead of gpt-4-turbo for a RAG
| solution that can reason on images. It works, with some changes
| to our token-counting logic to account for new model/encoding
| (update to latest tiktoken!).
|
| I ran some speed tests for a particular question/seed. Here are
| the times to first token:
|
| gpt-4-turbo:
|
| * avg 3.69
|
| * min 2.96
|
| * max 4.91
|
| gpt-4o:
|
| * avg 2.80
|
| * min 2.28
|
| * max 3.39
|
| That's for the messages in this gist:
| https://gist.githubusercontent.com/pamelafox/dc14b2188aaa38a...
|
| Quality seems good as well. It'll be great to have better multi-
| modal RAG!
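| For reference, the token-counting change is just that gpt-4o uses
| a new tokenizer (o200k_base) instead of the cl100k_base used by
| gpt-4-turbo, so older tiktoken releases don't recognise the model
| name. A minimal check, assuming a tiktoken version recent enough
| to know about gpt-4o:
|
|     import tiktoken
|
|     new = tiktoken.encoding_for_model("gpt-4o")       # o200k_base
|     old = tiktoken.encoding_for_model("gpt-4-turbo")  # cl100k_base
|
|     text = "It'll be great to have better multi-modal RAG!"
|     print(new.name, len(new.encode(text)))
|     print(old.name, len(old.encode(text)))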
| teleforce wrote:
| Nobody in the comments seems to notice or care about GPT-4o's new
| additional capability of performing searches based on RAG. As far
| as I am concerned, this is the most important feature that people
| have been waiting for in ChatGPT-4, especially if you are doing
| research. Just by testing on one particular topic that I'm
| familiar with, using GPT-4 previously and GPT-4o now, the quality
| of the resulting responses from the latter is very promising
| indeed.
| oersted wrote:
| Can you be more specific? I can't find this in the
| announcement. How does this work? What example did you try?
|
| EDIT: web search does seem extremely fast.
| teleforce wrote:
| I just asked ChatGPT-4o what's new compared to GPT-4, and it
| mentioned search as one of the latest features based on RAG.
|
| Then I asked it to explain RPW wireless system, and the
| answers are much better than with ChatGPT-4.
| nilsherzig wrote:
| Imagine having to interact with this thing in an environment
| where it is in the power position.
|
| Being in a prison with this voice as your guard seems like a
| horrible way to lose your sanity. This aggressive friendliness
| combined with no real emotions seems like a very easy way to
| break people.
|
| There are these stories about Nazis working at concentration
| camps having to drink an insane amount of alcohol to keep
| themselves going (not trying to excuse their actions). This thing
| would just do it, while being friendly at the same time. The
| amount of hopelessness someone would experience if they happened
| to be in the custody of a system like this is truly horrific.
| Capricorn2481 wrote:
| I'm surprised they're limiting this API. Haven't they still not
| even opened up the image API in GPT-4 Turbo?
| zedin27 wrote:
| I am not fluent in Arabic at all, and being able to use this as a
| tool to have a conversation will make me more dependent on it. We
| are approaching a new era where we will not be "independently"
| learning a language, but ignoring the need to learn it
| beforehand. A double-edged sword.
| xyc wrote:
| Seems that no client-side changes are needed for gpt-4o chat
| completions.
|
| Added a custom OpenAI endpoint to https://recurse.chat (i built
| it) and it just works:
| https://twitter.com/recursechat/status/1790074433610137995
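| For anyone curious, a minimal sketch with the official Python
| client (the only assumption is that your existing chat-completion
| code already uses it): swap the model string and nothing else
| changes.
|
|     from openai import OpenAI
|
|     client = OpenAI()  # same client, same endpoint as before
|
|     resp = client.chat.completions.create(
|         model="gpt-4o",  # previously e.g. "gpt-4-turbo"
|         messages=[
|             {"role": "user", "content": "Say hi in five words."}
|         ],
|     )
|     print(resp.choices[0].message.content)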
| swyx wrote:
| but does it do the full multimodal in-out capability shown in
| the app :)
| xyc wrote:
| will see :) heard video capability is rolling out later
| xyc wrote:
| api access is text/vision for now
| https://x.com/mpopv/status/1790073021765505244
| awfulneutral wrote:
| In the customer support example, he tells it his new phone
| doesn't work, and then it just starts making stuff up like how
| the phone was delivered 2 days ago, and there's physically
| nothing wrong with it, which it doesn't actually know. It's a
| very impressive tech demo, but it is a bit like they are
| pretending we have AGI when we really don't yet.
|
| (Also, they managed to make it sound exactly like an insincere,
| rambling morning talk show host - I assume this is a solvable
| problem though.)
| jschwartz11 wrote:
| It's possible to imagine using ChatGPT's memory, or even just
| giving the context in an initial brain dump that would allow
| for this type of call. So don't feel like it's too far off.
| awfulneutral wrote:
| That's true, but if it isn't able to be honest when it
| doesn't know something, or to ask for clarification, then I
| don't see how it's workable.
| Alifatisk wrote:
| I thought they would release a competitor to perplexity? Was this
| it?
| sarreph wrote:
| The level at which the hosts interrupted the voice assistant
| today worries me that we're about to instil that as normal
| behaviour for future generations.
___________________________________________________________________
(page generated 2024-05-13 23:00 UTC)