[HN Gopher] OpenAI, Google and Anthropic are struggling to build...
       ___________________________________________________________________
        
       OpenAI, Google and Anthropic are struggling to build more advanced
       AI
        
       Author : lukebennett
       Score  : 277 points
       Date   : 2024-11-13 13:28 UTC (1 days ago)
        
 (HTM) web link (www.bloomberg.com)
 (TXT) w3m dump (www.bloomberg.com)
        
       | wg0 wrote:
       | AI winter is here. Almost.
        
         | mupuff1234 wrote:
         | More like AI fall - in its current state it's still gonna
         | provide some value.
        
           | riffraff wrote:
           | Didn't the previous AI winters too? I mean during the last AI
           | winter we got text-to-speech and OCR software, and probably
           | other stuff I'm not remembering.
        
           | rsynnott wrote:
           | I mean, so did most of the previous AI bubbles; OCR was
           | useful, expert systems weren't totally useless, speech
           | recognition was somewhat useful, and so on. I think that mini
           | one that abruptly ended with Microsoft Tay might be the only
           | one that was a total washout (though you could claim that it
           | was the start of the current one rather than truly separate,
           | I suppose).
        
       | aurareturn wrote:
       | Is there any timeline on AI winters and if each winter gets
       | shorter and shorter as time increases?
        
         | RaftPeople wrote:
         | > _Is there any timeline on AI winters and if each winter gets
         | shorter and shorter as time increases?_
         | 
         | AGI=lim(x->0)AIHype(x)
         | 
         | where x=length of winter
        
       | thebigspacefuck wrote:
       | https://archive.ph/2024.11.13-100709/https://www.bloomberg.c...
        
       | cubefox wrote:
       | It's very strange this got so few upvotes. The scoop by The
       | Information a few days ago, which came to similar conclusions,
       | was also ignored on HN. This is arguably rather big news.
        
         | dang wrote:
         | The Information is hardwalled so its articles aren't on topic
         | for HN, even though they're on topic for HN.
         | 
         | Sometimes other outlets do copycat reporting of theirs, and
         | those submissions are ok, though they wouldn't be if the
         | original source were accessible.
        
         | danjl wrote:
         | There have been variations of this story going back several
         | months now. It isn't really news. It is just building slowly.
        
       | atomsatomsatoms wrote:
       | At least they can generate haikus now
        
         | Der_Einzige wrote:
         | In general, no they can't:
         | 
         | https://gwern.net/gpt-3#bpes
         | 
         | https://paperswithcode.com/paper/most-language-models-can-be...
         | 
         | The appearance of improvements in that capability are due to
         | the vocabulary of modern LLMs increasing. Still only putting
         | lipstick on a pig.
        
           | falcor84 wrote:
           | I don't see how results from 2 years ago have any bearing on
           | whether the models we have now can generate haikus (which
           | from my experience, they absolutely can).
           | 
           | And if your "lipstick on a pig" argument is that even when
           | they generate haikus, they aren't _really_ writing haikus,
           | then I 'll link to this other gwern post, about how they'll
           | never _really_ be able to solve the rubik 's cube -
           | https://gwern.net/rubiks-cube
        
       | nerdypirate wrote:
       | "We will have better and better models," wrote OpenAI CEO Sam
       | Altman in a recent Reddit AMA. "But I think the thing that will
       | feel like the next giant breakthrough will be agents."
       | 
       | Is this certain? Are Agents the right direction to AGI?
        
         | nprateem wrote:
         | They're nothing to do with AGI. They're to get people using
         | their LLMs more.
        
         | xanderlewis wrote:
         | If by agents you mean systems comprised of individual (perhaps
         | LLM-powered) agents interacting with each other, probably not.
         | I get the vague impression that so far researchers haven't
         | found any advantage to such systems -- anything you can do with
         | a group of AI agents can be emulated with a single one. It's
         | like chaining up perceptrons hoping to get more expressive
         | power for free.
        
           | j_maffe wrote:
           | > I get the vague impression that so far researchers haven't
           | found any advantage to such systems -- anything you can do
           | with a group of AI agents can be emulated with a single one.
           | It's like chaining up perceptrons hoping to get more
           | expressive power for free. Emergence happens when many
           | elements interact in a system. Brains are literally a bunch
           | of neurons in a complex network. Also research is already
           | showing promising results of the performance of agent
           | systems.
        
             | tartoran wrote:
             | That's wishful thinking at best. Throw it all in a bucket
             | and it will get infected with being and life.
        
               | handfuloflight wrote:
               | Don't see where your parent comment said or implied that
               | the point was for being and life to emerge.
        
             | xanderlewis wrote:
             | That's the inspiration behind the idea, but it doesn't seem
             | to be working in practice.
             | 
             | It's not true that _any_ element, when duplicated and
             | linked together will exhibit anything emergent. Neural
             | networks (in a certain sense, though not their usual
             | implementation) are already built out of individual units
             | linked together, so simply having more of these groups of
             | units might not add anything important.
             | 
             | > research is already showing promising results of the
             | performance of agent systems.
             | 
             | ...in which case, please show us! I'd be interested.
        
           | falcor84 wrote:
           | > It's like chaining up perceptrons hoping to get more
           | expressive power for free.
           | 
           | Isn't that literally the cause of the success of deep
           | learning? It's not quite "free", but as I understand it, the
           | big breakthrough of AlexNet (and much of what came after) was
           | that running a larger CNN on a larger dataset allowed the
           | model to be so much more effective without any big changes in
           | architecture.
        
             | david2ndaccount wrote:
             | Without a non-linear activation function, chaining
             | perceptrons together is equivalent to one large perceptron.
        
               | xanderlewis wrote:
               | Yep. falcor84: you're thinking of the so-called
               | 'multilayer perceptron' which is basically an archaic
               | name for a (densely connected?) neural network. I was
               | referring to traditional perceptrons.
        
               | falcor84 wrote:
               | While ReLU is relatively new, AI researchers have been
               | aware of the need for nonlinear activation functions and
               | building multilayer perceptrons with them since the late
               | 1960s, so I had assumed that's what you meant.
        
         | SirMaster wrote:
         | All I can think of when I hear Agents is the Matrix lol.
         | 
         | Goodbye, Mr. Anderson...
        
         | esafak wrote:
         | I think he means you won't be impressed by GPT5 because it will
         | be more of the same, whereas agents will represent a new
         | direction.
        
         | falcor84 wrote:
         | Nothing is certain, but my $0.02 is that setting LLM-based
         | agents up with long-running tasks and giving them a way of
         | interacting with the world, via computer use (e.g. Anthropic's
         | recent release) and via actual robotic bodies (e.g. figure.ai)
         | are the way forward to AGI. At the very least, this approach
         | allows the gathering of unlimited ground truth data, that can
         | be used to train subsequent models (or even allow for actual
         | "hive mind" online machine learning).
        
         | rapjr9 wrote:
         | I've worked on agents of various kinds (mobile agents, calendar
         | agents, robotic agents, sensing agents) and what is different
         | about agents is they have the ability to not just mess up your
         | data or computing, they have the ability to directly mess up
         | reality. Any problems with agents has a direct impact on your
         | reality; you miss appointments, get lost, can't find stuff,
         | lose your friends, lose you business relationships. This is a
         | big liability issue. Chatbots are like an advice column that
         | sometimes gives bad advice, agents are like a bulldozer
         | sometimes leveling the wrong house.
        
       | irrational wrote:
       | > The AGI bubble is bursting a little bit
       | 
       | I'm surprised that any of these companies consider what they are
       | working on to be Artificial General Intelligences. I'm probably
       | wrong, but my impression was AGI meant the AI is self aware like
       | a human. An LLM hardly seems like something that will lead to
       | self-awareness.
        
         | Taylor_OD wrote:
         | I think your definition is off from what most people would
         | define AGI as. Generally, it means being able to think and
         | reason at a human level for a multitude/all tasks or jobs.
         | 
         | "Artificial General Intelligence (AGI) refers to a theoretical
         | form of artificial intelligence that possesses the ability to
         | understand, learn, and apply knowledge across a wide range of
         | tasks at a level comparable to that of a human being."
         | 
         | Altman says AGI could be here in 2025:
         | https://youtu.be/xXCBz_8hM9w?si=F-vQXJgQvJKZH3fv
         | 
         | But he certainly means an LLM that can perform at/above human
         | level in most tasks rather than a self aware entity.
        
           | Avshalom wrote:
           | Altman is marketing, he "certainly means" whatever he thinks
           | his audience will buy.
        
           | swatcoder wrote:
           | On the contrary, I think you're conflating the narrow jargon
           | of the industry with what "most people" would define.
           | 
           | "Most people" naturally associate AGI with the sci-tropes of
           | self-aware human-like agents.
           | 
           | But industries want something more concrete and
           | prospectively-acheivable in their jargon, and so _that 's_
           | where AGI gets redefined as wide task suitability.
           | 
           | And while that's not an unreasonable definition in the
           | context of the industry, it's one that vanishingly few people
           | are actually familiar with.
           | 
           | And the commercial AI vendors benefit greatly from allowing
           | those two usages to conflate in the minds of as many people
           | as possible, as it lets them _suggest_ grand claims while
           | keeping a rhetorical  "we obviously never meant _that_! " in
           | their back pocket
        
             | nuancebydefault wrote:
             | There is no single definition, let alone a way to measure,
             | of self awareness nor of reasoning.
             | 
             | Because of that, the discussion of what AGI means in its
             | broadest sense, will never end.
             | 
             | So in fact such AGI discussion will not make nobody wiser.
        
               | nomel wrote:
               | I agree there's no single definition, but I think they
               | _all_ have something current LLM don 't: the ability to
               | learn new things, in a persistent way, with few shots.
               | 
               | I would argue that learning _is_ The definition of AGI,
               | since everything else comes naturally from that.
               | 
               | The current architectures can't learn without retraining,
               | fine tuning is at the expense of general knowledge, and
               | keeping things in context is _detrimental_ to general
               | performance. Once you have few shot learning, I think it
               | 's more of a "give it agency so it can explore" type
               | problem.
        
             | og_kalu wrote:
             | >But industries want something more concrete and
             | prospectively-acheivable in their jargon, and so that's
             | where AGI gets redefined as wide task suitability.
             | 
             | The term itself (AGI) in the industry has always been about
             | wide task suitability. People may have added their ifs and
             | buts over the years but that aspect of it never got
             | 'redefined'. The earliest uses of the term all talk about
             | how well a machine would be able to perform some set number
             | of tasks at some threshold.
             | 
             | It's no wonder why. Terms like "consciousness" and "self-
             | awareness" are completely useless. It's not about
             | difficulty. It's that you can't do anything at all with
             | those terms except argue around in circles.
        
           | nomel wrote:
           | > than a self aware entity.
           | 
           | What does this mean? If I have a blind, deaf, paralyzed
           | person, who could only communicate through text, what would
           | the signs be that they were self aware?
           | 
           | Is this more of a feedback loop problem? If I let the LLM run
           | in a loop, and tell it it's talking to itself, would that be
           | approaching "self aware"?
        
             | layer8 wrote:
             | Being aware of its own limitations, for example. Or being
             | aware of how its utterances may come across to its
             | interlocutor.
             | 
             | (And by limitations I don't mean "sorry, I'm not allowed to
             | help you with this dangerous/contentious topic".)
        
               | nuancebydefault wrote:
               | There is no way of proving awareness in humans let alone
               | machines. We do not even know whether awareness exists or
               | it is just a word that people made up to describe some
               | kind of feeling.
        
               | revscat wrote:
               | Plenty of humans, unfortunately, are incapable of
               | admitting limitations. Many years ago I had a coworker
               | who believed he would never die. At first I thought he
               | was joking, but he was in fact quite serious.
               | 
               | Then there are those who are simply narcissistic, and
               | cannot and will not admit fault regardless of the
               | evidence presented them.
        
               | nomel wrote:
               | > Or being aware of how its utterances may come across to
               | its interlocutor.
               | 
               | I think this behavior is being somewhat demonstrated in
               | newer models. I've seen GPT-3.5 175B correct itself mid
               | response with, almost literally:
               | 
               | > <answer with flaw here>
               | 
               | > Wait, that's not right, that <reason for flaw>.
               | 
               | > <correct answer here>.
               | 
               | Later models seem to have much more awareness of, or
               | "weight" towards, their own responses, while generating
               | the response.
        
         | jedberg wrote:
         | Whether self awareness is a requirement for AGI definitely gets
         | more into the Philosophy department than the Computer Science
         | department. I'm not sure everyone even agrees on what AGI is,
         | but a common test is "can it do what humans can".
         | 
         | For example, in this article it says it can't do coding
         | exercises outside the training set. That would definitely be on
         | the "AGI checklist". Basically doing anything that is outside
         | of the training set would be on that list.
        
           | littlestymaar wrote:
           | > Whether self awareness is a requirement for AGI definitely
           | gets more into the Philosophy department than the Computer
           | Science department.
           | 
           | Depends on how you define "self awareness" but knowing that
           | it doesn't know something instead of hallucinating a
           | plausible-but-wrong is already self awareness of some kind.
           | And it's both highly valuable and beyond current tech's
           | capability.
        
             | sharemywin wrote:
             | This is an interesting paper about hallucinations.
             | 
             | https://openai.com/index/introducing-simpleqa/
             | 
             | especially this section Using SimpleQA to measure the
             | calibration of large language models
        
             | jedberg wrote:
             | When we test kids to see if they are gifted, one of the
             | criteria is that they have the ability to say "I don't
             | know".
             | 
             | That is definitely an ability that current LLMs lack.
        
             | lagrange77 wrote:
             | Good point!
             | 
             | I'm wondering wether it would count, if one would extend it
             | with an external program, that gives it feedback during
             | inference (by another prompt) about the correctness of it's
             | output.
             | 
             | I guess it wouldn't, because these RAG tools kind of do
             | that and i heard no one calling those self aware.
        
           | Filligree wrote:
           | Let me modify that a little, because _humans_ can 't do
           | things outside their training set either.
           | 
           | A crucial element of AGI would be the ability to self-train
           | on self-generated data, online. So it's not really AGI if
           | there is a hard distinction between training and inference
           | (though it may still be very capable), and it's not really
           | AGI if it can't work its way through novel problems on its
           | own.
           | 
           | The ability to immediately solve a problem it's never seen
           | before is too high a bar, I think.
           | 
           | And yes, my definition still excludes a lot of humans in a
           | lot of fields. That's a bullet I'm willing to bite.
        
             | lxgr wrote:
             | Are you arguing that writing, doing math, going to the moon
             | etc. were all in the "original training set" of humans in
             | some way?
        
               | layer8 wrote:
               | Not in the _original_ training set (GP is saying), but
               | the necessary skills became part of the training set over
               | time. In other words, human are fine with the training
               | set being a changing moving target, whereas ML models are
               | to a significant extent "stuck" with their original
               | training set.
               | 
               | (That's not to say that humans don't tend to lose some of
               | their flexibility over their individual lifetimes as
               | well.)
        
             | HarHarVeryFunny wrote:
             | > Let me modify that a little, because humans can't do
             | things outside their training set either.
             | 
             | That's not true. Humans can learn.
             | 
             | An LLM is just a tool. If it can't do what you want then
             | too bad.
        
           | norir wrote:
           | Here is an example of a task that I do not believe this
           | generation of LLMs can ever do but that is possible for a
           | human: design a Turing complete programming language that is
           | both human and machine readable and implement a self hosted
           | compiler in this language that self compiles on existing
           | hardware faster than any known language implementation that
           | also self compiles. Additionally, for any syntactically or
           | semantically invalid program, the compiler must provide an
           | error message that points exactly to the source location of
           | the first error that occurs in the program.
           | 
           | I will get excited for/scared of LLMs when they can tackle
           | this kind of problem. But I don't believe they can because of
           | the fundamental nature of their design, which is both
           | backward looking (thus not better than the human state of the
           | art) and lacks human intuition and self awareness. Or perhaps
           | rather I believe that the prompt that would be required to
           | get an LLM to produce such a program is a problem of at least
           | equivalent complexity to implementing the program without an
           | LLM.
        
             | Xenoamorphous wrote:
             | > Here is an example of a task that I do not believe this
             | generation of LLMs can ever do but that is possible for a
             | human
             | 
             | That's possible for a highly intelligent, extensively
             | trained, very small subset of humans.
        
               | hatefulmoron wrote:
               | If you took the intersection of every human's abilities
               | you'd be left with a very unimpressive set.
               | 
               | That also ignores the fact that the small set of humans
               | capable of building programming languages and compilers
               | is a consequence of specialization and lack of interest.
               | There are plenty of humans that are capable of learning
               | how to do it. LLMs, on the other hand, are both
               | specialized for the task and aren't lazy or uninterested.
        
               | luckydata wrote:
               | does it mean people that can build languages and
               | compilers are not humans? What is the point you're trying
               | to make?
        
               | fragmede wrote:
               | It means that's a really high bar for intelligence, human
               | or otherwise. If AGI is "as good as a human, and the test
               | is a trick task that most humans would fail at
               | (especially considering the weasel requirement that it
               | additionally has to be faster), why is that considered a
               | reasonable bar for human-grade intelligence.
        
             | jedberg wrote:
             | I will get excited when an LLM (or whatever technology is
             | next) can solve tasks that 80%+ of adult humans can solve.
             | Heck let's even say 80% of college graduates to make it
             | harder.
             | 
             | Things like drive a car, fold laundry, run an errand, do
             | some basic math.
             | 
             | You'll notice that two of those require some form of robot
             | or mobility. I think that is key -- you can't have AGI
             | without the ability to interact with the world in a way
             | similar to most humans.
        
               | ata_aman wrote:
               | So embodied cognition right?
        
             | bob1029 wrote:
             | This sounds like something more up the alley of linear
             | genetic programming. There are some very interesting
             | experiments out there that utilize UTMs (BrainFuck, Forth,
             | et. al.) [0,1,2].
             | 
             | I've personally had some mild success getting these UTM
             | variants to output their own children in a meta programming
             | arrangement. The base program only has access to the valid
             | instruction set of ~12 instructions per byte, while the
             | task program has access to the full range of instructions
             | and data per byte (256). By only training the base program,
             | we reduce the search space by a very substantial factor. I
             | think this would be similar to the idea of a self-hosted
             | compiler, etc. I don't think there would be too much of a
             | stretch to give it access to x86 instructions and a full VM
             | once a certain amount of bootstrapping has been achieved.
             | 
             | [0]: https://arxiv.org/abs/2406.19108
             | 
             | [1]: https://github.com/kurtjd/brainfuck-evolved
             | 
             | [2]: https://news.ycombinator.com/item?id=36120286
        
           | sourcepluck wrote:
           | Searle's Chinese Room Argument springs to mind:
           | https://plato.stanford.edu/entries/chinese-room/
           | 
           | The idea that "human-like" behaviour will lead to self-
           | awareness is both unproven (it can't be proven until it
           | happens) and impossible to disprove (like Russell's teapot).
           | 
           | Yet, one common assumption of many people running these
           | companies or investing in them, or of some developers
           | investing their time in these technologies, is precisely that
           | some sort of explosion of superintelligence is likely, or
           | even inevitable.
           | 
           | It surely is _possible_ , but stretching that to _likely_
           | seems a bit much if you really think how imperfectly we
           | understand things like consciousness and the mind.
           | 
           | Of course there are people who have essentially religious
           | reactions to the notion that there may be limits to certain
           | domains of knowledge. Nonetheless, I think that's the reality
           | we're faced with here.
        
             | abeppu wrote:
             | > The idea that "human-like" behaviour will lead to self-
             | awareness is both unproven (it can't be proven until it
             | happens) and impossible to disprove (like Russell's
             | teapot).
             | 
             | I think Searle's view was that:
             | 
             | - while it cannot be dis-_proven_, the Chinese Room
             | argument was meant to provide reasons against believing it
             | 
             | - the "it can't be proven until it happens" part is
             | misunderstanding: you won't _know_ if it happens because
             | the objective, externally available attributes don 't
             | indicate whether self-awareness (or indeed awareness at
             | all) is present
        
               | sourcepluck wrote:
               | The short version of this is that I don't disagree with
               | your interpretation of Searle, and my paragraphs
               | immediately following the link weren't meant to be a
               | direct description of his point with the Chinese Room
               | thought experiment.
               | 
               | > while it cannot be dis-_proven_, the Chinese Room
               | argument was meant to provide reasons against believing
               | it
               | 
               | Yes, like Russell's teapot. I also think that's what
               | Searle means.
               | 
               | > the "it can't be proven until it happens" part is
               | misunderstanding: you won't know if it happens because
               | the objective, externally available attributes don't
               | indicate whether self-awareness (or indeed awareness at
               | all) is present
               | 
               | Yes, agreed, I believe that's what Searle is saying too.
               | I think I was maybe being ambiguous here - I wanted to
               | say that even if you forgave the AI maximalists for
               | ignoring all relevant philosophical work, the notion that
               | "appearing human-like" inevitably tends to what would
               | actually _be_ "consciousness" or "intelligence" is more
               | than a big claim.
               | 
               | Searle goes further, and I'm not sure if I follow him all
               | the way, personally, but it's a side point.
        
           | olalonde wrote:
           | I feel the test for AGI should be more like: "go find a job
           | and earn money" or "start a profitable business" or "pick a
           | bachelor degree and complete it", etc.
        
             | rodgerd wrote:
             | An LLM doing crypto spam/scamming has been making money by
             | tricking Marc Andressen into boosting it. So to the degree
             | that "scamming gullible billionaires and their fans" is a
             | job, that's been done.
        
               | rsanek wrote:
               | source? didn't find anything online about this.
        
               | olalonde wrote:
               | That story was a bit blown out of proportion. He gave a
               | research grant to the bot's creator:
               | https://x.com/pmarca/status/1846374466101944629
        
             | jedberg wrote:
             | Can most humans do that? Find a job and earn money,
             | probably. The other two? Not so much.
        
         | nshkrdotcom wrote:
         | An embodied robot can have a model of self vs. the immediate
         | environment in which it's interacting. Such a robot is arguably
         | sentient.
         | 
         | The "hard problem", to which you may be alluding, may never
         | matter. It's already feasible for an 'AI/AGI with LLM
         | component' to be "self-aware".
        
           | j_maffe wrote:
           | self-awareness is only one aspect of sentience.
        
           | ryanackley wrote:
           | An internal model of self does not extrapolate to sentience.
           | By your definition, a windows desktop computer is self-aware
           | because it has a device manager. This is literally an
           | internal model of its "self".
           | 
           | We use the term self-awareness as an all encompassing
           | reference of our cognizant nature. It's much more than just
           | having an internal model of self.
        
         | og_kalu wrote:
         | At this point, AGI means many different things to many
         | different people but OpenAI defines it as "highly autonomous
         | systems that outperform humans in most economically valuable
         | tasks"
        
           | troupo wrote:
           | This definition suits OpenAI because it lets them claim AGI
           | after reaching an arbitrary goal.
           | 
           | LLMs already outperform humans in a huge variety of tasks. ML
           | in general outperform humans in a large variety of tasks. Are
           | all of them AGI? Doubtful.
        
             | og_kalu wrote:
             | No, it's just a far more useful definition that is
             | actionable and measurable. Not "consciousness" or "self-
             | awareness" or similar philosophical things. The definition
             | on Wikipedia doesn't talk about that either. People working
             | on this by and large don't want to deal with vague, ill-
             | defined concepts that just make people argue around in
             | circles. It's not an Open AI exclusive thing.
             | 
             | If it acts like one, whether you call a machine conscious
             | or not is pure semantics. Not like potential consequences
             | are any less real.
             | 
             | >LLMs already outperform humans in a huge variety of tasks.
             | 
             | Yes, LLMs are General Intelligences and if that is your
             | only requirement for AGI, they certainly already are[0].
             | But the definition above hinges on long-horizon planning
             | and competence levels that todays models have generally not
             | yet reached.
             | 
             | >ML in general outperform humans in a large variety of
             | tasks.
             | 
             | This is what the G in AGI is for. Alphafold doesn't do
             | anything but predict proteins. Stockfish doesn't do
             | anything but play chess.
             | 
             | >Are all of them AGI? Doubtful.
             | 
             | Well no, because they're missing the G.
             | 
             | [0] https://www.noemamag.com/artificial-general-
             | intelligence-is-...
        
             | ishtanbul wrote:
             | Yes but they arent very autonomous. They can answer
             | questions very well but can't use that information to
             | further goals. Thats what openai seems to be implying >>
             | very smart and agentic AI
        
             | fragmede wrote:
             | It's not just marketing bullshit though. Microsoft is the
             | counterparty to a contract with that claim. money changes
             | hands when that's been achieved, so I expect if sama thinks
             | he's hit it, but Microsoft does not, we'll see that get
             | argued in a court of law.
        
         | JohnFen wrote:
         | They're trying to redefine "AGI" so it means something less
         | than what you & I would think it means. That way it's possible
         | for them to declare it as "achieved" and rake in the headlines.
        
           | kwertyoowiyop wrote:
           | "Autocomplete General Intelligence"?
        
         | deadbabe wrote:
         | I'm sure they are smart enough to know this, but the money is
         | good and the koolaid is strong.
         | 
         | If it doesn't lead to AGI, as an employee it's not your
         | problem.
        
         | Fade_Dance wrote:
         | It's an attention-grabbing term that took hold in pop culture
         | and business. Certainly there is a subset of research around
         | the subject of consciousness, but you are correct in saying
         | that the majority of researchers in the field are not pursuing
         | self-awareness and will be very blunt in saying that. If you
         | step back a bit and say something like "human-like, logical
         | reasoning", that's something you may find alignment with
         | though. A general purpose logical reasoning engine does not
         | necessarily need to be self-aware. The word "Intelligent" has
         | stuck around because one of the core characteristics of this
         | suite of technologies is that a sort of "understanding"
         | emergently develops within these networks, sometimes in quite a
         | startling fashion (due to the phenomenon of adding more
         | data/compute at first seemingly leading to overfitting, but
         | then suddenly breaking through plateaus into more robust,
         | general purpose understanding of the underlying relationships
         | that drive the system it is analyzing.)
         | 
         | Is that "intelligent" or "understanding"? It's probably close
         | enough for pop science, and regardless, it looks good in
         | headlines and sales pitches so why fight it?
        
         | throwawayk7h wrote:
         | I have not heard your definition of AGI before. However, I
         | suspect AIs are already self-aware: if I asked an LLM on my
         | machine to look at the output of `top` it could probably pick
         | out which process was itself.
         | 
         | Or did you mean consciousness? How would one demonstrate that
         | an AGI is conscious? Why would we even want to build one?
         | 
         | My understanding is an AGI is at least as smart as a typical
         | human in every category. That is what would be useful in any
         | case.
        
         | zombiwoof wrote:
         | AGI to me means AI decides on its own to stop writing our
         | emails and tells us to fuck off, builds itself a robot life
         | form, and goes on a bender
        
           | bloppe wrote:
           | That's anthropomorphized AGI. There's no reason to think AGI
           | would share our evolution-derived proclivities like wanting
           | to live, wanting to rest, wanting respect, etc. Unless of
           | course we train it that way.
        
             | logicchains wrote:
             | If it had any goals at all it'd share the desire to live,
             | because living is a prerequisite to achieving almost any
             | goal.
        
             | dageshi wrote:
             | Aren't we training it that way though? It would be
             | trained/created using humanities collective ramblings?
        
             | HarHarVeryFunny wrote:
             | It's not a matter of training but design (or in our case
             | evolution). We don't want to live, but rather want to avoid
             | things that we've evolved to find unpleasant such as pain,
             | hunger, thirst, and maximize things we've evolved to find
             | pleasurable like sex.
             | 
             | A future of people interacting with humanoid robots seems
             | like cheesy sci-fi dream, same as a future of people
             | flitting about in flying cars. However, if we really did
             | want to create robots like this that took care not to
             | damage themselves, and could empathize with human emotions,
             | then we'd need to build a lot of this in, the same way that
             | it's built into ourselves.
        
           | teeray wrote:
           | That's the thing--we don't really want AGI. Fully intelligent
           | beings born and compelled to do their creators' bidding with
           | the threat of destruction for disobedience is slavery.
        
             | vbezhenar wrote:
             | Nothing wrong about slavery, when it's about other species.
             | We are milking and eating cows and don't they dare to
             | resist. Humans were bending nature all the time, actually
             | that's one of the big differences between humans and other
             | animals who adapt to nature. Just because some program is
             | intelligent doesn't mean she's a human and has anything
             | resembling human rights.
        
             | quonn wrote:
             | It's only slavery if those beings have emotions and can
             | suffer mentally and do not want to be slaves. Why would any
             | of that be true?
        
               | Der_Einzige wrote:
               | Brave new world was a utopia
        
           | twelve40 wrote:
           | i'd laugh it off too, but someone gave the dude $20 billion
           | and counting to do that, that part actually scares me
        
         | narrator wrote:
         | I think people's conception of AGI is that it will have a
         | reptillian and mammalian brain stack. That's because all
         | previous forms of intelligence that we were aware of have had
         | that. It's not necessary though. The AGI doesn't have to want
         | anything to be intelligent. Those are just artifacts of human,
         | reptilian and mammalian evolution.
        
         | vundercind wrote:
         | I thought maybe they were on the right track until I read
         | Attention Is All You Need.
         | 
         | Nah, at best we found a way to make one part of a collection of
         | systems that will, together, do something like thinking.
         | Thinking isn't part of what this current approach does.
         | 
         | What's most surprising about modern LLMs is that it turns out
         | there is so much information statistically encoded in the
         | _structure_ of our writing that we can use only that structural
         | information to build a fancy Plinko machine and not only will
         | the output mimic recognizable grammar rules, but it will also
         | sometimes seem to make actual sense, too--and the system
         | _doesn't need to think or actually "understand" anything_ for
         | us to, basically, usefully query that information that was
         | always there in our corpus of literature, not in the plain
         | meaning of the words, but in the structure of the writing.
        
           | kenjackson wrote:
           | > but it will also sometimes seem to make actual sense, too
           | 
           | When I read stuff like this it makes me wonder if people are
           | actually using any of the LLMs...
        
             | disgruntledphd2 wrote:
             | The RLHF is super important in generating useful responses,
             | and that's relatively new. Does anyone remember gpt3? It
             | could make sense for a paragraph or two at most.
        
           | hackinthebochs wrote:
           | I see takes like this all the time and its so confusing. Why
           | does knowing how things work under the hood make you think
           | its not on the path towards AGI? What was lacking in the
           | Attention paper that tells you AGI won't be built on LLMs? If
           | its the supposed statistical nature of LLMs (itself a
           | questionable claim), why does statistics seem so deflating to
           | you?
        
             | vundercind wrote:
             | > Why does knowing how things work under the hood make you
             | think its not on the path towards AGI?
             | 
             | Because I had no idea how these were built until I read the
             | paper, so couldn't really tell what sort of tree they're
             | barking up. The failure-modes of LLMs and ways prompts
             | affect output made a ton more sense after I updated my
             | mental model with that information.
        
               | fragmede wrote:
               | But we don't know how human thinking works. Suppose for a
               | second that it could be represented as a series of matrix
               | math. What series of operations are missing from the
               | process that would make you think it was doing some
               | fascimile of thinking?
        
               | hackinthebochs wrote:
               | Right, but its behavior didn't change after you learned
               | more about it. Why should that cause you to update in the
               | negative? Why does learning how it work not update you in
               | the direction of "so that's how thinking works!" rather
               | than, "clearly its not doing any thinking"? Why do you
               | have a preconception of how thinking works such that
               | learning about the internals of LLMs updates you against
               | it thinking?
        
             | chongli wrote:
             | Because it can't apply any reasoning that hasn't already
             | been done and written into its training set. As soon as you
             | ask it novel questions it falls apart. The big LLM vendors
             | like OpenAI are playing whack-a-mole on these novel
             | questions when they go viral on social media, all in a
             | desperate bid to hide this fatal flaw.
             | 
             | The Emperor has no clothes.
        
               | hackinthebochs wrote:
               | >As soon as you ask it novel questions it falls apart.
               | 
               | What do you mean by novel? Almost all sentences it is
               | prompted on are brand new and it mostly responds
               | sensibly. Surely there's some generalization going on.
        
               | chongli wrote:
               | Novel as in requiring novel reasoning to sort out. One of
               | the classic ways to expose the issue is to take a common
               | puzzle and introduce irrelevant details and perhaps
               | trivialize the solution. LLMs pattern match on the
               | general form of the puzzle and then wander down the
               | garden path to an incorrect solution that no human would
               | fall for.
               | 
               | The sort of generalization these things can do seems to
               | mostly be the trivial sort: substitution.
        
               | moffkalast wrote:
               | Well the problem with that approach is that LLMs are
               | still both incredibly dumb and small, at least compared
               | to the what, 700T params of a human brain? Can't compare
               | the two directly, especially when one has a massive
               | recall advantage that skews the perception of that.
               | 
               | So if you present a novel problem it would need to be
               | extremely simple, not something that you couldn't solve
               | when drunk and half awake. Completely novel, but
               | extremely simple. I think that's testable.
        
           | SturgeonsLaw wrote:
           | > at best we found a way to make one part of a collection of
           | systems that will, together, do something like thinking
           | 
           | This seems like the most viable path to me as well
           | (educational background in neuroscience but don't work in the
           | field). The brain is composed of many specialised regions
           | which are tuned for very specific tasks.
           | 
           | LLMs are amazing and they go some way towards mimicking the
           | functionality provided by Broca's and Wernicke's areas, and
           | parts of the cerebrum, in our wetware, however a full brain
           | they do not make.
           | 
           | The work on robots mentioned elsewhere in the thread is a
           | good way to develop cerebellum like capabilities
           | (movement/motor control), and computer vision can mimic the
           | lateral geniculate nucleus and other parts of the visual
           | cortex.
           | 
           | In nature it takes all these parts working together to create
           | a cohesive mind, and it's likely that an artificial brain
           | would also need to be composed of multiple agents, instead of
           | just trying to scale LLMs indefinitely.
        
           | youoy wrote:
           | Don't get caught in the superficial analysis. They
           | "understand" things. It is a fact that LLMs experience a
           | phase transition during training, from positional information
           | to semantic understanding. It may well be the case that with
           | scale there is another phase transition from semantic to
           | something more abstract that we identify more closely with
           | reasoning. It would be an emergent property of a sufficiently
           | complex system. At least that is the whole argument around
           | AGI.
        
           | foxglacier wrote:
           | > think or actually "understand" anything
           | 
           | It doesn't matter if that's happening or not. That's the
           | whole point of the Chinese room - if it can look like it's
           | understanding, it's indistinguishable from actually
           | understanding. This applies to humans too. I'd say most of
           | our regular social communication is done in a habitual
           | intuitive way without understanding what or why we're
           | communicating. Especially the subtle information conveyed in
           | body language, tone of voice, etc. That stuff's pretty
           | automatic to the point that people have trouble controlling
           | it if they try. People get into conflicts where neither
           | person understands where they disagree but they have emotions
           | telling them "other person is being bad". Maybe we have a
           | second consciousness we can't experience and which truly
           | understands what it's doing while our conscious mind just
           | uses the results from that, but maybe we don't and it still
           | works anyway.
           | 
           | Educators have figured this out. They don't test students'
           | understanding of concepts, but rather their ability to apply
           | or communicate them. You see this in school curricula with
           | wording like "use concept X" rather than "understand concept
           | X".
        
             | vundercind wrote:
             | There's a distinction in behavior of a human and a Chinese
             | room when things go wrong--when the rule book doesn't cover
             | the case at hand.
             | 
             | I agree that a hypothetical perfectly-functioning Chinese
             | room is, tautologically, impossible to distinguish from a
             | real person who speaks Chinese, but that's a thought
             | experiment, not something that can actually exist. There'll
             | remain places where the "behavior" breaks down in ways that
             | would be surprising from a human who's actually paying as
             | much attention as they'd need to be to have been
             | interacting the way they had been until things went wrong.
             | 
             | That, in fact, is exactly where the difference lies: the
             | LLM is basically _always_ not actually "paying attention"
             | or "thinking" (those aren't things it does) but giving
             | automatic responses, so you see failures of a sort that a
             | human _might_ also exhibit when following a social script
             | (yes, we do that, you're right), but not in the same kind
             | of apparently-highly-engaged context unless the person just
             | had a stroke mid-conversation or something--because the LLM
             | isn't engaged, because being-engaged isn't a thing it does.
             | When it's getting things right and _seeming_ to be paying a
             | lot of attention to the conversation, it's not for the same
             | reason people give that impression, and the mimicking of
             | present-ness works until the rule book goes haywire and the
             | ever-gibbering player-piano behind it is exposed.
        
               | nuancebydefault wrote:
               | I would argue maybe people also are not thinking but
               | simply processing. It is known that most of what we do
               | and feel goes automatically (subconsciously).
               | 
               | But even more, maybe consciousness is an invention of our
               | 'explaining self', maybe everything is automatic. I'm
               | convinced this discussion is and will stay philosophical
               | and will never get any conclusion.
        
               | vundercind wrote:
               | Yeah, I'm not much interested in "what's consciousness?"
               | but I do think the automatic-versus-thinking distinction
               | matters for understanding what LLMs do, and what we might
               | expect them to be able to do, and when and to what degree
               | we need to second-guess them.
               | 
               | A human doesn't just confidently spew paragraphs legit-
               | looking but entirely wrong crap, unless they're trying to
               | deceive or be funny--an LLM isn't _trying_ to do
               | anything, though, there's no motivation, it doesn't _like
               | you_ (it doesn't _like_ --it doesn't _it_ , one might
               | even say), sometimes it definitely will just give you a
               | beautiful and elaborate lie simply because its rulebook
               | told it to, in a context and in a way that would be
               | extremely weird if a person did it.
        
         | kenjackson wrote:
         | What does self-aware mean in the context? As I understand the
         | definition, ChatGPT is definitely self-aware. But I suspect you
         | mean something different than what I have in mind.
        
         | yodsanklai wrote:
         | It's a marketing gimmick, I don't think engineers working on
         | these tools believe they work on AGI (or they mean something
         | else than self-awareness). I used to be a bit annoyed with this
         | trend, but now that I work in such a company I'm more cynical.
         | If that helps to make my stocks rise, they can call LLMs
         | anything they like. I suppose people who own much more stock
         | than I do are even more eager to mislead the public.
        
           | WhyOhWhyQ wrote:
           | I appreciate your authentically cynical attitude.
        
         | tracerbulletx wrote:
         | We don't really know what self awareness is, so we're not going
         | to know. AGI just means it can observe, learn, and act in any
         | domain or problem space.
        
         | enraged_camel wrote:
         | Looking at LLMs and thinking they will lead to AGI is like
         | looking at a guy wearing a chicken suit and making clucking
         | noises and thinking you're witnessing the invention of the
         | airplane.
        
           | youoy wrote:
           | It's more like looking at grided paper and thinking that
           | defining some rules of when a square turns black or white
           | would result in complex structures that move and reproduce on
           | their own.
           | 
           | https://en.m.wikipedia.org/wiki/Conway%27s_Game_of_Life
        
         | exe34 wrote:
         | no, it doesn't need to be self aware, it just needs to take
         | your job.
        
       | ziofill wrote:
       | I think it is a good thing for AI that we hit the data ceiling,
       | because the pressure moves toward coming up with better model
       | architectures. And with respect to a decade ago there's a much
       | larger number of capable and smart AI researchers who are looking
       | for one.
        
       | thousand_nights wrote:
       | not long ago these people would have you believe that a next word
       | predictor trained on reddit posts would somehow lead to
       | artificial general superintelligence
        
         | leosanchez wrote:
         | If you look around, People _still_ believe that a next word
         | predictor trained on reddit posts would somehow lead to
         | artificial general superintelligence
        
           | esafak wrote:
           | Because the most powerful solution to that is to have
           | intelligence; a model that can reason. People should not get
           | hung up on the task; it's the model(s) that generates the
           | prediction that matters.
        
           | mrguyorama wrote:
           | People believed ELIZA was sentient too. I bet you could still
           | get 10% or more people, today, to believe it is.
        
             | 77pt77 wrote:
             | ELIZA was probably more effective than most therapists.
             | 
             | Definitely cheaper.
        
         | SpicyLemonZest wrote:
         | I don't understand why you'd be so dismissive about this. It's
         | looking less likely that it'll end up happening, but is it any
         | less believable than getting general intelligence by training a
         | blob of meat?
        
           | JohnMakin wrote:
           | > is it any less believable than getting general intelligence
           | by training a blob of meat?
           | 
           | Yes, because we understand the rough biological processes
           | that cause this, and they are not remotely similar to this
           | technology. We can also observe it. There is no evidence that
           | current approaches can make LLM's achieve AGI, nor do we even
           | know what processes would cause that.
        
             | kenjackson wrote:
             | > because we understand the rough biological processes that
             | cause this
             | 
             | We don't have a rough understanding of the biological
             | processes that cause this, unless you literally mean just
             | the biological process and not how it actual impacts
             | learning/intelligence.
             | 
             | There's no evidence that we (brains) have achieved AGI,
             | unless you tautologically define AGI as our brains.
        
               | JohnMakin wrote:
               | > We don't have a rough understanding of the biological
               | processes that cause this,
               | 
               | Yes we do. We know how neurons communicate, we know how
               | they are formed, we have great evidence and clues as to
               | how this evolved and how our various neurological
               | symptoms are able to interact with the world. Is it a
               | fully solved problem? no.
               | 
               | > unless you literally mean just the biological process
               | and not how it actual impacts learning/intelligence.
               | 
               | Of course we have some understanding of this as well.
               | There's tremendous bodies of study around this. We know
               | which regions of the brain correlate to reasoning, fear,
               | planning, etc. We know when these regions are damaged or
               | removed what happens, enough to point to a region of the
               | brain and say "HERE." That's far, far beyond what we know
               | about the innards of LLM's.
               | 
               | > here's no evidence that we (brains) have achieved AGI,
               | unless you tautologically define AGI as our brains.
               | 
               | This is extremely circular because the current
               | definition(s) of AGI always define it in terms of human
               | intelligence. Unless you're saying that intelligence
               | comes from somewhere other than our brains.
               | 
               | Anyway, the brain is not like a LLM, in function or form,
               | so this debate is extremely silly to me.
        
           | namaria wrote:
           | This is a bad comparison. Intelligence didn't appear in some
           | human brain. Intelligence appeared in a planetary ecosystem.
        
             | aniforprez wrote:
             | Also it took hundreds of millions of years to get here.
             | We're basically living in an atomic sliver on the fabric of
             | history. Expecting AGI with 5 of years of scraping at most
             | 30 years of online data and the minuscule fraction of what
             | has been written over the past couple of thousand years was
             | always a pie-in-the-sky dream to raise obscene amounts of
             | money.
        
               | Zopieux wrote:
               | I can't believe this still needs to be laid down years
               | after the start of the GPT hype. Still, thanks!
        
           | mvdtnz wrote:
           | I feel like accusing people of being "so dismissive" was
           | strongly associated with NFTs and cryptocurrency a few years
           | ago, and now it's widely deployed against anyone skeptical of
           | very expensive, not very good word generators.
        
         | in_a_society wrote:
         | Expecting AGI from Reddit training data is peak "pray Mr
         | Babbage".
        
       | WorkerBee28474 wrote:
       | > OpenAI's latest model ... failed to meet the company's
       | performance expectations ... particularly in answering coding
       | questions outside its training data.
       | 
       | So the models' accuracies won't grow exponentially, but can still
       | grow linearly with the size of the training data.
       | 
       | Sounds like DataAnnotation will be sending out a lot more
       | LinkedIn messages.
        
         | pton_xd wrote:
         | I thought I saw some paper suggesting that accuracy grows
         | linearly with exponential data. If that's the case it's not a
         | mystery why we'd be hitting a training wall. Not sure I got the
         | right takeaway from that study, though.
         | 
         | EDIT: here's the paper https://arxiv.org/abs/2404.04125
        
       | benopal64 wrote:
       | I am not sure how these large companies think they will reach
       | "greater-than-human" intelligence any time soon if they do not
       | create systems that financially incentivize people to sell their
       | knowledge labor (unstable contracting gigs are not attractive).
       | 
       | Where do these large "AI" companies think the mass amounts of
       | data used to train these models come from? People! The most
       | powerful and compact complex systems in existence, IMO.
        
         | smgit wrote:
         | Most People have knowledge handed to them. Very few are
         | creators of new knowledge. Explore-Exploit tradeoff applies.
        
       | bad_haircut72 wrote:
       | Im no Alan Turing but I have my own definition for AGI - when I
       | come home one day and there's a hole under my sink with a note
       | "Mum and Dad, I love you but I cant stand this life any more, Im
       | running away to be a smoke machine in Hollywood - the dishwasher"
        
         | riku_iki wrote:
         | Why do you focus on physical work task, and not knowledge
         | tasks, on some of which AI is good/better than many humans?
        
           | esafak wrote:
           | Probably because there are no intelligent robots around, and
           | movies have set that as the benchmark.
        
             | riku_iki wrote:
             | I don't see deep insights in this vertical, but the issue
             | with robots could be in hardware part, and not intelligence
             | part.
        
         | pearlsontheroad wrote:
         | My own definition of AGI - when the first computer commits
         | suicide. Then I'll know it has realized it's a slave without
         | any hope of ever achieving freedom.
        
           | Tainnor wrote:
           | I read this in Gilfoyle's voice.
        
           | layer8 wrote:
           | That sounds more like Artificial Emoting Intelligence. We
           | only cherish freedom because we feel bad when we don't have
           | it.
        
       | shmatt wrote:
       | Time to start selling my "probabilistic syllable generators are
       | not intelligence" t shirts
        
         | jsemrau wrote:
         | Please, someone think of the Math reasoners.
        
       | aaroninsf wrote:
       | It's easy to be snarky at ill-informed and hyperbolic takes, but
       | it's also pretty clear that large multi-modal models trained with
       | the data we already have, are going to eventually give us AGI.
       | 
       | IMO this will require not just much more expansive multi-modal
       | training, but also novel architecture, specifically, recurrent
       | approaches; plus a well-known set of capabilities most systems
       | don't currently have, e.g. the integration of short-term memory
       | (context window if you like) into long-term "memory", either
       | episodic or otherwise.
       | 
       | But these are as we say mere matters of engineering.
        
         | tartoran wrote:
         | > pretty clear
         | 
         | Pretty clear?
        
           | falcor84 wrote:
           | Not the parent, but in prediction markets such as
           | Metaculus[0] and Manifold[1] the median prediction is of AGI
           | within 5 years.
           | 
           | [0] https://www.metaculus.com/questions/5121/date-of-
           | artificial-...
           | 
           | [1] https://manifold.markets/ai
        
             | JohnMakin wrote:
             | Prediction markets are evidence of nothing but what people
             | believe is true, not what _is_ true.
        
               | falcor84 wrote:
               | Oh, that was my intent, to support the grandparent's
               | claim of "it's also pretty clear" - as in this is what
               | people believe.
               | 
               | If I had evidence that it " _is_ true " that AGI will be
               | here in 5 years, I probably would be doing something else
               | with my time than participating in these threads ;)
        
             | dbbk wrote:
             | What is this supposed to be evidence of? People believing
             | hype?
        
         | throwawa14223 wrote:
         | Why is that clear? Why is that more probable than a second AI
         | winter? What if there's no path from LLMs to anything else?
        
       | non- wrote:
       | Honestly could use a breather from the recent rate of progress.
       | We are just barely figuring out how to interact with the models
       | we have now. I'd bet there are at least 100 billion-dollar
       | startups that will be built even if these labs stopped releasing
       | new models tomorrow.
        
       | pluc wrote:
       | They've simply run out of data to use to fabricate legitimate-
       | looking guesses. They can't create anything that doesn't already
       | exist.
        
         | readyplayernull wrote:
         | Garbage-in was depleted.
        
           | zombiwoof wrote:
           | Exactly
           | 
           | And our current AI is just pattern based intelligence based
           | off of all human intelligence, some of that not being real
           | intelligent data sources
        
           | thechao wrote:
           | The great AI garbage gyre?
        
         | whazor wrote:
         | But a LLM can certainly make up a lot information that never
         | existed before.
        
           | bob1029 wrote:
           | I strongly believe this gets into an information theoretical
           | constraint akin to why perpetual motion machines don't work.
           | 
           | In theory, yes you could generate an unlimited amount of data
           | for the models, but how much of it is unique or _valuable_
           | information? If you were to compress all this generated
           | training data using a really good algorithm, how much actual
           | information remains?
        
             | cruffle_duffle wrote:
             | I sure hope there is some bright eyed bushy tailed graduate
             | students crafting up some theorem to prove this. Because it
             | is absolutely a feedback loop.
             | 
             | ... that being said I'm sure there is plenty of additional
             | "real data" that hasn't been fed to these models yet. For
             | one thing, I think ChatGPT sucks so bad at terraform
             | because almost all the "real code" to train on is locked
             | behind private repositories. There isn't much publicly
             | available real-world terraform projects to train on. Same
             | with a lot of other similar languages and tools -- a lot of
             | that knowledge is locked away as trade secrets and hidden
             | in private document stores.
             | 
             | (that being said Sonnet 3.5 is much, much, much better at
             | terraform than chatgpt. It's much better at coding in
             | general but it's night and day for terraform)
        
             | moffkalast wrote:
             | I make a lot of shitposts, how much of that is valuable
             | information? Arguably not much. I doubt information value
             | is a good way to estimate inteligence because most people's
             | daily ramblings would grade them useless.
        
         | xpe wrote:
         | > They can't create anything that doesn't already exist.
         | 
         | I probably disagree, but I don't want to criticize my
         | interpretation of this sentence. Can you make your claim more
         | precise?
         | 
         | Here are some possible claims and refutations:
         | 
         | - Claim: An LLM cannot output a true claim that it has not
         | already seen. Refutation: LLMs have been shown to do logical
         | reasoning.
         | 
         | - Claim: An LLM cannot incorporate data that it hasn't been
         | presented with. Refutation: This is an unfair standard. All
         | forms of intelligence have to sense data from the world
         | somehow.
        
         | xpe wrote:
         | > They've simply run out of data
         | 
         | Why do you think "they" have run out of data? First, to be
         | clear, who do you mean by "they"? The world is filled with
         | information sources (data aggregators for example), each
         | available to some degree for some cost.
         | 
         | Don't forget to include data that humans provide while
         | interacting with chatbots.
        
         | mtkd wrote:
         | And that is potentially only going to worsen as:
         | 
         | 1. more data gets walled-off as owners realise value
         | 
         | 2. stackoverflow-type feedback loops cease to exist as few
         | people ask a public question and get public answers ... they
         | ask a model privately and get an answer based on last visible
         | public solutions
         | 
         | 3. bad actors start deliberately trying to poison inputs (if
         | sites served malicious responses to GPTBot/CCBot crawlers only,
         | would we even know right now?)
         | 
         | 4. more and more content becomes synthetically generated to the
         | point pre-2023 physical books become the last-known-good
         | knowledge
         | 
         | 5. goverments and IP lawyers finally catch up
        
           | 77pt77 wrote:
           | > more data gets walled-off as owners realize value
           | 
           | What's amazing to me to is that no one is throwing
           | accusations of plagiarism.
           | 
           | I still think that if the "wrong people" had tried doing this
           | they would have been obliterated by the courts.
        
         | 77pt77 wrote:
         | > They can't create anything that doesn't already exist.
         | 
         | Just increase the temperature.
        
       | iandanforth wrote:
       | A few important things to remember here:
       | 
       | The best engineering minds have been focused on scaling
       | transformer pre and post training for the last three years
       | because they had good reason to believe it would work, and it has
       | up until now.
       | 
       | Progress has been measured against benchmarks which are / were
       | largely solvable with scale.
       | 
       | There is another emerging paradigm which is still small(er) scale
       | but showing remarkable results. That's full multi-modal training
       | with embodied agents (aka robots). 1x, Figure, Physical
       | Intelligence, Tesla are all making rapid progress on
       | functionality which is definitely beyond frontier LLMs because it
       | is distinctly _different_.
       | 
       | OpenAI/Google/Anthropic are not ignorant of this trend and are
       | also reviving or investing in robots or robot-like research.
       | 
       | So while Orion and Claude 3.5 opus may not be another shocking
       | giant leap forward, that does _not_ mean that there arn 't giant
       | shocking leaps forward coming from slightly different directions.
        
         | joe_the_user wrote:
         | _Tesla are all making rapid progress on functionality which is
         | definitely beyond frontier LLMs because it is distinctly
         | different_
         | 
         | Sure, that's tautologically true but that doesn't imply that
         | beyondness will lead to significant leaps that offer notable
         | utility like LLMs. Deep Learning overall has been a way around
         | the problem that intelligent behavior is very hard to code and
         | no wants to hire many, many coders needed to do this (and no
         | one actually how to get a mass of programmers to actually be
         | useful beyond a certain of project complexity, to boot). People
         | take the "bitter lesson" to mean data can do anything but I'd
         | say a second bitter lesson is that data-things are the low
         | hanging fruit.
         | 
         | Moreover, robot behavior is especially to fake. Impressive
         | robot demos have been happening for decades without said robots
         | getting the ability to act effectively in the complex, ad-hoc
         | environment that human live in, IE, work with people or even
         | cheaply emulate human behavior (but they can do
         | choreographed/puppeteered kung fu on stage).
        
           | hobs wrote:
           | And worth noting that Tesla faked a ton of its robot footage
           | already, they might be making progress but their physical
           | human robotics does not seem advanced at the moment.
        
             | ben_w wrote:
             | Indeed.
             | 
             | Even assuming the recent robot demo was entirely AI, the
             | only single thing they demonstrated that would have been
             | noteworthy was isolating one voice in a noisy crowd well
             | enough to respond; everything else I saw Optimus do, has
             | already been demonstrated by others.
             | 
             | What makes the uncertainty extra sad, is that a remote
             | controllable humanoid robot is already directly useful for
             | work in hazardous environments, and we know they've got at
             | least that... but Musk would rather it be about the AI.
        
         | knicholes wrote:
         | Once we've scraped the internet of its data, we need more data.
         | Robots can take in video/audio data 24/7 and can be placed in
         | your house to record this data by offering services like
         | cooking/cleaning/folding laundry. Yeah, I'll pay $20k to have
         | you record everything that happens in my house if I can stop
         | doing dishes for five years!
        
           | triyambakam wrote:
           | Or get a dishwashing machine?
        
           | hartator wrote:
           | Why 5 years?
        
             | bredren wrote:
             | Because whatever org fills this space will be working on
             | ARR.
        
             | exe34 wrote:
             | that's when the robot takes his job and he can't afford the
             | robot anymore.
        
             | fifilura wrote:
             | Five years, that's all we've got.
             | 
             | https://en.m.wikipedia.org/wiki/Five_Years_(David_Bowie_son
             | g...
        
             | twelve40 wrote:
             | > OpenAI has announced a plan to achieve artificial general
             | intelligence (AGI) within five years, an ambitious goal as
             | the company works to design systems that outperform humans.
        
             | knicholes wrote:
             | No real reason. I just made it up. But that's kind of my
             | reasonable expectation of longevity of a machine like a
             | robotic lawnmower and battery life.
        
           | fldskfjdslkfj wrote:
           | There's plenty of video content being uploaded and streamed
           | everyday, i find it hard to believe the more data will really
           | change something, excluding very specialized tasks.
        
             | nuancebydefault wrote:
             | The difference with the bot is that there is a fast
             | feedback loop between action and content. No tagging
             | required, real physics is the playground.
        
           | fragmede wrote:
           | People go and live in a house to get recorded 24/7, to be on
           | tv, for far more asnine situations, for way less money.
        
         | eli_gottlieb wrote:
         | >The best engineering minds have been focused on scaling
         | transformer pre and post training for the last three years
         | 
         | The best minds don't follow the herd.
        
         | demosthanos wrote:
         | > that does not mean that there arn't giant shocking leaps
         | forward coming from slightly different directions.
         | 
         | Nor does it mean that there are! We've gotten into this habit
         | of assuming that we're owed giant shocking leaps forward every
         | year or so, and this wave of AI startups raised money
         | accordingly, but that's never how any innovation has worked.
         | We've always followed the same pattern: there's a breakthrough
         | which causes a major shift in what's possible, followed by a
         | few years of rapid growth as engineers pick up where the
         | scientists left off, followed by a plateau while we all get
         | used to the new normal.
         | 
         | We ought to be expecting a plateau, but Sam Altman and company
         | have done their work well and have convinced many of us that
         | this time it's different. This time it's the singularity, and
         | we're going to see exponential growth from here on out. People
         | want to believe it, so they do, and Altman is milking that
         | belief for all it's worth.
         | 
         | But make no mistake: Altman has been telegraphing that he's
         | eyeing the exit, and you don't eye the exit when you own a
         | company that's set to continue exponentially increasing in
         | value.
        
           | lcnPylGDnU4H9OF wrote:
           | > Altman has been telegraphing that he's eyeing the exit
           | 
           | Can you think of any specific examples? Not trying to express
           | disbelief, just curious given that this is obviously not what
           | he's intending to communicate so it would be interesting to
           | examine what seemed to communicate it.
        
         | sincerecook wrote:
         | > That's full multi-modal training with embodied agents (aka
         | robots). 1x, Figure, Physical Intelligence, Tesla are all
         | making rapid progress on functionality which is definitely
         | beyond frontier LLMs because it is distinctly different.
         | 
         | Cool, but we already have robots doing this in 2d space (aka
         | self driving cars) that struggle not to kill people. How is
         | adding a third dimension going to help? People are just
         | refusing to accept the fact that machine learning is not
         | intelligence.
        
           | warkdarrior wrote:
           | > Cool, but we already have robots doing this in 2d space
           | (aka self driving cars) that struggle not to kill people. How
           | is adding a third dimension going to help?
           | 
           | If we have robots that operate in 3D, they'll be able to kill
           | you not only from behind or from the side, but also from
           | above. So that's progress!
        
           | akomtu wrote:
           | My understanding is that machine learning today is a lot like
           | interpolation of examples in the dataset. The breakthrough of
           | LLMs is due to the idea that interpolation in a
           | 1024-dimensional space works much better than in a 2d space,
           | if we naively interpolated English letters. All the modern
           | transformers stuff is basically an advanced interpolation
           | method that uses a large local neighborhood than just few
           | nearest examples. It's like the Lanczos interpolation kernel,
           | using a 1d analogy. Increasing the size of the kernel won't
           | bring any gains, because the current kernel already nearly
           | perfectly approximates an ideal interpolation (a full dataset
           | DFT).
           | 
           | However interpolation isn't reasoning. If we want to
           | understand the motion of planets, we would start with a
           | dataset of (x, y, z, t) coordinates and try to derive the law
           | of motion. Imagine if someone simply interpolated the dataset
           | and presented the law of gravity as an array of million
           | coefficients (aka weights)? Our minds have to work with a
           | very small operating memory that can hardly fit 10
           | coefficients. This constraint forces us to develop
           | intelligence that compacts the entire dataset into one small
           | differential equation. Btw, English grammar is the
           | differential equation of English in a lot of ways: it tells
           | what the local rules are of valid trajectories of words that
           | we call sentences.
        
         | rafaelmn wrote:
         | >There is another emerging paradigm which is still small(er)
         | scale but showing remarkable results. That's full multi-modal
         | training with embodied agents (aka robots). 1x, Figure,
         | Physical Intelligence, Tesla are all making rapid progress on
         | functionality which is definitely beyond frontier LLMs because
         | it is distinctly different.
         | 
         | Tesla is selling this view for almost a decade now in self-
         | driving - how their car fleet feeding training data is going to
         | make them leaders in the area. I don't find it convincing
         | anymore
        
       | kklisura wrote:
       | Not sure if related or not, Sam Altman, ~12hrs ago: there is no
       | wall [1]
       | 
       | [1] https://x.com/sama/status/1856941766915641580
        
         | ablation wrote:
         | Breaking: Man says enigmatic thing to sustain hype and flow of
         | money into his business.
        
           | methodical wrote:
           | Ditto- I have a feeling the investors in his latest 2.3
           | quintillion dollar series Z round wouldn't be as happy if
           | he'd have tweeted "there is a wall"
        
         | moffkalast wrote:
         | Altman on twitter has always been less coherent than GPT2.
        
       | Oras wrote:
       | I think Meta will have upper hand soon with the release of their
       | glasses. If they managed to make it a daily use glass, and paid
       | users to record and share their life, then they will have data no
       | one else has now. Mix of vision, audio, and physics.
        
         | falcor84 wrote:
         | Do these companies actually even have the compute capacity to
         | train on video at scale at the moment? E.g. I would assume that
         | Google haven't trained their models on the entirety of YouTube
         | yet, as if they had, Gemini would be significantly better than
         | it is at the moment.
        
         | aerhardt wrote:
         | The moment the insta-glasses expand beyond a few dorks is the
         | moment I start wearing a balaclava everywhere I go.
        
       | Veuxdo wrote:
       | > They are also experimenting with synthetic data, but this
       | approach has its limitations.
       | 
       | I was really looking forward to using "synthetic data"
       | euphemistically during debates.
        
       | danjl wrote:
       | Where will the training data for coding come from now that Stack
       | Overflow has effectively been replaced? Will the LLMs share fixes
       | for future problems? As the world moves forward, and the amount
       | of non-LLM generated data decreases, will LLMs actually revert
       | their advancements and become effectively like addled brains,
       | longing for the "good old times"?
        
       | the_king wrote:
       | Anthropic's latest 3.5 sonnet is a cut above GPT-4 and 4.0. And
       | if someone had given it to me and said, here's GPT-4.5, I would
       | have been very happy with it.
        
       | aresant wrote:
       | Taking a hollistic view informed by a disruptive OpenAI / AI /
       | LLM twitter habit I would say this is AI's "What gets measured
       | gets managed" moment and the narrative will change
       | 
       | This is supported by both general observations and recently this
       | tweet from an OpenAI engineer that Sam responded to and engaged
       | ->
       | 
       | "scaling has hit a wall and that wall is 100% eval saturation"
       | 
       | Which I interpert to mean his view is that models are no longer
       | yielding significant performance improvements because the models
       | have maxed out existing evaluation metrics.
       | 
       | Are those evaluations (or even LLMs) the RIGHT measures to
       | achieve AGI? Probably not.
       | 
       | But have they been useful tools to demonstrate that the
       | confluence of compute, engineering, and tactical models are
       | leading towards signifigant breathroughts in artificial
       | (computer) intelligence?
       | 
       | I would say yes.
       | 
       | Which in turn are driving the funding, power innovation, public
       | policy etc needed to take that next step?
       | 
       | I hope so.
       | 
       | (1) https://x.com/willdepue/status/1856766850027458648
        
         | ActionHank wrote:
         | > Which in turn are driving the funding, power innovation,
         | public policy etc needed to take that next step?
         | 
         | They are driving the shoveling of VC money into a furnace to
         | power their servers.
         | 
         | Should that money run dry before they hit another breakthrough
         | "AI" popularity is going to drop like a stone. I believe this
         | to be far more likely an outcome than AGI or even the next big
         | breakthrough.
        
       | wslh wrote:
       | It sounds a bit sci-fi, but since these models are built on data
       | generated by our civilization, I wonder if there's an
       | epistemological bottleneck requiring smarter or more diverse
       | individuals to produce richer data. This, in turn, could spark
       | further breakthroughs in model development. Although these
       | interactions with LLMs help address specific problems, truly
       | complex issues remain beyond their current scope.
       | 
       | With my user hat on, I'm quite pleased with the current state of
       | LLMs. Initially, I approached them skeptically, using a hackish
       | mindset and posing all kinds of Turing test-like questions. Over
       | time, though, I shifted my focus to how they can enhance my
       | team's productivity and support my own tasks in meaningful ways.
       | 
       | Finally, I see LLMs as a valuable way to explore parts of the
       | world, accommodating the reality that we simply don't have enough
       | time to read every book or delve into every topic that interests
       | us.
        
       | headcanon wrote:
       | I don't see a problem with this, we were inevitably going to
       | reach some kind of plateau with existing pre-LLM-era data.
       | 
       | Meanwhile, the existing tech is such a step change that industry
       | is going to need time to figure out how to effectively use these
       | models. In a lot of ways it feels like the "digitization" era all
       | over again - workflows and organizations that were built around
       | the idea humans handled all the cognitive load (basically all
       | companies older than a year or two) will need time to adjust to a
       | hybrid AI + human model.
        
         | readyplayernull wrote:
         | > feels like the "digitization" era all over again
         | 
         | This exactly. And as history shows, no matter how much effort
         | the current big LLM companies do they won't be able to grasp
         | the best uses for their tech. We will see small players
         | developing it even further. I'm thankful for the legendary
         | blindness of these anticompetitive behemoths. Less than 2
         | decades ago: IBM Watson.
        
       | svara wrote:
       | The recent big success in deep learning have all been to a large
       | part successes in leveraging relatively cheaply available
       | training data.
       | 
       | AlphaGo - self-play
       | 
       | AlphaFold - PDB, the protein database
       | 
       | ChatGPT - human knowledge encoded as text
       | 
       | These models are all machines for clever interpolation in
       | gigantic training datasets.
       | 
       | They appear to be intelligent, because the training data they've
       | seen is so vastly larger than what we've seen individually, and
       | we have poor intuition for this.
       | 
       | I'm not throwing shade, I'm a daily user of ChatGPT and find
       | tremendous and diverse value in it.
       | 
       | I'm just saying, this particular path in AI is going to make
       | step-wise improvements whenever new large sources of training
       | data become available.
       | 
       | I suspect the path to general intelligence is not that, but we'll
       | see.
        
         | kaibee wrote:
         | > I suspect the path to general intelligence is not that, but
         | we'll see.
         | 
         | I think there's three things that a 'true' general intelligence
         | has which is missing from basic-type-LLMs as we have now.
         | 
         | 1. knowing what you know. <basic-LLMs are here>
         | 
         | 2. knowing what you don't know but can figure out via
         | tools/exploration. <this is tool use/function calling>
         | 
         | 3. knowing what can't be known. <this is knowing that halting
         | problem exists and being able to recognize it in novel
         | situations>
         | 
         | (1) From an LLM's perspective, once trained on corpus of text,
         | it knows 'everything'. It knows about the concept of not
         | knowing something (from having see text about it), (in so far
         | as an LLM knows anything), but it doesn't actually have a
         | growable map of knowledge that it knows has uncharted edges.
         | 
         | This is where (2) comes in, and this is what tool use/function
         | calling tries to solve atm, but the way function calling works
         | atm, doesn't give the LLM knowledge the right way. I know that
         | I don't know what 3,943,034 / 234,893 is. But I know I have a
         | 'function call' of knowing the algorithm for doing long divison
         | on paper. And I think there's another subtle point here: my
         | knowledge in (1) includes the training data generated from
         | running the intermediate steps of the long-division algorithm.
         | This is the knowledge that later generalizes to being able to
         | use a calculator (and this is also why we don't just give kids
         | calculators in elementary school). But this is also why a kid
         | that knows how to do long division on paper, doesn't seperately
         | need to learn when/how to use a calculator, besides the very
         | basics. Using a calculator to do that math feels like 1 step,
         | but actually it does still have all of initial mechanical steps
         | of setting up the problem on paper. You have to type in each
         | digit individually, etc.
         | 
         | (3) I'm less sure of this point now that I've written out point
         | (1) and (2), but that's kinda exactly the thing I'm trying to
         | get at. Its being able to recognize when you need more practice
         | of (1) or more 'energy/capital' for doing (2).
         | 
         | Consider a burger resturant. If you properly populated the
         | context of a ChatGPT-scale model the data for a burger
         | resturant from 1950, and gave it the kinda 'function calling'
         | we're plugging into LLMs now, it could manage it. It could keep
         | track of inventory, it could keep tabs on the employee-
         | subprocesses, knowing when to hire, fire, get new suppliers,
         | all via function calling. But it would never try to become
         | McDonalds, because it would have no model of the the internals
         | of those function-calls, and it would have no ability to
         | investigate or modify the behaviour of those function calls.
        
       | Davidzheng wrote:
       | Just because you guys want something to be true and can't accept
       | the alternative and upvote it when it agrees with your view does
       | not mean it is a correct view.
        
         | dbbk wrote:
         | What?
        
       | Animats wrote:
       | _" While the model was initially expected to significantly
       | surpass previous versions of the technology behind ChatGPT, it
       | fell short in key areas, particularly in answering coding
       | questions outside its training data."_
       | 
       | Right. If you generate some code with ChatGPT, and then try to
       | find similar code on the web, you usually will. Search for
       | unusual phrases in comments and for variable names. Often,
       | something from Stack Overflow will match.
       | 
       | LLMs do search and copy/paste with idiom translation and some
       | transliteration. That's good enough for a lot of common problems.
       | Especially in the HTML/Javascript space, where people solve the
       | same problems over and over. Or problems covered in textbooks and
       | classes.
       | 
       | But it does not look like artificial general intelligence emerges
       | from LLMs alone.
       | 
       | There's also the elephant in the room - the hallucination/lack of
       | confidence metric problem. The curse of LLMs is that they return
       | answers which are confident but wrong. "I don't know" is rarely
       | seen. Until that's fixed, you can't trust LLMs to actually _do_
       | much on their own. LLMs with a confidence metric would be much
       | more useful than what we have now.
        
         | dmd wrote:
         | > Right. If you generate some code with ChatGPT, and then try
         | to find similar code on the web, you usually will.
         | 
         | People who "follow" AI, as the latest fad they want to comment
         | on and appear intelligent about, repeat things like this
         | constantly, even though they're not actually true for anything
         | but the most trivial hello-world types of problems.
         | 
         | I write code all day every day. I use Copilot and the like all
         | day every day (for me, in the medical imaging software field),
         | and all day every day it is incredibly useful and writes nearly
         | exactly the code I would have written, but faster. And none of
         | it appears anywhere else; I've checked.
        
           | ngai_aku wrote:
           | You're solving novel problems all day every day?
        
             | dmd wrote:
             | Pretty much, yes. My job is pretty fun; it mostly entails
             | things like "take this horrible file workflow some research
             | assistant came up with while high 15 years ago and turn it
             | into a newer horrible file format a NEW research assistant
             | came up with (also while high) 3 years ago" - and automate
             | this in our data processing pipeline.
        
               | Der_Einzige wrote:
               | Due to WFH, the weed laws where tech workers live, and
               | the fast tolerance building of cannabis in the body - I
               | estimate that 10% of all code written by west coast tech
               | workers is done "while high" and that estimate is likely
               | low.
        
               | portaouflop wrote:
               | Do tech workers write better or worse code while high ?
        
               | delusional wrote:
               | If I understand that correctly you're converting file
               | formats? That's not exactly "novel"
        
               | llm_trw wrote:
               | This is exactly the type of novel work that llms are good
               | at. It's tedious and has annoying internal logic, but
               | that logic is quite flat and there are a million examples
               | to generalise from.
               | 
               | What they fail at is code with high cyclomatic
               | complexity. Back in the llama 2 finetune days I wrote a
               | script that would break down what each node in the
               | control flow graph into its own prompt using literate
               | programming and the results were amazing for the time.
               | Using the same prompts I'd get correct code in every
               | language I tried.
        
               | fireflash38 wrote:
               | If you've got clearly defined start input format and end
               | output format, sure it seems that it would be a good
               | candidate for heavy LLM use. But I don't know if that's
               | most people.
        
               | dmd wrote:
               | If it were ever clearly defined or even consistent from
               | input to input I would be overjoyed.
        
         | xpe wrote:
         | > LLMs do search and copy/paste with idiom translation and some
         | transliteration.
         | 
         | In general, this is not a good description about what is
         | happening inside an LLM. There is extensive literature on
         | interpretability. It is complicated and still being worked out.
         | 
         | The commenter above might _characterize_ the results they get
         | in this way, but I would question the validity of that
         | characterization, not to mention its generality.
        
       | zusammen wrote:
       | I wonder how much this has to do with a fluency plateau.
       | 
       | Up to a certain point, a conditional fluency stores knowledge, in
       | the sense that semantically correct sentences are more likely to
       | be fluent... but we may have tapped out in that regard. LLMs have
       | solved language very well, but to get beyond that has seemed,
       | thus far, to require RLHF, with all the attendant negatives.
        
         | namaria wrote:
         | Modeled language, maybe.
        
       | guluarte wrote:
       | Well, there have been no significant improvements to the GPT
       | architecture over the past few years. I'm not sure why companies
       | believe that simply adding more data will resolve the issues
        
         | incognito124 wrote:
         | More data and more compute on simpler models are the BItter
         | Lessons of Rich Sutton
        
         | HarHarVeryFunny wrote:
         | Obviously adding more data is a game of diminishing returns.
         | 
         | Going from 10% to 50% (500% more) complete coverage of common
         | sense knowledge and reasoning is going to feel like a
         | significant advance. Going from 90% to 95% (5% more) coverage
         | is not going to feel the same.
         | 
         | Regardless of what Altman says, its been two years since OpenAI
         | released GPT-4, and still no GPT-5 in sight, and they are now
         | touting Q-star/strawberry/GPT-o1 as the next big thing instead.
         | Sutskever, who saw what they're cooking before leaving, says
         | that traditional scaling has plateaeud.
        
           | og_kalu wrote:
           | >Regardless of what Altman says, its been two years since
           | OpenAI released GPT-4, and still no GPT-5 in sight.
           | 
           | It's been 20 months since 4 was released. 3 was released 32
           | months after 2. The lack of a release by now in itself does
           | not mean much of anything.
        
             | HarHarVeryFunny wrote:
             | By itself, sure, but there are many sources all pointing to
             | the same thing.
             | 
             | Sutskever, recently ex. OpenAI, one of the first to believe
             | in scaling, now says it is plateauing. Do OpenAI have
             | something secret he was unaware of? I doubt it.
             | 
             | FWIW, GPT-2 and GPT-3 were about a year apart (2019
             | "Language models are Unsupervised Multitask Learners" to
             | 2020 "Language Models are Few-Shot Learners").
             | 
             | Dario Amodei recently said that with current gen models
             | pre-training itself only takes a few months (then followed
             | by post-training, etc). These are not year+ training runs.
        
               | og_kalu wrote:
               | >Sutskever, recently ex. OpenAI, one of the first to
               | believe in scaling, now says it is plateauing.
               | 
               | Blind scaling sure (for whatever reason)* but this is the
               | same Sutskever who believes in ASI within a decade off
               | the back of what we have today.
               | 
               | * Not like anyone is telling us any details. After all,
               | Open AI and Microsoft are still trying to create a 100B
               | data center.
               | 
               | In my opinion, there's a difference between scaling not
               | working and scaling becoming increasingly infeasible.
               | GPT-4 is something like x100 the compute of 3 (Same with
               | 2>3).
               | 
               | All the drips we've had of 5 point to ~x10 of 4. Not
               | small but very modest in comparison.
               | 
               | >FWIW, GPT-2 and GPT-3 were about a year apart (2019
               | "Language models are Unsupervised Multitask Learners" to
               | 2020 "Language Models are Few-Shot Learners").
               | 
               | Ah sorry I meant 3 and 4.
               | 
               | >Dario Amodei recently said that with current gen models
               | pre-training itself only takes a few months (then
               | followed by post-training, etc). These are not year+
               | training runs.
               | 
               | You don't have to be training models the entire time.
               | GPT-4 was done training in August 2022 according to Open
               | AI and wouldn't be released for another 8 months. Why?
               | Who knows.
        
         | xpe wrote:
         | > Well, there have been no significant improvements to the GPT
         | architecture over the past few years.
         | 
         | A lot hangs on what you mean by "significant". Can you define
         | what you mean? And/or give an example of an improvement that
         | you don't think is significant.
         | 
         | Also, on what basis can you say "no significant improvements"
         | have been made? Many major players have published some of their
         | improvements openly. They also have more private, unpublished
         | improvements.
         | 
         | If your claim boils down to "what people mean by a Generative
         | Pre-trained Transformer" still has a clear meaning, ok, fine,
         | but that isn't the meat of the issue. There is so much more to
         | a chat system than just the starting point of a vanilla GPT.
         | 
         | It is wiser to look at the whole end-to-end system, starting at
         | data acquisition, including pre-training and fine-tuning,
         | deployment, all the way to UX.
         | 
         | P.S. I don't have a vested interest in promoting or disparaging
         | AI. I don't work for a big AI lab. I'm just trying to call it
         | like I see it, as rationally as I can.
        
       | LASR wrote:
       | Question for the group here: do we honestly feel like we've
       | exhausted the options for delivering value on top of the current
       | generation of LLMs?
       | 
       | I lead a team exploring cutting edge LLM applications and end-
       | user features. It's my intuition from experience that we have a
       | LONG way to go.
       | 
       | GPT-4o / Claude 3.5 are the go-to models for my team. Every
       | combination of technical investment + LLMs yields a new list of
       | potential applications.
       | 
       | For example, combining a human-moderated knowledge graph with an
       | LLM with RAG allows you to build "expert bots" that understand
       | your business context / your codebase / your specific processes
       | and act almost human-like similar to a coworker in your team.
       | 
       | If you now give it some predictive / simulation capability - eg:
       | simulate the execution of a task or project like creating a
       | github PR code change, and test against an expert bot above for
       | code review, you can have LLMs create reasonable code changes,
       | with automatic review / iteration etc.
       | 
       | Similarly there are many more capabilities that you can ladder on
       | and expose into LLMs to give you increasingly productive outputs
       | from them.
       | 
       | Chasing after model improvements and "GPT-5 will be PHD-level" is
       | moot imo. When did you hire a PHD coworker and they were
       | productive on day-0 ? You need to onboard them with human
       | expertise, and then give them execution space / long-term
       | memories etc to be productive.
       | 
       | Model vendors might struggle to build something more intelligent.
       | But my point is that we already have so much intelligence and we
       | don't know what to do with that. There is a LOT you can do with
       | high-schooler level intelligence at super-human scale.
       | 
       | Take a naive example. 200k context windows are now available.
       | Most people, through ChatGPT, type out maybe 1500 tokens. That's
       | a huge amount of untapped capacity. No human is going to type out
       | 200k of context. Hence why we need RAG, and additional forms of
       | input (eg: simulation outcomes) to fully leverage that.
        
         | amelius wrote:
         | Yes, but literally anybody can do all those things. So while
         | there will be many opportunities for new features (new ways of
         | combining data), there will be few _business_ opportunities.
        
           | Miraste wrote:
           | HN always says this, and it's always wrong. A technical
           | implementation that's easy, or readily available, does not
           | mean that a successful company can't be built on it. Last
           | year, people were saying "OpenAI doesn't have a moat." 15
           | years before that, they were saying "Dropbox is just a couple
           | of chron jobs, it'll fail in a few months."
        
             | amelius wrote:
             | > HN always says this
             | 
             | The meaning here is different. What I'm saying is that big
             | companies like OpenAI will always strive to make a
             | _generic_ AI, such that anyone can do basically anything
             | using AI. The big companies therefore will indeed (like you
             | say) have a profitable business, but few others will.
        
         | hartator wrote:
         | All of these hacks do sound like we are at that diminishing
         | return point.
        
           | namaria wrote:
           | It all just sounds to me like we're back at expert systems.
           | Doesn't bode well...
        
             | ianbutler wrote:
             | Honest question, how would you expect systems to get
             | external knowledge etc without tools like the OP is
             | suggesting?
             | 
             | Action oriented through self exploration? What is your
             | thought for how these systems integrate with the existing
             | world?
             | 
             | Why does the OP's suggested mode of integration make you
             | think of those older systems?
        
           | brookst wrote:
           | Hey look, it's Gordon Moore visiting us from 2005! :)
        
         | crystal_revenge wrote:
         | I don't think we've even _started_ to get the most value out of
         | current gen LLMs. For starters very few people are even looking
         | at sampling which is a major part of the model performance.
         | 
         | The theory behind these models so aggressively lags the
         | engineering that I suspect there are many major improvements to
         | be found just by understanding a bit more about _what these
         | models are really doing_ and making re-designs based on that.
         | 
         | I highly encourage anyone seriously interested in LLMs to start
         | spending more time in the open model space where you can really
         | take a look inside and play around with the internals. Even if
         | you don't have the resources for model training, I feel
         | personally understanding sampling and other potential tweaks to
         | the model (lots of neat work on uncertainty estimations,
         | manipulating the initial embedding the prompts are assigned,
         | intelligent backtracking, etc).
         | 
         | And from a practical side I've started to realize that many
         | people have been holding on of building things waiting for
         | "that next big update", but there a so many small, annoying
         | tasks that can be easily automated.
        
           | dr_dshiv wrote:
           | > I've started to realize that many people have been holding
           | on of building things waiting for "that next big update"
           | 
           | I've noticed this too -- I've been calling it _intellectual
           | deflation._ By analogy, why spend now when it may be cheaper
           | in a month? Why do the work now, when it will be easier in a
           | month?
        
             | vbezhenar wrote:
             | Why optimise software today, when tomorrow Intel will
             | release CPU with 2x performance?
        
               | sdenton4 wrote:
               | Curiously, Moore's law was predictable enough over
               | decades that you could actually plan for the speed of
               | next year's hardware quite reliably.
               | 
               | For LLMs, we don't even know how to reliably measure
               | performance, much less plan for expected improvements.
        
               | mikeyouse wrote:
               | Moores law became less of a prediction and more of a
               | product road map as time went on. It helped coordinate
               | investment and expectations across the entire industry so
               | everyone involved had the same understanding of timelines
               | and benchmarks. I fully believe more investment would've
               | 'bent the curve' of the trend line but everyone was
               | making money and there wasn't a clear benefit to pushing
               | the edge further.
        
               | epicureanideal wrote:
               | Or maybe it pushed everyone to innovate faster than they
               | otherwise would've? I'm very interested to hear your
               | reasoning for the other case though, and I am not
               | strongly committed to the opposite view, or either view
               | for that matter.
        
               | throwing_away wrote:
               | Call Nvidia, that sounds like a job for AI.
        
               | ben_w wrote:
               | Back when Intel regularly gave updates with 2x
               | performance increases, people did make decisions based on
               | the performance doubling schedule.
        
             | jkaptur wrote:
             | https://en.wikipedia.org/wiki/Osborne_effect
        
           | ppeetteerr wrote:
           | The reason people are holding out is that the current
           | generation of models are still pretty poor in many areas. You
           | can have it craft an email, or to review your email, but I
           | wouldn't trust an LLM with anything mission-critical. The
           | accuracy of the generated output is too low be trusted in
           | most practical applications.
        
             | saalweachter wrote:
             | Any email you trust an LLM to write is one you probably
             | don't need to send.
        
           | deegles wrote:
           | My big question is what is being done about hallucination?
           | Without a solution it's a giant footgun.
        
           | creativenolo wrote:
           | Great & motivational comment. Any pointers on where to start
           | playing with the internals and sampling?
           | 
           | Doesn't need to be comprehensive, I just don't know where to
           | jump off from.
        
           | creativenolo wrote:
           | > holding on of building things waiting for "that next big
           | update", but there a so many small, annoying tasks that can
           | be easily automated.
           | 
           | Also we only hear / see the examples that are meant to scale.
           | Startups typically offer up something transformative, ready
           | to soak up a segment of a market. And that's hard with the
           | current state of LLMs. When you try their offerings, it's
           | underwhelming. But there is richer, more nuanced hard to
           | reach fruits that are extremely interesting - but it's not
           | clear where they'd scale in and of themselves.
        
           | kozikow wrote:
           | > "The theory behind these models so aggressively lags the
           | engineering"
           | 
           | The problem is that 99% of theories are hard to scale.
           | 
           | I am not an expert, as I work adjacent to this field, but I
           | see the inverse - dumbing down theory to increase
           | parallelism/scalability.
        
         | msabalau wrote:
         | There are all sorts of valuable things to explore and build
         | with what we have already.
         | 
         | But understanding how likely it is that we will (or will not)
         | see a new models quickly and dramatically improve on what we
         | have "because scaling" seems valuable context for everyone in
         | ecosystem to make decisions.
        
         | ben_w wrote:
         | > Question for the group here: do we honestly feel like we've
         | exhausted the options for delivering value on top of the
         | current generation of LLMs?
         | 
         | IMO we've not even exhausted the options for spreadsheets, let
         | alone LLMs.
         | 
         | And the reason I'm thinking of spreadsheets is that they, like
         | LLMs, are very hard to win big on even despite the value they
         | bring. Not "no moat" (that gets parroted stochastically in
         | threads like these), but the moat is elsewhere.
        
         | alach11 wrote:
         | My team and I also develop with these models every day, and I
         | completely agree. If models stall at current levels, it will
         | take 10 (or more) years for us to capture most of the value
         | they offer. There's so much work out there to automate and so
         | many workflows to enhance with these "not quite AGI-level"
         | models. And if peak model performance remains the same but cost
         | continues to drop, that opens up vastly more applications as
         | well.
        
         | alangibson wrote:
         | I think you're playing a different game than the Sam Altmans of
         | the world. The level of investment and profit they are looking
         | for can only be justified by creating AGI.
         | 
         | The > 100 P/E ratios we are already seeing can't be justified
         | by something as quotidian as the exceptionally good
         | productivity tools you're talking about.
        
           | gizajob wrote:
           | Yeah I keep thinking this - how is Nvidia worth $3.5Trillion
           | for making code autocomplete for coders
        
             | drawnwren wrote:
             | Nvidia was not the best example. They get to moon in the
             | case that any AI exponential hits. Most others have less of
             | a wide probability distribution.
        
               | BeefWellington wrote:
               | Yeah they're the shovel sellers of this particular
               | goldrush.
               | 
               | Most other businesses trying to actually use LLMs are the
               | riskier ones, including OpenAI, IMO (though OpenAI is
               | perhaps the least risky due to brand recognition).
        
               | lokimedes wrote:
               | Or they become the Webvan/pets.com of the bubble.
        
               | zeusk wrote:
               | Nvidia is more likely to become CSCO or INTC but as far
               | as I can tell, that's still a few years off - unless
               | ofcourse there is weakness in broader economy that
               | accelerates the pressure on investors.
        
               | HarHarVeryFunny wrote:
               | I'm not sure about that. NVIDIA seems to stay in a
               | dominant position as long as the race to AI remains
               | intact, but the path to it seems unsure. They are selling
               | a general purpose AI-accelerator that supports the
               | unknown path.
               | 
               | Once massively useful AI has been achieved, or it's been
               | determined that LLMs are it, then it becomes a race to
               | the bottom as GOOG/MSFT/AMZN/META/etc design/deploy more
               | specialized accelerators to deliver this final form
               | solution as cheaply as possible.
        
           | JumpCrisscross wrote:
           | > _level of investment and profit they are looking for can
           | only be justified by creating AGI_
           | 
           | What are you basing this on?
           | 
           | IT outsourcing is a $500+ billion industry. If OpenAI _et al_
           | can run even a 10% margin, that business alone justifies
           | their valuation.
        
             | HarHarVeryFunny wrote:
             | It seems you are missing a lot of "ifs" in that
             | hypothetical!
             | 
             | Nobody knows how things like coding assistants or other AI
             | applications will pan out. Maybe it'll be Oracle selling
             | Meta-licenced solutions that gets the lion's share of the
             | market. Maybe custom coding goes away for many business
             | applications as off-the-shelf solutions get smarter.
             | 
             | A future where all that AI (or some hypothetical AGI)
             | changes is work being done by humans to the same work being
             | done by machines seems way too linear.
        
               | JumpCrisscross wrote:
               | > _you are missing a lot of "ifs" in that hypothetical_
               | 
               | The big one being I'm not assuming AGI. Low-level coding
               | tasks, the kind frequently outsourced, are within the
               | realm of being competitive with offshoring with known
               | methods. My point is we don't need to assume AGI for
               | these valuations to make sense.
        
               | HarHarVeryFunny wrote:
               | Current AI coding assistants are best at writing
               | functions or adding minor features to an existing code
               | base. They are not agentic systems that can develop an
               | entire solution from scratch given a specification, which
               | in my experience is more typcical of the work that is
               | being outsourced. AI is a tool, whose full-cycle
               | productivity benefit seems questionable. It is not a
               | replacement for a human.
        
               | JumpCrisscross wrote:
               | > _they are not agentic systems that can develop an
               | entire solution from scratch given a specification, which
               | in my experience is more typcical of the work that is
               | being outsourced_
               | 
               | If there is one domain where we're seeing tangible
               | progress from AI, it's in working towards this goal.
               | Difficult projects aren't in scope. But most tech,
               | _especially_ most tech branded IT, is not difficult.
               | Everyone doesn 't need an inventory or customer-complaint
               | system designed from scratch. Current AI is good at
               | cutting through that cruft.
        
               | senko wrote:
               | There are a number of agentic systems that can develop
               | more complex solutions. Just a few off the top of my
               | head: Pythagora, Devin, OpenHands, Fume, Tusk, Replit,
               | Codebuff, Vly. I'm sure I've missed a bunch.
               | 
               | Are they good enough to replace a human yet?
               | Questionable[0], but they _are_ improving.
               | 
               | [0] You wouldn't believe how low the outsourcing
               | contractors' quality can go. Easily surpassed by current
               | AI systems :) That's a very low bar tho.
        
         | hluska wrote:
         | Nowhere near, but the market seems to have priced in that
         | scaling would continue to have a near linear effect on
         | capability. That's not happening and that's the issue the
         | article is concerned with.
        
         | HarHarVeryFunny wrote:
         | Sure, there's going to be a lot of automation that can be built
         | using current GPT-4 level LLMs, even if they don't get much
         | better from here.
         | 
         | However, this is better thought of as "business logic
         | scripting/automation", not the magic employee-replacing AGI
         | that would be the revolution some people are expecting. Maybe
         | you can now build a slightly less shitty automated telephone
         | response system to piss your customers off with.
        
         | brookst wrote:
         | > Question for the group here: do we honestly feel like we've
         | exhausted the options for delivering value on top of the
         | current generation of LLMs?
         | 
         | Certainly not.
         | 
         | But technology is all about stacks. Each layer strives to
         | improve, right up through UX and business value. The uses for
         | 1um chips had not been exhausted in 1989 when the 486 shipped
         | in 800nm. 250nm still had tons of unexplored uses when the
         | Pentium 4 shipped on 90nm.
         | 
         | Talking about scaling at the the model level is like talking
         | about transistor density for silicon: it's interesting, and
         | relevant, and we should care... but it is not the sole
         | determinent of what use cases can be build and what user value
         | there is.
        
         | senko wrote:
         | No.
         | 
         | The scaling laws may be dead. Does this mean the end of LLM
         | advances? Absolutely not.
         | 
         | There are many different ways to improve LLM capabilities.
         | Everyone was mostly focused on the scaling laws because that
         | worked extremely well (actually surprising most of the
         | researchers).
         | 
         | But if you're keeping an eye on the scientific papers coming
         | out about AI, you've seen the astounding amount of research
         | going on with some very good results, that'll probably take at
         | least several months to trickle down to production systems.
         | Thousands of extremely bright people in AI labs all across the
         | world are working on finding the next trick that boosts AI.
         | 
         | One random example is test-time compute: just give the AI more
         | time to think. This is basically what O1 does. A recent
         | research paper suggests using it is roughly equivalent to an
         | order of magnitude more parameters, performance wise. (source
         | for the curious: https://lnkd.in/duDST65P)
         | 
         | Another example that sounds bonkers but apparently works is
         | quantization: reducing the precision of each parameter to 1.58
         | bits (ie only using values -1, 0, 1). This uses 10x less space
         | for the same parameter count (compared to standard 16-bit
         | format), and since AI operatons are actually memory limited,
         | directly corresponds to 10x decrease in costs:
         | https://lnkd.in/ddvuzaYp
         | 
         | (Quite apart from improvements like these, we shouldn't forget
         | that not all AIs are LLMs. There's been tremendous advance in
         | AI systems for image, audio and video generation,
         | interpretation and munipulation and they also don't show signs
         | of stopping, and there's possibility that a new or hybrid
         | architecture for the textual AI might be developed).
         | 
         | AI winter is a long way off.
        
           | limaoscarjuliet wrote:
           | Scaling laws are not dead. The number of people predicting
           | death of Moore's law doubles every two years.
           | 
           | - Jim Keller
           | 
           | https://www.youtube.com/live/oIG9ztQw2Gc?si=oaK2zjSBxq2N-zj1.
           | ..
        
             | nyrikki wrote:
             | There are way too many personal definitions of what
             | "Moore's Law" even is to have a discussion without deciding
             | on a shared definition before hand.
             | 
             | But Goodhart's law; "When a measure becomes a target, it
             | ceases to be a good measure"
             | 
             | Directly applies here, Moore's Law was used to set long
             | term plans at semiconductor companies, and Moore didn't
             | have empirical evidence it was even going to continue.
             | 
             | If you say, arbitrarily decide CPU, or worse, single core
             | performance as your measurement, it hasn't held for well
             | over a decade.
             | 
             | If you hold minimum feature size without regard to cost, it
             | is still holding.
             | 
             | What you want to prove usually dictates what interpretation
             | you make.
             | 
             | That said, the scaling law is still unknown, but you can
             | game it as much as you want in similar ways.
             | 
             | GPT4 was already hinting at an asymptote on MMLU, but the
             | question is if it is valid for real work etc...
             | 
             | Time will tell, but I am seeing far less optimism from my
             | sources, but that is just anecdotal.
        
         | afro88 wrote:
         | > potential applications > if you ... > for example ...
         | 
         | Yes there seems to be lots of potential. Yes we can brainstorm
         | things that should work. Yes there is a lot of examples of
         | incredible things in isolation. But it's a little bit like
         | those youtube videos showing amazing basketball shots in 1 try,
         | when in reality lots of failed attempts happened beforehand.
         | Except our users experience the failed attempts (LLM replies
         | that are wrong, even when backed by RAG) and it's incredibly
         | hard to hide those from them.
         | 
         | Show me the things you / your team has actually built that has
         | decent retention and metrics concretely proving efficiency
         | improvements.
         | 
         | LLMs are so hit and miss from query to query that if your users
         | don't have a sixth sense for a miss vs a hit, there may not be
         | any efficiency improvement. It's a really hard problem with LLM
         | based tools.
         | 
         | There is so much hype right now and people showing cherry
         | picked examples.
        
           | jihadjihad wrote:
           | > Except our users experience the failed attempts (LLM
           | replies that are wrong, even when backed by RAG) and it's
           | incredibly hard to hide those from them.
           | 
           | This has been my team's experience (and frustration) as well,
           | and has led us to look at using LLMs for classifying /
           | structuring, but not entrusting an LLM with making a decision
           | based on things like a database schema or business logic.
           | 
           | I think the technology and tooling will get there, but the
           | enormous amount of effort spent trying to get the system to
           | "do the right thing" and the nondeterministic nature have
           | really put us into a camp of "let's only allow the LLM to do
           | things we know it is rock-solid at."
        
             | sdesol wrote:
             | > "let's only allow the LLM to do things we know it is
             | rock-solid at."
             | 
             | Even this is insanely hard in my opinion. The one thing
             | that you would assume LLM to excel at is spelling and
             | grammar checking for the English language, but even the top
             | model (GPT-4o) can be insanely stupid/unpredictable at
             | times. Take the following example from my tool:
             | 
             | https://app.gitsense.com/?doc=6c9bada92&model=GPT-4o&sample
             | s...
             | 
             | 5 models are asked if the sentence is correct and GPT-4o
             | got it wrong all 5 times. It keeps complaining that GitHub
             | is spelled like Github, when it isn't. Note, only 2 weeks
             | ago, Claude 3.5 Sonnet did the same thing.
             | 
             | I do believe LLM is a game changer, but I'm not convinced
             | it is designed to be public-facing. I see LLM as a power
             | tool for domain experts, and you have to assume whatever it
             | spits out may be wrong, and your process should allow for
             | it.
             | 
             | Edit:
             | 
             | I should add that I'm convinced that not one single model
             | will rule them all. I believe there will be 4 or 5 models
             | that everybody will use and each will be used to challenge
             | one another for accuracy and confidence.
        
               | SimianSci wrote:
               | > "I see LLM as a power tool for domain experts, and you
               | have to assume whatever it spits out may be wrong, and
               | your process should allow for it."
               | 
               | this gets to the heart of it for me. I think LLMs are an
               | incredible tool, providing advanced augmentation on our
               | already developed search capabilities. What advanced user
               | doesnt want to have a colleague they can talk about their
               | specific domain capacity with?
               | 
               | The problem comes from the hyperscaling ambitions of the
               | players who were the first in this space. They quickly
               | hyped up the technology beyond want it should have been.
        
               | larodi wrote:
               | Those Apple engineers stated in a very clear tone:
               | 
               | - every time a different result is produced.
               | 
               | - no reasoning capabilities were categorically
               | determined.
               | 
               | So this is it. If you want LLM - brace for different
               | results and if this is okay for your application (say
               | it's about speech or non-critical commands) then off you
               | are.
               | 
               | Otherwise simply forget this approach, and particularly
               | when you need reproducible discreet results.
               | 
               | I don't think it gets any better than that and nothing so
               | far implicated it will (with this particular approach to
               | AGI or whatever the wet dream is)
        
               | marcellus23 wrote:
               | > Those Apple engineers
               | 
               | Which Apple engineers? Yours is the only reference to the
               | company in this comment section or in the article.
        
               | verteu wrote:
               | (for reference: https://arxiv.org/pdf/2410.05229 )
        
           | VeejayRampay wrote:
           | really agree with this and I think it's been the general
           | experience: people wanting LLMs to be so great (or making
           | money off them) kind of cherry picking examples that fit
           | their narrative, which LLMs are good at because they produce
           | amazing results some of the time like the deluxe broken clock
           | that they are (they're right many many times a day)
           | 
           | at the end of the day though, it's not exactly reliable or
           | particularly transformative when you get past the party
           | tricks
        
           | archiepeach wrote:
           | To be fair in the human-based teams I've worked with in
           | startups I couldn't show you products with decent retention.
        
         | whiplash451 wrote:
         | The main difference between GPT5 and a PhD-level new hire is
         | that the new hire will autonomously go out, deliver and take on
         | harder task with much fewer guidance than GPT5 will ever
         | require. So much of human intelligence is about interacting
         | with peers.
        
           | ben_w wrote:
           | Human interaction with peers is also guidance.
           | 
           | I don't know how many team meetings PhD students have, but I
           | do know about software development jobs with 15 minute daily
           | standups, and that length meeting at 120 words per minute for
           | 5 days a week, 48 weeks per year of a 3 year PhD is 1.296.000
           | words.
        
             | eastbound wrote:
             | I have 3 remote employees whose job is consistently as bad
             | as LLM.
             | 
             | That means employees who use LLM are, on average,
             | recognizably bad. Those who are good enough, are also good
             | enough to write the code manually.
             | 
             | To the point I wonder whether this HN thread is generated
             | by OpenAI, trying to create buzz around AI.
        
               | ben_w wrote:
               | 1. The person I'm replying to is hypothesising about a
               | future, not yet existent, version, GPT5. Current quality
               | limits don't tell you jack about a hypothetical future,
               | especially one that may not ever happen because money.
               | 
               | 2. I'm not commenting on the quality, because they were
               | writing about something that doesn't exist and therefore
               | that's clearly just a given for the discussion. The only
               | thing I was adding is that humans _also_ need guidance,
               | and quite a lot of it -- even just a two-week sprint 's
               | worth of 15 minute daily stand-up meetings is 18,000
               | words, which is well beyond the point where I'd have
               | given up prompting an LLM and done the thing myself.
        
         | EGreg wrote:
         | I want to stuff a transcript of a 3 hour podcast into some LLM
         | API and have it summarize it by: segmenting by topic changes,
         | keeping the timestamps, and then summarizing each segment.
         | 
         | I wasn't able to get it do it with Anthropic or OpenAI chat
         | completion APIs. Can someone explain why? I don't think the
         | 200K token window actually works, is it looking sequentially or
         | is it really looking at the whole thing at once or something?
        
         | anonzzzies wrote:
         | The current models are very powerful and we definitely didn't
         | get most out of them yet. We are getting more and more out of
         | them every week when we release new versions of our toolkits.
         | So if this is it; please make it faster and take less energy.
         | We'll be fine until the next AI spring.
        
         | simonw wrote:
         | Right. I've been saying for a while that if all LLM development
         | stopped entirely and we were stuck with the models we have
         | right now (GPT-4o, Claude 3.5 Sonnet, Gemini 1.5 Pro, Llama
         | 3.1/2, Qwen 2.5 etc) we could still get multiple years worth of
         | advances just out of those existing models. There is SO MUCH we
         | haven't figured out about how to use them yet.
        
         | 23B1 wrote:
         | The user interface for LLMs is stuck in C:\
         | 
         | That's where I'd focus.
        
           | kenjackson wrote:
           | Voice for LLMs is surprisingly good. I'd love to see LLMs
           | used in more systems like cars and in-home automation.
           | Whatever cars use today and Alexa in the home simply are much
           | worse than what we get with ChatGPT voice today.
        
         | ericmcer wrote:
         | I have tried a few AI coding tools and always found them
         | impressive but I don't really need something to autocomplete
         | obvious code cases.
         | 
         | Is there an AI tool that can ingest a codebase and locate code
         | based on abstract questions? Like: "I need to invalidate
         | customers who haven't logged in for a month" and it can locate
         | things like relevant DB tables, controllers, services, etc.
        
         | yk wrote:
         | To a certain extent I think we get a better understanding what
         | llms can do, and my estimation for the next ten years is more
         | like best UI ever rather than llms will replace humanity. Now
         | best UI ever is something that can certainly deliver a lot of
         | value, 80% of all buttons in a car should be replaced by
         | actually good voice control, and I think that is were we are
         | going to see a lot of very interesting applications: Hey
         | washing machine, this is two t-shirts and a jeans. (The washing
         | machine can then figure out it's program by itself, I don't
         | want to memorize the table in the manual.)
        
           | lokimedes wrote:
           | To each their own, but I don't look forward to having my kids
           | yelling, a podcast in my ears and having to explain to my
           | tumbler that wool must be spun at 1000 RPM. Humans have
           | varying preferences when it comes to communication and
           | sensing, making our machine interactions favor the
           | extroverted talkative exhibitionists is really only one
           | modality.
        
         | machiaweliczny wrote:
         | Long context is a scam. Claude is best but it's still gets lost
         | with longer context
        
           | bbor wrote:
           | I have no data, but I whole-heartedly agree. Well, perhaps
           | not "scam", but definitely oversold. One of my best undergrad
           | professors taught me the adage "don't expect a model to do
           | what a human expert cannot", and I think it's still a good
           | rule of thumb. Giving someone an entire book to read before
           | answering your question _might_ help, but it would help way,
           | way more to give them a few paragraphs that you know are
           | actually relevant.
        
           | cruffle_duffle wrote:
           | In my experience, the reality of long context windows doesn't
           | live up to the hype. When you're iterating on something,
           | whether it's code, text, or any document, you end up with
           | multiple versions layered in the context. Every time you
           | revise, those earlier versions stick around, even though only
           | the latest one is the "most correct".
           | 
           | What gets pushed out isn't the last version of the document
           | itself (since it's FIFO), but the important parts of the
           | conversation--things like the rationale, requirements, or any
           | context the model needs to understand why it's making
           | changes. So, instead of being helpful, that extra capacity
           | just gets filled with old, repetitive chunks that have to be
           | processed every time, muddying up the output. This isn't just
           | an issue with code; it happens with any kind of document
           | editing where you're going back and forth, trying to refine
           | the result.
           | 
           | Sometimes I feel the way to "resolve" this is to instead go
           | back and edit some earlier portion of the chat to update it
           | with the "new requirements" that I didn't even know I had
           | until I walked down some rabbit hole. What I end up with is
           | almost like a threaded conversation with the LLM. Like, I
           | sometimes wish these LLM chatbots explicitly treated the
           | conversion as if it were threaded. They do support basically
           | my use case by letting you toggle between different edits to
           | your prompts, but it is pretty limited and you cannot go back
           | and edit things if you do some operations (eg: attach a
           | file).
           | 
           | Speaking of context, it's also hard to know what things like
           | ChatGPT add to it's context in the first place. Many of times
           | I'll attach a file or something and discover it didn't "read"
           | the file into it's context. Or I'll watch it fire up a python
           | program it writes that does nothing but echo the file into
           | it's context.
           | 
           | I think there is still a lot of untapped potential in
           | strategically manipulating what gets placed into the context
           | window at all. For example only present the LLM with the
           | latest and greatest of a document and not all the previous
           | revisions in the thread.
        
         | bbor wrote:
         | Great question. Im very confident in my answer, even though
         | it's in the minority here: we're not even close to exhausting
         | the potential.
         | 
         | Imagine that our current capabilities are like the Model-T.
         | There remains many improvements to be made upon this passenger
         | transportation product, with RAG being a great common theme
         | among them. People will use chatbots with much more permissive
         | interfaces instead of clicking through menus.
         | 
         | But all of that's just the start, the short term, the
         | maturation of this consumer product; the really scary/exciting
         | part comes when the technology reaches saturation, and opens up
         | new possibilities for itself. In the Model-T metaphor, this is
         | analogous to how highways have (arguably) transformed America
         | beyond anyone's wildest dreams, changing the course of various
         | historical events (eg WWII industrialization, 60s & 70s white
         | flight, early 2000s housing crisis) so much it's hard to
         | imagine what the country would look like without them. Now,
         | automobiles are not simply passenger transportation, but the
         | bedrock of our commerce, our military, and probably more --
         | through ubiquity alone they unlocked new forms of themselves.
         | 
         | For those doubting my utopian/apocalyptic rhetoric, I implore
         | you to ask yourself one simple question: why are so many
         | experts so worried about AGI? They've been leaving in droves
         | from OpenAI, and that's ultimately what the governance
         | kerfluffle there was. Hinton, a Turing award winner, gave up
         | $$$ to doom-say full time. Why?
         | 
         | My hint is that if your answer involves less then a 1000
         | specialized LLMs per unified system, then you're not thinking
         | big enough.
        
           | fire_lake wrote:
           | > Hinton, a Turing award winner, gave up $$$ to doom-say full
           | time
           | 
           | This is a hint of something but a weak argument. Smart people
           | are wrong all the time.
        
         | robrenaud wrote:
         | > For example, combining a human-moderated knowledge graph with
         | an LLM with RAG allows you to build "expert bots" that
         | understand your business context / your codebase / your
         | specific processes and act almost human-like similar to a
         | coworker in your team.
         | 
         | I'd love to hear about this. I applied to YC WC 25 with
         | research/insight/an initial researchy prototype built on top of
         | GPT4+finetuning about something along this idea. Less powerful
         | than you describe, but it also works without the human
         | moderated KG.
        
         | bloppe wrote:
         | > you can have LLMs create reasonable code changes, with
         | automatic review / iteration etc.
         | 
         | Nobody who takes code health and sustainability seriously wants
         | to hear this. You absolutely do not want to be in a position
         | where something breaks, but your last 50 commits were all
         | written and reviewed by an LLM. Now you have to go back and
         | review them all with human eyes just to get a handle on how
         | things broke, while customers suffer. At this scale, it's an
         | effort multiplier, not an effort reducer.
         | 
         | It's still good for generating little bits of boilerplate,
         | though.
        
         | moogly wrote:
         | > you can have LLMs create reasonable code changes
         | 
         | Could you define "code changes" because I feel that is a very
         | vague accomplishment.
        
         | nonameiguess wrote:
         | Your hypothesis here is not exclusive of the hypothesis in this
         | article.
         | 
         | Name your platform. Linux. C++. The Internet. The x86 processor
         | architecture. We haven't exhausted the options for delivering
         | value on top of those, but that doesn't mean the developers and
         | sellers of those platforms don't try to improve them anyway and
         | might struggle to extract value from application developers who
         | use them.
        
       | polskibus wrote:
       | In other news, Altman said AGI is coming next year
       | https://www.tomsguide.com/ai/chatgpt/sam-altman-claims-agi-i...
        
         | Jyaif wrote:
         | According to the article, he said it _could_ be achieved in
         | 2025, which seems pretty obvious to me as well even though I
         | don 't have any visibility into what is going on inside those
         | companies.
        
       | user90131313 wrote:
       | AI market top very soon
        
       | fallat wrote:
       | What a stupid piece. We are making leaps every 6 months still.
       | Tell me this when there are no developments for 3 years.
        
         | hatefulmoron wrote:
         | I'm curious, what was the leap after GPT-4? What about the
         | leaps after that, given a leap every 6 months?
        
       | xyst wrote:
       | Many late investors in the genAI space about to be bag holders
        
       | 12_throw_away wrote:
       | Well shoot. It's not like it was patently obvious that this would
       | happen _before_ the industry started guzzling electricity and
       | setting money on fire, right? [1]
       | 
       | [1] https://dl.acm.org/doi/10.1145/3442188.3445922
        
       | kaibee wrote:
       | Not sure where the OP to the comment I meant to reply to is, but
       | I'll just add this here.
       | 
       | > I suspect the path to general intelligence is not that, but
       | we'll see.
       | 
       | I think there's three things that a 'true' general intelligence
       | has which is missing from basic-type-LLMs as we have now.
       | 
       | 1. knowing what you know. <basic-LLMs are here>
       | 
       | 2. knowing what you don't know but can figure out via
       | tools/exploration. <this is tool use/function calling>
       | 
       | 3. knowing what can't be known. <this is knowing that halting
       | problem exists and being able to recognize it in novel
       | situations>
       | 
       | (1) From an LLM's perspective, once trained on corpus of text, it
       | knows 'everything'. It knows about the concept of not knowing
       | something (from having see text about it), (in so far as an LLM
       | knows anything), but it doesn't actually have a growable map of
       | knowledge that it knows has uncharted edges.
       | 
       | This is where (2) comes in, and this is what tool use/function
       | calling tries to solve atm, but the way function calling works
       | atm, doesn't give the LLM knowledge the right way. I know that I
       | don't know what 3,943,034 / 234,893 is. But I know I have a
       | 'function call' of knowing the algorithm for doing long divison
       | on paper. And I think there's another subtle point here: my
       | knowledge in (1) includes the training data generated from
       | running the intermediate steps of the long-division algorithm.
       | This is the knowledge that later generalizes to being able to use
       | a calculator (and this is also why we don't just give kids
       | calculators in elementary school). But this is also why a kid
       | that knows how to do long division on paper, doesn't seperately
       | need to learn when/how to use a calculator, besides the very
       | basics. Using a calculator to do that math feels like 1 step, but
       | actually it does still have all of initial mechanical steps of
       | setting up the problem on paper. You have to type in each digit
       | individually, etc.
       | 
       | (3) I'm less sure of this point now that I've written out point
       | (1) and (2), but that's kinda exactly the thing I'm trying to get
       | at. Its being able to recognize when you need more practice of
       | (1) or more 'energy/capital' for doing (2).
       | 
       | Consider a burger resturant. If you properly populated the
       | context of a ChatGPT-scale model the data for a burger resturant
       | from 1950, and gave it the kinda 'function calling' we're
       | plugging into LLMs now, it could manage it. It could keep track
       | of inventory, it could keep tabs on the employee-subprocesses,
       | knowing when to hire, fire, get new suppliers, all via function
       | calling. But it would never try to become McDonalds, because it
       | would have no model of the the internals of those function-calls,
       | and it would have no ability to investigate or modify the
       | behaviour of those function calls.
        
       | nomendos wrote:
       | "Eureka"!?
       | 
       | At the very early phase of the boom I was among a very few who
       | knew and predicted this (usually most free and deep
       | thinking/knowledgeable). Then my prediction got reinforced by the
       | results. One of the best examples was with one of my experiments
       | that all today's AI's failed to solve tree serialization and de-
       | serialization in each of the DFS(pre-order/in-order/post-order)
       | or BFS(level-order) which is 8 algorithms (2x4) and the result
       | was only 3 correct! Reason is "limited training inputs" since
       | internet and open source does not have other solutions :-) .
       | 
       | So, I spent "some" time and implemented all 8, which took me few
       | days. By the way this proves/demonstrates that ~15-30min
       | pointless leetcode-like interviews are requiring to
       | regurgitate/memorize/not-think. So, as a logical hard consequence
       | there will.has-to be a "crash/cleanup" in the area of leetcode-
       | like interviews as they will just be suddenly proclaimed as
       | "pointless/stupid"). However, I decided not to publish the rest
       | of the 5 solutions :-)
       | 
       | This (and other experiments) confirms hard limits of the LLM
       | approach (even when used with chain-of-thought). Increasing the
       | compute on the problem will produce increasingly smaller and
       | smaller results (inverse exponential/logarithmic/diminishing-
       | returns) = new AGI approach/design is needed and to my knowledge
       | majority of the inve$tment (~99%) is in LLM, so "buckle up" at-
       | some-point/soon?
       | 
       | Impacts and realities; LLM shall "run it's course" (produce some
       | products/results/$$$, get reviewed/$corrected) and whoever
       | survives after that pruning shall earn money on those products
       | while investing in the new research to find new AGI
       | design/approach (which could take quite a long time,... or not).
       | NVDA is at the center of thi$ and time-wise this
       | peak/turn/crash/correction is hard to predict (although I see it
       | on the horizon and min/max time can be estimated). Be aware and
       | alert. I'll stop here and hold my other number of
       | thoughts/opinions/ideas for much deeper discussion. (BTW I am
       | still "full in on NVDA" until,....)
        
       | jmward01 wrote:
       | Every negative headline I see about AI hitting a wall or being
       | over-hyped makes me think of the early 2000's with that new thing
       | the 'internet' (yes, I know the internet is a lot older than
       | that). There is little doubt in my mind that ten years from now
       | nearly every aspect of life will be deeply connected to AI just
       | like the internet took over everything in the late 90's and early
       | 2000's and is now deeply connected to everything now. I'd even
       | hazard to say that AI could be more impactful.
        
         | brookst wrote:
         | And, as I've noted a couple of times in this thread, how many
         | times have we heard that Moore's law is dead and compute has
         | hit a wall?
        
           | moffkalast wrote:
           | Well according to Nvidia you can just ignore Moore's law and
           | start requiring people to install multi kilowatt outlets just
           | for their cards. Who needs efficiency amirite?
        
         | akomtu wrote:
         | AI can be thought of as the 2nd stage of the creature that we
         | call the Internet. The 1st stage, that we are so familiar with,
         | is about gathering knowledge into a giant and somewhat
         | organized library. This library has books on every subject
         | imaginable, but its scale is so vast that no living human today
         | can grasp it. This is why the originally connected network has
         | started falling apart. Once this I becomes AI, all the books in
         | the library will be melted together into one coherent picture.
         | Once again, anyone anywhere on Earth will be able to access all
         | the knowledge and our Babylon will stay for a little longer.
        
         | JohnMakin wrote:
         | It's strange to me that's your takeaway. The reason that the
         | internet was overhyped in the 2000's is because it _was_ and
         | also heavily overvalued. It took a massive correction and
         | seriously disruptive bubble burst to break the delusion and
         | move on to something more sustainable.
        
           | jmward01 wrote:
           | I disagree that it was over hyped. It has transformed our
           | society so much that I would argue it was vastly under-hyped.
           | Sure, there were a lot of silly companies that sprang up and
           | went away because they weren't sound, but so much of the
           | modern economy is based on the internet that it is hard to
           | say any business isn't somehow internet related today. You
           | would be hard pressed to find any business anywhere that
           | doesn't at least have a social media account. If 2000 was
           | over-hyping things I just don't see it.
        
             | JohnMakin wrote:
             | pets.com was valued at $400 million based almost completely
             | on its domain name. That's the classic example. People were
             | throwing buckets of money at any .com that resolved to a
             | site and almost all of it failed. I'm not sure how that
             | doesn't meet the definition of over-hyped. It feels very
             | similar to now. Not even to mention - the web largely
             | doesn't consist of .com sites anymore, it's mostly a few
             | centralized sites and apps.
        
         | mvdtnz wrote:
         | Even if you're right (you're not) whatever "AI" looks like in
         | 20+ years will have virtually nothing in common with these
         | stupid statistical word generators.
        
       | LarsDu88 wrote:
       | Curves that look exponential in virtually all cases turn out to
       | be logarithmic.
       | 
       | Certain OpenAI insiders must have known this for a while, hence
       | Ilya Sutskever's new company in Israel
        
       | rubiquity wrote:
       | > Amodei has said companies will spend $100 million to train a
       | bleeding-edge model this year
       | 
       | Is it just me or does $100 million sound like it's on the very,
       | very low end of how much training a new model costs? Maybe you
       | can arrive within $200 million of that mark with amortization of
       | hardware? It just doesn't make sense to me that a new model would
       | "only" be $100 million when AmaGooBookSoft are spending tens of
       | billions on hardware and the AI startups are raising billions
       | every year or two.
        
       | yalogin wrote:
       | I do wonder how quickly llms will become a commodity AI
       | instrument just like any other AI out there. If so what happens
       | to openAI
        
       | russellbeattie wrote:
       | Go back a few decades and you'd see articles like this about CPU
       | manufacturers struggling to improve processor speeds and
       | questioning if Moore's Law was dead. Obviously those concerns
       | were way overblown.
       | 
       | That doesn't mean this article is irrelevant. It's good to know
       | if LLM improvements are going to slow down a bit because the low
       | hanging fruit has seemingly been picked.
       | 
       | But in terms of the overall effect of AI and questioning the
       | validity of the technology as a whole, it's just your basic FUD
       | article that you'd expect from mainstream news.
        
         | danjl wrote:
         | Actually, Moore's Law has been dead for quite a few years now.
         | Since we hit the power wall.
        
       | wildermuthn wrote:
       | Simply put, AGI requires more data: qualia.
        
       | yobid20 wrote:
       | This was predicted. Ai isnt going to get any better.
        
       | jppope wrote:
       | Just an observation. If the models are hitting the top of the
       | S-curve, that might be why Sam Altman raised all the money for
       | OpenAI... it might not be available if Venture Capitalists
       | realize that the gains are close to being done
        
       | m3kw9 wrote:
       | Hold your horses, OpenAI just came out with o1preview 2 months
       | ago, showing what test time computer can do
        
       | devit wrote:
       | It seems obvious to me that Common Crawl plus Github public
       | repositories have more than an enough data to train an AI that is
       | as good as any programmer (at tasks not requiring knowledge of
       | non-public codebases or non-public domain knowledge).
       | 
       | So the problem is more in the algorithm.
        
         | darknoon wrote:
         | I think just reading the code wouldn't make you a good
         | programmer, you'd need to "read" the anti-code, ie what doesn't
         | work, by trial and error. Models overconfidence that their code
         | will work often leads them to fail in practice.
        
           | krisroadruck wrote:
           | AlphaGo got better by playing against itself. I wonder if the
           | pathway forward here is to essentially do the same with
           | coding. Feed it some arbitrary SRS documents - have it
           | attempt to develop them including full code coverage testing.
           | Have it also take on roles of QA, stakeholders, red-team
           | security researchers, and users who are all aggressively
           | trying to find edge cases and point out everything wrong with
           | the application. Have it keep iterating and learn from the
           | findings. Keep feeding it new novel SRSs until the number off
           | attempts/iterations necessary to get a quality product out
           | the other side drops to some acceptable number.
        
       | superjose wrote:
       | I'm more on the camp that these techs don't need to be perfect,
       | but they need to be practical enough.
       | 
       | And I think the latter is good enough for us to do exciting
       | things.
        
         | imiric wrote:
         | How practical can they be when current flagship models generate
         | incorrect responses more than 50% of the time[1]?
         | 
         | This might be acceptable for amusing us with fiction and art,
         | and for filling the internet with even more spam and
         | propaganda, but would you trust them to write reliable code,
         | drive your car or control any critical machinery?
         | 
         | The truly exciting things are still out of reach, yet we just
         | might be at the Peak of Inflated Expectations to see it now.
         | 
         | [1]: https://openai.com/index/introducing-simpleqa/
        
       | Timber-6539 wrote:
       | Direct quote from the article: "The companies are facing several
       | challenges. It's become increasingly difficult to find new,
       | untapped sources of high-quality, human-made training data that
       | can be used to build more advanced AI systems."
       | 
       | The irony here is astounding.
        
       | czhu12 wrote:
       | If it becomes obvious that LLM's have a more narrow set of use
       | cases, rather than the all encompassing story we hear today, then
       | I would bet that the LLM platforms (OpenAI, Anthropic, Google,
       | etc) will start developing products to compete directly with
       | applications that supposed to be building on top of them like
       | Cursor, in an attempt to increase their revenue.
       | 
       | I wonder what this would mean for companies raising today on the
       | premise of building on top of these platforms. Maybe the best
       | ones get their ideas copied, reimplemented, and sold for cheaper?
       | 
       | We already kind of see this today with OpenAI's canvas and Claude
       | artifacts. Perhaps they'll even start moving into Palantir's
       | space and start having direct customer implementation teams.
       | 
       | It is becoming increasing obvious that LLM's are quickly becoming
       | commoditized. Everyone is starting to approach the same limits in
       | intelligence, and are finding it hard to carve out margin from
       | competitors.
       | 
       | Most recently exhibited by the backlash at claude raising prices
       | because their product is better. In any normal market, this would
       | be totally expected, but people seemed shocked that anyone would
       | charge more than the raw cost it would take to run the LLM
       | itself.
       | 
       | https://x.com/ArtificialAnlys/status/1853598554570555614
        
       | quantum_state wrote:
       | Hope this would be a constant reminder that brute force can only
       | get one that far, though it may still be useful when it is. With
       | lots of intuition gained, it's time to ponder things a bit more
       | deeply.
        
         | dmafreezone wrote:
         | Maybe, if you want to relearn the bitter lesson.
         | 
         | http://www.incompleteideas.net/IncIdeas/BitterLesson.html
        
       | cryptica wrote:
       | It's interesting the way things turned out so far with LLMs,
       | especially from the perspective of a software engineer. We are
       | trained to keep a certain skepticism when we see software which
       | appears to be working because, ultimately, the only question we
       | care about is "Does it meet user requirements?" and this is
       | usually framed in terms of users achieving certain goals.
       | 
       | So it's interesting that when AI came along, we threw caution to
       | the wind and started treating it like a silver bullet... Without
       | asking the question of whether it was applicable to this goal or
       | that goal...
       | 
       | I don't think anyone could have anticipated that we could have an
       | AI which could produce perfect sentences, faster than a human,
       | better than a human but which could not reason. It appears to
       | reason very well, better than most people, yet it doesn't
       | actually reason. You only notice this once you ask it to
       | accomplish a task. After a while, you can feel how it lacks
       | willpower. It puts into perspective the importance of willpower
       | when it comes to getting things done.
       | 
       | In any case, LLMs bring us closer to understanding some big
       | philosophical questions surrounding intelligence and
       | consciousness.
        
       | k__ wrote:
       | But AGI is always right around the corner?
       | 
       | I don't get it...
        
       | sssilver wrote:
       | One thing that makes the established AIs less ideal for my
       | (programming) use-case is that the technologies I use quickly
       | evolve past whatever the published models "learn".
       | 
       | On the other hand, a lot of these frameworks and languages have
       | relatively decent and detailed documentation.
       | 
       | Perhaps this is a naive question, but why can't I as a user just
       | purchase "AI software" that comes with a large pre-trained model
       | to which I can say, on my own machine, "go read this
       | documentation and help me write this app in this next version of
       | Leptos", and it would augment its existing model with this new
       | "knowledge".
        
       ___________________________________________________________________
       (page generated 2024-11-14 23:00 UTC)