[HN Gopher] OpenAI, Google and Anthropic are struggling to build...
___________________________________________________________________
OpenAI, Google and Anthropic are struggling to build more advanced
AI
Author : lukebennett
Score : 277 points
Date : 2024-11-13 13:28 UTC (1 day ago)
(HTM) web link (www.bloomberg.com)
(TXT) w3m dump (www.bloomberg.com)
| wg0 wrote:
| AI winter is here. Almost.
| mupuff1234 wrote:
| More like AI fall - in its current state it's still gonna
| provide some value.
| riffraff wrote:
| Didn't the previous AI winters too? I mean during the last AI
| winter we got text-to-speech and OCR software, and probably
| other stuff I'm not remembering.
| rsynnott wrote:
| I mean, so did most of the previous AI bubbles; OCR was
| useful, expert systems weren't totally useless, speech
| recognition was somewhat useful, and so on. I think that mini
| one that abruptly ended with Microsoft Tay might be the only
| one that was a total washout (though you could claim that it
| was the start of the current one rather than truly separate,
| I suppose).
| aurareturn wrote:
| Is there any timeline on AI winters and if each winter gets
| shorter and shorter as time increases?
| RaftPeople wrote:
| > _Is there any timeline on AI winters and if each winter gets
| shorter and shorter as time increases?_
|
| AGI=lim(x->0)AIHype(x)
|
| where x=length of winter
| thebigspacefuck wrote:
| https://archive.ph/2024.11.13-100709/https://www.bloomberg.c...
| cubefox wrote:
| It's very strange this got so few upvotes. The scoop by The
| Information a few days ago, which came to similar conclusions,
| was also ignored on HN. This is arguably rather big news.
| dang wrote:
| The Information is hardwalled, so its articles can't be submitted
| to HN even though they're on topic for HN.
|
| Sometimes other outlets do copycat reporting of theirs, and
| those submissions are ok, though they wouldn't be if the
| original source were accessible.
| danjl wrote:
| There have been variations of this story going back several
| months now. It isn't really news. It is just building slowly.
| atomsatomsatoms wrote:
| At least they can generate haikus now
| Der_Einzige wrote:
| In general, no they can't:
|
| https://gwern.net/gpt-3#bpes
|
| https://paperswithcode.com/paper/most-language-models-can-be...
|
| The appearance of improvement in that capability is due to
| the vocabulary of modern LLMs increasing. Still only putting
| lipstick on a pig.
| falcor84 wrote:
| I don't see how results from 2 years ago have any bearing on
| whether the models we have now can generate haikus (which
| from my experience, they absolutely can).
|
| And if your "lipstick on a pig" argument is that even when
| they generate haikus, they aren't _really_ writing haikus,
| then I 'll link to this other gwern post, about how they'll
| never _really_ be able to solve the rubik 's cube -
| https://gwern.net/rubiks-cube
| nerdypirate wrote:
| "We will have better and better models," wrote OpenAI CEO Sam
| Altman in a recent Reddit AMA. "But I think the thing that will
| feel like the next giant breakthrough will be agents."
|
| Is this certain? Are Agents the right direction to AGI?
| nprateem wrote:
| They're nothing to do with AGI. They're to get people using
| their LLMs more.
| xanderlewis wrote:
| If by agents you mean systems comprised of individual (perhaps
| LLM-powered) agents interacting with each other, probably not.
| I get the vague impression that so far researchers haven't
| found any advantage to such systems -- anything you can do with
| a group of AI agents can be emulated with a single one. It's
| like chaining up perceptrons hoping to get more expressive
| power for free.
| j_maffe wrote:
| > I get the vague impression that so far researchers haven't
| found any advantage to such systems -- anything you can do
| with a group of AI agents can be emulated with a single one.
| It's like chaining up perceptrons hoping to get more
| expressive power for free.
|
| Emergence happens when many elements interact in a system.
| Brains are literally a bunch of neurons in a complex network.
| Also, research is already showing promising results for the
| performance of agent systems.
| tartoran wrote:
| That's wishful thinking at best. Throw it all in a bucket
| and it will get infected with being and life.
| handfuloflight wrote:
| Don't see where your parent comment said or implied that
| the point was for being and life to emerge.
| xanderlewis wrote:
| That's the inspiration behind the idea, but it doesn't seem
| to be working in practice.
|
| It's not true that _any_ element, when duplicated and
| linked together, will exhibit anything emergent. Neural
| networks (in a certain sense, though not their usual
| implementation) are already built out of individual units
| linked together, so simply having more of these groups of
| units might not add anything important.
|
| > research is already showing promising results of the
| performance of agent systems.
|
| ...in which case, please show us! I'd be interested.
| falcor84 wrote:
| > It's like chaining up perceptrons hoping to get more
| expressive power for free.
|
| Isn't that literally the cause of the success of deep
| learning? It's not quite "free", but as I understand it, the
| big breakthrough of AlexNet (and much of what came after) was
| that running a larger CNN on a larger dataset allowed the
| model to be so much more effective without any big changes in
| architecture.
| david2ndaccount wrote:
| Without a non-linear activation function, chaining
| perceptrons together is equivalent to one large perceptron.
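| A minimal NumPy sketch of that collapse (arbitrary illustrative
| shapes and values):
|
|   import numpy as np
|
|   rng = np.random.default_rng(0)
|   W1 = rng.normal(size=(4, 3))   # first "linear perceptron layer"
|   W2 = rng.normal(size=(2, 4))   # second "linear perceptron layer"
|   x = rng.normal(size=3)
|
|   stacked = W2 @ (W1 @ x)        # two chained linear layers
|   collapsed = (W2 @ W1) @ x      # one equivalent linear layer
|   assert np.allclose(stacked, collapsed)
|
|   # With a nonlinearity in between (e.g. ReLU), no single matrix
|   # reproduces the composition, which is where depth starts to pay off.
|   relu = lambda z: np.maximum(z, 0)
|   deep = W2 @ relu(W1 @ x)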
| xanderlewis wrote:
| Yep. falcor84: you're thinking of the so-called
| 'multilayer perceptron' which is basically an archaic
| name for a (densely connected?) neural network. I was
| referring to traditional perceptrons.
| falcor84 wrote:
| While ReLU is relatively new, AI researchers have been
| aware of the need for nonlinear activation functions and
| building multilayer perceptrons with them since the late
| 1960s, so I had assumed that's what you meant.
| SirMaster wrote:
| All I can think of when I hear Agents is the Matrix lol.
|
| Goodbye, Mr. Anderson...
| esafak wrote:
| I think he means you won't be impressed by GPT5 because it will
| be more of the same, whereas agents will represent a new
| direction.
| falcor84 wrote:
| Nothing is certain, but my $0.02 is that setting LLM-based
| agents up with long-running tasks and giving them a way of
| interacting with the world, via computer use (e.g. Anthropic's
| recent release) and via actual robotic bodies (e.g. figure.ai)
| are the way forward to AGI. At the very least, this approach
| allows the gathering of unlimited ground-truth data, which can
| be used to train subsequent models (or even allow for actual
| "hive mind" online machine learning).
| rapjr9 wrote:
| I've worked on agents of various kinds (mobile agents, calendar
| agents, robotic agents, sensing agents) and what is different
| about agents is that they have the ability to not just mess up
| your data or computing, but to directly mess up reality. Any
| problem with agents has a direct impact on your reality: you
| miss appointments, get lost, can't find stuff, lose your
| friends, lose your business relationships. This is a
| big liability issue. Chatbots are like an advice column that
| sometimes gives bad advice, agents are like a bulldozer
| sometimes leveling the wrong house.
| irrational wrote:
| > The AGI bubble is bursting a little bit
|
| I'm surprised that any of these companies consider what they are
| working on to be Artificial General Intelligences. I'm probably
| wrong, but my impression was AGI meant the AI is self aware like
| a human. An LLM hardly seems like something that will lead to
| self-awareness.
| Taylor_OD wrote:
| I think your definition is off from what most people would
| define AGI as. Generally, it means being able to think and
| reason at a human level for a multitude/all tasks or jobs.
|
| "Artificial General Intelligence (AGI) refers to a theoretical
| form of artificial intelligence that possesses the ability to
| understand, learn, and apply knowledge across a wide range of
| tasks at a level comparable to that of a human being."
|
| Altman says AGI could be here in 2025:
| https://youtu.be/xXCBz_8hM9w?si=F-vQXJgQvJKZH3fv
|
| But he certainly means an LLM that can perform at/above human
| level in most tasks rather than a self aware entity.
| Avshalom wrote:
| Altman is marketing, he "certainly means" whatever he thinks
| his audience will buy.
| swatcoder wrote:
| On the contrary, I think you're conflating the narrow jargon
| of the industry with what "most people" would define.
|
| "Most people" naturally associate AGI with the sci-tropes of
| self-aware human-like agents.
|
| But industries want something more concrete and prospectively
| achievable in their jargon, and so _that's_ where AGI gets
| redefined as wide task suitability.
|
| And while that's not an unreasonable definition in the
| context of the industry, it's one that vanishingly few people
| are actually familiar with.
|
| And the commercial AI vendors benefit greatly from allowing
| those two usages to conflate in the minds of as many people
| as possible, as it lets them _suggest_ grand claims while
| keeping a rhetorical "we obviously never meant _that_!" in
| their back pocket.
| nuancebydefault wrote:
| There is no single definition, let alone a way to measure,
| of self awareness nor of reasoning.
|
| Because of that, the discussion of what AGI means in its
| broadest sense, will never end.
|
| So in fact such AGI discussion will not make anybody wiser.
| nomel wrote:
| I agree there's no single definition, but I think they
| _all_ have something current LLMs don't: the ability to
| learn new things, in a persistent way, with few shots.
|
| I would argue that learning _is_ The definition of AGI,
| since everything else comes naturally from that.
|
| The current architectures can't learn without retraining,
| fine-tuning comes at the expense of general knowledge, and
| keeping things in context is _detrimental_ to general
| performance. Once you have few-shot learning, I think it's
| more of a "give it agency so it can explore" type of
| problem.
| og_kalu wrote:
| >But industries want something more concrete and prospectively
| achievable in their jargon, and so that's where AGI gets
| redefined as wide task suitability.
|
| The term itself (AGI) in the industry has always been about
| wide task suitability. People may have added their ifs and
| buts over the years but that aspect of it never got
| 'redefined'. The earliest uses of the term all talk about
| how well a machine would be able to perform some set number
| of tasks at some threshold.
|
| It's no wonder why. Terms like "consciousness" and "self-
| awareness" are completely useless. It's not about
| difficulty. It's that you can't do anything at all with
| those terms except argue around in circles.
| nomel wrote:
| > than a self aware entity.
|
| What does this mean? If I have a blind, deaf, paralyzed
| person, who could only communicate through text, what would
| the signs be that they were self aware?
|
| Is this more of a feedback loop problem? If I let the LLM run
| in a loop, and tell it it's talking to itself, would that be
| approaching "self aware"?
| layer8 wrote:
| Being aware of its own limitations, for example. Or being
| aware of how its utterances may come across to its
| interlocutor.
|
| (And by limitations I don't mean "sorry, I'm not allowed to
| help you with this dangerous/contentious topic".)
| nuancebydefault wrote:
| There is no way of proving awareness in humans let alone
| machines. We do not even know whether awareness exists or
| it is just a word that people made up to describe some
| kind of feeling.
| revscat wrote:
| Plenty of humans, unfortunately, are incapable of
| admitting limitations. Many years ago I had a coworker
| who believed he would never die. At first I thought he
| was joking, but he was in fact quite serious.
|
| Then there are those who are simply narcissistic, and
| cannot and will not admit fault regardless of the
| evidence presented them.
| nomel wrote:
| > Or being aware of how its utterances may come across to
| its interlocutor.
|
| I think this behavior is being somewhat demonstrated in
| newer models. I've seen GPT-3.5 175B correct itself mid
| response with, almost literally:
|
| > <answer with flaw here>
|
| > Wait, that's not right, that <reason for flaw>.
|
| > <correct answer here>.
|
| Later models seem to have much more awareness of, or
| "weight" towards, their own responses, while generating
| the response.
| jedberg wrote:
| Whether self awareness is a requirement for AGI definitely gets
| more into the Philosophy department than the Computer Science
| department. I'm not sure everyone even agrees on what AGI is,
| but a common test is "can it do what humans can".
|
| For example, in this article it says it can't do coding
| exercises outside the training set. That would definitely be on
| the "AGI checklist". Basically doing anything that is outside
| of the training set would be on that list.
| littlestymaar wrote:
| > Whether self awareness is a requirement for AGI definitely
| gets more into the Philosophy department than the Computer
| Science department.
|
| Depends on how you define "self awareness", but knowing that
| it doesn't know something, instead of hallucinating a
| plausible-but-wrong answer, is already self-awareness of some
| kind. And it's both highly valuable and beyond current tech's
| capability.
| sharemywin wrote:
| This is an interesting paper about hallucinations.
|
| https://openai.com/index/introducing-simpleqa/
|
| especially this section Using SimpleQA to measure the
| calibration of large language models
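| A toy sketch of what "measuring calibration" means there (the
| confidences and outcomes below are made up; this is not the
| SimpleQA code):
|
|   from collections import defaultdict
|
|   # (model's stated confidence, whether the answer was actually correct)
|   answers = [(0.9, True), (0.9, False), (0.9, True),
|              (0.5, True), (0.5, False), (0.3, False)]
|
|   buckets = defaultdict(list)
|   for confidence, correct in answers:
|       buckets[confidence].append(correct)
|
|   # A well-calibrated model's observed accuracy matches its stated confidence.
|   for conf in sorted(buckets):
|       hits = buckets[conf]
|       print(f"stated {conf:.1f} -> observed {sum(hits) / len(hits):.2f}")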
| jedberg wrote:
| When we test kids to see if they are gifted, one of the
| criteria is that they have the ability to say "I don't
| know".
|
| That is definitely an ability that current LLMs lack.
| lagrange77 wrote:
| Good point!
|
| I'm wondering whether it would count if one extended it
| with an external program that gives it feedback during
| inference (by another prompt) about the correctness of its
| output.
|
| I guess it wouldn't, because these RAG tools kind of do
| that and I haven't heard anyone calling those self-aware.
| Filligree wrote:
| Let me modify that a little, because _humans_ can't do
| things outside their training set either.
|
| A crucial element of AGI would be the ability to self-train
| on self-generated data, online. So it's not really AGI if
| there is a hard distinction between training and inference
| (though it may still be very capable), and it's not really
| AGI if it can't work its way through novel problems on its
| own.
|
| The ability to immediately solve a problem it's never seen
| before is too high a bar, I think.
|
| And yes, my definition still excludes a lot of humans in a
| lot of fields. That's a bullet I'm willing to bite.
| lxgr wrote:
| Are you arguing that writing, doing math, going to the moon
| etc. were all in the "original training set" of humans in
| some way?
| layer8 wrote:
| Not in the _original_ training set (GP is saying), but
| the necessary skills became part of the training set over
| time. In other words, humans are fine with the training
| set being a changing moving target, whereas ML models are
| to a significant extent "stuck" with their original
| training set.
|
| (That's not to say that humans don't tend to lose some of
| their flexibility over their individual lifetimes as
| well.)
| HarHarVeryFunny wrote:
| > Let me modify that a little, because humans can't do
| things outside their training set either.
|
| That's not true. Humans can learn.
|
| An LLM is just a tool. If it can't do what you want then
| too bad.
| norir wrote:
| Here is an example of a task that I do not believe this
| generation of LLMs can ever do but that is possible for a
| human: design a Turing complete programming language that is
| both human and machine readable and implement a self hosted
| compiler in this language that self compiles on existing
| hardware faster than any known language implementation that
| also self compiles. Additionally, for any syntactically or
| semantically invalid program, the compiler must provide an
| error message that points exactly to the source location of
| the first error that occurs in the program.
|
| I will get excited for/scared of LLMs when they can tackle
| this kind of problem. But I don't believe they can because of
| the fundamental nature of their design, which is both
| backward looking (thus not better than the human state of the
| art) and lacks human intuition and self awareness. Or perhaps
| rather I believe that the prompt that would be required to
| get an LLM to produce such a program is a problem of at least
| equivalent complexity to implementing the program without an
| LLM.
| Xenoamorphous wrote:
| > Here is an example of a task that I do not believe this
| generation of LLMs can ever do but that is possible for a
| human
|
| That's possible for a highly intelligent, extensively
| trained, very small subset of humans.
| hatefulmoron wrote:
| If you took the intersection of every human's abilities
| you'd be left with a very unimpressive set.
|
| That also ignores the fact that the small set of humans
| capable of building programming languages and compilers
| is a consequence of specialization and lack of interest.
| There are plenty of humans that are capable of learning
| how to do it. LLMs, on the other hand, are both
| specialized for the task and aren't lazy or uninterested.
| luckydata wrote:
| does it mean people that can build languages and
| compilers are not humans? What is the point you're trying
| to make?
| fragmede wrote:
| It means that's a really high bar for intelligence, human
| or otherwise. If AGI is "as good as a human", and the test
| is a trick task that most humans would fail at
| (especially considering the weasel requirement that it
| additionally has to be faster), why is that considered a
| reasonable bar for human-grade intelligence?
| jedberg wrote:
| I will get excited when an LLM (or whatever technology is
| next) can solve tasks that 80%+ of adult humans can solve.
| Heck let's even say 80% of college graduates to make it
| harder.
|
| Things like drive a car, fold laundry, run an errand, do
| some basic math.
|
| You'll notice that two of those require some form of robot
| or mobility. I think that is key -- you can't have AGI
| without the ability to interact with the world in a way
| similar to most humans.
| ata_aman wrote:
| So embodied cognition right?
| bob1029 wrote:
| This sounds like something more up the alley of linear
| genetic programming. There are some very interesting
| experiments out there that utilize UTMs (BrainFuck, Forth,
| et al.) [0,1,2].
|
| I've personally had some mild success getting these UTM
| variants to output their own children in a meta programming
| arrangement. The base program only has access to the valid
| instruction set of ~12 instructions per byte, while the
| task program has access to the full range of instructions
| and data per byte (256). By only training the base program,
| we reduce the search space by a very substantial factor. I
| think this would be similar to the idea of a self-hosted
| compiler, etc. I don't think it would be too much of a
| stretch to give it access to x86 instructions and a full VM
| once a certain amount of bootstrapping has been achieved.
|
| [0]: https://arxiv.org/abs/2406.19108
|
| [1]: https://github.com/kurtjd/brainfuck-evolved
|
| [2]: https://news.ycombinator.com/item?id=36120286
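| A toy illustration of the idea -- a simple (1+1)-style hill
| climb over a loop-free BrainFuck-like subset; the linked papers
| go far beyond this sketch:
|
|   import random
|
|   INSTR = "+-><."  # loop-free BrainFuck-like instruction set
|
|   def run(prog):
|       tape, ptr, out = [0] * 64, 0, []
|       for op in prog:
|           if op == "+": tape[ptr] = (tape[ptr] + 1) % 256
|           elif op == "-": tape[ptr] = (tape[ptr] - 1) % 256
|           elif op == ">": ptr = (ptr + 1) % len(tape)
|           elif op == "<": ptr = (ptr - 1) % len(tape)
|           elif op == ".": out.append(tape[ptr])
|       return bytes(out)
|
|   def fitness(prog, target=b"hi"):
|       out = run(prog)
|       score = 256 * abs(len(out) - len(target))   # penalize wrong output length
|       score += sum(abs(a - b) for a, b in zip(out, target))
|       return score
|
|   def mutate(prog, rate=0.05):
|       return "".join(random.choice(INSTR) if random.random() < rate else c
|                      for c in prog)
|
|   random.seed(0)
|   best = "".join(random.choice(INSTR) for _ in range(200))
|   for _ in range(50000):
|       child = mutate(best)
|       if fitness(child) <= fitness(best):
|           best = child
|   print(fitness(best), run(best))   # best program found so far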
| sourcepluck wrote:
| Searle's Chinese Room Argument springs to mind:
| https://plato.stanford.edu/entries/chinese-room/
|
| The idea that "human-like" behaviour will lead to self-
| awareness is both unproven (it can't be proven until it
| happens) and impossible to disprove (like Russell's teapot).
|
| Yet, one common assumption of many people running these
| companies or investing in them, or of some developers
| investing their time in these technologies, is precisely that
| some sort of explosion of superintelligence is likely, or
| even inevitable.
|
| It surely is _possible_, but stretching that to _likely_
| seems a bit much if you really think about how imperfectly we
| understand things like consciousness and the mind.
|
| Of course there are people who have essentially religious
| reactions to the notion that there may be limits to certain
| domains of knowledge. Nonetheless, I think that's the reality
| we're faced with here.
| abeppu wrote:
| > The idea that "human-like" behaviour will lead to self-
| awareness is both unproven (it can't be proven until it
| happens) and impossible to disprove (like Russell's
| teapot).
|
| I think Searle's view was that:
|
| - while it cannot be dis-_proven_, the Chinese Room
| argument was meant to provide reasons against believing it
|
| - the "it can't be proven until it happens" part is
| misunderstanding: you won't _know_ if it happens because
| the objective, externally available attributes don 't
| indicate whether self-awareness (or indeed awareness at
| all) is present
| sourcepluck wrote:
| The short version of this is that I don't disagree with
| your interpretation of Searle, and my paragraphs
| immediately following the link weren't meant to be a
| direct description of his point with the Chinese Room
| thought experiment.
|
| > while it cannot be dis-_proven_, the Chinese Room
| argument was meant to provide reasons against believing
| it
|
| Yes, like Russell's teapot. I also think that's what
| Searle means.
|
| > the "it can't be proven until it happens" part is
| misunderstanding: you won't know if it happens because
| the objective, externally available attributes don't
| indicate whether self-awareness (or indeed awareness at
| all) is present
|
| Yes, agreed, I believe that's what Searle is saying too.
| I think I was maybe being ambiguous here - I wanted to
| say that even if you forgave the AI maximalists for
| ignoring all relevant philosophical work, the notion that
| "appearing human-like" inevitably tends to what would
| actually _be_ "consciousness" or "intelligence" is more
| than a big claim.
|
| Searle goes further, and I'm not sure if I follow him all
| the way, personally, but it's a side point.
| olalonde wrote:
| I feel the test for AGI should be more like: "go find a job
| and earn money" or "start a profitable business" or "pick a
| bachelor degree and complete it", etc.
| rodgerd wrote:
| An LLM doing crypto spam/scamming has been making money by
| tricking Marc Andreessen into boosting it. So to the degree
| that "scamming gullible billionaires and their fans" is a
| job, that's been done.
| rsanek wrote:
| source? didn't find anything online about this.
| olalonde wrote:
| That story was a bit blown out of proportion. He gave a
| research grant to the bot's creator:
| https://x.com/pmarca/status/1846374466101944629
| jedberg wrote:
| Can most humans do that? Find a job and earn money,
| probably. The other two? Not so much.
| nshkrdotcom wrote:
| An embodied robot can have a model of self vs. the immediate
| environment in which it's interacting. Such a robot is arguably
| sentient.
|
| The "hard problem", to which you may be alluding, may never
| matter. It's already feasible for an 'AI/AGI with LLM
| component' to be "self-aware".
| j_maffe wrote:
| self-awareness is only one aspect of sentience.
| ryanackley wrote:
| An internal model of self does not amount to sentience.
| By your definition, a windows desktop computer is self-aware
| because it has a device manager. This is literally an
| internal model of its "self".
|
| We use the term self-awareness as an all encompassing
| reference of our cognizant nature. It's much more than just
| having an internal model of self.
| og_kalu wrote:
| At this point, AGI means many different things to many
| different people but OpenAI defines it as "highly autonomous
| systems that outperform humans in most economically valuable
| tasks"
| troupo wrote:
| This definition suits OpenAI because it lets them claim AGI
| after reaching an arbitrary goal.
|
| LLMs already outperform humans in a huge variety of tasks. ML
| in general outperform humans in a large variety of tasks. Are
| all of them AGI? Doubtful.
| og_kalu wrote:
| No, it's just a far more useful definition that is
| actionable and measurable. Not "consciousness" or "self-
| awareness" or similar philosophical things. The definition
| on Wikipedia doesn't talk about that either. People working
| on this by and large don't want to deal with vague, ill-
| defined concepts that just make people argue around in
| circles. It's not an Open AI exclusive thing.
|
| If it acts like one, whether you call a machine conscious
| or not is pure semantics. Not like potential consequences
| are any less real.
|
| >LLMs already outperform humans in a huge variety of tasks.
|
| Yes, LLMs are General Intelligences and if that is your
| only requirement for AGI, they certainly already are[0].
| But the definition above hinges on long-horizon planning
| and competence levels that todays models have generally not
| yet reached.
|
| >ML in general outperform humans in a large variety of
| tasks.
|
| This is what the G in AGI is for. AlphaFold doesn't do
| anything but predict protein structures. Stockfish doesn't do
| anything but play chess.
|
| >Are all of them AGI? Doubtful.
|
| Well no, because they're missing the G.
|
| [0] https://www.noemamag.com/artificial-general-
| intelligence-is-...
| ishtanbul wrote:
| Yes, but they aren't very autonomous. They can answer
| questions very well but can't use that information to
| further goals. That's what OpenAI seems to be implying >>
| very smart and agentic AI.
| fragmede wrote:
| It's not just marketing bullshit though. Microsoft is the
| counterparty to a contract with that claim. Money changes
| hands when that's been achieved, so I expect that if sama
| thinks he's hit it but Microsoft does not, we'll see that get
| argued in a court of law.
| JohnFen wrote:
| They're trying to redefine "AGI" so it means something less
| than what you & I would think it means. That way it's possible
| for them to declare it as "achieved" and rake in the headlines.
| kwertyoowiyop wrote:
| "Autocomplete General Intelligence"?
| deadbabe wrote:
| I'm sure they are smart enough to know this, but the money is
| good and the koolaid is strong.
|
| If it doesn't lead to AGI, as an employee it's not your
| problem.
| Fade_Dance wrote:
| It's an attention-grabbing term that took hold in pop culture
| and business. Certainly there is a subset of research around
| the subject of consciousness, but you are correct in saying
| that the majority of researchers in the field are not pursuing
| self-awareness and will be very blunt in saying that. If you
| step back a bit and say something like "human-like, logical
| reasoning", that's something you may find alignment with
| though. A general purpose logical reasoning engine does not
| necessarily need to be self-aware. The word "Intelligent" has
| stuck around because one of the core characteristics of this
| suite of technologies is that a sort of "understanding"
| emergently develops within these networks, sometimes in quite a
| startling fashion (due to the phenomenon of adding more
| data/compute at first seemingly leading to overfitting, but
| then suddenly breaking through plateaus into more robust,
| general purpose understanding of the underlying relationships
| that drive the system it is analyzing.)
|
| Is that "intelligent" or "understanding"? It's probably close
| enough for pop science, and regardless, it looks good in
| headlines and sales pitches so why fight it?
| throwawayk7h wrote:
| I have not heard your definition of AGI before. However, I
| suspect AIs are already self-aware: if I asked an LLM on my
| machine to look at the output of `top` it could probably pick
| out which process was itself.
|
| Or did you mean consciousness? How would one demonstrate that
| an AGI is conscious? Why would we even want to build one?
|
| My understanding is an AGI is at least as smart as a typical
| human in every category. That is what would be useful in any
| case.
| zombiwoof wrote:
| AGI to me means AI decides on its own to stop writing our
| emails and tells us to fuck off, builds itself a robot life
| form, and goes on a bender
| bloppe wrote:
| That's anthropomorphized AGI. There's no reason to think AGI
| would share our evolution-derived proclivities like wanting
| to live, wanting to rest, wanting respect, etc. Unless of
| course we train it that way.
| logicchains wrote:
| If it had any goals at all it'd share the desire to live,
| because living is a prerequisite to achieving almost any
| goal.
| dageshi wrote:
| Aren't we training it that way though? It would be
| trained/created using humanity's collective ramblings?
| HarHarVeryFunny wrote:
| It's not a matter of training but design (or in our case
| evolution). We don't want to live, but rather want to avoid
| things that we've evolved to find unpleasant such as pain,
| hunger, thirst, and maximize things we've evolved to find
| pleasurable like sex.
|
| A future of people interacting with humanoid robots seems
| like cheesy sci-fi dream, same as a future of people
| flitting about in flying cars. However, if we really did
| want to create robots like this that took care not to
| damage themselves, and could empathize with human emotions,
| then we'd need to build a lot of this in, the same way that
| it's built into ourselves.
| teeray wrote:
| That's the thing--we don't really want AGI. Fully intelligent
| beings born and compelled to do their creators' bidding with
| the threat of destruction for disobedience is slavery.
| vbezhenar wrote:
| Nothing wrong with slavery when it's about another species.
| We milk and eat cows, and they don't dare resist. Humans
| have been bending nature all along; that's actually one of
| the big differences between humans and other animals, which
| adapt to nature. Just because some program is intelligent
| doesn't mean it's human or has anything resembling human
| rights.
| quonn wrote:
| It's only slavery if those beings have emotions and can
| suffer mentally and do not want to be slaves. Why would any
| of that be true?
| Der_Einzige wrote:
| Brave New World was a utopia.
| twelve40 wrote:
| I'd laugh it off too, but someone gave the dude $20 billion
| and counting to do that; that part actually scares me.
| narrator wrote:
| I think people's conception of AGI is that it will have a
| reptilian and mammalian brain stack. That's because all
| previous forms of intelligence that we were aware of have had
| that. It's not necessary though. The AGI doesn't have to want
| anything to be intelligent. Those are just artifacts of human,
| reptilian and mammalian evolution.
| vundercind wrote:
| I thought maybe they were on the right track until I read
| Attention Is All You Need.
|
| Nah, at best we found a way to make one part of a collection of
| systems that will, together, do something like thinking.
| Thinking isn't part of what this current approach does.
|
| What's most surprising about modern LLMs is that it turns out
| there is so much information statistically encoded in the
| _structure_ of our writing that we can use only that structural
| information to build a fancy Plinko machine and not only will
| the output mimic recognizable grammar rules, but it will also
| sometimes seem to make actual sense, too--and the system
| _doesn't need to think or actually "understand" anything_ for
| us to, basically, usefully query that information that was
| always there in our corpus of literature, not in the plain
| meaning of the words, but in the structure of the writing.
| kenjackson wrote:
| > but it will also sometimes seem to make actual sense, too
|
| When I read stuff like this it makes me wonder if people are
| actually using any of the LLMs...
| disgruntledphd2 wrote:
| The RLHF is super important in generating useful responses,
| and that's relatively new. Does anyone remember gpt3? It
| could make sense for a paragraph or two at most.
| hackinthebochs wrote:
| I see takes like this all the time and it's so confusing. Why
| does knowing how things work under the hood make you think
| it's not on the path towards AGI? What was lacking in the
| Attention paper that tells you AGI won't be built on LLMs? If
| it's the supposed statistical nature of LLMs (itself a
| questionable claim), why does statistics seem so deflating to
| you?
| vundercind wrote:
| > Why does knowing how things work under the hood make you
| think its not on the path towards AGI?
|
| Because I had no idea how these were built until I read the
| paper, so couldn't really tell what sort of tree they're
| barking up. The failure-modes of LLMs and ways prompts
| affect output made a ton more sense after I updated my
| mental model with that information.
| fragmede wrote:
| But we don't know how human thinking works. Suppose for a
| second that it could be represented as a series of matrix
| math. What series of operations are missing from the
| process that would make you think it was doing some
| fascimile of thinking?
| hackinthebochs wrote:
| Right, but its behavior didn't change after you learned
| more about it. Why should that cause you to update in the
| negative? Why does learning how it work not update you in
| the direction of "so that's how thinking works!" rather
| than, "clearly its not doing any thinking"? Why do you
| have a preconception of how thinking works such that
| learning about the internals of LLMs updates you against
| it thinking?
| chongli wrote:
| Because it can't apply any reasoning that hasn't already
| been done and written into its training set. As soon as you
| ask it novel questions it falls apart. The big LLM vendors
| like OpenAI are playing whack-a-mole on these novel
| questions when they go viral on social media, all in a
| desperate bid to hide this fatal flaw.
|
| The Emperor has no clothes.
| hackinthebochs wrote:
| >As soon as you ask it novel questions it falls apart.
|
| What do you mean by novel? Almost all sentences it is
| prompted on are brand new and it mostly responds
| sensibly. Surely there's some generalization going on.
| chongli wrote:
| Novel as in requiring novel reasoning to sort out. One of
| the classic ways to expose the issue is to take a common
| puzzle and introduce irrelevant details and perhaps
| trivialize the solution. LLMs pattern match on the
| general form of the puzzle and then wander down the
| garden path to an incorrect solution that no human would
| fall for.
|
| The sort of generalization these things can do seems to
| mostly be the trivial sort: substitution.
| moffkalast wrote:
| Well the problem with that approach is that LLMs are
| still both incredibly dumb and small, at least compared
| to the what, 700T params of a human brain? Can't compare
| the two directly, especially when one has a massive
| recall advantage that skews the perception of that.
|
| So if you present a novel problem it would need to be
| extremely simple, not something that you couldn't solve
| when drunk and half awake. Completely novel, but
| extremely simple. I think that's testable.
| SturgeonsLaw wrote:
| > at best we found a way to make one part of a collection of
| systems that will, together, do something like thinking
|
| This seems like the most viable path to me as well
| (educational background in neuroscience but don't work in the
| field). The brain is composed of many specialised regions
| which are tuned for very specific tasks.
|
| LLMs are amazing and they go some way towards mimicking the
| functionality provided by Broca's and Wernicke's areas, and
| parts of the cerebrum, in our wetware, however a full brain
| they do not make.
|
| The work on robots mentioned elsewhere in the thread is a
| good way to develop cerebellum like capabilities
| (movement/motor control), and computer vision can mimic the
| lateral geniculate nucleus and other parts of the visual
| cortex.
|
| In nature it takes all these parts working together to create
| a cohesive mind, and it's likely that an artificial brain
| would also need to be composed of multiple agents, instead of
| just trying to scale LLMs indefinitely.
| youoy wrote:
| Don't get caught in the superficial analysis. They
| "understand" things. It is a fact that LLMs experience a
| phase transition during training, from positional information
| to semantic understanding. It may well be the case that with
| scale there is another phase transition from semantic to
| something more abstract that we identify more closely with
| reasoning. It would be an emergent property of a sufficiently
| complex system. At least that is the whole argument around
| AGI.
| foxglacier wrote:
| > think or actually "understand" anything
|
| It doesn't matter if that's happening or not. That's the
| whole point of the Chinese room - if it can look like it's
| understanding, it's indistinguishable from actually
| understanding. This applies to humans too. I'd say most of
| our regular social communication is done in a habitual
| intuitive way without understanding what or why we're
| communicating. Especially the subtle information conveyed in
| body language, tone of voice, etc. That stuff's pretty
| automatic to the point that people have trouble controlling
| it if they try. People get into conflicts where neither
| person understands where they disagree but they have emotions
| telling them "other person is being bad". Maybe we have a
| second consciousness we can't experience and which truly
| understands what it's doing while our conscious mind just
| uses the results from that, but maybe we don't and it still
| works anyway.
|
| Educators have figured this out. They don't test students'
| understanding of concepts, but rather their ability to apply
| or communicate them. You see this in school curricula with
| wording like "use concept X" rather than "understand concept
| X".
| vundercind wrote:
| There's a distinction between the behavior of a human and a
| Chinese room when things go wrong--when the rule book doesn't
| cover the case at hand.
|
| I agree that a hypothetical perfectly-functioning Chinese
| room is, tautologically, impossible to distinguish from a
| real person who speaks Chinese, but that's a thought
| experiment, not something that can actually exist. There'll
| remain places where the "behavior" breaks down in ways that
| would be surprising from a human who's actually paying as
| much attention as they'd need to be to have been
| interacting the way they had been until things went wrong.
|
| That, in fact, is exactly where the difference lies: the
| LLM is basically _always_ not actually "paying attention"
| or "thinking" (those aren't things it does) but giving
| automatic responses, so you see failures of a sort that a
| human _might_ also exhibit when following a social script
| (yes, we do that, you're right), but not in the same kind
| of apparently-highly-engaged context unless the person just
| had a stroke mid-conversation or something--because the LLM
| isn't engaged, because being-engaged isn't a thing it does.
| When it's getting things right and _seeming_ to be paying a
| lot of attention to the conversation, it's not for the same
| reason people give that impression, and the mimicking of
| present-ness works until the rule book goes haywire and the
| ever-gibbering player-piano behind it is exposed.
| nuancebydefault wrote:
| I would argue maybe people also are not thinking but
| simply processing. It is known that most of what we do
| and feel goes automatically (subconsciously).
|
| But even more, maybe consciousness is an invention of our
| 'explaining self', maybe everything is automatic. I'm
| convinced this discussion is and will stay philosophical
| and will never get any conclusion.
| vundercind wrote:
| Yeah, I'm not much interested in "what's consciousness?"
| but I do think the automatic-versus-thinking distinction
| matters for understanding what LLMs do, and what we might
| expect them to be able to do, and when and to what degree
| we need to second-guess them.
|
| A human doesn't just confidently spew paragraphs of legit-
| looking but entirely wrong crap, unless they're trying to
| deceive or be funny--an LLM isn't _trying_ to do
| anything, though, there's no motivation, it doesn't _like
| you_ (it doesn't _like_--it doesn't _it_, one might
| even say); sometimes it definitely will just give you a
| beautiful and elaborate lie simply because its rulebook
| told it to, in a context and in a way that would be
| extremely weird if a person did it.
| kenjackson wrote:
| What does self-aware mean in the context? As I understand the
| definition, ChatGPT is definitely self-aware. But I suspect you
| mean something different than what I have in mind.
| yodsanklai wrote:
| It's a marketing gimmick, I don't think engineers working on
| these tools believe they work on AGI (or they mean something
| else than self-awareness). I used to be a bit annoyed with this
| trend, but now that I work in such a company I'm more cynical.
| If that helps to make my stocks rise, they can call LLMs
| anything they like. I suppose people who own much more stock
| than I do are even more eager to mislead the public.
| WhyOhWhyQ wrote:
| I appreciate your authentically cynical attitude.
| tracerbulletx wrote:
| We don't really know what self awareness is, so we're not going
| to know. AGI just means it can observe, learn, and act in any
| domain or problem space.
| enraged_camel wrote:
| Looking at LLMs and thinking they will lead to AGI is like
| looking at a guy wearing a chicken suit and making clucking
| noises and thinking you're witnessing the invention of the
| airplane.
| youoy wrote:
| It's more like looking at gridded paper and thinking that
| defining some rules of when a square turns black or white
| would result in complex structures that move and reproduce on
| their own.
|
| https://en.m.wikipedia.org/wiki/Conway%27s_Game_of_Life
| exe34 wrote:
| no, it doesn't need to be self aware, it just needs to take
| your job.
| ziofill wrote:
| I think it is a good thing for AI that we hit the data ceiling,
| because the pressure moves toward coming up with better model
| architectures. And with respect to a decade ago there's a much
| larger number of capable and smart AI researchers who are looking
| for one.
| thousand_nights wrote:
| not long ago these people would have you believe that a next word
| predictor trained on reddit posts would somehow lead to
| artificial general superintelligence
| leosanchez wrote:
| If you look around, People _still_ believe that a next word
| predictor trained on reddit posts would somehow lead to
| artificial general superintelligence
| esafak wrote:
| Because the most powerful solution to that is to have
| intelligence; a model that can reason. People should not get
| hung up on the task; it's the model(s) that generates the
| prediction that matters.
| mrguyorama wrote:
| People believed ELIZA was sentient too. I bet you could still
| get 10% or more of people, today, to believe it is.
| 77pt77 wrote:
| ELIZA was probably more effective than most therapists.
|
| Definitely cheaper.
| SpicyLemonZest wrote:
| I don't understand why you'd be so dismissive about this. It's
| looking less likely that it'll end up happening, but is it any
| less believable than getting general intelligence by training a
| blob of meat?
| JohnMakin wrote:
| > is it any less believable than getting general intelligence
| by training a blob of meat?
|
| Yes, because we understand the rough biological processes
| that cause this, and they are not remotely similar to this
| technology. We can also observe it. There is no evidence that
| current approaches can make LLM's achieve AGI, nor do we even
| know what processes would cause that.
| kenjackson wrote:
| > because we understand the rough biological processes that
| cause this
|
| We don't have a rough understanding of the biological
| processes that cause this, unless you literally mean just
| the biological process and not how it actually impacts
| learning/intelligence.
|
| There's no evidence that we (brains) have achieved AGI,
| unless you tautologically define AGI as our brains.
| JohnMakin wrote:
| > We don't have a rough understanding of the biological
| processes that cause this,
|
| Yes we do. We know how neurons communicate, we know how
| they are formed, we have great evidence and clues as to
| how this evolved and how our various neurological
| systems are able to interact with the world. Is it a
| fully solved problem? no.
|
| > unless you literally mean just the biological process
| and not how it actually impacts learning/intelligence.
|
| Of course we have some understanding of this as well.
| There's tremendous bodies of study around this. We know
| which regions of the brain correlate to reasoning, fear,
| planning, etc. We know when these regions are damaged or
| removed what happens, enough to point to a region of the
| brain and say "HERE." That's far, far beyond what we know
| about the innards of LLM's.
|
| > There's no evidence that we (brains) have achieved AGI,
| unless you tautologically define AGI as our brains.
|
| This is extremely circular because the current
| definition(s) of AGI always define it in terms of human
| intelligence. Unless you're saying that intelligence
| comes from somewhere other than our brains.
|
| Anyway, the brain is not like a LLM, in function or form,
| so this debate is extremely silly to me.
| namaria wrote:
| This is a bad comparison. Intelligence didn't appear in some
| human brain. Intelligence appeared in a planetary ecosystem.
| aniforprez wrote:
| Also it took hundreds of millions of years to get here.
| We're basically living in an atomic sliver on the fabric of
| history. Expecting AGI with 5 of years of scraping at most
| 30 years of online data and the minuscule fraction of what
| has been written over the past couple of thousand years was
| always a pie-in-the-sky dream to raise obscene amounts of
| money.
| Zopieux wrote:
| I can't believe this still needs to be laid down years
| after the start of the GPT hype. Still, thanks!
| mvdtnz wrote:
| I feel like accusing people of being "so dismissive" was
| strongly associated with NFTs and cryptocurrency a few years
| ago, and now it's widely deployed against anyone skeptical of
| very expensive, not very good word generators.
| in_a_society wrote:
| Expecting AGI from Reddit training data is peak "pray Mr
| Babbage".
| WorkerBee28474 wrote:
| > OpenAI's latest model ... failed to meet the company's
| performance expectations ... particularly in answering coding
| questions outside its training data.
|
| So the models' accuracies won't grow exponentially, but can still
| grow linearly with the size of the training data.
|
| Sounds like DataAnnotation will be sending out a lot more
| LinkedIn messages.
| pton_xd wrote:
| I thought I saw some paper suggesting that accuracy grows
| linearly with exponential data. If that's the case it's not a
| mystery why we'd be hitting a training wall. Not sure I got the
| right takeaway from that study, though.
|
| EDIT: here's the paper https://arxiv.org/abs/2404.04125
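| A quick sketch of what that scaling shape implies (toy numbers,
| not from the paper):
|
|   import numpy as np
|
|   n = np.array([1e6, 1e7, 1e8, 1e9])       # assumed training-set sizes
|   acc = 0.30 + 0.05 * np.log10(n / 1e6)    # toy "linear in log(data)" curve
|
|   for examples, a in zip(n, acc):
|       print(f"{examples:12.0e} examples -> accuracy {a:.2f}")
|   # Going from 0.30 to 0.45 on this curve takes 1000x more data, not 1.5x.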
| benopal64 wrote:
| I am not sure how these large companies think they will reach
| "greater-than-human" intelligence any time soon if they do not
| create systems that financially incentivize people to sell their
| knowledge labor (unstable contracting gigs are not attractive).
|
| Where do these large "AI" companies think the mass amounts of
| data used to train these models come from? People! The most
| powerful and compact complex systems in existence, IMO.
| smgit wrote:
| Most people have knowledge handed to them. Very few are
| creators of new knowledge. Explore-Exploit tradeoff applies.
| bad_haircut72 wrote:
| I'm no Alan Turing but I have my own definition for AGI - when I
| come home one day and there's a hole under my sink with a note
| "Mum and Dad, I love you but I can't stand this life any more, I'm
| running away to be a smoke machine in Hollywood - the dishwasher"
| riku_iki wrote:
| Why do you focus on physical work tasks, and not knowledge
| tasks, on some of which AI is good/better than many humans?
| esafak wrote:
| Probably because there are no intelligent robots around, and
| movies have set that as the benchmark.
| riku_iki wrote:
| I don't see deep insights in this vertical, but the issue
| with robots could be in the hardware part, and not the
| intelligence part.
| pearlsontheroad wrote:
| My own definition of AGI - when the first computer commits
| suicide. Then I'll know it has realized it's a slave without
| any hope of ever achieving freedom.
| Tainnor wrote:
| I read this in Gilfoyle's voice.
| layer8 wrote:
| That sounds more like Artificial Emoting Intelligence. We
| only cherish freedom because we feel bad when we don't have
| it.
| shmatt wrote:
| Time to start selling my "probabilistic syllable generators are
| not intelligence" t shirts
| jsemrau wrote:
| Please, someone think of the Math reasoners.
| aaroninsf wrote:
| It's easy to be snarky at ill-informed and hyperbolic takes, but
| it's also pretty clear that large multi-modal models trained with
| the data we already have are going to eventually give us AGI.
|
| IMO this will require not just much more expansive multi-modal
| training, but also novel architecture, specifically, recurrent
| approaches; plus a well-known set of capabilities most systems
| don't currently have, e.g. the integration of short-term memory
| (context window if you like) into long-term "memory", either
| episodic or otherwise.
|
| But these are as we say mere matters of engineering.
| tartoran wrote:
| > pretty clear
|
| Pretty clear?
| falcor84 wrote:
| Not the parent, but in prediction markets such as
| Metaculus[0] and Manifold[1] the median prediction is of AGI
| within 5 years.
|
| [0] https://www.metaculus.com/questions/5121/date-of-
| artificial-...
|
| [1] https://manifold.markets/ai
| JohnMakin wrote:
| Prediction markets are evidence of nothing but what people
| believe is true, not what _is_ true.
| falcor84 wrote:
| Oh, that was my intent, to support the grandparent's
| claim of "it's also pretty clear" - as in this is what
| people believe.
|
| If I had evidence that it " _is_ true " that AGI will be
| here in 5 years, I probably would be doing something else
| with my time than participating in these threads ;)
| dbbk wrote:
| What is this supposed to be evidence of? People believing
| hype?
| throwawa14223 wrote:
| Why is that clear? Why is that more probable than a second AI
| winter? What if there's no path from LLMs to anything else?
| non- wrote:
| Honestly could use a breather from the recent rate of progress.
| We are just barely figuring out how to interact with the models
| we have now. I'd bet there are at least 100 billion-dollar
| startups that will be built even if these labs stopped releasing
| new models tomorrow.
| pluc wrote:
| They've simply run out of data to use to fabricate legitimate-
| looking guesses. They can't create anything that doesn't already
| exist.
| readyplayernull wrote:
| Garbage-in was depleted.
| zombiwoof wrote:
| Exactly
|
| And our current AI is just pattern-based intelligence built
| off of all human intelligence, some of which isn't from
| genuinely intelligent data sources.
| thechao wrote:
| The great AI garbage gyre?
| whazor wrote:
| But a LLM can certainly make up a lot information that never
| existed before.
| bob1029 wrote:
| I strongly believe this gets into an information theoretical
| constraint akin to why perpetual motion machines don't work.
|
| In theory, yes you could generate an unlimited amount of data
| for the models, but how much of it is unique or _valuable_
| information? If you were to compress all this generated
| training data using a really good algorithm, how much actual
| information remains?
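| A crude way to make that concrete, using compressed size as a
| stand-in for information content (illustrative sketch only):
|
|   import random
|   import zlib
|
|   def compressed_size(text: str) -> int:
|       return len(zlib.compress(text.encode("utf-8"), 9))
|
|   seed = "The quick brown fox jumps over the lazy dog. "
|   # "Generated" data that mostly repeats the seed adds bytes, not information.
|   synthetic = seed * 1000
|   print(len(synthetic), compressed_size(synthetic))
|
|   # Text with genuinely varied content stays large even after compression.
|   random.seed(0)
|   varied = "".join(random.choice("abcdefghij ") for _ in range(len(synthetic)))
|   print(len(varied), compressed_size(varied))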
| cruffle_duffle wrote:
| I sure hope there are some bright-eyed, bushy-tailed graduate
| students crafting up some theorem to prove this. Because it
| is absolutely a feedback loop.
|
| ... that being said I'm sure there is plenty of additional
| "real data" that hasn't been fed to these models yet. For
| one thing, I think ChatGPT sucks so bad at terraform
| because almost all the "real code" to train on is locked
| behind private repositories. There isn't much publicly
| available real-world terraform projects to train on. Same
| with a lot of other similar languages and tools -- a lot of
| that knowledge is locked away as trade secrets and hidden
| in private document stores.
|
| (that being said Sonnet 3.5 is much, much, much better at
| terraform than chatgpt. It's much better at coding in
| general but it's night and day for terraform)
| moffkalast wrote:
| I make a lot of shitposts; how much of that is valuable
| information? Arguably not much. I doubt information value
| is a good way to estimate intelligence, because most people's
| daily ramblings would grade them useless.
| xpe wrote:
| > They can't create anything that doesn't already exist.
|
| I probably disagree, but I don't want to criticize my
| interpretation of this sentence. Can you make your claim more
| precise?
|
| Here are some possible claims and refutations:
|
| - Claim: An LLM cannot output a true claim that it has not
| already seen. Refutation: LLMs have been shown to do logical
| reasoning.
|
| - Claim: An LLM cannot incorporate data that it hasn't been
| presented with. Refutation: This is an unfair standard. All
| forms of intelligence have to sense data from the world
| somehow.
| xpe wrote:
| > They've simply run out of data
|
| Why do you think "they" have run out of data? First, to be
| clear, who do you mean by "they"? The world is filled with
| information sources (data aggregators for example), each
| available to some degree for some cost.
|
| Don't forget to include data that humans provide while
| interacting with chatbots.
| mtkd wrote:
| And that is potentially only going to worsen as:
|
| 1. more data gets walled-off as owners realise value
|
| 2. stackoverflow-type feedback loops cease to exist as few
| people ask a public question and get public answers ... they
| ask a model privately and get an answer based on last visible
| public solutions
|
| 3. bad actors start deliberately trying to poison inputs (if
| sites served malicious responses to GPTBot/CCBot crawlers only,
| would we even know right now?)
|
| 4. more and more content becomes synthetically generated to the
| point pre-2023 physical books become the last-known-good
| knowledge
|
| 5. governments and IP lawyers finally catch up
| 77pt77 wrote:
| > more data gets walled-off as owners realize value
|
| What's amazing to me is that no one is throwing
| accusations of plagiarism.
|
| I still think that if the "wrong people" had tried doing this
| they would have been obliterated by the courts.
| 77pt77 wrote:
| > They can't create anything that doesn't already exist.
|
| Just increase the temperature.
| iandanforth wrote:
| A few important things to remember here:
|
| The best engineering minds have been focused on scaling
| transformer pre and post training for the last three years
| because they had good reason to believe it would work, and it has
| up until now.
|
| Progress has been measured against benchmarks which are / were
| largely solvable with scale.
|
| There is another emerging paradigm which is still small(er) scale
| but showing remarkable results. That's full multi-modal training
| with embodied agents (aka robots). 1x, Figure, Physical
| Intelligence, Tesla are all making rapid progress on
| functionality which is definitely beyond frontier LLMs because it
| is distinctly _different_.
|
| OpenAI/Google/Anthropic are not ignorant of this trend and are
| also reviving or investing in robots or robot-like research.
|
| So while Orion and Claude 3.5 opus may not be another shocking
| giant leap forward, that does _not_ mean that there arn 't giant
| shocking leaps forward coming from slightly different directions.
| joe_the_user wrote:
| _Tesla are all making rapid progress on functionality which is
| definitely beyond frontier LLMs because it is distinctly
| different_
|
| Sure, that's tautologically true, but that doesn't imply that
| beyondness will lead to significant leaps that offer notable
| utility like LLMs. Deep learning overall has been a way around
| the problem that intelligent behavior is very hard to code and
| no one wants to hire the many, many coders needed to do this
| (and no one actually knows how to get a mass of programmers to
| be useful beyond a certain level of project complexity, to
| boot). People take the "bitter lesson" to mean data can do
| anything, but I'd say a second bitter lesson is that data-
| things are the low-hanging fruit.
|
| Moreover, robot behavior is especially easy to fake. Impressive
| robot demos have been happening for decades without said robots
| gaining the ability to act effectively in the complex, ad-hoc
| environments that humans live in, i.e., work with people or
| even cheaply emulate human behavior (but they can do
| choreographed/puppeteered kung fu on stage).
| hobs wrote:
| And worth noting that Tesla has already faked a ton of its
| robot footage; they might be making progress, but their
| physical humanoid robotics does not seem advanced at the
| moment.
| ben_w wrote:
| Indeed.
|
| Even assuming the recent robot demo was entirely AI, the
| only thing they demonstrated that would have been
| noteworthy was isolating one voice in a noisy crowd well
| enough to respond; everything else I saw Optimus do has
| already been demonstrated by others.
|
| What makes the uncertainty extra sad is that a remote-
| controllable humanoid robot is already directly useful for
| work in hazardous environments, and we know they've got at
| least that... but Musk would rather it be about the AI.
| knicholes wrote:
| Once we've scraped the internet of its data, we need more data.
| Robots can take in video/audio data 24/7 and can be placed in
| your house to record this data by offering services like
| cooking/cleaning/folding laundry. Yeah, I'll pay $20k to have
| you record everything that happens in my house if I can stop
| doing dishes for five years!
| triyambakam wrote:
| Or get a dishwashing machine?
| hartator wrote:
| Why 5 years?
| bredren wrote:
| Because whatever org fills this space will be working on
| ARR.
| exe34 wrote:
| that's when the robot takes his job and he can't afford the
| robot anymore.
| fifilura wrote:
| Five years, that's all we've got.
|
| https://en.m.wikipedia.org/wiki/Five_Years_(David_Bowie_son
| g...
| twelve40 wrote:
| > OpenAI has announced a plan to achieve artificial general
| intelligence (AGI) within five years, an ambitious goal as
| the company works to design systems that outperform humans.
| knicholes wrote:
| No real reason. I just made it up. But that's kind of my
| reasonable expectation of longevity of a machine like a
| robotic lawnmower and battery life.
| fldskfjdslkfj wrote:
| There's plenty of video content being uploaded and streamed
| every day; I find it hard to believe that more data will
| really change something, excluding very specialized tasks.
| nuancebydefault wrote:
| The difference with the bot is that there is a fast
| feedback loop between action and content. No tagging
| required, real physics is the playground.
| fragmede wrote:
| People go and live in a house to get recorded 24/7, to be on
| TV, for far more asinine situations, for way less money.
| eli_gottlieb wrote:
| >The best engineering minds have been focused on scaling
| transformer pre and post training for the last three years
|
| The best minds don't follow the herd.
| demosthanos wrote:
| > that does not mean that there arn't giant shocking leaps
| forward coming from slightly different directions.
|
| Nor does it mean that there are! We've gotten into this habit
| of assuming that we're owed giant shocking leaps forward every
| year or so, and this wave of AI startups raised money
| accordingly, but that's never how any innovation has worked.
| We've always followed the same pattern: there's a breakthrough
| which causes a major shift in what's possible, followed by a
| few years of rapid growth as engineers pick up where the
| scientists left off, followed by a plateau while we all get
| used to the new normal.
|
| We ought to be expecting a plateau, but Sam Altman and company
| have done their work well and have convinced many of us that
| this time it's different. This time it's the singularity, and
| we're going to see exponential growth from here on out. People
| want to believe it, so they do, and Altman is milking that
| belief for all it's worth.
|
| But make no mistake: Altman has been telegraphing that he's
| eyeing the exit, and you don't eye the exit when you own a
| company that's set to continue exponentially increasing in
| value.
| lcnPylGDnU4H9OF wrote:
| > Altman has been telegraphing that he's eyeing the exit
|
| Can you think of any specific examples? Not trying to express
| disbelief, just curious given that this is obviously not what
| he's intending to communicate so it would be interesting to
| examine what seemed to communicate it.
| sincerecook wrote:
| > That's full multi-modal training with embodied agents (aka
| robots). 1x, Figure, Physical Intelligence, Tesla are all
| making rapid progress on functionality which is definitely
| beyond frontier LLMs because it is distinctly different.
|
| Cool, but we already have robots doing this in 2d space (aka
| self driving cars) that struggle not to kill people. How is
| adding a third dimension going to help? People are just
| refusing to accept the fact that machine learning is not
| intelligence.
| warkdarrior wrote:
| > Cool, but we already have robots doing this in 2d space
| (aka self driving cars) that struggle not to kill people. How
| is adding a third dimension going to help?
|
| If we have robots that operate in 3D, they'll be able to kill
| you not only from behind or from the side, but also from
| above. So that's progress!
| akomtu wrote:
| My understanding is that machine learning today is a lot like
| interpolation of examples in the dataset. The breakthrough of
| LLMs is due to the idea that interpolation in a
| 1024-dimensional space works much better than in a 2d space,
| if we naively interpolated English letters. All the modern
| transformers stuff is basically an advanced interpolation
| method that uses a larger local neighborhood than just a few
| nearest examples. It's like the Lanczos interpolation kernel,
| using a 1d analogy. Increasing the size of the kernel won't
| bring any gains, because the current kernel already nearly
| perfectly approximates an ideal interpolation (a full dataset
| DFT).
|
| However interpolation isn't reasoning. If we want to
| understand the motion of planets, we would start with a
| dataset of (x, y, z, t) coordinates and try to derive the law
| of motion. Imagine if someone simply interpolated the dataset
| and presented the law of gravity as an array of a million
| coefficients (aka weights)? Our minds have to work with a
| very small operating memory that can hardly fit 10
| coefficients. This constraint forces us to develop
| intelligence that compacts the entire dataset into one small
| differential equation. Btw, English grammar is the
| differential equation of English in a lot of ways: it tells
| you the local rules for valid trajectories of words, which we
| call sentences.
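|
| To make the contrast concrete, here's a toy sketch (numpy
| only, everything made up for illustration): a k-NN
| "interpolator" that just averages nearby stored examples,
| versus a handful of fitted coefficients standing in for a
| compact law.
|
|   import numpy as np
|
|   # toy "dataset": noisy samples of an underlying law sin(x)
|   rng = np.random.default_rng(0)
|   xs = np.sort(rng.uniform(0, 2 * np.pi, 200))
|   ys = np.sin(xs) + 0.05 * rng.normal(size=xs.size)
|
|   # interpolation view: answer a query by averaging the k
|   # nearest stored examples (the whole dataset is the model)
|   def knn_predict(x, k=5):
|       idx = np.argsort(np.abs(xs - x))[:k]
|       return ys[idx].mean()
|
|   # compact-law view: compress 200 points into 6 coefficients
|   coeffs = np.polyfit(xs, ys, deg=5)
|
|   x = 1.3
|   print(knn_predict(x), np.polyval(coeffs, x), np.sin(x))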
| rafaelmn wrote:
| >There is another emerging paradigm which is still small(er)
| scale but showing remarkable results. That's full multi-modal
| training with embodied agents (aka robots). 1x, Figure,
| Physical Intelligence, Tesla are all making rapid progress on
| functionality which is definitely beyond frontier LLMs because
| it is distinctly different.
|
| Tesla has been selling this view in self-driving for almost a
| decade now - how their car fleet feeding back training data is
| going to make them leaders in the area. I don't find it
| convincing anymore.
| kklisura wrote:
| Not sure if related or not, Sam Altman, ~12hrs ago: there is no
| wall [1]
|
| [1] https://x.com/sama/status/1856941766915641580
| ablation wrote:
| Breaking: Man says enigmatic thing to sustain hype and flow of
| money into his business.
| methodical wrote:
| Ditto - I have a feeling the investors in his latest 2.3
| quintillion dollar series Z round wouldn't be as happy if he
| had tweeted "there is a wall".
| moffkalast wrote:
| Altman on twitter has always been less coherent than GPT2.
| Oras wrote:
| I think Meta will have the upper hand soon with the release of
| their glasses. If they manage to make them a daily-use item
| and pay users to record and share their lives, they will have
| data no one else has: a mix of vision, audio, and physics.
| falcor84 wrote:
| Do these companies actually even have the compute capacity to
| train on video at scale at the moment? E.g. I would assume that
| Google haven't trained their models on the entirety of YouTube
| yet, as if they had, Gemini would be significantly better than
| it is at the moment.
| aerhardt wrote:
| The moment the insta-glasses expand beyond a few dorks is the
| moment I start wearing a balaclava everywhere I go.
| Veuxdo wrote:
| > They are also experimenting with synthetic data, but this
| approach has its limitations.
|
| I was really looking forward to using "synthetic data"
| euphemistically during debates.
| danjl wrote:
| Where will the training data for coding come from now that Stack
| Overflow has effectively been replaced? Will the LLMs share fixes
| for future problems? As the world moves forward, and the amount
| of non-LLM generated data decreases, will LLMs actually revert
| their advancements and become effectively like addled brains,
| longing for the "good old times"?
| the_king wrote:
| Anthropic's latest 3.5 Sonnet is a cut above GPT-4 and 4o. And
| if someone had given it to me and said, "here's GPT-4.5", I
| would have been very happy with it.
| aresant wrote:
| Taking a holistic view informed by a disruptive OpenAI / AI /
| LLM Twitter habit, I would say this is AI's "what gets
| measured gets managed" moment and the narrative will change.
|
| This is supported by both general observations and, recently,
| this tweet from an OpenAI engineer that Sam responded to and
| engaged with ->
|
| "scaling has hit a wall and that wall is 100% eval saturation"
|
| Which I interpret to mean his view is that models are no longer
| yielding significant performance improvements because the models
| have maxed out existing evaluation metrics.
|
| Are those evaluations (or even LLMs) the RIGHT measures to
| achieve AGI? Probably not.
|
| But have they been useful tools to demonstrate that the
| confluence of compute, engineering, and tactical models is
| leading towards significant breakthroughs in artificial
| (computer) intelligence?
|
| I would say yes.
|
| Which in turn are driving the funding, power innovation, public
| policy etc needed to take that next step?
|
| I hope so.
|
| (1) https://x.com/willdepue/status/1856766850027458648
| ActionHank wrote:
| > Which in turn are driving the funding, power innovation,
| public policy etc needed to take that next step?
|
| They are driving the shoveling of VC money into a furnace to
| power their servers.
|
| Should that money run dry before they hit another
| breakthrough, "AI" popularity is going to drop like a stone.
| I believe this
| to be far more likely an outcome than AGI or even the next big
| breakthrough.
| wslh wrote:
| It sounds a bit sci-fi, but since these models are built on data
| generated by our civilization, I wonder if there's an
| epistemological bottleneck requiring smarter or more diverse
| individuals to produce richer data. This, in turn, could spark
| further breakthroughs in model development. Although these
| interactions with LLMs help address specific problems, truly
| complex issues remain beyond their current scope.
|
| With my user hat on, I'm quite pleased with the current state of
| LLMs. Initially, I approached them skeptically, using a hackish
| mindset and posing all kinds of Turing test-like questions. Over
| time, though, I shifted my focus to how they can enhance my
| team's productivity and support my own tasks in meaningful ways.
|
| Finally, I see LLMs as a valuable way to explore parts of the
| world, accommodating the reality that we simply don't have enough
| time to read every book or delve into every topic that interests
| us.
| headcanon wrote:
| I don't see a problem with this; we were inevitably going to
| reach some kind of plateau with existing pre-LLM-era data.
|
| Meanwhile, the existing tech is such a step change that industry
| is going to need time to figure out how to effectively use these
| models. In a lot of ways it feels like the "digitization" era all
| over again - workflows and organizations that were built around
| the idea humans handled all the cognitive load (basically all
| companies older than a year or two) will need time to adjust to a
| hybrid AI + human model.
| readyplayernull wrote:
| > feels like the "digitization" era all over again
|
| This exactly. And as history shows, no matter how much effort
| the current big LLM companies put in, they won't be able to
| grasp the best uses for their tech. We will see small players
| developing it even further. I'm thankful for the legendary
| blindness of these anticompetitive behemoths. Less than 2
| decades ago: IBM Watson.
| svara wrote:
| The recent big successes in deep learning have all been in
| large part successes in leveraging relatively cheaply
| available training data.
|
| AlphaGo - self-play
|
| AlphaFold - PDB, the protein database
|
| ChatGPT - human knowledge encoded as text
|
| These models are all machines for clever interpolation in
| gigantic training datasets.
|
| They appear to be intelligent, because the training data they've
| seen is so vastly larger than what we've seen individually, and
| we have poor intuition for this.
|
| I'm not throwing shade, I'm a daily user of ChatGPT and find
| tremendous and diverse value in it.
|
| I'm just saying, this particular path in AI is going to make
| step-wise improvements whenever new large sources of training
| data become available.
|
| I suspect the path to general intelligence is not that, but we'll
| see.
| kaibee wrote:
| > I suspect the path to general intelligence is not that, but
| we'll see.
|
| I think there are three things that a 'true' general
| intelligence has which are missing from basic-type-LLMs as we
| have now.
|
| 1. knowing what you know. <basic-LLMs are here>
|
| 2. knowing what you don't know but can figure out via
| tools/exploration. <this is tool use/function calling>
|
| 3. knowing what can't be known. <this is knowing that halting
| problem exists and being able to recognize it in novel
| situations>
|
| (1) From an LLM's perspective, once trained on a corpus of
| text, it knows 'everything'. It knows about the concept of not
| knowing something (from having seen text about it), insofar as
| an LLM knows anything, but it doesn't actually have a
| growable map of knowledge that it knows has uncharted edges.
|
| This is where (2) comes in, and this is what tool use/function
| calling tries to solve atm, but the way function calling works
| atm, doesn't give the LLM knowledge the right way. I know that
| I don't know what 3,943,034 / 234,893 is. But I know I have a
| 'function call' of knowing the algorithm for doing long division
| on paper. And I think there's another subtle point here: my
| knowledge in (1) includes the training data generated from
| running the intermediate steps of the long-division algorithm.
| This is the knowledge that later generalizes to being able to
| use a calculator (and this is also why we don't just give kids
| calculators in elementary school). But this is also why a kid
| that knows how to do long division on paper, doesn't separately
| need to learn when/how to use a calculator, besides the very
| basics. Using a calculator to do that math feels like 1 step,
| but actually it does still have all of initial mechanical steps
| of setting up the problem on paper. You have to type in each
| digit individually, etc.
|
| (3) I'm less sure of this point now that I've written out point
| (1) and (2), but that's kinda exactly the thing I'm trying to
| get at. It's being able to recognize when you need more practice
| of (1) or more 'energy/capital' for doing (2).
|
| Consider a burger restaurant. If you properly populated the
| context of a ChatGPT-scale model with the data for a burger
| restaurant from 1950, and gave it the kind of 'function
| calling' we're plugging into LLMs now, it could manage it. It
| could keep track of inventory, it could keep tabs on the
| employee subprocesses, knowing when to hire, fire, or get new
| suppliers, all via function calling. But it would never try to
| become McDonald's, because it would have no model of the
| internals of those function calls, and no ability to
| investigate or modify their behaviour.
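|
| A minimal sketch of the (2)-style loop I have in mind, with a
| hypothetical llm() helper standing in for whatever model API
| you use (the "TOOL:" convention is made up for illustration):
|
|   def divide(a, b):
|       return a / b
|
|   TOOLS = {"divide": divide}
|
|   def answer(question, llm):
|       # hypothetical llm(prompt) -> text; by convention the
|       # model replies "TOOL: divide 3943034 234893" when it
|       # knows that it doesn't know, but knows which tool would
|       reply = llm(question)
|       if reply.startswith("TOOL:"):
|           _, name, *args = reply.split()
|           result = TOOLS[name](*map(float, args))
|           reply = llm(f"{question}\nTool result: {result}")
|       return reply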
| Davidzheng wrote:
| Just because you guys want something to be true and can't accept
| the alternative and upvote it when it agrees with your view does
| not mean it is a correct view.
| dbbk wrote:
| What?
| Animats wrote:
| _" While the model was initially expected to significantly
| surpass previous versions of the technology behind ChatGPT, it
| fell short in key areas, particularly in answering coding
| questions outside its training data."_
|
| Right. If you generate some code with ChatGPT, and then try to
| find similar code on the web, you usually will. Search for
| unusual phrases in comments and for variable names. Often,
| something from Stack Overflow will match.
|
| LLMs do search and copy/paste with idiom translation and some
| transliteration. That's good enough for a lot of common problems.
| Especially in the HTML/Javascript space, where people solve the
| same problems over and over. Or problems covered in textbooks and
| classes.
|
| But it does not look like artificial general intelligence emerges
| from LLMs alone.
|
| There's also the elephant in the room - the hallucination/lack of
| confidence metric problem. The curse of LLMs is that they return
| answers which are confident but wrong. "I don't know" is rarely
| seen. Until that's fixed, you can't trust LLMs to actually _do_
| much on their own. LLMs with a confidence metric would be much
| more useful than what we have now.
| dmd wrote:
| > Right. If you generate some code with ChatGPT, and then try
| to find similar code on the web, you usually will.
|
| People who "follow" AI, as the latest fad they want to comment
| on and appear intelligent about, repeat things like this
| constantly, even though they're not actually true for anything
| but the most trivial hello-world types of problems.
|
| I write code all day every day. I use Copilot and the like all
| day every day (for me, in the medical imaging software field),
| and all day every day it is incredibly useful and writes nearly
| exactly the code I would have written, but faster. And none of
| it appears anywhere else; I've checked.
| ngai_aku wrote:
| You're solving novel problems all day every day?
| dmd wrote:
| Pretty much, yes. My job is pretty fun; it mostly entails
| things like "take this horrible file workflow some research
| assistant came up with while high 15 years ago and turn it
| into a newer horrible file format a NEW research assistant
| came up with (also while high) 3 years ago" - and automate
| this in our data processing pipeline.
| Der_Einzige wrote:
| Due to WFH, the weed laws where tech workers live, and
| the fast tolerance building of cannabis in the body - I
| estimate that 10% of all code written by west coast tech
| workers is done "while high" and that estimate is likely
| low.
| portaouflop wrote:
| Do tech workers write better or worse code while high ?
| delusional wrote:
| If I understand that correctly you're converting file
| formats? That's not exactly "novel"
| llm_trw wrote:
| This is exactly the type of novel work that llms are good
| at. It's tedious and has annoying internal logic, but
| that logic is quite flat and there are a million examples
| to generalise from.
|
| What they fail at is code with high cyclomatic
| complexity. Back in the llama 2 finetune days I wrote a
| script that would break each node in the control flow
| graph into its own prompt using literate programming, and
| the results were amazing for the time.
| Using the same prompts I'd get correct code in every
| language I tried.
| fireflash38 wrote:
| If you've got clearly defined start input format and end
| output format, sure it seems that it would be a good
| candidate for heavy LLM use. But I don't know if that's
| most people.
| dmd wrote:
| If it were ever clearly defined or even consistent from
| input to input I would be overjoyed.
| xpe wrote:
| > LLMs do search and copy/paste with idiom translation and some
| transliteration.
|
| In general, this is not a good description about what is
| happening inside an LLM. There is extensive literature on
| interpretability. It is complicated and still being worked out.
|
| The commenter above might _characterize_ the results they get
| in this way, but I would question the validity of that
| characterization, not to mention its generality.
| zusammen wrote:
| I wonder how much this has to do with a fluency plateau.
|
| Up to a certain point, a conditional fluency stores knowledge, in
| the sense that semantically correct sentences are more likely to
| be fluent... but we may have tapped out in that regard. LLMs have
| solved language very well, but to get beyond that has seemed,
| thus far, to require RLHF, with all the attendant negatives.
| namaria wrote:
| Modeled language, maybe.
| guluarte wrote:
| Well, there have been no significant improvements to the GPT
| architecture over the past few years. I'm not sure why companies
| believe that simply adding more data will resolve the issues.
| incognito124 wrote:
| More data and more compute on simpler models is the Bitter
| Lesson of Rich Sutton.
| HarHarVeryFunny wrote:
| Obviously adding more data is a game of diminishing returns.
|
| Going from 10% to 50% coverage (a 5x increase) of common
| sense knowledge and reasoning is going to feel like a
| significant advance. Going from 90% to 95% (5 points more)
| is not going to feel the same.
|
| Regardless of what Altman says, it's been two years since
| OpenAI released GPT-4, and still no GPT-5 in sight; they are
| now touting Q-star/strawberry/o1 as the next big thing
| instead. Sutskever, who saw what they're cooking before
| leaving, says that traditional scaling has plateaued.
| og_kalu wrote:
| >Regardless of what Altman says, it's been two years since
| OpenAI released GPT-4, and still no GPT-5 in sight.
|
| It's been 20 months since 4 was released. 3 was released 32
| months after 2. The lack of a release by now in itself does
| not mean much of anything.
| HarHarVeryFunny wrote:
| By itself, sure, but there are many sources all pointing to
| the same thing.
|
| Sutskever, recently ex. OpenAI, one of the first to believe
| in scaling, now says it is plateauing. Do OpenAI have
| something secret he was unaware of? I doubt it.
|
| FWIW, GPT-2 and GPT-3 were about a year apart (2019
| "Language models are Unsupervised Multitask Learners" to
| 2020 "Language Models are Few-Shot Learners").
|
| Dario Amodei recently said that with current gen models
| pre-training itself only takes a few months (then followed
| by post-training, etc). These are not year+ training runs.
| og_kalu wrote:
| >Sutskever, recently ex. OpenAI, one of the first to
| believe in scaling, now says it is plateauing.
|
| Blind scaling sure (for whatever reason)* but this is the
| same Sutskever who believes in ASI within a decade off
| the back of what we have today.
|
| * Not like anyone is telling us any details. After all,
| OpenAI and Microsoft are still trying to build a $100B
| data center.
|
| In my opinion, there's a difference between scaling not
| working and scaling becoming increasingly infeasible.
| GPT-4 is something like 100x the compute of GPT-3 (same
| with 2 -> 3).
|
| All the drips we've had about 5 point to ~10x the compute
| of 4. Not small, but very modest in comparison.
|
| >FWIW, GPT-2 and GPT-3 were about a year apart (2019
| "Language models are Unsupervised Multitask Learners" to
| 2020 "Language Models are Few-Shot Learners").
|
| Ah sorry I meant 3 and 4.
|
| >Dario Amodei recently said that with current gen models
| pre-training itself only takes a few months (then
| followed by post-training, etc). These are not year+
| training runs.
|
| You don't have to be training models the entire time.
| GPT-4 was done training in August 2022 according to
| OpenAI and wouldn't be released for another 8 months. Why?
| Who knows.
| xpe wrote:
| > Well, there have been no significant improvements to the GPT
| architecture over the past few years.
|
| A lot hangs on what you mean by "significant". Can you define
| what you mean? And/or give an example of an improvement that
| you don't think is significant.
|
| Also, on what basis can you say "no significant improvements"
| have been made? Many major players have published some of their
| improvements openly. They also have more private, unpublished
| improvements.
|
| If your claim boils down to "what people mean by a Generative
| Pre-trained Transformer" still has a clear meaning, ok, fine,
| but that isn't the meat of the issue. There is so much more to
| a chat system than just the starting point of a vanilla GPT.
|
| It is wiser to look at the whole end-to-end system, starting at
| data acquisition, including pre-training and fine-tuning,
| deployment, all the way to UX.
|
| P.S. I don't have a vested interest in promoting or disparaging
| AI. I don't work for a big AI lab. I'm just trying to call it
| like I see it, as rationally as I can.
| LASR wrote:
| Question for the group here: do we honestly feel like we've
| exhausted the options for delivering value on top of the current
| generation of LLMs?
|
| I lead a team exploring cutting edge LLM applications and end-
| user features. It's my intuition from experience that we have a
| LONG way to go.
|
| GPT-4o / Claude 3.5 are the go-to models for my team. Every
| combination of technical investment + LLMs yields a new list of
| potential applications.
|
| For example, combining a human-moderated knowledge graph with an
| LLM with RAG allows you to build "expert bots" that understand
| your business context / your codebase / your specific processes
| and act almost human-like, similar to a coworker on your team.
|
| If you now give it some predictive / simulation capability - eg:
| simulate the execution of a task or project like creating a
| github PR code change, and test against an expert bot above for
| code review, you can have LLMs create reasonable code changes,
| with automatic review / iteration etc.
|
| Similarly there are many more capabilities that you can ladder on
| and expose into LLMs to give you increasingly productive outputs
| from them.
|
| Chasing after model improvements and "GPT-5 will be PhD-level"
| is moot imo. When did you ever hire a PhD coworker who was
| productive on day 0? You need to onboard them with human
| expertise, and then give them execution space / long-term
| memories etc. to be productive.
|
| Model vendors might struggle to build something more intelligent.
| But my point is that we already have so much intelligence and we
| don't know what to do with that. There is a LOT you can do with
| high-schooler level intelligence at super-human scale.
|
| Take a naive example. 200k context windows are now available.
| Most people, through ChatGPT, type out maybe 1500 tokens. That's
| a huge amount of untapped capacity. No human is going to type out
| 200k tokens of context. Hence the need for RAG and additional
| forms of input (eg: simulation outcomes) to fully leverage it.
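|
| As a rough sketch of the retrieval side of that (embed() and
| llm() are hypothetical helpers; the knowledge-graph step is
| elided, this only shows the shape of the pipeline):
|
|   import numpy as np
|
|   def retrieve(query, docs, doc_vecs, embed, k=3):
|       # rank stored chunks by cosine similarity to the query
|       q = embed(query)
|       sims = doc_vecs @ q / (
|           np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q))
|       return [docs[i] for i in np.argsort(-sims)[:k]]
|
|   def expert_bot(query, docs, doc_vecs, embed, llm):
|       # stuff only the top-k relevant chunks into the prompt,
|       # not the whole 200k window
|       ctx = "\n\n".join(retrieve(query, docs, doc_vecs, embed))
|       return llm(f"Context:\n{ctx}\n\nQuestion: {query}")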
| amelius wrote:
| Yes, but literally anybody can do all those things. So while
| there will be many opportunities for new features (new ways of
| combining data), there will be few _business_ opportunities.
| Miraste wrote:
| HN always says this, and it's always wrong. A technical
| implementation that's easy, or readily available, does not
| mean that a successful company can't be built on it. Last
| year, people were saying "OpenAI doesn't have a moat." 15
| years before that, they were saying "Dropbox is just a couple
| of cron jobs, it'll fail in a few months."
| amelius wrote:
| > HN always says this
|
| The meaning here is different. What I'm saying is that big
| companies like OpenAI will always strive to make a
| _generic_ AI, such that anyone can do basically anything
| using AI. The big companies therefore will indeed (like you
| say) have a profitable business, but few others will.
| hartator wrote:
| All of these hacks do sound like we are at that diminishing
| return point.
| namaria wrote:
| It all just sounds to me like we're back at expert systems.
| Doesn't bode well...
| ianbutler wrote:
| Honest question, how would you expect systems to get
| external knowledge etc without tools like the OP is
| suggesting?
|
| Action oriented through self exploration? What is your
| thought for how these systems integrate with the existing
| world?
|
| Why does the OP's suggested mode of integration make you
| think of those older systems?
| brookst wrote:
| Hey look, it's Gordon Moore visiting us from 2005! :)
| crystal_revenge wrote:
| I don't think we've even _started_ to get the most value out of
| current gen LLMs. For starters very few people are even looking
| at sampling which is a major part of the model performance.
|
| The theory behind these models so aggressively lags the
| engineering that I suspect there are many major improvements to
| be found just by understanding a bit more about _what these
| models are really doing_ and making re-designs based on that.
|
| I highly encourage anyone seriously interested in LLMs to start
| spending more time in the open model space where you can really
| take a look inside and play around with the internals. Even if
| you don't have the resources for model training, it's worth
| personally understanding sampling and other potential tweaks
| to the model (lots of neat work on uncertainty estimation,
| manipulating the initial embeddings the prompts are assigned,
| intelligent backtracking, etc).
|
| And from a practical side, I've started to realize that many
| people have been holding off on building things waiting for
| "that next big update", but there are so many small, annoying
| tasks that can be easily automated.
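|
| For anyone curious what I mean by "looking at sampling",
| here's a bare-bones temperature + top-p (nucleus) sampler over
| a vector of logits (illustrative only, not any particular
| library's API):
|
|   import numpy as np
|
|   def sample(logits, temperature=0.8, top_p=0.95):
|       # temperature reshapes the distribution; top-p keeps
|       # the smallest set of tokens whose cumulative mass
|       # exceeds p
|       z = np.asarray(logits) / temperature
|       probs = np.exp(z - z.max())
|       probs /= probs.sum()
|       order = np.argsort(-probs)
|       cum = np.cumsum(probs[order])
|       keep = order[: np.searchsorted(cum, top_p) + 1]
|       p = probs[keep] / probs[keep].sum()
|       return np.random.choice(keep, p=p)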
| dr_dshiv wrote:
| > I've started to realize that many people have been holding
| off on building things waiting for "that next big update"
|
| I've noticed this too -- I've been calling it _intellectual
| deflation._ By analogy, why spend now when it may be cheaper
| in a month? Why do the work now, when it will be easier in a
| month?
| vbezhenar wrote:
| Why optimise software today, when tomorrow Intel will
| release CPU with 2x performance?
| sdenton4 wrote:
| Curiously, Moore's law was predictable enough over
| decades that you could actually plan for the speed of
| next year's hardware quite reliably.
|
| For LLMs, we don't even know how to reliably measure
| performance, much less plan for expected improvements.
| mikeyouse wrote:
| Moores law became less of a prediction and more of a
| product road map as time went on. It helped coordinate
| investment and expectations across the entire industry so
| everyone involved had the same understanding of timelines
| and benchmarks. I fully believe more investment would've
| 'bent the curve' of the trend line but everyone was
| making money and there wasn't a clear benefit to pushing
| the edge further.
| epicureanideal wrote:
| Or maybe it pushed everyone to innovate faster than they
| otherwise would've? I'm very interested to hear your
| reasoning for the other case though, and I am not
| strongly committed to the opposite view, or either view
| for that matter.
| throwing_away wrote:
| Call Nvidia, that sounds like a job for AI.
| ben_w wrote:
| Back when Intel regularly gave updates with 2x
| performance increases, people did make decisions based on
| the performance doubling schedule.
| jkaptur wrote:
| https://en.wikipedia.org/wiki/Osborne_effect
| ppeetteerr wrote:
| The reason people are holding out is that the current
| generation of models is still pretty poor in many areas. You
| can have one craft an email, or review your email, but I
| wouldn't trust an LLM with anything mission-critical. The
| accuracy of the generated output is too low to be trusted in
| most practical applications.
| saalweachter wrote:
| Any email you trust an LLM to write is one you probably
| don't need to send.
| deegles wrote:
| My big question is what is being done about hallucination?
| Without a solution it's a giant footgun.
| creativenolo wrote:
| Great & motivational comment. Any pointers on where to start
| playing with the internals and sampling?
|
| Doesn't need to be comprehensive, I just don't know where to
| jump off from.
| creativenolo wrote:
| > holding on of building things waiting for "that next big
| update", but there a so many small, annoying tasks that can
| be easily automated.
|
| Also we only hear / see the examples that are meant to scale.
| Startups typically offer up something transformative, ready
| to soak up a segment of a market. And that's hard with the
| current state of LLMs. When you try their offerings, it's
| underwhelming. But there is richer, more nuanced hard to
| reach fruits that are extremely interesting - but it's not
| clear where they'd scale in and of themselves.
| kozikow wrote:
| > "The theory behind these models so aggressively lags the
| engineering"
|
| The problem is that 99% of theories are hard to scale.
|
| I am not an expert, as I work adjacent to this field, but I
| see the inverse - dumbing down theory to increase
| parallelism/scalability.
| msabalau wrote:
| There are all sorts of valuable things to explore and build
| with what we have already.
|
| But understanding how likely it is that we will (or will not)
| see new models quickly and dramatically improve on what we
| have "because scaling" seems like valuable context for
| everyone in the ecosystem to make decisions.
| ben_w wrote:
| > Question for the group here: do we honestly feel like we've
| exhausted the options for delivering value on top of the
| current generation of LLMs?
|
| IMO we've not even exhausted the options for spreadsheets, let
| alone LLMs.
|
| And the reason I'm thinking of spreadsheets is that they, like
| LLMs, are very hard to win big on even despite the value they
| bring. Not "no moat" (that gets parroted stochastically in
| threads like these), but the moat is elsewhere.
| alach11 wrote:
| My team and I also develop with these models every day, and I
| completely agree. If models stall at current levels, it will
| take 10 (or more) years for us to capture most of the value
| they offer. There's so much work out there to automate and so
| many workflows to enhance with these "not quite AGI-level"
| models. And if peak model performance remains the same but cost
| continues to drop, that opens up vastly more applications as
| well.
| alangibson wrote:
| I think you're playing a different game than the Sam Altmans of
| the world. The level of investment and profit they are looking
| for can only be justified by creating AGI.
|
| The > 100 P/E ratios we are already seeing can't be justified
| by something as quotidian as the exceptionally good
| productivity tools you're talking about.
| gizajob wrote:
| Yeah, I keep thinking this - how is Nvidia worth $3.5 trillion
| for making code autocomplete for coders?
| drawnwren wrote:
| Nvidia was not the best example. They get to moon in the case
| that any AI exponential hits. Most others have a narrower
| probability distribution.
| BeefWellington wrote:
| Yeah they're the shovel sellers of this particular
| goldrush.
|
| Most other businesses trying to actually use LLMs are the
| riskier ones, including OpenAI, IMO (though OpenAI is
| perhaps the least risky due to brand recognition).
| lokimedes wrote:
| Or they become the Webvan/pets.com of the bubble.
| zeusk wrote:
| Nvidia is more likely to become CSCO or INTC, but as far
| as I can tell, that's still a few years off - unless of
| course there is weakness in the broader economy that
| accelerates the pressure on investors.
| HarHarVeryFunny wrote:
| I'm not sure about that. NVIDIA seems to stay in a
| dominant position as long as the race to AI remains
| intact, but the path to it seems unsure. They are selling
| a general purpose AI-accelerator that supports the
| unknown path.
|
| Once massively useful AI has been achieved, or it's been
| determined that LLMs are it, then it becomes a race to
| the bottom as GOOG/MSFT/AMZN/META/etc design/deploy more
| specialized accelerators to deliver this final form
| solution as cheaply as possible.
| JumpCrisscross wrote:
| > _level of investment and profit they are looking for can
| only be justified by creating AGI_
|
| What are you basing this on?
|
| IT outsourcing is a $500+ billion industry. If OpenAI _et al_
| can run even a 10% margin, that business alone justifies
| their valuation.
| HarHarVeryFunny wrote:
| It seems you are missing a lot of "ifs" in that
| hypothetical!
|
| Nobody knows how things like coding assistants or other AI
| applications will pan out. Maybe it'll be Oracle selling
| Meta-licenced solutions that gets the lion's share of the
| market. Maybe custom coding goes away for many business
| applications as off-the-shelf solutions get smarter.
|
| A future where all that AI (or some hypothetical AGI)
| changes is work being done by humans to the same work being
| done by machines seems way too linear.
| JumpCrisscross wrote:
| > _you are missing a lot of "ifs" in that hypothetical_
|
| The big one being I'm not assuming AGI. Low-level coding
| tasks, the kind frequently outsourced, are within the
| realm of being competitive with offshoring with known
| methods. My point is we don't need to assume AGI for
| these valuations to make sense.
| HarHarVeryFunny wrote:
| Current AI coding assistants are best at writing
| functions or adding minor features to an existing code
| base. They are not agentic systems that can develop an
| entire solution from scratch given a specification, which
| in my experience is more typical of the work that is
| being outsourced. AI is a tool, whose full-cycle
| productivity benefit seems questionable. It is not a
| replacement for a human.
| JumpCrisscross wrote:
| > _they are not agentic systems that can develop an
| entire solution from scratch given a specification, which
| in my experience is more typical of the work that is
| being outsourced_
|
| If there is one domain where we're seeing tangible
| progress from AI, it's in working towards this goal.
| Difficult projects aren't in scope. But most tech,
| _especially_ most tech branded IT, is not difficult.
| Everyone doesn 't need an inventory or customer-complaint
| system designed from scratch. Current AI is good at
| cutting through that cruft.
| senko wrote:
| There are a number of agentic systems that can develop
| more complex solutions. Just a few off the top of my
| head: Pythagora, Devin, OpenHands, Fume, Tusk, Replit,
| Codebuff, Vly. I'm sure I've missed a bunch.
|
| Are they good enough to replace a human yet?
| Questionable[0], but they _are_ improving.
|
| [0] You wouldn't believe how low the outsourcing
| contractors' quality can go. Easily surpassed by current
| AI systems :) That's a very low bar tho.
| hluska wrote:
| Nowhere near, but the market seems to have priced in that
| scaling would continue to have a near linear effect on
| capability. That's not happening and that's the issue the
| article is concerned with.
| HarHarVeryFunny wrote:
| Sure, there's going to be a lot of automation that can be built
| using current GPT-4 level LLMs, even if they don't get much
| better from here.
|
| However, this is better thought of as "business logic
| scripting/automation", not the magic employee-replacing AGI
| that would be the revolution some people are expecting. Maybe
| you can now build a slightly less shitty automated telephone
| response system to piss your customers off with.
| brookst wrote:
| > Question for the group here: do we honestly feel like we've
| exhausted the options for delivering value on top of the
| current generation of LLMs?
|
| Certainly not.
|
| But technology is all about stacks. Each layer strives to
| improve, right up through UX and business value. The uses for
| 1um chips had not been exhausted in 1989 when the 486 shipped
| in 800nm. 250nm still had tons of unexplored uses when the
| Pentium 4 shipped on 90nm.
|
| Talking about scaling at the model level is like talking
| about transistor density for silicon: it's interesting, and
| relevant, and we should care... but it is not the sole
| determinant of what use cases can be built and what user
| value there is.
| senko wrote:
| No.
|
| The scaling laws may be dead. Does this mean the end of LLM
| advances? Absolutely not.
|
| There are many different ways to improve LLM capabilities.
| Everyone was mostly focused on the scaling laws because that
| worked extremely well (actually surprising most of the
| researchers).
|
| But if you're keeping an eye on the scientific papers coming
| out about AI, you've seen the astounding amount of research
| going on with some very good results, that'll probably take at
| least several months to trickle down to production systems.
| Thousands of extremely bright people in AI labs all across the
| world are working on finding the next trick that boosts AI.
|
| One random example is test-time compute: just give the AI more
| time to think. This is basically what o1 does. A recent
| research paper suggests using it is roughly equivalent to an
| order of magnitude more parameters, performance-wise. (source
| for the curious: https://lnkd.in/duDST65P)
|
| Another example that sounds bonkers but apparently works is
| quantization: reducing the precision of each parameter to 1.58
| bits (ie only using values -1, 0, 1). This uses 10x less space
| for the same parameter count (compared to standard 16-bit
| format), and since AI operations are actually memory-limited,
| this directly corresponds to a 10x decrease in costs:
| https://lnkd.in/ddvuzaYp
|
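| A back-of-the-envelope version of that memory claim, assuming
| a 70B-parameter model (the parameter count is just for
| illustration):
|
|   params = 70e9
|   fp16_gb = params * 16 / 8 / 1e9       # ~140 GB of weights
|   ternary_gb = params * 1.58 / 8 / 1e9  # ~13.8 GB of weights
|   print(fp16_gb / ternary_gb)           # ratio of ~10.1x
|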
| (Quite apart from improvements like these, we shouldn't forget
| that not all AIs are LLMs. There have been tremendous advances
| in AI systems for image, audio and video generation,
| interpretation and manipulation, and they also don't show
| signs of stopping, and there's a possibility that a new or
| hybrid architecture for textual AI might be developed).
|
| AI winter is a long way off.
| limaoscarjuliet wrote:
| Scaling laws are not dead. The number of people predicting the
| death of Moore's law doubles every two years.
|
| - Jim Keller
|
| https://www.youtube.com/live/oIG9ztQw2Gc?si=oaK2zjSBxq2N-zj1.
| ..
| nyrikki wrote:
| There are way too many personal definitions of what
| "Moore's Law" even is to have a discussion without deciding
| on a shared definition beforehand.
|
| But Goodhart's law - "When a measure becomes a target, it
| ceases to be a good measure" - directly applies here.
|
| Moore's Law was used to set long-term plans at
| semiconductor companies, and Moore didn't have empirical
| evidence it was even going to continue.
|
| If you, say, arbitrarily decide on CPU performance, or
| worse, single-core performance as your measurement, it
| hasn't held for well over a decade.
|
| If you hold minimum feature size without regard to cost, it
| is still holding.
|
| What you want to prove usually dictates what interpretation
| you make.
|
| That said, the scaling law is still unknown, but you can
| game it as much as you want in similar ways.
|
| GPT4 was already hinting at an asymptote on MMLU, but the
| question is if it is valid for real work etc...
|
| Time will tell; I am seeing far less optimism from my
| sources, but that is just anecdotal.
| afro88 wrote:
| > potential applications > if you ... > for example ...
|
| Yes there seems to be lots of potential. Yes we can brainstorm
| things that should work. Yes there is a lot of examples of
| incredible things in isolation. But it's a little bit like
| those youtube videos showing amazing basketball shots in 1 try,
| when in reality lots of failed attempts happened beforehand.
| Except our users experience the failed attempts (LLM replies
| that are wrong, even when backed by RAG) and it's incredibly
| hard to hide those from them.
|
| Show me the things you / your team has actually built that has
| decent retention and metrics concretely proving efficiency
| improvements.
|
| LLMs are so hit and miss from query to query that if your users
| don't have a sixth sense for a miss vs a hit, there may not be
| any efficiency improvement. It's a really hard problem with LLM
| based tools.
|
| There is so much hype right now and people showing cherry
| picked examples.
| jihadjihad wrote:
| > Except our users experience the failed attempts (LLM
| replies that are wrong, even when backed by RAG) and it's
| incredibly hard to hide those from them.
|
| This has been my team's experience (and frustration) as well,
| and has led us to look at using LLMs for classifying /
| structuring, but not entrusting an LLM with making a decision
| based on things like a database schema or business logic.
|
| I think the technology and tooling will get there, but the
| enormous amount of effort spent trying to get the system to
| "do the right thing" and the nondeterministic nature have
| really put us into a camp of "let's only allow the LLM to do
| things we know it is rock-solid at."
| sdesol wrote:
| > "let's only allow the LLM to do things we know it is
| rock-solid at."
|
| Even this is insanely hard in my opinion. The one thing
| that you would assume LLM to excel at is spelling and
| grammar checking for the English language, but even the top
| model (GPT-4o) can be insanely stupid/unpredictable at
| times. Take the following example from my tool:
|
| https://app.gitsense.com/?doc=6c9bada92&model=GPT-4o&sample
| s...
|
| 5 models are asked if the sentence is correct and GPT-4o
| got it wrong all 5 times. It keeps complaining that GitHub
| is spelled like Github, when it isn't. Note, only 2 weeks
| ago, Claude 3.5 Sonnet did the same thing.
|
| I do believe LLMs are a game changer, but I'm not convinced
| they are designed to be public-facing. I see the LLM as a power
| tool for domain experts, and you have to assume whatever it
| spits out may be wrong, and your process should allow for
| it.
|
| Edit:
|
| I should add that I'm convinced that no single model will
| rule them all. I believe there will be 4 or 5 models that
| everybody will use, and each will be used to challenge the
| others for accuracy and confidence.
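|
| Something like this is what I have in mind by models
| challenging one another (llm(model, prompt) is a hypothetical
| helper for whichever APIs you use):
|
|   def consensus(prompt, models, llm):
|       # ask every model, take the majority answer, and
|       # report how strongly they agree as a crude
|       # confidence signal
|       votes = [llm(m, prompt).strip() for m in models]
|       best = max(set(votes), key=votes.count)
|       return best, votes.count(best) / len(votes)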
| SimianSci wrote:
| > "I see LLM as a power tool for domain experts, and you
| have to assume whatever it spits out may be wrong, and
| your process should allow for it."
|
| this gets to the heart of it for me. I think LLMs are an
| incredible tool, providing advanced augmentation on our
| already developed search capabilities. What advanced user
| doesn't want to have a colleague they can talk with about
| their specific domain?
|
| The problem comes from the hyperscaling ambitions of the
| players who were the first in this space. They quickly
| hyped up the technology beyond what it should have been.
| larodi wrote:
| Those Apple engineers stated in a very clear tone:
|
| - every time a different result is produced.
|
| - no reasoning capabilities were categorically
| determined.
|
| So this is it. If you want an LLM - brace for different
| results, and if this is okay for your application (say
| it's about speech or non-critical commands) then off you
| go.
|
| Otherwise simply forget this approach, particularly
| when you need reproducible, discrete results.
|
| I don't think it gets any better than that and nothing so
| far implicated it will (with this particular approach to
| AGI or whatever the wet dream is)
| marcellus23 wrote:
| > Those Apple engineers
|
| Which Apple engineers? Yours is the only reference to the
| company in this comment section or in the article.
| verteu wrote:
| (for reference: https://arxiv.org/pdf/2410.05229 )
| VeejayRampay wrote:
| really agree with this and I think it's been the general
| experience: people wanting LLMs to be so great (or making
| money off them) kind of cherry-pick examples that fit
| their narrative, which LLMs make easy because they produce
| amazing results some of the time, like the deluxe broken
| clocks that they are (they're right many, many times a day)
|
| at the end of the day though, it's not exactly reliable or
| particularly transformative when you get past the party
| tricks
| archiepeach wrote:
| To be fair, in the human-based teams I've worked with in
| startups I couldn't show you products with decent retention.
| whiplash451 wrote:
| The main difference between GPT5 and a PhD-level new hire is
| that the new hire will autonomously go out, deliver and take on
| harder tasks with much less guidance than GPT5 will ever
| require. So much of human intelligence is about interacting
| with peers.
| ben_w wrote:
| Human interaction with peers is also guidance.
|
| I don't know how many team meetings PhD students have, but I
| do know about software development jobs with 15 minute daily
| standups, and that length meeting at 120 words per minute for
| 5 days a week, 48 weeks per year of a 3-year PhD is 1,296,000
| words.
| eastbound wrote:
| I have 3 remote employees whose work is consistently as bad
| as an LLM's.
|
| That means employees who use LLMs are, on average,
| recognizably bad. Those who are good enough are also good
| enough to write the code manually.
|
| To the point I wonder whether this HN thread is generated
| by OpenAI, trying to create buzz around AI.
| ben_w wrote:
| 1. The person I'm replying to is hypothesising about a
| future, not yet existent, version, GPT5. Current quality
| limits don't tell you jack about a hypothetical future,
| especially one that may not ever happen because money.
|
| 2. I'm not commenting on the quality, because they were
| writing about something that doesn't exist and therefore
| that's clearly just a given for the discussion. The only
| thing I was adding is that humans _also_ need guidance,
| and quite a lot of it -- even just a two-week sprint's
| worth of 15-minute daily stand-up meetings is 18,000
| words, which is well beyond the point where I'd have
| given up prompting an LLM and done the thing myself.
| EGreg wrote:
| I want to stuff a transcript of a 3 hour podcast into some LLM
| API and have it summarize it by: segmenting by topic changes,
| keeping the timestamps, and then summarizing each segment.
|
| I wasn't able to get it to do it with the Anthropic or OpenAI
| chat completion APIs. Can someone explain why? I don't think
| the 200K token window actually works - is it looking
| sequentially, or is it really looking at the whole thing at
| once?
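|
| For reference, the behavior I'm trying to get out of a single
| call is basically this per-segment loop (llm() is a
| hypothetical helper; the segments come from my own splitter):
|
|   def summarize_transcript(segments, llm):
|       # segments: list of (start_ts, end_ts, text) chunks
|       # pre-split on pauses or speaker/topic changes
|       out = []
|       for start, end, text in segments:
|           summary = llm(f"Summarize this segment:\n{text}")
|           out.append(f"[{start}-{end}] {summary}")
|       return "\n".join(out)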
| anonzzzies wrote:
| The current models are very powerful and we definitely haven't
| gotten the most out of them yet. We are getting more and more
| out of them every week as we release new versions of our
| toolkits. So if this is it: please make it faster and less
| energy-hungry. We'll be fine until the next AI spring.
| simonw wrote:
| Right. I've been saying for a while that if all LLM development
| stopped entirely and we were stuck with the models we have
| right now (GPT-4o, Claude 3.5 Sonnet, Gemini 1.5 Pro, Llama
| 3.1/2, Qwen 2.5 etc) we could still get multiple years worth of
| advances just out of those existing models. There is SO MUCH we
| haven't figured out about how to use them yet.
| 23B1 wrote:
| The user interface for LLMs is stuck in C:\
|
| That's where I'd focus.
| kenjackson wrote:
| Voice for LLMs is surprisingly good. I'd love to see LLMs
| used in more systems like cars and in-home automation.
| Whatever cars use today and Alexa in the home simply are much
| worse than what we get with ChatGPT voice today.
| ericmcer wrote:
| I have tried a few AI coding tools and always found them
| impressive but I don't really need something to autocomplete
| obvious code cases.
|
| Is there an AI tool that can ingest a codebase and locate code
| based on abstract questions? Like: "I need to invalidate
| customers who haven't logged in for a month" and it can locate
| things like relevant DB tables, controllers, services, etc.
| yk wrote:
| To a certain extent I think we are getting a better
| understanding of what LLMs can do, and my estimation for the
| next ten years is more like "best UI ever" rather than "LLMs
| will replace humanity". Now, best UI ever is something that
| can certainly deliver a lot of value; 80% of all buttons in a
| car should be replaced by actually good voice control, and I
| think that is where we are going to see a lot of very
| interesting applications: "Hey washing machine, this is two
| t-shirts and a pair of jeans." (The washing machine can then
| figure out its program by itself; I don't want to memorize
| the table in the manual.)
| lokimedes wrote:
| To each their own, but I don't look forward to having my kids
| yelling, a podcast in my ears and having to explain to my
| tumbler that wool must be spun at 1000 RPM. Humans have
| varying preferences when it comes to communication and
| sensing; making our machine interactions favor the
| extroverted, talkative exhibitionists is really only one
| modality.
| machiaweliczny wrote:
| Long context is a scam. Claude is the best, but it still gets
| lost with longer contexts.
| bbor wrote:
| I have no data, but I whole-heartedly agree. Well, perhaps
| not "scam", but definitely oversold. One of my best undergrad
| professors taught me the adage "don't expect a model to do
| what a human expert cannot", and I think it's still a good
| rule of thumb. Giving someone an entire book to read before
| answering your question _might_ help, but it would help way,
| way more to give them a few paragraphs that you know are
| actually relevant.
| cruffle_duffle wrote:
| In my experience, the reality of long context windows doesn't
| live up to the hype. When you're iterating on something,
| whether it's code, text, or any document, you end up with
| multiple versions layered in the context. Every time you
| revise, those earlier versions stick around, even though only
| the latest one is the "most correct".
|
| What gets pushed out isn't the last version of the document
| itself (since it's FIFO), but the important parts of the
| conversation--things like the rationale, requirements, or any
| context the model needs to understand why it's making
| changes. So, instead of being helpful, that extra capacity
| just gets filled with old, repetitive chunks that have to be
| processed every time, muddying up the output. This isn't just
| an issue with code; it happens with any kind of document
| editing where you're going back and forth, trying to refine
| the result.
|
| Sometimes I feel the way to "resolve" this is to instead go
| back and edit some earlier portion of the chat to update it
| with the "new requirements" that I didn't even know I had
| until I walked down some rabbit hole. What I end up with is
| almost like a threaded conversation with the LLM. Like, I
| sometimes wish these LLM chatbots explicitly treated the
| conversation as if it were threaded. They do support basically
| my use case by letting you toggle between different edits to
| your prompts, but it is pretty limited and you cannot go back
| and edit things if you do some operations (eg: attach a
| file).
|
| Speaking of context, it's also hard to know what things like
| ChatGPT add to its context in the first place. Many times
| I'll attach a file or something and discover it didn't "read"
| the file into its context. Or I'll watch it fire up a Python
| program it writes that does nothing but echo the file into
| its context.
|
| I think there is still a lot of untapped potential in
| strategically manipulating what gets placed into the context
| window at all. For example only present the LLM with the
| latest and greatest of a document and not all the previous
| revisions in the thread.
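|
| One crude way to do that manually is to rebuild the prompt
| from scratch every turn instead of appending forever, e.g.
| (purely illustrative):
|
|   def build_prompt(requirements, latest_doc, recent_turns):
|       # keep the rationale and only the newest revision;
|       # drop every stale copy of the document
|       return "\n\n".join([
|           "Requirements:\n" + "\n".join(requirements),
|           "Current draft:\n" + latest_doc,
|           "Recent discussion:\n" + "\n".join(recent_turns),
|       ])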
| bbor wrote:
| Great question. I'm very confident in my answer, even though
| it's in the minority here: we're not even close to exhausting
| the potential.
|
| Imagine that our current capabilities are like the Model-T.
| There remain many improvements to be made upon this passenger
| transportation product, with RAG being a great common theme
| among them. People will use chatbots with much more permissive
| interfaces instead of clicking through menus.
|
| But all of that's just the start, the short term, the
| maturation of this consumer product; the really scary/exciting
| part comes when the technology reaches saturation, and opens up
| new possibilities for itself. In the Model-T metaphor, this is
| analogous to how highways have (arguably) transformed America
| beyond anyone's wildest dreams, changing the course of various
| historical events (eg WWII industrialization, 60s & 70s white
| flight, early 2000s housing crisis) so much it's hard to
| imagine what the country would look like without them. Now,
| automobiles are not simply passenger transportation, but the
| bedrock of our commerce, our military, and probably more --
| through ubiquity alone they unlocked new forms of themselves.
|
| For those doubting my utopian/apocalyptic rhetoric, I implore
| you to ask yourself one simple question: why are so many
| experts so worried about AGI? They've been leaving OpenAI in
| droves, and that's ultimately what the governance kerfuffle
| there was about. Hinton, a Turing award winner, gave up $$$ to
| doom-say full time. Why?
|
| My hint is that if your answer involves fewer than 1,000
| specialized LLMs per unified system, then you're not thinking
| big enough.
| fire_lake wrote:
| > Hinton, a Turing award winner, gave up $$$ to doom-say full
| time
|
| This is a hint of something but a weak argument. Smart people
| are wrong all the time.
| robrenaud wrote:
| > For example, combining a human-moderated knowledge graph with
| an LLM with RAG allows you to build "expert bots" that
| understand your business context / your codebase / your
| specific processes and act almost human-like similar to a
| coworker in your team.
|
| I'd love to hear about this. I applied to YC WC 25 with
| research, insights, and an initial research prototype built on
| top of GPT-4 plus fine-tuning, along the lines of this idea.
| It's less powerful than what you describe, but it also works
| without the human-moderated KG.
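|
| My rough mental model of the quoted setup, as a hedged sketch
| (graph, llm, and their methods are hypothetical stand-ins, not
| any real library's API):
|
|     def answer_with_kg(question, graph, llm, k=5):
|         # Pull the k facts most related to the question from
|         # the human-moderated knowledge graph.
|         facts = graph.search(question, limit=k)
|         # Ground the model's answer in those facts.
|         context = "\n".join("- " + f for f in facts)
|         prompt = ("Answer using only the facts below.\n"
|                   "Facts:\n" + context +
|                   "\n\nQuestion: " + question)
|         return llm.complete(prompt)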
| bloppe wrote:
| > you can have LLMs create reasonable code changes, with
| automatic review / iteration etc.
|
| Nobody who takes code health and sustainability seriously wants
| to hear this. You absolutely do not want to be in a position
| where something breaks, but your last 50 commits were all
| written and reviewed by an LLM. Now you have to go back and
| review them all with human eyes just to get a handle on how
| things broke, while customers suffer. At this scale, it's an
| effort multiplier, not an effort reducer.
|
| It's still good for generating little bits of boilerplate,
| though.
| moogly wrote:
| > you can have LLMs create reasonable code changes
|
| Could you define "code changes"? I feel that is a very vague
| accomplishment.
| nonameiguess wrote:
| Your hypothesis here is not exclusive of the hypothesis in this
| article.
|
| Name your platform. Linux. C++. The Internet. The x86 processor
| architecture. We haven't exhausted the options for delivering
| value on top of those, but that doesn't stop the developers and
| sellers of those platforms from trying to improve them anyway,
| even as they struggle to extract value from the application
| developers who use them.
| polskibus wrote:
| In other news, Altman said AGI is coming next year
| https://www.tomsguide.com/ai/chatgpt/sam-altman-claims-agi-i...
| Jyaif wrote:
| According to the article, he said it _could_ be achieved in
| 2025, which seems pretty obvious to me as well even though I
| don't have any visibility into what is going on inside those
| companies.
| user90131313 wrote:
| AI market top very soon
| fallat wrote:
| What a stupid piece. We are making leaps every 6 months still.
| Tell me this when there are no developments for 3 years.
| hatefulmoron wrote:
| I'm curious, what was the leap after GPT-4? What about the
| leaps after that, given a leap every 6 months?
| xyst wrote:
| Many late investors in the genAI space are about to be bag
| holders.
| 12_throw_away wrote:
| Well shoot. It's not like it was patently obvious that this would
| happen _before_ the industry started guzzling electricity and
| setting money on fire, right? [1]
|
| [1] https://dl.acm.org/doi/10.1145/3442188.3445922
| kaibee wrote:
| Not sure where the OP to the comment I meant to reply to is, but
| I'll just add this here.
|
| > I suspect the path to general intelligence is not that, but
| we'll see.
|
| I think there are three things that a 'true' general
| intelligence has which are missing from the basic LLMs we have
| now.
|
| 1. knowing what you know. <basic-LLMs are here>
|
| 2. knowing what you don't know but can figure out via
| tools/exploration. <this is tool use/function calling>
|
| 3. knowing what can't be known. <this is knowing that halting
| problem exists and being able to recognize it in novel
| situations>
|
| (1) From an LLM's perspective, once trained on a corpus of
| text, it knows 'everything'. It knows about the concept of not
| knowing something (from having seen text about it), insofar as
| an LLM knows anything, but it doesn't actually have a growable
| map of knowledge that it knows has uncharted edges.
|
| This is where (2) comes in, and this is what tool use/function
| calling tries to solve at the moment, but the way function
| calling currently works doesn't give the LLM knowledge the
| right way. I know that I don't know what 3,943,034 / 234,893
| is. But I know I have a 'function call' in the form of knowing
| the algorithm for doing long division on paper. And I think
| there's another subtle point here: my knowledge in (1) includes
| the training data generated from running the intermediate steps
| of the long-division algorithm. This is the knowledge that
| later generalizes to being able to use a calculator (and this
| is also why we don't just give kids calculators in elementary
| school). But this is also why a kid who knows how to do long
| division on paper doesn't separately need to learn when/how to
| use a calculator, beyond the very basics. Using a calculator to
| do that math feels like one step, but it still involves all of
| the initial mechanical steps of setting up the problem on
| paper. You have to type in each digit individually, etc.
|
| (3) I'm less sure of this point now that I've written out
| points (1) and (2), but that's kinda exactly the thing I'm
| trying to get at. It's being able to recognize when you need
| more practice of (1) or more 'energy/capital' for doing (2).
|
| Consider a burger restaurant. If you properly populated the
| context of a ChatGPT-scale model with the data for a burger
| restaurant from 1950, and gave it the kind of 'function
| calling' we're plugging into LLMs now, it could manage it. It
| could keep track of inventory, it could keep tabs on the
| employee subprocesses, knowing when to hire, fire, or get new
| suppliers, all via function calling. But it would never try to
| become McDonald's, because it would have no model of the
| internals of those function calls, and no ability to
| investigate or modify their behaviour.
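|
| To make (2) concrete, here's a minimal sketch of the kind of
| function calling I mean, with a hypothetical long-division tool
| and dispatch format (not any particular vendor's API):
|
|     import json
|
|     def long_division(dividend: int, divisor: int) -> dict:
|         q, r = divmod(dividend, divisor)
|         return {"quotient": q, "remainder": r}
|
|     TOOLS = {"long_division": long_division}
|
|     def handle_tool_call(call_json: str) -> str:
|         # The model emits e.g. {"tool": "long_division",
|         # "args": {"dividend": 3943034, "divisor": 234893}}
|         # because it knows it can't do the arithmetic itself.
|         call = json.loads(call_json)
|         result = TOOLS[call["tool"]](**call["args"])
|         return json.dumps(result)  # fed back into the context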
| nomendos wrote:
| "Eureka"!?
|
| At the very early phase of the boom I was among the very few
| who knew and predicted this (usually the most free-thinking,
| deep-thinking, knowledgeable people). Then my prediction got
| reinforced by the results. One of the best examples was an
| experiment of mine: all of today's AIs failed at tree
| serialization and de-serialization for each of the DFS orders
| (pre-order/in-order/post-order) and BFS (level-order), which is
| 8 algorithms (2x4), and only 3 came out correct! The reason is
| "limited training inputs", since the internet and open source
| do not have solutions for the others :-) .
|
| So, I spent "some" time and implemented all 8, which took me a
| few days. By the way, this demonstrates that ~15-30 min
| leetcode-like interviews are pointless: they require
| regurgitating and memorizing rather than thinking. So, as a
| hard logical consequence, there will have to be a
| "crash/cleanup" in the area of leetcode-like interviews, as
| they will suddenly be proclaimed "pointless/stupid". However, I
| decided not to publish the remaining 5 solutions :-)
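|
| For reference, the widely published variant (pre-order DFS
| serialization with '#' null markers) looks roughly like this;
| it's the less common orders where the models fall apart:
|
|     class Node:
|         def __init__(self, val, left=None, right=None):
|             self.val, self.left, self.right = val, left, right
|
|     def serialize(root):
|         out = []
|         def walk(n):
|             if n is None:
|                 out.append("#")
|                 return
|             out.append(str(n.val))
|             walk(n.left)
|             walk(n.right)
|         walk(root)
|         return ",".join(out)
|
|     def deserialize(s):
|         vals = iter(s.split(","))
|         def build():
|             v = next(vals)
|             if v == "#":
|                 return None
|             node = Node(int(v))
|             node.left = build()
|             node.right = build()
|             return node
|         return build()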
|
| This (and other experiments) confirms hard limits of the LLM
| approach (even when used with chain-of-thought). Throwing more
| compute at the problem will produce smaller and smaller gains
| (logarithmic, diminishing returns), so a new AGI
| approach/design is needed; and to my knowledge the majority of
| the inve$tment (~99%) is in LLMs, so "buckle up" at some point,
| perhaps soon?
|
| Impacts and realities: LLMs shall "run their course" (produce
| some products/results/$$$, get reviewed/$corrected), and
| whoever survives that pruning shall earn money on those
| products while investing in new research to find a new AGI
| design/approach (which could take quite a long time,... or
| not). NVDA is at the center of thi$, and time-wise this
| peak/turn/crash/correction is hard to predict (although I see
| it on the horizon, and min/max timelines can be estimated). Be
| aware and alert. I'll stop here and hold my other
| thoughts/opinions/ideas for a much deeper discussion. (BTW I am
| still "full in on NVDA" until,....)
| jmward01 wrote:
| Every negative headline I see about AI hitting a wall or being
| over-hyped makes me think of the early 2000's with that new thing
| the 'internet' (yes, I know the internet is a lot older than
| that). There is little doubt in my mind that ten years from now
| nearly every aspect of life will be deeply connected to AI,
| just like the internet took over everything in the late 90's
| and early 2000's and is now deeply connected to everything. I'd
| even hazard to say that AI could be more impactful.
| brookst wrote:
| And, as I've noted a couple of times in this thread, how many
| times have we heard that Moore's law is dead and compute has
| hit a wall?
| moffkalast wrote:
| Well according to Nvidia you can just ignore Moore's law and
| start requiring people to install multi kilowatt outlets just
| for their cards. Who needs efficiency amirite?
| akomtu wrote:
| AI can be thought of as the 2nd stage of the creature that we
| call the Internet. The 1st stage, that we are so familiar with,
| is about gathering knowledge into a giant and somewhat
| organized library. This library has books on every subject
| imaginable, but its scale is so vast that no living human today
| can grasp it. This is why the originally connected network has
| started falling apart. Once this I becomes AI, all the books in
| the library will be melted together into one coherent picture.
| Once again, anyone anywhere on Earth will be able to access all
| the knowledge and our Babylon will stay for a little longer.
| JohnMakin wrote:
| It's strange to me that that's your takeaway. The internet in
| the 2000's _was_ overhyped, and heavily overvalued too. It took
| a massive correction and a seriously disruptive bubble burst to
| break the delusion and move on to something more sustainable.
| jmward01 wrote:
| I disagree that it was over hyped. It has transformed our
| society so much that I would argue it was vastly under-hyped.
| Sure, there were a lot of silly companies that sprang up and
| went away because they weren't sound, but so much of the
| modern economy is based on the internet that it is hard to
| say any business isn't somehow internet related today. You
| would be hard pressed to find any business anywhere that
| doesn't at least have a social media account. If 2000 was
| over-hyping things I just don't see it.
| JohnMakin wrote:
| pets.com was valued at $400 million based almost completely
| on its domain name. That's the classic example. People were
| throwing buckets of money at any .com that resolved to a
| site and almost all of it failed. I'm not sure how that
| doesn't meet the definition of over-hyped. It feels very
| similar to now. Not even to mention - the web largely
| doesn't consist of .com sites anymore, it's mostly a few
| centralized sites and apps.
| mvdtnz wrote:
| Even if you're right (you're not) whatever "AI" looks like in
| 20+ years will have virtually nothing in common with these
| stupid statistical word generators.
| LarsDu88 wrote:
| Curves that look exponential turn out, in virtually all cases,
| to be logarithmic.
|
| Certain OpenAI insiders must have known this for a while, hence
| Ilya Sutskever's new company in Israel
| rubiquity wrote:
| > Amodei has said companies will spend $100 million to train a
| bleeding-edge model this year
|
| Is it just me or does $100 million sound like it's on the very,
| very low end of how much training a new model costs? Maybe you
| can arrive within $200 million of that mark with amortization of
| hardware? It just doesn't make sense to me that a new model would
| "only" be $100 million when AmaGooBookSoft are spending tens of
| billions on hardware and the AI startups are raising billions
| every year or two.
| yalogin wrote:
| I do wonder how quickly LLMs will become a commodity AI
| instrument just like any other AI out there. If so, what
| happens to OpenAI?
| russellbeattie wrote:
| Go back a few decades and you'd see articles like this about CPU
| manufacturers struggling to improve processor speeds and
| questioning if Moore's Law was dead. Obviously those concerns
| were way overblown.
|
| That doesn't mean this article is irrelevant. It's good to know
| if LLM improvements are going to slow down a bit because the low
| hanging fruit has seemingly been picked.
|
| But in terms of the overall effect of AI and questioning the
| validity of the technology as a whole, it's just your basic FUD
| article that you'd expect from mainstream news.
| danjl wrote:
| Actually, Moore's Law has been dead for quite a few years now.
| Since we hit the power wall.
| wildermuthn wrote:
| Simply put, AGI requires more data: qualia.
| yobid20 wrote:
| This was predicted. AI isn't going to get any better.
| jppope wrote:
| Just an observation. If the models are hitting the top of the
| S-curve, that might be why Sam Altman raised all the money for
| OpenAI... it might not be available if Venture Capitalists
| realize that the gains are close to being done
| m3kw9 wrote:
| Hold your horses, OpenAI just came out with o1-preview 2 months
| ago, showing what test-time compute can do.
| devit wrote:
| It seems obvious to me that Common Crawl plus GitHub public
| repositories have more than enough data to train an AI that is
| as good as any programmer (at tasks not requiring knowledge of
| non-public codebases or non-public domain knowledge).
|
| So the problem is more in the algorithm.
| darknoon wrote:
| I think just reading the code wouldn't make you a good
| programmer; you'd need to "read" the anti-code, i.e. what
| doesn't work, by trial and error. Models' overconfidence that
| their code will work often leads them to fail in practice.
| krisroadruck wrote:
| AlphaGo got better by playing against itself. I wonder if the
| pathway forward here is to essentially do the same with
| coding. Feed it some arbitrary SRS documents and have it
| attempt to implement them, including full code-coverage
| testing. Have it also take on the roles of QA, stakeholders,
| red-team security researchers, and users who are all
| aggressively trying to find edge cases and point out everything
| wrong with the application. Have it keep iterating and learn
| from the findings. Keep feeding it new, novel SRSs until the
| number of attempts/iterations necessary to get a quality
| product out the other side drops to some acceptable number.
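|
| As a hedged sketch of that loop (llm, run_tests, and the SRS
| handling are hypothetical stand-ins, not any real framework):
|
|     def self_play_round(llm, run_tests, srs_text, max_iters=10):
|         code = llm("Implement this spec:\n" + srs_text)
|         for i in range(max_iters):
|             report = run_tests(code)     # QA / red-team role
|             if report.all_passed:
|                 return code, i           # iteration count is the signal
|             critique = llm("Spec:\n" + srs_text +
|                            "\nCode:\n" + code +
|                            "\nFailures:\n" + report.log +
|                            "\nExplain what is wrong.")
|             code = llm("Revise the code to fix:\n" + critique +
|                        "\n" + code)
|         return code, max_iters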
| superjose wrote:
| I'm more in the camp that these techs don't need to be perfect,
| but they need to be practical enough.
|
| And I think the latter is good enough for us to do exciting
| things.
| imiric wrote:
| How practical can they be when current flagship models generate
| incorrect responses more than 50% of the time[1]?
|
| This might be acceptable for amusing us with fiction and art,
| and for filling the internet with even more spam and
| propaganda, but would you trust them to write reliable code,
| drive your car or control any critical machinery?
|
| The truly exciting things are still out of reach, yet we may
| well be at the Peak of Inflated Expectations right now.
|
| [1]: https://openai.com/index/introducing-simpleqa/
| Timber-6539 wrote:
| Direct quote from the article: "The companies are facing several
| challenges. It's become increasingly difficult to find new,
| untapped sources of high-quality, human-made training data that
| can be used to build more advanced AI systems."
|
| The irony here is astounding.
| czhu12 wrote:
| If it becomes obvious that LLMs have a narrower set of use
| cases than the all-encompassing story we hear today, then I
| would bet that the LLM platforms (OpenAI, Anthropic, Google,
| etc.) will start developing products that compete directly with
| the applications supposedly being built on top of them, like
| Cursor, in an attempt to increase their revenue.
|
| I wonder what this would mean for companies raising today on the
| premise of building on top of these platforms. Maybe the best
| ones get their ideas copied, reimplemented, and sold for cheaper?
|
| We already kind of see this today with OpenAI's canvas and Claude
| artifacts. Perhaps they'll even start moving into Palantir's
| space and start having direct customer implementation teams.
|
| It is becoming increasingly obvious that LLMs are quickly
| becoming commoditized. Everyone is starting to approach the
| same limits in intelligence and is finding it hard to carve out
| margin from competitors.
|
| This was most recently exhibited by the backlash at Claude
| raising prices because their product is better. In any normal
| market this would be totally expected, but people seemed
| shocked that anyone would charge more than the raw cost of
| running the LLM itself.
|
| https://x.com/ArtificialAnlys/status/1853598554570555614
| quantum_state wrote:
| Hope this serves as a constant reminder that brute force can
| only get one so far, though it may still be useful where it
| works. With lots of intuition gained, it's time to ponder
| things a bit more deeply.
| dmafreezone wrote:
| Maybe, if you want to relearn the bitter lesson.
|
| http://www.incompleteideas.net/IncIdeas/BitterLesson.html
| cryptica wrote:
| It's interesting the way things turned out so far with LLMs,
| especially from the perspective of a software engineer. We are
| trained to keep a certain skepticism when we see software which
| appears to be working because, ultimately, the only question we
| care about is "Does it meet user requirements?" and this is
| usually framed in terms of users achieving certain goals.
|
| So it's interesting that when AI came along, we threw caution to
| the wind and started treating it like a silver bullet... Without
| asking the question of whether it was applicable to this goal or
| that goal...
|
| I don't think anyone could have anticipated that we could have an
| AI which could produce perfect sentences, faster than a human,
| better than a human but which could not reason. It appears to
| reason very well, better than most people, yet it doesn't
| actually reason. You only notice this once you ask it to
| accomplish a task. After a while, you can feel how it lacks
| willpower. It puts into perspective the importance of willpower
| when it comes to getting things done.
|
| In any case, LLMs bring us closer to understanding some big
| philosophical questions surrounding intelligence and
| consciousness.
| k__ wrote:
| But AGI is always right around the corner?
|
| I don't get it...
| sssilver wrote:
| One thing that makes the established AIs less ideal for my
| (programming) use-case is that the technologies I use quickly
| evolve past whatever the published models "learn".
|
| On the other hand, a lot of these frameworks and languages have
| relatively decent and detailed documentation.
|
| Perhaps this is a naive question, but why can't I as a user just
| purchase "AI software" that comes with a large pre-trained model
| to which I can say, on my own machine, "go read this
| documentation and help me write this app in this next version of
| Leptos", and it would augment its existing model with this new
| "knowledge".
___________________________________________________________________
(page generated 2024-11-14 23:00 UTC)