[HN Gopher] I disagree with Geoff Hinton regarding "glorified au...
___________________________________________________________________
I disagree with Geoff Hinton regarding "glorified autocomplete"
Author : magoghm
Score : 122 points
Date : 2023-11-18 15:39 UTC (7 hours ago)
(HTM) web link (statmodeling.stat.columbia.edu)
(TXT) w3m dump (statmodeling.stat.columbia.edu)
| 11235813213455 wrote:
| I still want to read a ChatGPT text that doesn't look stupidly
| verbose
| chpatrick wrote:
| Why don't you ask it not to be?
| coffeebeqn wrote:
| The default style really annoys me with its verbosity and
| overuse of numbered lists. Describing a tone and length of
| responses and anything else you want - think step by step,
| first principles, show don't tell - all work pretty well.
|
| You can even ask it to only reply with a Boolean true or
| false - which certainly doesn't play to its strengths
| satuke wrote:
| Actually, that's a by-product of RLHF. A base model is
| usually not that verbose.
| JoshTko wrote:
| Use custom instructions. I use "Answers should be very concise
| by default. Never provide disclosures. Answers should get to
| the point immediately." and get super concise answers. If I
| want more detail I just follow up with can you expand on X.
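|
| (A minimal sketch of the same idea done through the API rather
| than the Custom Instructions UI, assuming the openai Python
| client; the model name and prompt text here are just
| placeholders:)
|
| from openai import OpenAI
|
| client = OpenAI()  # assumes OPENAI_API_KEY is set
| resp = client.chat.completions.create(
|     model="gpt-4",  # placeholder model name
|     messages=[
|         {"role": "system",
|          "content": "Answers should be very concise by default. "
|                     "Never provide disclosures. Get to the point "
|                     "immediately."},
|         {"role": "user", "content": "Can you expand on X?"},
|     ],
| )
| print(resp.choices[0].message.content)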
| xcv123 wrote:
| Follow up with the prompt "rewrite that so that it doesn't look
| stupidly verbose" and it will do exactly as requested.
| WillPostForFood wrote:
| Give this GPT called Briefly a try. Responses come back as
| bullet point lists. I prefer more verbose responses, but it's
| interesting to see this in contrast.
|
| https://chat.openai.com/g/g-LNsEQH5rz-briefly
| mbrumlow wrote:
| > If you want to be really good, you have to understand what's
| being said. That's the only way.
|
| This is simply not true. Consider predicting the next letter or
| word, or, if you abstract it away from things that mean something
| to you, the next color of a block in a long chain of colored
| blocks. You would realize that all we are doing is using
| statistics to predict what the next item might be. There simply
| is no need or requirement for the system doing the prediction to
| understand.
| albertzeyer wrote:
| > You would realize that all we are doing is using statistics
| to predict what the next item might be.
|
| So what does "understanding" really mean, then?
|
| "Understanding" is not really well defined. Either we (humans)
| do it, but then LLMs might just do it as well, depending on the
| definition of "understanding", or we both don't do it. But if
| no-one is really "understanding" anything, then this definition
| of "understanding" is maybe not useful.
| dboreham wrote:
| Bingo
| Joeri wrote:
| Nobody actually understands how humans actually understand
| something, just like nobody actually understands how LLMs do
| what they do.
|
| Everybody opining about it is doing just that: offering an
| opinion. Geoff Hinton's opinion is worth more than someone
| else's, but it is still an opinion.
| im_down_w_otp wrote:
| I don't know about the human part, but we absolutely
| understand how LLMs do what they do. They're not magic.
| og_kalu wrote:
| No we don't. No it's not "magic". No we don't understand
| what the black box is doing.
| brookst wrote:
| For some values of "we"
| og_kalu wrote:
| For every value of we. "I understand the internals of
| GPT" is the fastest way to demonstrate you have no idea
| what you're talking about.
| chpatrick wrote:
| We understand the architecture but we don't understand
| the weights.
| usrbinbash wrote:
| We also understand, down to a very very very microscopic
| level, how neurons work. We also know a helluva lot about
| the architecture of the brain. Does that mean we can
| explain our own intelligence, how our minds actually
| work? Nope.
| mannykannot wrote:
| I feel that LLMs raise some very interesting challenges for
| anyone trying to figure out what it means to understand
| something and how we do it, but I am not yet ready to agree
| with Hinton.
|
| For example, we are aware that some, but by no means all, of
| what people say is about an external world that may or may
| not conform to what the words say. We can also doubt that we
| have understood things correctly, and take steps to either
| confirm or refute our opinions. We see ourselves as entities
| in an external reality containing other individuals who also
| do this, and we recognize that we, and they, have only a
| limited ability to influence what happens in that world. Do
| LLMs do these
| things, or is what they produce a result of having a lot of
| information about the purely formal properties of human
| language use, independently of semantics?
| hackinthebochs wrote:
| >I feel that LLMs raise some very interesting challenges
| for anyone trying to figure out what it means to understand
| something and how we do it, but I am not yet ready to agree
| with Hinton.
|
| Agreed. What LLMs say about understanding deserves a lot
| more attention than it has received. I wrote down some of
| my thoughts on the matter:
|
| https://www.reddit.com/r/naturalism/comments/1236vzf
|
| >Do LLMs do these things, or is what they produce a result
| of having a lot of information about the purely formal
| properties of human language use, independently of
| semantics?
|
| These two points aren't necessarily in opposition, and
| understanding why is I think key to solving a lot of
| important problems around intelligence, sentience, etc. To
| compute is to operate on formal properties. But this
| doesn't exclude semantic properties from having causal
| relevance to the behavior of the system. What we need is a
| way to conceptualize how a system can have multiple related
| descriptions at different levels. A description at the
| level of semantics doesn't exclude a description in terms
| of formal properties or vice versa. I think of it in terms
| of constraints: the higher level descriptions constrain the
| lower level behavior. What the computational description
| does is ensure the higher level semantic constraint is
| maintained. It does this through the particular space of
| computational dynamics it follows. Essentially, the
| information that picks out this program's space of branching
| dynamics embeds the semantic description in question, and
| this description realizes the computational dynamic
| necessary to maintain the higher level semantic constraint.
| Rather than semantics being in opposition to formal
| properties, they are two sides of the same coin.
| satuke wrote:
| Isn't that exactly how humans learn to respond to stimuli?
| Don't we just try to predict the best next response to
| everything? Yes, it's statistics, but the fun part is nobody is
| writing this statistical function by hand.
| civilized wrote:
| LLMs don't have a concept of "best". Only most likely in what
| they've been trained on.
|
| I think LLMs ultimately just take imitation to a creative and
| sophisticated extreme. And imitation simply doesn't comprise
| the whole of human intelligence at all, no matter how much it
| is scaled up.
|
| The sophistication of the imitation has some people confused
| and questioning whether everything can be reduced to
| imitation. It can't.
|
| The ability to imitate seeking a goal isn't identical to the
| ability to seek a goal.
|
| The ability to imitate solving a problem isn't identical to
| the ability to solve a problem.
|
| Imitation is very useful, and the reduction of everything to
| imitation is an intriguing possibility to consider, but it's
| ultimately just wrong.
| lordnacho wrote:
| Ok, so now we need an example that separates humans from
| LLMs?
|
| I struggle to think of one, maybe someone on HN has a good
| example.
|
| Eg if I'm in middle school and learning quadratic
| equations, am I imitating solving the problem by plugging
| in the coefficients? Or am I understanding it?
|
| Most of what I see coming out of chatGPT and copilot could
| be said to be either. If you're generous, it's
| understanding. If not, it's imitation.
| civilized wrote:
| It is very easy to separate humans from LLMs. Humans
| created math without being given all the answers
| beforehand. LLMs can't do that yet.
|
| When an LLM can create math to solve a problem, we will
| be much closer to AGI.
| abm53 wrote:
| You can ask ChatGPT to solve maths problems which are not
| in its training data, and it will answer an astonishing
| amount of them correctly.
|
| The fact that we have trained it on examples of human-
| produced maths texts (rather than through interacting
| with the world over several millennia) seems like more of
| an implementation detail and not a piece of evidence about
| whether it has "understood" or not.
| IshKebab wrote:
| _Some_ humans created maths. And it took thousands of
| years of thinking and interaction with the real world.
|
| Seems like goalpost moving to me.
|
| I think the _real_ things that separate LLMs from humans
| at the moment are:
|
| * Humans can do online learning. They have long term
| memory. I guess you could equate evolution to the
| training phase of AI but it still seems like they don't
| have quite the same on-line learning capabilities as us.
| This is what probably prevents them from doing things
| like inventing maths.
|
| * They seem to be incapable of saying "I don't know". Ok
| to be fair lots of humans struggle with this! I'm sure
| this will be solved fairly soon though.
|
| * They don't have a survival instinct that drives
| proactive action. Sure you can _tell_ them what to do but
| that doesn't seem quite the same.
| naveen99 wrote:
| Interestingly some humans will admit to not knowing but
| are allergic to admitting being wrong (and can get fairly
| vindictive if forced to admit being wrong).
|
| LLMs actually admit to being wrong easily, but they aren't
| great at introspection and confabulate too often. Also,
| their metacognition is still poor.
| IshKebab wrote:
| I guess LLMs don't have the social pressure to avoid
| admitting errors. And those sorts of interactions aren't
| common in text, so they don't learn them strongly.
|
| Also ChatGPT is trained specifically to be helpful and
| subservient.
| civilized wrote:
| About this goalpost moving thing. It's become very
| popular to say this, but I have no idea what it's
| supposed to mean. It's like a metaphor with no underlying
| reality.
|
| Did a wise arbiter of truth set up goalposts that I
| moved? I guess I didn't get the memo.
|
| If the implied claim is "GPT would invent math too given
| enough time", go ahead and make that claim.
| IshKebab wrote:
| > Did a wise arbiter of truth set up goalposts that I
| moved?
|
| Collectively, yes. The criticism of AI has always been
| "well it isn't AI because it can't do [thing just beyond
| its abilities].
|
| Maybe individually your goalpost hasn't moved, and as
| soon as it invents some maths you'll say "yep, it's
| intelligent" (though I strongly doubt it). But
| collectively the naysayers in general will find another
| reason why it's not _really_ intelligent. Not like us.
|
| It's very tedious.
| corethree wrote:
| You need to think deeper.
|
| There are levels of sophistication in "imitation". It
| follows a gradient. At the low end of this gradient is a
| bad imitation.
|
| At the high end of this gradient is a perfect imitation.
| Completely indistinguishable from what it's imitating.
|
| If an imitation is perfect, then is it really an imitation?
|
| If I progressively make my imitation more and more accurate
| am I progressively building an imitation or am I
| progressively building the real thing?
|
| See what's going on here? You fell for a play on words.
| It's a common trope. Sometimes language and vocabulary
| actually tricks the brain into thinking in a certain
| direction. This word "imitation" is clouding your thoughts.
|
| Think about it. A half built house can easily be called an
| imitation of a real house.
| canjobear wrote:
| > LLMs don't have a concept of "best". Only most likely in
| what they've been trained on.
|
| At temperature 0 they are effectively producing the token
| that maximizes a weighted sum of base LM probability and
| model reward.
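|
| (As a sketch, "temperature 0" just means taking the argmax of
| whatever per-token score the tuned model assigns; score() below
| is a hypothetical stand-in for that combined score, not any
| particular library's API:)
|
| def greedy_decode(score, context, max_tokens=50, stop="<eos>"):
|     # Temperature-0 decoding: at each step take the single
|     # highest-scoring next token, with no sampling at all.
|     out = list(context)
|     for _ in range(max_tokens):
|         scores = score(out)   # {token: score} for the next step
|         next_tok = max(scores, key=scores.get)
|         if next_tok == stop:
|             break
|         out.append(next_tok)
|     return out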
| genman wrote:
| I don't think humans in general have this concept of "best"
| either.
|
| But humans are able to build certain routines within their
| own system to help them to rationalize.
| RandomLensman wrote:
| Yes, if you are in the no free will school of thought, then
| that would be what humans do.
| usrbinbash wrote:
| > Isn't that exactly how humans learn to respond to stimuli?
|
| Maybe it is, maybe it isn't. Maybe we are "just" an
| incredibly powerful prediction engine. Or maybe we work from
| a completely different modus operandi, and our ability to
| predict things is an emergent capability of it.
|
| The thing is, no one actually knows what makes us
| intelligent, or even how to define intelligence for that
| matter.
| randyrand wrote:
| the human brain is just a statistical machine.
| brookst wrote:
| Odds are you're right
| continuational wrote:
| I think the most useful definition of understanding something
| is that you can explain it and use it in context.
|
| ChatGPT routinely does both.
| jstummbillig wrote:
| And while AI gets better and better and we will remain as
| touchy as ever about abstract concepts that make us oh so
| human, how about we say it just can't be understanding,
| unless a human does, eh, it.
| BobaFloutist wrote:
| Someone sufficiently fast and skilled at googling can explain
| and use in context a lot of things that they don't really
| properly understand.
|
| So unless you're saying that the _composite_ of the googler
| and of google understand something that neither does
| individually, your definition has some holes.
| continuational wrote:
| This is a variation of the Chinese room argument.
|
| If you consider understanding an observable property, then
| the Chinese room in aggregate displays understanding of
| Chinese.
|
| Would you say that humans understand nothing, because atoms
| don't understand anything, and we're made up of atoms?
| BobaFloutist wrote:
| I would say that there is a stronger consensus that a
| human being can be reasonably described as a single
| entity than a human being using a reference resource.
|
| A more apt comparison to my mind would be if a human
| being can be described as personally exerting strong
| nuclear force, just because their subatomic particles do,
| which I would happily answer "no."
| skepticATX wrote:
| How about this: understanding is the ability to generalize
| knowledge and apply it to novel scenarios.
|
| This definition is something that humans, and animals for
| that matter, do every day - both in small and large ways. And
| this is something that current language models aren't very
| good at.
| continuational wrote:
| What is the test for this?
|
| I taught it Firefly, which is an undocumented programming
| language I'm working on, through conversation.
|
| I find it's a lot quicker than any human at picking up
| syntax and semantics, both in real time and in number of
| messages, and makes pretty good attempts at writing code in
| it, as much as you could expect from a human programmer.
|
| That is, until you run out of context - is this what you
| mean?
| skepticATX wrote:
| There are plenty of results supporting my assertion; but
| the tests must be carefully designed. Of course, LLMs are
| not databases that store exact answers - so it's not
| enough to ask it something that it hasn't seen, if it's
| seen something similar (as is likely the case with your
| programming language).
|
| One benchmark that I track closely is ConceptARC, which
| aims to test generalization and abstraction capabilities.
|
| Here is a very recent result that uses the benchmark:
| https://arxiv.org/abs/2311.09247. Humans correctly solved
| 91% of the problems, GPT-4 solved 33%, and GPT-4V did
| much worse than GPT-4.
| continuational wrote:
| I wouldn't be surprised if GPT-4 is not too good at
| visual patterns, given that it's trained on text.
|
| Look at the actual prompt in figure 2. I doubt humans
| would get a 91% score on that.
| stubybubs wrote:
| I gave it the three lightbulbs in a closet riddle.
|
| https://puzzles.nigelcoldwell.co.uk/seven.htm
|
| The key complication is "once you've opened the door, you
| may no longer touch a switch." It gets this. There are
| many examples of it written out on the web. But when I give
| it a variation and say "you can open the door to look at
| the bulbs and use the switches all you want", it is
| absolutely unable to understand this. To a human it's
| simple: look at the bulbs and flick the switches. It kept
| giving me answers about using a special lens to examine
| the bulbs, using something to detect heat. I explained it
| in many ways and tried several times. I was paying for
| GPT-4 at the time as well.
|
| I would not consider this thinking. It's unable to make
| this simple abstraction from its training data. I think 4
| looks better than 3 simply because it's got more data,
| but we're reaching diminishing returns on that, as has
| been stated.
| Kiro wrote:
| How do you define understanding when predicting the next color?
| karaterobot wrote:
| > You would realize that all we are doing is using statistics
| to predict what the next item might be.
|
| I agree that Hinton's original quote doesn't make sense to me
| either. I suspect he would leverage the phrase "really good" to
| explain the difference between ChatGPT and, say, a Markov
| chain. I think that's a little disingenuous, if that's how he
| means it, but I don't know if I'm right about that.
|
| But I also do not agree that humans use statistics to predict
| what the next item in a series might be. As evidence, there is
| the classic example of asking people to predict the next coin
| toss in the series: "heads, heads, heads, heads, heads, heads,
| heads...". They'll either guess heads because it's come up so
| many times already, or because they assume the coin isn't fair,
| or tails because it's "overdue" to come up, but none of those
| are based on statistics _per se_.
| troelsSteegin wrote:
| From the article, Gelman's money quote is this: " So I'm not
| knocking auto-complete; I'm just disagreeing with Hinton's
| statement that "by training something to be really good at
| predicting the next word, you're actually forcing it to
| understand." As a person who does a lot of useful associative
| reasoning and also a bit of logical understanding, I think
| they're different, both in how they feel and also in what they
| do."
| hackerlight wrote:
| Why is "using statistics" mutually exclusive with
| "understanding"? It would help to carefully define terms. Note
| that "understanding" doesn't mean it's necessarily conscious.
|
| These systems learn high-level representations/abstractions of
| concepts we humans also use as part of our cognition. The
| concept of an object, an intuitive physics, the role of
| specific objects. I don't criticize him for using the word
| "understanding" to describe this.
| porknubbins wrote:
| As a translator I have come to think of it like this- (without
| the need for defining what "understanding" means) the ability
| to translate correctly depends on the size of your context. So
| LLMs and humans are both on the same spectrum, with different
| context sizes: the training set for an LLM, and all facts known
| in the universe for a theoretical perfect human.
|
| Doing moderately technical translations, state of the art LLMs
| may get it about 99% right but you find about 1 word per page
| where they would have needed to "zoom out" to a wider context
| to choose the word correctly. Often it's an important word that
| is missed.
| usrbinbash wrote:
| > You would realize that all we are doing is using statistics
| to predict what the next item might be.
|
| Two questions:
|
| 1. How can humans come up with novel ideas then? Statistically
| predicting the next item means I am limited to the training
| set. How do humans formulate new concepts in this scenario?
|
| 2. Why can humans learn from their own output then? If all we
| do is statistically predicting the next token, then, by
| necessity, feeding our own output back into the model (aka.
| letting humans learn from it) should degrade human capabilities
| by overfitting to the training set. But we observe exactly the
| opposite.
| micromacrofoot wrote:
| > How can humans come up with novel ideas then?
|
| hallucinations, emotions
|
| > Why can humans learn from their own output then?
|
| if we don't, we suffer
| jameshart wrote:
| > Statistically predicting the next item means I am limited
| to the training set
|
| Not at all. Extrapolation is perfectly possible in a purely
| predictive model. It's one of the things GPTs are _best_ at.
|
| In the stream of tokens output by an LLM it's completely
| possible for new concepts to emerge, and for it then to
| continue to use and build on them in the remainder of the
| stream. You see this simply executed in programming contexts
| with LLMs where it is able to declare a novel function and
| then use it.
| usrbinbash wrote:
| > with LLMs where it is able to declare a novel function
| and then use it.
|
| Novel as in "implements a new algorithm that has never been
| seen in any form and is actually an improvement over existing
| methodology"?
|
| Here is a little thought experiment: If all the training
| data in the set says that manned, powered flight is
| impossible, is a statistical prediction engine trained on
| that data capable of developing an airplane?
|
| In other words, could a very powerful stochastic parrot do
| what the Wright Brothers did?
|
| My best guess is that no, it cannot.
| lovecg wrote:
| Now I'm wondering, if birds didn't exist, how much longer it
| would take us to catch on that flight is possible at all.
| canjobear wrote:
| The best "statistic" you could compute to predict the next
| token would be the meaning of the prompt. A predictor computing
| meaning would outperform one that does not.
| xcv123 wrote:
| By "understanding" he means it builds a model of higher order
| abstractions over the text in order to model the meaning and
| context of each token in relation to other tokens. This is more
| sophisticated than a Markov process.
| Jensson wrote:
| > This is more sophisticated than a Markov process.
|
| Nothing prevents a Markov process from having a world model.
| A Markov process lacks hidden state and generates a sequence
| based only on statistics of the previous entries, so LLMs as
| they are now are Markov processes.
|
| Many of the dumb behaviors we see from LLMs today come from
| their lack of internal state between tokens: the model doesn't
| remember what reason it had for generating the previous token,
| and that means it can easily generate inconsistent answers.
| So LLMs being a Markov process is an important point to
| highlight, since it makes their thinking very different from
| how humans think.
| xcv123 wrote:
| The transformer architecture is NOT a Markov process, by
| mathematical definition of a Markov process. This is not
| even debatable. It's a mathematical fact.
|
| > Many of the dumb behaviors we see from LLMs today comes
| from their lack of internal state between tokens, so it
| don't remember what reason it had for generating the
| previous token and that means it can easily generate
| inconsistent answers
|
| The attention mechanism in the transformer architecture
| models relations between tokens within the context window,
| and does the exact opposite of what you are describing
| here. This is one aspect of LLMs that violates the Markov
| property.
| Jensson wrote:
| > The transformer architecture is NOT a Markov process,
| by mathematical definition of a Markov process. This is
| not even debatable. It's a mathematical fact.
|
| What, yes it is.
|
| > The attention mechanism in the transformer architecture
| models relations between tokens within the context
| window, and does the exact opposite of what you are
| describing here. This is one aspect of LLMs that violates
| the Markov property.
|
| The context window is finite, so that is the previous
| step. You know the dumb Markov chains that are based on
| bags of words? They also look several words back, they
| don't just go based on a single word. LLMs are just that
| but with a way larger lookback and some extra logic in
| there, and none of that changes the fundamental parts in
| a way that makes it not a Markov process.
|
| With a large enough context size you could argue it is
| now fundamentally different in practice, but in theory it
| is the same. There is no "hidden state"; it's just that
| the previous n words define the next word.
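|
| (A toy sketch of that framing: if the "state" is nothing more
| than the last n tokens of the window, each step depends on
| nothing else. next_token_distribution() is a hypothetical
| stand-in for the model:)
|
| import random
|
| def markov_step(next_token_distribution, window, n):
|     # The entire "state" is the last n tokens; there is no
|     # hidden memory carried over from earlier steps.
|     state = tuple(window[-n:])
|     dist = next_token_distribution(state)  # {token: probability}
|     tokens, probs = zip(*dist.items())
|     return random.choices(tokens, weights=probs)[0]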
| xcv123 wrote:
| If you define the state broadly enough to include the
| state of the entire machine itself, including all of its
| internal representations, weights, activations, etc, then
| you are playing a funny trick here.
|
| By the same reasoning, a human brain is also a Markov
| process.
|
| What you are doing here is a vast oversimplification and
| it is practically useless for understanding how LLMs
| work.
| tayo42 wrote:
| This kind of thing was on my mind recently. I was wondering, when
| I speak, am I just spitting out the next word that makes sense, or
| is there more to it?
|
| I think there is for people: I can think before I speak, and I can
| plan out my thought entirely before turning it into words. Each
| invocation of the chat bot/LLM is a new set of probabilities. I
| can plan what my 2nd token output will be and stick to it. LLMs
| don't have the ability to think past the first.
|
| To me that is autocompleting, just done better, unless models
| really do have a sense of planning ahead. But I don't think the
| really complex game-playing AIs do that either.
| pixl97 wrote:
| >I can think before I speak, I can plan out my thought entirely
| before turning it into words.
|
| In theory models can/could do the same thing. Think of the
| current text output of a model as those thoughts inside your
| head: you have an internal scratch space where you work on ideas,
| then after you perform 'chain of thought' on them, you output
| to the world. What you're outputting to the world isn't your
| direct thoughts, but more of a calculation of what you wanted
| to achieve.
|
| Really, for LLMs you just need to have the model put its output
| into an internal buffer, read that buffer and make sure it makes
| sense, then output that to the end user.
|
| It's probably not fast or compute efficient to do that at this
| time, but at some point it likely will be.
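|
| (A rough sketch of that loop; llm() is a hypothetical helper
| that would wrap an API call in practice:)
|
| def answer_with_scratchpad(llm, question):
|     # Draft privately, check the draft, and only show the user
|     # the revised version -- the "internal buffer" idea above.
|     draft = llm("Think step by step and draft an answer:\n"
|                 + question)
|     final = llm("Check this draft for mistakes, fix them, and "
|                 "restate only the final answer.\n"
|                 "Question: " + question + "\nDraft: " + draft)
|     return final  # only this leaves the scratch space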
| drwiggly wrote:
| >Really, for LLMs you just need to have the model put its
| output into an internal buffer, read that buffer and make sure
| it makes sense, then output that to the end user.
|
| Makes sense to what? The LLM doesn't have a goal, other than
| to spew text that looks like it should be there.
| stevenhuang wrote:
| The analogy lies in the fact that, much like evolution
| through natural selection, deliberate intelligence/ability
| of organisms to comprehend reality is not the objective,
| but something else entirely is.
|
| For evolution, it's fitness. For LLMs, it's the next token.
|
| Yet despite that, the ability to reason emerges as a means
| to an end.
| pixl97 wrote:
| To the terminal or instrumental goal of the statement it is
| working on.
|
| Question to LLM, "I have one hundred and eleven eggs in the
| store and another two hundred and twenty two are showing up
| in an hour, how many eggs will I have in total"
|
| Internal response: "this looks like a math problem that
| requires addition. The answer is 333. Use a calculator to
| validate 111 + 222. (Send 111+222, receive 333.) Tool
| returns 333, validating the previous response."
|
| External response: "The answer is 333"
|
| This chain of logic is internally consistent, hence makes
| sense.
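|
| (A minimal sketch of that internal chain, with the "calculator
| tool" being an ordinary function the model's guess is routed
| through; all names here are hypothetical:)
|
| def calculator(expression):
|     # The external tool: evaluates plain arithmetic like "111+222".
|     return eval(expression, {"__builtins__": {}})  # toy only
|
| def answer_egg_question():
|     proposed = 111 + 222              # the model's internal guess
|     checked = calculator("111+222")   # tool call to validate it
|     assert checked == proposed        # tool confirms the guess
|     return "The answer is " + str(checked)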
| tayo42 wrote:
| Yeah thinking about it more, it does seem possible.
|
| Like the attention masking done in transformers: I get why,
| but I also never understood why you wouldn't want attention
| to happen, at least over concepts and abstract ideas about
| what is coming up later in the phrase, before predicting a
| word.
| TerrifiedMouse wrote:
| > when I speak am I just spitting out the next word that makes
| sense or is there more to it.
|
| There is more to it. Specifically you are doing so to advance
| towards a specific goal. LLMs don't have goals. They just pick
| from a list of likely tokens - based on their training data -
| at random to generate the next token.
| brookst wrote:
| So the externally provided metaprompt doesn't establish a
| goal that generation works toward?
| TerrifiedMouse wrote:
| I don't think LLMs work towards anything. They just pick
| from a list of likely "next tokens" at random.
| brookst wrote:
| "Likely" as defined by...?
| TerrifiedMouse wrote:
| Statistics as defined by training data.
|
| https://writings.stephenwolfram.com/2023/02/what-is-
| chatgpt-...
| nkingsy wrote:
| I think people get tricked by the forward pass mechanics into
| thinking a single generation is comparable to a human thought
| process.
|
| I think we have an LLM-like mechanism we can employ and lean on
| as heavily as we like, but we also have an executive function,
| like thousands of specialized instant Boolean checks, which can
| adjust and redirect the big talky model's context on the fly.
|
| My hunch is it's turtles all the way down. "Gut feelings" are
| hyper-optimized ASICs with tiny parameter counts, but all using
| the same intelligence mechanisms.
|
| Extrapolating from that hunch, we are already witnessing AGI,
| and in fact we've started at the top.
|
| I believe that current LLMs are actually far, far superior to a
| human language center and current architectures are more than
| sufficient in terms of raw intelligence.
|
| The challenge that remains is to understand, train, integrate,
| and orchestrate all the different flavors of intelligence that
| humans wield so elegantly and naturally as to make them opaque
| to our own understanding.
| overactor wrote:
| I think this misses the point made by Hinton. The point is that,
| if a model is to reach a certain level of autocomplete, it needs
| to understand what it is talking about. Imagine that the result
| of something that required deep thought from you is included in
| the training data, along with billions of other such exchanges.
| At some point
| the only way to improve training score is to develop the ability
| to "understand" things. (That is, if the model isn't big enough
| to memorize the training data.)
|
| Universal autocomplete implies understanding, and the only
| remaining question is whether our current architectures and
| model sizes are able to achieve any level of it.
| hopfenspergerj wrote:
| I agree. We should prompt the model with the statement of the
| Riemann hypothesis. If the autocomplete is good, the model will
| output a proof.
| overactor wrote:
| No human can solve the Riemann Hypothesis. Why do you expect
| an AI to do it before you consider it able to understand
| things?
| scarmig wrote:
| The key bit is constructing a hard, novel proof. The fact
| that AI doesn't (yet) do this isn't evidence that it
| doesn't reason, but if it did so it would be strong
| evidence that it does reason.
|
| (I also take the pessimistic point of view that most humans
| don't reason, so YMMV.)
| lordnacho wrote:
| OTOH, doing something that only humans have done thus far
| would be a huge step in demonstrating understanding.
|
| Does that mean when a computer outputs a new proof it
| understands?
| corethree wrote:
| You can do something similar to this without giving it a
| problem that might be impossible.
|
| Train the LLM on a bunch of things but avoid certain
| things... things that humans already know about.
|
| Then you query the model about that thing. See if the
| model can come to the same conclusions humans do. You can
| actually do this right now with ChatGPT.
| lordnacho wrote:
| So what happens?
| corethree wrote:
| This is a response from chatGPT. I asked it to come up
| with something novel and creative. I went through several
| iterations and eventually it started doing math stuff and
| I had it focus on that:
|
| Understood, let's propose a distinct mathematical concept
| within the realm of pure mathematics: "Infinite-
| Dimensional Fractal Algebra." This new branch of algebra
| focuses on operations, equations, and functions within an
| infinitely dimensional fractal space.
|
| In Infinite-Dimensional Fractal Algebra, traditional
| algebraic operations like addition, multiplication, and
| exponentiation are redefined and extended to work within
| an infinite fractal structure. Each point in this fractal
| space represents a unique algebraic entity, and the
| relationships between these points follow complex, self-
| repeating patterns characteristic of fractals.
|
| The concept challenges and expands the traditional
| boundaries of algebra by integrating the self-similarity
| and scale invariance of fractals into algebraic
| operations. For example, solving an equation in this
| system would involve navigating through the infinitely
| repeating patterns of the fractal, where each level of
| the fractal represents a different dimension or layer of
| the problem.
|
| This approach could offer new insights into complex
| systems and phenomena that exhibit fractal-like behavior,
| and potentially lead to breakthroughs in areas like chaos
| theory, dynamic systems, and perhaps even in
| understanding the fabric of the universe itself.
|
| To illustrate the concept of Infinite-Dimensional Fractal
| Algebra, let's consider a simple algebraic operation:
| addition.
|
| In traditional algebra, adding two numbers is
| straightforward: 3 + 4 = 7. In Infinite-Dimensional
| Fractal Algebra, however, this operation takes place in a
| fractal space with infinite dimensions.
|
| 1. *Fractal Representation*: Instead of representing
| numbers as points on a line, each number is represented
| by a unique fractal structure. The fractal for '3' and
| the fractal for '4' each have infinitely complex, self-
| similar patterns that vary across infinite dimensions.
|
| 2. *Adding Fractals*: The operation of addition involves
| combining these fractal structures. This isn't a simple
| sum but a fusion of their infinite patterns. The
| resulting fractal for '7' contains elements of both the
| '3' and '4' fractals, but in a way that the patterns and
| complexities of each are preserved and merged.
|
| 3. *Infinite Dimensions*: Each level of depth in the
| fractals represents a different dimension of the problem.
| As you dive deeper into the fractals, you encounter more
| intricate patterns, representing more complex layers of
| the addition.
|
| 4. *Result Interpretation*: The outcome of the operation
| is a new fractal representing '7'. But this '7' is not
| just a number; it's a multi-dimensional entity with a
| pattern that encodes not just the value '7', but also the
| history of how it was derived from '3' and '4'.
|
| This approach would require a rethinking of how we
| visualize and understand mathematical operations. The
| fractal patterns in Infinite-Dimensional Fractal Algebra
| carry more information than just the numerical value;
| they encode the relationships and interactions between
| numbers in a deeply interconnected, infinite-dimensional
| space.
|
| Let's choose the Sierpinski Triangle, a well-known
| fractal, to represent the identity element for addition
| in Infinite-Dimensional Fractal Algebra. In traditional
| algebra, the identity element for addition is '0', as
| adding zero to any number doesn't change its value. Here,
| we'll conceptualize this using the Sierpinski Triangle.
|
| 1. *Sierpinski Triangle as the Identity*: In this
| framework, the Sierpinski Triangle represents the '0'.
| It's a fractal with the property of self-similarity: any
| part of the triangle resembles the whole.
|
| 2. *Adding to the Identity*: When another number
| (fractal) is added to this Sierpinski Triangle, the
| process involves integrating the fractal patterns of this
| number with the Sierpinski Triangle. However, since the
| Sierpinski Triangle represents '0', this integration
| process must preserve the original number's fractal
| structure intact.
|
| 3. *Concrete Example*: Suppose we're adding the number
| '2', represented by a different fractal structure, to '0'
| (the Sierpinski Triangle). The addition operation would
| overlay the '2' fractal onto the Sierpinski Triangle. The
| key here is that the Sierpinski Triangle, with its
| repetitive and self-similar gaps, would not alter the
| structure of the '2' fractal. Instead, it would fill into
| the gaps of the Sierpinski Triangle, maintaining its own
| pattern.
|
| 4. *Resulting Fractal*: The resulting fractal from this
| addition is the '2' fractal itself, as the Sierpinski
| Triangle (being the identity element) does not alter the
| other number in the addition. The intricate details of
| the '2' fractal remain unchanged, signifying that adding
| '0' did not change its value.
|
| This is a very abstract and conceptual way to represent
| mathematical operations, and the choice of the Sierpinski
| Triangle for the identity is primarily for its self-
| similar and recursive properties, which align well with
| the concepts in fractal algebra.
| corethree wrote:
| Basically it formed a fuzzy idea of an algebra using
| different fractals as entities.
|
| I'm sure this can be mapped out further into very
| concrete detail. It's a highly realistic idea. We have
| algebras for all kinds of things, from complex numbers to
| lists.
|
| Choosing the triangle as the identity is probably wrong,
| though. The identity fractal should be zero-dimensional, or
| nothing at all. I think that is what would actually end up
| fitting the rules of the identity fractal if we ever decided
| to map out this algebra.
|
| If you're not familiar with abstract algebra: basically,
| it's choosing some fractal that's equivalent to a zero
| value and coming up with ways to combine fractals using
| operations that have the same properties of associativity
| and commutativity that multiplication/addition have for
| numbers.
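|
| (For anyone who wants those laws spelled out, here is a toy
| check in Python, with finite sets of points standing in for
| "fractals" and union as the combining operation -- nothing to
| do with real fractal geometry, just the algebraic shape being
| described:)
|
| def combine(a, b):
|     return a | b             # toy "addition" of point sets
|
| identity = frozenset()       # the "nothing" fractal
| a = frozenset({(0, 0), (1, 1)})
| b = frozenset({(2, 2)})
| c = frozenset({(3, 3), (4, 4)})
|
| assert combine(a, identity) == a          # identity law
| assert combine(a, b) == combine(b, a)     # commutativity
| assert combine(combine(a, b), c) == \
|        combine(a, combine(b, c))          # associativity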
| dboreham wrote:
| > it needs to understand what it is talking about
|
| "to understand" is a concept invented by the human brain to
| help it operate. It doesn't actually mean anything concrete in
| the physical world. See: the Chinese room argument
| (paradoxically it proves the converse of what Searle was
| arguing -- that there is no such thing as understanding,
| because you can emulate a system with identical outputs from
| the same set of inputs without caring or labeling what happens
| inside the room).
| scudsworth wrote:
| that's, uh, not the conclusion of the chinese room argument
| ImHereToVote wrote:
| The Chinese room doesn't have a model; it has an infinite
| dictionary. A sufficiently large LLM isn't a dictionary, it
| has to contain a model of some sort.
| attilakun wrote:
| I agree, Hinton's point is more about next-token-prediction
| being a good training objective for understanding, and not
| whether LLMs reproduce every aspect of a human's thought
| processes. They clearly don't (at least not at present scale),
| otherwise we'd have AGI now!
| zeroonetwothree wrote:
| We have quantum mechanics, a model of the universe that makes
| predictions with absurd levels of accuracy. Yet we don't really
| understand "why" it works---I don't think we can claim that we
| really "understand" the universe.
|
| This disproves the claim that making accurate predictions implies
| understanding. Or perhaps a better way to put it is that the
| model has some understanding but a model doesn't necessarily
| reflect underlying truth.
| raincom wrote:
| Best examples come from medicine. There are many drugs that
| work, but we don't understand how they work. For example:
| https://en.wikipedia.org/wiki/Category:Drugs_with_unknown_me...
| marginalia_nu wrote:
| What are you even asking when you're asking why it works?
| Ultimately, it works the way it does because that's the way
| things are. I'm not sure what sort of answers you expect when
| asking why physics acts in a particular fashion.
|
| Also here's Feynman ranting about why questions:
| https://www.youtube.com/watch?v=36GT2zI8lVA
| firebaze wrote:
| Asking "why" may lead to really new physics, though. I assume
| the reason for discovering the theory of relativity was
| asking "Why is the speed of light constant in all
| directions".
| marginalia_nu wrote:
| My point is, why isn't really a question that makes sense.
|
| There isn't a "why" the speed of light is constant and
| isotropic. The answers you typically see for that question
| is just different ways of saying the same thing; there is
| no causal relationship.
|
| In the end, it just is. There are things that must be true
| if this is the case, and things that can't be true if this
| is the case.
| overactor wrote:
| Making accurate predictions in a specific domain doesn't imply
| understanding, sure. Universal autocomplete can make accurate
| predictions in all domains. Apples to oranges.
| xcv123 wrote:
| By "understanding" he simply means it builds a model of
| higher order abstractions over the text in order to model the
| meaning and context of each token in relation to other
| tokens. This is more sophisticated than a Markov process.
| zeroonetwothree wrote:
| I found this post to be overly long and unclear in its focus.
| Also who is this person that only serves as an autocomplete in
| meetings? Maybe don't go to those meetings then? In my meetings
| I'm mainly having to decide what our approach will be in some
| ambiguous problem space.
| coldtea wrote:
| > _In my meetings I'm mainly having to decide what our approach
| will be in some ambiguous problem space._
|
| Maybe that's because you are just some manager or exec in some
| company, and not a professor who has to transfer his
| knowledge, describe, and explain things for others in his
| meetings?
|
| The post has a very specific subject it discusses. It's also
| not required reading, if someone finds it long, they can skip
| it.
| patcon wrote:
| Yeah, I agree there are two types:
|
| 1) repeating things he's reflected on or seen in the past (more
| like autocomplete), or
|
| 2) mixing two semantic concepts that he's seen in the past, and
| using that as a bridge to walk over to get to the next
| word/concept (which is back to autocomplete mode)
|
| The second is like crafting in Minecraft, instead of just taking
| things out of inventory (but all of that is happening in high
| dimensional semantic space :) )
| cmdli wrote:
| The argument "a sufficiently capable autocomplete must contain a
| level of general intelligence" is correct but also not very
| useful. It is a lot like saying "a sufficiently fast horse can
| fly".
|
| It is technically correct that when you take things to the
| extreme you can accomplish great things, but we may not reach
| those levels. We may require completely different technology to
| reach those levels of autocomplete, and we have simply reached a
| new plateau at this point in time.
| morkalork wrote:
| Just need a jet engine strapped to your horse for that
| wyago wrote:
| Funnily enough, "a sufficiently fast horse can fly" sounds sort
| of like a decent way to convey the idea of planes to a pre-car
| world.
| og_kalu wrote:
| The argument is simpler than that. Prediction requires a model,
| completely accurate or not. There's a projection of the world
| in text. A model of the text data we feed it is a model of the
| world as humans see it. The trend of loss is more and more
| accurate models of the dataset. So it won't stop at any
| arbitrary competency level. Indeed, there are already a few
| abilities GPT possess that are deemed Super Human. It's not a
| distinction that matters to the machine. It's all just data to
| be modelled.
|
| We have reached those levels lol. That's why we're having this
| argument.
| cmdli wrote:
| I think the trouble is that "model" is a very general term.
| If you had a computer doing simulations of artillery shots
| back in the 50s, then it would have a "model" of the world in
| terms of variables tracking projectiles, but this model
| doesn't generalize to anything else. If a computer does image
| recognition from the 90s and 2000s to recognize faces, then
| the computer has a "model" of visual information in the
| world, but this model only lets it recognize faces.
|
| ChatGPT has a model of all the text information on the
| internet, but it remains to be seen what the hard limits of
| this model are. Does this model let it do logic or predict
| the future well, or will no amount of training give it those
| abilities? Simply being good in one task doesn't imply a
| general ability to do everything, or even most of everything.
| LLM's would simply be the last advancement in a field with a
| lot of similar advancements.
| og_kalu wrote:
| >ChatGPT has a model of all the text information on the
| internet, but it remains to be seen what the hard limits of
| this model are.
|
| Before training is complete and loss has bottomed out, there
| will be limits on what the "learned so far" model can do that
| say absolutely nothing about the limits of a perfect (or
| very close to it) model.
|
| It really looks like anything will converge with enough
| compute. I don't think architecture is particularly
| important except as a "how much compute will this one take?"
| question.
|
| https://nonint.com/2023/06/10/the-it-in-ai-models-is-the-
| dat...
|
| >Does this model let it do logic or predict the future
| well, or will no amount of training give it those
| abilities?
|
| There's nothing special about logic. Basically, any
| sequence is fair game. It literally does not matter to the
| machine.
|
| Boolformer: Symbolic Regression of Logic Functions with
| Transformers(https://arxiv.org/abs/2309.12207)
|
| That said, GPT-4 can already do logic. It's not perfect but
| if perfect logic were a requirement then humans couldn't do
| logic either.
|
| >Simply being good in one task doesn't imply a general
| ability to do everything, or even most of everything.
|
| It's not one task. It's one modality (text) that a plethora
| of tasks could be learned in.
|
| Coding and playing chess did not suddenly become a single
| task just because we found the common ground that allows a
| machine to learn both.
|
| The text, image, video and audio data we could feed a
| transformer will cover anything we care about.
| ijidak wrote:
| I've noticed that when I speak I really don't control each
| word.
|
| I have an idea that I want to convey, but how each word comes
| to my mind as I form a sentence has always felt like it's
| controlled by an unconscious algorithm.
|
| So I don't understand why people find this prediction
| mechanism so alien.
|
| It isn't clear to me how much of communication is really in
| our control.
|
| With the current tools, it feels like we still provide the
| ideas we want the AI to convey, and it may be using a nearly
| identical mechanism to us to form the words.
|
| Consciousness would be the computer being able to come up
| with the ideas.
|
| So, it seems to me we've gotten close enough on the
| communication side of intelligence.
|
| But the machine is not conscious. When it is, it seems like
| it will generate its own ideas.
|
| Are people debating whether the machine is conscious?
|
| Otherwise, it feels very straightforward to grasp what we've
| made up to now.
| gmuslera wrote:
| It's not black or white. Probably most of what we do is System 1;
| most of the time we are mostly meme machines, and so is a good
| part of the activity in some sectors.
|
| But we are able to take steps forward: intuition, hard step-by-
| step reasoning, finding connections between dots, etc. GPT can do
| some of that, and at some point along the road someone must decide
| if we have reached somewhere else. Even if making the full journey
| may or may not be possible in the foreseeable future.
| arketyp wrote:
| Indeed. What has caught researchers off guard is the way system
| 2 properties seem to appear as emergent phenomena in LLMs. This
| is also what has prompted people like Hinton and Sutskever to
| make this condensed point about statistical modelling and
| understanding (hypothetically) being simply a matter of a
| spectrum.
| Kiro wrote:
| Reminds me of this scene from Westworld (slightly NSFW):
| https://www.youtube.com/watch?v=ZnxJRYit44k
| tester457 wrote:
| Also spoilers
| canjobear wrote:
| The argument is: "deep thinking" isn't like autocomplete because
| it feels different.
| bananaflag wrote:
| Deep thinking is what autocomplete feels like from the inside.
| fritzo wrote:
| Andrew's distinction between associative vs logical thinking
| reminds me of two kinds of programming thinking. Half of my time
| while programming is spent churning out glue code, satisfying a
| type checker and test suite, and implementing the simple solution
| that turns out to work. The other half of my time is spent slowly
| thinking through why some simple solution is wrong, and step by
| step proceeding towards the correct but complex solution. The
| former phase is intuitive, the latter phase is scientific, where
| I hypothesize, test, and repeat.
|
| Reading through the code-as-transcript afterwards it's unclear
| which bits of code required shallow associative vs deep rational
| thinking, pure autocomplete vs latent chain of thought.
| merizian wrote:
| The fallacy being made in this argument is that computers need to
| perform tasks the same way as humans to achieve equal or better
| performance on them. While having better "system 2" abilities may
| improve performance, it's plausible that scaled-up next-token
| prediction along with a bit of scaffolding and finetuning could
| match human performance on the same diversity of tasks while
| doing them a completely different way.
|
| If I had to critique Hinton's claims, I would say his usage of
| the word "understand" can be vague and communicate assumptions
| because it's from an ontology used for reasoning about human
| reasoning, not this new alien form of reasoning which language
| models embody.
| BurningFrog wrote:
| > _The fallacy being made in this argument is that computers
| need to perform tasks the same way as humans to achieve equal
| or better performance_
|
| Especially since I don't think we know that much about how
| human intelligence actually works.
| edot wrote:
| I believe it was Feynman who said something to the effect of
| "airplanes do not fly like birds do, but they fly much faster
| and can carry much more". So yes, we do not need to exactly
| replicate how humans do things in order to do human-like things
| in a useful manner. Planes do not flap their wings, but the jet
| engine (which is completely unnatural) does a great job of
| making things fly when paired with fixed wings of a certain
| shape.
| metanonsense wrote:
| In addition to that, the "system 2" abilities might already be
| there with "epi" strategies like chain-of-thought prompting.
| Talking / writing to yourself might not be the most efficient
| way to think but at least I do it often enough when pondering a
| problem.
| francois14 wrote:
| And what about "stochastic parrot"?
| ffwd wrote:
| I think there is another aspect to human thinking other than
| system 1/system 2 though, which is the abstract world model
| humans have. system 1 / 2 is more like the process, while the
| world model is the actual data being 'processed'.
|
| And I think basically, humans have a much simplified, 'low
| dimensional' world model that consists of a set of objects (let's
| call them patterns), and then a "list" of essential properties
| that those objects have, that leads to a constraint on how each
| object can behave in the world model as a whole.
|
| And this is sort of hierarchical or at least, we can zoom in and
| out in detail depending on the level of knowledge we have about a
| particular pattern.
|
| So problem 1 is: it's not clear to me that text or any sort of
| data would contain all the necessary constraints so that any
| particular prompt would result in a world model that exactly
| takes into account the constraints of each object. And 2) even if
| it did, I'm not sure the process of step-by-step thinking
| (system 1/2) about each object and computing world states could
| occur in current architectures. This is especially important for
| computing a set of objects, then abstracting the result, then
| doing another round of computing with that result, or something
| like this.
|
| I'm not hard set on this but this is my current thinking.
| robbrown451 wrote:
| I agree with Hinton, although a lot hinges on your definition of
| "understand."
|
| I think to best wrap your head around this stuff, you should look
| to the commonalities of LLM's, image, generators, and even things
| like Alpha Zero and how it learned to play Go.
|
| Alpha Zero is kind of the extreme in terms of not imitating
| anything that humans have done. It learns to play the game simply
| by playing itself -- and what they found is that there isn't
| really a limit to how good it can get. There may be some
| theoretical limit of a "perfect" Go player, or maybe not, but it
| will continue to converge towards perfection by continuing to
| train. And it can go far beyond what the best human Go player can
| ever do. Even though very smart humans have spent their lifetimes
| deeply studying the game, and Alpha Zero had to learn everything
| from scratch.
|
| One other thing to take into consideration, is that to play the
| game of Go you can't just think of the next move. You have to
| think far forward in the game -- even though technically all it's
| doing is picking the next move, it is doing so using a model that
| has obviously looked forward more than just one move. And that
| model is obviously very sophisticated, and if you are going to
| say that it doesn't understand the game of Go, I would argue that
| you have a very oddly restricted definition of the word
| "understand", and one that isn't particularly useful.
|
| Likewise, with large language models, while on the surface, they
| may be just predicting the next word one after another, to do so
| effectively they have to be planning ahead. As Hinton says, there
| is no real limit to how sophisticated they can get. When
| training, it is never going to be 100% accurate in predicting
| text it hasn't trained on, but it can continue to get closer and
| closer to 100% the more it trains. And the closer it gets, the
| more sophisticated model it needs. In the sense that Alpha Zero
| needs to "understand" the game of Go to play effectively, the
| large language model needs to understand "the world" to get
| better at predicting.
| jon_richards wrote:
| > to play the game of Go you can't just think of the next move.
| You have to think far forward in the game -- even though
| technically all it's doing is picking the next move, it is
| doing so using a model that has obviously looked forward more
| than just one move.
|
| While I imagine alpha go does some brute force and some tree
| exploration, I think the main "intelligent" component of alpha
| go is the ability to recognize a "good" game state from a "bad"
| game state based on _that moment in time_ , not any future
| plans or possibilities. That pattern recognition is all it has
| once its planning algorithm has reached the leaves of the
| trees. Correct me if I'm wrong, but I doubt alpha go has a
| neural net evaluating an entire tree of moves all at once to
| discover meta strategies like "the opponent focusing on this
| area" or "the opponent feeling on the back foot."
|
| You can therefore imagine a pattern recognition algorithm _so
| good_ that it is able to pick a move by only looking 1 move
| into the future, based solely on local stone densities and
| structures. Just play wherever improves the board state the
| most. It does not even need to "understand" that a game is
| being played.
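|
| (In pseudo-Python, that one-move-lookahead player is just the
| following; evaluate() and board.after() are hypothetical
| stand-ins for a learned value function and the game rules:)
|
| def pick_move(board, legal_moves, evaluate):
|     # Greedy 1-ply player: no tree search, no notion of a
|     # "game", just "play wherever the evaluation of the
|     # resulting position is best".
|     best_move, best_value = None, float("-inf")
|     for move in legal_moves(board):
|         value = evaluate(board.after(move))  # static judgment
|         if value > best_value:
|             best_move, best_value = move, value
|     return best_move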
|
| > while on the surface, they may be just predicting the next
| word one after another, to do so effectively they have to be
| planning ahead.
|
| So I don't think this statement is necessarily true.
| "Understanding" is a major achievement, but I don't think it
| requires planning. A computer can understand that 2+2=4 or
| where to play in tic-tac-toe without any "planning".
|
| That said, there's probably not much special about the concept
| of planning either. If it's just simulating a tree of future
| possibilities and pruning it based on evaluation, then many
| algorithms have already achieved that.
| Jensson wrote:
| A big difference between a game like Go and writing text is
| that text is single player. I can write out the entire text,
| look at it and see where I made mistakes on the whole and edit
| those. I can't go back in a game of Go and change one of my
| moves that turned out to be a mistake.
|
| So trying to make an AI that solves the entire problem before
| writing the first letter will likely not produce good
| solutions, and it also wastes compute, since it has to solve
| the entire problem again for every token generated. That is
| the kind of AI we know how to train, so for now that is what we
| have to live with, but it isn't the kind of AI that would be
| efficient or smart.
| bytefactory wrote:
| This doesn't seem like a major difference, since LLMs are
| also just picking the most likely next token from a
| probability distribution, which is why they respond a token
| at a time. They can't "write out" the entire text at once,
| which is why fascinating methods like "think step by step"
| work at all.
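|
| A minimal sketch of that token-at-a-time loop (NumPy only; the
| "model" returning next-token logits is a hypothetical stand-in
| for a real LLM):
|
|     import numpy as np
|
|     def sample_next(logits, temperature=1.0, rng=None):
|         rng = rng or np.random.default_rng()
|         z = logits / temperature
|         p = np.exp(z - z.max())
|         p /= p.sum()               # softmax over the vocabulary
|         return int(rng.choice(len(p), p=p))
|
|     def generate(model, context, steps):
|         # each new token is appended and the model is queried
|         # again; there is no separate "whole answer" pass
|         for _ in range(steps):
|             context.append(sample_next(model(context)))
|         return context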
| Jensson wrote:
| But it can't improve its answer after it has written it,
| and that is a major limitation. When a human writes an
| article or response or solution, that is likely not the
| first thing the human thought of; instead they write
| something down and work on it until it is tight and neat
| and communicates just what the human wants to communicate.
|
| Such answers will be very hard for an LLM to find; instead
| you mostly get very verbose messages, since that is how our
| current LLMs think.
| bytefactory wrote:
| Completely agree. The System 1/System 2 distinction seems
| relevant here. As powerful as transformers are with just
| next-token generation and context, which can be hacked to
| form a sort of short-term memory, some kind of real-time
| learning + long-term memory storage seems like an
| important research direction.
| klodolph wrote:
| > As Hinton says, there is no real limit to how sophisticated
| they can get.
|
| There's no limit to how sophisticated a model can get, but,
|
| 1. That's a property shared with many architectures, and not
| really that interesting,
|
| 2. There _are_ limits to the specific ways that we train
| models,
|
| 3. We care about the relative improvement that these models
| deliver, for a given investment of time and money.
|
| From a mathematical perspective, you can just kind of keep
| multiplying the size of your model, and you can prove that it
| can represent arbitrarily complicated structures (like internal
| mental models of the world). That doesn't mean that your
| training methods will produce those complicated structures.
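|
| (For the capacity claim, one standard statement -- not from the
| thread -- is the universal approximation theorem: for any
| continuous $f$ on a compact $K \subset \mathbb{R}^n$, any
| $\varepsilon > 0$, and a suitable nonlinearity $\sigma$, there is
| a one-hidden-layer network $N(x) = \sum_{i=1}^{m} a_i \,
| \sigma(w_i^\top x + b_i)$ with $\sup_{x \in K} |f(x) - N(x)| <
| \varepsilon$. The theorem says nothing about whether training on
| finite data will actually find those weights, which is exactly
| points 2 and 3 above.)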
|
| With Go, I can see how the model itself can be used to generate
| new, useful training data. How such a technique could be
| applied to LLMs is less clear, and its benefits are more
| dubious.
| lsy wrote:
| The difference is that "the world" is not exhaustible in the
| same way as Go is. While it's surely true that the number of
| possible overall Go game states is extremely large, the game
| itself is trivially representable as a set of legal moves and
| rules. The "world model" of the Go board is actually just
| already exhaustive and finite, and the computer's work in
| playing against itself is to generate more varied data within
| that model rather than to develop that model itself. We know
| that when Alpha Zero plays a game against itself it is valuable
| data because it is a legitimate game which most likely
| represents a new situation it hasn't seen before and thus
| expands its capacity.
|
| For an LLM, this is not even close to being the case. The sum
| of all human artifacts ever made (or yet to be made) doesn't
| exhaust the description of a rock in your front yard, let alone
| the world in all its varied possibility. And we certainly
| haven't figured out a "model" which would let a computer
| generate new and valid data that expands its understanding of
| the world beyond its inputs, so self-training is a non-starter
| for LLMs. What the LLM is "understanding", and what it is
| reinforced to "understand" is not the world but the format of
| texts, and while it may get very good at understanding the
| format of texts, that isn't equivalent to an understanding of
| the world.
| og_kalu wrote:
| >The sum of all human artifacts ever made (or yet to be made)
| doesn't exhaust the description of a rock in your front yard,
| let alone the world in all its varied possibility.
|
| No human or creature we know of has a "true" world model so
| this is irrelevant. You don't experience the "real world".
| You experience a tiny slice of it, a few senses that is
| further slimmed down and even fabricated at parts.
|
| To the bird that can intuitively sense and use the Earth's
| magnetic field for navigation, your model of the world is
| fundamentally incomplete.
|
| There is a projection of the world in text. Moreover, training
| on additional modalities is trivial for a transformer. That's
| all that matters.
| lsy wrote:
| That's the difference though. I _know_ my world model is
| fundamentally incomplete. Even more foundationally, I know
| that there _is_ a world, and when my world model and the
| world disagree, the world wins. To a neural network there
| is no distinction. The closest the entire dynamic comes is
| the very basic annotation of RLHF which itself is done by
| an external human who is providing the value judgment, but
| even that is absent once training is over.
|
| Despite not having the bird's sense for the Earth's magnetic
| field, I have an understanding that it is there, because
| humans saw behavior they couldn't describe and
| investigated, in a back-and-forth with a world that has
| some capacity to disprove hypotheses.
|
| Additional modalities are really just reducible to more
| kinds of text. That still doesn't exhaust the world, and
| unless a machine has some ability to integrate new data in
| real time alongside a meaningful commitment and
| accountability to the world as a world, it won't be able to
| cope with the real world in a way that would constitute
| genuine intelligence.
| og_kalu wrote:
| >I know my world model is fundamentally incomplete. Even
| more foundationally, I know that there is a world, and
| when my world model and the world disagree, the world
| wins.
|
| Yeah, this isn't really true. That's not how humans work.
| For a variety of reasons, plenty of people stick with
| their incorrect model despite the world indicating
| otherwise. In fact, this seems to be normal enough human
| behaviour. Everyone does it, for something or other. You
| are no exception.
|
| And yes LLMs can in fact tell truth from fiction.
|
| GPT-4 logits calibration pre RLHF -
| https://imgur.com/a/3gYel9r
|
| Just Ask for Calibration: Strategies for Eliciting
| Calibrated Confidence Scores from Language Models Fine-
| Tuned with Human Feedback -
| https://arxiv.org/abs/2305.14975
|
| Teaching Models to Express Their Uncertainty in Words -
| https://arxiv.org/abs/2205.14334
|
| Language Models (Mostly) Know What They Know -
| https://arxiv.org/abs/2207.05221
|
| The Geometry of Truth: Emergent Linear Structure in Large
| Language Model Representations of True/False Datasets -
| https://arxiv.org/abs/2310.06824
|
| Your argument seems to boil down to "they can't perform
| experiments" but that isn't true either.
| astrange wrote:
| There is no such thing as a world model, and you don't
| have one of them. This is a leftover bad psychological
| concept from the 70s AI researchers who never got
| anywhere. People and other creatures do very little
| modeling of things; they mostly just do stuff.
| tazjin wrote:
| The world on our plane of existence absolutely is
| exhaustible, just on a much, much larger scale. Doesn't mean
| that the process is fundamentally different, and from the
| human perspective there might be diminishing returns.
| huytersd wrote:
| Next word generation is one way to put it. The key point here
| is we have no idea what's happening in the black box that is
| the neural network. It could be forming very strong connections
| between concepts in there with multi-tiered abstractions.
| SkiFire13 wrote:
| > One other thing to take into consideration, is that to play
| the game of Go you can't just think of the next move. You have
| to think far forward in the game -- even though technically all
| it's doing is picking the next move, it is doing so using a
| model that has obviously looked forward more than just one
| move.
|
| It doesn't necessarily have to look ahead. Since Go is a
| deterministic game there is always a best move (or moves that
| are better than others) and hence a function that goes from the
| state of the game to the best move. We just don't have a way to
| compute this function, but it exists. And that function doesn't
| need the concept of lookahead; that's just an intuitive way we
| could find some of its values. Likewise, ML algorithms don't
| necessarily need lookahead; they can just try to approximate
| that function with enough precision by exploiting patterns in
| it. And that's why we can still craft puzzles that some AIs
| can't solve but humans can, by exploiting edge cases in that
| function that the ML algorithm didn't notice but that are
| solvable with an understanding of the game.
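|
| A minimal sketch of that "function, not search" framing (the
| sizes and layers here are hypothetical, not AlphaZero's actual
| architecture): a policy network maps a position directly to move
| logits, and inference is one forward pass with no lookahead.
|
|     import torch
|     import torch.nn as nn
|
|     class PolicyNet(nn.Module):
|         def __init__(self, board_size=19):
|             super().__init__()
|             n = board_size * board_size
|             self.net = nn.Sequential(
|                 nn.Flatten(),
|                 nn.Linear(n, 256), nn.ReLU(),
|                 nn.Linear(256, n))   # one logit per board point
|
|         def forward(self, board):    # board: (batch, 19, 19)
|             return self.net(board)
|
|     board = torch.zeros(1, 19, 19)
|     move = PolicyNet()(board).argmax(dim=-1)   # no tree search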
|
| The thing is though, does this really matter if eventually we
| won't be able to notice the difference?
| bytefactory wrote:
| > It doesn't necessarily have to look ahead. Since Go is a
| deterministic game there is always a best move
|
| Is there really a difference between the two? If a certain
| move shapes the opponent's remaining possible moves into a
| smaller subset, hasn't AlphaGo "looked ahead"? In other
| words, when humans strategize and predict what happens in the
| real world, aren't they doing the same thing?
|
| I suppose you could argue that humans also include additional
| world models in their planning, but it's not clear to me that
| these models are missing and impossible for machine learning
| models to generate during training.
| Joel_Mckay wrote:
| For the undisciplined, well-structured nonsense is
| indistinguishable from the appearance of sentience. This means a
| 95% LLM-generated article is indistinguishable from an illogical,
| contradictory, chaotic rant.
|
| Best of luck, and remember to tip your bot on the way out =)
| breadwinner wrote:
| There is evidence that the human brain is also doing
| "autocomplete" (prediction). The human brain uses predictive
| mechanisms when processing language, and these mechanisms play an
| important role in forming thoughts.
|
| When we hear or read a word, our brain quickly generates a set of
| predictions about what word might come next, based on the context
| of the sentence and our past experiences with language. These
| predictions are constantly updated as we receive new information,
| and they help us to process language more efficiently and
| accurately.
|
| In addition, research has shown that the brain engages in similar
| predictive processes when we are forming thoughts or planning
| actions. For example, when we plan a complex movement, such as
| reaching for a cup, our brain generates a set of predictions
| about the movements required to complete the action. These
| predictions are constantly updated as we receive feedback from
| our muscles and our environment, allowing us to make adjustments
| and achieve our goal.
|
| See links below for additional details:
|
| https://www.earth.com/news/our-brains-are-constantly-working...
|
| https://www.psycholinguistics.com/gerry_altmann/research/pap...
|
| https://www.tandfonline.com/doi/pdf/10.1080/23273798.2020.18...
|
| https://onlinelibrary.wiley.com/doi/10.1111/j.1551-6709.2009...
| Probiotic6081 wrote:
| And curiously, those predictions that are made during language
| comprehension are made by the language production system
| itself!
| lsy wrote:
| This is of course sometimes true; we take shortcuts to minimize
| cognitive effort. However, when the situation warrants it we
| think more carefully about which words to use to achieve goals,
| or to reach correspondence with the situation at hand. Or we
| move more precisely and carefully to do something we haven't
| done before. I've no doubt that an LLM can approximate whatever
| model of language a person has from their life experience, but
| I don't think this type of model is capable of active coping,
| of making judgments, or of having accountability to the world
| it's meant to operate in.
| lacrimacida wrote:
| > When we hear or read a word, our brain quickly generates a
| set of predictions about what word might come next, based on
| the context of the sentence
|
| Yes, a big part of it is prediction, but the brain also does
| something else which LLMs by themselves completely eschew. The
| human brain imagines in pictures, uses abstractions, studies
| things, and produces new knowledge.
| Eggpants wrote:
| As long as it's just returning the statistically most likely
| tokens given the previous tokens, it is just a clever
| autocomplete.
|
| A somewhat useful internet search engine without all the ads/SEO
| garbage. Of course, the first rule of the internet is don't
| believe everything on the internet.
|
| I believe AI won't overcome its statistical mask until it can
| self-tune its coefficients in real time. That requires an error
| function not yet invented that can mimic animals' pain-feedback
| error function.
|
| Baby steps can be taken by attempting to run GPT-generated code,
| then adjusting coefficients based on the returned errors. I.e.,
| compiler and unit-test failures are basic "pain" functions, which
| is pretty much how humans learn to code.
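|
| A rough sketch of that baby step, treating a failed run as the
| "pain" signal ("generate_code" is a hypothetical stand-in for an
| LLM call; here the error is only fed back as prompt context,
| since genuinely re-tuning the weights would need an RL-style
| update on top of this loop):
|
|     import subprocess, sys, tempfile
|
|     def run_candidate(code):
|         # run the candidate in a subprocess; return stderr on
|         # failure, None on success
|         with tempfile.NamedTemporaryFile(
|                 "w", suffix=".py", delete=False) as f:
|             f.write(code)
|             path = f.name
|         proc = subprocess.run([sys.executable, path],
|                               capture_output=True, text=True)
|         return None if proc.returncode == 0 else proc.stderr
|
|     def generate_code(prompt):
|         raise NotImplementedError  # wire up an LLM call here
|
|     prompt = "Write add(a, b) and a test that add(2, 2) == 4."
|     for _ in range(3):
|         candidate = generate_code(prompt)
|         error = run_candidate(candidate)
|         if error is None:
|             break              # zero "pain": accept the code
|         prompt += "\nThat failed with:\n" + error + "\nFix it."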
| esafak wrote:
| Autocompletion is a red herring of a dismissal. Once you have a
| general purpose model, you can do many things with it, including
| next token prediction. And in that respect we are no different.
| eviks wrote:
| > Suppose you want to be really good at predicting the next word.
| If you want to be really good, you have to understand what's
| being said. That's the only way. So by training something to be
| really good at predicting the next word, you're actually forcing
| it to understand.
|
| Ok, so in this framing you'll just get stuck below the "really
| good" level because the training isn't based on understanding
| Falkon1313 wrote:
| I think that insight is an important feature that GPT doesn't
| seem to have, at least not yet.
|
| For instance, I've seen people saying they can ask it for help
| with how to code something and it will help them. Although the
| results aren't perfect, they can be helpful.
|
| However, I recall years ago asking a more senior developer how to
| do something. They could've just told me how, but instead they
| asked why I was doing that. Then they told me not to bother with
| writing code at all, and to instead do this other, much simpler
| thing which would have the same effect at solving the client's
| request. ChatGPT wouldn't have had that insight.
|
| Years later, a junior dev came to me asking for help with his
| code because it wasn't working. We went over the code line by
| line and he explained what it was trying to do and it all looked
| good. But when he ran it, the web server crashed. I told him to
| try adding a comment:
|
|     // Do not remove this comment.
|
| Sure enough, it worked perfectly and the server no longer
| crashed. Why? Because I realized that if his code was wrong he
| should get an error message, not crash the server. But sometime
| back I had read about a bug in a version of the interpreter that
| would crash when parsing a file that was an exact multiple of
| 4096 bytes. Would ChatGPT have thought of that? Probably not. It
| would've just talked about the code.
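|
| (For the curious, the size check behind that fix is trivial;
| "script.py" is just an example path:
|
|     import os
|     if os.path.getsize("script.py") % 4096 == 0:
|         print("exact multiple of 4096 bytes -- pad the file")
|
| Adding a comment changes the byte count, which is why the fix
| worked.)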
|
| Which is not to say that it's useless. But it lacks greater
| context and the insight to go beyond the parameters of the
| question on its own initiative. Then again, so do humans most of
| the time.
| bytefactory wrote:
| > I think that insight is an important feature that GPT doesn't
| seem to have, at least not yet.
|
| I actually think this is a limitation of the RLHF that GPT has
| been put through. With open-ended questions, I've seen GPT4
| come up with reasonable alternatives instead of just answering
| the question I've asked. This is often seen in the infamous
| "however, please consider..." bits that it tacks on, which
| occasionally do contain actual insights into the problem I'm
| trying to solve.
|
| In most cases it seems to try very hard to mold the answer into
| what I want to hear, which in many cases isn't necessarily the
| best answer. A more powerful version of GPT with a less-
| restrictive RLHF seems like it would be more open to suggesting
| novel solutions, although this is just my speculation.
| Animats wrote:
| The big insight we've obtained from LLMs is how much of human
| discourse is glorified autocomplete. Humans are usually not very
| profound.
| lacrimacida wrote:
| But some are. Could we have that as AI? Would LLMs ever have
| it?
| 1vuio0pswjnm7 wrote:
| Here is a question: What is the practical significance of viewing
| "AI" as autocomplete versus some other idea? Why try to influence
| how others view using a computer? Why anthropomorphise? These are
| questions for which I have answers, but of course they are
| personal opinions. Historically, programmers often like to refer
| to programming as "magic". But magic is illusion, entertainment,
| tricks. Believing in "magic" is a personal choice.
|
| Why not describe things in terms of what they do instead of what
| they "are"? The latter is highly subjective and open to abuse.
___________________________________________________________________
(page generated 2023-11-18 23:01 UTC)