[HN Gopher] Quiet-STaR: Language Models Can Teach Themselves to ...
___________________________________________________________________
Quiet-STaR: Language Models Can Teach Themselves to Think Before
Speaking
Author : hackerlight
Score : 236 points
Date : 2024-03-15 09:24 UTC (13 hours ago)
(HTM) web link (arxiv.org)
(TXT) w3m dump (arxiv.org)
| roschdal wrote:
| Next: Language models teaching themselves to think, then kill
| humans, based on a crawled Russian website with secret AI
| instructions.
| raidicy wrote:
| Although this is obviously satirical hyperbole, dataset
| poisoning is real and will be underappreciated until the first
| catastrophic example of it actually occurs.
| DFHippie wrote:
| Many years ago now I wrote my kids a very simple chatbot to
| play with. You'd type in a phrase. It would tokenize it,
| adding start and stop tokens, then update its token
| transition probabilities, using the two preceding tokens to
| pick the next one. It would then generate a response from
| these probabilities.
|
| The data poisoning began immediately. Because "poop" was such
| a funny word, they quickly taught it that the most probable
| token after any bigram was "poop".
|
| No humans were killed, but two small kids were amused for an
| hour or so.
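|
| (For the curious, a minimal Python sketch of that kind of
| two-token-context Markov chatbot might look like the following;
| names and details here are illustrative, not the original code.)
|
|     import random
|     from collections import defaultdict
|
|     START, STOP = "<s>", "</s>"
|     # (prev2, prev1) -> {next_token: count}
|     counts = defaultdict(lambda: defaultdict(int))
|
|     def learn(phrase):
|         """Update transition counts from one typed-in phrase."""
|         tokens = [START, START] + phrase.split() + [STOP]
|         for a, b, c in zip(tokens, tokens[1:], tokens[2:]):
|             counts[(a, b)][c] += 1
|
|     def respond(max_len=30):
|         """Sample a reply from the learned transition counts."""
|         out, ctx = [], (START, START)
|         for _ in range(max_len):
|             options = counts[ctx]
|             if not options:
|                 break
|             nxt = random.choices(list(options),
|                                  weights=list(options.values()))[0]
|             if nxt == STOP:
|                 break
|             out.append(nxt)
|             ctx = (ctx[1], nxt)
|         return " ".join(out)
|
|     learn("poop is a funny word")
|     print(respond())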
| raidicy wrote:
| My condolences for your model's poisoning. It sounds like a
| real crappy way to go :?
| dcrimp wrote:
| I had this thought the other day that the whole chain of thought
| reasoning pattern contributing to improved performance in LLM-
| based systems seems to sit parallel to Kahneman's two-system
| model of the mind that he covers in 'Thinking, Fast and Slow'.
|
| Haven't read it in a few years, but I recall the book suggests
| that we use 'System 1' in our brains primarily for low-
| effort, low-computation thinking - like 1+1=? or "the sky is
| ____".
|
| It then suggests that we use a 'System 2' for deliberate,
| conscious, high-cognitive tasks. Dense multiplication, reasoning
| problems, working with tools - generally just decision-making.
| Anything that requires focus or brain power. Our brain escalates
| tasks from S1 to S2 if they feel complex or dangerous.
|
| Maybe I'm being too cute, but it feels like the critique that "LLMs
| aren't intelligent because they are stochastic parrots" is really an
| observation that they are only equipped to use their 'System 1'.
|
| When we prompt an LLM to think step-by-step, we allow it a
| workspace to write down its thoughts, which it can then consider
| in its next token prediction, a rudimentary System 2, like a
| deliberation sandbox.
|
| We do a similar thing when we engage our System 2 - we hold a
| diorama of the world in the front of our mind, where we simulate
| what the environment will do if we proceed with a given action -
| what our friend might respond to what we say, how the sheet steel
| might bend to a force, how the code might break, how the tyres
| might grip. And we use that simulation to explore a tree of
| possibilities and decide an action that rewards us the most.
|
| I'm no expert, but this paper seems to recognise a similar
| framework to the above. Perhaps a recurrent
| deliberation/simulation mechanism will make its way into models
| in the future, especially the action models we are seeing in
| robotics.
| OJFord wrote:
| I'm currently reading it for the first time, completely
| coincidentally/not for this reason, and on a few occasions I've
| thought 'Gosh that's just like' or 'analogous to' or 'brilliant
| description of that problem' for LLMs/generative AI or some
| aspect of it. I wish I could recall some examples.
| machiaweliczny wrote:
| It's a bit over my head for now but seems like GFlowNets are
| tackling this problem a bit.
| dcrimp wrote:
| interesting, hadn't come across these. Will be doing some
| more reading up on them.
| dougmwne wrote:
| I had the same thought from Thinking, Fast and Slow.
|
| Another variation of this seems to be the "thought loop" that
| agents such as Devin and AutoGPT use.
| mistermann wrote:
| https://en.m.wikipedia.org/wiki/OODA_loop
| biosed wrote:
| Weren't most of the claims in that book refuted, some even by
| the author? I really enjoyed it and found some great insights,
| only to later be told by a friend in that sphere that the book
| was not correct and that even the author had "retracted" some
| of the assertions.
| jerpint wrote:
| He won a Nobel prize for his works so not sure how much of it
| would be refuted
| gryn wrote:
| One quick Google search will find you multiple links about
| that, including some that were posted here. It wasn't proven
| to be false, but the evidence used was not much evidence
| either.
|
| Here's the first one in my results:
|
| https://retractionwatch.com/2017/02/20/placed-much-faith-und...
| mistermann wrote:
| As luck would have it, a System 1 vs System 2 scenario
| falls into our laps.
| mannykannot wrote:
| It might still be a useful concept in developing LLMs.
| toisanji wrote:
| That is also the approach taken in this paper for building LLM
| agents with metacognition: https://replicantlife.com/
| HarHarVeryFunny wrote:
| > it feels like critique that "LLMs aren't intelligent because
| they are stochastic parrots" is an observation that they are
| only equipped to use their 'System 1'.
|
| I wouldn't say LLMs aren't intelligent (at all) since they are
| based on prediction which I believe is the ability that we
| recognize as intelligence. Prediction is what our cortex has
| evolved to do.
|
| Still, intelligence isn't an all or nothing ability - it exists
| on a spectrum (and not just an IQ score spectrum). My
| definition of intelligence is "degree of ability to correctly
| predict future outcomes based on past experience", so it
| depends on the mechanisms the system (biological or artificial)
| has available to recognize and predict patterns.
|
| Intelligence also depends on experience, minimally to the
| extent that you can't recognize (and hence predict) what you
| don't have experience with, although our vocabulary for talking
| about this might be better if we distinguished predictive
| ability from experience rather than bundling them together as
| "intelligence".
|
| If we compare the predictive machinery of LLMs vs our brain,
| there is obviously quite a lot missing. Certainly "thinking
| before speaking" (vs LLM fixed # steps) is part of that, and
| this Q* approach and tree-of-thoughts will help towards that.
| Maybe some other missing pieces such as thalamo-cortical loop
| (iteration) can be retrofitted to LLM/transformer approach too,
| but I think the critical piece missing for human-level
| capability is online learning - the ability to act then see the
| results of your action and learn from that.
|
| We can build a "book smart" AGI (you can't learn what you
| haven't been exposed to, so maybe unfair to withhold the label
| "AGI" just because of that) based on current approach, but the
| only way to learn a skill is by practicing it and
| experimenting. You can't learn to be a developer, or anything
| else, just by reading a book or analyzing what other people
| have produced - you need to understand the real world results
| of your _own_ predictions/actions, and learn from that.
| RandomLensman wrote:
| Defining intelligence as prediction leaves out a lot of other
| things that humans would see as intelligence in other humans
| (e.g., creating a novel), also quite simple organisms make
| predictions (e.g., a predator jumping at prey makes a
| prediction about positions).
| HarHarVeryFunny wrote:
| Maybe a better way to say it rather than "intelligence is
| prediction" is that prediction is what supports the
| behaviors we see as intelligent. For example, prediction is
| the basis of what-if planning (multi-step prediction),
| prediction (as LLMs have proved) is the basis of learning
| and using language, prediction is the basis of modelling
| other people and their actions, etc. So, ultimately the
| ability to write a novel, is a result of prediction.
|
| Yes, an insect (a praying mantis, perhaps) catching another
| is exhibiting some degree of prediction, and per my
| definition I'd say is exhibiting some (smallish) degree of
| intelligence in doing so, regardless of this presumably
| being a hard-coded behavior. Prediction becomes more and
| more useful the better you are at it, from avoiding
| predators, to predicting where the food is, etc, so this
| would appear to be the selection pressure that has evolved
| our cortex to be a very powerful prediction machine.
| RandomLensman wrote:
| The ability to write a novel is different from actually
| writing a novel. If prediction forms the basis of (at
| least some forms of) intelligence, intelligence itself is
| more than prediction.
| HarHarVeryFunny wrote:
| That's why I say our vocabulary for talking about these
| things leaves something to be desired - the way we use
| the word "intelligence" combines both raw/potential
| ability to do something (prediction), and the experience
| we have that allows that ability to be utilized. The only
| way you are going to learn to actually write a novel is
| by a lot of reading and writing and learning how to write
| something that provides the experience you hope it will
| have.
| RandomLensman wrote:
| Kind of agree. I think, though, trying to shoe-horn
| intelligence into some evolutionary concepts is tricky
| because it is easy to stack hypotheses there.
| coldtea wrote:
| > _The ability to write a novel is different from
| actually writing a novel_
|
| In what way, except as in begging the question?
| RandomLensman wrote:
| Which LLM will on its own go and write a novel? Also,
| even for humans, just because you technically know how to
| write a novel, you might fail at it.
| coldtea wrote:
| > _Which LLM will on its own go and write a novel?_
|
| Which human will?
|
| We get prompts all the time, it's called sensory input.
|
| Instead of "write a novel" it's more like information
| about literature, life experience, that partner who broke
| our heart and triggered our writing this personal novel,
| and so on.
| RandomLensman wrote:
| Some people write novels, some don't. Why some people do
| so we sometimes know, sometimes we don't (maybe they
| flipped a coin to decide). Some start to write but fail
| to finish.
|
| You have to believe that humans have no free will in a
| certain way to have them be like an LLM, i.e, every
| action is externally driven and determined.
| coldtea wrote:
| > _You have to believe that humans have no free will in a
| certain way to have them be like an LLM, i.e, every
| action is externally driven and determined._
|
| Free will doesn't have much meaning. If I don't base my
| action at time t on its development from inputs at
| times before t, what would I base it on?
|
| Would it be random?
|
| Or would there be a small thinking presence inside me
| that gets information about my current situation and
| decides "impartially", able to decide in whatever
| direction, because it wasn't itself entirely determined
| by my experiences thus far?
| RandomLensman wrote:
| Randomness is certainly an option. Ignoring information
| is an option.
| spookie wrote:
| I think you're confusing prediction with ratiocination.
|
| I'm sure you've deduced hypotheses based solely on the
| assertion that "contradiction and being are
| incompatible". Note, there wasn't prediction involved in
| that process.
|
| I consider prediction as a subset of reason, but not the
| contrary. Therefore, I beg to differ on the whole
| assumption that "intelligence is prediction". It's more
| than that, prediction is but a subset of that.
|
| This is perhaps the biggest reason for the high
| computational costs of LLMs, because they aren't taking
| the shortcuts necessary to achieve true intelligence,
| whatever that is.
| HarHarVeryFunny wrote:
| > I think you're confusing prediction with ratiocination.
|
| No, exactly not! Prediction is probabilistic and liable
| to be wrong, with those probabilities needing
| updating/refining.
|
| Note that I'm primarily talking about prediction as the
| brain does it - not about LLMs, although LLMs have proved
| the power of prediction as a (the?) learning mechanism
| for language. Note though that the words predicted by
| LLMs are also just probabilities. These probabilities are
| sampled from (per a selected sampling "temperature" -
| degree of randomness) to pick which word to actually
| output.
|
| The way the brain learns, from a starting point of
| knowing nothing, is to observe and predict that the same
| will happen next time, which it often will, once you've
| learnt what observations are appropriate to include or
| exclude from that prediction. This is all highly
| probabilistic, which is appropriate given that the thing
| being predicted (what'll happen if I throw a rock at that
| tiger?) is often semi-random in nature.
|
| We can better rephrase "intelligence is ability to
| predict well", as "intelligence derives from ability to
| predict well". It does of course also depend on
| experience.
|
| One reason why LLMs are so expensive to train is because
| they learn in an extremely brute force fashion from the
| highly redundant and repetitive output of others. Humans
| don't do that - if we're trying to learn something, or
| curious about it, we'll do focused experiments such as
| "Let's see what happens if I do this, since I don't
| already know", or "If I'm understanding this right, then
| if I do X then Y should happen".
| jimbokun wrote:
| LLMs have shown that writing a novel can be accomplished as
| an application of prediction, at least to a certain level
| of quality.
| RandomLensman wrote:
| I have yet to see an LLM write a novel of its own volition.
| coldtea wrote:
| > _Defining intelligence as prediction leaves out a lot of
| other things that humans would see as intelligence in other
| humans (e.g., creating a novel)_
|
| Would it?
|
| Why would "creating a novel" by a human not itself be text
| generation based on prediction on what are the next good
| choices (of themes, words, etc) based on a training data
| set of lived experience stream and reading other
| literature?
| RandomLensman wrote:
| What is the human predicting there? Why would it need to
| be a prediction task at all? How about a dada-ist poem?
| Made-up words and syntax? If it is prediction but the
| criterion for "what is a good next choice" can totally be
| made up on the fly - what does the word "prediction" even
| mean?
| coldtea wrote:
| > _What is the human predicting there?_
|
| Their next action - word put on page, and so on.
|
| > _Why would it need to be a prediction task at all?_
|
| What else would it be?
|
| Note that prediction in LLM terminology doesn't mean
| "what is going to happen in the future" like Nostradamus.
| It means "what is a good next word given the input I was
| given and the words I've answered so far".
|
| > _How about a dada-ist poem? Made-up words and syntax?_
|
| How about it? People have their training (sensory input,
| stuff they've read, school, discussions) and sit to
| predict (come up with, based on what they know) a made-up
| word and then another.
| RandomLensman wrote:
| That is a meaningless definition of prediction if "what
| is a good next word" has an ever changing definition in
| humans (as everything would fulfill that definition).
| coldtea wrote:
| That's the very definition of prediction in an LLM.
|
| What does "has an ever changing definition" mean?
|
| And why "everything would fulfill that definition"?
|
| At any time, what the "good next word" is depends on the
| state created by our inputs thus far (including
| chemical/physiological state, like decaying memories, and
| so on). And not only does "everything" not fulfill it - it
| can be only a single specific word.
|
| (Same as with an LLM once we account for the random seed:
| we get the same output given the same training and the
| same prompt.)
| RandomLensman wrote:
| "it can be only a single specific word" - that is
| incorrect as a human can change the process to generate
| the next word, up to and including using a random
| process to create or select the next word (i.e., any word
| would be fine).
|
| You could say the process chosen is somehow predetermined
| (even if the choices then are all made by using
| randomness), but then really the word "prediction" has
| very little meaning as the criteria to what is a "good
| next word" have a nearly unlimited and ever changing
| range as the generating process changes.
| duskwuff wrote:
| > Why would "creating a novel" by a human not itself be
| text generation based on prediction on what are the next
| good choices (of themes, words, etc) based on a training
| data set of lived experience stream and reading other
| literature?
|
| Unless you're Stephen King on a cocaine bender, you don't
| typically write a novel in a single pass from start to
| finish. Most authors plan things out, at least to some
| degree, and go back to edit and rewrite parts of their
| work before calling it finished.
| hackerlight wrote:
| > online learning - the ability to act then see the results
| of your action and learn from that.
|
| I don't think that should be necessary, if you are talking
| about weight updates. Offline batch mode Q-learning achieves
| the same thing.
|
| By online learning, did you mean working memory? I'd agree
| with that. Whether it's RAG, ultra-long-context, an LSTM-
| like approach, or something else, is TBD.
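|
| (For readers who haven't met the term, here is a minimal tabular
| sketch of the Q-learning update being referred to. Names are
| illustrative only; real offline/batch RL uses function
| approximation over a logged dataset of transitions, not a dict.)
|
|     def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.99):
|         """One step of Q(s,a) += alpha*(r + gamma*max_a' Q(s',a') - Q(s,a))."""
|         best_next = max(Q.get((s_next, a2), 0.0) for a2 in actions)
|         td_error = r + gamma * best_next - Q.get((s, a), 0.0)
|         Q[(s, a)] = Q.get((s, a), 0.0) + alpha * td_error
|
|     # "Offline"/"batch" just means replaying this update over a fixed
|     # set of logged (s, a, r, s') transitions instead of acting live.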
| HarHarVeryFunny wrote:
| By online learning I mean incremental real-time learning
| (as opposed to pre-training), such that you can predict
| something (e.g. what some external entity is going to do
| next, or the results of some action you are about to take),
| then receive the sensory feedback of what actually
| happened, and use that feedback to improve your predictions
| for next time.
|
| I don't think there is any substitute for a predict-act-
| learn loop here - you don't want to predict what someone
| else has done (which is essentially what LLMs learn from a
| training set), you want to learn how your OWN predictions
| are wrong, and how to update them.
| exe34 wrote:
| > By online learning I mean incremental real-time
| learning, such that you can predict something (e.g. what
| some external entity is going to do next, or the results
| of some action you are about to take),
|
| I used to believe this, but the recent era of LLMs has
| changed my mind. It's clear that the two things are not
| related: you don't need to update weights in real-time if
| you can hold context another way (attention) while
| predicting the next token.
|
| The fact that we appear to remember things with one-shot,
| online training might be an illusion. It appears that we
| don't immediately update the weights (long term memory),
| but we store memories in short term memory first (e.g.
| https://www.scientificamerican.com/article/experts-short-
| ter...).
| HarHarVeryFunny wrote:
| The fundamental difference is that humans do learn,
| permanently (eventually at least), from prediction
| feedback, however this works. I'm not convinced that STM
| is necessarily involved in this particular learning
| process (maybe just for episodic memories?), but it makes
| no difference - we do learn from the feedback.
|
| An LLM can perform one-shot in-context learning, which in
| conversational mode will include (up to context limit)
| feedback from its actions (output), but this is never
| learned permanently.
|
| The problem with LLMs not permanently learning from the
| feedback to their own actions is that it means they will
| never learn new skills - they are doomed to only learn
| what they were pre-trained with, which isn't going to
| include the skills of any specific job unless that
| specific on-the-job experience of when to do something,
| or avoid doing it, were made a part of it. The training
| data for this does not exist - it's not the millions of
| lines of code on GitHub or the bug fixes/solutions
| suggested on Stack Overflow - what would be needed would
| be the inner thoughts (predictions) of developers as they
| tackled a variety of tasks and were presented with
| various outcomes (feedback) continuously throughout the
| software development cycle (or equivalent for any other
| job/skill one might want them to acquire).
|
| It's hard to see how OpenAI or anyone else could provide
| this on-the-job training to an LLM even if they let it
| loose in a programming playground where it could generate
| the training dataset. How fast would the context fill
| with compiler/link errors, debugger output, program
| output, etc.? Once context was full you'd have to pre-
| train on that (very slow - months, expensive) before it
| could build on that experience. Days of human experience
| would take years to acquire. Maybe they could train it to
| write crud apps or some other low-hanging fruit, but it's
| hard to see this ever becoming the general purpose "AI
| programmer" some people think is around the corner. The
| programming challenges of any specialized domain or task
| would require training for that domain - it just doesn't
| scale. You really need each individual deployed instance
| of an LLM/AI to be able to learn itself - continuously
| and incrementally - to get the on-the-job training for
| any given use.
| exe34 wrote:
| > but this is never learned permanently.
|
| Are you sure? I think "Open"AI uses the chat transcripts
| to help the next training run?
|
| > they are doomed to only learn what they were pre-
| trained with
|
| Fine-tuning.
|
| > The training data for this does not exist
|
| What does "this" refer to? Have you read the Voyager
| paper? (https://arxiv.org/abs/2305.16291) Any lesson
| learnt in the library could be used for fine-tuning or
| the next training run for a base model.
|
| > what would be needed would be the inner thoughts
| (predictions) of developers as they tackled a variety of
| tasks and were presented with various outcomes (feedback)
| continuously throughout the software development cycle
|
| Co-pilot gets to watch people figure stuff out - there's
| no reason that couldn't be used for the next version. Not
| only does it not need to read minds, but people go out of
| their way to write comments or chat messages to tell it
| what they think is going on and how to improve its code.
|
| > Days of human experience would take years to acquire
|
| And once learnt, that skill will never age, never get
| bored, never take annual leave, never go to the kids'
| football games, never die. It can be replicated as many
| millions of times as necessary.
|
| > they could train it to write crud apps
|
| To be fair, a lot of computer code is crud apps. But
| instead of learning it in one language, now it can do it
| in every language that existed on Stack Overflow the day
| before its training run.
| iteygib wrote:
| To me, it is one of those things like defining what 'art' is,
| as in creating a model in our heads around a concept. We take
| our definitions and then use those to construct models like
| AI that simulate our model well enough.
|
| In other words, I personally do not believe any system we
| develop will be truly 'intelligent', since intelligence is a
| concept we created to help explain ourselves. We can't even
| truly define it, but yet we try to test technologies we
| develop to see if they possess it. It is a bit nonsensical
| to me.
| HarHarVeryFunny wrote:
| Sure, we created the word intelligence to help describe
| ourselves, and our differing levels of ability, as well as
| applying it to animals such as apes or dogs that we see as
| possessing some similar abilities.
|
| However, if we want to understand where this rather
| nebulous ability/quality of "intelligence" comes from, the
| obvious place to look is our cortex, which it turns out
| actually has a rather simple architecture! If uncrumpled, our
| cortex would be a thin sheet about the size of a tea towel,
| consisting of six layers of neurons of different types,
| with a specific pattern of connectivity, and including
| massive amounts of feedback. We can understand this
| architecture to be a prediction machine, which makes sense
| from an evolutionary point of view. Prediction is what lets
| you act according to what will happen in the future as
| opposed to being stuck in the present reacting to what is
| happening right now.
|
| Now, if we analyze what capabilities arise from an ability
| to predict, such as multi-step what-if planning (multi-step
| prediction), ability to learn and use language (as proven
| by LLMs - a predict-next-word architecture), etc, etc, it
| does appear (to me at least!) that this predictive function
| of the cortex is behind all the abilities that we consider
| as "intelligence".
|
| For sure there is very little agreement on a definition of
| intelligence, but I have offered here a very concrete
| definition "degree of ability to predict future outcomes
| based on past experience" that I think gets to the core of
| it.
|
| Part of the problem people have in agreeing on a definition
| of intelligence is that this word arose from self-
| observation as you suggest, and is more a matter of "I know
| it when I see it" rather than having any better defined
| meaning. For technical discussion of AI/AGI and brain
| architecture we really need a rigorously defined
| vocabulary, and might be better off avoiding such a poorly
| defined concept in the first place, but it seems we are
| stuck with it since the word is so entrenched and people
| increasingly want to compare machines to ourselves and
| judge whether they too have this quality.
|
| Of course we can test for intelligence, in ourselves as
| well as machines, by using things like IQ tests to see the
| degree to which we/they can do the things we regard as
| intelligent (we'd really need a much deeper set of tests
| than a standard IQ test to do a good job of assessing
| this), but the utility of understanding what is actually
| behind intelligence (prediction!) is that this allows us to
| purposefully design machines that have this property, and
| to increasing degrees of capability (via more powerful
| predictive architectures).
| airstrike wrote:
| I'll preface this by saying I know this may sound entirely made
| up, unscientific, anecdotal, naive, or adolescent even, but
| luckily nobody has to believe me...
|
| A few weeks back I was in that limbo state where you're neither
| fully awake nor fully asleep and for some reason I got into a
| cycle where I could _notice_ my fast-thinking brain spitting
| out words/concepts in what felt like the speed of light before
| my slow-thinking brain would take those and turn them into
| actual sentences
|
| It was like I was seeing my chain of thought as a list of
| _ideas_ that was filled impossibly fast before it got
| summarized into a proper "thought" as a carefully selected
| list of _words_
|
| I have since believed, as others have suggested in much more
| cogent arguments before me, that what we perceive as our
| thoughts are, indeed, a curated output of the brainstormy
| process that immediately precedes it
| mirror_neuron wrote:
| It's hard (impossible?) to know if we're talking about the
| same thing or not, but I experience something like this all
| the time, without being on the edge of sleep. We might both
| be wrong, but it's relatable!
| dicroce wrote:
| This is fascinating. I had another experience that I think
| sheds light on some of this. One day I was in my office and
| the lights were off. I turned around and looked at the dark
| shape on top of my coworkers desk. For a few seconds I stared
| blankly and then suddenly I had a thought: PC, it's his PC.
| Then I started to think about that period of time just before
| I realized what I was looking at... The only word I can use to
| describe what it felt like is: unconscious. Is it possible
| that consciousness is just a stream of recognition?
| Swizec wrote:
| > I got into a cycle where I could notice my fast-thinking
| brain spitting out words/concepts in what felt like the speed
| of light before my slow-thinking brain would take those and
| turn them into actual sentences
|
| The way I've seen this described by psychologists is that
| System 1 is driving the car while System 2 panics in the
| back seat screaming out explanations for every action and
| shouting directions to the driver so it can feel in control.
| The driver may listen to those directions, but there's no
| direct link between System 2 in the backseat and System 1
| holding the wheel.
|
| Various experiments have shown that in many situations our
| actions come first and our conscious
| understanding/explanation of those actions comes second.
| Easiest observed in people with split brain operations. The
| wordy brain always thinks it's in control even when we know
| for a fact it couldn't possibly have been because the link
| has been surgically severed.
|
| Being super tired, on the edge of sleep, or on drugs can
| disrupt these links enough to let you observe this directly.
| It's pretty wild when it happens.
|
| Another easy way, for me, is to get up on stage and give a
| talk. Your mouth runs away presenting things and you're in
| the back of your head going "Oh shit no that's going in the
| wrong direction and won't make the right point, adjust
| course!"
| devinprater wrote:
| Oh, yes, that's what I do! I act first, and then consider
| the action.
| nuancebydefault wrote:
| Sometimes when I am in a Teams call, I observe myself
| talking. I know for myself that I can get carried away
| whilst talking and that time passes faster then. My
| conscious self sometimes needs to interrupt my talky self
| with a 'nough explained signal, or even with a 'nough
| joking signal.
|
| I read several studies that show that brains don't have a
| central point of command, so our true self can not exist
| (as one single origin). We are the sum of all our
| consciousnesses, similar to how a car is the sum of its
| parts.
| nico wrote:
| There is a technique for achieving this state of
| consciousness, it's called noting
|
| This is an awareness that advanced meditators seek, practice
| and develop to perceive "reality as it is"
|
| If you are curious, you might find related discussions, and a
| great welcoming community at r/streamentry on Reddit
|
| Also the book Mastering the Core Teachings of the Buddha
| talks about it quite a bit, including instructions on how to
| do it
| jondwillis wrote:
| Is this different from Dzoghchen buddhism?
| nico wrote:
| Noting is just a meditation technique
|
| You might also call it an exercise for insight practice
|
| There are multiple traditions that use noting or similar
| techniques for insight practice (maybe with different
| names)
|
| Can't vouch for this thread, as I just found it, but
| here's a related discussion (Dzogchen vs Vipassana):
| https://www.reddit.com/r/Buddhism/comments/9t3095/dzogchen_v...
| jprete wrote:
| Noting is very useful as long as you remember not to do it
| all the time.
| 0xdeadbeefbabe wrote:
| If you don't remember then what? Stack overflow? Heap
| overflow?
| giva wrote:
| Well, this sounds weird to me in the sense that I don't feel
| that I think in _words_. I only convert my thoughts into
| words when I need to speak or write them down; so when I need
| to communicate them to others, when I need to remember them
| for later, or when I am stuck and I need to clear things up.
|
| I was actually convinced it was the same for most people, and
| that for this reason "Rubber duck debugging"[1] is a thing.
|
| 1) https://en.wikipedia.org/wiki/Rubber_duck_debugging
| JoBrad wrote:
| Same. If I try to visualize my thoughts it's like a cloud
| that coalesces into various forms, to show different
| scenarios. It definitely isn't word-based until I decide to
| actually translate it into that mode.
| mewpmewp2 wrote:
| Interesting. I think all of my thoughts are this record
| I'm listening to as if it's an audiobook almost.
| Sometimes, it's like multiple parallel streams of
| different thoughts at different strengths that I can
| observe, like a thought line that is going on, on a more
| subconscious level, and it's something that if I notice,
| I might want to pay attention to.
|
| Like multiple LLMs are generating tokens in my head in
| parallel, but like in my field of view, some I can only
| listen/see barely because I'm not focusing on them.
| kjqgqkejbfefn wrote:
| Am I the only one visualizing some of my most creative
| thoughts in a mental palace formed by many distinct
| (Euclidean) spaces, whose axes connect to each other
| through a graph? The closest thing I've found that can
| describe this is simplicial sets:
|
| picture: https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcRx5Xam...
|
| It seems it's used by cognitive models, although I'm not
| formally trained enough to tell exactly how:
|
| https://arxiv.org/pdf/1703.08314.pdf
| mewpmewp2 wrote:
| I wish I had something like this in my head to tie things
| in together. Right now I feel like my understanding of
| things is so disorganised and "lucky" in a sense. I feel
| lucky that I have grasp of anything.
| giva wrote:
| I don't know what a simplicial set is and Wikipedia
| didn't really help me. However, I could roughly describe
| my "mind" as many mental maps where concepts are laid out
| and connected in different ways. Learning means putting
| new things on these maps, and thinking is navigating through
| them.
| karmakaze wrote:
| Reminds me of the saying about a poet vs mathematician,
| the first gives different names to the same thing and the
| latter the same name to different things. _Maybe that's
| why I can't stand highly descriptive prose (aka
| describing the water while I'm drowning over here)._
|
| Now what if you're a poetic mathematician _(or
| mathematical poet)_, what's that mind map look like?
| LargoLasskhyfv wrote:
| Well... what about that palace of mind thing, and the
| ability to rewind into almost all older memories at will,
| and on demand being able to look up things from there,
| like reading, without having it memorized at all? Also
| full stream of consciousness, like smells, tastes, light
| wind on your skin, 'silken air' at just the right
| temperature and humidity.
|
| All of that arranged in something like 'eigengrau',
| represented by glitterlike points connected by graphs,
| mostly in 'phosphene' colors, but not exclusively so.
|
| Sometimes very non-euclidean, moving/warping.
|
| _KNOWING_ what's behind every glitter point, like small
| cinema, large home theatre, from several points of view
| at the same time.
|
| No words involved. Just visuals.
|
| Thinking, like juggling/weighing blobs, like that glowing
| stuff which moves slowly up and down in a lava-lamp.
|
| Somehow 'knowing' what each blob, its size/form/viscosity
| /weight/speed/color/brightness/'feel'/smell represents.
|
| Slowly emerging new 'visuals' from this. Which are then
| translated into 'language', if ever.
| jiggawatts wrote:
| https://mymodernmet.com/inner-monologue/
| marmaduke wrote:
| > curated output of the brainstormy process that immediately
| precedes it
|
| Daniel Dennett gives a nice albeit more detailed version of
| your idea in his book Consciousness Explained, could be worth
| a read
| samstave wrote:
| Mandelthought psyt.
| melagonster wrote:
| From a positive perspective, it seems clear that our thinking/mind
| is not just language and is always faster than sentence
| formation.
| JoBrad wrote:
| I had a similar experience when I was put under during
| surgery a few years ago. Later I learned that they used
| ketamine in their concoction.
| allemagne wrote:
| I occasionally reach a similar state near sleep where I will
| be half-dreaming that I'm reading from a page of a book where
| the words materialize/"come into focus" right before my eyes
| into what is usually vaguely grammatically correct nonsense.
| pictureofabear wrote:
| This seems like it might upend Descartes' "cogito, ergo sum"
| ("I think therefore I am") in that the process for forming
| thoughts in a language is not indicative that we exist,
| rather it merely indicates that we have evolved a brain that
| can produce and interpret language.
|
| Seems like we're dismantling a lot of what Descartes came up
| with these days.
| TriNetra wrote:
| For that I came up (or got inspired from somewhere) with
| this: I'm aware therefore I exist. Pure awareness, devoid
| of all objects (thoughts/visualization) is me.
| theaussiestew wrote:
| I have this too. My cognitive processes are not related to my
| thinking brain, which I define as the part of my mental
| process which produces the sounds of words in my mind.
| Instead, I've observed that first, my subconscious processes
| concepts at a much more fine grained level, much like the
| latent space of a machine learning model. Only substantially
| after, let's say 10ms after, do thoughts arise, which are
| just pointers to the already processed subconscious process.
| A very rough analogy would be the inference of an LLM in
| words, vs all the processing of embeddings that happens
| internally.
| tasty_freeze wrote:
| People often say that LLMs aren't really thinking because they
| are just producing a stream of words (tokens really)
| reflexively based on some windows of previous text either read
| or from its own response. That is true.
|
| But I have the experience when talking of not knowing what I'm
| going to say until I hear what I've said. Sometimes I do have
| deliberative thought and planning, trialing phrases in my head
| before uttering them, but apparently I'm mostly an LLM that is
| just generating a stream of tokens.
| Workaccount2 wrote:
| This is something that is easily observable by anyone at
| virtually any moment, yet at the same time is something that
| escapes 99% of the population.
|
| When you are talking to someone in normal conversation, you
| are both taking in the words you are saying at the same time.
| iteygib wrote:
| How does evolutionary instinct factor into the system model?
| Fight-or-flight responses, reflexes, etc. 'Thinking' does have
| consequences in terms of evolutionary survival in some
| circumstances, as in spending too much time
| deliberating/simulating.
| kderbe wrote:
| Andrej Karpathy makes this same point, using the same book
| reference, in his "[1hr Talk] Intro to Large Language Models"
| video from Nov. 2023.
|
| Here is a link to the relevant part of his presentation:
| https://youtu.be/zjkBMFhNj_g?t=2120
| emmender2 wrote:
| Thinking step-by-step requires 100% accuracy in each step. If
| you are 95% accurate in each step, after the 10th step, the
| accuracy of the reasoning chain drops to 59%. This is the
| fundamental problem with LLMs for reasoning.
|
| Reasoning requires deterministic symbolic manipulation for
| accuracy; only then can it be composed into long chains.
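|
| (The arithmetic behind that number, assuming independent errors
| per step:)
|
|     p, n = 0.95, 10
|     print(p ** n)  # ~0.599: a 10-step chain at 95% per-step accuracy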
| hesdeadjim wrote:
| I dream of a world where the majority of humans could come
| close to 59% after attempting a ten step logical process.
| throwuwu wrote:
| You've never made a mistake in your reasoning?
|
| Tongue in cheek but this has been considered and has resulted
| in experiments like tree-of-thought and various check-your-
| work and testing approaches. Thinking step by step is really
| just another way of saying make a plan or use an algorithm,
| and when humans do either they need to periodically re-
| evaluate what they've done so far and ensure it's correct.
|
| The trick is training the model to do this as a matter of
| course and to learn which tool to apply at the right time
| which is what the paper is about wrt interspersed thoughts.
| trenchgun wrote:
| >reasoning requires deterministic symbolic manipulation for
| accuracy
|
| No, that is automation. Automated reasoning is a thing,
| indeed. And I can kind of see a world where there is a system
| which uses LLM for creative thinking, augmented with
| automated reasoning systems (think datalog, egg, SMT-solver,
| probabilistic model checking etc).
| glial wrote:
| I think of COT as a memory scratchpad. It gives the LLM some
| limited write-only working memory that it can use for simple
| computations (or associations, in its case). Now suppose an LLM
| had re-writeable memory... I think every prompt-hack, of which
| COT is one example, is an opportunity for an architecture
| improvement.
| HarHarVeryFunny wrote:
| I think of COT more as a type of planning or thinking before
| you speak. If you just open your mouth and start talking,
| which is what a plain LLM does, then you may talk yourself
| into a corner with no good way to get out of it, or find
| yourself saying something that really makes no sense. COT
| effectively allows the LLM to see the potential continuations
| of what it is considering saying, and pick one that makes
| sense!
|
| I think lack of COT or any ability to plan ahead is part of
| why LLMs are prone to hallucinate - if you've already run
| your mouth and said "the capital of australia is", then it's
| a bit late to realize you don't know what it is. The plain
| LLM solution is to do what they always do and predict next
| word using whatever it had in the training set, such as names
| of some Australian cities and maybe a notion that a capital
| should be a large important city. IOW it'll
| hallucinate/bullshit a continuation word such as "Melbourne".
| With COT it would potentially have the ability to realize
| that "the capital of australia is" is not a good way to start
| a sentence when you don't know the answer, and instead say "i
| don't know". Of course the other cause of hallucinations is
| that the LLM might not even know what it doesn't know, so
| might think that "Melbourne" is a great answer.
| eightysixfour wrote:
| This is a common comparison in the LLM world. I actually think
| it is closer to the Left/Right Brain differences described in
| The Master and His Emissary, but that's for a blog post later.
| bun_at_work wrote:
| I have a similar view to you and not much to add to your
| comment, other than to reference a couple books that you might
| like if you enjoyed 'Thinking, Fast and Slow'.
|
| 'The Righteous Mind' by Jonathan Haidt. Here, Haidt describes a
| very similar two-system model he calls the elephant-and-rider
| model.
|
| 'A Thousand Brains: A New Theory of Intelligence' by Jeff
| Hawkins. Here Jeff describes his Thousand Brains theory, which
| has commonality with the 2-system model described by Kahneman.
|
| I think these theories of intelligence help pave the way for
| future improvements on LLMs for sure, so just want to share.
| thwarted wrote:
| This sounds similar to the A Brain/B Brain concept that was
| described by, I believe, Marvin Minsky. I don't know how this
| might be related to Kahneman's work.
| kouru225 wrote:
| Feel like this is better represented as the default mode
| network: https://en.m.wikipedia.org/wiki/Default_mode_network
|
| There are questions we know the answers to and we just
| reflexively spit them out, but then there are questions that
| are new to us and we have to figure them out separately.
|
| Recent research has shown that new memories are recorded in the
| brain differently depending on how unique the memory is:
| https://www.quantamagazine.org/the-usefulness-of-a-memory-gu...
| adlpz wrote:
| Any relation to OpenAI's rumored Q* (i.e. q-star) model? Authors
| of this paper don't seem affiliated.
|
| Just a name coincidence?
| HarHarVeryFunny wrote:
| I was thinking the same. The STaR paper this is an extension of
| came out in 2022, so at least possible this is what q-star is
| based on too, but maybe with Q standing for something else.
| smusamashah wrote:
| I think it's just a play on the same hyped up term.
| anon291 wrote:
| So it seems 'obvious' to me that a network about 50 layers deep
| (for example) can only reason about symbolic questions for 50
| 'steps' (in quotes because it's not a step as we think about it).
| It only seems there's more complexity because it's 50 steps in
| one or more learned subspaces that the model has been trained in
| (which might mean the model can accomplish more than one 'human
| step' in its 'step'). Humans (well, intelligent humans at least)
| obviously seem able to reason beyond those steps, but we all know
| it requires real thinking and deliberation and perhaps a notepad
| to be able to do that.
|
| It's quite something to, for example, expect ChatGPT to be able
| to correctly do 4 digit multiplications without any thought or
| recourse to 'paper' when very few human beings can do that.
| blackbear_ wrote:
| This paper does indeed follow your intuition to investigate the
| limits of transformers on compositional tasks (i.e., those that
| require multi-step reasoning, including your multiplication
| example): https://arxiv.org/abs/2305.18654
|
| > Our empirical findings suggest that transformer LLMs solve
| compositional tasks by reducing multi-step compositional
| reasoning into linearized subgraph matching, without
| necessarily developing systematic problem-solving skills. To
| round off our empirical study, we provide theoretical arguments
| on abstract multi-step reasoning problems that highlight how
| autoregressive generations' performance can rapidly decay with
| increased task complexity.
| anon291 wrote:
| Ah good... This is definitely a research path I've been
| looking into. Great to see someone else has already gone
| there!
| visarga wrote:
| Maybe the Skill Mix paper is relevant here. They define a
| list of 100 skills, and then randomly sample tuples of n
| skills (usually less than 6) and generate a test example
| using those skills. Apparently only GPT-4 (at the time of the
| paper) was able to compose 5 skills, the other models just 3
| or 2. Beyond 5 skills even GPT-4 was doing much worse.
|
| The interesting finding of the paper is that GPT-4 couldn't
| have seen all the (topic, skill-tuple) combinations in the
| training set. If you have 10,000 examples on a topic, and use
| 5 out of 100 skills, you would need 100^5 training examples
| to cover all combinations. In conclusion GPT-4 generalizes to
| new skill combinations, thus it is not a stochastic parrot.
|
| https://arxiv.org/abs/2310.17567
| radarsat1 wrote:
| This is true but you have to also consider the autoregressive
| component. In your example, it's 50 steps _per iteration of the
| model_, where the model is executed once for each token in the
| output.
|
| So practically speaking it's a bit more complicated to
| calculate how much the model can "think". Of course once a
| token is output it is committed to that (in the most basic
| scenario), but that doesn't mean it is not still "thinking" as
| it produces subsequent tokens.
|
| > perhaps a notepad
|
| Exactly, the context and previously output tokens can be
| considered such a notepad since they are input for the next
| steps of the model.
| Closi wrote:
| Agreed - also prompt engineering encourages LLMs to do this
| too (i.e. asking the LLM to explain the steps it will take to
| solve a problem, prior to answering - e.g. Zero-Shot CoT
| 'Let's think step by step')
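|
| (A minimal sketch of that zero-shot CoT trick; `ask_llm` is a
| hypothetical completion function, prompt in, text out - the only
| substantive parts are the appended trigger phrase and the second
| pass that extracts a final answer:)
|
|     def zero_shot_cot(question, ask_llm):
|         cot_prompt = f"Q: {question}\nA: Let's think step by step."
|         reasoning = ask_llm(cot_prompt)
|         answer = ask_llm(cot_prompt + " " + reasoning +
|                          "\nTherefore, the answer is")
|         return reasoning, answer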
| anon291 wrote:
| So part of my general issue with this kind of thinking is
| that, if we take this as the main means of creating
| complexity, then shorter prompts are worse for reasoning than
| longer ones, because longer ones automatically give the model
| more 'space', to think. Now, I realize that the research
| community knows this, but I like papers like this that
| explicitly seek ways to enable the model to 'breathe' a bit.
| visarga wrote:
| You are missing an important detail here - number of tokens -
| yes, you have 50 "steps" in network depth, but you could have
| extra tokens. Assuming you don't run out of tape, there is no
| reason for LLMs to be limited to simple operations.
| FeepingCreature wrote:
| Here we go!! I've been waiting years for them to try this. Let's
| see how it does when scaled up to GPT-3/4 level.
|
| This might be the missing piece to AGI.
| parthianshotgun wrote:
| The missing piece is unknowable
| Cthulhu_ wrote:
| We'll likely reconstruct what the missing piece was in
| hindsight, but it's very probable there's no one missing
| piece. Just like human evolution.
| digging wrote:
| Until it's been found, you mean?
| sroussey wrote:
| Maybe even then too!
| arendtio wrote:
| I am not convinced there even is a missing piece. I mean,
| LLMs are being _used_ very differently compared to how
| traditional AI programs were written. Combining both worlds
| might be all that is needed.
|
| I would not be surprised if, when we have general artificial
| intelligences, we see that advancing LLMs wasn't
| necessary.
| 082349872349872 wrote:
| Edsger Dijkstra had a precise English style; even though his
| mother tongue was Dutch, I find he made better use of English
| than many native speakers.
|
| In one of the EWDs, he reminisced that, as children, they were
| taught to never begin to speak a sentence unless they already
| knew how they were going to finish it.
|
| I'd bet these two observations have a causal connection.
| ricardobeat wrote:
| Is that even possible, or just hyperbole? I'd bet the latter. I
| wouldn't be surprised if some people are able to fully unravel
| entire paragraphs of conversation in their head in a couple of
| seconds, but that's not something you could teach to children
| in general.
| mannykannot wrote:
| I don't think it is feasible, at least for conversation, but
| as an aspirational goal for children, along the lines of "put
| your toys away when you've finished playing with them", it is
| not a bad one.
|
| It's not unusual for me to think I know how I am going to end
| a sentence, but then find that I can't get there.
| h34t wrote:
| In Dutch (and German) the verb often goes at the end of a
| sentence, so the advice is rather practical.
| ricardobeat wrote:
| Dat week ik heel goed :( ("I know that all too well")
| ricardobeat wrote:
| *weet, thanks autocarrot
| ted_bunny wrote:
| German children would with you disagree.
| caddy wrote:
| I also wonder if it has anything to do with the process of
| learning a new language in general. I've thought more
| thoroughly about how English works since I've been learning
| French (not that I'm very eloquent in either)
| Cthulhu_ wrote:
| I've observed two things. One, writing is different to
| speaking, because it's async, you can think before you write,
| you can edit, etc.
|
| But second, speaking in a non-native language makes you think
| harder about what you're about to say. Fewer colloquialisms,
| more focus on making sure your meaning is understood, more
| sensitivity in case you might offend someone, perhaps?
|
| It's not new either; a lot of science and whatnot has been done
| in people's non-native languages, like French, German, Latin,
| etc. Another factor there is the lingo of the field; I can't
| simply say "Kubernetes is een open-bron houder
| orkestratiesysteem voor het automatiseren van de inzet,
| schalen, en het beheer van zachte waren" without confusing half
| my native speaking audience.
| torginus wrote:
| I also learned English from textbooks, and one of the strangest
| things I encountered is that native speakers routinely confuse
| "their, there, they're", which I never thought was a mistake I
| could make. It would be like confusing 'wet' and 'vet'. So
| there's definitely a difference between how native and non-native
| speakers use the language.
| qup wrote:
| The people who confuse that mostly have not done very much
| reading. Audibly, those words are identical.
| leobg wrote:
| Even crazier:
|
| "Could of".
|
| Like "You could of said so".
| zoogeny wrote:
| When I was a young man I was taking a language course while I
| was temporarily living in a foreign country. There was an older
| man in the course (not elderly, more like mid-fifties) who was
| very bad at the new language we were both learning. Yet I
| noticed he had, what seemed to me, a magic power: he could
| always make people laugh. He would often whisper something to
| one of our classmates and they would always get a giant smile
| on their face or even laugh out loud.
|
| I was intensely curious and I spent some time wondering how he
| did it. One day, out of the blue, he invited me out to lunch
| after class. We just chatted for most of the lunch, exchanging
| backgrounds and stories. Then his face took on a serious
| expression and he slowly and carefully began to explain
| something to me as if he was passing on some wisdom.
|
| He said that he never spoke a single sentence without fully
| saying the sentence in his mind. He said he would often think
| of the words several times in his mind, revising the phrase
| until he was happy. He would imagine saying the words to the
| person in front of him and he would imagine their reaction. And
| he would continue to revise until he felt confident the person
| who heard the words he would say would react in the way he
| wanted them to react. If he could not imagine the person
| reacting how he wanted them to react, he would not say anything
| at all.
|
| It was clear to me that he was passing along this advice but
| also that he was calling me out a bit. He was letting me know
| that I spoke without thinking. I say what pops into my head. It
| was like he read my mind, honestly; he knew exactly what I was
| curious about and he answered the question I had for him that I
| never asked.
|
| I wish I could say that I learned the lesson. When I have tried
| the technique it has rewarded the effort. But I haven't formed
| it into a habit and I still tend to let my mouth race ahead of
| my mind.
| wara23arish wrote:
| I love reading his EWDs. I had a professor who worked with him
| who mentioned he made his students use pens while taking
| his tests. To make it less likely for the students to make
| mistakes??
| westurner wrote:
| Perhaps to make it easier to determine how to correct
| instruction.
|
| - "Guidelines for keeping a laboratory notebook" (2019)
| https://news.ycombinator.com/item?id=19123430#19126809
| float4 wrote:
| > he made his students work use pens while taking his tests
|
| This is very common in the Netherlands, I think that's why it
| was a rule of his.
|
| In general, the Dutch education system seems to be against
| pencils (at least this was the case until recent; I'm Dutch
| and mid 20s). You're taught to write using a fountain pen,
| not a pencil. In high school, you're allowed to switch to
| ball point but absolutely not to pencil. In university, write
| with pretty much anything you want, but... not with a pencil.
| If you do take your test with a pencil, there's genuinely a
| chance your teacher will give you a 0, although most of the
| time they'll probably be forgiving.
|
| I majored in CS in the Netherlands and every test was done
| with good old pen and paper. Students still make mistakes all
| the time, which is why everyone uses a scrap sheet.
| QuantumG wrote:
| We're done for!
| pawnty wrote:
| This is the missing piece to train AI that has the ability to
| reason. There are so many tasks whose answers are known but whose
| reasoning steps are missing. With this method, we can use less
| annotated data to reach that ability.
|
| The interesting part (I imagine): the generated thoughts could be
| hard for humans to understand while still being way more helpful
| for getting the correct answer! If that happens, we have created
| something more intelligent than ourselves.
| silent_cal wrote:
| Neural networks do not think
| adlpz wrote:
| Do neurons think? Do a bunch of neurons?
|
| Is this semantics?
| empath-nirvana wrote:
| basically this: https://en.wikipedia.org/wiki/Sorites_paradox
|
| One neuron doesn't think. Three neurons don't think. Billions
| of neurons think. Somewhere between one neuron and billions
| of neurons, thinking starts happening. Probably also true for
| neural networks. The main problem is that people throw around
| terms like: "Thought", "Intelligence", "Will", "Reasoning",
| "Knowledge", "Consciousness", etc like they are very well
| defined and well understood terms and they very much are not.
| silent_cal wrote:
| Billions of neurons don't think, people do.
| empath-nirvana wrote:
| ...with what?
| silent_cal wrote:
| With their minds
| adlpz wrote:
| My point precisely. Those are all vague terms. Saying that
| "neural nerworks do not think" is as meaningless as any
| equivalent (or opposite) statement on any other system
| including any number of neurons, a whole brain or a
| _person_.
|
| It's all semantics.
| silent_cal wrote:
| There are no real neurons in a neural network.
| PoignardAzur wrote:
| You're not giving information anybody on this forum doesn't
| already know.
|
| Obviously they don't "speak" either. Both "think" and "speak"
| are used as shorthands here for what the language models
| actually do.
| silent_cal wrote:
| What are you upset with me for? The authors are using the
| misleading language, not me. Take it up with them.
| optimalsolver wrote:
| Could you give a definition of "think" that NNs fail to live up
| to?
| silent_cal wrote:
| Abstracting immaterial concepts from physical reality and
| deliberately using them in analytical or deductive processes
| to discover truths.
| optimalsolver wrote:
| So basically finding ways to compress your observational
| history?
| silent_cal wrote:
| No, it's not "basically" that at all.
| optimalsolver wrote:
| That's pretty much what it is, as you stated it. Finding
| abstractions that let you encode your observational
| history more efficiently than you previously could, or
| "discovering truths", if you want to be all mystical
| about it.
| ogogmad wrote:
| Might be relevant:
| https://www.nature.com/articles/s41586-023-06924-6
| _Mathematical discoveries from program search with large
| language models_
| 4RealFreedom wrote:
| I don't understand the downvotes - you are correct.
| silent_cal wrote:
| I think people just get mad when they're reminded of this
| obvious fact. They want computers to prove that our minds are
| an illusion, the product of a "meat computer".
| stoniejohnson wrote:
| Read some Daniel Dennett!
| silent_cal wrote:
| Are you serious?
| stoniejohnson wrote:
| You're very grumpy; I think you need some food and a nap
| :-)
| viraptor wrote:
| It's fair to talk about thinking in a handwavey "you know
| what I mean" way. This is not a philosophy paper. It's a fine
| point if that's what you want to discuss, but doesn't change
| anything about the issue at hand and is needlessly pedantic.
| It's the "what you're referring to is actually GNU/Linux" of
| AI discussions.
| YetAnotherNick wrote:
| Another RL paper with a terrible baseline. They used zero-shot,
| non-instruction-tuned Mistral for GSM8k, which has a very
| specific output format. They got 11% accuracy after improving
| it, while few-shot prompting achieves 37% [1]. GPT-4 could get
| ~97% with prompting.
|
| [1]:
| https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderb...
| hiddencost wrote:
| Fwiw if they're serious scientists, taking a known method and
| baseline and improving it is good science. Extensions to get
| state of the art are probably possible, but their goal is to
| measure just the impact of their change in a simple setting.
| Let the engineers do the munged system combinations and get
| SoTA.
| YetAnotherNick wrote:
| I am not talking about SoTA. I am talking about a deliberately
| poor baseline. GSM8k consists of two things: solving the
| problem and getting the output format correct. Getting the
| output format correct gives 30% accuracy for the same model
| where they got 11%. SoTA is 97%.
| lionkor wrote:
| This is purely anecdotal, and I try to keep it to myself, but
| it's very difficult when at least half of the HN homepage is AI
| related: LLMs like ChatGPT do so utterly terribly at any non-
| trivial job I throw at them that I seriously consider people who
| use them daily to either be straight up incompetent, or maybe
| their domain is so trivial that the LLM actually does well.
|
| From asking LLMs to solve a highly difficult async C++
| parallelism problem, to German language specifics, they just
| fuck up at a fundamental level. I understand that LLMs cannot
| solve these issues and why, but then I do not understand the
| heavy focus on AI by so many tech people.
|
| Is the day-to-day programming job so trivial that LLMs do it
| well, while at the same time being too difficult for you to do
| yourself? I really, really want to know exactly what the use
| case is.
|
| Do people just throw simple problems at it to validate their own
| preconceived notion of how cool and useful LLMs are? Whats the
| deal?
| rplnt wrote:
| Every other query I've given to ChatGPT came up with an utterly
| wrong answer. Followup always yielded "sorry, I made an obvious
| mistake, here's another wrong answer". Confident and stupid is
| a very bad combination.
| jollyllama wrote:
| There are plenty of jobs where people have to complete various
| tasks that are outside of their domain or otherwise tedious on
| a daily basis. For example, plenty of devs have to set up or
| change the configuration of remote hosts. Some LLMs are pretty
| good at generating devops scripts to speed up this work.
| orzig wrote:
| Exactly. Example: maybe 1% of the code I generate is bash. I
| used to try to memorize patterns, but of the top 20 I'd use
| each less than once per year. Now, instead of that 1% taking
| 5% of my time, it takes 2%. It's all "simple stuff", and I
| can verify it instantly.
|
| I have ~10 similar use cases. So it hasn't revolutionized my
| life, but it's been well worth $20/mo ChatGPT Plus and $3/mo
| API calls.
| williamcotton wrote:
| Boilerplate, test code, and general tedium. Most software just
| needs to handle IO.
|
| The next time you want to use SQL to compute a rolling sum, try
| asking ChatGPT 4 instead of searching through documentation or
| search engine results for windowing functions.
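|
| To give a rough idea of the kind of answer you're after, here's
| a minimal sketch using Python's built-in sqlite3 (the table and
| column names are made up for illustration):
|
|     import sqlite3
|
|     conn = sqlite3.connect(":memory:")
|     conn.execute("CREATE TABLE sales (day INTEGER, amount REAL)")
|     data = [(1, 10.0), (2, 20.0), (3, 30.0), (4, 40.0)]
|     conn.executemany("INSERT INTO sales VALUES (?, ?)", data)
|
|     # Rolling 3-day sum: each row plus the two preceding rows,
|     # ordered by day (needs SQLite 3.25+ for window functions).
|     result = conn.execute("""
|         SELECT day,
|                SUM(amount) OVER (
|                    ORDER BY day
|                    ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
|                ) AS rolling_sum
|         FROM sales
|     """).fetchall()
|     print(result)  # [(1, 10.0), (2, 30.0), (3, 60.0), (4, 90.0)]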
|
| Combine competency at programming with very good technical
| communication skills (and a touch of learning how to not hold
| the tool backwards) and you should find the appeal.
| slices wrote:
| Yes. Just used Cody to get me on the right path with an
| obscure PostgreSQL JSON query; it easily saved me an hour of
| fiddling around.
| luma wrote:
| This is an observation I've seen a lot around here. Underneath
| it is the assumption that "if I can't figure out how to get
| meaningful use out of a tool, the tool must be useless".
|
| OpenAI didn't sign up 100M users without somebody somewhere
| finding it to be useful. Like any other tool, its utility is
| limited mostly by the person wielding it.
| bluGill wrote:
| The tools seem useful, but I'm not sure they are. Too often
| they will confidently make up an answer that is wrong. When I
| use them they do great on trivial problems but can't help on
| hard ones.
| luma wrote:
| Reframe your thinking. You're approaching it like other
| computer systems, where a given input yields a determined
| output. Instead, treat it like a junior dev whom you can
| unload an unlimited amount of work to, but the result still
| requires review.
|
| We're all used to working this way in human systems: people
| who sound confident might also be wrong, and you learn
| where you might trust them more or less as you work with
| them over time. Until you are confident that they are
| always "right" in a given problem domain, you need to apply
| some level of review.
|
| Finally, keep in mind that there are "smarter" and "dumber"
| LLMs. If you didn't pay for what you were doing, you were
| talking to a "dumber" model. The quality does go up if you
| have $20 in your pocket.
| bluGill wrote:
| The junior engineers I know tend to ask questions rather than
| be confidently wrong. That isn't to say they are always right,
| but they make a very different class of errors.
| luma wrote:
| Again, this is a tool you can use. You can complain that
| it doesn't work in the way you expect, or you can learn
| how it operates and how best to use it. If you can't
| figure out how to apply it to your work, that's fine, but
| loads of other people are doing exactly that with or
| without you.
| EForEndeavour wrote:
| > When I use them they do great on trivial problems but
| can't help on hard ones.
|
| That sounds _super_ useful! The tools free you up from
| wasting time on trivial problems so you have more time to
| focus on the hard ones. What's not to love?
| bluGill wrote:
| I try to work on complex problems. Sometimes they hide
| something easy.
| Tadpole9181 wrote:
| They're good autocomplete, they can help search for solutions
| sometimes better than Google (SEO spam), you can use it as a
| rubber duck, and you can make it auto fill trivial stuff that
| would take you a few minutes to write out manually, like test
| scaffolding. I would _never_ use it to actually complete a non-
| trivial task, and I _always_ confirm its answers. And yeah,
| sometimes it sucks - it's a tool with a learning curve about
| knowing its limitations.
|
| The reason there's so much money and time is that even semi-
| competent AI is relatively new and the methods are still
| extremely crude, and yet it's this advanced. This seems like the
| path to an AGI, and if someone were to even approach that
| point, it would radically change the world forever and could
| lead to either really good things or really bad things.
|
| Now, GPT-4 isn't considered the best at specialized tasks. It's
| a master of many, but there are _much_ smaller models that can
| do things like incredibly complex symbolic/geometric math
| proofs, write code, perform translations, etc. better. A lot of
| ideas are on making expert systems using many of those
| specialists combined with a generalist, like the segmentation
| of a brain.
|
| Anyway:
|
| > I seriously consider people who use it daily to either be
| straight up incompetent, or maybe their domain is so trivial
| that the LLM actually does well.
|
| This kind of radical thinking about a significant proportion
| of enthused professionals (in any industry) who aren't
| reporting the same experience as you is a red flag for
| introspection. It's so easy to fall into the "enlightened me"
| trap.
|
| I appreciate you asking for more information!
| nathas wrote:
| I had a similar take until about a week ago. A friend showed me
| his workflow with Copilot and whatever the JetBrains AI
| assistant is called.
|
| Use it as a tool: what if, instead of opening up a new tab,
| searching for the API docs for the library you're trying to
| find a function in, finding the function, re-reading the
| parameter arguments for the 400th time, and then using it, you
| could just highlight a snippet and say "Paginate the results
| from S3 using boto3" and the code would just populate?
|
| You have to have the clarity of thought to know what you're
| doing, but the time it takes to write every line for basic
| stuff you've done 1000x before can be greatly compressed if
| it's inlined with your IDE.
|
| I think this is the move for most LLM tools: integrate it with
| existing tooling. An LLM for Excel for corporate bookkeepers,
| CPAs, etc will be great. A Word/PDF summarizer that's tuned for
| attorneys will also be fantastic. Highlight a paragraph, ask
| for relevant case law, etc.
|
| I thought ~2 years ago the results were... not great. Now I'm
| pretty happy with it.
|
| SecureFrame (helps with compliance regimes like SOC2) recently
| added the ability to generate Terraform templates to
| automatically generate infrastructure that will fix specific
| platform risks for AWS, Azure, GCP, etc.
|
| It definitely needs someone at the helm since it does
| hallucinate, but I have found it to cut down my time on mundane
| tasks or otherwise niche/annoying problems. When was the last
| time you visited 4+ StackOverflow posts to find your answer?
| Copilot, so far, has always hit a pretty close answer very
| quickly.
| samstave wrote:
| Sorry if this is sophomoric, but when you said "you have to
| have clarity of thought" - what jumped to mind was the phrase
| "you have to speak to the code"... I thought it encapsulated
| your clarity of thought quite saliently for me.
| throwup238 wrote:
| You must be one with the code. You must _be the code_.
| dkjaudyeqooe wrote:
| I don't know exactly how you use it, but this isn't my
| experience at all. If you ask an LLM anything too specific,
| that isn't obvious or a common issue/discussion (something
| that I almost never need to do), it just makes up nonsense to
| fill the space.
|
| Equally, if you ask it general questions it misses
| information and is almost always incomplete, leaving out
| slightly more obscure elements. Again, I need comprehensive
| answers; I can come up with incomplete ones myself.
|
| What's really obvious to me when I use it is that it's an LLM
| trained on pre-existing text; that really comes through in
| the character of its answers and its errors.
|
| I'm very glad others find them useful and productive, but
| for me they're disappointing given how I want to use them.
| orzig wrote:
| That's fair, it might not be for you. In 'old school ML',
| for a binary classifier, there's the concept of Precision
| (% of Predicted Positive that's ACTUALLY Positive) and
| Recall (% of ACTUALLY Positive that's Predicted to be
| Positive).
|
| It sounds like you want perfect Precision (no errors on
| specific Qs) and perfect Recall (comprehensive on general
| Qs). You're right that no model of any type has ever
| achieved that on any large real-world data, so if that's
| truly the threshold for useful in your use cases, they
| won't make sense.
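|
| As a toy sketch of those two definitions (my own illustration,
| for binary labels where 1 is the positive class):
|
|     def precision_recall(y_true, y_pred):
|         pairs = list(zip(y_true, y_pred))
|         # True positives, false positives, false negatives.
|         tp = sum(1 for t, p in pairs if t == 1 and p == 1)
|         fp = sum(1 for t, p in pairs if t == 0 and p == 1)
|         fn = sum(1 for t, p in pairs if t == 1 and p == 0)
|         precision = tp / (tp + fp) if tp + fp else 0.0
|         recall = tp / (tp + fn) if tp + fn else 0.0
|         return precision, recall
|
|     # Two of three predicted positives are real, and two of
|     # three real positives were caught: roughly (0.67, 0.67).
|     print(precision_recall([1, 0, 1, 1], [1, 1, 0, 1]))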
| dkjaudyeqooe wrote:
| I just want something useful. I'm not talking perfection,
| I'm talking about answers which are not fit for purpose.
| 80% of the time the answers are just not useful.
|
| How are you supposed to use LLMs if the answers they give
| are not salvageable with less work than answering the
| question yourself using search?
|
| Again, for some people it might be fine, but for technical
| work LLMs don't seem to cut it.
| orzig wrote:
| I also had to build intuition for when it will be appropriate
| versus not. It's hard to describe but one very positive
| signal is certainly "will any hallucination be caught in
| <30s"? Even in ChatGPT Plus you can have it write its own
| unit tests and run them in the original prompt (even in the
| profile's Custom Instructions so you don't have to type it
| all the time).
|
| So a mistake was using it for something where runtime
| performance on dozens of quirky data files was critical; that
| nearly set my CPU on fire. But str->str data cleanup, a chain
| of simple API calls, or a one-off data visualization? _chef
| kiss_
| jmull wrote:
| > to write every line for basic stuff you've done 1000x
| before
|
| There are ways to avoid writing basic stuff you've done 1000x
| before that are better than LLMs though...
|
| Put it in a well-thought-out function or package or other
| form of shared/reusable code. You can validate it, spend the
| time to make sure it covers your edge cases, optimize it,
| test it, etc. so that when you go to reuse it you can have
| confidence it will reliably do what you need it to do. LLM-
| generated code doesn't have that.
|
| (When you think about how LLMs are trained and work, you
| realize they are actually just another form of code reuse,
| but one where there are various transformations to the
| original code that may or may not be correct.)
|
| Where LLMs shine for coding is in code-completion. You get
| the LLM output in little chunks that you can immediately
| review correctly and completely, in the moment: "yeah that's
| what I want" or "no, that's no good" or "ok, I can work with
| that". Not surprising, since predicting completion is what
| LLMs actually do.
| kthartic wrote:
| Some questions we've thrown at GPT-4 recently (real use cases):
|
| > how does torchmetrics IOU work? Does it match gt with
| detection boxes? or does it do pairwise IOU and average?
|
| > What predictions has Ray Kurzweil made that he got correct
| and incorrect? Please produce a table
|
| > can you give me a stack implementation with min function in
| O(1) time
|
| > (A question about how we should solve a UX problem specific
| to our app)
|
| > What is the best way to return raw image data via a REST
| endpoint?
|
| > How is Return on Capital Employed (ROCE) calculated?
|
| > Following the email exchange below, write a cross intro email
| to introduce (X person) to (Y person)
|
| > How do I run this code on TPU in Collab?
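|
| For reference, the kind of answer you'd hope to get back for
| the stack-with-min question is roughly this (a sketch that
| keeps a parallel stack of running minimums):
|
|     class MinStack:
|         def __init__(self):
|             self._items = []
|             self._mins = []  # running min alongside each item
|
|         def push(self, x):
|             self._items.append(x)
|             m = x if not self._mins else min(x, self._mins[-1])
|             self._mins.append(m)
|
|         def pop(self):
|             self._mins.pop()
|             return self._items.pop()
|
|         def min(self):
|             return self._mins[-1]  # O(1)
|
|     s = MinStack()
|     for v in (5, 2, 7):
|         s.push(v)
|     print(s.min())  # 2
|     s.pop()
|     s.pop()
|     print(s.min())  # 5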
| samstave wrote:
| RE: Ray Kurzweil
|
| Did you see him on JRE last week:
|
| https://www.youtube.com/watch?v=w4vrOUau2iY
|
| (or was that why you asked)
| jrmg wrote:
| Did it correctly answer all of these?
| keiferski wrote:
| You should treat LLMs the same way you treat any other smart
| entity, human or otherwise: realize that they can be both
| immensely _useful_ and fundamentally _wrong_ at the same time.
| Intelligence is not equivalent to correctness.
| zmgsabst wrote:
| Three examples:
|
| 1. having ChatGPT generate boilerplate, because I'm lazy;
|
| 2. having ChatGPT attempt something I don't know as a starting
| point, eg JavaScript; or,
|
| 3. having ChatGPT give a reference rather than Google myself,
| eg of a config option.
|
| ChatGPT makes 1 less tedious, 3 less a game of "what magic
| phrase finds the right SO post?", and means I do 2 at all, eg
| trying out JS features on my blog.
|
| I think it does alright at composition if you break down the
| task sufficiently, but it struggles with higher order structure
| -- particularly if you're using multiple responses.
|
| That said, I suspect we need a theory shift to get AI to
| comprehend higher order structure in composition.
| empath-nirvana wrote:
| It's pretty amazing at generating rust structs from yaml
| examples, and also at writing generic versions of rust
| functions.
|
| Neither of those tasks are especially _difficult_, but they
| are _annoying_.
| slifin wrote:
| Not everything in tech is difficult
|
| I find LLMs great for creating SQL queries and regexes
| readyman wrote:
| Profit. The question at hand is whether LLMs can produce
| profit, which is an extremely different question than the
| questions you're asking.
| leothecool wrote:
| I train my LLM to barf up my domain specific boilerplate code.
| I don't ask it to solve business problems.
| BenFranklin100 wrote:
| I signed up for OpenAI's monthly subscription. Its performance
| on non-trivial tasks is abysmal. It's a regurgitation machine.
| One might mischievously argue the average tech worker isn't
| much better than an LLM, thus the interest? On a related note,
| we are deluged daily with firms offering AI services. I see a
| bubble.
| Havoc wrote:
| For me it's more like brainstorming.
|
| Even if half of it is garbage it's a net win. At least in
| domains where I can distinguish the two.
|
| There are also cases where the cost of failure is very low. E.g.
| I could spend half an hour reading an API spec, or I could make
| an AI give me a curl command and test it out in 30 seconds. If
| it works, great; if not, oh well, time to read the spec.
| dmos62 wrote:
| Why do you presume that people commonly use it for non-trivial
| things? It excels at trivial things. That's what most people
| use it for, probably. Like google search. Is there something
| that leads you to think otherwise?
| xanderlewis wrote:
| Perhaps the incessant talk of GPT-x being AGI, whatever that
| means.
| samstave wrote:
| I think it's safe to remind oneself that this thing is literally
| a zygote. So patience, and in ~5 years, it will be a different
| story.
|
| @xanderlewis
|
| Doesn't that mean it simply is now consuming the internet in
| real-time?
| xanderlewis wrote:
| Why? It's already eaten all of the publicly available data on
| the web.
| reportgunner wrote:
| First sentence is 100% my sentiment, cheers !
| visarga wrote:
| Your view on LLM usage is too narrow. Yes, they are pretty shit
| for me too in solving coding problems, but they are still
| useful for bespoke information extraction, classification and
| creative applications. The interest is justified, we're just
| having a hard time understanding the limitations.
| atoav wrote:
| Technology is complex and hard to make sense of. That is why
| most non-experts have a strong wish for a kind of mythical
| technology, which you can just pour onto your problem and it
| magically knows what you wanted (and which things you did not
| want).
|
| For a certain class of problems LLMs achieved new, never before
| seen, almost magical results. Now imagine you were someone who
| hates dealing with the constant complexity of solving problems
| with technology, and something comes along that seems to carry
| the promise of lifting that off your shoulders. Then you know
| why people react like they do. Recall the blockchain craze?
| There were people who declared that it somehow magically solved
| any IT-security problem there ever was - instead of seeing it
| as a good solution for a very specific set of circumstances
| that nearly nobody faced in practice.
|
| In reality, of course, LLMs also have limitations, e.g. the
| above-mentioned ambiguity that is inherent to any magical
| technology: to be _true_ magic, the technology would have to be
| able to read the thoughts of those who apply it and somehow
| infer from that the true thing they want or need. Now LLMs are,
| in the end, still just very good guessers based on statistical
| data; that means the guess could just be what you want, but the
| model lacks an actual understanding of what it is doing.
|
| Those applying the technology to things it is actually good at
| (e.g. classification problems) will put it to good use, but
| there will be a lot of people who apply it and have things fall
| apart Air Canada style.
| epanchin wrote:
| I daily drive KDB/Q. This is readily extendable for example in
| C, which was my previous daily, and Python which I use
| sporadically.
|
| I don't use LLMs for C or KDB, I do use them for Python.
|
| ChatGPT is good in Python. I guess since Python programmers rely
| on Stack Exchange there is lots to learn from, and Python
| anyway is largely an exercise in finding the correct library.
|
| If the only thing ChatGPT did was listen to my problem and
| suggest which imports to use/manuals to read, that would be
| good enough to use regularly. If I wasn't after a library/pre-
| existing code, I wouldn't be using Python!
| BoxOfRain wrote:
| I've definitely noticed ChatGPT generally writes better
| Python than it writes Scala, presumably for the same reason
| of there being a fair bit more Python code in the wild.
| mrguyorama wrote:
| The actual reason probably has to do with the fact that LLM
| developers and academics are more familiar with Python than
| other programming languages, and therefore have policed
| its correctness better.
| sebzim4500 wrote:
| Stop using it for things that are in your area of expertise but
| are too difficult for you. Use it for things where you think
| "this is probably easy but I have no idea how to do it". For
| example, I needed to do some pretty trivial task in PowerShell,
| but I have never used it, so I got ChatGPT to do it for me and
| it worked the first time. Obviously I checked that the commands
| looked plausible before I ran them, but it still probably took
| 2 mins to do something that would have otherwise taken 30.
| porkbeer wrote:
| That just means you are ignorant of how wrong it guides you.
| You need to first build trust before taking it to new places.
| You do that with topics and concepts you are familiar with.
| OmarShehata wrote:
| This has always been true of anything anyone has ever
| googled or looked up on stackoverflow
|
| I copy-paste code from StackOverflow all the time. I used
| to agonize over making sure I fully understood every line
| I was copying. Now I have the discretion of making that
| decision: sometimes it does really matter, sometimes all
| you need to know is that it produces the right result for
| your limited use & test case of it. (it's no different than
| relying on a 3rd party library in that way)
|
| I think we need to apply the same discretion to LLM output.
| The answer "it depends". Sometimes using its output blindly
| leads to disaster. Sometimes using it without fully
| understanding all the details is a great way to make
| progress.
| mrguyorama wrote:
| This is no different from my coworker who regularly
| copy/pastes from stackoverflow to do things he doesn't have
| any idea how to do himself, and it's just as awful,
| unproductive, and problem-inducing.
| OmarShehata wrote:
| I want to second this:
|
| > Use it for things where you think "this is probably easy
| but I have no idea how to do it"
|
| I had exactly the same reaction as OP (LLMs suck, what's with
| all the hype). These people are using it differently. For
| me it's often something like, asking it to put together a
| specific sequence of matrix transformations in ThreeJS or
| some other library.
|
| This is not a difficult task but it's often one I waste a lot
| of time getting right. It's sort of about finding the right
| level of abstraction you need to ask it.
| runeofdoom wrote:
| And how often will those "plausible looking commands" create
| obvious or subtle problems that cost far more than 30
| minutes?
| sebzim4500 wrote:
| Probably about as often as if I cobbled something together
| from random blog posts except faster.
|
| It's not like the script is running a nuclear power
| station.
| Al-Khwarizmi wrote:
| Does your job involve solving complex, challenging problems
| _all_ the time?
|
| I am a CS professor, I don't think most people would class that
| as a trivial job, but I find myself needing to do plenty of
| trivial tasks every day: mixed bureaucracy (periodic reports,
| grant requests, various evaluations, etc.), trivial programming
| (a Seaborn chart to show some Excel results), text polishing
| (need to cut a text to 500 words without altering meaning),
| writing student assignments, writing emails in (non-native)
| English for sensitive requests with the right tone, etc... all
| of those are things I have found LLMs to do fairly well and
| save me a lot of time.
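|
| The Seaborn-chart-from-Excel kind of task is literally a few
| lines (a sketch; the file and column names are made up):
|
|     import pandas as pd
|     import seaborn as sns
|     import matplotlib.pyplot as plt
|
|     # Hypothetical sheet with "method" and "accuracy" columns.
|     df = pd.read_excel("results.xlsx")
|
|     sns.barplot(data=df, x="method", y="accuracy")
|     plt.xticks(rotation=45)
|     plt.tight_layout()
|     plt.savefig("results.png")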
|
| I wouldn't use them to do the core job of designing novel
| algorithms, doing experiments, writing the bulk of a paper or
| teaching students. But most of my working hours are not really
| that "core" stuff. And I would assume it's the same for most
| professionals.
|
| If you have an environment where you are _constantly_
| challenged by difficult tasks... wow. I don't know if I should
| envy you (because I love difficult problems and hate mindless
| chores) or if it would be too stressful.
|
| PS: I don't think "being too difficult for you to do it
| yourself" is the right litmus test for LLM usefulness. I _can_
| draw charts with Seaborn, of course. But the LLM does it much
| faster, and I don't think doing it myself would make me grow,
| hone useful skills or anything. I'd rather devote my time to
| something else. So (in my view) it's clearly better to have the
| LLM do it.
| Benjaminsen wrote:
| I'm preparing. Learning how to work with an AI is the only way
| to stay competitive. The AIs will become smarter much faster
| than I will.
| mordymoop wrote:
| When you say ChatGPT, are you referring to GPT4? I find a huge
| and avoidable miscommunication happens when two people both
| think they are using "ChatGPT" but are talking about two
| different models which vary in size by a factor of 10.
|
| Assuming you are talking about GPT4, for the sake of argument,
| the answer is speed. Of course I can write a small parser
| script that deals with some data I received from a client. It
| will take me an hour and be a tedious task far distant from my
| actual expertise. An LLM can do it in 45 seconds, including the
| time it took me to describe the task.
| KLejmooo wrote:
| I don't use it constantly but regularly.
|
| LLMs' English skills are much better than mine.
|
| And when I do a little bit of Go coding once a week (I'm a Java
| developer by trade), I don't have the time to learn Go well
| enough to just type stuff down without looking things up.
| Instead of googling, I tell it "I need a struct with the
| following attributes..." and it doesn't just tell me how I do
| structs in Go, it also creates them for me.
|
| Also: there are a TON of issues where I would write a short
| script to do something (formatting text into a table, searching
| for specific lines, etc.) where a normal person doesn't even
| have those tools at hand.
|
| For companies overall: it's not just what an LLM can do for you,
| it's also a very, very good interface to your application. The
| demos I saw at my company are really good, totally make sense,
| and do reduce the entry barrier for people.
|
| I know a friend whose job is to create reports with SQL. She
| doesn't do anything else, just reports across the whole data
| warehouse. Why? Because a normal non-dev person can't just
| write SQL or automate things.
|
| The gap between tech people and management is huge.
| archibaldJ wrote:
| This looks really interesting; any possibility the researchers
| will release some code soon?
| iAkashPaul wrote:
| Base Mistral 7B is hardly suitable for the evaluations; even one
| team at Intel tried to pull a fast one with NeuralChat in the
| exact same way https://huggingface.co/Intel/neural-
| chat-7b-v3#quantitative-...
| kjqgqkejbfefn wrote:
| This is basically what I tried this morning at the prompt level
| (awful results), but the sketchy idea I had in mind went further
| by introducing control-flow "meta-tokens" to help the LLM
| renavigate its context. In this perspective the context would be
| rethought as a self-editing structured mind-map, with the linear
| aspect of the context at a time T standing for the execution
| trace of the exploration of this mind-map so far. Some of those
| meta-tokens would be able to have side effects on the context:
| highlighting, structuring, summarizing, forgetting, and so on,
| parts of it. This could allow for native structured output
| without using a syntactic format such as JSON, programmatic
| constructs in the style of LMQL, implementing memory, etc. The
| goal: not just to give logical/reasoning abilities to an LLM,
| but to give it the means to come up with its own cognitive
| architecture. Implementing structured output (using a <label
| name="stuff">...</label> token) to also implement
| memory/scratchpads would bring inspectability of those
| cognitive structures for free. Of course I have no idea how to
| implement this (I'm an ML tourist).
| lawlessone wrote:
| If it is doing this, is it still a language model? Or also a
| thought model?
| aaroninsf wrote:
| Observation: "expertise" (hence "reflex") is the learning of the
| nonlinear solution space that can be inferred from initial
| conditions.
|
| Conjecture: models which engage in self-training on the solutions
| they derive will get to something that looks a bit like
| bootstrapping when you squint.
|
| Lemma: there's a nice opportunity for cloud-hosted model SaaS to
| offer discounts for actionable feedback on the quality of their
| output, so as to drive this retraining.
|
| Idle comment: I'd use the language of REM sleep and the idea of
| "memory consolidation" for this.
|
| Most of the premises of model training can be extended to the
| level of reasoned solutions, rather than tokens.
| thesz wrote:
| They do not cite [1], a paper on (learned) variable computation
| in RNN, applied to language modeling, that predates their work by
| almost 8 years.
|
| [1] https://openreview.net/pdf?id=S1LVSrcge
|
| Microsoft also had something similar at that time, but for image
| recognition: a CNN at the input and then variable computation at
| classification.
___________________________________________________________________
(page generated 2024-03-15 23:00 UTC)