[HN Gopher] Extending the context length to 1M tokens
___________________________________________________________________
Extending the context length to 1M tokens
Author : cmcconomy
Score : 71 points
Date : 2024-11-18 16:27 UTC (6 hours ago)
(HTM) web link (qwenlm.github.io)
(TXT) w3m dump (qwenlm.github.io)
| swazzy wrote:
| Note: unexpected Three Body Problem spoilers on this page.
| zargon wrote:
| Those summaries are pretty lousy and also have hallucinations
| in them.
| johndough wrote:
| I agree. Below are a few errors. I also asked ChatGPT to
| check the summaries, and it found all the errors (and even
| made up a few more that weren't actual errors, just points
| not expressed with perfect clarity).
|
| Spoilers ahead!
|
| First novel: The Trisolarans did not contact Earth first. It
| was the other way round.
|
| Second novel: Calling the conflict between humans and
| Trisolarans a "complex strategic game" is a bit of a stretch.
| Also, the "water drops" do not disrupt ecosystems. I am not
| sure whether "face-bearers" is an accurate translation. I've
| only read the English version.
|
| Third novel: Luo Ji does not hold the key to the survival of
| the Trisolarans, and there were no "micro-black holes" racing
| towards Earth. The Trisolarans were also not shown colonizing
| other worlds.
|
| I am also not sure whether Luo Ji faced his "personal
| struggle and psychological turmoil" in this novel or in an
| earlier one. He was certainly sure of his role by the end:
| even the Trisolarans judged his deterrence rate at over
| 92%.
| johndough wrote:
| And this example does not even illustrate long-context
| understanding well, since smaller Qwen2.5 models can already
| recall parts of the Three Body Problem trilogy without pasting
| the three books into the context window.
| gs17 wrote:
| And multiple summaries of each book (in multiple languages)
| are almost definitely in the training set. I'm more confused
| about how it made such inaccurate, poorly structured
| summaries given both that and the original text.
|
| Although, I just tried with normal Qwen 2.5 72B and Coder 32B
| and they only did a little better.
| agildehaus wrote:
| It seems a very difficult problem to produce a response based
| solely on the given text and not on past training. An LLM that
| could do that would seem considerably more advanced than what
| we have today.
|
| Though I would say humans would have difficulty too -- say,
| having read The Three-Body Problem before, then reading a
| slightly modified version (without being aware of the
| modifications), and having to recall specific details.
| botanical76 wrote:
| This problem is poorly defined; what would it mean to
| produce a response JUST based on the text given? Should it
| also forgo all logic skills and intuition gained in
| training because it is not in the text given? Where in the
| N dimensional semantic space do we draw a line (or rather,
| a surface) between general, universal understanding and
| specific knowledge about the subject at hand?
|
| That said, once you have defined what is required, I
| believe you will have solved the problem.
| anon291 wrote:
| Can we all agree that these models far surpass human intelligence
| now? I mean, they process hours' worth of audio in less time than
| it would take a human to even listen. I think the singularity has
| passed and we didn't even notice (which would be expected).
| Spartan-S63 wrote:
| No, I can't agree that these models surpass human intelligence.
| Sure, they're good at probabilistic recall, but they aren't
| reasoning and they aren't synthesizing anything novel.
| anon291 wrote:
| > they aren't synthesizing anything novel.
|
| ChatGPT has synthesized my past three vacations and regularly
| plans my family's meals based on whatever is in my fridge. I
| completely disagree.
| rootusrootus wrote:
| Seems more likely that your vacations and fridge contents
| aren't as novel as you hope.
| anon291 wrote:
| This is a low-effort comment. I cook a lot for my family
| and community, and things get boring after a while. Since
| using ChatGPT, my wife has really enjoyed the new dishes,
| and I've gotten excellent feedback at potlucks. Yes, the
| base idea of a dish (roast, rice dish, noodles, etc.) is
| old, but the things it'll put inside, and the right cooking
| instructions it gives for them, are new. And that's what
| creativity is, right? I have also asked it for avant-garde
| cuisine ideas, and it has good ones, but I lack the skills
| to make those dishes.
| rootusrootus wrote:
| > This is a low-effort comment
|
| Not any worse than this sentence. Counter it with a
| higher value comment.
|
| You are a single person and LLMs have been trained on the
| output of billions. Any given choice you make can be
| predicted with extraordinary probability by looking at
| your inputs and environment and guessing that you will do
| what most other people do in that situation.
|
| This is pretty basic stuff, yes? Especially on HN? Great
| ideas are a dime a dozen, and every successful startup
| was built on an idea that certainly wasn't novel, but was
| executed well.
| anon291 wrote:
| My higher value comment was a list of things for which
| ChatGPT, a widely available product, will produce novel
| ideas. Responding that those ideas are not novel enough
| based on absolutely no data is a low-effort comment. What
| evidence of creativity would you accept?
| lostmsu wrote:
| > they aren't synthesizing anything novel.
|
| They are. Like millions of monkeys, but drastically better.
| elashri wrote:
| Processing speed is not the metric for measuring intelligence,
| in the same way that there are people of above-average
| intelligence who take longer to think about things and come up
| with better ideas. One can argue that speed is useful in some
| respects, but humans have a spectrum of different types of
| intelligence that an LLM will lack. Also, are you comparing
| against the average person, or against people at the top of
| their fields, or people working in science?
|
| Also, humans can reason; LLMs currently can't do this in a
| useful way and are very limited by their context in all the
| attempts to make them do so. Not to mention that their ability
| to make genuinely new things (as opposed to made-up nonsense)
| is very limited.
| anon291 wrote:
| You've hit on the idea that intelligence is not quantifiable
| by one metric. I completely agree. But you're holding AI to a
| much different standard than average people. Modern LLMs
| are able to produce insights much faster and more accurately
| than most people (do you think you could pass the retrieval
| tasks the way the LLMs do, reading the whole text? I really
| encourage people to try). By that metric (insights/speed), I
| think they far surpass even the most brilliant. You can claim
| that that's not intelligence until the cows come home, but any
| person able to do that would be considered a savant.
| elashri wrote:
| I would argue the opposite, actually. In the same way, we
| don't call someone who can do arithmetic calculations very
| fast a genius if they can't think in a more useful
| mathematical way and construct novel ideas. The same thing
| is happening here: these tools are useful for retrieving and
| processing existing information at high speed, but
| intelligence is not about the ability to process some data
| at high speed and then recall it. That is what we actually
| call a savant. The ability to build on top of this knowledge
| retrieval and use reason to create new ideas is a closer
| definition of intelligence, and would be a better goal.
| anon291 wrote:
| Let's step back.
|
| 1. The vast majority of people never come up with a truly
| new idea. Those who do are considered exceptional and
| their names go down in history books.
|
| 2. Most 'new ideas' are rehashes of old ones.
|
| 3. If you set the temperature up on an LLM, it will
| absolutely come up with new ideas. Expecting an LLM to
| make a scientific discovery a la Einstein is... a bit
| much, don't you think [1]? When it comes to 'everyday'
| creativity, such as short poems, songs, recipes, vacation
| itineraries, etc., ChatGPT is more capable than the vast
| majority of people. Literally, ask ChatGPT to write you a
| song about _____, and it will come up with something
| creative. Ask it for a recipe with ridiculous ingredients
| and see what it does. It'll make things you've never seen
| before, generate an image for you, and even come up with a
| neologism if you ask it to. It's insanely creative.
|
| [1] Although I have walked ChatGPT through various
| theoretical physics scenarios, and it will create new math
| for you.
| vlovich123 wrote:
| > The vast majority of people never come up with a truly
| new idea. Those who do are considered exceptional and
| their names go down in history books.
|
| Depends on your definition of "truly" new, since any idea
| could be argued to be a mix of all past ideas. But I see
| truly new ideas all the time that don't go down in the
| history books, because most new ideas build incrementally
| on what came before or are extremely niche. Only a very few
| turn out to be a massive turning point with broad impact,
| and that is usually only evident in retrospect (e.g. blue
| LEDs were basically trial and error and almost an approach
| that was given up on; transistors were believed to be
| impactful but not the huge revolution for computing they
| turned out to be; etc.).
| anon291 wrote:
| > Depends on your definition of "truly" new, since any
| idea could be argued to be a mix of all past ideas.
|
| My personal feeling when I engage in these conversations
| is that we humans have a cognitive bias to ascribe a
| human's remixing of an old idea to intelligence, but an AI
| model's remixing of an old idea to lookup.
|
| Indeed, basically every revolutionary idea is a mix of
| past ideas if you look closely enough. AI is a great
| example. To the 'lay person' AI is novel! It's new. It
| can talk to you! It's amazing. But for people who've been
| in this field for a while, it's an incremental
| improvement over linear algebra, topology, functional
| spaces, etc.
| ehhehehh wrote:
| It is not about novelty so much as it is about reasoning
| from first principles and learning new things.
|
| I don't need to finetune on five hundred pictures of
| rabbits to know one. I need one look and then I'll know
| for life and can use this in unimaginable and endless
| variety.
|
| This is a simplistic example which you can naturally pick
| apart but when you do I'll provide another such example.
| My point is, learning at human (or even animal) speeds is
| definitely not solved and I'd say we are not even
| attempting that kind of learning yet. There is "in
| context learning" and "finetuning", and neither is going
| to result in human-level intelligence, judging from
| anything I've had access to.
|
| I think you are anthropomorphizing the clever text
| randomization process. There is a bunch of information
| being garbled and returned in a semi-legible fashion and
| you imbue the process behind it with intelligence that I
| don't think it has. All these models stumble over simple
| reasoning unless specifically trained for those specific
| types of problems. Planning is one particularly famous
| example.
|
| Time will tell, but I'm not betting on LLMs. I think
| other forms of AI are needed. Ones that understand
| substance, modality, time and space and have working
| memory, not just the illusion of it.
| anon291 wrote:
| > I don't need to finetune on five hundred pictures of
| rabbits to know one. I need one look and then I'll know
| for life and can use this in unimaginable and endless
| variety.
|
| So if you do use in-context learning and give ChatGPT a
| few images of your novel class, then it will usually
| classify them correctly. Finetuning is just so you can
| save on token cost.
|
| Moreover, you don't typically need that many pictures to
| fine tune. The studies show that the models successfully
| extrapolate once they've been 'pre-trained'. This is
| similar to how my toddler insists that a kangaroo is a
| dog. She's not been exposed to enough data to know
| otherwise. Dog is a much more fluid category for her than
| in real life. If you talk with her for a while about it,
| she will eventually figure out kangaroo is kangaroo and
| dog is dog. But if you ask her again next week, she'll go
| back to saying they're dogs. Eventually she'll learn.
|
| > All these models stumble over simple reasoning unless
| specifically trained for those specific types of
| problems. Planning is one particularly famous example.
|
| We have extremely expensive programs called schools and
| universities designed to teach little humans how to plan
| and execute. If you look at cultures without
| American/Western biases (and there's not very many left,
| so we really have to look to history), we see that the
| idea of planning the way we do it is not universal.
| vlovich123 wrote:
| LLMs are probably better than you at tasks you're not good
| at. There's a huge gulf between a domain expert and an LLM
| though. If there weren't, all the humans in companies would
| be fired right now and replaced. Similarly, OpenAI and
| Anthropic are paying engineers a metric fuckton of money to
| work there. If LLMs were that big of a game changer right
| now, they wouldn't be paying that much. Or, if you make the
| argument that only the best humans are getting hired:
| they're still hiring interns & junior engineers. If LLMs
| were that capable, those roles would be being replaced, and
| they're not.
|
| You're basically ignoring all the experts saying "LLMs suck
| at all these things that even beginning domain experts
| don't suck at" to generate your claim & then ignoring all
| evidence to the contrary.
|
| And you're ignoring the ways in which LLMs fall on their
| face to be creative that aren't language-based. Creative
| problem solving in ways they haven't been trained on is out
| of their domain while fully squarely in the domain of human
| intelligence.
|
| > You can claim that that's not intelligence until the cows
| come home, but any person able to do that would be
| considered a savant
|
| Computers can do arithmetic really quickly, but that's not
| intelligence; yet a person computing that quickly is
| considered a savant. You've built up an erroneous dichotomy
| in your head.
| anon291 wrote:
| But that's exactly it, right? Some people are excellent
| because they're the expert in one field, and some people
| are excellent because they're extremely competent at many
| fields. LLMs are the latter.
|
| Sure, for any domain expert, you can easily get an LLM to
| trip on something. But just the sheer number of things it
| is above average at puts it easily into the top echelon
| of humans.
|
| > You're basically ignoring all the experts saying "LLMs
| suck at all these things that even beginning domain
| experts don't suck at" to generate your claim & then
| ignoring all evidence to the contrary.
|
| Domain expertise is not the only form of intelligence.
| The most interesting things often lie at the
| intersections of domains, as I said in another comment.
| There are a variety of ways to judge intelligence, and no
| one quantifiable metric. It's like asking if Einstein is
| better than Mozart. I don't know... their fields are so
| different. However, I think it's pretty safe to say that
| the modern slate of LLMs falls into the top 10% of human
| intelligence, simply for their breadth of knowledge and
| ability to synthesize ideas at the cross-section of any
| wide number of fields.
| vlovich123 wrote:
| > some people are excellent because they're extremely
| competent at many fields. LLMs are the latter
|
| But they're not. The people who are extremely competent
| at many fields will _still_ outperform LLMs in those
| fields. The LLM can basically only outperform a complete
| beginner in the area, and makes up for that weakness by
| scaling up the amount it can output, which a human can't
| match. That doesn't take away from the fact that the
| output is complete garbage when given anything it doesn't
| know the answer to. As I noted elsewhere, ask it to
| provide an implementation of the S3 ListObjects operation
| (like the actual backend) and see what BS it tries to
| output to the point where you have to spend a good amount
| of time to convince it just to not output an example of
| using the S3 ListObjects API.
|
| > I think it's pretty safe to say that the modern slate
| of LLMs falls into the top 10% of human intelligence,
| simply for their breadth of knowledge and ability to
| synthesize ideas at the cross-section of any wide number
| of fields.
|
| Again, that's evidence asserted but not submitted. Please
| provide an indication of any truly novel ideas being
| synthesized by LLMs that are a cross-section of fields.
| anon291 wrote:
| > Please provide an indication of any truly novel ideas
| being synthesized by LLMs that are a cross-section of
| fields.
|
| The problem here is that you expect something akin to
| relativity, the Poincaré conjecture, et al. The _vast
| majority_ of humans are not able to do this.
|
| If you restrict yourself to the sorts of creativity that
| average people are good at, the models do extremely well.
|
| I'm not sure how to convince you of this. Ideally, I'd
| get a few people of above average intelligence together,
| and give them an hour (?) to work on some problem /
| creative endeavor (we'd have to restrict their tool use
| to the equivalent of whatever we allow GPT to have), and
| then we can compare the results.
|
| EDIT: Here's what ChatGPT thinks we should do:
| https://chatgpt.com/share/673b90ca-8dd4-8010-a1a0-61af699a44...
| vlovich123 wrote:
| But why is comparing against untrained humans the
| benchmark? ChatGPT has literally been trained on so much
| more data than a human would ever see & use so much more
| energy. Let's compare like against like. Benchmarks like
| FrontierMath are important and one extreme - passing it
| would indicate that either the questions are part of the
| training set or genuine creativity and skill has been
| developed for the AI system. The important thing is that
| people keep growing - they can go from student to expert.
| AI systems do not have that growth capability which
| indicates a very important thing is missing from their
| intelligence capability.
|
| I want to be clear - I'm talking about the intelligence
| of AI systems available today and today only. There's
| lots of reason to be enthusiastic about the future but
| similarly very cautious about understanding what is
| available today & what is available today isn't human-
| like.
| anon291 wrote:
| > ChatGPT has literally been trained on so much more data
| than a human would ever see
|
| This is a common fallacy. The average human ingests a few
| dozen GB of data a day [1] [2].
|
| ChatGPT 4 was trained on 13 trillion tokens. Say a token
| is 4 bytes (it's more like 3, but we're being
| conservative). That's 52 trillion bytes or 52 terabytes.
|
| Say the average human only consumes the lower estimate of
| 30 GB a day. That means it would take a human about 1,733
| days to consume the number of tokens ChatGPT was trained
| on, or roughly 4.75 years. Assuming humans and the LLM
| start from the same spot [3], the proper question is: is
| ChatGPT smarter than a 4.75-year-old? If we use the higher
| estimate, then we have to ask if ChatGPT is smarter than
| a 2-year-old. Does ChatGPT hallucinate more or less than
| the average toddler?
|
| The cognitive bias I've seen everywhere is the idea that
| humans are trained on a small amount of data. Nothing is
| further from the truth. Humans require training on an
| insanely large amount of data. A 40-year-old human has
| been trained on orders of magnitude more data than I
| think we even have available as data sets. If you prevent
| a human from being trained on this amount of data through
| sensory deprivation they go crazy (and hallucinate very
| vividly too!).
|
| No argument about energy, but this is a technology
| problem.
|
| [1] https://www.tech21century.com/the-human-brain-is-loaded-dail...
|
| [2] https://kids.frontiersin.org/articles/10.3389/frym.2017.0002...
|
| [3] this is a bad assumption since LLMs are randomly
| initialized whereas humans seem to be born with some
| biases that significantly aid in the acquisition of
| language and social skills
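|
| A quick back-of-the-envelope check of that arithmetic (a
| sketch; the 13T-token and 30 GB/day figures are the rough
| estimates from above, not measurements):
|
| tokens = 13e12            # reported GPT-4 training tokens
| bytes_per_token = 4       # conservative; ~3 is more typical
| train_bytes = tokens * bytes_per_token       # ~52 TB
| human_bytes_per_day = 30e9                   # low estimate
| days = train_bytes / human_bytes_per_day
| print(round(days), round(days / 365, 2))     # 1733 4.75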
| giantrobot wrote:
| My old TI-86 can calculate stuff faster than me. You wouldn't
| ever ask if it was smarter than me. An audio filter can process
| audio faster than I can listen to it but you'd never suggest it
| was intelligent.
|
| AI models are algorithms running on processors running at
| billions of calculations a second often scaled to hundreds of
| such processors. They're not intelligent. They're fast.
| anon291 wrote:
| Except the LLM can solve a general problem (or tell you why
| it cannot), while your calculator can only do what it has
| been programmed to do.
| th0ma5 wrote:
| Do you have any evidence besides anecdote?
| anon291 wrote:
| What kind of evidence substantiates creativity?
|
| Things I've used chat gpt for:
|
| 1. writing songs (couldn't find the generated lyrics
| online, so assume it's new)
|
| 2. Branding ideas (again couldn't find the logos online,
| so assuming they're new)
|
| 3. Recipes (with weird ingredients that I've not found
| put together online)
|
| 4. Vacations with lots of constraints (again, all the
| information is obviously available online, but it put it
| together for me and gave recommendations for my family
| particularly).
|
| 5. Theoretical physics explorations where I'm too lazy to
| write out the math (and why should I... chatgpt will do
| it for me...)
|
| I think perhaps one reason people here do not have the
| same results is that I typically use the API directly and
| modify the system prompt, which drastically changes the
| utility of ChatGPT. The default prompt is too focused on
| retrieval and 'truth'. If you want creativity, you have to
| ask it to be an artist.
| th0ma5 wrote:
| No, I think they don't have the results you do because
| they are trying to do those things _well_...
| anon291 wrote:
| The personal insult insinuated here is not appreciated
| and probably against community guidelines.
|
| For what I needed, those things worked very well.
| th0ma5 wrote:
| Anecdotes have equal weight. All of these models
| frustrate me to no end but I only do things that have
| never been done before. And it isn't an insult because
| you have no evidence of quality.
| anon291 wrote:
| > Anecdotes have equal weight. All of these models
| frustrate me to no end but I only do things that have
| never been done before. And it isn't an insult because
| you have no evidence of quality.
|
| You have not specified what evidence would satisfy you.
|
| And yes, it was an insult to insinuate I would accept
| subpar results whereas others would not.
|
| EDIT: ChatGPT seems to have a solid understanding of why
| your comment comes across as insulting:
| https://chatgpt.com/share/673b95c9-7a98-8010-9f8a-9abf5374bb...
|
| Maybe this should be taken as one point of evidence of
| greater ability?
| th0ma5 wrote:
| I think you led the result by not providing enough
| context, like saying that there is no objective way to
| measure the quality of an LLM generation, either before or
| after the fact.
|
| Edit: I asked ChatGPT with a more proper context: "It's
| not inherently insulting to say that an LLM (Large
| Language Model) cannot guarantee the best quality because
| it's a factual statement grounded in the nature of how
| these models work. LLMs rely on patterns in their
| training data and probabilistic reasoning rather than
| subjective or objective judgments about "best quality."
| anon291 wrote:
| I can't criticize how you prompted it because you did not
| link the transcript :)
|
| Zooming out, you seem to be in the wrong conversation. I
| said:
|
| > the LLM can solve a general problem (or tell you why it
| cannot), while your calculator can only do what it has been
| programmed to do.
|
| You said:
|
| > Do you have any evidence besides anecdote?
|
| I think that -- with both of us now having used ChatGPT
| to generate a response -- we have good evidence that the
| model can solve a general problem (or tell you why it
| cannot), while a calculator can only do the arithmetic
| for which it's been programmed. If you want to counter,
| then a video of your calculator answering the question we
| just posed would be nice.
| vlovich123 wrote:
| Go ask your favorite LLM to write you some code to
| implement the backend of the S3 API and see how well it
| does. Heck, just ask it to implement list iteration against
| some KV object store API and be amazed at the complete
| garbage that gets emitted.
| anon291 wrote:
| So I told it what I wanted, and it generated an initial
| solution and then modified it to do some file
| distribution. Without the ability to actually execute the
| code, this is an excellent first pass.
|
| https://chatgpt.com/share/673b8c33-2ec8-8010-9f70-b0ed12a524...
|
| ChatGPT can't directly execute code on my machine due to
| architectural limitations, but I imagine if I went and
| followed its instructions and told it what went wrong, it
| would correct it.
|
| And that's just it, right? If I were to program this, I
| would be iterating. ChatGPT cannot do that because of how
| it's architected (I don't think it would be hard to do
| this if you used the API and allowed some kind of tool
| use). However, if I told someone to go write me an S3
| backend without ever executing it, and they came back
| with this... that would be great.
|
| EDIT: with chunking:
| https://chatgpt.com/share/673b8c33-2ec8-8010-9f70-b0ed12a524...
|
| IIRC, from another thread on this site, this is
| essentially how S3 is implemented (centralized metadata
| database that hashes out to nodes which implement a local
| storage mechanism -- MySQL I think).
| vlovich123 wrote:
| And that's why it's dangerous to evaluate something when
| you don't understand what's going on. The implementation
| generated not only saves things directly to disk [1] [2]
| but it doesn't even implement file uploading correctly
| nor does it implement listing of objects (which I
| guarantee you would be incorrect). Additionally, it makes
| a key mistake, which is that an upload isn't a form but
| the body of the request, so it's already unable to have a
| real S3 client connect. But of course, at first glance it
| has the appearance of maybe being something passable.
|
| Source: I had to implement R2 from scratch and nothing
| generated here would have helped me as even a starting
| point. And this isn't even getting to complex things like
| supporting arbitrarily large uploads and encrypting
| things while also supporting seeked downloads or
| multipart uploads.
|
| [1] No one would ever do this for all sorts of problems
| including that you'd have all sorts of security problems
| with attackers sending you /../ to escape bucket and
| account isolation.
|
| [2] No one would ever do this because you've got nothing
| more than a toy S3 server. A real S3 implementation needs
| to distribute the data to multiple locations so that
| availability is maintained in the face of isolated
| hardware and software failures.
| anon291 wrote:
| > I had to implement R2 from scratch and nothing
| generated here would have helped me as even a starting
| point.
|
| Of course it wouldn't. You're a computer programmer.
| There's no point for you to use ChatGPT to do what you
| already know how to do.
|
| > The implementation generated not only saves things
| directly to disk
|
| There is nothing 'incorrect' about that, given my initial
| problem statement.
|
| > Additionally, it makes a key mistake which is that
| uploading isn't a form but is the body of the request so
| it's already unable to have a real S3 client connect.
|
| Again.. look at the prompt. I asked it to generate an
| object storage system, not an S3-compatible one.
|
| It seems you're the one hallucinating.
|
| EDIT: ChatGPT says: In short, the feedback likely stems
| from the implicit expectation of S3 API standards, and
| the discrepancy between that and the multipart form
| approach used in the code.
|
| and
|
| In summary, the expectation of S3 compatibility was a
| bias, and he should have recognized that the
| implementation was based on our explicitly discussed
| requirements, not the implicit ones he might have
| expected.
| vlovich123 wrote:
| > There's no point for you to use ChatGPT to do what you
| already know how to do.
|
| If it were more intelligent of course there would be. It
| would catch mistakes I wouldn't have thought about, it
| would output the work more quickly, etc. It's literally
| worse than if I'd assigned a junior engineer to do some
| of the legwork.
|
| > ChatGPT says: In short, the feedback likely stems from
| the implicit expectation of S3 API standards, and the
| discrepancy between that and the multipart form approach
| used in the code.
|
| > In summary, the expectation of S3 compatibility was a
| bias, and he should have recognized that the
| implementation was based on our explicitly discussed
| requirements, not the implicit ones he might have
| expected.
|
| Now who's rationalizing? I was pretty clear in saying
| implement S3.
| anon291 wrote:
| > Now who's rationalizing? I was pretty clear in saying
| implement S3.
|
| In general, I don't deny the fact that humans fall into
| common pitfalls, such as not reading the question. As I
| pointed out this is a common human failing, a
| 'hallucination' if you will. Nevertheless, my failing to
| deliver that to chatgpt should not count against chatgpt,
| but rather me, a humble human who recognizes my failings.
| And again, this furthers my point that people hallucinate
| regularly, we just have a social way to get around it --
| what we're doing right now... discussion!
| vlovich123 wrote:
| My reply was purely around ChatGPT's response which I
| characterized as a rationalization. It clearly was
| following the S3 template, since it copied many parts of
| the API, but then failed to call out where it was deviating
| and why it made those decisions.
| achierius wrote:
| In the same sense (though to greater extent) that calculators
| are, sure. Calculators can also far exceed human capacity to,
| well, calculate. LLMs are similar: spikes of capacity in
| various areas (bulk summarization, translation, general recall,
| ...) that humans could never hope to match, but not capable of
| beating humans at a more general range of tasks.
| anon291 wrote:
| > humans could never hope to match, but not capable of
| beating humans at a more general range of tasks.
|
| If we restrict ourselves only to language (LLMs are at a
| disadvantage because there is no common physical body we can
| train them on at the present moment... that will change), I
| think LLMs beat humans for most tasks.
| Workaccount2 wrote:
| They process the audio but they stumble enough with recall that
| you cannot really trust it.
|
| I had a problem where I used GPT-4o to help me with inventory
| management, something a 5th grade kid could handle, and it kept
| screwing up values for a list of ~50 components. I ended up
| spending more time trying to get it to properly parse the input
| audio (I read off the counts as I moved through inventory bins)
| than if I had just done it manually.
|
| On the other hand, I have had good success with having it write
| simple programs and apps. So YMMV quite a lot more than with a
| regular person.
| anon291 wrote:
| > They process the audio but they stumble enough with recall
| that you cannot really trust it.
|
| I will wave my arms wildly at the last eight years if the
| claim is that humans do not struggle with recall.
| th0ma5 wrote:
| So are they human-like, and therefore not anything special,
| or are they superhuman magic? I never get the equivocation:
| when people complain that there is no way to objectively
| tell which output is right or wrong, people either say they
| are getting better, or that they work for me, or that
| people are just as bad. No they aren't! Not in the same way
| these things are bad.
| anon291 wrote:
| Most people will confidently recount whatever narrative
| matches their current actions. This is called
| rationalization, and most people engage in it daily.
| vlovich123 wrote:
| I will wave my arms wildly if the claim is that LLM
| struggle with recall is similar to human-like struggle with
| recall. And since that's how we decide on truth, I win?
| anon291 wrote:
| What we call hallucination in LLMs is called
| 'rationalization' for humans. The psychology shows that
| most people do things out of habit and only after
| they've done it will they explain why they did it. This is
| most obviously seen in split-brain patients whose visual
| fields have been separated. If you throw a ball towards
| the left side of the person, the right brain will catch
| the ball. If you then ask the person why they caught the
| ball, the left brain will make up a completely ridiculous
| narrative as to why the hand moved (because it didn't
| know there was a ball). This is a contrived example, but it
| shows that human recollection of intent is often very,
| very wrong. There are studies that show this even in
| people with whole brains.
| vlovich123 wrote:
| You're unfortunately completely missing the point. I
| didn't say that human recall is perfect or that they
| don't rationalize. And of course you can have extreme
| denial of what's happening in front of you even in
| healthy individuals. In fact, you see this in this thread,
| where either you or the huge number of people trying to
| disillusion you from the maximal position you've staked
| out on LLMs is wrong, and one of us is incorrectly
| rationalizing our position.
|
| The point is that the ways in which humans fail are
| completely different from LLMs, and they differ between
| people, whereas the failure modes for LLMs are all fairly
| identical regardless of the model. Go ask an LLM to draw
| you a wine glass filled to the brim: it'll keep insisting
| it did, even though it keeps drawing one half-filled, and
| it will agree that the one it drew doesn't have the
| characteristics it says such a drawing would need and
| _still_ output the exact same drawing. Most people would
| not fail at the task in that way.
| anon291 wrote:
| > In fact, you see this in this thread, where either you
| or the huge number of people trying to disillusion you
| from the maximal position you've staked out on LLMs is
| wrong, and one of us is incorrectly rationalizing our
| position.
|
| I by no means have a 'maximal' position. I have said that
| they exceed the intelligence and ability of the vast
| majority of the human populace when it comes to their
| singular sense and action (ingesting language and
| outputting language). I fully stand by that, because it's
| true. I've not claimed that they exceed everyone's
| intelligence in every area. However, their ability to
| synthesize wildly different fields is well beyond most
| humans' ability. Yes, I do believe we've crossed the
| tipping point. As it is, these things are not noticeable
| except in retrospect.
|
| > The point is that the ways in which humans fail are
| completely different from LLMs, and they differ between
| people, whereas the failure modes for LLMs are all fairly
| identical
|
| I disagree with the idea that human failure modes are
| different between people. I think this is the result of
| not thinking at a high enough level. Human failure modes
| are often very similar. Drama authors make a living off
| exploring human failure modes, and there's a reason why
| they say there are no new stories.
|
| I agree that Human and LLM failure modes are different,
| but that's to be expected.
|
| > regardless of the model
|
| As far as I'm aware, all LLMs in common use today use a
| variant of the transformer. Transformers have much
| different pitfalls compared to RNNs (RNNs are
| particularly bad at recall, for example).
|
| > Go ask an LLM to draw you a wine glass filled to the
| brim: it'll keep insisting it did, even though it keeps
| drawing one half-filled, and it will agree that the one it
| drew doesn't have the characteristics it says such a
| drawing would need and still output the exact same
| drawing. Most people would not fail at the task in that
| way.
|
| Most people can't draw very well anyway, so this is just
| proving my point.
| vlovich123 wrote:
| > Most people can't draw very well anyway, so this is
| just proving my point.
|
| And you're proving my point. The ways in which the people
| would fail to draw the wine glass are different from the
| LLM. The vast majority of people would fail to reproduce
| a photorealistic facsimile. But the vast majority of people
| would meet the requirement of drawing it filled to the
| brim. The LLMs absolutely succeed at the quality of the
| drawing but absolutely fail at meeting human
| specifications and expectations. Generously, you can say
| it's a different kind of intelligence. But saying it's
| more intelligent than humans requires you to use a
| drastically different axis akin to the one you'd use
| saying that computers are smarter than humans because
| they can add two numbers more quickly.
| anon291 wrote:
| At no point did I say humans and LLMs have the same
| failure modes.
|
| > But the vast majority of people would meet the
| requirement of drawing it filled to the brim.
|
| But both are failures, right? It's just a cognitive bias
| that we don't expect artistic ability from most people.
|
| > But saying it's more intelligent than humans requires
| you to use a drastically different axis
|
| I'm not going to rehash this here, but as I said
| elsewhere in this thread, intelligences are different.
| There's no one metric, but for many common human tasks,
| the ability of the LLMs surpasses humans.
|
| > saying that computers are smarter than humans because
| they can add two numbers more quickly.
|
| This is where I disagree. Unlike a traditional program,
| both humans and LLMs can take unstructured input and
| instruction. Yes, they can both fail and they fail
| differently (or succeed in different ways), but there is
| a wide gulf between the sort of structured computation a
| traditional program does and an llm.
| wahnfrieden wrote:
| You must use it to make transcripts and then write code to
| process the values in the transcripts.
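|
| A minimal sketch of that split (the transcript format and
| values here are hypothetical; the point is to let code, not
| the model, do the counting):
|
| import re
|
| # Speech-to-text output, one count per line.
| transcript = """\
| bin 1: 40 resistors
| bin 2: 12 op-amps
| bin 3: 7 voltage regulators
| """
|
| counts = {}
| for qty, name in re.findall(r"bin \d+:\s*(\d+)\s+([\w -]+)",
|                             transcript):
|     counts[name.strip()] = counts.get(name.strip(), 0) + int(qty)
|
| print(counts)
| # {'resistors': 40, 'op-amps': 12, 'voltage regulators': 7}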
| XenophileJKO wrote:
| Likely the issue is how you are asking the model to process
| things. The primary limitation is the amount of information
| (or really attention) they can keep in flight at any given
| moment.
|
| This generally means that for a task like yours, you need
| to have signposts in the data, like minute markers or
| something else it can process serially (see the sketch
| below).
|
| This also means there are operations that are VERY HARD
| for the model, like ranking/sorting. These require the model
| to attend to everything to find the next biggest item, etc.
| It is very hard for the models currently.
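|
| A minimal sketch of the signpost idea (hypothetical helper;
| it just interleaves markers the model can anchor on):
|
| # Insert [minute N] markers so a long transcript can be
| # processed segment by segment instead of all at once.
| def add_signposts(segments, seconds_per_segment=60):
|     lines = []
|     for i, text in enumerate(segments):
|         lines.append(f"[minute {i * seconds_per_segment // 60}]")
|         lines.append(text)
|     return "\n".join(lines)
|
| print(add_signposts(["counted bin one...", "counted bin two..."]))
| # [minute 0]
| # counted bin one...
| # [minute 1]
| # counted bin two...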
| anon291 wrote:
| > This also means there are operations that are VERY HARD
| for the model, like ranking/sorting. These require the model
| to attend to everything to find the next biggest item, etc.
| It is very hard for the models currently.
|
| Comparison-based sorting is O(n log n) no matter what. Given
| that a transformer does a fixed amount of computation before
| we 'force' it to output an answer, there must be an M such
| that beyond that length it cannot reliably sort a list. This
| MUST be the case, and can only be solved by running the model
| some indeterminate number of times, but I don't believe we
| currently have any architecture to do that.
|
| Note that humans have the same limitation. If you give
| humans a time limit, there is a maximum number of things
| they will be able to sort reliably in that time.
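|
| A toy illustration of that bound (comparison sorting needs
| roughly log2(n!) comparisons, while a single forward pass
| has a fixed compute budget):
|
| import math
|
| # Minimum comparisons any comparison sort needs for n items.
| def min_comparisons(n):
|     return math.ceil(math.lgamma(n + 1) / math.log(2))
|
| for n in (10, 100, 1000, 10000):
|     print(n, min_comparisons(n))
| # -> 22, 525, 8530, 118459: past some length M, a fixed
| # per-pass budget can no longer cover the comparisons.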
| christianqchung wrote:
| Transformers absolutely do not run in constant time by
| any reasonable definition, no matter what your point is.
| anon291 wrote:
| They absolutely do for a given sequence size. All models have
| max context lengths, and are thus bounded by a constant.
| simion314 wrote:
| So what? I can write a script that can do in a minute some job
| you couldn't do in 1000 years.
|
| Singularity means something very specific: if your AI can build
| a smarter AI than itself, by itself, and that AI can also build
| a new, smarter AI, then you have a singularity.
|
| You do not have a singularity if an LLM can solve more math
| problems than the average Joe, or if it can answer more trivia
| questions than a random person. Even if you have an AI better
| than all humans combined at Tic-Tac-Toe, you still do not have
| a singularity. IT MUST build a smarter AI than itself and then
| iterate on that.
| anon291 wrote:
| > Singularity means something very specific: if your AI can
| build a smarter AI than itself, by itself, and that AI can
| also build a new, smarter AI, then you have a singularity.
|
| When I was at Cerebras, I fed a description of our custom
| ISA into our own model and asked it to generate kernels (my
| job), and it was surprisingly good.
| simion314 wrote:
| > When I was at Cerebras, I fed a description of our custom
| ISA into our own model and asked it to generate kernels (my
| job), and it was surprisingly good.
|
| And? Was it actually better than what, say, the top 3 people
| in this field would create if they worked on it? Because
| these models are better at CSS than me; so what? I am bad at
| CSS. But all the top models could not solve a math limit
| from my son's homework, so we had to use good old forums to
| have people give us some hints. For sure, models can
| solve more math limits than the average person, who probably
| can't solve a single one.
| anon291 wrote:
| No, not better than the top 3.
|
| > For sure, models can solve more math limits than the
| average person, who probably can't solve a single one.
|
| Some people are domain experts. The pretrained GPTs are
| certainly not (nor are they trained to be).
|
| Some people are polymaths but not domain experts. This is
| still impressive, and where the GPTs fall.
|
| The final conclusion I have is this: These models
| demonstrate above average understanding in a plethora of
| widely disparate fields. I can discuss mathematics,
| computation, programming languages, etc with them and
| they come across as knowledgeable and insightful to me,
| and this is my field. Then, I can discuss with them
| things I know nothing about, such as foreign languages,
| literature, plant diseases, recipes, vacation
| destinations, etc, and they're still good at that. If I
| met a person with as much knowledge and ability to engage
| as the model, I would think that person to be of very
| high intelligence.
|
| It doesn't bother me that it's not the best at anything.
| It's good enough at most things. Yes, its results are not
| always perfect. Its code doesn't work on the first try,
| and it sometimes gets confused. But many polymaths do too
| at a certain level. We don't tell them they're stupid
| because of it.
|
| My old physics professor was very smart in physics but
| also a great pianist. But he probably cannot play as well
| as Chopin. Does that make him an idiot? Of course not.
| He's still above average in piano too! And that makes him
| more of a genius than if he were just a great scientist.
| simion314 wrote:
| Agreed, there are uses for LLMs.
|
| My point was about the Singularity, what it means, and why
| LLMs are not there.
|
| So you missed my point? Was I not clear enough about what I
| was talking about?
| xtreme wrote:
| This actually tells you why AI doesn't have to be better
| than _all_ human experts, just the ones you can afford to
| get together.
| futureshock wrote:
| I actually do think you have a solid point. These models fall
| short of AGI, but that might be more of an OODA-loop agentic
| tweak than anything else.
|
| At their core, the state-of-the-art LLMs can basically do any
| small-to-medium mental task better than I can, or get so close
| to my level that I've found myself no longer thinking through
| things the long way. For example, if I want to run some napkin
| math on something, like the solar battery charge time estimates
| I recently did, an LLM can get to a plausible answer in seconds
| that would have taken me an hour.
|
| So yeah, in many practical ways, LLMs are smarter than most
| people in most situations. They have not yet far surpassed all
| humans in all situations, and there are still some classes of
| reasoning problems that they seem to struggle with, but to a
| first order approximation, we do seem to be mostly there.
| anon291 wrote:
| > For example, if I want to run some napkin math on
| something, like the solar battery charge time estimates I
| recently did, an LLM can get to a plausible answer in
| seconds that would have taken me an hour.
|
| Exactly. I've used it to figure geometric problems for
| everyday things (carpentry), market sizing estimates for
| business ideas, etc. Very fast turnaround. All the doomers in
| this thread are just ignoring the amazing utility these
| models provide.
| jcims wrote:
| > I actually do think you have a solid point. These models
| fall short of AGI, but that might be more of an OODA-loop
| agentic tweak than anything else.
|
| I think this is it. LLM responses feel like the unconsidered
| ideas that pop into my head from nowhere. Like if someone
| asks me how many states are in the United States, a number
| pops out from somewhere. I don't just wire that to my mouth,
| I also think about whether or not that's current info, have I
| gotten this wrong in the past, how confident am I in it, what
| is the cost of me providing bad information, etc etc etc.
|
| If you effectively added all of those layers to an LLM
| (something that I think the o1-preview and other approaches
| are starting to do) it's going to be interesting to see what
| the net capability is.
|
| The other thing that makes me feel like we're 'getting there'
| is using some of the fast models at groq.com. The information
| is generated, in many cases, an order of magnitude faster
| than I can consume it. The idea that models might be able to
| start to engage through a much more sophisticated embedding
| than English, to pass concepts and sequences back and forth
| natively, is intriguing.
| anon291 wrote:
| > I think this is it. LLM responses feel like the
| unconsidered ideas that pop into my head from nowhere.
|
| You have to look at the LLM as the inner voice in your
| head. We've kind of forced them into saying whatever they
| think due to how we sample the output (next token
| prediction), but in new architectures with pause tokens, we
| let them 'think' and they show better judgement and
| ability. These systems are rapidly going to improve and it
| will be very interesting to see.
|
| But this is another reason why I think they've surpassed
| human intelligence. You have to look at each token as a
| 'time step' in the inner thought process of some entity. A
| real 'alive' entity has more 'ticks' than what their
| actions would suggest. For example, human brains can
| process up to 10FPS (100ms response time), but most humans
| aren't saying 10 words a second. However, we've made LLMs
| whose internal processes (i.e., their intuition) are already
| superior. If we just gave them that final agentic ability
| to not say anything and ponder (which researchers are
| doing), their capabilities will increase exponentially.
|
| > The other thing that makes me feel like we're 'getting
| there' is using some of the fast models at groq.com.
|
| Unlike perhaps many of the commentators here, I've been in
| this field for a bit under a decade now, and was one of the
| early compiler engineers at Groq. Glad you're finding it
| useful. It's amazing stuff.
| nisarg2 wrote:
| At the end of the response they forget everything. They need to
| be fed the entire text for them to know anything about it the
| next time. That is not surpassing even feline intelligence.
| andai wrote:
| A genius can have anterograde amnesia and still be a genius.
| anon291 wrote:
| If we did to cats what we do to GPT models, that would be
| animal abuse.
|
| That is to say, if we want to extend this analogy, the model
| is 'killed' after each round. This is hardly a criticism of
| the underlying technology.
|
| Going back to feeding it the entire input: that is not really
| true. There are a dozen ways to avoid that these days.
| throwaway106382 wrote:
| Can we all agree that chainsaws far surpass human intelligence
| now? I mean, you can chop down thousands of trees in less time
| than a single person could even do one. I think the singularity
| has passed.
| anon291 wrote:
| Cutting down a tree is not intelligence, but I think it's
| been well accepted for more than a century that machines
| surpass human physical capability, yes. There were many during
| the industrial revolution who denied that this was going to
| be the case, just like what we're seeing here.
| andai wrote:
| Hijacking thread to ask: how would we know? Another
| uncomfortable issue is the question of sentience. Models
| claimed they were sentient years ago, but this was dismissed as
| "mimicking patterns in the training data" (fair enough) and the
| training was modified to forbid them from doing that.
|
| But if it does happen some day, how will we know? What are the
| chances that the first sentient AI will be accused of just
| mimicking patterns?
|
| Indeed with the current training methodology it's highly likely
| that the first sentient AI will be unable to even let us know
| it's sentient.
| anon291 wrote:
| We couldn't know. Humans mimic patterns. The claims that
| LLMs aren't smart because they don't generate anything new
| fall completely flat for me. If you look back far enough,
| most humans generate nothing new. For example, even novel
| ideas like Einstein's theory of relativity are re-iterations
| of existing ideas. If you want to be pedantic, one can trace
| back the majority of ideas, claim that each incremental step
| was 'not novel, just recollection', and then make the
| egregious claim that humanity has invented nothing.
|
| > But if it does happen some day, how will we know? What are
| the chances that the first sentient AI will be accused of
| just mimicking patterns?
|
| Leaving questions of sentience aside (since we don't even
| really know _what_ that is) and focusing on intelligence, the
| truth is that we will probably not know until many decades
| later.
| cdblades wrote:
| But you just made a strong claim about something you are
| now saying we can't know?
| anon291 wrote:
| I believe we have passed a technological singularity.
| There is no consensus as you can see here. I believe in a
| few decades there will be consensus.
|
| Intelligence and technological singularities are
| observable things.
|
| Sentience is not.
| hehehheh wrote:
| Computers can do stuff humans have struggled with since the
| abacus. A 386 PC can do mathematical calculations a human
| couldn't do in a lifetime.
| aliljet wrote:
| This is fantastic news. I've been using
| Qwen2.5-Coder-32B-Instruct with Ollama locally, and it's honestly
| such a breath of fresh air. I wonder if any of you have had a
| moment to try this newer context length locally?
|
| BTW, I fail to effectively run this on my 2080 Ti, so I've just
| loaded up the machine with classic RAM. It's not going to win any
| races, but as they say, it's not the speed that matters, it's the
| quality of the effort.
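|
| For reference: Ollama defaults to a small context window
| (num_ctx 2048), so to actually try a longer context you have
| to raise it explicitly. A minimal sketch (the model tag may
| differ on your install):
|
| $ ollama run qwen2.5-coder:32b
| >>> /set parameter num_ctx 32768
|
| or, baked into a variant via a Modelfile:
|
| FROM qwen2.5-coder:32b
| PARAMETER num_ctx 32768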
| notjulianjaynes wrote:
| Hi, are you able to use Qwen's 128k context length with Ollama?
| Using AnythingLLM + Ollama and a GGUF version, I kept getting
| an error message with prompts longer than 32,000 tokens
| (summarizing long transcripts).
| syntaxing wrote:
| The famous Daniel Han (same person who made Unsloth and
| fixed Gemini/Llama bugs) mentioned something about this on
| Reddit and offered a fix:
| https://www.reddit.com/r/LocalLLaMA/comments/1gpw8ls/bug_fix...
| zargon wrote:
| After reading a lot of that thread, my understanding is
| that yarn scaling is disabled intentionally by default in
| the GGUFs, because it would degrade outputs for contexts
| that do fit in 32k. So the only change is enabling yarn
| scaling at 4x, which is just a configuration setting. GGUF
| has these configuration settings embedded in the file
| format for ease of use. But you should be able to override
| them without downloading an entire duplicate set of weights
| (12 to 35 GB!). (It looks like in llama.cpp the override-kv
| option can be used for this, but I haven't tried it yet.)
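|
| Untested, but going by llama.cpp's --help, the runtime
| override would look something like this (model filename
| hypothetical; the 4x factor and 32k original context are
| the values discussed above):
|
| ./llama-cli -m qwen2.5-coder-32b-instruct-q4_k_m.gguf \
|     -c 131072 --rope-scaling yarn --rope-scale 4 \
|     --yarn-orig-ctx 32768 -f long_prompt.txt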
| syntaxing wrote:
| Oh super interesting, I didn't know you can override this
| with a flag on llama.cpp.
| notjulianjaynes wrote:
| Yeah, unfortunately that's the exact model I'm using (Q5
| version). What I've been doing is first loading the
| transcript into the vector database, and then giving it a
| prompt that's like "summarize the transcript below: <full
| text of transcript>". This works surprisingly well, except
| for one transcript I had, which was of a 3-hour meeting and
| was, per an online calculator, about 38,000 tokens. Cutting
| the text up into 3 parts and pretending each was a separate
| meeting* led to a bunch of hallucinations for some reason.
|
| *In theory this shouldn't matter much for my purpose of
| summarizing city council meetings that follow a predictable
| format.
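|
| One pattern that might help, sketched below (it assumes an
| Ollama-style /api/generate endpoint on localhost and a
| hypothetical model tag; chunk sizes are arbitrary): tell
| the model each chunk is part of ONE meeting, overlap the
| chunks, and merge the partial summaries in a final pass,
| instead of pretending each chunk is a separate meeting.
|
| import json, urllib.request
|
| def ask(prompt, model="qwen2.5-coder:32b"):
|     req = urllib.request.Request(
|         "http://localhost:11434/api/generate",
|         data=json.dumps({"model": model, "prompt": prompt,
|                          "stream": False}).encode(),
|         headers={"Content-Type": "application/json"})
|     with urllib.request.urlopen(req) as r:
|         return json.loads(r.read())["response"]
|
| def summarize(transcript, chunk=20000, overlap=2000):
|     # Overlapping character chunks, each labeled as part of
|     # the same meeting so the model doesn't invent three
|     # separate meetings.
|     parts = [transcript[i:i + chunk]
|              for i in range(0, len(transcript), chunk - overlap)]
|     partials = [ask(f"Summarize part {i + 1} of {len(parts)} "
|                     f"of ONE meeting transcript:\n{p}")
|                 for i, p in enumerate(parts)]
|     return ask("Merge these partial summaries of a single "
|                "meeting into one summary:\n" + "\n".join(partials))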
| ipsum2 wrote:
| The long context model has not been open sourced.
| lukev wrote:
| I ran a couple of needle-in-a-haystack-type queries with just a
| 32k context length and was very much not impressed. It often
| failed to find facts buried in the middle of the prompt that
| were stated almost identically to the question being asked.
|
| It's cool that these models are getting such long contexts, but
| performance definitely degrades the longer the context gets, and
| I haven't seen this characterized or quantified very well
| anywhere.
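|
| For anyone who wants to quantify it, a bare-bones
| needle-in-a-haystack harness is easy to sketch (the filler,
| needle, and word count here are made up; swap in a real
| corpus and your own model call):
|
| # Build an n-word haystack with a needle at a relative
| # position, then check whether the model can recall it.
| def make_haystack(n_words, needle, pos):
|     filler = "The quick brown fox jumps over the lazy dog. "
|     words = (filler * (n_words // 9 + 1)).split()[:n_words]
|     words.insert(int(len(words) * pos), needle)
|     return " ".join(words)
|
| needle = "The magic number for project Falcon is 7319."
| for pos in (0.1, 0.5, 0.9):
|     prompt = (make_haystack(30000, needle, pos) +
|               "\n\nWhat is the magic number for project Falcon?")
|     # reply = ask(prompt)          # your model call here
|     # print(pos, "7319" in reply)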
| lostmsu wrote:
| Is this model downloadable?
| gkaye wrote:
| They are not clear about this (which is annoying), but it seems
| it will not be downloadable. No weights have been released so
| far, and nothing in this post mentions plans to do so going
| forward.
___________________________________________________________________
(page generated 2024-11-18 23:00 UTC)