[HN Gopher] Extending the context length to 1M tokens
       ___________________________________________________________________
        
       Extending the context length to 1M tokens
        
       Author : cmcconomy
       Score  : 71 points
       Date   : 2024-11-18 16:27 UTC (6 hours ago)
        
 (HTM) web link (qwenlm.github.io)
 (TXT) w3m dump (qwenlm.github.io)
        
       | swazzy wrote:
       | Note unexpected three body problem spoilers in this page
        
         | zargon wrote:
         | Those summaries are pretty lousy and also have hallucinations
         | in them.
        
           | johndough wrote:
            | I agree. Below are a few errors. I also asked ChatGPT to
            | check the summaries and it found all of them (and flagged
            | a few more that weren't actual errors, just points not
            | expressed with perfect clarity).
           | 
           | Spoilers ahead!
           | 
           | First novel: The Trisolarans did not contact earth first. It
           | was the other way round.
           | 
           | Second novel: Calling the conflict between humans and
           | Trisolarans a "complex strategic game" is a bit of a stretch.
           | Also, the "water drops" do not disrupt ecosystems. I am not
           | sure whether "face-bearers" is an accurate translation. I've
           | only read the English version.
           | 
            | Third novel: Luo Ji does not hold the key to the survival of
           | the Trisolarans and there were no "micro-black holes" racing
           | towards earth. Trisolarans were also not shown colonizing
           | other worlds.
           | 
            | I am also not sure whether Luo Ji faced his "personal
            | struggle and psychological turmoil" in this novel or in an
            | earlier one. He was certainly most confident in his role
            | at the end; even the Trisolarans rated him at over a 92%
            | deterrence rate.
        
         | johndough wrote:
         | And this example does not even illustrate the long context
         | understanding well, since smaller Qwen2.5 models can already
         | recall parts of the Three Body Problem trilogy without pasting
         | the three books into the context window.
        
           | gs17 wrote:
           | And multiple summaries of each book (in multiple languages)
           | are almost definitely in the training set. I'm more confused
           | how it made such inaccurate, poorly structured summaries
           | given that and the original text.
           | 
           | Although, I just tried with normal Qwen 2.5 72B and Coder 32B
           | and they only did a little better.
        
           | agildehaus wrote:
            | It seems a very difficult problem to produce a response
            | based only on the text given and not on past training. An
            | LLM that could do that would seem considerably more
            | advanced than what we have today.
           | 
           | Though I would say humans would have difficulty too -- say,
           | having read The Three Body problem before, then reading a
           | slightly modified version (without being aware of the
           | modifications), and having to recall specific details.
        
             | botanical76 wrote:
             | This problem is poorly defined; what would it mean to
             | produce a response JUST based on the text given? Should it
             | also forgo all logic skills and intuition gained in
             | training because it is not in the text given? Where in the
             | N dimensional semantic space do we draw a line (or rather,
             | a surface) between general, universal understanding and
             | specific knowledge about the subject at hand?
             | 
             | That said, once you have defined what is required, I
             | believe you will have solved the problem.
        
       | anon291 wrote:
        | Can we all agree that these models far surpass human
        | intelligence now? I mean, they process hours' worth of audio
        | in less time than it would take a human to even listen. I
        | think the singularity has passed and we didn't even notice
        | (which would be expected).
        
         | Spartan-S63 wrote:
         | No, I can't agree that these models surpass human intelligence.
         | Sure, they're good at probabilistic recall, but they aren't
         | reasoning and they aren't synthesizing anything novel.
        
           | anon291 wrote:
           | > they aren't synthesizing anything novel.
           | 
           | ChatGPT has synthesized my past three vacations and regularly
           | plans my family's meals based on whatever is in my fridge. I
           | completely disagree.
        
             | rootusrootus wrote:
             | Seems more likely that your vacations and fridge contents
             | aren't as novel as you hope.
        
               | anon291 wrote:
               | This is a low-effort comment. I cook a lot for my family
               | and community and things get boring after a while. After
               | using ChatGPT, my wife has really enjoyed the new dishes,
                | and I've gotten excellent feedback at potlucks. Yes,
                | the base idea of a dish (roast, rice dish, noodles,
                | etc.) is old, but the fillings it suggests, and the
                | instructions it gives for cooking them, are new. And
                | that's what creativity is, right? I have also asked it
                | for avant-garde cuisine ideas, and it has good ones,
                | but I don't have the skills to make those dishes.
        
               | rootusrootus wrote:
               | > This is a low-effort comment
               | 
               | Not any worse than this sentence. Counter it with a
               | higher value comment.
               | 
               | You are a single person and LLMs have been trained on the
               | output of billions. Any given choice you make can be
               | predicted with extraordinary probability by looking at
               | your inputs and environment and guessing that you will do
               | what most other people do in that situation.
               | 
               | This is pretty basic stuff, yes? Especially on HN? Great
               | ideas are a dime a dozen, and every successful startup
               | was built on an idea that certainly wasn't novel, but was
               | executed well.
        
               | anon291 wrote:
               | My higher value comment was a list of things for which
               | ChatGPT, a widely available product, will produce novel
               | ideas. Responding that those ideas are not novel enough
               | based on absolutely no data is a low-effort comment. What
               | evidence of creativity would you accept?
        
           | lostmsu wrote:
           | > they aren't synthesizing anything novel.
           | 
           | They are. Like millions of monkeys, but drastically better.
        
         | elashri wrote:
          | Processing speed is not the metric for measuring
          | intelligence. In the same way, some people of above-average
          | intelligence take longer to think about things and come up
          | with better ideas. One can argue that this is useful in
          | some respects, but humans have a spectrum of different
          | kinds of intelligence that an LLM will lack. Also, are you
          | comparing against the average person, or against people at
          | the top of their fields, or people working in science?
          | 
          | Also, humans can reason; LLMs currently can't do this in a
          | useful way, and every attempt to make them do so is badly
          | limited by their context. Not to mention that their ability
          | to make genuinely new things (as opposed to made-up
          | nonsense) is very limited.
        
           | anon291 wrote:
           | You've hit on the idea that intelligence is not quantifiable
           | by one metric. I completely agree. But you're holding a much
           | different goal for AI than for average people. Modern LLMs
           | are able to produce insights much faster and more accurately
            | than most people (do you think you could pass the
            | retrieval tasks the way the LLMs do, reading the whole
            | text? I really encourage people to try). By that metric
           | (insights/speed), I think they far surpass even the most
           | brilliant. You can claim that that's not intelligence until
           | the cows come home, but any person able to do that would be
           | considered a savant.
        
             | elashri wrote:
              | I would argue the opposite, actually. In the same way,
              | we don't call someone who can do arithmetic very fast a
              | genius if they can't think in a more useful
              | mathematical way and construct novel ideas. The same
              | thing is happening here: these tools are useful for
              | retrieving and processing existing information at high
              | speed, but intelligence is not the ability to process
              | some data quickly and then recall it. That is what we
              | actually call a savant. The ability to build on top of
              | that knowledge retrieval and use reason to create new
              | ideas is a closer definition of intelligence, and would
              | be a better goal.
        
               | anon291 wrote:
               | Let's step back.
               | 
                | 1. The vast majority of people never come up with a
                | truly new idea. Those that do are considered
                | exceptional and their names go down in history books.
               | 
               | 2. Most 'new ideas' are rehashes of old ones.
               | 
                | 3. If you turn the temperature up on an LLM, it will
                | absolutely come up with new ideas. Expecting an LLM to
                | make a scientific discovery a la Einstein is ... a bit
                | much, don't you think [1]? When it comes to 'everyday'
                | creativity, such as short poems, songs, recipes,
                | vacation itineraries, etc., ChatGPT is more capable
                | than the vast majority of people. Literally, ask
                | ChatGPT to write you a song about _____, and it will
                | come up with something creative. Ask it for a recipe
                | with ridiculous ingredients and see what it does.
                | It'll make things you've never seen before, generate
                | an image for you, and even come up with a neologism if
                | you ask it to. It's insanely creative.
               | 
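                | ("Turning the temperature up" here just rescales the
                | model's output distribution before sampling, which is
                | why higher values give less predictable completions.
                | A minimal sketch of what that means, with toy logits:)
                | 
                |     import numpy as np
                | 
                |     def sample(logits, temperature=1.0):
                |         # higher temperature flattens the
                |         # distribution, so unlikely tokens get
                |         # picked more often
                |         z = np.array(logits, dtype=float) / temperature
                |         p = np.exp(z - z.max())
                |         p /= p.sum()
                |         return np.random.choice(len(p), p=p)
                | 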
               | [1] Although I have walked chatgpt through various
               | theoretical physics scenarios and it will create new math
               | for you.
        
               | vlovich123 wrote:
               | > The vast majority of people never come up with a truly
                | new idea. Those that do are considered exceptional and
               | their names go down in history books.
               | 
                | Depends on your definition of "truly" new, since any
                | idea could be argued to be a mix of all past ideas.
                | But I see truly new ideas all the time that don't go
                | down in the history books, because most new ideas
                | build incrementally on what came before or are
                | extremely niche. Only a very few turn out to be a
                | massive turning point with broad impact, and that is
                | usually only evident in retrospect (e.g. blue LEDs
                | were basically trial and error and almost an approach
                | that was given up on; transistors were believed to be
                | impactful but not the huge revolution for computing
                | they turned out to be, etc.).
        
               | anon291 wrote:
               | > Depends on your definition of "truly" new since any
               | idea could be argued to be a mix of all past ideas.
               | 
                | My personal feeling when I engage in these
                | conversations is that we humans have a cognitive bias
                | to ascribe a human's remixing of an old idea to
                | intelligence, but an AI model's remixing of an old
                | idea to lookup.
               | 
               | Indeed, basically every revolutionary idea is a mix of
               | past ideas if you look closely enough. AI is a great
               | example. To the 'lay person' AI is novel! It's new. It
               | can talk to you! It's amazing. But for people who've been
               | in this field for a while, it's an incremental
               | improvement over linear algebra, topology, functional
               | spaces, etc.
        
               | ehhehehh wrote:
               | It is not about novelty so much as it is about reasoning
               | from first principles and learning new things.
               | 
               | I don't need to finetune on five hundred pictures of
               | rabbits to know one. I need one look and then I'll know
               | for life and can use this in unimaginable and endless
               | variety.
               | 
               | This is a simplistic example which you can naturally pick
               | apart but when you do I'll provide another such example.
               | My point is, learning at human (or even animal) speeds is
               | definitely not solved and I'd say we are not even
               | attempting that kind of learning yet. There is "in
               | context learning" and "finetuning" and both are not going
               | to result in human level intelligence judging from
               | anything I've had access to.
               | 
               | I think you are anthropomorphizing the clever text
               | randomization process. There is a bunch of information
               | being garbled and returned in a semi-legible fashion and
               | you imbue the process behind it with intelligence that I
               | don't think it has. All these models stumble over simple
               | reasoning unless specifically trained for those specific
               | types of problems. Planning is one particularly famous
               | example.
               | 
               | Time will tell, but I'm not betting on LLMs. I think
               | other forms of AI are needed. Ones that understand
               | substance, modality, time and space and have working
               | memory, not just the illusion of it.
        
               | anon291 wrote:
               | > I don't need to finetune on five hundred pictures of
               | rabbits to know one. I need one look and then I'll know
               | for life and can use this in unimaginable and endless
               | variety.
               | 
                | So if you do use in-context learning and give ChatGPT
                | a few images of your novel class, it will usually
                | classify them correctly. Fine-tuning is so you can
                | save on token cost.
               | 
               | Moreover, you don't typically need that many pictures to
               | fine tune. The studies show that the models successfully
               | extrapolate once they've been 'pre-trained'. This is
               | similar to how my toddler insists that a kangaroo is a
               | dog. She's not been exposed to enough data to know
               | otherwise. Dog is a much more fluid category for her than
               | in real life. If you talk with her for a while about it,
               | she will eventually figure out kangaroo is kangaroo and
               | dog is dog. But if you ask her again next week, she'll go
               | back to saying they're dogs. Eventually she'll learn.
               | 
               | > All these models stumble over simple reasoning unless
               | specifically trained for those specific types of
               | problems. Planning is one particularly famous example.
               | 
               | We have extremely expensive programs called schools and
               | universities designed to teach little humans how to plan
               | and execute. If you look at cultures without
               | American/Western biases (and there's not very many left,
               | so we really have to look to history), we see that the
               | idea of planning the way we do it is not universal.
        
             | vlovich123 wrote:
             | LLMs are probably better than you at tasks you're not good
             | at. There's a huge gulf between a domain expert and an LLM
             | though. If there weren't, all the humans in companies would
             | be fired right now and replaced. Similarly, OpenAI and
             | Anthropic are paying engineers a metric fuckton of money to
              | work there. If LLMs were that big of a game changer,
              | they wouldn't be paying that much. Or if you make the
              | argument that only the best humans are getting hired:
              | they're still hiring interns & junior engineers, and if
              | LLMs were that capable, those roles would be replaced,
              | and they're not.
             | 
             | You're basically ignoring all the experts saying "LLMs suck
             | at all these things that even beginning domain experts
             | don't suck at" to generate your claim & then ignoring all
             | evidence to the contrary.
             | 
             | And you're ignoring the ways in which LLMs fall on their
             | face to be creative that aren't language-based. Creative
             | problem solving in ways they haven't been trained on is out
             | of their domain while fully squarely in the domain of human
             | intelligence.
             | 
             | > You can claim that that's not intelligence until the cows
             | come home, but any person able to do that would be
             | considered a savant
             | 
              | Computers can do arithmetic really quickly, but that's
              | not intelligence, whereas a person computing that
              | quickly is considered a savant. You've built up an
              | erroneous dichotomy in your head.
        
               | anon291 wrote:
                | But that's exactly it, right? Some people are
                | excellent because they're the expert in one field, and
                | some people are excellent because they're extremely
                | competent across many fields. LLMs are the latter.
                | 
                | Sure, for any domain expert, you can easily get an LLM
                | to trip on something. But just the sheer number of
                | things it is above average at puts it easily into the
                | top echelon of humans.
               | 
               | > You're basically ignoring all the experts saying "LLMs
               | suck at all these things that even beginning domain
               | experts don't suck at" to generate your claim & then
               | ignoring all evidence to the contrary.
               | 
                | Domain expertise is not the only form of intelligence.
                | The most interesting things often lie at the
                | intersections of domains. As I said in another
                | comment, there are a variety of ways to judge
                | intelligence, and no one quantifiable metric. It's
                | like asking if Einstein is better than Mozart. I don't
                | know... their fields are so different. However, I
                | think it's pretty safe to say that the modern slate of
                | LLMs falls into the top 10% of human intelligence,
                | simply for their breadth of knowledge and ability to
                | synthesize ideas at the cross-section of any wide
                | number of fields.
        
               | vlovich123 wrote:
               | > some people are excellent because they're extremely
               | competent at many fields. LLMs are the latter
               | 
               | But they're not. The people who are extremely competent
               | at many fields will _still_ outperform LLMs in those
               | fields. The LLM can basically only outperform a complete
               | beginner in the area  & makes up for that weakness by
               | scaling up the amount it can output which a human can't
               | match. That doesn't take away from the fact that the
               | output is complete garbage when given anything it doesn't
               | know the answer to. As I noted elsewhere, ask it to
               | provide an implementation of the S3 ListObjects operation
                | (like the actual backend) and see what BS it tries to
                | output, to the point where you have to spend a good
                | amount of time just convincing it not to output an
                | example of using the S3 ListObjects API.
               | 
               | > I think it's pretty safe to say that the modern slate
                | of LLMs falls into the top 10% of human intelligence,
                | simply for their breadth of knowledge and ability to
               | synthesize ideas at the cross-section of any wide number
               | of fields.
               | 
               | Again, evidence assumed that's not been submitted. Please
               | provide an indication of any truly novel ideas being
               | synthesized by LLMs that are a cross-section of fields.
        
               | anon291 wrote:
               | > Please provide an indication of any truly novel ideas
               | being synthesized by LLMs that are a cross-section of
               | fields.
               | 
               | The problem here is that you expect something akin to
               | relativity, the Poincare conjecture, et al. The _vast
               | majority_ of humans are not able to do this.
               | 
               | If you restrict yourself to the sorts of creativity that
               | average people are good at, the models do extremely well.
               | 
               | I'm not sure how to convince you of this. Ideally, I'd
               | get a few people of above average intelligence together,
               | and give them an hour (?) to work on some problem /
               | creative endeavor (we'd have to restrict their tool use
               | to the equivalent of whatever we allow GPT to have), and
               | then we can compare the results.
               | 
               | EDIT: Here's what ChatGPT thinks we should do: https://ch
               | atgpt.com/share/673b90ca-8dd4-8010-a1a0-61af699a44...
        
               | vlovich123 wrote:
               | But why is comparing against untrained humans the
               | benchmark? ChatGPT has literally been trained on so much
               | more data than a human would ever see & use so much more
               | energy. Let's compare like against like. Benchmarks like
               | FrontierMath are important and one extreme - passing it
               | would indicate that either the questions are part of the
               | training set or genuine creativity and skill has been
               | developed for the AI system. The important thing is that
               | people keep growing - they can go from student to expert.
               | AI systems do not have that growth capability which
               | indicates a very important thing is missing from their
               | intelligence capability.
               | 
                | I want to be clear - I'm talking about the
                | intelligence of AI systems available today and today
                | only. There's lots of reason to be enthusiastic about
                | the future, but similarly to be very cautious about
                | understanding what is available today, and what is
                | available today isn't human-like.
        
               | anon291 wrote:
               | > ChatGPT has literally been trained on so much more data
               | than a human would ever see
               | 
               | This is a common fallacy. The average human ingests a few
               | dozen GB of data a day [1] [2].
               | 
               | ChatGPT 4 was trained on 13 trillion tokens. Say a token
               | is 4 bytes (it's more like 3, but we're being
               | conservative). That's 52 trillion bytes or 52 terabytes.
               | 
               | Say the average human only consumes the lower estimate of
               | 30 GB a day. That means it would take a human 1625 days
               | to consume the number of tokens ChatGPT was trained on,
               | or 4.5 years. Assuming humans and the LLM start from the
               | same spot [3], the proper question is... is ChatGPT
               | smarter than a 4.5 year old. If we use the higher
               | estimate, then we have to ask if ChatGPT is smarter than
               | a 2 year old. Does ChatGPT hallucinate more or less than
               | the average toddler?
               | 
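                | A quick back-of-the-envelope check of those numbers
                | (the 4 bytes/token and 30 GB/day figures above are the
                | assumptions; the result shifts a little depending on
                | how you read "GB", but stays around 4.5 years):
                | 
                |     tokens = 13e12          # GPT-4 training set
                |     bytes_per_token = 4     # conservative guess
                |     gb = 2**30              # binary GB
                |     per_day = 30 * gb       # lower estimate, [1]
                | 
                |     days = tokens * bytes_per_token / per_day
                |     print(days / 365)       # ~4.4 years (~1,600 days)
                | 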
               | The cognitive bias I've seen everywhere is the idea that
               | humans are trained on a small amount of data. Nothing is
               | further from the truth. Humans require training on an
               | insanely large amount of data. A 40 year old human has
               | been trained on orders of magnitudes more data than I
               | think we even have available as data sets. If you prevent
               | a human from being trained on this amount of data through
               | sensory deprivation they go crazy (and hallucinate very
               | vividly too!).
               | 
               | No argument about energy, but this is a technology
               | problem.
               | 
               | [1] https://www.tech21century.com/the-human-brain-is-
               | loaded-dail...
               | 
               | [2] https://kids.frontiersin.org/articles/10.3389/frym.20
               | 17.0002...
               | 
               | [3] this is a bad assumption since LLMs are randomly
               | initialized whereas humans seem to be born with some
               | biases that significantly aid in the acquisition of
               | language and social skills
        
         | giantrobot wrote:
         | My old TI-86 can calculate stuff faster than me. You wouldn't
         | ever ask if it was smarter than me. An audio filter can process
         | audio faster than I can listen to it but you'd never suggest it
         | was intelligent.
         | 
         | AI models are algorithms running on processors running at
         | billions of calculations a second often scaled to hundreds of
         | such processors. They're not intelligent. They're fast.
        
           | anon291 wrote:
            | Except the LLM can solve a general problem (or tell you
            | why it cannot), while your calculator can only do what
            | it's been programmed to do.
        
             | th0ma5 wrote:
             | Do you have any evidence besides anecdote?
        
               | anon291 wrote:
               | what kind of evidence substantiates creativity?
               | 
               | Things I've used chat gpt for:
               | 
                | 1. Writing songs (couldn't find the generated lyrics
                | online, so I assume they're new)
               | 
               | 2. Branding ideas (again couldn't find the logos online,
               | so assuming they're new)
               | 
               | 3. Recipes (with weird ingredients that I've not found
               | put together online)
               | 
               | 4. Vacations with lots of constraints (again, all the
               | information is obviously available online, but it put it
               | together for me and gave recommendations for my family
               | particularly).
               | 
               | 5. Theoretical physics explorations where I'm too lazy to
               | write out the math (and why should I... chatgpt will do
               | it for me...)
               | 
                | I think perhaps one reason people here do not have the
                | same results is that I typically use the API directly
                | and modify the system prompt, which drastically
                | changes the utility of ChatGPT. The default prompt is
                | too focused on retrieval and 'truth'. If you want
                | creativity, you have to ask it to be an artist.
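                | 
                | For the curious, "using the API directly and modifying
                | the system prompt" looks roughly like this with the
                | current OpenAI Python SDK (the prompt text and
                | temperature are just illustrative choices, not a
                | recipe):
                | 
                |     from openai import OpenAI
                | 
                |     client = OpenAI()  # reads OPENAI_API_KEY from env
                |     resp = client.chat.completions.create(
                |         model="gpt-4o",
                |         temperature=1.2,   # push it toward variety
                |         messages=[
                |             {"role": "system",
                |              "content": "You are a playful chef."},
                |             {"role": "user",
                |              "content": "Invent a dish with miso."},
                |         ],
                |     )
                |     print(resp.choices[0].message.content)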
        
               | th0ma5 wrote:
               | No I think they don't have the results you do because
               | they are trying to do those things _well_ ...
        
               | anon291 wrote:
               | The personal insult insinuated here is not appreciated
               | and probably against community guidelines.
               | 
               | For what I needed, those things worked very well
        
               | th0ma5 wrote:
               | Anecdotes have equal weight. All of these models
               | frustrate me to no end but I only do things that have
               | never been done before. And it isn't an insult because
               | you have no evidence of quality.
        
               | anon291 wrote:
               | > Anecdotes have equal weight. All of these models
               | frustrate me to no end but I only do things that have
               | never been done before. And it isn't an insult because
               | you have no evidence of quality.
               | 
               | You have not specified what evidence would satisfy you.
               | 
               | And yes, it was an insult to insinuate I would accept sub
               | par results whereas others would not.
               | 
               | EDIT: Chat GPT seems to have a solid understanding of why
               | your comment comes across as insulting: https://chatgpt.c
               | om/share/673b95c9-7a98-8010-9f8a-9abf5374bb...
               | 
               | Maybe this should be taken as one point of evidence of
               | greater ability?
        
               | th0ma5 wrote:
                | I think you led the result by not providing enough
                | context, like saying how there is no objective way to
                | measure the quality of an LLM generation, either
                | before or after the fact.
               | 
                | Edit: I asked ChatGPT with more proper context: "It's
               | not inherently insulting to say that an LLM (Large
               | Language Model) cannot guarantee the best quality because
               | it's a factual statement grounded in the nature of how
               | these models work. LLMs rely on patterns in their
               | training data and probabilistic reasoning rather than
               | subjective or objective judgments about "best quality."
        
               | anon291 wrote:
               | I can't criticize how you prompted it because you did not
               | link the transcript :)
               | 
               | Zooming out, you seem to be in the wrong conversation. I
               | said:
               | 
               | > the LLM can solve a general problem (or tell you why it
                | cannot), while your calculator can only do what it's
                | been programmed to do.
               | 
               | You said:
               | 
               | > Do you have any evidence besides anecdote?
               | 
                | I think that -- both of us now having used ChatGPT to
                | generate a response -- we have good evidence that the
                | model can solve a general problem (or tell you why it
                | cannot), while a calculator can only do the arithmetic
                | for which it's been programmed. If you want to
                | counter, then a video of your calculator answering the
                | question we just posed would be nice.
        
             | vlovich123 wrote:
             | Go ask your favorite LLM to write you some code to
             | implement the backend of the S3 API and see how well it
             | does. Heck, just ask it to implement list iteration against
             | some KV object store API and be amazed at the complete
             | garbage that gets emitted.
        
               | anon291 wrote:
               | So I told it what I wanted, and it generated an initial
               | solution and then modified it to do some file
               | distribution. Without the ability to actually execute the
               | code, this is an excellent first pass.
               | 
               | https://chatgpt.com/share/673b8c33-2ec8-8010-9f70-b0ed12a
               | 524...
               | 
                | ChatGPT can't directly execute code on my machine due
                | to architectural limitations, but I imagine if I went
                | and followed its instructions and told it what went
                | wrong, it would correct it.
                | 
                | And that's just it, right? If I were to program this,
                | I would be iterating. ChatGPT cannot do that because
                | of how it's architected (I don't think it would be
                | hard to add this if you used the API and allowed some
                | kind of tool use). However, if I told someone to go
                | write me an S3 backend without ever executing it, and
                | they came back with this... that would be great.
               | 
               | EDIT: with chunking: https://chatgpt.com/share/673b8c33-2
               | ec8-8010-9f70-b0ed12a524...
               | 
               | IIRC, from another thread on this site, this is
               | essentially how S3 is implemented (centralized metadata
               | database that hashes out to nodes which implement a local
               | storage mechanism -- MySQL I think).
        
               | vlovich123 wrote:
               | And that's why it's dangerous to evaluate something when
               | you don't understand what's going on. The implementation
               | generated not only saves things directly to disk [1] [2]
               | but it doesn't even implement file uploading correctly
                | nor does it implement listing of objects (which I
               | guarantee you would be incorrect). Additionally, it makes
               | a key mistake which is that uploading isn't a form but is
               | the body of the request so it's already unable to have a
               | real S3 client connect. But of course at first glance it
               | has the appearance of maybe being something passable.
               | 
               | Source: I had to implement R2 from scratch and nothing
               | generated here would have helped me as even a starting
               | point. And this isn't even getting to complex things like
               | supporting arbitrarily large uploads and encrypting
               | things while also supporting seeked downloads or
               | multipart uploads.
               | 
               | [1] No one would ever do this for all sorts of problems
               | including that you'd have all sorts of security problems
               | with attackers sending you /../ to escape bucket and
               | account isolation.
               | 
               | [2] No one would ever do this because you've got nothing
               | more than a toy S3 server. A real S3 implementation needs
               | to distribute the data to multiple locations so that
               | availability is maintained in the face of isolated
               | hardware and software failures.
        
               | anon291 wrote:
               | > I had to implement R2 from scratch and nothing
               | generated here would have helped me as even a starting
               | point.
               | 
               | Of course it wouldn't. You're a computer programmer.
               | There's no point for you to use ChatGPT to do what you
               | already know how to do.
               | 
               | > The implementation generated not only saves things
               | directly to disk
               | 
               | There is nothing 'incorrect' about that, given my initial
               | problem statement.
               | 
               | > Additionally, it makes a key mistake which is that
               | uploading isn't a form but is the body of the request so
               | it's already unable to have a real S3 client connect.
               | 
               | Again.. look at the prompt. I asked it to generate an
               | object storage system, not an S3-compatible one.
               | 
               | It seems you're the one hallucinating.
               | 
               | EDIT: ChatGPT says: In short, the feedback likely stems
               | from the implicit expectation of S3 API standards, and
               | the discrepancy between that and the multipart form
               | approach used in the code.
               | 
               | and
               | 
               | In summary, the expectation of S3 compatibility was a
               | bias, and he should have recognized that the
               | implementation was based on our explicitly discussed
               | requirements, not the implicit ones he might have
               | expected.
        
               | vlovich123 wrote:
               | > There's no point for you to use ChatGPT to do what you
               | already know how to do.
               | 
               | If it were more intelligent of course there would be. It
               | would catch mistakes I wouldn't have thought about, it
               | would output the work more quickly, etc. It's literally
               | worse than if I'd assigned a junior engineer to do some
               | of the legwork.
               | 
                | > ChatGPT says: In short, the feedback likely stems
                | from the implicit expectation of S3 API standards, and
                | the discrepancy between that and the multipart form
                | approach used in the code.
                | 
                | > In summary, the expectation of S3 compatibility was
                | a bias, and he should have recognized that the
                | implementation was based on our explicitly discussed
                | requirements, not the implicit ones he might have
                | expected
               | 
               | Now who's rationalizing. I was pretty clear in saying
               | implement S3.
        
               | anon291 wrote:
               | > Now who's rationalizing. I was pretty clear in saying
               | implement S3.
               | 
               | In general, I don't deny the fact that humans fall into
               | common pitfalls, such as not reading the question. As I
               | pointed out this is a common human failing, a
               | 'hallucination' if you will. Nevertheless, my failing to
               | deliver that to chatgpt should not count against chatgpt,
               | but rather me, a humble human who recognizes my failings.
               | And again, this furthers my point that people hallucinate
               | regularly, we just have a social way to get around it --
               | what we're doing right now... discussion!
        
               | vlovich123 wrote:
               | My reply was purely around ChatGPT's response which I
               | characterized as a rationalization. It clearly was
               | following the S3 template since it copied many parts of
               | the API but then failed to call out if it was deviating
               | and why it made decisions to deviate.
        
         | achierius wrote:
         | In the same sense (though to greater extent) that calculators
         | are, sure. Calculators can also far exceed human capacity to,
         | well, calculate. LLMs are similar: spikes of capacity in
         | various areas (bulk summarization, translation, general recall,
         | ...) that humans could never hope to match, but not capable of
         | beating humans at a more general range of tasks.
        
           | anon291 wrote:
           | > humans could never hope to match, but not capable of
           | beating humans at a more general range of tasks.
           | 
           | If we restrict ourselves only to language (LLMs are at a
           | disadvantage because there is no common physical body we can
           | train them on at the present moment... that will change), I
           | think LLMs beat humans for most tasks.
        
         | Workaccount2 wrote:
         | They process the audio but they stumble enough with recall that
         | you cannot really trust it.
         | 
         | I had a problem where I used GPT-4o to help me with inventory
         | management, something a 5th grade kid could handle, and it kept
         | screwing up values for a list of ~50 components. I ended up
         | spending more time trying to get it to properly parse the input
         | audio (I read off the counts as I moved through inventory bins)
          | than if I had just done it manually.
         | 
         | On the other hand, I have had good success with having it write
         | simple programs and apps. So YMMV quite a lot more than with a
         | regular person.
        
           | anon291 wrote:
           | > They process the audio but they stumble enough with recall
           | that you cannot really trust it.
           | 
           | I will wave my arms wildly at the last eight years if the
           | claim is that humans do not struggle with recall.
        
             | th0ma5 wrote:
              | So are they human-like and therefore nothing special,
              | or are they superhuman magic? I never get the
              | equivocation: when people complain that there is no way
              | to objectively tell which output is right or wrong,
              | people either say they are getting better, or "they
              | work for me", or that people are just as bad. No they
              | aren't! Not in the same way these things are bad.
        
               | anon291 wrote:
               | Most people will confidently recount whatever narrative
               | matches their current actions. This is called
               | rationalization, and most people engage in it daily.
        
             | vlovich123 wrote:
             | I will wave my arms wildly if the claim is that LLM
             | struggle with recall is similar to human-like struggle with
             | recall. And since that's how we decide on truth, I win?
        
               | anon291 wrote:
               | what we call hallucination in LLMs is called
               | 'rationalization' for humans. The psychology shows that
               | most peoples do things out of habit and only after
               | they've done it will explain why the did it. This is most
               | obviously seen in split brain patients where the visual
               | fields are then separated. If you throw a ball towards
               | the left side of the person, the right brain will catch
               | the ball. if you then ask the person why they caught the
               | ball the left brain will make up a completely ridiculous
               | narrative as to why the hand moved (because it didn't
               | know there is a ball. This is a contrived example, but it
               | shows that human recollection of intent is often very
               | very wrong. There are studies that show this even in
               | people with whole brains.
        
               | vlovich123 wrote:
               | You're unfortunately completely missing the point. I
               | didn't say that human recall is perfect or that they
               | don't rationalize. And of course you can have extreme
               | denial of what's happening in front of you even in
               | healthy individuals. In fact, you see this in this thread
               | where either you or the huge number of people trying to
                | disillusion you from the maximal position you've staked
               | out on LLMs is wrong and one of us is incorrectly
               | rationalizing our position.
               | 
               | The point is that the ways in which it fails is
               | completely different from LLMs and it's different between
               | people whereas the failure modes for LLMs are all fairly
                | identical regardless of the model. Go ask an LLM to
                | draw you a wine glass filled to the brim, and it'll
                | keep insisting it does, even though it keeps drawing
                | one half-filled, and it will agree that the one it
                | drew doesn't have the characteristics it says such a
                | drawing would need and _still_ output the exact same
                | drawing. Most people would not fail at the task in
                | that way.
        
               | anon291 wrote:
               | > In fact, you see this in this thread where either you
                | or the huge number of people trying to disillusion you
               | from the maximal position you've staked out on LLMs is
               | wrong and one of us is incorrectly rationalizing our
               | position.
               | 
               | I by no means have a 'maximal' position. I have said that
               | they exceed the intelligence and ability of the vast
               | majority of the human populace when it comes to their
               | singular sense and action (ingesting language and
               | outputting language). I fully stand by that, because it's
               | true. I've not claimed that they exceed everyone's
               | intelligence in every area. However, their ability to
               | synthesize wildly different fields is well beyond most
                | humans' ability. Yes, I do believe we've crossed the
               | tipping point. As it is, these things are not noticeable
               | except in retrospect.
               | 
               | > The point is that the ways in which it fails is
               | completely different from LLMs and it's different between
               | people whereas the failure modes for LLMs are all fairly
               | identical
               | 
               | I disagree with the idea that human failure modes are
               | different between people. I think this is the result of
               | not thinking at a high enough level. Human failure modes
               | are often very similar. Drama authors make a living off
               | exploring human failure modes, and there's a reason why
               | they say there are no new stories.
               | 
               | I agree that Human and LLM failure modes are different,
               | but that's to be expected.
               | 
               | > regardless of the model
               | 
               | As far as I'm aware, all LLMs in common use today use a
               | variant of the transformer. Transformers have much
               | different pitfalls compared to RNNs (RNNs are
                | particularly bad at recall, for example).
               | 
               | > Go ask an LLM to draw you a wine glass filled to the
               | brim and it'll keep insisting it does even though it
               | keeps drawing one half-filled and agree that the one it
               | drew doesn't have the characteristics it says such a
               | drawing would need and still output the exact same
               | drawing. Most people would not fail at the task in that
               | way.
               | 
               | Most people can't draw very well anyway, so this is just
               | proving my point.
        
               | vlovich123 wrote:
               | > Most people can't draw very well anyway, so this is
               | just proving my point.
               | 
               | And you're proving my point. The ways in which the people
               | would fail to draw the wine glass are different from the
               | LLM. The vast majority of people would fail to reproduce
               | a photorealistic simile. But the vast majority of people
               | would meet the requirement of drawing it filled to the
               | brim. The LLMs absolutely succeed at the quality of the
               | drawing but absolutely fail at meeting human
               | specifications and expectations. Generously, you can say
               | it's a different kind of intelligence. But saying it's
               | more intelligent than humans requires you to use a
               | drastically different axis akin to the one you'd use
               | saying that computers are smarter than humans because
               | they can add two numbers more quickly.
        
               | anon291 wrote:
               | At no point did I say humans and LLMs have the same
               | failure modes.
               | 
               | > But the vast majority of people would meet the
               | requirement of drawing it filled to the brim.
               | 
               | But both are failures, right? It's just a cognitive bias
               | that we don't expect artistic ability of most people.
               | 
               | > But saying it's more intelligent than humans requires
               | you to use a drastically different axis
               | 
               | I'm not going to rehash this here, but as I said
               | elsewhere in this thread, intelligences are different.
               | There's no one metric, but for many common human tasks,
               | the ability of the LLMs surpasses humans.
               | 
               | > saying that computers are smarter than humans because
               | they can add two numbers more quickly.
               | 
               | This is where I disagree. Unlike a traditional program,
               | both humans and LLMs can take unstructured input and
               | instruction. Yes, they can both fail and they fail
               | differently (or succeed in different ways), but there is
               | a wide gulf between the sort of structured computation a
               | traditional program does and an llm.
        
           | wahnfrieden wrote:
           | You must use it to make transcripts and then write code to
           | process the values in the transcripts
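            | 
            | i.e. something like this, assuming the transcript comes
            | back as lines in the shape "bin A: 12" (a made-up format;
            | adjust the pattern to whatever you actually dictate):
            | 
            |     import re
            | 
            |     def parse_counts(transcript):
            |         # pull "name: number" pairs out of the text
            |         pattern = r"([\w \-]+):\s*(\d+)"
            |         pairs = re.findall(pattern, transcript)
            |         return {n.strip(): int(c) for n, c in pairs}
            | 
            |     print(parse_counts("bin A: 12\nbin B: 7"))
            |     # {'bin A': 12, 'bin B': 7}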
        
           | XenophileJKO wrote:
           | Likely the issue is how you are asking the model to process
           | things. The primary limitation is the amount of information
           | (or really attention) they can keep in flight at any given
           | moment.
           | 
           | This generally means for a task like you are doing, you need
           | to have sign posts in the data like minute markers or
           | something that it can process serially.
           | 
           | This means there are operations that are VERY HARD for the
           | model like ranking/sorting. This requires the model to attend
           | to everything to find the next biggest item, etc. It is very
            | hard for the models currently.
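            | 
            | Concretely, that can be as simple as stamping each chunk
            | before it goes into the prompt (the one-chunk-per-minute
            | framing here is just an assumption for illustration):
            | 
            |     def add_markers(segments):
            |         # one transcript segment per minute of audio;
            |         # the [minute N] prefix gives the model a sign
            |         # post it can walk through serially
            |         return "\n".join(
            |             f"[minute {i}] {text}"
            |             for i, text in enumerate(segments))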
        
             | anon291 wrote:
             | > This means there are operations that are VERY HARD for
             | the model like ranking/sorting. This requires the model to
             | attend to everything to find the next biggest item, etc. It
              | is very hard for the models currently.
             | 
              | Ranking / sorting takes O(n log n) comparisons no
              | matter what. Given that a transformer does a fixed
              | amount of computation per forward pass before we
              | 'force' it to output an answer, there must be an M such
              | that beyond that length it cannot reliably sort a list.
              | This MUST be the case, and it can only be worked around
              | by running the model some indeterminate number of
              | times, but I don't believe we currently have any
              | architecture to do that.
             | 
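              | To put a rough number on that: if one forward pass can
              | "spend" at most C comparison-like steps (C is an
              | illustrative budget here, standing in for whatever the
              | fixed depth and context length allow), the longest list
              | it could sort in a single pass satisfies n*log2(n) <= C:
              | 
              |     import math
              | 
              |     def max_sortable(budget):
              |         # largest n with n * log2(n) <= budget
              |         n = 2
              |         while (n + 1) * math.log2(n + 1) <= budget:
              |             n += 1
              |         return n
              | 
              |     print(max_sortable(1_000))      # 140
              |     print(max_sortable(1_000_000))  # about 62,700
              | 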
             | Note that humans have the same limitation. If you give
             | humans a time limit, there is a maximum number of things
             | they will be able to sort reliably in that time.
        
               | christianqchung wrote:
               | Transformers absolutely do not run in constant time by
               | any reasonable definition, no matter what your point is.
        
               | anon291 wrote:
                | They absolutely do for a given sequence size. All
                | models have max context lengths, so the computation is
                | bounded by a constant.
        
         | simion314 wrote:
          | So what? I can write a script that can do in a minute some
          | job you couldn't do in 1000 years.
          | 
          | Singularity means something very specific: if your AI can
          | build a smarter AI than itself, by itself, and that AI can
          | also build a new smarter AI, then you have a singularity.
          | 
          | You do not have a singularity if an LLM can solve more math
          | problems than the average Joe, or if it can answer more
          | trivia questions than a random person. Even if you have an
          | AI better than all humans combined at Tic Tac Toe, you
          | still do not have a singularity. IT MUST build a smarter AI
          | than itself and then iterate on that.
        
           | anon291 wrote:
            | > Singularity means something very specific: if your AI
            | can build a smarter AI than itself, by itself, and that
            | AI can also build a new smarter AI, then you have a
            | singularity.
           | 
            | When I was at Cerebras, I fed a description of the custom
            | ISA into our own model and asked it to generate kernels
            | (my job), and it was surprisingly good.
        
             | simion314 wrote:
             | >When I was at Cerebras, I fed in a description of the
             | custom ISA into our own model and asked it to generate
             | kernels (my job), and it was surprisingly good
             | 
              | And? Was it actually better than what, say, the top 3
              | people in this field would create if they worked on it?
              | These models are better at CSS than me, so what? I am
              | bad at CSS. But all the top models could not solve a
              | math limit from my son's homework, so we had to use
              | good old forums to have people give us some hints. For
              | sure the models can solve more math limits than the
              | average person, who probably can't solve a single one.
        
               | anon291 wrote:
               | No not better than the top 3.
               | 
                | > For sure the models can solve more math limits than
                | the average person, who probably can't solve a single
                | one.
               | 
               | Some people are domain experts. The pretrained GPTs are
               | certainly not (nor are they trained to be).
               | 
               | Some people are polymaths but not domain experts. This is
               | still impressive, and where the GPTs fall.
               | 
               | The final conclusion I have is this: These models
               | demonstrate above average understanding in a plethora of
               | widely disparate fields. I can discuss mathematics,
               | computation, programming languages, etc with them and
               | they come across as knowledgeable and insightful to me,
               | and this is my field. Then, I can discuss with them
               | things I know nothing about, such as foreign languages,
               | literature, plant diseases, recipes, vacation
               | destinations, etc, and they're still good at that. If I
               | met a person with as much knowledge and ability to engage
               | as the model, I would think that person to be of very
               | high intelligence.
               | 
               | It doesn't bother me that it's not the best at anything.
               | It's good enough at most things. Yes, its results are not
               | always perfect. Its code doesn't work on the first try,
               | and it sometimes gets confused. But so do many polymaths
               | at a certain level. We don't tell them they're stupid
               | because of it.
               | 
               | My old physics professor was very smart in physics but
               | also a great pianist. But he probably cannot play as well
               | as Chopin. Does that make him an idiot? Of course not.
               | He's still above average in piano too! And that makes him
               | more of a genius than if he were just a great scientist.
        
               | simion314 wrote:
               | Agreed, there are uses for LLMs.
               | 
               | My point was about the Singularity, what it means and why
               | LLMs are not there.
               | 
               | So you missed my point? Was I not clear enough about what
               | I was talking about?
        
               | xtreme wrote:
               | This actually tells you why AI doesn't have to be better
               | than _all_ human experts, just the ones you can afford to
               | get together.
        
         | futureshock wrote:
         | I actually do think you have a solid point. These models fall
         | short of AGI, but that might be more of an OODA loop agentic
         | tweak than anything else.
         | 
         | At their core, the state of the art LLMs can basically do any
         | small to medium mental task better than I can, or get so close
         | to my level that I've found myself no longer thinking through
         | things the long way. For example, if I want to run some napkin
         | math on something, like I recently did some solar battery
         | charge time estimates, an LLM can get to a plausible answer in
         | seconds that would have taken me an hour.
         | 
         | So yeah, in many practical ways, LLMs are smarter than most
         | people in most situations. They have not yet far surpassed all
         | humans in all situations, and there are still some classes of
         | reasoning problems that they seem to struggle with, but to a
         | first order approximation, we do seem to be mostly there.
        
           | anon291 wrote:
           | > For example, if I want to run some napkin math on
           | something, like I recently did some solar battery charge time
           | estimates, an LLM can get to a plausible answer in seconds
           | that would have taken me an hour.
           | 
           | Exactly. I've used it to figure out geometry problems for
           | everyday things (carpentry), market sizing estimates for
           | business ideas, etc. Very fast turnaround. All the doomers in
           | this thread are just ignoring the amazing utility these
           | models provide.
        
           | jcims wrote:
           | >I actually do think you have a solid point. These models
           | fall short of AGI, but that might be more of an OODA loop
           | agentic tweak than anything else.
           | 
           | I think this is it. LLM responses feel like the unconsidered
           | ideas that pop into my head from nowhere. Like if someone
           | asks me how many states are in the United States, a number
           | pops out from somewhere. I don't just wire that to my mouth,
           | I also think about whether or not that's current info, have I
           | gotten this wrong in the past, how confident am I in it, what
           | is the cost of me providing bad information, etc etc etc.
           | 
           | If you effectively added all of those layers to an LLM
           | (something that I think the o1-preview and other approaches
           | are starting to do) it's going to be interesting to see what
           | the net capability is.
           | 
           | The other thing that makes me feel like we're 'getting there'
           | is using some of the fast models at groq.com. The information
           | is generated, in many cases, an order of magnitude faster
           | than I can consume it. The idea that models might be able to
           | start to engage through a much more sophisticated embedding
           | than English to pass concepts and sequences back and forth
           | natively is intriguing.
        
             | anon291 wrote:
             | > I think this is it. LLM responses feel like the
             | unconsidered ideas that pop into my head from nowhere.
             | 
             | You have to look at the LLM as the inner voice in your
             | head. We've kind of forced them into saying whatever they
             | think due to how we sample the output (next token
             | prediction), but in new architectures with pause tokens, we
             | let them 'think' and they show better judgement and
             | ability. These systems are rapidly going to improve and it
             | will be very interesting to see.
             | 
             | But this is another reason why I think they've surpassed
             | human intelligence. You have to look at each token as a
             | 'time step' in the inner thought process of some entity. A
             | real 'alive' entity has more 'ticks' than what their
             | actions would suggest. For example, human brains can
             | process at up to 10 FPS (a ~100 ms response time), but most
             | humans aren't saying 10 words a second. However, we've made
             | LLMs whose internal processes (i.e., their intuition) are
             | already superior. If we just gave them that final agentic
             | ability to not say anything and ponder (which researchers
             | are doing), their capabilities will increase exponentially.
             | 
             | > The other thing that makes me feel like we're 'getting
             | there' is using some of the fast models at groq.com.
             | 
             | Unlike perhaps many of the commentators here, I've been in
             | this field for a bit under a decade now, and was one of the
             | early compiler engineers at Groq. Glad you're finding it
             | useful. It's amazing stuff.
        
         | nisarg2 wrote:
         | At the end of the response they forget everything. They need to
         | be fed the entire text for them to know anything about it the
         | next time. That is not surpassing even feline intelligence.
        
           | andai wrote:
           | A genius can have anterograde amnesia and still be a genius.
        
           | anon291 wrote:
           | If we did to cats what we do to GPT models, that would be
           | animal abuse.
           | 
           | That is to say, if we want to extend this analogy, the model
           | is 'killed' after each round. This is hardly a criticism of
           | the underlying technology.
           | 
           | Going back to feeding the entire input: that is not really
           | true. There are a dozen ways to not do that these days.
        
         | throwaway106382 wrote:
         | Can we all agree that chainsaws far surpass human intelligence
         | now? I mean, you can chop down thousands of trees in less time
         | than a single person could cut down even one. I think the
         | singularity has passed.
        
           | anon291 wrote:
           | Cutting down a tree is not intelligence, but I think it's
           | been well accepted for more than a century that machines
           | surpass human physical capability, yes. There were many
           | during the industrial revolution who denied that this was
           | going to be the case, just like what we're seeing here.
        
         | andai wrote:
         | Hijacking thread to ask: how would we know? Another
         | uncomfortable issue is the question of sentience. Models
         | claimed they were sentient years ago, but this was dismissed as
         | "mimicking patterns in the training data" (fair enough) and the
         | training was modified to forbid them from doing that.
         | 
         | But if it does happen some day, how will we know? What are the
         | chances that the first sentient AI will be accused of just
         | mimicking patterns?
         | 
         | Indeed with the current training methodology it's highly likely
         | that the first sentient AI will be unable to even let us know
         | it's sentient.
        
           | anon291 wrote:
           | We couldn't know. Humans mimic patterns. The claims that
           | LLMs aren't smart because they don't generate anything new
           | fall completely flat for me. If you look back far enough most
           | humans generate nothing new. For example, even novel ideas
           | like Einstein's theory of relativity are re-iterations of
           | existing ideas. If you want to be pedantic, one can trace
           | back the majority of ideas, claim that each incremental step
           | was 'not novel, but just recollection' and then make the
           | egregious claim that humanity has invented nothing.
           | 
           | > But if it does happen some day, how will we know? What are
           | the chances that the first sentient AI will be accused of
           | just mimicking patterns?
           | 
           | Leaving questions of sentience aside (since we don't even
           | really know _what_ that is) and focusing on intelligence, the
           | truth is that we will probably not know until many decades
           | later.
        
             | cdblades wrote:
             | But you just made a strong claim about something you are
             | here saying we can't know?
        
               | anon291 wrote:
               | I believe we have passed a technological singularity.
               | There is no consensus as you can see here. I believe in a
               | few decades there will be consensus.
               | 
               | Intelligence and technological singularities are
               | observable things.
               | 
               | Sentience is not.
        
         | hehehheh wrote:
         | Computers can do stuff humans have struggled with since the
         | abacus. A 386 PC can do mathematical calculations a human
         | couldn't do in a lifetime.
        
       | aliljet wrote:
       | This is fantastic news. I've been using
       | Qwen2.5-Coder-32B-Instruct with Ollama locally and it's honestly
        | such a breath of fresh air. I wonder if any of you have had a
       | moment to try this newer context length locally?
       | 
        | BTW, I can't effectively run this on my 2080 Ti, so I've just
        | loaded up the machine with classic RAM. It's not going to win any
        | races, but as they say, it's not the speed that matters, it's the
       | quality of the effort.
        
         | notjulianjaynes wrote:
         | Hi, are you able to use Qwen's 128k context length with Ollama?
          | Using AnythingLLM + Ollama and a GGUF version, I kept getting
         | an error message with prompts longer than 32,000 tokens.
         | (summarizing long transcripts)
        
           | syntaxing wrote:
            | The famous Daniel Han (same person who made Unsloth and
            | fixed Gemma/Llama bugs) mentioned something about this on
           | reddit and offered a fix. https://www.reddit.com/r/LocalLLaMA
           | /comments/1gpw8ls/bug_fix...
        
             | zargon wrote:
             | After reading a lot of that thread, my understanding is
             | that yarn scaling is disabled intentionally by default in
             | the GGUFs, because it would degrade outputs for contexts
             | that do fit in 32k. So the only change is enabling yarn
             | scaling at 4x, which is just a configuration setting. GGUF
             | has these configuration settings embedded in the file
             | format for ease of use. But you should be able to override
             | them without downloading an entire duplicate set of weights
             | (12 to 35 GB!). (It looks like in llama.cpp the override-kv
             | option can be used for this, but I haven't tried it yet.)
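              | 
              | If I'm reading the llama.cpp help output right, the
              | override would look something like this (untested; the
              | exact flags can vary by version, and the model file name
              | here is just a placeholder):
              | 
              |   llama-cli -m qwen2.5-coder-32b-instruct-q5_k_m.gguf \
              |     -c 131072 --rope-scaling yarn --rope-scale 4 \
              |     --yarn-orig-ctx 32768 -f long_prompt.txt
              | 
              | i.e. enable yarn at 4x over the 32k training context at
              | load time, instead of downloading a second set of weights
              | with the scaling baked into the metadata.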
        
               | syntaxing wrote:
               | Oh super interesting, I didn't know you can override this
               | with a flag on llama.cpp.
        
             | notjulianjaynes wrote:
              | Yeah unfortunately that's the exact model I'm using (Q5
              | version). What I've been doing is first loading the
              | transcript into the vector database, and then giving it a
              | prompt that's like "summarize the transcript below: <full
              | text of transcript>". This works surprisingly well except
              | for one transcript I had which was of a 3 hour meeting
              | that was, per an online calculator, about 38,000 tokens.
              | Cutting the text up into 3 parts and pretending each was a
              | separate meeting* led to a bunch of hallucinations for
              | some reason.
             | 
             | *In theory this shouldn't matter much for my purpose of
             | summarizing city council meetings that follow a predictable
             | format.
        
         | ipsum2 wrote:
         | The long context model has not been open sourced.
        
         | lukev wrote:
          | I ran a couple of needle-in-a-haystack-type queries with just
          | a 32k context length, and was very much not impressed. It
          | often failed to find facts buried in the middle of the prompt
          | that were stated almost identically to the question being
          | asked.
          | 
          | It's cool that these models are getting such long contexts,
          | but performance definitely degrades the longer the context
          | gets, and I haven't seen this characterized or quantified very
          | well anywhere.
        
       | lostmsu wrote:
       | Is this model downloadable?
        
         | gkaye wrote:
         | They are not clear about this (which is annoying), but it seems
         | it will not be downloadable. No weights have been released so
         | far, and nothing in this post mentions plans to do so going
         | forward.
        
       ___________________________________________________________________
       (page generated 2024-11-18 23:00 UTC)