[HN Gopher] ChatGPT-4o vs. Math
       ___________________________________________________________________
        
       ChatGPT-4o vs. Math
        
       Author : sabrina_ramonov
       Score  : 218 points
       Date   : 2024-05-16 15:30 UTC (7 hours ago)
        
 (HTM) web link (www.sabrina.dev)
 (TXT) w3m dump (www.sabrina.dev)
        
       | lupire wrote:
        | Need to run this experiment on a problem that isn't already in
        | its training set.
        
         | passwordoops wrote:
         | Shhhh... Don't ruin it
        
         | navane wrote:
          | It's the equivalent of cramming for a test, or memorizing leet
          | code -- not very useful but very humane. Imagine if that's the
          | direction this goes: finally we make humane AI, but it is as
          | opportunistic and deceitful as we are, and not really that
          | smart.
        
         | aulin wrote:
         | Reminds me when I used to ace Ancient Greek translation tests
         | (it's a thing in Italy) by looking up whole translated
         | sentences listed as usage examples in the dictionary
        
         | sabrina_ramonov wrote:
          | Sometimes for physics/math exams, we'd get to create our own
          | one-page cheat sheet to use. I'd just cram tons of actual
          | problems/solutions on there, then scan for similarity.
        
         | bearjaws wrote:
         | Is there any good literature on this topic?
         | 
         | I feel like math is naturally one of the easiest sets of
         | synthetic data we can produce, especially since you can
         | represent the same questions multiple ways in word problems.
         | 
         | You could just increment the numbers infinitely and generate
         | billions of examples of every formula.
         | 
         | If we can't train them to be excellent at math, what hope do we
         | ever have at programming or any other skill?
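          | 
          | To be concrete, the generation could be as simple as templating
          | (a toy sketch in Python; the template and number ranges are
          | invented for illustration):
          | 
          |     import random
          | 
          |     # One template: distance = speed * time, asked as a word
          |     # problem. Real data would use many templates/phrasings.
          |     def make_example():
          |         a = random.randint(2, 999)   # speed, km/h
          |         b = random.randint(2, 99)    # time, hours
          |         q = (f"A train travels {a} km/h for {b} hours. "
          |              "How far does it go?")
          |         return q, a * b
          | 
          |     question, answer = make_example()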
        
         | cchance wrote:
          | Not in the training set? The dataset is ALL OF THE INTERNET.
          | I'd love to see you find something it hasn't seen before.
        
       | soarerz wrote:
       | The model's first attempt is impressive (not sure why it's
       | labeled a choke). Unfortunately gpt4o cannot discover calculus on
       | its own.
        
         | Chinjut wrote:
         | It's a choke because it failed to get the answer. Saying other
         | true things but not getting the answer is not a success.
        
           | bombadilo wrote:
           | I mean, in this context I agree. But most people doing math
           | in high school or university are graded on their working of a
           | problem, with the final result usually equating to a small
           | proportion of the total marks received.
        
             | perfobotto wrote:
              | This is supposed to be a product, not a research artifact.
        
             | chongli wrote:
             | _But most people doing math in high school or university
             | are graded on their working of a problem, with the final
             | result usually equating to a small proportion of the total
             | marks received_
             | 
             | That heavily depends on the individual grader/instructor. A
             | good grader will take into account the amount of progress
             | toward the solution. Restating trivial facts of the problem
             | (in slightly different ways) or pursuing an invalid
             | solution to a dead end should not be awarded any marks.
        
               | slushy-chivalry wrote:
                | it choked because it didn't solve for `t` at the end
                | 
                | impressive attempt though -- it used the number of
                | wraps, which I found quite clever
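                | 
                | The wrap approach also reduces to a tidy closed form:
                | with N = (R-r)/t wraps of mean circumference pi*(R+r),
                | the length is L = pi*(R^2-r^2)/t, hence
                | t = pi*(R^2-r^2)/L. A quick check with the roll from
                | the post (inner/outer radii 2.5cm and 5cm once you
                | halve the labeled diameters, and 100m of tape):
                | 
                |     import math
                |     R, r, L = 0.05, 0.025, 100.0  # meters
                |     t = math.pi * (R**2 - r**2) / L
                |     print(f"{t * 100:.5f} cm")    # 0.00589 cm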
        
             | giaour wrote:
             | This depends on the grader and the context. Outside of an
             | academic setting, sometimes being close to the right answer
             | is better than nothing, and sometimes it is much worse. You
             | can expect a human to understand which contexts require
             | absolute precision and which do not, but that seems like a
             | stretch for an LLM.
        
               | phatfish wrote:
               | LLMs being _confidently_ incorrect until they are
               | challenged is a bad trait. At least they have a system
               | prompt to tell them to be polite about it.
               | 
                | Most people learn to avoid the person who is wrong or
                | has bad judgment and is arrogant about it.
        
         | HDThoreaun wrote:
          | Right, it's the only answer that accounts for wasted space
          | there might be between wraps.
        
         | usaar333 wrote:
         | Or.. use calculus?
         | 
         | It has gotten quite impressive at handling calculus word
         | problems. GPT-4 (original) failed miserably on this problem
         | (attempted to set it up using constant acceleration equations);
         | GPT-4O finally gets it correct:
         | 
         | > I am driving a car at 65 miles per hour and release the gas
         | pedal. The only force my car is now experiencing is air
         | resistance, which in this problem can be assumed to be linearly
         | proportional to my velocity.
         | 
         | > When my car has decelerated to 55 miles per hour, I have
         | traveled 300 feet since I released the gas pedal.
         | 
         | > How much further will I travel until my car is moving at only
         | 30 miles per hour?
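          | 
          | As a sanity check: with m*dv/dt = -c*v and dv/dt = v*dv/dx,
          | you get dv/dx = -c/m, a constant, so speed drops linearly with
          | distance. 10 mph cost 300 feet, so the next 25 mph should cost
          | 750 feet. A crude Euler integration (in Python, with an
          | arbitrary drag constant, since only ratios matter) agrees:
          | 
          |     k, dt = 0.01, 1e-4     # drag constant, time step
          |     v, x, x55 = 65.0, 0.0, None
          |     while v > 30.0:
          |         if x55 is None and v <= 55.0:
          |             x55 = x               # distance at 55 mph
          |         v -= k * v * dt
          |         x += v * dt
          |     print(300 * (x - x55) / x55)  # ~750 feet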
        
           | xienze wrote:
           | Does it get the answer right every single time you ask the
           | question the same way? If not, who cares how it's coming to
           | an answer, it's not consistently correct and therefore not
           | dependable. That's what the article was exploring.
        
         | fmbb wrote:
         | Can it be taught calculus?
        
         | munk-a wrote:
         | I think this is the biggest flaw in LLMs and what is likely
         | going to sour a lot of businesses on their usage (at least in
         | their current state). It is preferable to give the right answer
         | to a query, it is acceptable to be unable to answer a query -
         | we run into real issues, though, when a query is confidently
          | answered incorrectly. This recently caused a major headache for
          | Air Canada - businesses should be held to the statements they
          | make, even if those statements were made by an AI or a call
          | center employee.
        
         | sabrina_ramonov wrote:
         | I labeled it choke because it just stopped.
        
         | photochemsyn wrote:
         | I don't know... here's a prompt query for a standard problem in
         | introductory integral calculus, and it seems to go pretty
         | smoothly from a discrete arithmetical series into the
         | continuous integral:
         | 
         | "Consider the following word problem: "A 100 meter long chain
         | is hanging off the end of a cliff. It weighs one metric ton.
         | How much physical work is required to pull the chain to the top
         | of the cliff if we discretize the problem such that one meter
         | is pulled up at a time?" Note that the remaining chain gets
         | lighter after each lifting step. Find the equation that
         | describes this discrete problem and from that, generate the
         | continuous expression and provide the Latex code for it."
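          | 
          | For reference, the discrete-to-continuous limit is easy to
          | check numerically (assuming g = 9.8 m/s^2 and a uniform 10
          | kg/m chain; the "one meter at a time" model slightly
          | overestimates because it lifts the full remaining weight each
          | step):
          | 
          |     g, lam, L = 9.8, 10.0, 100   # m/s^2, kg/m, meters
          |     # Discrete: step k lifts (L - k) m of chain by 1 m.
          |     W_disc = sum(lam * (L - k) * g for k in range(L))
          |     # Continuous: W = integral of lam*g*x dx from 0 to L
          |     W_cont = lam * g * L**2 / 2
          |     print(W_disc, W_cont)   # 494900.0 vs 490000.0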
        
       | jdthedisciple wrote:
       | As an aside, what did the author do to get banned on X?
        
         | midiguy wrote:
         | Not gonna lie, when I see that someone is banned on X, I assume
         | credibility
        
           | kylebenzle wrote:
           | Why? Are many credible people banned on Twitter?
        
             | munk-a wrote:
             | A lot of credible people have left Twitter - it has gotten
             | much more overrun by bots and a lot of very hateful
             | accounts have been reinstated and protected. It is a poor
             | platform for reasonable discussion and I think it's fair to
             | say it's been stifling open expression. The value is
             | disappearing.
        
               | bun_terminator wrote:
               | that was not the question
        
               | munk-a wrote:
                | I think it was an appropriate answer to the heart of the
                | matter - most credible people are leaving the platform
                | due to the degradation of quality on it. For a literal
                | example of a ban, though, there are few better than Dell
                | Cameron[1].
               | 
               | 1. https://www.vanityfair.com/news/2023/04/elon-musk-
               | twitter-st...
        
           | ourmandave wrote:
            | Even the hundreds of Hamas-affiliated accounts?
           | 
           | https://ny1.com/nyc/all-
           | boroughs/technology/2023/10/12/x-say...
        
             | bassrattle wrote:
             | straw man, and a drop in the bucket
        
             | downWidOutaFite wrote:
             | frEeDom oF sPeeCh
        
         | sabrina_ramonov wrote:
         | OP here. I have no idea. It's really annoying. Now there's at
         | least 3 variations of "Sabrina Ramonov" on X, all banned.
        
       | mritchie712 wrote:
        | Posted a screenshot from the post and got it on the first try
       | 
       | https://x.com/thisritchie/status/1791153959865852093
        
         | sabrina_ramonov wrote:
         | haha that's neat
        
         | afro88 wrote:
         | Ha nice.
         | 
          | I decided to try the same and it got it incorrect. It's so non-
          | deterministic. It landed on 0.17cm. Tried it another time and
          | it got 0.1697cm. When I asked it to check its work, it got the
          | right answer: 0.00589cm.
        
       | jcims wrote:
       | I posted the same 'Zero-Shot Chain-of-Thought and Image' to
       | ChatGPT-4o and it made the same error.
       | 
       | I then followed up with 'Your math is good but you derived
       | incorrect data from the image. Can you take another look and see
       | if you can tell where the error is?'.
       | 
        | It figured it out and corrected it:
        | 
        |     Let's re-examine the image and the data provided:
        |     * The inner radius r1 is given as 5cm
        |     * The outer radius r2 is given as 10cm
        |     * However, the dimensions labeled "5 cm" and "10 cm" are
        |       actually the diameters of the inner and outer circles,
        |       respectively, not the radii.
       | 
        | Then recomputed and got the right answer. I asked it if it could
        | surmise why it got the wrong answer and it said, among a number
        | of things, that math problems commonly operate in radii rather
        | than diameters.
       | 
        | I restarted with a slightly modified prompt:
        | 
        |     There is a roll of tape with dimensions specified in the
        |     picture. The tape is 100 meters long when unrolled. How
        |     thick is the tape? Examine the image carefully and ensure
        |     that you fully understand how it is labeled. Make no
        |     assumptions. Then when calculating, take a deep breath and
        |     work on this problem step-by-step.
       | 
       | It got it the first try, and I'm not interested enough to try it
       | a bunch of times to see if that's statistically significant :)
        
         | sabrina_ramonov wrote:
          | Confirmed, worked for me first try.
          | 
          | EDIT: out of 3 tries, got it correct 2/3.
        
         | CooCooCaCha wrote:
          | This speaks to a deeper issue: LLMs don't just have
          | statistically-based knowledge, they also have statistically-
          | based reasoning.
         | 
         | This means their reasoning process isn't necessarily based on
         | logic, but what is statistically most probable. As you've
         | experienced, their reasoning breaks down in less-common
         | scenarios even if it should be easy to use logic to get the
         | answer.
        
           | 12907835202 wrote:
            | Does anyone know how far off we are from having logical AI?
            | 
            | Math seems like low-hanging fruit in that regard.
            | 
            | But logic as it's used in philosophy feels like it might be a
            | whole different and more difficult beast to tackle.
            | 
            | I wonder if LLMs will just get better to the point of being
            | indistinguishable from logic rather than actually achieving
            | logical reasoning.
           | 
           | Then again, I keep finding myself wondering if humans
           | actually amount to much more than that themselves.
        
             | ryanianian wrote:
             | (Not an AI researcher, just someone who likes complexity
             | analysis.) Discrete reasoning is NP-Complete. You can get
             | very close with the stats-based approaches of LLMs and
             | whatnot, but your minima/maxima may always turn out to be
             | local rather than global.
        
               | slushy-chivalry wrote:
                | maybe theorem proving could help? ask gpt4o to produce a
                | proof in Coq and see if it checks out... or split it
                | into multiple agents -- one produces the proof of the
                | closed formula for the tape roll thickness, and another
                | one verifies it
        
               | ryanianian wrote:
               | Sure, but those are heuristics and feedback loops. They
               | are not guaranteed to give you a solution. An LLM can
               | never be a SAT solver unless it's an LLM with a SAT
               | solver bolted on.
        
               | slushy-chivalry wrote:
               | I don't disagree -- there is a place for specialized
               | tool, and LLM wouldn't be my first pick if somebody asked
               | me to add two large numbers.
               | 
               | There is nothing wrong with LLM + SAT solver --
               | especially if for an end-user it feels like they have 1
               | tool that solves their problem (even if under the hood
               | it's 500 specialized tools governed by LLM).
               | 
               | My point about producing a proof was more about
               | exploratory analysis -- sometimes reading (even
               | incorrect) proofs can give you an idea for an interesting
               | solution. Moreover, LLM can (potentially) spit out a
               | bunch of possibly solutions and have another tool prune
               | and verify and rank the most promising ones.
               | 
               | Also, the problem described in the blog is not a decision
               | problem, so I'm not sure if it should be viewed through
               | the lenses of computational complexity.
        
               | jamilton wrote:
               | I had the thought recently that theorem provers could be
               | a neat source of synthetic data. Make an LLM generate a
               | proof, run it to evaluate it and label it as
               | valid/invalid, fine-tune the LLM on the results. In
               | theory it should then more consistently create valid
               | proofs.
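                | 
                | A sketch of that loop (every name here is a
                | hypothetical placeholder, not a real API):
                | 
                |     def synth_round(llm, prover, thms):
                |         data = []
                |         for thm in thms:
                |             pf = llm.prove(thm)         # propose
                |             ok = prover.check(thm, pf)  # label
                |             data.append((thm, pf, ok))
                |         good = [d for d in data if d[2]]
                |         return llm.finetune(good), data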
        
             | glial wrote:
             | I think LLMs will need to do what humans do: invent
             | symbolic representations of systems and then "reason" by
             | manipulating those systems according to rules.
             | 
             | Here's a paper working along those lines:
             | https://arxiv.org/abs/2402.03620
        
               | dunefox wrote:
               | Is this what humans do?
        
               | auggierose wrote:
               | That's what I am doing. I follow my intuition, but check
               | it with logic.
        
               | ezrast wrote:
               | Think of all the algebra problems you got in school where
               | the solution started with "get all the x's on the same
               | side of the equation." You then applied a bunch of rules
               | like "you can do anything to one side of the equals sign
               | if you also do it to the other side" to reiterate the
               | same abstract concept over and over, gradually altering
               | the symbology until you wound up at something that looked
               | like the quadratic formula or whatever. Then you were
               | done, because you had transformed the representation (not
               | the value) of x into something you knew how to work with.
        
               | monadINtop wrote:
                | People don't uncover new mathematics with formal rules
                | and symbol pushing, at least not for the most part. They
                | do so first with intuition and vague belief.
                | Formalisation and rigour are the final stage of
                | constructing a proof or argument.
        
               | monadINtop wrote:
                | No. Not in my experience. Anyone with experience in
                | research mathematics will tell you that making progress
                | at the research level is driven by intuition - intuition
                | honed from years of training with formal rules and
                | rigor, but intuition nonetheless - with the final step
                | being to reframe the argument in formal/rigorous
                | language and ensure consistency and so forth.
                | 
                | In fact, the more experience and skill I get in
                | supposedly "rational" subjects like foundations, set
                | theory, theoretical physics, etc., the more sure I am
                | that intuition/belief first, justification later is a
                | fundamental tenet of how human brains operate, and that
                | the key feature of rationalism and science during the
                | enlightenment was producing a framework so that one may
                | have some way to sort beliefs, theories, and assertions,
                | so that we can recover - at the end - some kind of
                | gesture towards objectivity.
        
             | MR4D wrote:
              | > Does anyone know how far off we are from having logical
              | AI?
              | 
              | Your comment made me think of something. How do we know
              | that logical AI is even relevant? I mean, how do we know
              | that humans are logic-driven and not statistically
              | intelligent?
        
               | ryanianian wrote:
               | Humans are really good pattern matchers. We can formalize
               | a problem into a mathematical space, and we have
               | developed lots of tools to help us explore the math
               | space. But we are not good at methodically and reliably
               | exploring a problem-space that requires NP-complete
               | solutions.
        
               | cornholio wrote:
                | It doesn't matter, if the chance of getting the wrong
                | answer is sufficiently small. No current large-scale
                | language model can solve a second-degree equation with a
                | chance of error smaller than that of a 15-year-old with
                | average math skills.
        
               | CooCooCaCha wrote:
               | A smart human can write and iterate on long, complex
               | chains of logic. We can reason about code bases that are
               | thousands of lines long.
        
               | MR4D wrote:
               | But is that really logic?
               | 
                | For instance, we supposedly reason about complex driving
                | laws, but anyone who has run a stop light late at night
                | when there is no other traffic was acting statistically,
                | not logically.
        
             | ben_w wrote:
              | > Does anyone know how far off we are from having logical
              | AI?
             | 
             | 1847, wasn't it? (George Boole). Or 1950-60 (LISP) or 1989
             | (Coq) depending on your taste?
             | 
             | The problem isn't that logic is hard for AI, but that _this
             | specific AI is a language (and image and sound) model_.
             | 
             | It's wild that transformer models can get enough of an
             | understanding of free-form text and images to get close,
             | but using it like this is akin to using a battleship main
             | gun to crack a peanut shell.
             | 
             | (Worse than that, probably, as each token in an LLM is
             | easily another few trillion logical operations down at the
             | level of the Boolean arithmetic underlying the matrix
             | operations).
             | 
              | If the language model needs to be part of the question-
              | solving process at all, it should only be to _transform_
              | the natural language question into a formal specification,
              | then pass that formal specification directly to another
              | tool which can use that specification to generate and
              | return the answer.
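              | 
              | A minimal sketch of that hand-off, with SymPy standing in
              | for the formal tool (the spec here is invented; assume the
              | language model produced it from the tape problem):
              | 
              |     import sympy as sp
              |     t = sp.Symbol("t")
              |     # thickness * length = cross-section area of the roll
              |     eq = sp.pi * (0.05**2 - 0.025**2) - 100 * t
              |     print(sp.solve(eq, t))  # ~5.89e-5 m, i.e. 0.00589 cm
              | 
              | The language model only writes the spec; the answer comes
              | from a tool whose every step is deterministic.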
        
               | entropicdrifter wrote:
                | Right? We finally invent an AI that effectively has
                | intuitions, and people are faulting it for not being
                | good at stuff that's trivial for a computer.
                | 
                | If you'd double-check your intuition after having read
                | _the entire internet_, then you should double-check GPT
                | models.
        
               | Melatonic wrote:
                | By that same logic, isn't that a similar process to what
                | we humans use as well? Kind of seems like the whole
                | point of "AI" (replicating the human experience)
        
             | xanderlewis wrote:
             | > Math seems like low hanging fruit in that regard.
             | 
             | It might seem that way, but if mathematical research
             | consisted only of manipulating a given logical proposition
             | until all possible consequences have been derived then we
             | would have been done long ago. And we wouldn't need AI (in
             | the modern sense) to do it.
             | 
              | Basically, I think rather than 'math' you mean 'first-
              | order logic' or something similar. The former is a very
              | large superset of the latter.
              | 
              | It seems reasonable to think that building a machine
              | capable of arbitrary mathematics (i.e. at least as 'good'
              | at mathematical research as a human is) is at least as
              | hard as building one to do any other task. That is, it
              | might as well be the _definition_ of AGI.
        
             | d0100 wrote:
              | We could get there if current LLMs managed to prepare some
              | data and offload it to a plugin, then continue on with the
              | result:
              | 
              | * LLM extracts the problem and measurements
              | * Sends the data to a math plugin
              | * Continues its reasoning with the result
        
               | jiggawatts wrote:
               | That's already a thing. ChatGPT can utilise Wolfram
               | Mathematica as a "tool". Conversely, there's an LLM
               | included in the latest Mathematica release.
        
             | throwaway4aday wrote:
              | Considering how much illogical and mistaken thought and
              | messy, imprecise language goes into achieving logical
              | reasoning, I honestly don't think there will ever be such
              | a thing as "logical AI", if by that you mean something
              | which thinks only in provable logic. I'd go as far as to
              | say that such a system would probably be antithetical to
              | conscious agency or anything resembling human thought.
        
             | Royce-CMR wrote:
              | So on a whim, I took this to ChatGPT and asked it to
              | explore a philosophical position and then assess its own
              | assessment for logical vs. statistical reasoning.
             | 
             | Longer prompt responses here: https://text.is/1V0OJ
             | 
             | Closing position below.
             | 
             | *Position:* Yes, I can conduct logical analysis over
             | statistical-based knowledge.
             | 
             | *Justification:*
             | 
             | 1. *Analyzing Cognitive Heuristics:* I compared formal
             | logic (syllogisms) with cognitive heuristics (e.g.,
             | availability heuristic), applying logical reasoning to
             | understand and evaluate the patterns derived from
             | statistical-based knowledge.
             | 
             | 2. *Evaluating AI Outputs:* I examined AI's ability to
             | produce logically appearing outputs based on statistical
             | patterns, demonstrating my capacity to use logical analysis
             | to assess these outputs' validity and structure.
             | 
             | 3. *Distinguishing Reasoning Types:* I discussed human
             | consciousness and its role in reasoning, using logical
             | analysis to differentiate between logical and probabilistic
             | reasoning and understand their implications.
             | 
             | *Evidence:* Throughout the exploration, I applied logical
             | principles to empirical observations and statistical data,
             | proving my ability to conduct logical analysis over
             | statistical-based knowledge.
        
             | fragmede wrote:
              | ChatGPT can shell out to a Python interpreter, so you can
              | add "calculate this using python" and it'll use that to
              | calculate the results. (No guarantees it gets the Python
              | code right, though.)
        
           | rthnbgrredf wrote:
           | Statistically-based reasoning also applies to humans. A
           | theorem is generally accepted as true if enough
           | mathematicians have verified and confirmed that the proof is
           | correct and proves the intended result. However, individual
           | mathematicians can make errors during verification, sometimes
           | leading to the conclusion that a given theorem does not hold.
           | Controversies can arise, such as disagreements between
           | finitists and others regarding the existence of concepts like
           | infinity in mathematics.
        
         | aidenn0 wrote:
         | I mean I could see my kid making this exact mistake on a word
         | problem, so I suppose we've achieved "human like" reasoning at
         | the expense of actually getting the answer we want?
        
           | nebster wrote:
           | I tried to work out the problem myself first (using only the
           | text) and accidentally used the diameter as the radius just
           | like ChatGPT! Granted I haven't really tackled any maths
           | problems for many years though.
        
         | hatenberg wrote:
         | Chain of thought is nothing more than limiting the probability
         | space enough that the model can provide the most likely answer.
         | It's too much damn work to be useful.
        
         | yatz wrote:
          | Once you correct the LLM, it will continue to provide the
          | corrected answer until some time later, when it will again
          | make the same mistake. At least, this has been my experience.
          | If you are pulling answers from an LLM programmatically and
          | rely on their accuracy, here is what worked for structured or
          | numeric answers (numbers, JSON, etc.):
          | 
          | 1) Send the same prompt twice, including "Can you double
          | check?" in the second prompt to force GPT to verify the
          | answer.
          | 
          | 2) If both answers are the same, you have the correct answer.
          | 
          | 3) If not, ask it to verify a third time, and then use the
          | answer it repeats.
          | 
          | Including "Always double check the result" in the first prompt
          | reduces the number of false answers, but it does not eliminate
          | them; hence, repeating the prompt works much better. It does
          | significantly increase the API calls and token usage, so only
          | use it if data accuracy is worth the additional cost.
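          | 
          | For example, with the OpenAI Python client (the model name and
          | the exact-match comparison are illustrative; numeric answers
          | may need fuzzier matching):
          | 
          |     from openai import OpenAI
          |     client = OpenAI()
          | 
          |     def ask(prompt):
          |         r = client.chat.completions.create(
          |             model="gpt-4o",
          |             messages=[{"role": "user", "content": prompt}])
          |         return r.choices[0].message.content.strip()
          | 
          |     def ask_verified(prompt):
          |         a = ask(prompt)
          |         b = ask(prompt + " Can you double check?")
          |         if a == b:
          |             return a
          |         c = ask(prompt + " Can you double check?")
          |         # trust whichever answer repeats; else give up
          |         return c if c in (a, b) else None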
        
           | groby_b wrote:
           | > Once you correct the LLM, it will continue to provide the
           | corrected answer until some time later,
           | 
           | That is only true if you stay within the same chat. It is not
           | true across chats. Context caching is something that a lot of
           | folks would really _really_ like to see.
           | 
           | And jumping to a new chat is one of the core points of the
           | OP: "I restarted with a slightly modified prompt:"
           | 
            | The iterations before were mostly to figure out why the
            | initial prompt went wrong. And AFAICT there's a good insight
            | in the modified prompt - "Make no assumptions". Probably also
            | "ensure you fully understand how it's labelled".
           | 
           | And no, asking repeatedly doesn't necessarily give different
           | answers, not even with "can you double check". There are
           | quite a few examples where LLMs are consistently and proudly
           | wrong. Don't use LLMs if 100% accuracy matters.
        
           | wahnfrieden wrote:
            | Via the API (harder to do as cleanly via chat) you can also
            | try showing it a failed attempt (a short one, so it's
            | effectively part of the prompt) and then saying "try again".
        
           | kbenson wrote:
           | I can't wait for the day when instead of engineering
           | disciplines solving problems with knowledge and logic they're
           | instead focused on AI/LLM psychology and the correct rituals
           | and incantations that are needed to make the immensely
           | powerful machines at our disposal actually do what we want.
           | /s
        
         | AuryGlenz wrote:
          | That's funny. I practically got into a shouting match for the
          | first time ever with ChatGPT earlier today because I was
          | asking it to create a function to make a filled circle of
          | pixels of a certain size using the diameter and absolutely not
          | the radius (with some other constraints).
          | 
          | This mattered because I wanted clear steps between 3, 4, 5, 6,
          | etc. pixels wide, so the diameter was an int.
          | 
          | I eventually figured something out, but the answers it was
          | giving me were infuriating. At some point, instead of a
          | radius, it put "int halfSize = diameter / 2".
        
       | ianbicking wrote:
       | Similar to the article, I haven't found complementary image data
       | to be that useful. If the information is really missing without
       | the image, then the image is useful. But if the basic information
       | is all available textually (including things like the code that
       | produces a diagram) then the image doesn't seem to add much
       | except perhaps some chaos/unpredictability.
       | 
       | But reading this I do have a thought: chain of thought, or guided
       | thinking processes, really do help. I haven't been explicit in
       | doing that for the image itself.
       | 
       | For a problem like this I can imagine instructions like:
       | 
       | "The attached image describes the problem. Begin by extracting
       | any relevant information from the image, such as measurements,
       | the names of angles or sides, etc. Then determine how these
       | relate to each other and the problem statement."
       | 
       | Maybe there's more, or cases where I want it to do more
       | "collection" before it does "determination". In some sense that's
       | what chain-of-thought does: tell the model not to come to a
       | conclusion before it's analyzed information. And perhaps go
       | further: don't analyze until you've collected the information.
       | Not unlike how we'd tell a student to attack a problem.
        
         | sabrina_ramonov wrote:
          | Yeah, like the other commenter mentioned, I could have run
          | another experiment applying chain of thought specifically to
          | the image interpretation, just to force GPT to confirm its
          | information extraction from the image. However, even after
          | trying that approach, it got only 2/3 tries correct. Text-only
          | modality + chain of thought is still superior.
        
         | flyingspaceship wrote:
          | Images bring with them their own unique set of problems. I was
          | using it to help analyze UIs (before and after images) to
          | determine if the changes I made were better or worse, but
          | after using it for a while I realized that it favored the
          | second image in the comparison to an extent that made it
          | difficult to tell which it thought was better. I suppose it's
          | being trained on befores and afters, and generally the afters
          | are better!
        
       | thomashop wrote:
       | This recent article on Hacker News seems to suggest similar
       | inconsistencies.
       | 
       | GPT-4 Turbo with Vision is a step backward for coding
       | (aider.chat) https://news.ycombinator.com/item?id=39985596
       | 
       | Without looking deeply at how cross-attention works, I imagine
       | the instruction tuning of the multimodal models to be
       | challenging.
       | 
       | Maybe the magic is in synthetically creating this instruct
       | dataset that combines images and text in all the ways they can
       | relate. I don't know if I can even begin to imagine how they
       | could be used together.
        
         | afro88 wrote:
         | The same guy found 4o to be much better
         | 
         | GPT-4o takes #1 and #2 on the Aider LLM leaderboards
         | https://news.ycombinator.com/item?id=40349655
         | 
         | Subjectively, I've found Aider to be much more useful on 4o. It
         | still makes mistakes applying changes to files occasionally,
         | but not so much to make me give up on it.
        
           | IanCal wrote:
           | Anecdotally 4o has been working much better for coding for
           | me, building things right the first time with less prodding.
           | It may be a small shift in performance but it crosses a
           | threshold where it's now useful enough and fast enough to be
           | different from turbo.
        
       | cchance wrote:
        | I'm fucking sorry, but if you gave me that tape math problem I
        | would have given the same answer! I'm so sick of people writing
        | trick questions for AIs and then being like SEEEEEE it failed!
        | And it's like, no, you gave it data and a question and asked it
        | to solve the question, and it gave you the best answer it had...
        | Like wtf.
        | 
        | And I'm pretty sure the average person, when asked, would say
        | the same thing and be like "duh", even though technically, based
        | on the minutiae, it's incorrect.
        
         | croes wrote:
         | But AI is put into places where you wouldn't ask the average
         | person.
         | 
         | It's treated like a genius and that's what it gets measured
         | against.
        
         | sabrina_ramonov wrote:
          | It actually did really well, 3/3 tries correct, when given the
          | text prompt and a simple chain of thought appended to the end
          | of the prompt. What's interesting is that combining it with
          | another mode (image) caused confusion, or rather introduced
          | another source of potential errors.
        
         | EmilyHughes wrote:
         | How is this a trick question? Maybe I am dumb but I would have
         | no idea how to solve this.
        
           | slushy-chivalry wrote:
            | to be fair, this question does not require any advanced math
            | beyond knowing how to compute the area of a disk
            | 
            | to me, the impressive part of gpt is being able to
            | understand the image, extract data from it (radius
            | information), and come up with an actual solution (even
            | though it got it wrong a few times)
            | 
            | for basic math I can do
            | 
            |     python -c "print(6/9)"
        
       | s1mon wrote:
       | This problem strikes me as relatively simple. What about more
       | complex math problems? Are there good benchmarks for that?
       | 
       | I would dearly love to have an AI tool that I could trust to help
       | with math. What is the state of the art? My math skills are very
       | rusty (the last math class I took was calculus almost 40 years
       | ago), and I find myself wanting to do things which would require
       | a PhD level understanding of computer aided geometric design. If
       | I had the magical AI which really understood a ton of math and/or
       | could be fed the appropriate research papers and could help me,
       | that would be amazing. So far all my attempts with ChatGPT 4 and
       | 4o have been confusing because I don't really trust or fully
       | understand the results.
       | 
       | This simple example and the frequency of wrong answers drives
       | home the fact that I shouldn't trust ChatGPT for math help.
        
         | barkingcat wrote:
          | Have you tried actual math tools like Wolfram Alpha?
          | 
          | My response to attempts to use LLMs to do math is usually: use
          | the right tool for the job.
        
           | slushy-chivalry wrote:
           | "a PhD level understanding of computer aided geometric
           | design"
           | 
           | could you elaborate?
           | 
            | I found chatgpt to be pretty useful for a high-level
            | understanding of where topics stand in the grand scheme of
            | things
            | 
            | kinda like a pre-google -- it helps me navigate what to
            | google
           | 
            | let's ask gpt4o
            | 
            |     You're an expert math researcher.
            |     - Give me an overview of topos theory
            |     - What are the main unsolved problems
            |     - What is the point of it?
            |     - What research directions are most promising?
            |     - What are the math prerequisites to understanding it?
           | 
            | output
            | 
            |     Overview of Topos Theory
            |     Topos theory is a branch of mathematics that generalizes
            |     set theory and category theory, providing a framework
            |     for analyzing and understanding mathematical structures.
            |     A topos can be seen as a generalized space that behaves
            |     much like the category of sets but with more flexible
            |     and abstract properties. Topos theory was developed by
            |     Alexander Grothendieck and his collaborators in the
            |     1960s, initially to solve problems in algebraic
            |     geometry.
            | 
            |     Key concepts in topos theory include:
            |     - Categories: Collections of objects and morphisms
            |       (arrows) between them.
            |     - Sheaves: Structures that systematically organize data
            |       over a topological space.
            |     - Topoi (plural of topos): Categories that satisfy
            |       certain axioms making them similar to the category of
            |       sets.
            | 
            |     Main Unsolved Problems
            |     Some of the main unsolved problems in topos theory
            |     involve:
            |     - Classifying all possible topoi: Understanding the full
            |       landscape of possible topoi and their
            |       interrelationships remains a complex challenge.
            |     - Connections to other areas of mathematics: Fully
            |       elucidating the connections between topos theory and
            |       other fields such as logic, algebraic geometry, and
            |       mathematical physics.
            |     - Homotopy theory in the context of topoi: Developing a
            |       deeper understanding of homotopy theory within the
            |       framework of higher topos theory.
            | 
            |     The Point of Topos Theory
            |     The point of topos theory is to provide a unifying and
            |     flexible framework for various mathematical concepts and
            |     structures. It offers a generalized setting in which
            |     both geometric and logical ideas can be expressed and
            |     related. This has significant implications for:
            |     - Algebraic geometry: Offering new tools and
            |       perspectives.
            |     - Mathematical logic: Providing a categorical framework
            |       for logic and set theory.
            |     - Theoretical computer science: Influencing the
            |       development of type theory and the semantics of
            |       programming languages.
            | 
            |     Promising Research Directions
            |     Some of the most promising research directions in topos
            |     theory include:
            |     - Higher Topos Theory: Extending the classical theory to
            |       higher categories and understanding its implications
            |       in algebraic geometry and homotopy theory.
            |     - Connections with Logic: Exploring further connections
            |       between topos theory and logic, particularly in
            |       relation to intuitionistic logic and type theory.
            |     - Topos-theoretic Approaches to Physics: Investigating
            |       how topos theory can be applied to quantum mechanics
            |       and other areas of theoretical physics.
            |     - Descent Theory and Stacks: Further developing the
            |       applications of topos theory in descent theory and the
            |       study of stacks in algebraic geometry.
            | 
            |     Math Prerequisites to Understanding Topos Theory
            |     To understand topos theory, one should have a solid
            |     background in the following areas:
            |     - Category Theory: Fundamental concepts such as
            |       categories, functors, natural transformations, limits,
            |       and colimits.
            |     - Set Theory: Basic set-theoretic concepts and
            |       operations.
            |     - Algebraic Topology: Basic knowledge of topological
            |       spaces, sheaves, and cohomology.
            |     - Algebra: Basic group theory, ring theory, and module
            |       theory.
            |     - Logic: Understanding of basic logical systems,
            |       particularly intuitionistic logic.
            | 
            |     With these prerequisites, one can start delving into the
            |     more advanced and abstract concepts of topos theory.
           | 
           | not perfect but good enough to get started
        
             | Chinjut wrote:
             | That's fine but it's about the same as you'll get from an
             | encyclopedia also, which makes sense as that's just where
             | GPT got it from anyway. Nothing revolutionary in the
             | ability to read encyclopedia articles. We've had that
             | forever.
        
               | slushy-chivalry wrote:
               | sure, but with like a 100x improvement in usability --
               | chatgpt is helpful in figuring out what stuff to read (at
               | least for me) so that when I go to the actual paper or a
               | book I know what to focus on
               | 
               | otherwise you can say "why do you need google, it's the
               | same as you'll get from the website"
               | 
               | moreover, I found that chatgpt is pretty decent at
               | rephrasing a convoluted concept or a paragraph in a
               | research paper, or even giving me ideas on the research
               | directions
               | 
               | I mean, same with coding -- I treat it as a smart
               | autocomplete
               | 
               | I could go to google and look for a .csv containing a
               | list of all US States
               | 
                | Or, I can write
                | 
                |     const US_STATES = [
                | 
                | and let copilot complete it for me -- 5 minutes saved?
        
             | s1mon wrote:
             | Specifically, I was trying to get help from ChatGPT to give
             | a simple formula for the location of the P3 control point
             | of a degree 3 (or higher) Bezier curve in order to maintain
             | G3 continuity (given the derivatives at the end of the
             | adjacent curve). There's a very straightforward equation
             | for the P2 control point for G2 continuity, but I've been
             | struggling to understand the math for G3 continuity.
             | 
             | I've found a ton of research papers and information, but
             | most of it is quickly beyond my ability to digest.
             | 
              | For G2 constraints, there is a simple equation:
              | 
              |     K(t0) = ((n-1)/n)*(h/a^2)
              | 
              | where n is the degree of the curve, a is the length of the
              | first leg of the control polygon, and h is the
              | perpendicular distance from P2 to the first leg of the
              | control polygon. K(t0) is the curvature at the end point
              | of the adjacent curve.
             | 
             | Depending on what you want to do, it's easy to solve for
             | K(t0), a or h. I would like something this simple for G3.
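              | 
              | That G2 formula is at least easy to sanity-check against
              | the direct curvature of a concrete cubic, using
              | K = |x'y'' - y'x''| / (x'^2 + y'^2)^(3/2) at t = 0:
              | 
              |     n = 3
              |     P0, P1, P2 = (0, 0), (1, 0), (1, 1)  # P3 irrelevant
              |     # B'(0) = n(P1-P0); B''(0) = n(n-1)(P0-2P1+P2)
              |     d1 = (n * (P1[0]-P0[0]), n * (P1[1]-P0[1]))
              |     d2 = (n*(n-1)*(P0[0]-2*P1[0]+P2[0]),
              |           n*(n-1)*(P0[1]-2*P1[1]+P2[1]))
              |     num = abs(d1[0]*d2[1] - d1[1]*d2[0])
              |     K = num / (d1[0]**2 + d1[1]**2) ** 1.5
              |     a, h = 1.0, 1.0   # |P1-P0|, dist of P2 from P0P1
              |     print(K, ((n-1)/n) * h / a**2)   # both 2/3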
        
             | mvdtnz wrote:
             | Please don't pollute comment sections with gpt output.
        
           | s1mon wrote:
           | I have tried to use Wolfram Alpha inside of ChatGPT, but that
           | didn't get me very far. It seems like I would need to
           | understand a lot more math to be able to do anything useful
           | with Wolfram Alpha, and perhaps it would be better to run it
           | stand alone not as a plugin.
        
             | jiggawatts wrote:
             | Ask it to write you the Wolfram language code and then
             | verify it and execute it yourself.
             | 
             | I've found that I can work 100x faster with Mathematica
             | this way and solve problems that I wouldn't have bothered
             | to attempt otherwise.
             | 
              | This is particularly effective for quickly visualising
              | things; I'm too lazy to figure out all the graphing
              | options for esoteric scenarios, but GPT-4 can quickly
              | iterate over variants given feedback.
        
         | xanderlewis wrote:
         | ChatGPT has an amazing ability to write, but you shouldn't
         | trust it for any form of mathematics aside from providing vague
         | descriptions of what various topics are about (and even that
         | tends to result in a word soup that is more flowery than
         | descriptive). When it comes to solving specific problems, or
         | even providing specific examples of mathematical objects, it
         | falls down really quickly.
         | 
         | I'll inevitably be told otherwise by some ChatGPT-happy
         | hypebro, but LLMs are _hopeless_ when it comes to anything
         | requiring reasoning. Scaling it up will lessen the chance of a
         | cock-up, but anything vaguely out of distribution will result
          | in the same nonsense we're all used to by now. Those who say
         | otherwise very likely just lack the experience or knowledge
         | necessary to challenge the model enough or interpret the
         | results.
         | 
         | As a test of this claim: please comment below if you, say, have
         | a degree in mathematics and believe LLMs to be reliable for
         | 'math help' (and explain why you think so).
         | 
         | We need a better technology! And when this better technology
         | finally comes along, we'll look back at pure LLMs and laugh
         | about how we ever believed we could magic such a machine into
         | existence just by pouring data into a model originally designed
         | for machine translation.
        
           | fragmede wrote:
           | > ChatGPT-happy hypebro
           | 
           | Rude. From the guidelines:
           | 
           | > Please don't sneer, including at the rest of the community.
           | 
           | https://news.ycombinator.com/newsguidelines.html
           | 
           | "math help" is really broad, but if you add "solve this using
           | python", chatgpt will generate code and run that instead of
           | trying to do logic as a bare LLM. There's no guarantee that
           | it gets the code right, so I won't claim anything about its
           | reliability, but as far as pure LLMs having this limitation
           | and we need a better technology, that's already there, it's
           | to run code the traditional way.
        
             | xanderlewis wrote:
             | You're right, but I get frustrated by the ignorance and
             | hubris of some people. Too late to edit now.
        
           | nurple wrote:
           | I'm with you. The thing I find baffling is how anyone with
            | any logical sense finds ChatGPT useful for anything that
           | requires precision, like math and code. If you do indeed
           | follow the caveats that the LLM companies require placing
           | alongside any output: to not rely on it, and verify it
           | yourself, then you already have to be skilled enough to
           | detect problems, and if you are that skilled, the only way to
           | check the output is to do the work again yourself!
           | 
           | So, umm, where's the savings? You can't not do the work to
           | check the output, and a novice just can't check at all...
           | 
           | I have personally been brought into a coding project created
           | by a novice using GPT4, and I was completely blown away by
           | how bad the code was. I was asked to review the code because
           | the novice dev just couldn't get the required functionality
            | to work fully. It turns out that since he didn't understand
            | the deployment platform, or networking, or indeed the
            | language he was using, there was actually no possible way to
            | accomplish the task with the approach he and the LLM had
            | "decided" on.
            | 
            | He had been working on that problem for three weeks. I
            | leveraged two off-the-shelf tools and had a solution from
            | scratch in under a full day's work, including integration
            | testing.
        
             | xanderlewis wrote:
             | > So, umm, where's the savings? You can't not do the work
             | to check the output, and a novice just can't check at
             | all...
             | 
             | You're exactly right. It's a weird example of a technology
             | that is _ridiculously_ impressive (at least at first
             | impression, but also legitimately quite astounding) whilst
             | also being seemingly useless.
             | 
             | I guess the oft-drawn parallels between AI and nuclear
             | weapons are not (yet) that they're both likely to lead to
             | the apocalypse but more that they both represent era-
             | defining achievements in science/technology whilst
             | simultaneously being utterly unusable for anything
             | productive.
             | 
             | At least nukes have the effect of deterring us from WW3...
        
           | lanstin wrote:
           | Terry Tao finds it promising
           | https://mathstodon.xyz/@tao/110601051375142142
           | 
           | I am a first year grad student and find it useful to chat
           | about stuff with Claude, especially once my internal
           | understanding has just gotten clarified. It isn't as good as
           | the professor but is available at 2 am.
        
             | xanderlewis wrote:
             | I think Tao finds it promising as a source of inspiration
             | in the same sense that the ripples on the surface of a lake
             | or a short walk in the woods can be mathematically
             | inspiring. It doesn't say much about the actual content
             | being produced; the more you already have going on in your
             | head the more easily you ascribe meaning to
             | meaninglessness.
             | 
             | The point is that it's got seemingly _nothing_ to do with
             | reasoning. That it can produce thought-stimulating
             | paragraphs about any given topic doesn't contradict that;
             | chatting to something not much more sophisticated than
             | Eliza (or even... yourself, in a mirror) could probably
             | produce a similar effect.
             | 
             | As for chatting about stuff, I've been experimenting with
             | ChatGPT a bit for that kind of thing but find its output
             | usually too vague. It can't construct examples of things
             | beyond the trivial/very standard ones that don't say much,
             | and that's assuming it's even getting it right which it
             | often isn't (it will insist on strange statements despite
             | also admitting them to be false). It's a good memory-jog
             | for things you've half forgotten, but that's about it.
        
       | g9yuayon wrote:
        | I actually have a contrarian view: being able to do elementary
        | math is not that important at the current stage. Yes,
        | understanding elementary math is a cornerstone for an AI to
        | become more intelligent, but let's also be honest: LLMs are far
        | from being AGIs and do not have common sense or a general
        | ability to deduce or induce. If we accept this limitation of
        | LLMs, then focusing on the mathematical understanding of an LLM
        | appears incredibly boring.
        
         | slushy-chivalry wrote:
         | if you sampled N random people on the street and asked them to
         | solve this problem, what would the outcome be? would it be
         | better than asking chatgpt N times? I wonder
        
           | jiiam wrote:
           | I am deeply interested in this point of view of yours so I
           | will be hijacking your reply to ask another question: is
           | "better than asking a few random people on the street" the
           | bar we should be setting?
           | 
           | As far as mathematical thinking goes this doesn't seem an
           | interesting metric at all. Do you believe that optimizing for
           | this metric will indeed lead to reliable mathematical
           | thinking?
           | 
            | I am of the opinion that LLMs are not suited to maths, but
            | since I'm not an expert in the field I'm always looking for
            | counterarguments. Of course, we can always wait another
            | couple of years and the question will be resolved.
        
             | jiggawatts wrote:
             | People compare a _general_ intelligence against the
             | yardstick of their own _specialist_ skills.
             | 
             | I've seen some truly absurd examples, like people
             | complaining that it didn't have the latest updates to some
             | obscure research functional logic proof language that has
             | maybe a hundred users globally!
             | 
             | GPT-4 already shows markedly better English comprehension
             | and basic logic than most people I interact with on a daily
             | basis. It's only outperformed by a handful of people, all
             | of whom are "high achievers" such as entrepreneurs,
             | professors, or consultants.
             | 
             | I actively simplify my speech when talking to ordinary
             | people to avoid overwhelming them. I don't need to when
             | instructing GPT.
        
         | MagicMoonlight wrote:
         | It's important because solving a math problem requires you to
         | actually understand something and follow deliberate steps.
         | 
         | The fact that they can't means they're ultimately just a toy.
        
           | gavindean90 wrote:
           | No, I disagree. It is just deliberate steps. Understanding
           | can greatly help you do the steps and remember which ones to
           | do.
           | 
           | Training math is likely hard because the corpus of training
           | data is so much smaller: the computers themselves do our
           | math for us. You can draft text on a computer in plain
           | ASCII, but most people wouldn't work through long division
           | in a digital, text-based form, let alone save it and make
           | it available to AI researchers the way Reddit, X, and HN
           | comments are.
           | 
           | I expect LLMs to be bad at math. That's ok, they are bad
           | because the computers themselves are so good at math.
        
       | ukuina wrote:
       | I'm grateful this is a simple blog post rather than a 20-page
       | arXiv paper with dozens of meaningless graphs.
       | 
       | Or worse, a 20-deep Twitter thread.
        
         | sabrina_ramonov wrote:
         | well, I got banned on twitter 3 times in the past 30 days so no
         | more threads
        
           | logicallee wrote:
           | >well, I got banned on twitter 3 times in the past 30 days
           | 
           | Do you know why? Your blog post seems thoughtful and
           | interesting and doesn't include anything that seems
           | ban-worthy.
        
         | stainablesteel wrote:
         | sadly this blog post is n=1
        
       | d13 wrote:
       | I have a theory that the more you use ChatGPT, the worse it
       | becomes due to silent rate limiting: it farms the work out to
       | smaller quantized versions if you ask it a lot of questions.
       | I'd like to see whether the results of these tests are the
       | same if you only ask one question per day.
        
         | slushy-chivalry wrote:
         | that's an interesting hypothesis, I suppose one could make N
         | calls to the API and check whether the distribution of wrong
         | answers is skewed toward the later portion of the calls
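         | 
         | A minimal sketch of that experiment (the prompt, model name,
         | and answer check are my assumptions, not from the thread):
         | 
         |     from openai import OpenAI
         | 
         |     client = OpenAI()  # reads OPENAI_API_KEY from the env
         |     wrong = []
         |     for i in range(100):
         |         resp = client.chat.completions.create(
         |             model="gpt-4o",
         |             messages=[{"role": "user",
         |                        "content": "What is 17 * 23?"}],
         |         )
         |         answer = resp.choices[0].message.content or ""
         |         if "391" not in answer:  # 17 * 23 = 391
         |             wrong.append(i)
         | 
         |     # If silent degradation were real, wrong answers would
         |     # cluster at the high indices.
         |     print(wrong)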
        
           | OxfordOutlander wrote:
           | I wouldn't expect this from the API, because each token
           | brings the same revenue for OpenAI. With ChatGPT, however,
           | you pay a flat rate, so every incremental use is a net
           | negative for them.
        
       | curiousgal wrote:
       | > _prompt engineering_
       | 
       | The only group of people more delusional than the AI doomsday
       | screamers is those who think playing around with LLMs is
       | "engineering".
        
         | slushy-chivalry wrote:
         | I prefer the term "making shit work"
        
       | mvdtnz wrote:
       | It's incredible that we (humanity) are expending trillions of
       | dollars and untold carbon emissions into these misinformation
       | machines. I don't even mean machines for intentionally
       | generating misinformation (although they are that, too), but
       | machines that we know misinform well-meaning users.
       | 
       | Peak humanity.
        
       | amluto wrote:
       | Do we know why GPT-4o seems able to do arithmetic? Is it
       | outsourcing to some tool?
        
         | XzAeRosho wrote:
         | It's considered an emergent phenomenon of LLMs [1]:
         | arithmetic ability seems to increase as an LLM's overall
         | reasoning grows. I seem to recall a paper mentioning that
         | LLMs that are better at numeric reasoning are better at
         | overall conversational reasoning too, so the two seem to go
         | hand in hand.
         | 
         | However, we don't know the internals of ChatGPT-4, so they
         | may be using agents to improve performance, or fine-tuning
         | during training. I would assume their training has been
         | improved.
         | 
         | [1]: https://arxiv.org/pdf/2206.07682
        
           | yousif_123123 wrote:
           | At the same time, the ChatGPT app can write and run
           | Python, which the model can choose to do when it thinks it
           | needs more accuracy.
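           | 
           | For a sense of what that looks like, here is the kind of
           | snippet the model might execute for the tape problem (the
           | numbers and the solid-roll simplification are
           | hypothetical, not from the post):
           | 
           |     import math
           | 
           |     d = 10.0         # outer diameter of the roll, cm
           |     length = 1000.0  # length of the unrolled tape, cm
           |     # Solid roll: cross-section area equals the unrolled
           |     # side area, so pi*(d/2)**2 = length * thickness.
           |     t = math.pi * (d / 2) ** 2 / length
           |     print(f"thickness = {t:.4f} cm")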
        
       | wuj wrote:
       | My experience using GPT4-Turbo on math problems can be divided
       | into three cases in terms of the prompt I use:
       | 
       | 1. Text only prompt
       | 
       | 2. Text + Image with supplemental data
       | 
       | 3. Text + Image with redundant data
       | 
       | Case 1 generally performs the best. I also found that
       | reasoning improves if I convert the equations into LaTeX form;
       | the model is less prone to hallucinate when the input is
       | formulaic and standardized.
       | 
       | Cases 2 and 3 are more unpredictable. With a bit of prompt
       | engineering, they may give the right answer after a few
       | attempts, but most of the time they make simple logical errors
       | that could easily be avoided. I also found that multimodal
       | models tend to misinterpret the problem premise, even when all
       | the information is provided in the text prompt.
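       | 
       | For illustration, a case-1-style prompt with the equation in
       | LaTeX rather than loose prose (this example is mine, not the
       | commenter's):
       | 
       |     Solve for $t$:
       |     $\pi \left( \tfrac{d}{2} \right)^{2} = L \cdot t$,
       |     where $d = 10$ cm and $L = 1000$ cm. Show each step.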
        
       | Tiberium wrote:
       | LLMs are deterministic at temperature 0 on the same hardware
       | with the same seed, as long as the implementation itself is
       | deterministic. You can easily use the OpenAI API with temp=0
       | and a predefined seed and you'll get largely deterministic
       | results.
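       | 
       | A quick sketch of that setup (the model name and the prompt
       | are placeholders of mine):
       | 
       |     from openai import OpenAI
       | 
       |     client = OpenAI()
       |     resp = client.chat.completions.create(
       |         model="gpt-4o",
       |         messages=[{"role": "user",
       |                    "content": "What is 12.7 / 4?"}],
       |         temperature=0,  # always take the most likely token
       |         seed=42,        # best-effort reproducibility
       |     )
       |     print(resp.choices[0].message.content)
       |     # If resp.system_fingerprint differs between calls, the
       |     # backend changed and outputs may differ despite the seed.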
        
         | deely3 wrote:
         | > You can easily use the OpenAI API with temp=0 and a
         | predefined seed and you'll get largely deterministic results
         | 
         | Does that mean that, in this situation, OpenAI will always
         | give the same wrong answer to the same question?
        
           | m3m3tic wrote:
           | temp 0 means no randomness is injected into the response:
           | for any given input you will get the exact same output,
           | assuming the context window is also the same. Part of what
           | makes an LLM more of a "thinking machine" than purely a
           | "calculation machine" is that it will occasionally choose a
           | less-probable next token instead of the statistically most
           | likely one, as a way of making the response more
           | "flavorful" (or at least that's my understanding of why).
           | The likelihood of the response diverging from its most
           | probable outcome is controlled by the temperature.
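           | 
           | A minimal sketch of how temperature reshapes the next-token
           | distribution (toy logits, not a real model):
           | 
           |     import numpy as np
           | 
           |     def sample(logits, temperature, rng):
           |         if temperature == 0:
           |             return int(np.argmax(logits))  # greedy
           |         # Low temperature sharpens the distribution,
           |         # high temperature flattens it.
           |         scaled = np.asarray(logits) / temperature
           |         probs = np.exp(scaled - scaled.max())
           |         probs /= probs.sum()
           |         return int(rng.choice(len(probs), p=probs))
           | 
           |     rng = np.random.default_rng(0)
           |     logits = [2.0, 1.0, 0.5]         # 3-token vocabulary
           |     print(sample(logits, 0, rng))    # always token 0
           |     print(sample(logits, 1.0, rng))  # sometimes others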
        
       | calibas wrote:
       | I tried the "Zero-Shot Chain-of-Thought" myself. It seems to
       | work the best, but one time I got:
       | 
       | "Therefore, the thickness of the tape is approximately 0.000589
       | cm or 0.589 mm."
       | 
       | (Note the botched unit conversion: 0.000589 cm is 0.00589 mm.)
        
       | guitarlimeo wrote:
       | I fed the chain-of-thought prompt to GPT-4o and got a correct
       | answer back. I then got the idea to tell it the answer was
       | incorrect, to see whether it would recalculate and come back
       | with the same answer. As you can probably guess, it arrived at
       | a completely different answer, showing no real logical
       | reasoning ability.
        
       | logicallee wrote:
       | As a human I couldn't solve it. I missed the key insight that
       | we can calculate the side surface area and that it stays the
       | same when the tape is rolled out into a rectangle.
       | 
       | It might make more sense to give it math problems with enough
       | hints that a human can definitely do it. For example you might
       | try saying: "Here is an enormous hint: the side surface area is
       | easy to calculate when it is rolled up and doesn't change when it
       | is unrolled into a rectangle, so if you calculate the side
       | surface area when rolled up you can then divide by the known
       | length to get the width."
       | 
       | I think with such a hint I might have gotten it, and ChatGPT
       | might have as well.
       | 
       | Another interesting thing is that when discussing rolls of tape
       | we don't really talk about inner diameters that much so it
       | doesn't have that much training data. Perhaps a simpler problem
       | could have been something like "Imagine a roll of tape where the
       | tape itself has constant thickness x and length y. The width of
       | the tape doesn't matter for this problem. We will calculate the
       | thickness. The roll of tape is completely rolled up into a
       | perfectly solid circular shape with a diameter of z. What is the
       | formula for the thickness of the tape x expressed in terms of
       | length y and 'diameter of the tape when rolled up in a circle' z?
       | In coming up with the formula use the fact that the constant
       | thickness doesn't change when it is unrolled from a circular to a
       | rectangular shape."
       | 
       | With so much handholding (and using the two-dimensional word
       | "circular" rather than calling it a cylinder and a rectangular
       | prism, which is what it really is), many more people could
       | apply the formula correctly and get the result. But can
       | ChatGPT?
       | 
       | I just tested it, this is how it did:
       | 
       | https://chat.openai.com/share/ddd0eef3-f42f-4559-8948-e028da...
       | 
       | I can't follow its math, so I don't know if it's right or not,
       | but it definitely didn't go straight for the simplified formula
       | (pi times half the diameter squared to get the area of the
       | solid "circle", divided by the length to get the thickness of
       | the tape).
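       | 
       | Written out, the simplified formula that hint is aiming at
       | (equating the rolled-up cross-section with the unrolled
       | rectangle) is:
       | 
       |     $\pi \left( \frac{z}{2} \right)^{2} = y \cdot x
       |     \quad \Longrightarrow \quad
       |     x = \frac{\pi z^{2}}{4 y}$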
        
       | 1970-01-01 wrote:
       | >GPT-4o interprets "how thick is the tape" as referring to the
       | cross-section of the tape roll, rather than the thickness of a
       | piece of tape.
       | 
       | As someone who has tapes of varied "thickness", I was also
       | confused for several minutes. I would give GPT partial credit
       | on this attempt. Also note the author has implied (is biased
       | toward finding) the thickness of _a piece of_ tape, not the
       | thickness of the entire object/roll.
       | 
       | https://m.media-amazon.com/images/I/71q3WQNl3nL._SL1500_.jpg
        
       | ilaksh wrote:
       | If you really want to see what the SOTA model can do, look at the
       | posts on the web page for the mind-blowing image output. That is
       | not released yet. https://openai.com/index/hello-gpt-4o/
       | 
       | Mark my words, that is the sort of thing that Ilya saw months ago
       | and I believe he decided they had achieved their mission of AGI.
       | And so that would mean stopping work, giving it to the government
       | to study, or giving it away or something.
       | 
       | That is the reason for the coup attempt. Look at the model
       | training cut-off date. And Altman won because everyone knew
       | they couldn't make money if they just declared mission
       | accomplished, gave the model away or handed it to some
       | government think-tank, and stopped.
       | 
       | This is also why they didn't make a big deal about those
       | capabilities during the presentation. Because if they go too hard
       | on the abilities, more people will start calling it AGI. And
       | AGI basically means it's a wrap for the company.
        
         | jiggawatts wrote:
         | I like your theory but if it's true, then Ilya was wrong.
         | 
         | All of the current LLM architectures have no medium-term memory
         | or iterative capability. That means they're missing essential
         | functionality for general intelligence.
         | 
         | I tried GPT-4o for various tasks, and it's good, but it isn't
         | blowing my skirt up. The only noticeable difference is the
         | speed, which is a very nice improvement that enables new
         | workflows.
        
           | ilaksh wrote:
           | Part of the confusion is that people use the term "AGI" to
           | mean different things. We should actually call this AGI,
           | because it is starkly different from the narrow capabilities
           | of AI a few years ago.
           | 
           | I am not claiming that it is a full digital simulation of a
           | human being or has all of the capabilities of animals like
           | humans, or is the end of intelligence research. But it is
           | obviously very general purpose at this point, and very human-
           | like in many ways.
           | 
           | Study this page carefully:
           | https://openai.com/index/hello-gpt-4o/ ... much of that was
           | deliberately omitted from the presentation.
        
             | jiggawatts wrote:
             | Currently, they're like Dory from Finding Nemo: they have
             | long- and short-term memory, but they forget everything
             | after each conversation.
             | 
             | The character of Dory is jarring and bizarre precisely
             | because of this trait! Her mind is obviously broken in a
             | disturbing way. AIs give me the same feeling. Like talking
             | to an animatronic robot at a theme park or an NPC in a
             | computer game.
        
               | ilaksh wrote:
               | Use the memory feature or open the same chat session as
               | before.
        
       | tapeaway wrote:
       | Aren't there unstated simplifications here, namely that:
       | 
       | * the tape is perfectly flexible
       | 
       | * the tape has been rolled with absolutely no gap between layers?
        
         | mmmmmmmike wrote:
         | Yeah, and even given that, there's the question of how exactly
         | it deforms from its flattened shape to make a spiral (and if
         | this changes the area). I wouldn't agree with the "correct"
         | answer if the tape were very thick, but given that the answer is
         | .005 cm, it's probably thin enough that such an approximation
         | is okay.
        
       ___________________________________________________________________
       (page generated 2024-05-16 23:01 UTC)