[HN Gopher] Can AI do maths yet? Thoughts from a mathematician
       ___________________________________________________________________
        
       Can AI do maths yet? Thoughts from a mathematician
        
       Author : mathgenius
       Score  : 372 points
        Date   : 2024-12-23 10:50 UTC (1 day ago)
        
 (HTM) web link (xenaproject.wordpress.com)
 (TXT) w3m dump (xenaproject.wordpress.com)
        
       | noFaceDiscoG668 wrote:
       | "once" the training data can do it, LLMs will be able to do it.
       | and AI will be able to do math once it comes to check out the
       | lights of our day and night. until then it'll probably wonder
       | continuously and contiguously: "wtf! permanence! why?! how?! by
       | my guts, it actually fucking works! why?! how?!"
        
         | tossandthrow wrote:
          | I do think it is time to start questioning whether the utility
          | of AI can be reduced solely to the quality of the training
          | data.
         | 
         | This might be a dogma that needs to die.
        
           | noFaceDiscoG668 wrote:
           | I tried. I don't have the time to formulate and scrutinise
           | adequate arguments, though.
           | 
           | Do you? Anything anywhere you could point me to?
           | 
            | The algorithms live entirely off the training data. They
            | consistently fail to "abduct" (make abductive inferences)
            | beyond any information specific to the language in/of the
            | training data.
        
             | jstanley wrote:
             | The best way to predict the next word is to accurately
             | model the underlying system that is being described.
        
             | tossandthrow wrote:
              | It is a gradual thing. Presumably the models are inferring
              | things at runtime that were not part of their training
              | data.
             | 
             | Anyhow, philosophically speaking you are also only exposed
             | to what your senses pick up, but presumably you are able to
             | infer things?
             | 
             | As written: this is a dogma that stems from a limited
             | understanding of what algorithmic processes are and the
             | insistence that emergence can not happen from algorithmic
             | systems.
        
           | croes wrote:
            | If not, bad training data shouldn't be a problem.
        
             | kergonath wrote:
             | There can be more than one problem. The history of
             | computing (or even just the history of AI) is full of
             | things that worked better and better right until they hit a
              | wall. We get diminishing returns from adding more and more
             | training data. It's really not hard to imagine a series of
             | breakthroughs bringing us way ahead of LLMs.
        
         | Flenkno wrote:
          | AWS announced, 2 or 3 weeks ago, a way of formulating rules in
          | a formal language.
          | 
          | AI doesn't need to learn everything; our LLM models already
          | contain EVERYTHING, including ways of finding a solution step
          | by step.
          | 
          | Which means you can tell an LLM to translate whatever you
          | want into a logical language and use an external logic
          | verifier. The only thing an LLM or AI needs to 'understand' at
          | this point is how to make the statistical translation from the
          | former to the latter reliable enough.
          | 
          | Your brain doesn't just do logic out of the box either: you
          | conclude things and then formalise them.
          | 
          | And plenty of companies work on this. It's the same with
          | programming: if you are able to write code and execute it, you
          | execute it until the compiler errors are gone. Now your LLM can
          | write valid code out of the box. Let the LLM write unit tests,
          | and now it can verify itself.
          | 
          | Claude, for example, offers out of the box to write a
          | validation script, and you can give Claude back the output of
          | the script it suggested to you.
          | 
          | Don't underestimate LLMs.
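          | 
          | A minimal sketch of that translate-then-verify loop, assuming
          | the Z3 Python bindings (z3-solver) as the external checker; the
          | claim and its translation are invented for illustration:
          | 
          |     from z3 import Ints, Solver, And, Implies, Not, unsat
          | 
          |     # Hypothetical claim: "if x > 2 and y > 2, then x + y > 4".
          |     # The LLM only does the translation into the formula below;
          |     # the solver does the actual logic.
          |     x, y = Ints("x y")
          |     claim = Implies(And(x > 2, y > 2), x + y > 4)
          | 
          |     s = Solver()
          |     s.add(Not(claim))  # look for a counterexample
          |     print("verified" if s.check() == unsat else s.model())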
        
           | TeamDman wrote:
           | Is this the AWS thing you referenced?
           | https://aws.amazon.com/what-is/automated-reasoning/
        
       | casenmgreen wrote:
       | I may be wrong, but I think it a silly question. AI is basically
       | auto-complete. It can do math to the extent you can find a
       | solution via auto-complete based on an existing corpus of text.
        
         | Bootvis wrote:
          | You're underestimating the emergent behaviour of these LLMs.
          | See for example what Terence Tao thinks about o1:
         | 
         | https://mathstodon.xyz/@tao/113132502735585408
        
           | WhyOhWhyQ wrote:
           | I'm always just so pleased that the most famous mathematician
           | alive today is also an extremely kind human being. That has
           | often not been the case.
        
         | roflc0ptic wrote:
         | Pretty sure this is out of date now
        
         | mdp2021 wrote:
         | > _AI is basically_
         | 
          | Very many things have conventionally been labelled that way
          | since the '50s.
         | 
         | You are speaking of LLMs.
        
           | casenmgreen wrote:
           | Yes - I mean only to say "AI" as the term is commonly used
           | today.
        
             | mdp2021 wrote:
             | > _as the term is commonly used today_
             | 
              | Which is to say, _wrongly_ : so don't spread the bad notion
              | and habit.
              | 
              | A notion and habit which has a counterproductive impact on
              | the debate.
        
         | esafak wrote:
         | Humans can autocomplete sentences too because we understand
         | what's going on. Prediction is a necessary criterion for
         | intelligence, not an irrelevant one.
        
       | aithrowawaycomm wrote:
       | I am fairly optimistic about LLMs as a human math -> theorem-
       | prover translator, and as a fan of Idris I am glad that the AI
       | community is investing in Lean. As the author shows, the answer
       | to "Can AI be useful for automated mathematical work?" is clearly
       | "yes."
       | 
       | But I am confident the answer to the question in the headline is
       | "no, not for several decades." It's not just the underwhelming
       | benchmark results discussed in the post, or the general concern
       | about hard undergraduate math using different skillsets than
       | ordinary research math. IMO the deeper problem still seems to be
       | a basic gap where LLMs can seemingly do formal math at the level
       | of a smart graduate student but fail at quantitative/geometric
       | reasoning problems designed for fish. I suspect this holds for
       | O3, based on one of the ARC problems it wasn't able to solve:
       | https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_pr...
       | (via https://www.interconnects.ai/p/openais-o3-the-2024-finale-
       | of...) ANNs are simply not able to form abstractions, they can
       | only imitate them via enormous amounts of data and compute. I
       | would say there has been _zero_ progress on  "common sense" math
       | in computers since the invention of Lisp: we are still faking it
       | with expert systems, even if LLM expert systems are easier to
       | build at scale with raw data.
       | 
       | It is the same old problem where an ANN can attain superhuman
       | performance on level 1 of Breakout, but it has to be retrained
       | for level 2. I am not convinced it makes sense to say AI can do
       | math if AI doesn't understand what "four" means with the same
       | depth as a rat, even if it can solve sophisticated modular
       | arithmetic problems. In human terms, does it make sense to say a
       | straightedge-and-compass AI understands Euclidean geometry if
       | it's not capable of understanding the physical intuition behind
       | Euclid's axioms? It makes more sense to say it's a brainless tool
       | that helps with the tedium and drudgery of actually proving
       | things in mathematics.
        
         | asddubs wrote:
         | it can take my math and point out a step I missed and then show
         | me the correct procedure but still get the wrong result because
         | it can't reliably multiply 2-digit numbers
        
           | fifilura wrote:
           | Better than an average human then.
        
             | actionfromafar wrote:
             | Different than an average human.
        
           | watt wrote:
           | it's a "language" model (LLM), not a "math" model. when it is
           | generating your answer, predicting and outputing a word after
           | word it is _not_ multiplying your numbers internally.
        
             | asddubs wrote:
             | Yes, I know. It's just kind of interesting how it can make
             | inferences about complicated things but not get
             | multiplications correct that would almost definitely have
             | been in its training set many times (two digit by two
             | digit)
        
         | QuadmasterXLII wrote:
          | To give a sense of scale: it's not that o3 failed to solve that
          | red/blue rectangle problem once: o3 spent thousands of GPU
          | hours putting out text about that problem, creating by my math
          | about a million pages of text, and did not find the answer
          | anywhere in those pages. For other problems it did find the
          | answer around the million-page mark, as at the ~$3000-per-
          | problem spend setting the score was still slowly creeping up.
        
           | josh-sematic wrote:
           | If the trajectory of the past two years is any guide, things
           | that can be done at great compute expense now will rapidly
           | become possible for a fraction of the cost.
        
             | asadotzler wrote:
             | The trajectory is not a guide, unless you count the recent
             | plateauing.
        
         | aithrowawaycomm wrote:
         | Just a comment: the example o1 got wrong was actually
         | underspecified: https://anokas.substack.com/p/o3-and-arc-agi-
         | the-unsolved-ta...
         | 
         | Which is actually a problem I have with ARC (and IQ tests more
         | generally): it is computationally cheaper to go from ARC
         | transformation rule -> ARC problem than it is the other way
         | around. But this means it's pretty easy to generate ARC
         | problems with non-unique solutions.
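          | 
          | A toy sketch of that asymmetry (purely illustrative, not the
          | real ARC generator; the hidden "rule" here is an invented
          | recolouring map). Applying a known rule to random grids is
          | trivial, while nothing guarantees that a handful of examples
          | pins the rule down uniquely:
          | 
          |     import random
          | 
          |     def apply_rule(grid, mapping):
          |         # the hidden transformation: recolour cells via a map
          |         return [[mapping.get(c, c) for c in row] for row in grid]
          | 
          |     def make_problem(rule, n_examples=3, size=5):
          |         examples = []
          |         for _ in range(n_examples):
          |             grid = [[random.randint(0, 3) for _ in range(size)]
          |                     for _ in range(size)]
          |             examples.append((grid, apply_rule(grid, rule)))
          |         return examples  # cheap to generate; recovering the rule
          |                          # from these pairs is the hard direction
          | 
          |     problem = make_problem({1: 2, 3: 0})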
        
       | est wrote:
        | At this stage I assume everything having a sequential pattern can
        | and will be automated by LLM AIs.
        
         | Someone wrote:
         | I think that's provably incorrect for the current approach to
         | LLMs. They all have a horizon over which they correlate tokens
         | in the input stream.
         | 
         | So, for any LLM, if you intersperse more than that number of
         | 'X' tokens between each useful token, they won't be able to do
         | anything resembling intelligence.
         | 
         | The current LLMs are a bit like n-gram databases that do not
         | use letters, but larger units.
        
           | red75prime wrote:
           | The follow-up question is "Does it require a paradigm shift
           | to solve it?". And the answer could be "No". Episodic memory,
           | hierarchical learnable tokenization, online learning or
           | whatever works well on GPUs.
        
           | beng-nl wrote:
            | Isn't that a bit of an unfair sabotage?
            | 
            | Naturally, humans couldn't do it either, even though they
            | could edit the input to remove the X's; but shouldn't we
            | evaluate the ability (even intelligent ability) of LLMs on
            | what they can generally do, rather than amplifying their
            | weaknesses?
        
             | Someone wrote:
              | Why is that unfair in reply to the claim _"At this stage I
              | assume everything having a sequential pattern can and will
              | be automated by LLM AIs."_?
             | 
             | I am not claiming LLMs aren't or cannot be intelligent, not
             | even that they cannot do magical things; I just rebuked a
             | statement about the lack of limits of LLMs.
             | 
             | > Naturally, humans couldn't do it, even though they could
             | edit the input to remove the X's
             | 
             | So, what are you claiming: that they cannot or that they
             | can? I think most people can and many would. Confronted
             | with a file containing millions of X's, many humans will
             | wonder whether there's something else than X's in the file,
             | do a 'replace all', discover the question hidden in that
             | sea of X's, and answer it.
             | 
             | There even are simple files where most humans would easily
             | spot things without having to think of removing those X's.
              | Consider a file
              | 
              |     How         X X X X X X
              |     many        X X X X X X
              |     days        X X X X X X
              |     are         X X X X X X
              |     there       X X X X X X
              |     in          X X X X X X
              |     a           X X X X X X
              |     week?       X X X X X X
             | 
             | with a million X's on the end of each line. Spotting the
             | question in that is easy for humans, but impossible for the
             | current bunch of LLMs
        
               | int_19h wrote:
               | If you have a million Xs on the end of each line, when a
               | human is looking at that file, he's not looking at the
               | entirety of it, but only at the part that is actually
               | visible on-screen, so the equivalent task for an LLM
               | would be to feed it the same subset as input. In which
               | case they can all answer this question just fine.
        
               | orbital-decay wrote:
               | This is only easy because the software does line wrapping
               | for you, mechanistically transforming the hard pattern of
               | millions of symbols into another that happens to be easy
               | for your visual system to match. Do the same for any
               | visually capable model and it will get that easily too.
                | Conversely, make that a single line (like the one a
                | transformer sees) and you will struggle much more than
                | the transformer, because you'll have to scan millions of
                | symbols sequentially looking for patterns.
                | 
                | Humans have weak attention compared to it; this is a poor
                | example.
        
         | palata wrote:
          | At this stage I _hope_ everything that needs to be reliable
          | won't be automated by LLM AIs.
        
       | ned99 wrote:
        | I think this is a silly question; you can trace AIs doing very
        | simple maths back to the 1960s-1970s.
        
         | mdp2021 wrote:
         | It's just the worrisome linguistic confusion between AI and
         | LLMs.
        
       | jampekka wrote:
       | I just spent a few days trying to figure out some linear algebra
       | with the help of ChatGPT. It's very useful for finding conceptual
       | information from literature (which for a not-professional-
       | mathematician at least can be really hard to find and decipher).
       | But in the actual math it constantly makes very silly errors.
       | E.g. indexing a vector beyond its dimension, trying to do matrix
       | decomposition for scalars and insisting on multiplying matrices
       | with mismatching dimensions.
       | 
       | O1 is a lot better at spotting its errors than 4o but it too
       | still makes a lot of really stupid mistakes. It seems to be quite
       | far from producing results itself consistently without at least a
       | somewhat clueful human doing hand-holding.
        
         | glimshe wrote:
         | Isn't Wolfram Alpha a better "ChatGPT of Math"?
        
           | Filligree wrote:
           | Wolfram Alpha is better at actually doing math, but far worse
           | at explaining what it's doing, and why.
        
             | dartos wrote:
             | What's worse about it?
             | 
             | It never tells you the wrong thing, at the very least.
        
               | fn-mote wrote:
               | Its understanding of problems was very bad last time I
               | used it. Meaning it was difficult to communicate what you
               | wanted it to do. Usually I try to write in the
               | Mathematica language, but even that is not foolproof.
               | 
               | Hopefully they have incorporated more modern LLM since
               | then, but it hasn't been that long.
        
               | jampekka wrote:
               | Wolfram Alpha's "smartness" is often Clippy level
               | enraging. E.g. it makes assumptions of symbols based on
               | their names (e.g. a is assumed to be a constant,
               | derivatives are taken w.r.t. x). Even with Mathematica
               | syntax it tends to make such assumptions and refuses to
               | lift them even when explicitly directed. Quite often one
               | has to change the variable symbols used to try to make
               | Alpha to do what's meant.
        
               | jvanderbot wrote:
                | When you give it a large math problem and the answer is
                | "seven point one three five ...", and it shows a plot of
                | the result vs. some randomly selected domain, well, there
                | could be more I'd like to know.
               | 
               | You can unlock a full derivation of the solution, for
               | cases where you say "Solve" or "Simplify", but what I
               | (and I suspect GP) might want, is to know why a few of
               | the key steps might work.
               | 
               | It's a fantastic tool that helped get me through my
               | (engineering) grad work, but ultimately the breakthrough
               | inequalities that helped me write some of my best stuff
                | were out of a book I bought in desperation that basically
                | cataloged known linear algebra inequalities and
                | simplifications.
               | 
               | When I try that kind of thing with the best LLM I can use
               | (as of a few months ago, albeit), the results can get
               | incorrect pretty quickly.
        
               | kens wrote:
               | What book was it that you found helpful?
        
               | seattleeng wrote:
               | Im reviewing linear algebra now and would also love to
               | know that book!
        
               | PeeMcGee wrote:
               | > [...], but what I (and I suspect GP) might want, is to
               | know why a few of the key steps might work.
               | 
               | It's been some time since I've used the step-by-step
               | explainer, and it was for calculus or intro physics
               | problems at best, but IIRC the pro subscription will at
               | least mention the method used to solve each step and link
               | to reference materials (e.g., a clickable tag labeled
               | "integration by parts"). Doesn't exactly explain _why_
               | but does provide useful keywords in a sequence that can
               | be used to derive the why.
        
             | amelius wrote:
             | I wish there was a way to tell Chatgpt where it has made a
             | mistake, with a single mouse click.
        
               | akoboldfrying wrote:
               | What's surprising to me is that this would surely be in
               | OpenAI's interests, too -- free RLHF!
               | 
               | Of course there would be the risk of adversaries giving
               | bogus feedback, but my gut says it's relatively
               | straightforward to filter out most of this muck.
        
             | a3w wrote:
             | Is the explanation a pro feature? At the very end it says
             | "step by step? Pay here"
        
           | jampekka wrote:
           | Wolfram Alpha is mostly for "trivia" type problems. Or giving
           | solutions to equations.
           | 
           | I was figuring out some mode decomposition methods such as
           | ESPRIT and Prony and how to potentially extend/customize
           | them. Wolfram Alpha doesn't seem to have a clue about such.
        
           | lupire wrote:
           | No. Wolfram Alpha can't solve anything that isn't a function
           | evaluation or equation. And it can't do modular arithmetic to
           | save its unlife.
           | 
            | WolframOne/Mathematica is better, but that requires the user
            | (or ChatGPT!) to write complicated code, not natural language
            | queries.
        
           | GuB-42 wrote:
           | Wolfram Alpha can solve equations well, but it is terrible at
           | understanding natural language.
           | 
           | For example I asked Wolfram Alpha "How heavy a rocket has to
           | be to launch 5 tons to LEO with a specific impulse of 400s",
           | which is a straightforward application of the Tsiolkovsky
           | rocket equation. Wolfram Alpha gave me some nonsense about
           | particle physics (result: 95 MeV/c^2), GPT-4o did it right
           | (result: 53.45 tons).
           | 
            | Wolfram Alpha knows about the Tsiolkovsky rocket equation, it
            | knows about LEO (low Earth orbit), but I found no way to get
            | a delta-v out of it; again, more nonsense. It tells me about
            | Delta airlines and mentions satellites that it knows are not
            | in LEO. The "natural language" part is a joke. It is more
            | like an advanced calculator, and for that, it is great.
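            | 
            | For reference, the calculation itself is one line of the
            | Tsiolkovsky equation; the delta-v below is an assumption
            | (roughly 9.3 km/s to LEO) and dry mass is ignored, so the
            | figure is only a sanity check on GPT-4o's answer:
            | 
            |     import math
            | 
            |     isp, g0 = 400.0, 9.80665   # s, m/s^2
            |     dv = 9_300.0               # assumed delta-v to LEO, m/s
            |     payload = 5.0              # tons
            |     wet = payload * math.exp(dv / (isp * g0))
            |     print(round(wet, 1))       # ~53.5 t, near GPT-4o's 53.45 t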
        
             | bongodongobob wrote:
                | You're using it wrong. You can use natural language in your
                | equation, but AFAIK it's not supposed to be able to do what
                | you're asking of it.
        
               | CamperBob2 wrote:
               | You know, "You're using it wrong" is usually meant to
               | carry an ironic or sarcastic tone, right?
               | 
               | It dates back to Steve Jobs blaming an iPhone 4 user for
               | "holding it wrong" rather than acknowledging a flawed
               | antenna design that was causing dropped calls. The
               | closest Apple ever came to admitting that it was their
               | problem was when they subsequently ran an employment ad
               | to hire a new antenna engineering lead. Maybe it's time
               | for Wolfram to hire a new language-model lead.
        
               | bongodongobob wrote:
               | It's not an LLM. You're simply asking too much of it. It
               | doesn't work the way you want it to, sorry.
        
               | CamperBob2 wrote:
               | Tell Wolfram. They're the ones who've been advertising it
               | for years, well before LLMs were a thing, using English-
               | language prompts like these examples:
               | https://www.pcmag.com/news/23-cool-non-math-things-you-
               | can-d...
               | 
               | The problem has always been that you only get good
               | answers if you happen to stumble on a specific question
               | that it can handle. Combining Alpha with an LLM could
               | actually be pretty awesome, but I'm sure it's easier said
               | than done.
        
               | Sharlin wrote:
               | Before LLMs exploded nobody really _expected_ WA to
               | perform well at natural language comprehension. The
               | expectations were at the level of  "an ELIZA that knows
               | math".
        
               | edflsafoiewq wrote:
               | Correct, so it isn't a "ChatGPT of Math", which was the
               | point.
        
               | kortilla wrote:
               | No, "holding it wrong" is the sarcastic version. "You're
               | using it wrong" is a super common way to tell people they
               | are literally using something wrong.
        
               | CamperBob2 wrote:
               | But they're not using it wrong. They are using it as
               | advertised by Wolfram themselves (read: himself).
               | 
               | The GP's rocket equation question is _exactly_ the sort
               | of use case for which Alpha has been touted for years.
        
         | spacemanspiff01 wrote:
          | I wonder if these are tokenization issues? I really am curious
          | about Meta's byte tokenization scheme...
        
           | jampekka wrote:
           | Probably mostly not. The errors tend to be
           | logical/conceptual. E.g. mixing up scalars and matrices is
           | unlikely to be from tokenization. Especially if using spaces
           | between the variables and operators, as AFAIK GPTs don't form
           | tokens over spaces (although tokens may start or end with
           | them).
        
         | lordnacho wrote:
         | The only thing I've consistently had issues with while using AI
          | is graphs. If I ask it to plot some simple function, it produces
         | a really weird image that has nothing to do with the graph I
         | want. It will be a weird swirl of lines and words, and it never
         | corrects itself no matter what I say to it.
         | 
         | Has anyone had any luck with this? It seems like the only thing
         | that it just can't do.
        
           | KeplerBoy wrote:
            | You're doing it wrong. It can't produce proper graphs with
            | its diffusion-style image generation.
           | 
           | Ask it to produce graphs with python and matplotlib. That
           | will work.
        
             | lanstin wrote:
              | And it works very well - it made me a nice general "draw
              | successively accurate Fourier series approximations given
              | this lambda for the coefficients and this lambda for the
              | constant term" script. PNG output, no real programming
              | errors (I wouldn't remember if it had some stupid error;
              | I'm a Python programmer). Even TikZ in LaTeX isn't hopeless
              | (although I did end up reading the TikZ manual).
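              | 
              | Roughly the kind of script that request produces (a sketch;
              | the square-wave coefficients here are just an illustrative
              | choice, not what I actually asked for):
              | 
              |     import numpy as np
              |     import matplotlib.pyplot as plt
              | 
              |     # Square wave: a0 = 0, b_n = 4/(pi*n) for odd n.
              |     coef = lambda n: 4 / (np.pi * n) if n % 2 else 0.0
              |     a0 = 0.0
              | 
              |     x = np.linspace(-np.pi, np.pi, 1000)
              |     for terms in (1, 3, 9, 27):
              |         y = a0 + sum(coef(n) * np.sin(n * x)
              |                      for n in range(1, terms + 1))
              |         plt.plot(x, y, label=f"{terms} terms")
              |     plt.legend()
              |     plt.savefig("fourier.png")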
        
           | thomashop wrote:
           | Ask it to plot the graph with python plotting utilities. Not
           | using its image generator. I think you need a ChatGPT
           | subscription though for it to be able to run python code.
        
             | lupire wrote:
                | You seem to get 2(?) free Python program runs per week(?)
                | as part of the o1 preview.
                | 
                | When you visit ChatGPT on the free account it automatically
                | gives you the best model and then disables it after some
                | amount of work and says to come back later or upgrade.
        
               | amelius wrote:
               | Just install Python locally, and copy paste the code.
        
             | xienze wrote:
             | Shouldn't ChatGPT be smart enough to know to do this
             | automatically, based on context?
        
               | CamperBob2 wrote:
               | It was, for a while. I think this is an area where there
               | may have been some regression. It can still write code to
               | solve problems that are a poor fit for the language
               | model, but you may need to ask it to do that explicitly.
        
           | HDThoreaun wrote:
           | The agentic reasoning models should be able to fix this if
           | they have the ability to run code instead of giving each task
           | to itself. "I need to make a graph" "LLMs have difficulty
           | graphing novel functions" "Call python instead" is a line of
           | reasoning I would expect after seeing what O1 has come up
           | with on other problems.
           | 
            | Giving AI the ability to execute code is the safety people's
            | nightmare, though. I wonder if we'll hear anything from them,
            | as this is surely coming.
        
         | amelius wrote:
         | Don't most mathematical papers contain at least one such error?
        
           | aiono wrote:
           | Where is this data from?
        
             | amelius wrote:
             | It's a question, and to be fair to AI it should actually
             | refer to papers before review.
        
               | monktastic1 wrote:
               | Yes, it's a question, but you haven't answered what you
               | read that makes you suspect so.
        
         | mseri wrote:
          | It also reliably fails basic real analysis proofs, but I think
          | this is not too surprising, since those require a mix of logic
          | and computation that is likely hard to just infer from the
          | statistical likelihood of tokens.
        
         | cheald wrote:
         | LLMs have been very useful for me in explorations of linear
         | algebra, because I can have an idea and say "what's this
         | operation called?" or "how do I go from this thing to that
         | thing?", and it'll give me the mechanism and an explanation,
         | and then I can go read actual human-written literature or
         | documentation on the subject.
         | 
         | It often gets the actual math wrong, but it is good enough at
         | connecting the dots between my layman's intuition and the
         | "right answer" that I can get myself over humps that I'd
         | previously have been hopelessly stuck on.
         | 
         | It does make those mistakes you're talking about very
         | frequently, but once I'm told that the thing I'm trying to do
         | is achievable with the Gram-Schmidt process, I can go self-
         | educate on that further.
         | 
         | The big thing I've had to watch out for is that it'll usually
         | agree that my approach is a good or valid one, even when it
         | turns out not to be. I've learned to ask my questions in the
         | shape of "how do I", rather than "what if I..." or "is it a
         | good idea to...", because most of the time it'll twist itself
         | into shapes to affirm the direction I'm taking rather than
         | challenging and refining it.
        
       | lproven wrote:
       | Betteridge's Law applies.
        
       | LittleTimothy wrote:
        | It's fascinating that this has run into the exact same problem as
        | quantum research. I.e., in quantum research, to demonstrate any
        | valuable forward progress you must compute something that is
        | impossible to do with a traditional computer. If you can't do it
        | with a traditional computer, it suddenly becomes difficult to
        | verify correctness (i.e., you can't just check that it matches
        | the traditional computer's answer).
       | 
        | In the same way, ChatGPT scores 25% on this and the question is
        | "How close were those 25% to questions in the training set?" Or,
        | to put it another way, we want to answer the question "Is ChatGPT
        | getting better at applying its reasoning to out-of-set problems,
        | or is it pulling more data into its training set?" Or "Is the
        | test leaking into the training?"
       | 
       | Maybe the whole question is academic and it doesn't matter, we
       | solve the entire problem by pulling all human knowledge into the
       | training set and that's a massive benefit. But maybe it implies a
       | limit to how far it can push human knowledge forward.
        
         | lazide wrote:
         | If constrained by existing human knowledge to come up with an
         | answer, won't it fundamentally be unable to push human
         | knowledge forward?
        
           | actionfromafar wrote:
           | Then much of human research and development is also
           | fundamentally impossible.
        
             | AnerealDew wrote:
             | Only if you think current "AI" is on the same level as
             | human creativity and intelligence, which it clearly is not.
        
               | actionfromafar wrote:
               | I think current "AI" (i.e. LLMs) is unable to push human
               | knowledge forward, but not because it's constrained by
                | existing human knowledge. It's more like peeking into a
                | very large magic 8-ball: new answers every time you shake
                | it. Some useful.
        
               | SJC_Hacker wrote:
               | It may be able to push human knowledge forward to an
               | extent.
               | 
               | In the past, there was quite a bit of low hanging fruit
               | such that you could have polymaths able to contribute to
               | a wide variety of fields, such as Newton.
               | 
                | But in the past 100 years or so, the problem is that there
                | is so much known, it is impossible for any single person
                | to have deep knowledge of everything. E.g. it's rare to
                | find a really good mathematician who also has deep
                | knowledge (beyond intro courses) of, say, chemistry.
               | 
               | Would a sufficiently powerful AI / ML model be able to
               | come up with this synthesis across fields?
        
               | lupire wrote:
                | That's not a strong reason. Yes, that means ChatGPT isn't
                | good at wholly independently pushing knowledge forward,
                | but a good brainstormer that is right even 10% of the
                | time is an incredible fount of knowledge.
        
           | Havoc wrote:
           | I don't think many expect AI to push knowledge forward? A
           | thing that basically just regurgitates consensus historic
           | knowledge seems badly suited to that
        
             | calmoo wrote:
             | But apparently these new frontier models can 'reason' - so
             | with that logic, they should be able to generate new
             | knowledge?
        
             | tomjen3 wrote:
             | O1 was able to find the math problem in a recently
             | published paper, so yes.
        
           | LittleTimothy wrote:
           | Depends on your understanding of human knowledge I guess?
           | People talk about the frontier of human knowledge and if your
           | view of knowledge is like that of a unique human genius
           | pushing forward the frontier then yes - it'd be stuck. But if
           | you think of knowledge as more complex than that you could
           | have areas that are kind of within our frontier of knowledge
           | (that we could reasonably know, but don't actually know) -
           | taking concepts that we already know in one field and
           | applying them to some other field. Today the reason that
           | doesn't happen is because genius A in physics doesn't know
           | about the existence of genius B in mathematics (let alone
           | understand their research), but if it's all imbibed by "The
           | Model" then it's trivial to make that discovery.
        
             | lazide wrote:
              | I was referring specifically to the parent comment's
              | statements about current AI systems.
        
           | wongarsu wrote:
           | Reasoning is essentially the creation of new knowledge from
           | existing knowledge. The better the model can reason the less
           | constrained it is to existing knowledge.
           | 
           | The challenge is how to figure out if a model is genuinely
           | reasoning
        
             | lupire wrote:
             | Reasoning is a very minor (but essential) part of knowledge
             | creation.
             | 
             | Knowledge creation comes from collecting data from the real
             | world, and cleaning it up somehow, and brainstorming
             | creative models to explain it.
             | 
             | NN/LLM's version of model building is frustrating because
             | it is quite good, but not highly "explainable". Human
             | models have higher explainability, while machine models
             | have high predictive value on test examples due to an
             | impenetrable mountain of algebra.
        
           | dinosaurdynasty wrote:
           | There are likely lots of connections that could be made that
           | no individual has made because no individual has _all of
           | existing human knowledge_ at their immediate disposal.
        
         | eagerpace wrote:
         | How much of this could be resolved if its training set were
         | reduced? Conceivably, most of the training serves only to
         | confuse the model when only aiming to solve a math equation.
        
         | newpavlov wrote:
         | >in the quantum research to demonstrate any valuable forward
         | progress you must compute something that is impossible to do
         | with a traditional computer
         | 
         | This is factually wrong. The most interesting problems
         | motivating the quantum computing research are hard to solve,
         | but easy to verify on classical computers. The factorization
         | problem is the most classical example.
         | 
         | The problem is that existing quantum computers are not powerful
         | enough to solve the interesting problems, so researchers have
         | to invent semi-artificial problems to demonstrate "quantum
         | advantage" to keep the funding flowing.
         | 
         | There is a plethora of opportunities for LLMs to show their
         | worth. For example, finding interesting links between different
         | areas of research or being a proof assistant in a
         | math/programming formal verification system. There is a lot of
         | ongoing work in this area, but at the moment signal-to-noise
         | ratio of such tools is too low for them to be practical.
        
           | aleph_minus_one wrote:
           | > This is factually wrong. The most interesting problems
           | motivating the quantum computing research are hard to solve,
           | but easy to verify on classical computers.
           | 
            | Your parent did not talk about quantum _computers_. I guess he
            | rather had predictions of novel quantum field theories or
            | theories of quantum gravity in the back of his mind.
        
             | newpavlov wrote:
             | Then his comment makes even less sense.
        
           | bondarchuk wrote:
           | No, it is factually right, at least if Scott Aaronson is to
           | be believed:
           | 
           | > _Having said that, the biggest caveat to the "10^25 years"
           | result is one to which I fear Google drew insufficient
           | attention. Namely, for the exact same reason why (as far as
           | anyone knows) this quantum computation would take ~10^25
           | years for a classical computer to simulate, it would also
           | take ~10^25 years for a classical computer to directly verify
           | the quantum computer's results!! (For example, by computing
           | the "Linear Cross-Entropy" score of the outputs.) For this
           | reason, all validation of Google's new supremacy experiment
           | is indirect, based on extrapolations from smaller circuits,
           | ones for which a classical computer can feasibly check the
           | results. To be clear, I personally see no reason to doubt
           | those extrapolations. But for anyone who wonders why I've
           | been obsessing for years about the need to design efficiently
           | verifiable near-term quantum supremacy experiments: well,
           | this is why! We're now deeply into the unverifiable regime
           | that I warned about._
           | 
           | https://scottaaronson.blog/?p=8525
        
             | newpavlov wrote:
             | It's a property of the "semi-artificial" problem chosen by
             | Google. If anything, it means that we should heavily
             | discount this claim of "quantum advantage", especially in
              | light of the inherent probabilistic nature of quantum
             | computations.
             | 
             | Note that the OP wrote "you MUST compute something that is
             | impossible to do with a traditional computer". I
             | demonstrated a simple counter-example to this statement:
             | you CAN demonstrate forward progress by factorizing big
             | numbers, but the problem is that no one can do it despite
                | billions in investment.
        
               | bondarchuk wrote:
               | Apparently they can't, right now, as you admit. Anyway
               | this is turning into a stupid semantic argument, have a
               | nice day.
        
               | joshuaissac wrote:
               | If they can't, then is it really quantum supremacy?
               | 
               | They claimed it last time in 2019 with Sycamore, which
               | could perform in 200 seconds a calculation that Google
               | claimed would take a classical supercomputer 10,000
               | years.
               | 
               | That was debunked when a team of scientists replicated
               | the same thing on an ordinary computer in 15 hours with a
               | large number of GPUs. Scott Aaronson said that on a
               | supercomputer, the same technique would have solved the
               | problem in seconds.[1]
               | 
               | So if they now come up with another problem which they
               | say cannot even be verified by a classical computer and
               | uses it to claim quantum advantage, then it is right to
               | be suspicious of that claim.
               | 
               | 1. https://www.science.org/content/article/ordinary-
               | computers-c...
        
               | lmm wrote:
               | > If they can't, then is it really quantum supremacy?
               | 
               | Yes, quantum supremacy on an artificial problem is
               | quantum supremacy (even if it's "this quantum computer
               | can simulate itself faster than a classical computer").
               | Quantum supremacy on problems that are easy to verify
               | would of course be nicer, but unfortunately not all
               | problems happen to have an easy verification.
        
             | noqc wrote:
             | the unverifiable regime is a _great_ way to extract
             | funding.
        
             | cowl wrote:
              | That applies specifically to this artificial problem Google
              | created to be hard for classical computers, and in fact in
              | the end it turned out it was not so hard. IBM came up with
              | a method to do what Google said would take 10,000 years on
              | a classical computer in just 2 days. I would not be
              | surprised if a similar reduction also happened to their
              | second attempt, if anyone were motivated enough to look at
              | it.
              | 
              | In general we have thousands of optimisation problems that
              | are hard to solve but immediate to verify.
        
           | derangedHorse wrote:
           | > This is factually wrong.
           | 
           | What's factually wrong about it? OP said "you must compute
           | something that is impossible to do with a traditional
           | computer" which is true, regardless of the output produced.
           | Verifying an output is very different from verifying the
           | proper execution of a program. The difference between testing
           | a program and seeing its code.
           | 
              | What is being computed is fundamentally different from what
              | classical computers compute; therefore the methods for
              | verifying proper adherence to instructions become
              | increasingly complex.
        
             | ajmurmann wrote:
             | They left out the key part which was incorrect and the
             | sentence right after "If you can't do it with a traditional
             | computer, it suddenly becomes difficult to verify
             | correctness"
             | 
              | The point stands that for actually interesting problems,
              | verifying correctness of the results is trivial. I don't
              | know if "adherence to instructions" translates at all to
              | quantum computing.
        
         | 0xfffafaCrash wrote:
         | I agree with the issue of "is the test dataset leaking into the
         | training dataset" being an issue with interpreting LLM
         | capabilities in novel contexts, but not sure I follow what you
         | mean on the quantum computing front.
         | 
         | My understanding is that many problems have solutions that are
         | easier to verify than to solve using classical computing. e.g.
         | prime factorization
        
           | LittleTimothy wrote:
           | Oh it's a totally different issue on the quantum side that
           | leads to the same issue with difficulty verifying. There, the
           | algorithms that Google for example is using today, aren't
           | like prime factorization, they're not easy to directly verify
           | with traditional computers, so as far as I'm aware they kind
            | of check the result for a suitably small run, and then do the
            | performance metrics on a large run that they _hope_ gave a
            | correct answer but aren't able to directly verify.
        
       | intellix wrote:
        | I haven't checked in a while, but last I checked, ChatGPT
        | struggled on very basic things like "how many Fs are in this
        | word?" Not sure if they've managed to fix that, but since then I
        | have lost hope in getting it to do any sort of math.
        
       | sylware wrote:
       | How to train an AI strapped to a formal solver.
        
       | puttycat wrote:
       | No: https://github.com/0xnurl/gpts-cant-count
        
         | sebzim4500 wrote:
         | I can't reliably multiply four digit numbers in my head either,
         | what's your point?
        
           | reshlo wrote:
           | Nobody said you have to do it in your head.
        
             | sebzim4500 wrote:
              | That's equivalent to what we are asking the model to
              | do. If you give the model a calculator it will get 100%. If
              | you give it pen and paper (i.e. let it show its working)
              | then it will get near 100%.
        
               | reshlo wrote:
               | Citation needed.
        
               | sebzim4500 wrote:
               | Which bit do you need a citation for? I can run the
               | experiment in 10 mins.
        
               | reshlo wrote:
               | > That's the equivalent to what we are asking the model
               | to do.
               | 
               | Why?
               | 
               | What does it mean to give a model a calculator?
               | 
               | What do you mean "let it show its working"? If I ask an
               | LLM to do a calculation, I never said it can't express
               | the answer to me in long-form text or with intermediate
               | steps.
               | 
               | If I ask a human to do a calculation that they can't
               | reliably do in their head, they are intelligent enough to
               | know that they should use a pen and paper without needing
               | my preemptive permission.
        
       | rishicomplex wrote:
       | Who is the author?
        
         | williamstein wrote:
         | Kevin Buzzard
        
       | nebulous1 wrote:
       | There was a little more information in that reddit thread. Of the
       | three difficulty tiers, 25% are T1 (easiest) and 50% are T2. Of
       | the five public problems that the author looked at, two were T1
       | and two were T2. Glazer on reddit described T1 as
       | "IMO/undergraduate problems", but the article author says that
       | they don't consider them to be undergraduate problems. So the LLM
       | is _already_ doing what the author says they would be surprised
       | about.
       | 
       | Also Glazer seemed to regret calling T1 "IMO/undergraduate", and
       | not only because of the disparity between IMO and typical
       | undergraduate. He said that "We bump problems down a tier if we
       | feel the difficulty comes too heavily from applying a major
       | result, even in an advanced field, as a black box, since that
       | makes a problem vulnerable to naive attacks from models"
       | 
        | Also, all of the problems shown to Tao were T3.
        
         | riku_iki wrote:
         | > So the LLM is already doing what the author says they would
         | be surprised about.
         | 
          | That's if you unconditionally believe the result without any
          | proofreading, confirmation, or reproducibility, and with barely
          | any details (we are given only one slide).
        
         | joe_the_user wrote:
         | The reddit thread is ... interesting (direct link[1]). It seems
          | to be a debate among mathematicians, some of whom do have access
          | to the secret set. But they're debating publicly and so
          | naturally avoiding any concrete examples that would give the
          | set away, so they wind up with fuzzy, fiddly language for the
          | qualities of the problem tiers.
         | 
         | The "reality" of keeping this stuff secret 'cause someone would
         | train on it is itself bizarre and certainly shouldn't be above
         | questioning.
         | 
         | https://www.reddit.com/r/OpenAI/comments/1hiq4yv/comment/m30...
        
           | obastani wrote:
           | It's not about training directly on the test set, it's about
           | people discussing questions in the test set online (e.g., in
           | forums), and then this data is swept up into the training
           | set. That's what makes test set contamination so difficult to
           | avoid.
        
             | joe_the_user wrote:
             | Yes,
             | 
             | That is the "reality" - that because companies can train
             | their models on the whole Internet, companies will train
             | their (base) models on the entire Internet.
             | 
              | And in this situation, "having heard the problem" actually
              | serves as a barrier to understanding these harder
              | problems, since any variation of a known problem will
              | receive a standard "half-assed guesstimate".
             | 
             | And these companies "can't not" use these base models since
             | they're resigned to the "bitter lesson" (better the "bitter
             | lesson viewpoint" imo) that they need large scale
             | heuristics for the start of their process and only then can
             | they start symbolic/reasoning manipulations.
             | 
              | But hold up! Why couldn't an organization freeze their
              | training set and their problems and release both to the
              | public? That would give us an idea of where the research
              | stands. Ah, the answer comes out: because they don't own the
              | training set, and the thing they want to train is a
              | commercial product that needs every drop of data to be the
              | best. As Yann LeCun has said, _this isn't research, this is
              | product development_.
        
             | phkahler wrote:
             | >> It's not about training directly on the test set, it's
             | about people discussing questions in the test set online
             | 
                | Don't kid yourself. There are tens of billions of dollars
                | going into AI. Some of the humans involved would happily
                | cheat on comparative tests to boost investment.
        
               | xmprt wrote:
               | The incentives are definitely there, but even CEOs and
               | VCs know that if they cheat the tests just to get more
               | investment, they're only cheating themselves. No one is
                | liquidating within the next 5 years, so either they end up
                | getting caught and lose everything, or they spend all this
                | energy trying to cheat while having a subpar model, which
                | results in them losing to competitors who actually
                | invested in good technology.
               | 
               | Having a higher valuation could help with attracting
               | better talent or more funding to invest in GPUs and
               | actual model improvements but I don't think that
               | outweighs the risks unless you're a tiny startup with
               | nothing to show (but then you wouldn't have the money to
               | bribe anyone).
        
               | earnestinger wrote:
                | People like to cheat. See the VW case: the company is big
                | and established and still cheated.
               | 
               | It depends a lot on individuals making up the companies
               | command chain and their values.
        
               | davidcbc wrote:
               | Why is this any different from say, Theranos?
               | 
               | CEOs and VCs will happily lie because they are convinced
               | they are smarter than everyone else and will solve the
               | problem before they get caught.
        
           | zifpanachr23 wrote:
           | Not having access to the dataset really makes the whole thing
           | seem incredibly shady. Totally valid questions you are
           | raising
        
             | whimsicalism wrote:
              | It's a key aspect of the entire project. We have gone
              | through many cycles of evals where the dataset is public.
        
       | seafoamteal wrote:
       | I don't have much to opine from an advanced maths perspective,
       | but I'd like to point out a couple examples of where ChatGPT made
       | basic errors in questions I asked it as an undergrad CS student.
       | 
       | 1. I asked it to show me the derivation of a formula for the
       | efficiency of Stop-and-Wait ARQ and it seemed to do it, but a day
       | later, I realised that in one of the steps, it just made a term
       | vanish to get to the next step. Obviously, I should have verified
       | more carefully, but when I asked it to spot the mistake in that
       | step, it did the same thing twice more with bs explanations of
       | how the term is absorbed.
       | 
       | 2. I asked it to provide me syllogisms that I could practice
       | proving. An overwhelming number of the syllogisms it gave me were
       | inconsistent and did not hold. This surprised me more because
       | syllogisms are about the most structured arguments you can find,
       | having been formalized centuries ago and discussed extensively
       | since then. In this case, asking it to walk step-by-step actually
       | fixed the issue.
       | 
       | Both of these were done on the free plan of ChatGPT, but I
       | can't remember if it was 4o or 4.
        
         | voiper1 wrote:
         | The first question is always: which model? Which fortunately
         | you at least addressed: >free plan of ChatGPT, but I can't
         | remember if it was 4o or 4.
         | 
         | Since chatgpt-4o, there has been o1-preview, and o1 (full) is
         | out. They just announced o3 got 25% on FrontierMath, which is
         | what this article is a reaction to. So any tests on 4o are at
         | least two (or three) AI releases behind the newest capabilities.
        
       | Xcelerate wrote:
       | So here's what I'm perplexed about. There are statements in
       | Presburger arithmetic that take time doubly exponential (or
       | worse) in the size of the statement to reach via _any path_ of
       | the formal system whatsoever. These are arithmetic truths about
       | the natural numbers. Can these statements be reached faster in
       | ZFC? Possibly--it's well-known that there exist shorter proofs
       | of true statements in more powerful consistent systems.
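       | 
       | (For reference, the bound I have in mind is the classical
       | Fischer-Rabin result: any decision procedure for Presburger
       | arithmetic needs, on some sentences of length n, time at least
       | 
       |     2^{2^{cn}}
       | 
       | for some fixed constant c > 0.)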
       | 
       | But the problem then is that one can suppose there are also true
       | short statements in ZFC which likewise require doubly exponential
       | time to reach via any path. Presburger Arithmetic is decidable
       | whereas ZFC is not, so these statements would require the
       | additional axioms of ZFC for shorter proofs, but I think it's
       | safe to assume such statements exist.
       | 
       | Now let's suppose an AI model can resolve the truth of these
       | short statements quickly. That means one of three things:
       | 
       | 1) The AI model can discover doubly exponential length proof
       | paths within the framework of ZFC.
       | 
       | 2) There are certain short statements in the formal language of
       | ZFC that the AI model cannot discover the truth of.
       | 
       | 3) The AI model operates outside of ZFC to find the truth of
       | statements in the framework of some other, potentially unknown
       | formal system (and for arithmetical statements, the system must
       | necessarily be sound).
       | 
       | How likely are each of these outcomes?
       | 
       | 1) is not possible within any coherent, human-scale timeframe.
       | 
       | 2) IMO is the most likely outcome, but then this means there are
       | some _really_ interesting things in mathematics that AI cannot
       | discover. Perhaps the same set of things that humans find
       | interesting. Once we have exhausted the theorems with short
       | proofs in ZFC, there will still be an infinite number of short
       | and interesting statements that we cannot resolve.
       | 
       | 3) This would be the most bizarre outcome of all. If AI operates
       | in a consistent way outside the framework of ZFC, then that would
       | be equivalent to solving the halting problem for certain
       | (infinite) sets of Turing machine configurations that ZFC cannot
       | solve. That in itself isn't too strange (e.g., it might
       | turn out that ZFC lacks an axiom necessary to prove something as
       | simple as the Collatz conjecture), but what would be strange is
       | that it could find these new formal systems _efficiently_. In
       | other words, it would have discovered an algorithmic way to
       | procure new axioms that lead to efficient proofs of true
       | arithmetic statements. One could also view that as an efficient
       | algorithm for computing BB(n), which obviously we think isn't
       | possible. See Levin's papers on the feasibility of extending PA
       | in a way that leads to quickly discovering more of the halting
       | sequence.
        
         | aleph_minus_one wrote:
         | > There are statements in Presburger arithmetic that take time
         | doubly exponential (or worse) in the size of the statement to
         | reach via any path of the formal system whatsoever.
         | 
         | This is a correct statement about the _worst_ case runtime.
         | What is interesting for practical applications is whether such
         | statements are among those that you are practically interested
         | in.
        
           | Xcelerate wrote:
           | I would certainly think so. The statements mathematicians
           | seem to be interested in tend to be at a "higher level" than
           | simple but true statements like 2+3=5. And they necessarily
           | have a short description in the formal language of ZFC,
           | otherwise we couldn't write them down (e.g., Fermat's last
           | theorem).
           | 
           | If the truth of these higher level statements instantly
           | unlocks many other truths, then it makes sense to think of
           | them in the same way that knowing BB(5) allows one to
           | instantly classify any Turing machine configuration on the
           | computation graph of all n <= 5 state Turing machines (on
           | empty tape input) as halting/non-halting.
        
         | wbl wrote:
         | 2 is definitely true. 3 is much more interesting and likely
         | true but even saying it takes us into deep philosophical
         | waters.
         | 
         | If every true theorem had a proof of computationally bounded
         | length, the halting problem would be solvable (just enumerate
         | all proofs up to the bound). So the AI can't find some of
         | those proofs.
         | 
         | The reason I say 3 is deep is that ultimately our foundational
         | reasons to assume ZFC+the bits we need for logic come from
         | philosophical groundings, and not everyone accepts the same ones.
         | Ultrafinitists and large cardinal theorists are both kinds of
         | people I've met.
        
           | Xcelerate wrote:
           | My understanding is that no model-dependent theorem of ZFC or
           | its extensions (e.g., ZFC+CH, ZFC+!CH) provides any insight
           | into the behavior of Turing machines. If our goal is to
           | invent an algorithm that finds better algorithms, then the
           | philosophical angle is irrelevant. For computational
           | purposes, we would only care about new axioms independent of
           | ZFC if they allow us to prove additional Turing machine
           | configurations as non-halting.
        
         | semolinapudding wrote:
         | ZFC is way worse than Presburger arithmetic -- since it is
         | undecidable, we know that the length of the minimal proof of a
         | statement cannot be bounded by a computable function of the
         | length of the statement.
         | 
         | This has little to do with the usefulness of LLMs for research-
         | level mathematics though. I do not think that anyone is hoping
         | to get a decision procedure out of it, but rather something
         | that would imitate human reasoning, which is heavily based on
         | analogies ("we want to solve this problem, which shares some
         | similarities with that other solved problem, can we apply the
         | same proof strategy? if not, can we generalise the strategy so
         | that it becomes applicable?").
        
         | lmm wrote:
         | > and for arithmetical statements, the system must necessarily
         | be sound
         | 
         | Why do you say this? The AI doesn't know or care about
         | soundness. Probably it has mathematical intuition that makes
         | unsound assumptions, like human mathematicians do.
         | 
         | > How likely are each of these outcomes?
         | 
         | I think they'll all be true to a certain extent, just as they
         | are for human mathematicians. There will probably be certain
         | classes of extremely long proofs that the AI has no trouble
         | discovering (because they have some kind of structure, just not
         | structure that can be expressed in ZFC), certain truths that
         | the AI makes an intuitive leap to despite not being able to
         | prove them in ZFC (just as human mathematicians do), and
         | certain short statements that the AI cannot prove one way or
         | another (like Goldbach or twin primes or what have you, again,
         | just as human mathematicians can't).
        
       | bambax wrote:
       | > _As an academic mathematician who spent their entire life
       | collaborating openly on research problems and sharing my ideas
       | with other people, it frustrates me_ [that] _I am not even able
       | to give you a coherent description of some basic facts about this
       | dataset, for example, its size. However there is a good reason
       | for the secrecy. Language models train on large databases of
       | knowledge, so the moment you make a database of maths questions
       | public, the language models will train on it._
       | 
       | Well, yes and no. This is only true because we are talking about
       | closed models from closed companies like so-called "OpenAI".
       | 
       | But if all models were truly open, then we could simply verify
       | what they had been trained on, and make experiments with models
       | that we could be sure had never seen the dataset.
       | 
       | Decades ago Microsoft (in the words of Ballmer and Gates)
       | famously accused open source of being a "cancer" because of the
       | cascading nature of the GPL.
       | 
       | But it's the opposite. In software, and in knowledge in general,
       | the true disease is secrecy.
        
         | ludwik wrote:
         | > But if all models were truly open, then we could simply
         | verify what they had been trained on
         | 
         | How do you verify what a particular open model was trained on
         | if you haven't trained it yourself? Typically, for open models,
         | you only get the architecture and the trained weights. How can
         | you reliably verify what the model was trained on from this?
         | 
         | Even if they provide the training set (which is not typically
         | the case), you still have to take their word for it--that's not
         | really "verification."
        
           | asadotzler wrote:
           | The OP said "truly open" not "open model" or any of the other
           | BS out there. If you are truly open you share the training
           | corpora as well or at least a comprehensive description of
           | what it is and where to get it.
        
             | ludwik wrote:
             | It seems like you skipped the second paragraph of my
             | comment?
        
               | SpaceManNabs wrote:
               | Because it is mostly hogwash.
               | 
                | Lots of AI researchers have shown that you can both
                | credit and discredit "open models" when you are given
                | the dataset and training steps.
               | 
                | Many lauded papers fell into Reddit ML or Twitter ire
                | when people couldn't reproduce the model or results.
               | 
               | If you are given the training set, the weights, the steps
               | required, and enough compute, you can do it.
               | 
               | Having enough compute and people releasing the steps is
               | the main impediment.
               | 
               | For my research I always release all of my code, and the
               | order of execution steps, and of course the training set.
               | I also give confidence intervals based on my runs so
               | people can reproduce and see if we get similar intervals.
        
           | bambax wrote:
           | If they provide the training set it's reproducible and
           | therefore verifiable.
           | 
           | If not, it's not really "open", it's bs-open.
        
           | lmm wrote:
           | > Even if they provide the training set (which is not
           | typically the case), you still have to take their word for it
           | --that's not really "verification."
           | 
           | If they've done it right, you can re-run the training and get
           | the same weights. And maybe you could spot-check parts of it
           | without running the full training (e.g. if there are glitch
           | tokens in the weights, you'd look for where they came from in
           | the training data, and if they weren't there at all that
           | would be a red flag). Is it possible to release the wrong
           | training set (or the wrong instructions) and hope you don't
           | get caught? Sure, but demanding that it be published and
           | available to check raises the bar and makes it much more
           | risky to cheat.
        
       | 4ad wrote:
       | > FrontierMath is a secret dataset of "hundreds" of hard maths
       | questions, curated by Epoch AI, and announced last month.
       | 
       | The database stopped being secret when it was fed to proprietary
       | LLMs running in the cloud. If you think that OpenAI hasn't
       | trained and tuned o3 on the "secret" problems people fed to
       | GPT-4o, I have a bridge to sell you.
        
         | fn-mote wrote:
         | This level of conspiracy thinking requires evidence to be
         | useful.
         | 
         | Edit: I do see from your profile that you are a real person
         | though, so I say this with more respect.
        
           | dns_snek wrote:
           | What evidence do we need that AI companies are exploiting
           | every bit of information they can use to get ahead in the
           | benchmarks to generate more hype? Ignoring terms/agreements,
           | violating copyright, and otherwise exploiting information for
           | personal gain is the foundation of that entire industry for
           | crying out loud.
        
             | threeseed wrote:
             | Some people are also forgetting who is the CEO of OpenAI.
             | 
             | Sam Altman has long talked about believing in the "move
             | fast and break things" way of doing business. Which is just
             | a nicer way of saying do whatever dodgy things you can get
             | away with.
        
               | cheald wrote:
               | OpenAI's also in the position of having to compete
               | against other LLM trainers - including the open-weights
               | Llama models and their community derivatives, which have
               | been able to do extremely well with a tiny fraction of
               | OpenAI's resources - and to justify their astronomical
                | valuation. The economic incentive to cheat is _extreme_;
               | I think that cheating has to be the default presumption.
        
         | advisedwang wrote:
         | It's perfectly possible for OpenAI to run the model (or provide
         | others the means to run it) without storing queries/outputs for
         | future use. I expect Epoch AI would insist on this. Perhaps
         | OpenAI would lie about it, but that would open them up to
         | serious charges.
        
       | ashoeafoot wrote:
       | AI has an interior world model, so it can do math if a chain of
       | proof walks without uncertainty from room to room. The problem
       | is its inability to reflect on its own uncertainty and then
       | override that uncertainty, should a new room-entrance method be
       | self-similar to a previous entrance.
        
       | voidhorse wrote:
       | Eventually we may produce a collection of problems exhaustive
       | enough that these tools can solve almost any problem that isn't
       | novel in practice, but I doubt that they will ever become general
       | problem solvers capable of what we consider to be reasoning in
       | humans.
       | 
       | Historically, the claim that neural nets were actual models of
       | the human brain and human thinking was always epistemically
       | dubious. It still is. Even as the _practical_ problems of
       | producing better and better algorithms, architectures, and output
       | have been solved, there is no reason to believe a connection
       | between the mechanical model and what happens in organisms has
       | been established. The most important point, in my view, is that
       | all of the representation and interpretation still has to happen
       | outside the computational units. Without human interpreters, none
       | of the AI outputs have any meaning. Unless you believe in
       | determinism and an overseeing god, the story for human beings is
       | much different. AI will not be capable of reason until, like
       | humans, it can develop socio-rational collectivities of meaning
       | that are _independent_ of the human being.
       | 
       | Researchers seemed to have a decent grasp on this in the 90s, but
       | today, everyone seems all too ready to make the same ridiculous
       | leaps as the original creators of neural nets. They did not show,
       | as they claimed, that thinking is reducible to computation. All
       | they showed was that a neural net can realize a _boolean
       | function_--which is not even logic, since, again, the entire
       | semantic interpretive side of the logic is ignored.
        
         | nmca wrote:
         | Can you define what you mean by novel here?
        
         | red75prime wrote:
         | > there is no reason to believe a connection between the
         | mechanical model and what happens in organisms has been
         | established
         | 
         | The universal approximation theorem. And that's basically it.
         | The rest is empirical.
         | 
         | No matter which physical processes happen inside the human
         | brain, a sufficiently large neural network can approximate
         | them. Barring unknowns like super-Turing computational
         | processes in the brain.
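         | 
         | As a toy illustration of the approximation claim (my own
         | sketch, nothing from the thread -- one hidden ReLU layer with
         | random weights, only the linear readout fitted):
         | 
         |     import numpy as np
         | 
         |     # sample a smooth "unknown" target on [0, pi]
         |     rng = np.random.default_rng(0)
         |     x = np.linspace(0.0, np.pi, 200)[:, None]
         |     y = np.sin(3 * x).ravel()
         | 
         |     W = rng.normal(size=(1, 64))    # random hidden weights
         |     b = rng.normal(size=64)         # random hidden biases
         |     H = np.maximum(x @ W + b, 0.0)  # ReLU features
         | 
         |     # fit only the readout layer by least squares
         |     w, *_ = np.linalg.lstsq(H, y, rcond=None)
         |     print("max fit error:", np.max(np.abs(H @ w - y)))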
        
           | lupire wrote:
            | That's not useful by itself, because "anything can model
            | anything else" doesn't put any upper bound on emulation cost,
            | which for one small task could be larger than the total
            | energy available in the entire Universe.
        
             | pixl97 wrote:
              | I mean, that is why they mention super-Turing processes
              | like quantum-based computing.
        
               | dinosaurdynasty wrote:
               | Quantum computing actually isn't super-Turing, it "just"
               | computes some things faster. (Strictly speaking it's
               | somewhere between a standard Turing machine and a
               | nondeterministic Turing machine in speed, and the first
               | can emulate the second.)
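                | 
                | (For reference, the known containments, in the usual
                | complexity-class notation:
                | 
                |     P \subseteq BPP \subseteq BQP \subseteq PSPACE
                | 
                | whether NP sits inside BQP, or the other way around,
                | is open.)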
        
               | staunton wrote:
               | If we're nitpicking: quantum computing algorithms could
               | (if implemented) compute _certain things_ faster _than
                | the best classical algorithms we know_. We don't know
               | any quantum algorithms that are provably faster than _all
               | possible_ classical algorithms.
        
               | dinosaurdynasty wrote:
               | Well yeah, we haven't even proved that P != NP yet.
        
             | red75prime wrote:
             | Either the brain violates the physical Church-Turing thesis
              | or it doesn't.
             | 
             | If it does, well, it will take more time to incorporate
             | those physical mechanisms into computers to get them on par
             | with the brain.
             | 
             | I leave the possibility that it's "magic"[1] aside. It's
             | just impossible to predict, because it will violate
             | everything we know about our physical world.
             | 
             | [1] One example of "magic": we live in a simulation and the
             | brain is not fully simulated by the physics engine, but
             | creators of the simulation for some reason gave it access
             | to computational resources that are impossible to harness
             | using the standard physics of the simulated world. Another
             | example: interactionistic soul.
        
           | exprofmaddy wrote:
           | The universal approximation theorem is set in a precise
           | mathematical context; I encourage you to limit its
           | applicability to that context despite the marketing label
           | "universal" (which it isn't). Consider your concession about
           | empiricism. There's no empirical way to prove (i.e. there's
           | no experiment that can demonstrate beyond doubt) that all
           | brain or other organic processes are deterministic and can be
           | represented completely as functions.
        
             | red75prime wrote:
             | Function is the most general way of describing relations.
             | Non-deterministic processes can be represented as functions
             | with a probability distribution codomain. Physics seems to
             | require only continuous functions.
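              | 
              | (Concretely, the kind of thing I mean, in LaTeX: a
              | stochastic process as a map
              | 
              |     f : X \to \mathcal{P}(Y)
              | 
              | sending each state to a probability distribution over
              | outcomes.)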
             | 
             | Sorry, but there's not much evidence that can support human
             | exceptionalism.
        
               | exprofmaddy wrote:
               | Some differential equations that model physics admit
               | singularities and multiple solutions. Therefore,
               | functions are not the most general way of describing
               | relations. Functions are a subset of relations.
               | 
               | Although "non-deterministic" and "stochastic" are often
               | used interchangeably, they are not equivalent.
               | Probability is applied analysis whose objects are
               | distributions. Analysis is a form of deductive, i.e.
               | mechanical, reasoning. Therefore, it's more accurate
               | (philosophically) to identify mathematical probability
               | with determinism. Probability is a model for our
               | experience. That doesn't mean our experience is truly
               | probabilistic.
               | 
               | Humans aren't exceptional. Math modeling and reasoning
               | are human activities.
        
               | red75prime wrote:
               | > Some differential equations that model physics admit
               | singularities and multiple solutions.
               | 
                | And physicists regard those as unphysical: the theory
                | breaks down, and we need a better one.
        
               | exprofmaddy wrote:
               | For example, the Euler equations model compressible flow
               | with discontinuities (shocks in the flow field variables)
               | and rarefaction waves. These theories are accepted and
               | used routinely.
        
               | red75prime wrote:
               | Great. A useful approximation of what really happens in
               | the fluid. But I'm sure there are no shocks and
               | rarefactions in physicists' neurons while they are
               | thinking about it.
               | 
               | Switching into a less facetious mode...
               | 
                | Do you understand that in the context of this dialogue
                | it's not enough to show some examples of functions that
                | are discontinuous or otherwise unrepresentable by NNs?
                | You need at least to give a hint as to why such
                | functions cannot be avoided when approximating the
                | functionality of the human brain.
               | 
               | Many things are possible, but I'm not going to keep my
               | mind open to a possibility of a teal Russell's teapot
               | before I get a hint at its existence, so to speak.
        
               | voidhorse wrote:
               | I don't understand your point here. A (logical) relation
               | is, by definition, a more general way of describing
               | relations than a function, and it is telling that we
               | still suck at using and developing truly _relational_
               | models that are not univalent (i.e. functions). Only a
               | few old logicians really took the calculus of relations
                | proper seriously (Peirce, for one). We use functions
               | precisely because they are less general, they are rigid,
               | and simpler to work with. I do not think anyone is
               | working under the impression that a function is a high
               | fidelity means to model the world as it is experienced
               | and actually exists. It is necessarily reductionistic
               | (and abstract). Any truth we achieve through functional
               | models is necessarily a general, abstracted, truth, which
               | in many ways proves to be extremely useful but in others
               | (e.g. when an essential piece of information in the
               | _particular_ is not accounted for in the _general
               | reductive model_ ) can be disastrous.
        
               | red75prime wrote:
               | I'm not a big fan of philosophy. The epistemology you are
               | talking about is another abstraction on top of the
               | physical world. But the evolution of the physical world
               | as far as we know can be described as a function of time
               | (at least, in a weak gravitational field when energies
               | involved are well below the grand unification energy
              | level, that is, for objects like brains).
               | 
               | The brain is a physical system, so whatever it does
               | (including philosophy) can be replicated by modelling (a
               | (vastly) simplified version of) underlying physics.
               | 
               | Anyway, I am not especially interested in discussing
               | possible impossibility of an LLM-based AGI. It might be
               | resolved empirically soon enough.
        
         | tananan wrote:
         | > Unless you believe in determinism and an overseeing god
         | 
         | Or perhaps, determinism and mechanistic materialism - which in
         | STEM-adjacent circles has a relatively prevalent adherence.
         | 
         | Worldviews which strip a human being of agency in the sense you
          | invoke crop up quite a lot today in such spaces. If you start
          | off adopting a view like this, you have a deflationary sword
          | which can cut down most any notion that isn't cast in terms
          | of mechanistic parts. "Meaning? Well that's just an
         | emergent phenomenon of the influence of such and such causal
         | factors in the unrolling of a deterministic physical system."
         | 
         | Similar for reasoning, etc.
         | 
         | Now obviously large swathes of people don't really subscribe to
         | this - but it is prevalent and ties in well with utopian
         | progress stories. If something is amenable to mechanistic
         | dissection, possibly it's amenable to mechanistic control. And
         | that's what our education is really good at teaching us. So
         | such stories end up having intoxicating "hype" effects and
         | drive fundraising, and so we get where we are.
         | 
         | For one, I wish people were just excited about making computers
         | do things they couldn't do before, without needing to dress it
         | up as something more than it is. "This model can prove a set of
         | theorems in this format with such and such limits and
         | efficiency"
        
           | exprofmaddy wrote:
           | Agreed. If someone believes the world is purely mechanistic,
           | then it follows that a sufficiently large computing machine
           | can model the world---like Leibniz's Ratiocinator. The
           | intoxication may stem from the potential for predictability
           | and control.
           | 
           | The irony is: why would someone want control if they don't
           | have true choice? Unfortunately, such a question rarely
           | pierces the intoxicated mind when this mind is preoccupied
           | with pass the class, get an A, get a job, buy a house, raise
           | funds, sell the product, win clients, gain status, eat right,
           | exercise, check insta, watch the game, binge the show, post
           | on Reddit, etc.
        
             | Quekid5 wrote:
             | > If someone believes the world is purely mechanistic, then
             | it follows that a sufficiently large computing machine can
             | model the world
             | 
             | Is this controversial in some way? The problem is that to
             | simulate a universe you need a bigger universe -- which
             | doesn't exist (or is certainly out of reach due to
             | information theoretical limits)
             | 
             | > ---like Leibniz's Ratiocinator. The intoxication may stem
             | from the potential for predictability and control.
             | 
             | I really don't understand the 'control' angle here. It
             | seems pretty obvious that even in a purely mechanistic view
             | of the universe, information theory forbids using the
             | universe to simulate itself. Limited simulations, sure...
             | but that leaves lots of gaps wherein you lose determinism
             | (and control, whatever that means).
        
               | tananan wrote:
               | > Is this controversial in some way?
               | 
               | It's not "controversial", it's just not a given that the
               | universe is to be thought a deterministic machine. Not to
               | everyone, at least.
        
               | exprofmaddy wrote:
               | People wish to feel safe. One path to safety is
               | controlling or managing the environment. Lack of
               | sufficient control produces anxiety. But control is only
               | possible if the environment is predictable, i.e.,
               | relatively certain knowledge that if I do X then the
               | environment responds with Y. Humans use models for
               | prediction. Loosely speaking, if the universe is truly
               | mechanistic/deterministic, then the goal of modeling is
               | to get the correct model (though notions of "goals" are
               | problematic in determinism without real counterfactuals).
               | However, if we can't know whether the universe is truly
               | deterministic, then modeling is a pragmatic exercise in
               | control (or management).
               | 
               | My comments are not about simulating the universe on a
               | real machine. They're about the validity and value of
               | math/computational modeling in a universe where
               | determinism is scientifically indeterminable.
        
             | HDThoreaun wrote:
              | Choice is overrated. This gets to an issue I've long had
              | with Nozick's experience machine. Not only would I happily
              | spend my days in such a machine, I'm pretty sure most other
              | people would too. Maybe they say they wouldn't, but if you
              | let them try it out and then offered them the question
              | again, I think they'd say yes. The real conclusion of the
             | experience machine is that the unknown is scary.
        
             | fire_lake wrote:
             | > Agreed. If someone believes the world is purely
             | mechanistic, then it follows that a sufficiently large
             | computing machine can model the world---like Leibniz's
             | Ratiocinator.
             | 
             | I don't think it does. Taking computers as an analogy... if
             | you have a computer with 1GB memory, then you can't
             | simulate a computer with more than 1GB memory inside of it.
        
               | exprofmaddy wrote:
               | "sufficiently large machine" ... It's a thought
               | experiment. Leibniz didn't have a computer, but he still
               | imagined it.
        
         | gmadsen wrote:
         | I hear these arguments a lot from law and philosophy students,
         | never from those trained in mathematics. It seems to me,
         | "literary" people will still be discussing these theoretical
          | hypotheticals while the people building the technology pass
          | them by.
        
           | voidhorse wrote:
           | I straddle both worlds. Consider that using the lens of
           | mathematical reasoning to understand everything is a bit like
           | trying to use a single mathematical theory (eg that of
           | groups) to comprehend mathematics as a whole. You will almost
           | always benefit and enrich your own understanding by daring to
           | incorporate outside perspectives.
           | 
           | Consider also that even as digital technology and the
            | ratio-mathematical understanding of the world has advanced, it
           | is still rife with dynamics and problems that require a
           | humanistic approach. In particular, a mathematical conception
           | cannot resolve _teleological_ problems which require the
           | establishment of consensus and the actual determination of
           | what we, as a species, want the world to look like. Climate
           | change and general economic imbalance are already evidence of
           | the kind of disasters that mount when you limit yourself to a
           | reductionistic, overly mathematical and technological
           | understanding of life and existence. Being is not a solely
           | technical problem.
        
             | gmadsen wrote:
             | I don't disagree, I just don't think it is done well or at
             | least as seriously as it used to. In modern philosophy,
             | there are many mathematically specious arguments, that just
             | make clear how large the mathematical gap has become e.g.
             | improper application of Godel's incompleteness theorems.
             | Yet Godel was a philosopher himself, who would disagree
             | with its current hand-wavy usage.
             | 
              | The 19th/20th century was a golden era of philosophy, with
              | a coherent and rigorous mathematical lens to apply alongside
              | other lenses: Russell, Turing, Godel, etc. However, this
              | just doesn't exist anymore.
        
               | voidhorse wrote:
               | While I agree that these are titans of 20th c.
               | philosophy, particularly of the philosophy of mathematics
               | and logic, the overarching school they belonged to
               | (logical positivism) has been thoroughly and rightly
               | criticized, and it is informative to read these
               | criticisms to understand why a view of life that is
               | _overly_ mathematical is in many ways inadequate. Your
               | comment still argues from a very limited perspective.
                | There is no reason that correct application of Godel's
               | theorem should be any indication of the richness of
                | someone's philosophical views _unless_ you are already a
               | staunchly committed reductionist who values mathematical
               | arguments above all else (why? can maths help you explain
               | and understand the phenomena of love in a way that will
               | actually help you _experience love_? this is just one
               | example domain where it does not make much sense), _or_
               | unless they are specifically attempting a philosophy of
               | mathematics. The question of whether or not we can
               | effectively model cognition and human mental function
               | using mathematical models is not a question of
               | mathematical philosophy, but rather one of
               | _epistemology_. If you really want to head a spurious
               | argument, read McCulloch and Pitts. They essentially
                | present an argument with two premises: the brain is finite,
                | and we can create a machine of formal "neurons" (which
                | are not even complete models of real neurons) that
                | computes a boolean function. They then _conclude_ that
               | they must have a model of cognition, that cognition must
               | be nothing more than computation, and that the brain must
               | basically be a Turing machine.
               | 
               | The relevance of mathematics to the cognitive problem
               | must be decided _outside of_ mathematics. As another
               | poster said, even if you buy the theorems, it is still an
               | _empirical question_ as to whether or not they really
               | _model_ what they claim to model, and whether or not that
               | model is of a fidelity that we find acceptable for a
               | definition of general intelligence. Often, people reach
               | claims of adequacy today _not_ by producing really
               | fantastic models but instead by _lowering the bar
               | enormously_. They claim that these models approximate
               | humans by severely reducing the idea of what it means to
               | be an intelligent human to the specific talents their
               | tech happens to excel at (e.g. apparently being a
               | language parrot is all that intelligence is, ignoring all
               | the very nuanced views and definitions of intelligence we
               | have come up with over the course of history. A machine
                | that is not embodied in a skeletal structure and cannot
                | even _experience_, let alone solve, the vast number of
               | physical, anatomical problems we contend with on a daily
               | basis is, in my view, still very far from anything I
               | would call general intelligence).
        
         | exprofmaddy wrote:
         | I'm with you. Interpreting a problem as a problem requires a
         | human (1) to recognize the problem and (2) to convince other
         | humans that it's a problem worth solving. Both involve value,
         | and value has no computational or mechanistic description
         | (other than "given" or "illusion"). Once humans have identified
         | a problem, they might employ a tool to find the solution. The
         | tool has no sense that the problem is important or even hard;
         | such values are imposed by the tool's users.
         | 
         | It's worth considering why "everyone seems all too ready to
         | make ... leaps ..." "Neural", "intelligence", "learning", and
         | others are metaphors that have performed very well as marketing
         | slogans. Behind the marketing slogans are deep-pocketed,
         | platformed corporate and government (i.e. socio-rational
         | collective) interests. Educational institutions (another socio-
         | rational collective) and their leaders have on the whole
         | postured as trainers and preparers for the "real world" (i.e. a
         | job), which means they accept, support, and promote the
         | corporate narratives about techno-utopia. Which institutions
         | are left to check the narratives? Who has time to ask questions
         | given the need to learn all the technobabble (by paying
         | hundreds of thousands for 120 university credits) to become a
         | competitive job candidate?
         | 
         | I've found there are many voices speaking against the hype---
         | indeed, even (rightly) questioning the epistemic underpinnings
         | of AI. But they're ignored and out-shouted by tech marketing,
         | fundraising politicians, and engagement-driven media.
        
       | alphan0n wrote:
       | As far as ChatGPT goes, you may as well be asking: Can AI use a
       | calculator?
       | 
       | The answer is yes, it can utilize a stateful python environment
       | and solve complex mathematical equations with ease.
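       | 
       | For example, roughly the sort of snippet it writes into that
       | environment (my own sketch; the library choice of sympy is an
       | assumption, not something stated above):
       | 
       |     from sympy import symbols, Eq, solve, integrate, sin
       | 
       |     x = symbols('x')
       |     # exact roots of x^2 - 5x + 6 = 0
       |     print(solve(Eq(x**2 - 5*x + 6, 0), x))   # [2, 3]
       |     # exact symbolic antiderivative of x*sin(x)
       |     print(integrate(x * sin(x), x))   # -x*cos(x) + sin(x)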
        
         | lcnPylGDnU4H9OF wrote:
         | There is a difference between correctly _stating_ that 2 + 2 =
         | 4 within a set of logical rules and _proving_ that 2 + 2 = 4
         | must be true given the rules.
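         | 
         | A tiny Lean 4 illustration of the difference (my own toy
         | example): the checker accepts each line only if what follows
         | := actually proves the statement; merely asserting it is not
         | enough.
         | 
         |     -- stating and proving 2 + 2 = 4
         |     example : 2 + 2 = 4 := rfl
         |     -- or have the kernel evaluate the claim
         |     example : 2 + 2 = 4 := by decide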
        
           | alphan0n wrote:
           | I think you misunderstood, ChatGPT can utilize Python to
           | solve a mathematical equation and provide proof.
           | 
           | https://chatgpt.com/share/676980cb-d77c-8011-b469-4853647f98.
           | ..
           | 
           | More advanced solutions:
           | 
           | https://chatgpt.com/share/6769895d-7ef8-8011-8171-6e84f33103.
           | ..
        
         | cruffle_duffle wrote:
         | It still has to know what to code in that environment. And
         | based on my years of math as a wee little undergrad, the actual
          | arithmetic was the least interesting part. LLMs are horrible
          | at basic arithmetic, but they can use python for the
          | calculator. But python won't help them write the correct
         | equations or even solve for the right thing (wolfram alpha can
         | do a bit of that though)
        
           | alphan0n wrote:
           | You'll have to show me what you mean.
           | 
           | I've yet to encounter an equation that 4o couldn't answer in
           | 1-2 prompts unless it timed out. Even then it can provide the
           | solution in a Jupyter notebook that can be run locally.
        
             | cruffle_duffle wrote:
              | Never really pushed it. I have no reason to believe it
              | wouldn't get most of that stuff correct. Math is very
              | much like programming and I'm sure it can output really
              | good python for its notebook to execute.
        
         | alphan0n wrote:
         | Awful lot of shy downvotes.. Why not say something if you
         | disagree?
        
       | upghost wrote:
       | I didn't see anyone else ask this but.. isn't the FrontierMath
       | dataset compromised now? At the very least OpenAI now knows the
       | questions if not the answers. I would expect that the next
       | iteration will "magically" get over 80% on the FrontierMath test.
       | I imagine that experiment was pretty closely monitored.
        
         | jvanderbot wrote:
         | I figured their model was independently evaluated against the
         | questions/answers. That's not to say it's not compromised by
         | "Here's a bag of money" type methods, but I don't even think
         | it'd be a reasonable test if they just handed over the dataset.
        
           | upghost wrote:
           | I'm sure it was independently evaluated, but I'm sure the
           | folks running the test were not given an on-prem installation
           | of ChatGPT to mess with. It was still done via API calls,
           | presumably through the chat interface UI.
           | 
           | That means the questions went over the fence to OpenAI.
           | 
           | I'm quite certain they are aware of that, and it would be
           | pretty foolish not to take advantage of at least knowing what
           | the questions are.
        
             | jvanderbot wrote:
             | Now that you put it that way, it is laughably easy.
        
             | ls612 wrote:
             | Depending on the plan the researchers used they may have
             | contractual protections against OpenAI training on their
             | inputs.
        
               | upghost wrote:
               | Sure, but given the resourcing at OpenAI, it would not be
               | hard to clean[1] the inputs. I'm just trying to be
               | realistic here, there are plenty of ways around
               | contractual obligations and a significant incentive to do
               | so.
               | 
               | [1]: https://en.wikipedia.org/wiki/Clean-room_design
        
         | optimalsolver wrote:
         | This was my first thought when I saw the results:
         | 
         | https://news.ycombinator.com/item?id=42473470
        
           | upghost wrote:
           | Insightful comment. The thing that's extremely frustrating is
            | looking at all the energy poured into this conversation around
           | benchmarks. There is a fundamental assumption of honesty and
           | integrity in the benchmarking process by at least some
           | people. But when the dataset is compromised and generation
           | N+1 has miraculous performance gains, how can we see this as
           | anything other than a ploy to pump up valuations? Some people
           | have millions of dollars at stake here and they don't care
           | about the naysayers in the peanut gallery like us.
        
             | optimalsolver wrote:
             | It's sadly inevitable that when billions in funding and
             | industry hype are tied to performance on a handful of
             | benchmarks, scores will somehow, magically, continue to go
             | up.
             | 
             | Needless to say, it doesn't bring us any closer to AGI.
             | 
             | The only solution I see here is people crafting their own,
             | private benchmarks that the big players don't care about
             | enough to train on. That, at least, gives you a clearer
             | view of the field.
        
               | upghost wrote:
               | Not sure why your comment was downvoted, but it certainly
               | shows the pressure going against people who point out
               | fundamental flaws. This is pushing us towards "AVI"
               | rather than AGI-- "Artificially Valued Intelligence". The
               | optimization function here is around the market.
               | 
               | I'm being completely serious. You are correct, despite
               | the downvotes, that this could not be pushing us towards
               | AGI because if the dataset is leaked you can't claim the
               | G-- generalizability.
               | 
                | The point of the benchmark is to lead us to believe that
               | this is a substantial breakthrough. But a reasonable
               | person would be forced to conclude that the results are
                | misleading due to optimizing around the training data.
        
       | sincerecook wrote:
       | No it can't, and there's no such thing as AI. How is a thing that
       | predicts the next-most-likely word going to do novel math? It
       | can't even do existing math reliably because logical operations
       | and statistical approximation are fundamentally different. It is
       | fun watching grifters put lipstick on this thing and shop it
       | around as a magic pig though.
        
         | bwfan123 wrote:
         | openai and epochai (frontier math) are startups with a strong
         | incentive to push such narratives. the real test will be in
         | actual adoption in real world use cases.
         | 
         | the management class has a strong incentive to believe in this
         | narrative, since it helps them reduce labor cost. so they are
         | investing in it.
         | 
         | eventually, the emperor will be seen to have no clothes at
         | least in some usecases for which it is being peddled right now.
        
           | comp_throw7 wrote:
           | Epoch is a non-profit research institute, not a startup.
        
       | retrocryptid wrote:
       | When did we decide that AI == LLM? Oh, don't answer. I know: the
       | VC world noticed CNNs and LLMs about 10 years ago and it's the
       | only thing anyone's talked about ever since.
       | 
       | Seems to me the answer to 'Can AI do maths yet?' depends on what
       | you call AI and what you call maths. Our old departmental VAX
       | running at a handfull of megahertz could do some very clever
       | symbol manipulation on binomials and if you gave it a few
       | seconds, it could even do something like theorum proving via
       | proto-prolog. Neither are anywhere close to the glorious GAI
       | future we hope to sell to industry and government, but it seems
       | worth considering how they're different, why they worked, and
       | whether there's room for some hybrid approach. Do LLMs need to
       | know how to do math if they know how to write Prolog or Coq
       | statements that can do interesting things?
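       | 
       | (For what it's worth, the binomial manipulation is a few lines
       | in a modern CAS -- sympy is my choice of symbolic engine here,
       | not something the old VAX ran:)
       | 
       |     from sympy import symbols, expand, factor
       | 
       |     x, y = symbols('x y')
       |     print(expand((x + y)**3))
       |     # x**3 + 3*x**2*y + 3*x*y**2 + y**3
       |     print(factor(x**2 - y**2))   # (x - y)*(x + y)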
       | 
       | I've heard people say they want to build software that emulates
       | (simulates?) how humans do arithmetic, but ask a human to add
       | anything bigger than two digit numbers and the first thing they
       | do is reach for a calculator.
        
       | ivan_ah wrote:
       | Yesterday, I saw a thought-provoking talk about the future of
       | "math jobs" assuming automated theorem proving becomes more
       | prevalent in the future.
       | 
       | [ (Re)imagining mathematics in a world of reasoning machines by
       | Akshay Venkatesh]
       | 
       | https://www.youtube.com/watch?v=vYCT7cw0ycw [54min]
       | 
       | Abstract: In the coming decades, developments in automated
       | reasoning will likely transform the way that research mathematics
       | is conceptualized and carried out. I will discuss some ways we
       | might think about this. The talk will not be about current or
       | potential abilities of computers to do mathematics--rather I will
       | look at topics such as the history of automation and mathematics,
       | and related philosophical questions.
       | 
       | See discussion at https://news.ycombinator.com/item?id=42465907
        
         | qnleigh wrote:
         | That was wonderful, thank you for linking it. For the benefit
         | of anyone who doesn't have time to watch the whole thing, here
         | are a few really nice quotes that convey some main points.
         | 
         | "We might put the axioms into a reasoning apparatus like the
         | logical machinery of Stanley Jevons, and see all geometry come
         | out of it. That process of reasoning are replaced by symbols
         | and formulas... may seem artificial and puerile; and it is
         | needless to point out how disastrous it would be in teaching
         | and how hurtful to the mental development; how deadening it
         | would be for investigators, whose originality it would nip in
         | the bud. But as used by Professor Hilbert, it explains and
         | justifies itself if one remembers the end pursued." Poincare on
         | the value of reasoning machines, but the analogy to mathematics
         | once we have theorem-proving AI is clear (that the tools and
          | their direct outputs are not the ends. Human understanding
         | is).
         | 
         | "Even if such a machine produced largely incomprehensible
         | proofs, I would imagine that we would place much less value on
         | proofs as a goal of math. I don't think humans will stop doing
         | mathematics... I'm not saying there will be jobs for them, but
         | I don't think we'll stop doing math."
         | 
         | "Mathematics is the study of reproducible mental objects." This
         | definition is human ("mental") and social (it implies
         | reproducing among individuals). "Maybe in this world,
         | mathematics would involve a broader range of inquiry... We need
         | to renegotiate the basic goals and values of the discipline."
         | And he gives some examples of deep questions we may tackle
         | beyond just proving theorems.
        
       | swalsh wrote:
       | Every profession seems to have a pessimistic view of AI as soon
       | as it starts to make progress in their domain. Denial, Anger,
       | Bargaining, Depression, and Acceptance. Artists seem to be in the
       | depression state, many programmers are still in the denial phase.
       | Pretty solid denial here from a mathematician. o3 was a proof of
       | concept, like every other domain AI enters, it's going to keep
       | getting better.
       | 
       | Society is CLEARLY not ready for what AI's impact is going to be.
       | We've been through change before, but never at this scale and
       | speed. I think Musk/Vivek's DOGE thing is important, our
       | government has gotten quite large and bureaucratic. But the clock
       | has started on AI, and this is a social structural issue we've
       | gotta figure out. Putting it off means we probably become
       | subject to a default set of rulers, if not the shoggoth itself.
        
         | haolez wrote:
         | I think it's a little of both. Maybe generative AI algorithms
         | won't overcome their initial limitations. But maybe we don't
         | need to overcome them to transform society in a very
         | significant way.
        
         | WanderPanda wrote:
         | Or is it just white collar workers experiencing what blue
         | collar workers have been experiencing for decades?
        
           | esafak wrote:
            | So will that make society shift to the left to demand
            | stronger safety nets, or to the right in search of a
           | strongman to rescue them?
        
             | taneq wrote:
              | Depends on the individual: do they think "look after _us_"
              | or do they think "look after _ME_"?
        
         | mensetmanusman wrote:
          | The reason this is so disruptive is that it will affect
         | hundreds of fields simultaneously.
         | 
         | Previously workers in a field disrupted by automation would
         | retrain to a different part of the economy.
         | 
         | If AI pans out to the point that there are mass layoffs in
         | hundreds of sectors of the economy at once, then i'm not sure
         | the process we have haphazardly set up now will work. People
         | will have no idea where to go beyond manual labor. (But this
         | will be difficult due to the obesity crisis - but maybe it will
         | save lives in a weird way).
        
           | hash872 wrote:
           | If there are 'mass layoffs in hundreds of sectors of the
           | economy at once', then the economy immediately goes into
           | Great Depression 2.0 or worse. Consumer spending is two-
            | thirds of the US economy; when everyone loses their jobs and
            | stops having disposable income, that's literally what a
           | depression is
        
             | mensetmanusman wrote:
             | This will create a prisoner's dilemma for corporations
              | then; the government will have to step in to provide
             | incentives for insanely profitable corporations to keep the
             | proper number of people employed or limit the rate of
             | layoffs.
        
           | zeroonetwothree wrote:
           | Well it hasn't happened yet at least (unemployment is near
           | historic lows). How much better does AI need to get? And do
           | we actually expect it to happen? Improving on random
           | benchmarks is not necessarily evidence of being able to do a
           | specific job.
        
         | admissionsguy wrote:
         | It's because we then go check it out, and see how useless it is
         | when applied to the domain.
         | 
         | > programmers are still in the denial phase
         | 
         | I am doing a startup and would jump on any way to make the
          | development process more efficient. But the only thing LLMs
         | are really good for are investor pitches.
        
       | jebarker wrote:
       | > I am dreading the inevitable onslaught in a year or two of
       | language model "proofs" of the Riemann hypothesis which will just
       | contain claims which are vague or inaccurate in the middle of 10
       | pages of correct mathematics which the human will have to wade
       | through to find the line which doesn't hold up.
       | 
       | I wonder what the response of working mathematicians will be to
       | this. If the proofs look credible it might be too tempting to try
       | and validate them, but if there's a deluge that could be a huge
       | time sink. Imagine if Wiles or Perelman had produced a thousand
       | different proofs for their respective problems.
        
         | bqmjjx0kac wrote:
         | Maybe the coming onslaught of AI slop "proofs" will give a
         | little bump to proof assistants like Coq. Of course, it would
         | still take a human mathematician some time to verify theorem
         | definitions.
        
         | Hizonner wrote:
         | Don't waste time on looking at it unless a formal proof checker
         | can verify it.
        
         | kevinventullo wrote:
         | Honestly I think it won't be that different from today, where
         | there is no shortage of cranks producing "proofs" of the
         | Riemann Hypothesis and submitting them to prestigious journals.
        
       | yodsanklai wrote:
       | I understand the appeal of having a machine helping us with maths
       | and expanding the frontier of knowledge. They can assist
       | researchers and make them more productive. Just like they can
       | already make programmers more productive.
       | 
       | But maths is also a fun and fulfilling activity. Very often, when
       | we learn a math theory, it's because we want to understand and
       | gain intuition on the concepts, or we want to solve a puzzle (for
       | which we can already look up the solution). Maybe it's similar to
       | chess. We didn't develop chess engines to replace human players
       | and make them play each other; they helped us become better
       | chess players and understand the game better.
       | 
       | So the recent progress is impressive, but I still don't see how
       | we'll use this tech practically, what impact it can have, and in
       | which fields.
        
       | vouaobrasil wrote:
       | My favourite moments as a graduate student in math were
       | showing my friends (and sometimes professors) proofs of
       | propositions and theorems that we discussed together. To be the
       | first to put together a coherent piece of reasoning that would
       | convince them of the truth was immensely exciting. Those were
       | great bonding moments amongst colleagues. The very fact that we
       | needed each other to figure out the basics of the subject was
       | part of what made the journey so great.
       | 
       | Now, all of that will be done by AI.
       | 
       | Reminds of the time when I finally enabled invincibility in
       | Goldeneye 007. Rather boring.
       | 
       | I think we've stopped appreciating the human struggle and
       | experience and placed all the value on the end product, and
       | that's why we're developing AI so much.
       | 
       | Yeah, there is the possibility of working with an AI but at that
       | point, what is the point? Seems rather pointless to me in an art
       | like mathematics.
        
         | sourcepluck wrote:
         | > Now, all of that will be done by AI.
         | 
         | No "AI" of any description is doing novel proofs at the moment.
         | Not o3, or anything else.
         | 
         | LLMs are good for chatting about basic intuition with, up to
         | and including complex subjects, if and only if there are
         | publicly available data on the topic which have been fed to
         | the LLM during its training. They're good at doing summaries
         | and overviews of specific things (if you push them around and
         | insist they don't waffle and ignore garbage carefully and keep
         | your critical thinking hat on, etc etc).
         | 
         | It's like having a magnifying glass that focuses in on the
         | small little maths question you might have, without you having
         | to sift through ten blogs or videos or whatever.
         | 
         | That's hardly going to replace graduate students doing proofs
         | with professors, though, at least not with the methods being
         | employed thus far!
        
           | vouaobrasil wrote:
           | I am talking about in 20-30 years.
        
       | busyant wrote:
       | As someone who has an 18 yo son who wants to study math, this has
       | me (and him) ... worried ... about becoming obsolete?
       | 
       | But I'm wondering what other people think of this analogy.
       | 
       | I used to be a bench scientist (molecular genetics).
       | 
       | There were world class researchers who were more creative than I
       | was. I even had a Nobel Laureate once tell me that my research
       | was simply "dotting 'i's and crossing 't's".
       | 
       | Nevertheless, I still moved the field forward in my own small
       | ways. I still did respectable work.
       | 
       | So, will these LLMs make us _completely_ obsolete? Or will there
       | still be room for those of us who can dot the  "i"?--if only for
       | the fact that LLMs don't have infinite time/resources to solve
       | "everything."
       | 
       | I don't know. Maybe I'm whistling past the graveyard.
        
         | deepsun wrote:
         | By the way, don't trust Nobel laureates or even winners. E.g.
         | Linus Pauling was talking absolute garbage, harmful and evil,
         | after winning the Nobel.
        
           | Radim wrote:
           | > _don 't trust Nobel laureates or even winners_
           | 
           | Nobel laureate and winner are the same thing.
           | 
           | > _Linus Pauling was talking absolute garbage, harmful and
           | evil, after winning the Nobel._
           | 
           | Can you be more specific, what garbage? And which Nobel prize
           | do you mean - Pauling got two, one for chemistry and one for
           | peace.
        
             | bongodongobob wrote:
             | Eugenics and vitamin C as a cure all.
        
               | lern_too_spel wrote:
               | If Pauling's eugenics policies were bad, then the laws
               | against incest that are currently on the books in many
               | states (which are also eugenics policies that use the
               | same mechanism) are also bad. There are different forms
               | of eugenics policies, and Pauling's proposal to restrict
               | the mating choices of people carrying certain recessive
               | genes so their children don't suffer is ethically
               | different from Hitler exterminating people with certain
               | genes and also ethically different from other governments
               | sterilizing people with certain genes. He later supported
               | voluntary abortion with genetic testing, which is now
               | standard practice in the US today, though no longer in a
               | few states with ethically questionable laws restricting
               | abortion. This again is ethically different from forced
               | abortion.
               | 
               | https://scarc.library.oregonstate.edu/coll/pauling/blood/
               | nar...
        
               | bongodongobob wrote:
               | From what I remember, he wanted to mark people with
               | tattoos or something.
        
               | lern_too_spel wrote:
               | This is mentioned in my link: "According to Pauling,
               | carriers should have an obvious mark, (i.e. a tattoo on
               | the forehead) denoting their disease, which would allow
               | carriers to identify others with the same affliction and
               | avoid marrying them."
               | 
               | The goal wasn't to mark people for ostracism but to make
               | it easier for people carrying these genes to find mates
               | that won't result in suffering for their offspring.
        
               | voltaireodactyl wrote:
               | FWIW my understanding is that the policies against incest
               | you mention actually have much less to do with
               | controlling genetic reproduction and are more directed at
               | combating familial rape/grooming/etc.
               | 
               | Not a fun thing to discuss, but apparently a significant
               | issue, which I guess should be unsurprising given some of
               | the laws allowing underage marriage if the family signs
               | off.
               | 
               | Mentioning only to draw attention to the fact that
               | theoretical policy is often undeniable in a vacuum, but
               | runs aground when faced with real world conditions.
        
             | deepsun wrote:
             | Thank you, my bad.
             | 
             | I was referring to Linus's harmful and evil promotion of
             | Vitamin C as the cure for everything and cancer. I don't
             | think Linus was attaching that garbage to any particular
             | Nobel prize. But people did say to their doctors: "Are you
             | a Nobel winner, doctor?". Don't think they cared about
             | particular prize either.
        
               | red75prime wrote:
               | > Linus's harmful and evil promotion of Vitamin C
               | 
               | Which is "harmful and evil" thanks to your
               | afterknowledge. He had based his books on the research
               | that failed to replicate. But given low toxicity of
               | vitamin C it's not that "evil" to recommend treatment
               | even if probabilistic estimation of positive effects is
               | not that high.
               | 
               | Sloppy, but not exceptionally bad. At least it was
               | instrumental in teaching me to not expect marvels coming
               | from dietary research.
        
         | pfisherman wrote:
         | I used to do bench top work too; and was blessed with "the
         | golden hands" in that I could almost always get protocols
         | working. To me this always felt more like intuition than
         | deductive reasoning. And it made me a terrible TA. My advice to
         | students in lab was always something along the lines of "just
         | mess around with it, and see how it works." Not very helpful
         | for the stressed and struggling student -_-
         | 
         | Digression aside, my point is that I don't think we know
         | exactly what makes or defines "the golden hands". And if that
         | is the case, can we optimize for it?
         | 
         | Another point is that scalable fine tuning only works for
         | verifiable stuff. Think a priori knowledge. To me that seems to
         | be at the opposite end of the spectrum from "mess with it and
         | see what happens".
        
           | busyant wrote:
           | > blessed with "the golden hands" in that I could almost
           | always get protocols working.
           | 
           | Very funny. My friends and I never used the phrase "golden
           | hands" but we used to say something similar: "so-and-so has
           | 'great hands'".
           | 
           | But it meant the same thing.
           | 
           | I, myself, did not have great hands, but my comment was more
           | about the intellectual process of conducting research.
           | 
           | I guess my point was that:
           | 
           | * I've already dealt with more talented researchers, but I
           | still contributed meaningfully.
           | 
           | * Hopefully, the "AI" will simply add another layer of
           | talent, but the rest of us lesser mortals will still be able
           | to contribute.
           | 
           | But I don't know if I'm correct.
        
         | vouaobrasil wrote:
         | I was just thinking about this. I already posted a comment
         | here, but I will say, as a mathematician (PhD in number
         | theory), that for me AI significantly takes away the beauty of
         | doing mathematics within a realm in which AI is used.
         | 
         | The best part of math (again, just for me) is that it was a
         | journey that was done by hand with only the human intellect
         | that computers didn't understand. The beauty of the subject was
         | precisely that it was a journey of human intellect.
         | 
         | As I said elsewhere, my friends used to ask me why something
         | was true and it was fun to explain it to them, or ask them and
         | have them explain it to me. Now most will just use some AI.
         | 
         | Soulless, in my opinion. Pure mathematics should be about the
         | art of the thing, not producing results on an assembly line
         | like it will be with AI. Of course, the best mathematicians are
         | going into this because it helps their current careers, not
         | because it helps the future of the subject. Math done with AI
         | will be a lot like Olympic running done with performance-
         | enhancing drugs.
         | 
         | Yes, we will get a few more results, faster. But the results
         | will be entirely boring.
        
           | zmgsabst wrote:
           | Presumably people who get into math going forward will feel
           | differently.
           | 
           | For myself, chasing lemmas was always boring -- and there's
           | little interest in doing the busywork of fleshing out a
           | theory. For me, LLMs are a great way to do the fun parts
           | (conceptual architecture) without the boring parts.
           | 
           | And I expect we'll see much the same change as with physics:
           | computers increase the complexity of the objects we study,
           | which tend to be rather simple when done by hand -- eg,
           | people don't investigate patterns in the diagrams of
           | group(oids) because drawing million element diagrams isn't
           | tractable by hand. And you only notice the patterns in them
           | when you see examples of the diagrams at scale.
        
             | ndriscoll wrote:
             | Even current people will feel differently. I don't bemoan
             | the fact that Lean/Mathlib has `simp` and `linarith` to
             | automate trivial computations. A "copilot for Lean" that
             | can turn "by induction, X" or "evidently Y" into a formal
             | proof sounds great.
             | 
             | The trick is teaching the thing how high-powered the
             | theorems it uses should be, or how much detail to factor out,
             | depending on the user's level of understanding. We'll have
             | to find a pedagogical balance (e.g. you don't give
             | `linarith` to someone practicing basic proofs), but I'm
             | sure it will be a great tool to aid human understanding.
             | 
             | A tool to help translate natural language to formal
             | propositions/types also sounds great, and could help more
             | people to use more formal methods, which could make for
             | more robust software.
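             | 
             | For a taste of what those tactics already automate, here is
             | a minimal Lean 4 sketch (assuming a Mathlib setup; these are
             | toy examples of my own, not anything from the article):
             | 
             |     import Mathlib.Tactic
             | 
             |     -- `linarith` closes routine linear-arithmetic goals
             |     example (x y : ℝ) (hx : x < 3) (hy : y ≤ 2) :
             |         x + y < 5 := by
             |       linarith
             | 
             |     -- `simp` discharges trivial rewriting goals
             |     example (n : ℕ) : n + 0 = n := by simp
             | 
             | A "copilot" on top of this would have to decide which such
             | tactic calls to chain together from a natural-language hint.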
        
             | vouaobrasil wrote:
             | Just a counterpoint, but I wonder how much you'll really
             | understand if you can't even prove the whole thing
             | yourself. Personally, I learn by proving but I guess
             | everyone is different.
        
               | daxfohl wrote:
               | My hunch is it won't be much different, even when we can
               | simply ask a machine that doesn't have a cached proof,
               | "prove riemann hypothesis" and it thinks for ten seconds
               | and spits out a fully correct proof.
               | 
                | As Erdos (I think?) said, great math is not about the
               | answers, it's about the questions. Or maybe it was
               | someone else, and maybe "great mathematicians" rather
               | than "great math". But, gist is the same.
               | 
               | "What happens when you invent a thing that makes a
               | function continuous (aka limit point)"? "What happens
               | when you split the area under a curve into infinitesimal
               | pieces and sum them up"? "What happens when you take the
               | middle third out of an interval recursively"? "Can we
               | define a set of axioms that underlie all mathematics"?
               | "Is the graph of how many repetitions it takes for a
               | complex number to diverge interesting"? I have a hard
               | time imagining computers would ever have a strong enough
               | understanding of the human experience with mathematics to
               | even begin pondering such questions unprompted, let alone
               | answer them and grok the implications.
               | 
               | Ultimately the truths of mathematics, the answers, soon
               | to be proved primarily by computers, already exist.
               | Proving a truth does not create the truth; the truth
               | exists independent of whether it has been proved or not.
               | So fundamentally math is closer to archeology than it may
               | appear. As such, AI is just a tool to help us dig with
               | greater efficiency. But it should not be considered or
               | feared as a replacement for mathematicians. AI can never
               | take away the enlightenment of discovering something new,
               | even if it does all the hard work itself.
        
               | vouaobrasil wrote:
               | > I have a hard time imagining computers would ever have
               | a strong enough understanding of the human experience
               | with mathematics to even begin pondering such questions
               | unprompted, let alone answer them and grok the
               | implications.
               | 
               | The key is that the good questions however come from
               | hard-won experience, not lazily questioning an AI.
        
           | hn3er1q wrote:
           | There are many similarities in your comment to how
           | grandmasters discuss engines. I have a hunch the arc of AI in
           | math will be very similar to the arc of engines in chess.
           | 
           | https://www.wired.com/story/defeated-chess-champ-garry-
           | kaspa...
        
             | vouaobrasil wrote:
             | I agree with that, in the sense that math will become more
             | about who can use AI the fastest to generate the most
             | theories, which sort of side-steps the whole point of math.
        
               | hn3er1q wrote:
               | As a chess aficionado and a former tournament player, who
               | didn't get very far, I can see pros & cons. They helped
               | me train and get significantly better than I would've
               | gotten without them. On the other hand, so did the
               | competition. :) The average level of the game is so much
               | higher than when I was a kid (30+ years ago) and new ways
               | of playing that were unthinkable before are possible now.
               | On the other hand cheating (online anyway) is rampant and
               | all the memorization required to begin to be competitive
               | can be daunting, and that sucks.
        
               | vouaobrasil wrote:
               | Hey I play chess too. Not a very good player though. But
               | to be honest, I enjoy playing with people who are not
               | serious because I do think an overabundance of knowledge
               | makes the game too mechanical. Just my personal
               | experience, but I think the risk of cheaters who use
               | programs and the overmechanization of chess is not worth
               | becoming a better player. (And in fact, I think MOST
               | people can gain satisfaction by improving just by
               | studying books and playing. But I do think that a few who
               | don't have access to opponents benefit from a chess-
               | playing computer).
        
           | _jayhack_ wrote:
           | If you think the purpose of pure math is to provide
           | employment and entertainment to mathematicians, this is a
           | dark day.
           | 
           | If you believe the purpose of pure math is to shed light on
           | patterns in nature, pave the way for the sciences, etc., this
           | is fantastic news.
        
             | shadowerm wrote:
             | We also seem to suffer these automation delusions right
             | now.
             | 
             | I could see how AI could assist me with learning pure math
             | but the idea AI is going to do pure math for me is just
             | absurd.
             | 
             | Not only would I not know how to start, more importantly I
             | have no interest in pure math. There will still be a huge
             | time investment to get up to speed with doing anything with
             | AI and pure math.
             | 
             | You have to know what questions to ask. People with domain
             | knowledge seem to really be selling themselves short. I am
             | not going to randomly stumble on a pure math problem prompt
             | when I have no idea what I am doing.
        
             | vouaobrasil wrote:
             | Well, 99% of pure math will never leave the domain of pure
             | math so I'm really not sure what you are talking about.
        
           | raincole wrote:
           | > Now most will just use some AI.
           | 
           | Do people with PhD in math really ask AI to explain math
           | concepts to them?
        
             | vouaobrasil wrote:
             | They will, when it becomes good enough to prove tricky
             | things.
        
           | agentultra wrote:
           | I think it will become apparent how bad they are at it.
           | They're algorithms and not sentient beings. They do not think
           | of themselves, their place in the world, and do not fathom
            | the contents of the minds of others. They do not care what
           | others think of them.
           | 
           | Whatever they write only happens to contain some truth by
           | virtue of the model and the training data. An algorithm
           | doesn't know what truth is or why we value it. It's a
           | bullshitter of the highest calibre.
           | 
           | Then comes the question: will they write proofs that we will
           | consider beautiful and elegant, that we will remember and
           | pass down?
           | 
           | Or will they generate what they've been asked to and nothing
           | less? That would be utterly boring to read.
        
           | jvvw wrote:
           | I agree wholeheartedly about the beauty of doing mathematics.
           | I will add though that the author of this article, Kevin
           | Buzzard, doesn't need to do this for his career and from what
           | I know of him is somebody who very much cares about
           | mathematics and the future of the subject. The fact that a
           | mathematician of that calibre is interested in this makes me
           | more interested.
        
         | nyrikki wrote:
         | What LLMs can do is limited: they are superior to wetware in
         | some tasks, like finding and matching patterns in
         | higher-dimensional space, but outside of that pattern finding
         | and matching they are still fundamentally limited to a tiny
         | class of problems.
         | 
         | LLMs will be tools for some math needs, and even if we ever get
         | quantum computers they will be limited in what they can do.
         | 
         | LLMs, without pattern matching, can only do up to about integer
         | division, and while they can calculate parity, they can't use
         | it in their calculations.
         | 
         | There are several groups sitting on the known limitations of
         | LLMs, waiting to take advantage of those who don't understand
         | the fundamental limitations, simplicity bias, etc.
         | 
         | The hype will meet reality soon and we will figure out where
         | they work and where they are problematic over the next few
         | years.
         | 
         | But even the most celebrated achievements like proof finding
         | with Lean, heavily depends on smart people producing hints that
         | machines can use.
         | 
         | Basically lots of the fundamental hints of the limits of
         | computation still hold.
         | 
         | Modal logic may be an accessible way to approach the limits of
         | statistical inference, if you want to explore one path yourself.
         | 
         | A lot of what is in this article relates to some of the known
         | fundamental limitations.
         | 
         | Remember that for all the amazing progress, one of the core
         | founders of the perceptron, Pitts, drank himself to death after
         | it was shown that perceptrons were insufficient to accurately
         | model biological neurons.
         | 
         | Optimism is high, but reality will hit soon.
         | 
         | So think of it as new tools that will be available to your
         | child, not a replacement.
        
           | ComplexSystems wrote:
           | "LLMs, without pattern matching, can only do up to about
           | integer division, and while they can calculate parity, they
           | can't use it in their calculations." - what do you mean by
           | this? Counting the number of 1's in a bitstring and
           | determining if it's even or odd?
        
             | nyrikki wrote:
             | Yes, in this case PARITY is determining if the number of 1s
             | in a binary input is odd or even
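             | 
             | Concretely, a minimal Python sketch of the PARITY function
             | being discussed (illustration only):
             | 
             |     def parity(bits: str) -> int:
             |         # 1 if the number of 1s in the bitstring is odd,
             |         # 0 if it is even
             |         return bits.count("1") % 2
             | 
             |     print(parity("1011"))  # three 1s -> 1
             |     print(parity("1001"))  # two 1s   -> 0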
             | 
             | It is an effect of the complex-to-unpack descriptive
             | complexity class DLOGTIME-uniform TC0, which has AND, OR
             | and MAJORITY gates.
             | 
             | http://arxiv.org/abs/2409.13629
             | 
             | The point being that the ability to use parity gates is
             | different from being able to calculate parity, which is
             | where the combination of the (typically RAM-machine)
             | DLOGTIME with the circuit complexity of uniform TC0 comes
             | into play.
             | 
             | PARITY, MAJ, AND, and OR are all symmetric, and are in TC0,
             | but PARITY is not in DLOGTIME-uniform TC0, which is first-
             | order logic with Majority quantifiers.
             | 
             | Another path: if you think about semantic properties and
             | Rice's theorem, this may make sense, especially as PAC
             | learning even depth-2 nets is equivalent to the approximate
             | SVP.
             | 
             | PAC-learning even depth-2 threshold circuits is NP-hard.
             | 
             | https://www.cs.utexas.edu/~klivans/crypto-hs.pdf
             | 
             | For me, it helps to think about how ZFC was structured so
             | we can keep the niceties of the law of the excluded middle,
             | and how statistics pretty much depends on it for the central
             | limit theorem, the law of large numbers, IID, etc.
             | 
             | But that path runs the risk of reliving the Brouwer-Hilbert
             | controversy.
        
         | TheRealPomax wrote:
         | What part do you think is going to become obsolete? Because
         | Math isn't about "working out the math", it's about finding the
         | relations between seemingly unrelated things to bust open a
         | problem. Short of AGI, there is no amount of neural net that's
         | going to realize that a seemingly impossible probabilistic
         | problem is actually equivalent to a projection of an easy to
         | work with 4D geometry. "Doing the math" is what we have
         | computers for, and the better they get, the easier the tedious
         | parts of the job become, but "doing math" is still very much a
         | human game.
        
           | busyant wrote:
           | > What part do you think is going to become obsolete?
           | 
           | Thank you for the question.
           | 
           | I guess what I'm saying is:
           | 
           | Will LLMs (or whatever comes after them) be _so_ good and
           | _so_ pervasive that we will simply be able to say, "Hey
           | ChatGPT-9000, I'd like to see if the xyz conjecture is
           | correct." And then ChatGPT-9000 just does the work without us
           | contributing beyond asking a question.
           | 
           | Or will the technology be limited/bound in some way such that
           | we will still be able to use ChatGPT-9000 as a tool of our
           | own intellectual augmentation and/or we could still
           | contribute to research even without it.
           | 
           | Hopefully, my comment clarifies my original post.
           | 
           | Also, writing this stuff has helped me think about it more. I
           | don't have any grand insight, but the more I write, the more
           | I lean toward the outcome that these machines will allow us
           | to augment our research.
        
             | TheRealPomax wrote:
              | As amazing as they may seem, they're _still_ just
              | autocompletes; it's inherent to what an LLM is. So unless
              | we come up with a completely new kind of technology, I don't
             | see "test this conjecture for me" becoming more real than
             | the computer assisted proof tooling we already have.
        
         | hyhconito wrote:
         | Let's put it this way, from another mathematician, and I'm sure
         | I'll probably be shot for this one.
         | 
         | Every LLM release moves half of the remaining way to the
         | minimum viable goal of replacing a third class undergrad. If
         | your business or research initiative is fine with that level of
         | competence then you will find utility.
         | 
         | The problem is that I don't know anyone who would find that
         | useful. Nor does it fit within any existing working methodology
         | we have. And on top of that the verification of any output can
         | take considerably longer than just doing it yourself in the
         | first place, particularly where it goes off the rails, which it
         | does all the time. I mean it was 3 months ago I was arguing
         | with a model over it not understanding place-value systems
         | properly, something we teach 7 year olds here?
         | 
         | But the abstract problem is at a higher level. If it doesn't
         | become a general utility for people outside of mathematics,
         | which is very very evident at the moment by the poor overall
         | adoption and very public criticism of the poor result quality,
         | then the funding will dry up. Models cost lots of money to
         | train and if you don't have customers it's not happening and no
         | one is going to lend you the money any more. And then it's
         | moot.
        
           | binarymax wrote:
           | This is a great point that nobody will shoot you over :)
           | 
           | But the main question is still: assuming you replace an
           | undergrad with a model, who checks the work? If you have a
            | good process around that already, and find utility as an
            | augmented system, then you'll get value - but I still think
            | it's better for the undergrad to still have the job, be at
            | the wheel, and do things faster and better when leveraging a
            | powerful tool.
        
             | hyhconito wrote:
             | Shot already for criticising the shiny thing (happened with
             | crypto and blockchain already...)
             | 
             | Well to be fair no one checks what the graduates do
             | properly, even if we hired KPMG in. That is until we get
             | sued. But at least we have someone to blame then. What we
             | don't want is something for the graduate to blame. The buck
             | stops at someone corporeal because that's what the
             | customers want and the regulators require.
             | 
             | That's the reality and it's not quite as shiny and happy as
             | the tech industry loves to promote itself.
             | 
             | My main point, probably cleared up with a simple point: no
             | one gives a shit about this either way.
        
           | meroes wrote:
           | Well said. As someone with only a math undergrad and as a
           | math RLHF'er, this speaks to my experience the most.
           | 
           | That craving for understanding an elegant proof is nowhere
           | to be found when verifying an LLM's proof.
           | 
           | Like sure, you could put together a car by first building an
           | airplane, disassembling all of it minus the two front seats,
           | and having zero elegance and still get a car at the end. But
           | if you do all that and don't provide novelty in results or
           | useful techniques, there's no business.
           | 
           | Hell, I can't even get a model to calculate compound interest
           | for me (save for the technicality of prompt engineering a
           | python function to do it). What do I expect?
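           | 
           | (For reference, the calculation in question is a one-liner;
           | the figures below are made up:)
           | 
           |     def compound_interest(p, r, n, t):
           |         # future value of principal p at annual rate r,
           |         # compounded n times per year for t years
           |         return p * (1 + r / n) ** (n * t)
           | 
           |     print(round(compound_interest(1000, 0.05, 12, 10), 2))
           |     # -> 1647.01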
        
           | qnleigh wrote:
           | I think there's a pretty good case to be made that LLMs
           | paired with automated theorem provers will become a useful
           | tool to working mathematicians in the next few years. Another
           | thread here links to a lecture from a professor of
           | mathematics who makes this point about halfway in, based
           | solely on Alpha Proofs current abilities
           | (https://www.youtube.com/watch?v=vYCT7cw0ycw [54min]).
           | Terence Tao, well-known mathematician at UCLA has been saying
           | similar things for years. He's blogged about LLMs helping to
           | learn new tools (like Lean) and occasionally helping with
           | brainstorming.
           | 
           | At this stage, the point they're making isn't 'OMG AGI!!!'
           | but rather something like 'having an enthusiastic, often
           | wrong undergrad assistant who's available 24/7 can be useful,
           | if you use it carefully.'
        
         | peterbonney wrote:
         | If you looked at how the average accountant spent their time
         | before the arrival of the digital spreadsheet, you might have
         | predicted that automated calculation would make the profession
         | obsolete. But it didn't.
         | 
         | This time could be different, of course. But I'll need a lot
         | more evidence before I start telling people to base their major
         | life decisions on projected technological change.
         | 
         | That's before we even consider that only a very slim minority
         | of the people who study math (or physics or statistics or
         | biology or literature or...) go on to work in the field of math
         | (or physics or statistics or biology or literature or...). AI
         | could completely take over math research and still have next to
         | no impact on the value of the skills one acquires from studying
         | math.
         | 
         | Or if you want to be more fatalistic about it: if AI is going
         | to put everyone out of work then it doesn't really matter what
         | you do now to prepare for it. Might as well follow your
         | interests in the meantime.
        
           | blagie wrote:
           | It's important to base life decisions on very real
           | technological change. We don't know what the change will be,
           | but it's coming. At the very least, that suggests more
           | diverse skills.
           | 
           | We're all usually (but not always) better off, with more
           | productivity, eventually, but in the meantime, jobs do
           | disappear. Robotics did not fully displace machinists and
           | factory workers, but single-skilled people in Detroit did not
            | do well. The loom, the steam engine... all of them displaced
            | artisans who were often highly trained but single-skilled.
        
             | rafaelmn wrote:
             | If AI reaches this level socioeconomic impact is going to
             | be so immense, that choosing what subject you study will
             | have no impact on your outcome - no matter what it is - so
             | it's a pointless consideration.
        
         | bawolff wrote:
         | I doubt it.
         | 
         | Most likely AI will be good at some things and not others, and
         | mathematicians will just move to whatever AI isn't good at.
         | 
         | Alternatively, if AI is able to do all math at a level above
         | PhDs, then it's going to be a brave new world and basically the
         | singularity. Everything will change so much that speculating
         | about it will probably be useless.
        
         | ykonstant wrote:
         | > I even had a Nobel Laureate once tell me that my research was
         | simply "dotting 'i's and crossing 't's".
         | 
         | (.*<*.)
        
         | ccppurcell wrote:
         | The mathematicians of the future will still have to figure out
         | the right questions, even if llms can give them the answers.
         | And "prompt engineering" will require mathematical skills, at
         | the very least.
         | 
         | Evaluating the output of llms will also require mathematical
         | skills.
         | 
         | But I'd go further, if your son enjoys mathematics and has some
         | ability in the area, it's wonderful for your inner life. Anyone
         | who becomes sufficiently interested in anything will rediscover
         | mathematics lurking at the bottom.
        
         | jvvw wrote:
         | Another PhD in maths here and I would say not to worry. It's
         | the process of doing and understanding mathematics, and
         | thinking mathematically that is ultimately important.
         | 
         | There's never been the equivalent of the 'bench scientist' in
         | mathematics and there aren't many direct careers in
         | mathematics, or pure mathematics at least - so very few people
         | ultimately become researchers. Instead, I think you take your
         | way of thinking and apply it to whatever else you do (and it
         | certainly doesn't do any harm to understand various
         | mathematical concepts incredibly well).
        
         | qnleigh wrote:
         | Highly recommend this lecture by a working mathematician shared
         | above (https://www.youtube.com/watch?v=vYCT7cw0ycw [54min]).
         | It's very much grounded in history and experience, much more so
         | than in speculation. I wrote a brief summary of some main
         | points in that thread.
         | 
         | But specifically to your worry about humans just dotting i's
         | and crossing t's, he predicts that exactly the opposite will
         | happen. At the end he emphasizes that the ultimate goal of
         | mathematics is more about human understanding than proving
         | theorems.
        
           | busyant wrote:
           | Thank you.
        
       | jokoon wrote:
       | I wish scientists who study the psychology and cognition of
       | actual brains could approach those AI things and talk about
       | them, and maybe make suggestions.
       | 
       | I really really wish AI would make some breakthrough and be
       | really useful, but I am so skeptical and negative about it.
        
         | joe_the_user wrote:
         | Unfortunately, the scientists who study actual brains have all
         | sorts of interesting models but ultimately very little clue
         | _how_ these actual brains work at the level of problem solving.
         | I mean, there's all sorts of "this area is associated with that
         | kind of process" and "here's evidence this area does this
         | algorithm" stuff, but it's all at the level you'd imagine steam
         | engine engineers trying to understand a warp drive.
         | 
         | The "open worm project" was an effort years ago to get computer
         | scientists involved in trying to understand what "software" a
         | very small actual brain could run. I believe progress here has
         | been very slow, which gives an idea of how ignorant we are
         | about what much larger brains involve.
         | 
         | https://en.wikipedia.org/wiki/OpenWorm
        
         | bongodongobob wrote:
         | If you can't find useful things for LLMs or AI at this point,
         | you must just lack imagination.
        
       | 0points wrote:
       | > How much longer this will go on for nobody knows, but there are
       | lots of people pouring lots of money into this game so it would
       | be a fool who bets on progress slowing down any time soon.
       | 
       | Money cannot solve the issues faced by the industry, which mainly
       | revolve around lack of training data.
       | 
       | They already used the entirety of the internet, all available
       | video, audio and books and they are now dealing with the fact
       | that most content online is now generated by these models, thus
       | making it useless as training data.
        
       | charlieyu1 wrote:
       | One thing I know is that there won't be machines entering IMO
       | 2025. The concept of a "marker" does not exist in the IMO -
       | scores are decided by negotiations between the team leaders of
       | each country and the juries. It is important to get each team
       | leader involved in grading the work of their country's students,
       | for accountability as well as to acknowledge cultural
       | differences. And those hundreds of people are not going to stay
       | longer to grade AI work.
        
       | witnesser2 wrote:
       | I was not refuted sufficiently a couple of years ago. I claimed
       | "training is open boundary" etc.
        
         | witnesser2 wrote:
         | Like as a few years ago, I just boringly add again "you need
         | modeling" to close it.
        
       | mangomountain wrote:
       | In other news, we've discovered life (our bacteria) on Mars.
       | Just joking.
        
       | Syzygies wrote:
       | "Can AI do math for us" is the canonical wrong question. People
       | want self-driving cars so they can drink and watch TV. We should
       | crave tools that enhance our abilities, as tools have done since
       | prehistoric times.
       | 
       | I'm a research mathematician. In the 1980's I'd ask everyone I
       | knew a question, and flip through the hard bound library volumes
       | of Mathematical Reviews, hoping to recognize something. If I was
       | lucky, I'd get a hit in three weeks.
       | 
       | Internet search has shortened this turnaround. One instead needs
       | to guess what someone else might call an idea. "Broken circuits?"
       | Score! Still, time consuming.
       | 
       | I went all in on ChatGPT after hearing that Terry Tao had learned
       | the Lean 4 proof assistant in a matter of weeks, relying heavily
       | on AI advice. It's clumsy, but a very fast way to get
       | suggestions.
       | 
       | Now, one can hold involved conversations with ChatGPT or Claude,
       | exploring mathematical ideas. AI is often wrong, never knows when
       | it's wrong, but people are like this too. Read how the insurance
       | incidents for self-driving taxis are well below the human
       | incident rates? Talking to fellow mathematicians can be
       | frustrating, and so is talking with AI, but AI conversations go
       | faster and can take place in the middle of the night.
       | 
       | I don't want AI to prove theorems for me, those theorems will be
       | as boring as most of the dreck published by humans. I want AI to
       | inspire bursts of creativity in humans.
        
         | ninetyninenine wrote:
         | Your optimism should be tempered by the downside of progress:
         | AI in the near future may not only inspire creativity in
         | humans, but may replace human creativity altogether.
         | 
         | Why do I need to hire an artist for my movie/video
         | game/advertisement when AI can replicate all the creativity I
         | need?
        
           | wnc3141 wrote:
            | There is research on AI limiting creative output in
            | competitive arenas. Essentially it breaks expectancy and
            | therefore deteriorates iteration.
           | 
           | https://direct.mit.edu/rest/article-
           | abstract/102/3/583/96779...
        
           | immibis wrote:
           | This was about mathematics.
        
         | didibus wrote:
         | I think I'm missing your point? You still want to enjoy doing
         | math yourself? Is that what you are saying? So you equate "Can
         | AI do math in my place?" with "Can AI drink and watch TV in my
         | place?"
        
           | bubble12345 wrote:
           | AI will not do math for us, but maybe eventually it will lead
           | to another mainstream tool for mathematicians. Along with R,
           | Matlab, Sage, GAP, Magma, ...
           | 
           | It would be interesting if in the future mathematicians are
           | just as fluent in some (possibly AI-powered) proof verifying
           | tool, as they are with LaTeX today.
        
             | bufferoverflow wrote:
             | AI can already do a bunch of math. So "AI will not do math
             | for us" is just factually wrong.
        
               | fooker wrote:
               | Your idea of 'do math' is a bit different from this
               | context.
               | 
               | Here it means do math research or better, find new math.
        
               | vlovich123 wrote:
               | Can AI solve "toy" math problems that computers have not
               | been able to do? Yes. Can AI produce novel math research?
               | No, it hasn't yet. So "AI will not do math for us" is
               | only factually wrong if you take the weaker definition of
               | "doing math for us". The stronger definition is not
               | factually wrong yet.
               | 
               | More problematic with that statement is that a timeline
               | isn't specified. 1 year? Probably not. 10 years?
               | Probably. 20 years? Very likely. 100 years? None of us
               | here will be alive to be proven wrong but I'll venture
               | that that's a certainty.
        
               | khafra wrote:
               | This is a pretty strong position to take in the comments
               | of a post where a mathematician declared the 5 problems
               | he'd seen to be PhD level, and speculated that the real
               | difficulty with switching from numerical answers to
               | proofs will be finding humans qualified to judge the AI's
               | answers.
               | 
               | I will agree that it's likely none of us here will be
               | alive to be proven wrong, but that's in the 1 to 10 year
               | range.
        
           | elbear wrote:
           | In a way, AI is part of the process, but it's a collaborative
           | process. It doesn't do all the work.
        
           | whimsicalism wrote:
            | Ingredients of a top HN comment on AI include a nominal
            | expert explaining why actually labor won't be replaced and
            | it will be a collaborative process so you don't need to
            | worry, sprinkled with a little bit of 'the status quo will
            | stay still even though this tech only appeared in the last 2
            | years'.
        
             | FiberBundle wrote:
             | It didn't appear in the last two years. We have had deep
             | learning based autoregressive language models (like
             | Word2Vec) for at least 10 years.
        
               | fosk wrote:
               | Early computer networks appeared in the 1960s and the
               | public internet as we know it in the 1990s.
               | 
               | We are still early in AI.
        
               | whimsicalism wrote:
               | totally, and i've been working with attention since at
               | least 2017. but i'm colloquially referring to the real
               | breakout and substantial scale up in resources being
               | thrown at it
        
         | amanda99 wrote:
         | > AI is often wrong, never knows when it's wrong, but people
         | are like this too.
         | 
         | When talking with various models of ChatGPT about research
         | math, my biggest gripe is that it's either confidently right
         | (10% of my work) or confidently wrong (90%). A human researcher
         | would be right 15% of the time, unsure 50% of the time, and
         | give helpful ideas that are right/helpful (25%) or wrong/a red
         | herring (10%). And only 5% of the time would a good researcher
         | be confidently wrong in a way that ChatGPT is often.
         | 
         | In other words, ChatGPT completely lacks the meta-layer of
         | "having a feeling/knowing how confident it is", which is so
         | useful in research.
        
           | portaouflop wrote:
            | A human researcher that is basically right 40%-95% of the
            | time would probably be an Einstein-level genius.
            | 
            | Just assume that the LLM is wrong and test its assumptions -
            | math is one of the few disciplines where you can do that
            | easily.
        
             | rednerrus wrote:
             | It's pretty easy to test when it makes coding mistakes as
             | well. It's also really good at "Hey that didn't work,
             | here's my error message."
        
             | amanda99 wrote:
             | I think you are imagining a different class of "questions".
             | 
             | To clarify, I was doing research on applied math. My field
             | is not analysis, but I needed to prove some bounds on
             | certain messed up expressions (involving special functions,
             | etc), and analyze an ODE that's not analytically solvable.
             | I used the COT model a fair bit.
             | 
             | I would ask ChatGPT for hints/ideas/direction in proving
             | various bounds, asking it for theorems or similar results
             | in literature. This is exactly the kind of thing where a
             | researcher would go "yeah this looks like X" or "I think I
             | saw something like this in (book/article name)", or just
             | know a method; or alternatively say they have no clue.
             | ChatGPT most often will confidently give me a "solution",
             | being right 10% of the time (when there's a pretty standard
             | way to do it that I didn't see/know).
             | 
             | On the whole it was quite useful.
        
           | eleveriven wrote:
           | Do you think there's potential for AI to develop a kind of
           | probabilistic reasoning?
        
             | Sparkyte wrote:
             | It think it is every sci-fiction dreamer to teach a robot
             | to love.
             | 
             | I don't think AI will think conventionally. It isn't
             | thinking to begin with. It is weighing options. Those
             | options permutate and that is why every response is
             | different.
        
           | halayli wrote:
           | these numbers are just your perception. The way you ask the
           | question will very much influence the output and certain
           | topics more than others. I get much better results when I
           | share my certainty levels in my questions and say things like
           | "if at all", "if any" etc.
        
             | mrbungie wrote:
             | Yeah, blame the users for "using it wrong" (phrase of the
             | week I would say after the o3 discussions), and then sell
             | the solution as almost-AGI.
             | 
             | PS: I'm starting to see a lot of plausible deniability in
             | some comments about LLMs capabilites. When LLMs do great =>
             | "cool, we are scaling AI". when LLMs do something wrong =>
             | "user problem", "skill issues", "don't judge a fish for its
             | ability to fly".
        
             | vector_spaces wrote:
             | I agree with this approach and use it myself, but these
             | confidence markers can also skew output in undesirable
             | ways. All of these heuristics are especially fragile when
             | the subject matter touches the frontiers of what is known.
             | 
             | In any case my best experiences with LLMs for pure math
             | research have been for exploring the problem space and
             | ideation -- queries along the line of "Here's a problem I'm
             | working on ... . Do any other fields have a version of this
             | problem, but framed differently?" or "Give me some totally
             | left field methods, even if they are from different fields
             | or unlikely to work. Assume I've exhausted all the
             | 'obvious' approaches from field X"
        
             | amanda99 wrote:
             | > these numbers are just your perception.
             | 
             | Of course they are, I hoped it was clear I was just sharing
             | my experience trying to use it for research!
             | 
             | I did in general word it as I would a question to a
             | researcher, which includes an uncertainty in it being true.
             | E.g. this is from a recent prompt: "is this true in
             | general, if not, what are the conditions for this to be
             | true?"
        
         | heresie-dabord wrote:
         | > Talking to fellow <humans> can be frustrating, and so is
         | talking with AI, but AI conversations go faster and can take
         | place in the middle of the night.
         | 
         | I made a slight change to generalise your statement, I think
         | you have summarised the actual marketing opportunity.
        
         | eleveriven wrote:
         | The analogy with self-driving cars is spot on
        
         | pontus wrote:
         | I agree. I think it comes down to the motivation behind why
         | one does mathematics (or any other field, for that matter). If
         | it's a means to an end, then sure, have the AI do the work and
         | get rid of the researchers. However, that's not why everyone
         | does math. For many it's more akin to why an artist paints.
         | People still paint today even though a camera can produce much
         | more realistic images. It was probably the case (I'm guessing!)
         | that there was a significant drop in jobs for artists-for-hire,
         | for whom painting was just a means to an end (e.g. creating a
         | portrait), but the artists who painted for the sake of art
         | survived, and the invention of the camera presumably made them
         | better by letting them see photos of other places they wanted
         | to paint and art from other artists.
        
         | goalieca wrote:
         | > People want self-driving cars so they can drink and watch TV.
         | We should crave tools that enhance our abilities, as tools have
         | done since prehistoric times.
         | 
         | Improved tooling and techniques have given humans the free time
         | and resources needed for arts, culture, philosophy, sports, and
         | spending time to enjoy life! Fancy telecom technologies have
         | allowed me to work from home and I love it :)
        
         | rfurmani wrote:
         | Absolutely agree. There are some interesting articles in a recent
         | [AMS Bulletin](https://www.ams.org/journals/bull/2024-61-02/hom
         | e.html?activ...) giving perspectives on this question: what
         | does it do to math if there's a strong theorem prover out
         | there, in what ways can AI help mathematicians, what is math
         | exactly?
         | 
         | I find that a lot of AI+Math work is focused on the end game
         | where you have a clear problem to solve, rather than the early
         | exploratory work where most of the time is spent. The challenge
         | is in making the right connections and analogies, discovering
         | hidden useful results, asking the right questions, translating
         | between fields.
         | 
         | I'm getting ready to launch [Sugaku](https://sugaku.net), where
         | I'm trying to build tools for the above, based on processing
         | the published math literature and training models on it. The
         | kind of search of MR that you mentioned doing is exactly what a
         | computer should do instead. I can create an account for you and
         | would love some feedback.
        
       | Onavo wrote:
       | Considering that they have Terence Tao himself working on the
       | problem, betting against it would be unwise.
        
       | Sparkyte wrote:
       | After playing with and using AI for almost two years now, I find
       | it is not getting better from both a cost and a performance
       | perspective.
       | 
       | The higher the cost, the better the performance. While models and
       | hardware can be improved, the curve is still steep.
       | 
       | The big question is: what are people using it for? Well, they are
       | using lightweight, simplistic models to do targeted tasks -- many
       | smaller, easier-to-process tasks.
       | 
       | Most of the news on AI is just there to promote a product and
       | earn more cash.
        
       | aomix wrote:
       | No comment on the article; it's just always interesting to get
       | hit with intense jargon from a field I know very little about.
       | 
       |  _I understood the statements of all five questions. I could do
       | the third one relatively quickly (I had seen the trick before
       | that the function mapping a natural n to alpha^n was p-adically
       | continuous in n iff the p-adic valuation of alpha-1 was
       | positive)_
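       | 
       | A rough LaTeX rendering of that last claim (assuming alpha is a
       | p-adic unit, which the quote leaves implicit, and sketching only
       | one direction of the "iff"):
       | 
       |     % Sketch only: the quoted continuity criterion, assuming
       |     % \alpha \in \mathbb{Z}_p^\times (a p-adic unit).
       |     \documentclass{article}
       |     \usepackage{amsmath,amssymb}
       |     \begin{document}
       |     Let $p$ be prime and $\alpha \in \mathbb{Z}_p^{\times}$.
       |     The map $n \mapsto \alpha^{n}$ on $\mathbb{N}$ is continuous
       |     for the $p$-adic topology on $n$ iff $v_{p}(\alpha - 1) > 0$.
       |     For one direction: if $v_{p}(\alpha - 1) > 0$, then
       |     \[
       |       \alpha^{n + p^{k}} - \alpha^{n}
       |         = \alpha^{n}\bigl(\alpha^{p^{k}} - 1\bigr),
       |       \qquad
       |       v_{p}\bigl(\alpha^{p^{k}} - 1\bigr) \ge k + v_{p}(\alpha - 1),
       |     \]
       |     so exponents that are $p$-adically close give values that
       |     are $p$-adically close.
       |     \end{document}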
        
       | YeGoblynQueenne wrote:
       | >> There were language models before ChatGPT, and on the whole
       | they couldn't even write coherent sentences and paragraphs.
       | ChatGPT was really the first public model which was coherent.
       | 
       | If that's referring to Large Language Models, meaning everything
       | after the first GPT and BERT, then that's absolutely not right.
       | The first LLM that demonstrated the ability to generate coherent,
       | fluently grammatical English was GPT-2. That story about the
       | unicorns -- that was the first time a statistical language model
       | was able to generate text that stayed on the subject over a long
       | distance _and_ made (some) sense.
       | 
       | GPT-2 was followed by GPT-3 and GPT-3.5, which turned the hype
       | dial up to 11 and were certainly "public", at least if that means
       | publicly available. They were coherent enough that many people
       | predicted all sorts of fancy things, like the end of programming
       | jobs and the end of journalist jobs and so on.
       | 
       | So that's a weird statement, and it kind of makes me wary of
       | Gell-Mann amnesia while reading the article.
        
       ___________________________________________________________________
       (page generated 2024-12-24 23:00 UTC)