[HN Gopher] Can AI do maths yet? Thoughts from a mathematician
___________________________________________________________________
Can AI do maths yet? Thoughts from a mathematician
Author : mathgenius
Score : 231 points
Date : 2024-12-23 10:50 UTC (12 hours ago)
(HTM) web link (xenaproject.wordpress.com)
(TXT) w3m dump (xenaproject.wordpress.com)
| noFaceDiscoG668 wrote:
| "once" the training data can do it, LLMs will be able to do it.
| and AI will be able to do math once it comes to check out the
| lights of our day and night. until then it'll probably wonder
| continuously and contiguously: "wtf! permanence! why?! how?! by
| my guts, it actually fucking works! why?! how?!"
| tossandthrow wrote:
| I do think it is time to start questioning whether the utility
| of AI can be reduced solely to the quality of the training
| data.
|
| This might be a dogma that needs to die.
| noFaceDiscoG668 wrote:
| I tried. I don't have the time to formulate and scrutinise
| adequate arguments, though.
|
| Do you? Anything anywhere you could point me to?
|
| The algorithms live entirely off the training data. They
| consistently fail to "abduct" (infer) beyond any
| information specific to the language in/of the training.
| jstanley wrote:
| The best way to predict the next word is to accurately
| model the underlying system that is being described.
| tossandthrow wrote:
| It is a gradual thing. Presumably the models are inferring
| things at runtime that were not a part of their training
| data.
|
| Anyhow, philosophically speaking you are also only exposed
| to what your senses pick up, but presumably you are able to
| infer things?
|
| As written: this is a dogma that stems from a limited
| understanding of what algorithmic processes are and the
| insistence that emergence can not happen from algorithmic
| systems.
| croes wrote:
| If not, bad training data shouldn't be a problem
| kergonath wrote:
| There can be more than one problem. The history of
| computing (or even just the history of AI) is full of
| things that worked better and better right until they hit a
| wall. We get diminishing returns adding more and more
| training data. It's really not hard to imagine a series of
| breakthroughs bringing us way ahead of LLMs.
| Flenkno wrote:
| AWS announced, 2 or 3 weeks ago, a way of formulating rules in
| a formal language.
|
| AI doesn't need to learn everything; our LLM models already
| contain EVERYTHING, including ways of finding a solution
| step by step.
|
| Which means you can tell an LLM to translate whatever you
| want into a logical language and use an external logic
| verifier. The only thing an LLM or AI needs to 'understand' at
| this point is how to make sure that the statistical quality of
| the translation from one side to the other is high enough.
|
| Your brain doesn't just do logic out of the box either: you
| conclude things and then formulate them.
|
| And plenty of companies work on this. It's the same with
| programming: if you are able to write code and execute it, you
| iterate until the compiler errors are gone. Now your LLM can
| write valid code out of the box. Let the LLM write unit tests,
| and now it can verify itself.
|
| Claude, for example, offers out of the box to write a
| validation script. You can then give Claude back the output of
| the script it suggested.
|
| Don't underestimate LLMs.
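|
| The general shape of that translate-and-verify loop, as a
| minimal sketch (Z3 is my stand-in for the external verifier;
| the AWS service isn't named here and presumably differs):
|
|     from z3 import Solver, Int, sat  # pip install z3-solver
|
|     # hypothetical "translation" an LLM might emit for:
|     # "is there an integer x with x > 2 and x*x = x + 6?"
|     x = Int("x")
|     s = Solver()
|     s.add(x > 2, x * x == x + 6)
|
|     if s.check() == sat:
|         print("verified:", s.model())  # x = 3
|     else:
|         print("no such x")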
| TeamDman wrote:
| Is this the AWS thing you referenced?
| https://aws.amazon.com/what-is/automated-reasoning/
| casenmgreen wrote:
| I may be wrong, but I think it a silly question. AI is basically
| auto-complete. It can do math to the extent you can find a
| solution via auto-complete based on an existing corpus of text.
| Bootvis wrote:
| You're underestimating the emergent behaviour of these LLMs.
| See for example what Terence Tao thinks about o1:
|
| https://mathstodon.xyz/@tao/113132502735585408
| WhyOhWhyQ wrote:
| I'm always just so pleased that the most famous mathematician
| alive today is also an extremely kind human being. That has
| often not been the case.
| roflc0ptic wrote:
| Pretty sure this is out of date now
| noFaceDiscoG668 wrote:
| [flagged]
| kergonath wrote:
| Why would others provide proofs when you are yourself
| posting groundless opinions as facts in this very thread?
| mdp2021 wrote:
| > _AI is basically_
|
| Very many things, conventionally so labelled since the '50s.
|
| You are speaking of LLMs.
| casenmgreen wrote:
| Yes - I mean only to say "AI" as the term is commonly used
| today.
| esafak wrote:
| Humans can autocomplete sentences too because we understand
| what's going on. Prediction is a necessary criterion for
| intelligence, not an irrelevant one.
| aithrowawaycomm wrote:
| I am fairly optimistic about LLMs as a human math -> theorem-
| prover translator, and as a fan of Idris I am glad that the AI
| community is investing in Lean. As the author shows, the answer
| to "Can AI be useful for automated mathematical work?" is clearly
| "yes."
|
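| A toy example of the human-maths-to-prover translation I mean,
| in Lean 4 (the statement and names are mine; the proof simply
| defers to the library lemma):
|
|     -- "addition of natural numbers is commutative"
|     theorem my_add_comm (a b : Nat) : a + b = b + a :=
|       Nat.add_comm a b
|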
| But I am confident the answer to the question in the headline is
| "no, not for several decades." It's not just the underwhelming
| benchmark results discussed in the post, or the general concern
| about hard undergraduate math using different skillsets than
| ordinary research math. IMO the deeper problem still seems to be
| a basic gap where LLMs can seemingly do formal math at the level
| of a smart graduate student but fail at quantitative/geometric
| reasoning problems designed for fish. I suspect this holds for
| O3, based on one of the ARC problems it wasn't able to solve:
| https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_pr...
| (via https://www.interconnects.ai/p/openais-o3-the-2024-finale-
| of...) ANNs are simply not able to form abstractions; they can
| only imitate them via enormous amounts of data and compute. I
| would say there has been _zero_ progress on "common sense" math
| in computers since the invention of Lisp: we are still faking it
| with expert systems, even if LLM expert systems are easier to
| build at scale with raw data.
|
| It is the same old problem where an ANN can attain superhuman
| performance on level 1 of Breakout, but it has to be retrained
| for level 2. I am not convinced it makes sense to say AI can do
| math if AI doesn't understand what "four" means with the same
| depth as a rat, even if it can solve sophisticated modular
| arithmetic problems. In human terms, does it make sense to say a
| straightedge-and-compass AI understands Euclidean geometry if
| it's not capable of understanding the physical intuition behind
| Euclid's axioms? It makes more sense to say it's a brainless tool
| that helps with the tedium and drudgery of actually proving
| things in mathematics.
| asddubs wrote:
| It can take my math, point out a step I missed, and then show
| me the correct procedure, but still get the wrong result because
| it can't reliably multiply 2-digit numbers.
| fifilura wrote:
| Better than an average human then.
| actionfromafar wrote:
| Different than an average human.
| watt wrote:
| it's a "language" model (LLM), not a "math" model. when it is
| generating your answer, predicting and outputing a word after
| word it is _not_ multiplying your numbers internally.
| QuadmasterXLII wrote:
| To give a sense of scale: it's not that o3 failed to solve that
| red-blue rectangle problem once; o3 spent thousands of GPU
| hours putting out text about that problem, creating by my math
| about a million pages of text, and did not find the answer
| anywhere in those pages. For other problems it did find the
| answer around the million-page mark, as the score was still
| slowly creeping up at the ~$3,000-per-problem spend setting.
| josh-sematic wrote:
| If the trajectory of the past two years is any guide, things
| that can be done at great compute expense now will rapidly
| become possible for a fraction of the cost.
| asadotzler wrote:
| The trajectory is not a guide, unless you count the recent
| plateauing.
| aithrowawaycomm wrote:
| Just a comment: the example o3 got wrong was actually
| underspecified: https://anokas.substack.com/p/o3-and-arc-agi-
| the-unsolved-ta...
|
| Which is actually a problem I have with ARC (and IQ tests more
| generally): it is computationally cheaper to go from ARC
| transformation rule -> ARC problem than it is the other way
| around. But this means it's pretty easy to generate ARC
| problems with non-unique solutions.
| est wrote:
| At this stage I assume everything having a sequential pattern
| can and will be automated by LLM AIs.
| Someone wrote:
| I think that's provably incorrect for the current approach to
| LLMs. They all have a horizon over which they correlate tokens
| in the input stream.
|
| So, for any LLM, if you intersperse more than that number of
| 'X' tokens between each useful token, they won't be able to do
| anything resembling intelligence.
|
| The current LLMs are a bit like n-gram databases that do not
| use letters, but larger units.
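|
| A minimal sketch of that analogy (a toy bigram "model" that can
| only reproduce follower counts it has literally seen):
|
|     from collections import Counter, defaultdict
|
|     corpus = "the cat sat on the mat the cat ran".split()
|     counts = defaultdict(Counter)
|     for a, b in zip(corpus, corpus[1:]):
|         counts[a][b] += 1  # tally observed next-tokens
|
|     def predict(token: str) -> str:
|         # most frequent follower seen in training; nothing more
|         return counts[token].most_common(1)[0][0]
|
|     print(predict("the"))  # -> 'cat'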
| red75prime wrote:
| The follow-up question is "Does it require a paradigm shift
| to solve it?". And the answer could be "No". Episodic memory,
| hierarchical learnable tokenization, online learning or
| whatever works well on GPUs.
| beng-nl wrote:
| Isn't that a bit of an unfair sabotage?
|
| Naturally, humans couldn't do it either, though they could edit
| the input to remove the X's; but shouldn't we evaluate the
| ability (even intelligent ability) of LLMs on what they can
| generally do rather than amplify their weaknesses?
| Someone wrote:
| Why is that unfair in reply to the claim _"At this stage I
| assume everything having a sequencial pattern can and will
| be automated by LLM AIs."_?
|
| I am not claiming LLMs aren't or cannot be intelligent, not
| even that they cannot do magical things; I just rebuked a
| statement about the lack of limits of LLMs.
|
| > Naturally, humans couldn't do it, even though they could
| edit the input to remove the X's
|
| So, what are you claiming: that they cannot or that they
| can? I think most people can and many would. Confronted
| with a file containing millions of X's, many humans will
| wonder whether there's something else than X's in the file,
| do a 'replace all', discover the question hidden in that
| sea of X's, and answer it.
|
| There even are simple files where most humans would easily
| spot things without having to think of removing those X's.
| Consider a file
|
|     How X X X X X X
|     many X X X X X X
|     days X X X X X X
|     are X X X X X X
|     there X X X X X X
|     in X X X X X X
|     a X X X X X X
|     week? X X X X X X
|
| with a million X's on the end of each line. Spotting the
| question in that is easy for humans, but impossible for the
| current bunch of LLMs
| int_19h wrote:
| If you have a million Xs on the end of each line, when a
| human is looking at that file, he's not looking at the
| entirety of it, but only at the part that is actually
| visible on-screen, so the equivalent task for an LLM
| would be to feed it the same subset as input. In which
| case they can all answer this question just fine.
| palata wrote:
| At this stage I _hope_ everything that needs to be reliable
| won't be automated by LLM AIs.
| ned99 wrote:
| I think this is a silly question; you could find AIs doing very
| simple maths back in the 1960s-70s.
| mdp2021 wrote:
| It's just the worrisome linguistic confusion between AI and
| LLMs.
| jampekka wrote:
| I just spent a few days trying to figure out some linear algebra
| with the help of ChatGPT. It's very useful for finding conceptual
| information from literature (which for a not-professional-
| mathematician at least can be really hard to find and decipher).
| But in the actual math it constantly makes very silly errors.
| E.g. indexing a vector beyond its dimension, trying to do matrix
| decomposition for scalars and insisting on multiplying matrices
| with mismatching dimensions.
|
| o1 is a lot better at spotting its errors than 4o, but it too
| still makes a lot of really stupid mistakes. It seems to be quite
| far from consistently producing results itself without at least a
| somewhat clueful human doing the hand-holding.
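|
| A trivial illustration of the last kind of error (my
| reconstruction, not an actual transcript):
|
|     import numpy as np
|
|     A = np.ones((3, 2))
|     B = np.ones((3, 2))
|     try:
|         A @ B  # (3, 2) @ (3, 2): inner dimensions 2 != 3
|     except ValueError as e:
|         print("invalid:", e)
|     print(A @ B.T)  # (3, 2) @ (2, 3) is the valid product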
| glimshe wrote:
| Isn't Wolfram Alpha a better "ChatGPT of Math"?
| Filligree wrote:
| Wolfram Alpha is better at actually doing math, but far worse
| at explaining what it's doing, and why.
| dartos wrote:
| What's worse about it?
|
| It never tells you the wrong thing, at the very least.
| fn-mote wrote:
| Its understanding of problems was very bad last time I
| used it. Meaning it was difficult to communicate what you
| wanted it to do. Usually I try to write in the
| Mathematica language, but even that is not foolproof.
|
| Hopefully they have incorporated a more modern LLM since
| then, but it hasn't been that long.
| jampekka wrote:
| Wolfram Alpha's "smartness" is often Clippy-level
| enraging. E.g. it makes assumptions about symbols based on
| their names (e.g. a is assumed to be a constant, and
| derivatives are taken w.r.t. x). Even with Mathematica
| syntax it tends to make such assumptions and refuses to
| lift them even when explicitly directed. Quite often one
| has to change the variable symbols used to try to make
| Alpha do what's meant.
| jvanderbot wrote:
| When you give it a large math problem and the answer is
| "seven point one three five ...", and it shows a plot of
| the result vs some randomly selected domain, well, there
| could be more I'd like to know.
|
| You can unlock a full derivation of the solution for
| cases where you say "Solve" or "Simplify", but what I
| (and I suspect GP) might want is to know why a few of
| the key steps might work.
|
| It's a fantastic tool that helped get me through my
| (engineering) grad work, but ultimately the breakthrough
| inequalities that helped me write some of my best stuff
| came out of a book I bought in desperation, one that
| basically cataloged known linear algebra inequalities and
| simplifications.
|
| When I try that kind of thing with the best LLM I can use
| (admittedly as of a few months ago), the results can get
| incorrect pretty quickly.
| amelius wrote:
| I wish there were a way to tell ChatGPT where it has made a
| mistake, with a single mouse click.
| a3w wrote:
| Is the explanation a pro feature? At the very end it says
| "step by step? Pay here"
| jampekka wrote:
| Wolfram Alpha is mostly for "trivia" type problems. Or giving
| solutions to equations.
|
| I was figuring out some mode decomposition methods such as
| ESPRIT and Prony and how to potentially extend/customize
| them. Wolfram Alpha doesn't seem to have a clue about these.
| lupire wrote:
| No. Wolfram Alpha can't solve anything that isn't a function
| evaluation or equation. And it can't do modular arithmetic to
| save its unlife.
|
| WolframOne/Mathematica is better, but that requires the user
| (or ChatGPT!) to write complicated code, not natural language
| queries.
| GuB-42 wrote:
| Wolfram Alpha can solve equations well, but it is terrible at
| understanding natural language.
|
| For example I asked Wolfram Alpha "How heavy a rocket has to
| be to launch 5 tons to LEO with a specific impulse of 400s",
| which is a straightforward application of the Tsiolkovsky
| rocket equation. Wolfram Alpha gave me some nonsense about
| particle physics (result: 95 MeV/c^2), GPT-4o did it right
| (result: 53.45 tons).
|
| Wolfram Alpha knows about the Tsiolkovsky rocket equation and
| it knows about LEO (low Earth orbit), but I found no way to
| get a delta-v out of it; again, more nonsense. It tells me
| about Delta airlines and mentions satellites that it knows are
| not in LEO. The "natural language" part is a joke. It is more
| like an advanced calculator, and for that, it is great.
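|
| For reference, the calculation (a sketch: the ~9.3 km/s
| delta-v to LEO is my assumption, and it idealizes the 5 t
| payload as the rocket's entire final mass):
|
|     import math
|
|     isp = 400.0    # s, specific impulse
|     g0 = 9.81      # m/s^2, standard gravity
|     dv = 9300.0    # m/s, assumed delta-v to LEO incl. losses
|     payload = 5.0  # tons, treated as the final (dry) mass
|
|     # Tsiolkovsky: m0 / mf = exp(dv / (isp * g0))
|     mass_ratio = math.exp(dv / (isp * g0))
|     print(payload * mass_ratio)  # ~53.5 t, near the 53.45 quoted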
| bongodongobob wrote:
| You're using it wrong: you can use natural language in your
| equation, but AFAIK it's not supposed to be able to do what
| you're asking of it.
| CamperBob2 wrote:
| You know, "You're using it wrong" is usually meant to
| carry an ironic or sarcastic tone, right?
|
| It dates back to Steve Jobs blaming an iPhone 4 user for
| "holding it wrong" rather than acknowledging a flawed
| antenna design that was causing dropped calls. The
| closest Apple ever came to admitting that it was their
| problem was when they subsequently ran an employment ad
| to hire a new antenna engineering lead. Maybe it's time
| for Wolfram to hire a new language-model lead.
| bongodongobob wrote:
| It's not an LLM. You're simply asking too much of it. It
| doesn't work the way you want it to, sorry.
| spacemanspiff01 wrote:
| I wonder if these are tokenization issues? I really am curious
| about Meta's byte tokenization scheme...
| jampekka wrote:
| Probably mostly not. The errors tend to be
| logical/conceptual. E.g. mixing up scalars and matrices is
| unlikely to be from tokenization. Especially if using spaces
| between the variables and operators, as AFAIK GPTs don't form
| tokens over spaces (although tokens may start or end with
| them).
| lordnacho wrote:
| The only thing I've consistently had issues with while using AI
| is graphs. If I ask it to plot some simple function, it produces
| a really weird image that has nothing to do with the graph I
| want. It will be a weird swirl of lines and words, and it never
| corrects itself no matter what I say to it.
|
| Has anyone had any luck with this? It seems like the only thing
| that it just can't do.
| KeplerBoy wrote:
| You're doing it wrong. It can't produce proper graphs with
| its diffusion-style image generation.
|
| Ask it to produce graphs with Python and matplotlib. That
| will work.
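|
| E.g. the kind of code to ask it for (a minimal sketch):
|
|     import numpy as np
|     import matplotlib.pyplot as plt
|
|     x = np.linspace(-5, 5, 200)
|     plt.plot(x, np.sin(x))    # any simple function
|     plt.title("y = sin(x)")
|     plt.savefig("graph.png")  # or plt.show()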
| thomashop wrote:
| Ask it to plot the graph with python plotting utilities. Not
| using its image generator. I think you need a ChatGPT
| subscription though for it to be able to run python code.
| lupire wrote:
| You seem to get 2(?) free Python program runs per week(?)
| as part of the o1 preview.
|
| When you visit ChatGPT on the free account it automatically
| gives you the best model and then disables it after some
| amount of work and says to come back later or upgrade.
| amelius wrote:
| Just install Python locally, and copy paste the code.
| xienze wrote:
| Shouldn't ChatGPT be smart enough to know to do this
| automatically, based on context?
| CamperBob2 wrote:
| It was, for a while. I think this is an area where there
| may have been some regression. It can still write code to
| solve problems that are a poor fit for the language
| model, but you may need to ask it to do that explicitly.
| HDThoreaun wrote:
| The agentic reasoning models should be able to fix this if
| they have the ability to run code instead of handling each task
| themselves. "I need to make a graph" -> "LLMs have difficulty
| graphing novel functions" -> "Call python instead" is a line of
| reasoning I would expect after seeing what o1 has come up
| with on other problems.
|
| Giving AI the ability to execute code is the safety people's
| nightmare though; I wonder if we'll hear anything from them, as
| this is surely coming.
| amelius wrote:
| Don't most mathematical papers contain at least one such error?
| aiono wrote:
| Where is this data from?
| amelius wrote:
| It's a question; and to be fair to the AI, it should refer to
| papers before review.
| lproven wrote:
| Betteridge's Law applies.
| LittleTimothy wrote:
| It's fascinating that this has run into the exact same problem as
| the quantum research. I.e., in quantum research, to demonstrate
| any valuable forward progress you must compute something that is
| impossible to do with a traditional computer. If you can't do it
| with a traditional computer, it suddenly becomes difficult to
| verify correctness (i.e., you can't just check that it matches
| the traditional computer's answer).
|
| In the same way, ChatGPT scores 25% on this, and the question is
| "How close were those 25% to questions in the training set?". Or,
| to put it another way, we want to answer the question "Is ChatGPT
| getting better at applying its reasoning to out-of-set problems,
| or is it pulling more data into its training set?". Or "Is the
| test leaking into the training?".
|
| Maybe the whole question is academic and it doesn't matter, we
| solve the entire problem by pulling all human knowledge into the
| training set and that's a massive benefit. But maybe it implies a
| limit to how far it can push human knowledge forward.
| lazide wrote:
| If constrained by existing human knowledge to come up with an
| answer, won't it fundamentally be unable to push human
| knowledge forward?
| actionfromafar wrote:
| Then much of human research and development is also
| fundamentally impossible.
| AnerealDew wrote:
| Only if you think current "AI" is on the same level as
| human creativity and intelligence, which it clearly is not.
| actionfromafar wrote:
| I think current "AI" (i.e. LLMs) is unable to push human
| knowledge forward, but not because it's constrained by
| existing human knowledge. It's more like peeking into a
| very large magic-8 ball, new answers every time you shake
| it. Some useful.
| SJC_Hacker wrote:
| It may be able to push human knowledge forward to an
| extent.
|
| In the past, there was quite a bit of low hanging fruit
| such that you could have polymaths able to contribute to
| a wide variety of fields, such as Newton.
|
| But in the past 100 years or so, the problem is there is
| so much known, it is impossible for any single person to
| have deep knowledge of everything. E.g. it's rare to find
| a really good mathematician who also has a deep knowledge
| (beyond intro courses) about say, chemistry.
|
| Would a sufficiently powerful AI / ML model be able to
| come up with this synthesis across fields?
| lupire wrote:
| That's not a strong reason. Yes, that means ChatGPT isn't
| good at wholly independently pushing knowledge forward,
| but a good brainstormer that is right even 10% of the
| time is an incredible fount of knowledge.
| Havoc wrote:
| I don't think many expect AI to push knowledge forward? A
| thing that basically just regurgitates consensus historic
| knowledge seems badly suited to that
| calmoo wrote:
| But apparently these new frontier models can 'reason' - so
| with that logic, they should be able to generate new
| knowledge?
| tomjen3 wrote:
| O1 was able to find the math problem in a recently
| published paper, so yes.
| LittleTimothy wrote:
| Depends on your understanding of human knowledge I guess?
| People talk about the frontier of human knowledge and if your
| view of knowledge is like that of a unique human genius
| pushing forward the frontier then yes - it'd be stuck. But if
| you think of knowledge as more complex than that you could
| have areas that are kind of within our frontier of knowledge
| (that we could reasonably know, but don't actually know) -
| taking concepts that we already know in one field and
| applying them to some other field. Today the reason that
| doesn't happen is because genius A in physics doesn't know
| about the existence of genius B in mathematics (let alone
| understand their research), but if it's all imbibed by "The
| Model" then it's trivial to make that discovery.
| lazide wrote:
| I was referring specifically to the parent comments
| statements around current AI systems.
| wongarsu wrote:
| Reasoning is essentially the creation of new knowledge from
| existing knowledge. The better the model can reason the less
| constrained it is to existing knowledge.
|
| The challenge is how to figure out if a model is genuinely
| reasoning
| lupire wrote:
| Reasoning is a very minor (but essential) part of knowledge
| creation.
|
| Knowledge creation comes from collecting data from the real
| world, and cleaning it up somehow, and brainstorming
| creative models to explain it.
|
| NN/LLM's version of model building is frustrating because
| it is quite good, but not highly "explainable". Human
| models have higher explainability, while machine models
| have high predictive value on test examples due to an
| impenetrable mountain of algebra.
| dinosaurdynasty wrote:
| There are likely lots of connections that could be made that
| no individual has made because no individual has _all of
| existing human knowledge_ at their immediate disposal.
| eagerpace wrote:
| How much of this could be resolved if its training set were
| reduced? Conceivably, most of the training serves only to
| confuse the model when the only aim is to solve a math equation.
| newpavlov wrote:
| >in the quantum research to demonstrate any valuable forward
| progress you must compute something that is impossible to do
| with a traditional computer
|
| This is factually wrong. The most interesting problems
| motivating the quantum computing research are hard to solve,
| but easy to verify on classical computers. The factorization
| problem is the classic example.
|
| The problem is that existing quantum computers are not powerful
| enough to solve the interesting problems, so researchers have
| to invent semi-artificial problems to demonstrate "quantum
| advantage" to keep the funding flowing.
|
| There is a plethora of opportunities for LLMs to show their
| worth. For example, finding interesting links between different
| areas of research or being a proof assistant in a
| math/programming formal verification system. There is a lot of
| ongoing work in this area, but at the moment signal-to-noise
| ratio of such tools is too low for them to be practical.
| aleph_minus_one wrote:
| > This is factually wrong. The most interesting problems
| motivating the quantum computing research are hard to solve,
| but easy to verify on classical computers.
|
| Your parent did not talk about quantum _computers_. I guess he
| rather had predictions of novel quantum-field theories or
| theories of quantum gravity in the back of his mind.
| newpavlov wrote:
| Then his comment makes even less sense.
| bondarchuk wrote:
| No, it is factually right, at least if Scott Aaronson is to
| be believed:
|
| > _Having said that, the biggest caveat to the "10^25 years"
| result is one to which I fear Google drew insufficient
| attention. Namely, for the exact same reason why (as far as
| anyone knows) this quantum computation would take ~10^25
| years for a classical computer to simulate, it would also
| take ~10^25 years for a classical computer to directly verify
| the quantum computer's results!! (For example, by computing
| the "Linear Cross-Entropy" score of the outputs.) For this
| reason, all validation of Google's new supremacy experiment
| is indirect, based on extrapolations from smaller circuits,
| ones for which a classical computer can feasibly check the
| results. To be clear, I personally see no reason to doubt
| those extrapolations. But for anyone who wonders why I've
| been obsessing for years about the need to design efficiently
| verifiable near-term quantum supremacy experiments: well,
| this is why! We're now deeply into the unverifiable regime
| that I warned about._
|
| https://scottaaronson.blog/?p=8525
| newpavlov wrote:
| It's a property of the "semi-artificial" problem chosen by
| Google. If anything, it means that we should heavily
| discount this claim of "quantum advantage", especially in
| the light of inherent probabilistic nature of quantum
| computations.
|
| Note that the OP wrote "you MUST compute something that is
| impossible to do with a traditional computer". I
| demonstrated a simple counter-example to this statement:
| you CAN demonstrate forward progress by factorizing big
| numbers, but the problem is that no one can do it despite
| billions of investments.
| bondarchuk wrote:
| Apparently they can't, right now, as you admit. Anyway
| this is turning into a stupid semantic argument, have a
| nice day.
| joshuaissac wrote:
| If they can't, then is it really quantum supremacy?
|
| They claimed it last time in 2019 with Sycamore, which
| could perform in 200 seconds a calculation that Google
| claimed would take a classical supercomputer 10,000
| years.
|
| That was debunked when a team of scientists replicated
| the same thing on an ordinary computer in 15 hours with a
| large number of GPUs. Scott Aaronson said that on a
| supercomputer, the same technique would have solved the
| problem in seconds.[1]
|
| So if they now come up with another problem which they
| say cannot even be verified by a classical computer and
| uses it to claim quantum advantage, then it is right to
| be suspicious of that claim.
|
| 1. https://www.science.org/content/article/ordinary-
| computers-c...
| noqc wrote:
| the unverifiable regime is a _great_ way to extract
| funding.
| derangedHorse wrote:
| > This is factually wrong.
|
| What's factually wrong about it? OP said "you must compute
| something that is impossible to do with a traditional
| computer" which is true, regardless of the output produced.
| Verifying an output is very different from verifying the
| proper execution of a program. The difference between testing
| a program and seeing its code.
|
| What is being computed is fundamentally different from what
| classical computers compute; therefore the methods of verifying
| proper adherence to instructions become increasingly
| complex.
| ajmurmann wrote:
| They left out the key part which was incorrect and the
| sentence right after "If you can't do it with a traditional
| computer, it suddenly becomes difficult to verify
| correctness"
|
| The point stands that for actually interesting problems,
| verifying correctness of the results is trivial. I don't
| know if "adherence to instructions" translates at all to
| quantum computing.
| 0xfffafaCrash wrote:
| I agree that "is the test dataset leaking into the training
| dataset" is an issue when interpreting LLM capabilities in
| novel contexts, but I'm not sure I follow what you mean on the
| quantum computing front.
|
| My understanding is that many problems have solutions that are
| easier to verify than to solve using classical computing. e.g.
| prime factorization
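|
| (Toy numbers to illustrate the asymmetry: checking a claimed
| factorization is one multiplication, while finding it is
| believed to take superpolynomial time classically.)
|
|     n = 2021
|     p, q = 43, 47      # the claimed solution
|     assert p * q == n  # verifying it is trivial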
| LittleTimothy wrote:
| Oh it's a totally different issue on the quantum side that
| leads to the same issue with difficulty verifying. There, the
| algorithms that Google for example is using today, aren't
| like prime factorization, they're not easy to directly verify
| with traditional computers, so as far as I'm aware they kind
| of check the result for a suitably small run, and then do the
| performance metrics on a large run that they _hope_ gave a
| correct answer but aren't able to directly verify.
| intellix wrote:
| I haven't checked in a while, but last I checked, ChatGPT
| struggled on very basic things like: how many Fs are in this
| word? Not sure if they've managed to fix that, but since then I
| had lost hope in getting it to do any sort of math.
| sylware wrote:
| How to train an AI strapped to a formal solver.
| puttycat wrote:
| No: https://github.com/0xnurl/gpts-cant-count
| sebzim4500 wrote:
| I can't reliably multiply four digit numbers in my head either,
| what's your point?
| reshlo wrote:
| Nobody said you have to do it in your head.
| sebzim4500 wrote:
| That's equivalent to what we are asking the model to
| do. If you give the model a calculator it will get 100%. If
| you give it a pen and paper (e.g. let it show its working),
| then it will get near 100%.
| reshlo wrote:
| Citation needed.
| rishicomplex wrote:
| Who is the author?
| williamstein wrote:
| Kevin Buzzard
| nebulous1 wrote:
| There was a little more information in that reddit thread. Of the
| three difficulty tiers, 25% are T1 (easiest) and 50% are T2. Of
| the five public problems that the author looked at, two were T1
| and two were T2. Glazer on reddit described T1 as
| "IMO/undergraduate problems", but the article author says that
| they don't consider them to be undergraduate problems. So the LLM
| is _already_ doing what the author says they would be surprised
| about.
|
| Also Glazer seemed to regret calling T1 "IMO/undergraduate", and
| not only because of the disparity between IMO and typical
| undergraduate. He said that "We bump problems down a tier if we
| feel the difficulty comes too heavily from applying a major
| result, even in an advanced field, as a black box, since that
| makes a problem vulnerable to naive attacks from models"
|
| Also, all of the problems shown to Tao were T3.
| riku_iki wrote:
| > So the LLM is already doing what the author says they would
| be surprised about.
|
| that's if you unconditionally believe the result without any
| proofreading, confirmation, or reproducibility, and with barely
| any details (we are given only one slide).
| joe_the_user wrote:
| The reddit thread is ... interesting (direct link[1]). It seems
| to be a debate among mathematicians, some of whom do have access
| to the secret set. But they're debating publicly, and so
| naturally avoid any concrete examples that would give the
| set away, winding up with fuzzy-fiddly language for the
| qualities of the problem tiers.
|
| The "reality" of keeping this stuff secret 'cause someone would
| train on it is itself bizarre and certainly shouldn't be above
| questioning.
|
| https://www.reddit.com/r/OpenAI/comments/1hiq4yv/comment/m30...
| obastani wrote:
| It's not about training directly on the test set, it's about
| people discussing questions in the test set online (e.g., in
| forums), and then this data is swept up into the training
| set. That's what makes test set contamination so difficult to
| avoid.
| joe_the_user wrote:
| Yes,
|
| That is the "reality" - that because companies can train
| their models on the whole Internet, companies will train
| their (base) models on the entire Internet.
|
| And in this situation, "having heard the problem" actually
| serves as a barrier to understanding these harder
| problems, since any variation of a known problem will receive
| a standard "half-assed guesstimate".
|
| And these companies "can't not" use these base models since
| they're resigned to the "bitter lesson" (better the "bitter
| lesson viewpoint" imo) that they need large scale
| heuristics for the start of their process and only then can
| they start symbolic/reasoning manipulations.
|
| But hold up! Why couldn't an organization freeze their
| training set and their problems and release both to the
| public? That would give us an idea where the research
| stands. Ah, the answer comes out, 'cause they don't own the
| training set and the result they want to train is a
| commercial product that needs every drop of data to be the
| best. As Yann LeCun has said, _this isn't research, this is
| product development_.
| zifpanachr23 wrote:
| Not having access to the dataset really makes the whole thing
| seem incredibly shady. Totally valid questions you are
| raising
| seafoamteal wrote:
| I don't have much to opine from an advanced maths perspective,
| but I'd like to point out a couple examples of where ChatGPT made
| basic errors in questions I asked it as an undergrad CS student.
|
| 1. I asked it to show me the derivation of a formula for the
| efficiency of Stop-and-Wait ARQ and it seemed to do it, but a day
| later, I realised that in one of the steps, it just made a term
| vanish to get to the next step. Obviously, I should have verified
| more carefully, but when I asked it to spot the mistake in that
| step, it did the same thing twice more with bs explanations of
| how the term is absorbed.
|
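| (For reference, the standard result the derivation should land
| on, with T_t the frame transmission time and T_p the one-way
| propagation delay:
|
|     U = T_t / (T_t + 2*T_p) = 1 / (1 + 2a),  where a = T_p / T_t
|
| so a silently vanishing term is easy to catch against this.)
|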
| 2. I asked it to provide me syllogisms that I could practice
| proving. An overwhelming number of the syllogisms it gave me were
| inconsistent and did not hold. This surprised me more because
| syllogisms are about the most structured arguments you can find,
| having been formalized centuries ago and discussed extensively
| since then. In this case, asking it to walk step-by-step actually
| fixed the issue.
|
| Both of these were done on the free plan of ChatGPT, but I can't
| remember if it was 4o or 4.
| voiper1 wrote:
| The first question is always: which model? Which fortunately
| you at least addressed: >free plan of ChatGPT, but I can't
| remember if it was 4o or 4.
|
| Since chatgpt-4o, there has been o1-preview, and o1 (full) is
| out. They just announced o3 got 25% on FrontierMath, which is
| what this article is a reaction to. So, any tests on 4o are at
| least two (or three) AI releases behind the new capabilities.
| Xcelerate wrote:
| So here's what I'm perplexed about. There are statements in
| Presburger arithmetic that take time doubly exponential (or
| worse) in the size of the statement to reach via _any path_ of
| the formal system whatsoever. These are arithmetic truths about
| the natural numbers. Can these statements be reached faster in
| ZFC? Possibly--it's well-known that there exist shorter proofs
| of true statements in more powerful consistent systems.
|
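| (For reference, the bound I have in mind is the Fischer-Rabin
| result: there is a constant c > 0 such that, for statements of
| length n, worst-case decision time, and hence shortest-proof
| length in any reasonable proof system, grows at least like
|
|     2^(2^(c*n))
|
| which is where "doubly exponential (or worse)" comes from.)
|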
| But the problem then is that one can suppose there are also true
| short statements in ZFC which likewise require doubly exponential
| time to reach via any path. Presburger Arithmetic is decidable
| whereas ZFC is not, so these statements would require the
| additional axioms of ZFC for shorter proofs, but I think it's
| safe to assume such statements exist.
|
| Now let's suppose an AI model can resolve the truth of these
| short statements quickly. That means one of three things:
|
| 1) The AI model can discover doubly exponential length proof
| paths within the framework of ZFC.
|
| 2) There are certain short statements in the formal language of
| ZFC that the AI model cannot discover the truth of.
|
| 3) The AI model operates outside of ZFC to find the truth of
| statements in the framework of some other, potentially unknown
| formal system (and for arithmetical statements, the system must
| necessarily be sound).
|
| How likely are each of these outcomes?
|
| 1) is not possible within any coherent, human-scale timeframe.
|
| 2) IMO is the most likely outcome, but then this means there are
| some _really_ interesting things in mathematics that AI cannot
| discover. Perhaps the same set of things that humans find
| interesting. Once we have exhausted the theorems with short
| proofs in ZFC, there will still be an infinite number of short
| and interesting statements that we cannot resolve.
|
| 3) This would be the most bizarre outcome of all. If AI operates
| in a consistent way outside the framework of ZFC, then that would
| be equivalent to solving the halting problem for certain
| (infinite) sets of Turing machine configurations that ZFC cannot
| solve. That in itself isn't too strange (e.g., it might
| turn out that ZFC lacks an axiom necessary to prove something as
| simple as the Collatz conjecture), but what would be strange is
| that it could find these new formal systems _efficiently_. In
| other words, it would have discovered an algorithmic way to
| procure new axioms that lead to efficient proofs of true
| arithmetic statements. One could also view that as an efficient
| algorithm for computing BB(n), which obviously we think isn 't
| possible. See Levin's papers on the feasibility of extending PA
| in a way that leads to quickly discovering more of the halting
| sequence.
| aleph_minus_one wrote:
| > There are statements in Presburger arithmetic that take time
| doubly exponential (or worse) in the size of the statement to
| reach via any path of the formal system whatsoever.
|
| This is a correct statement about the _worst_ case runtime.
| What is interesting for practical applications is whether such
| statements are among those that you are practically interested
| in.
| Xcelerate wrote:
| I would certainly think so. The statements mathematicians
| seem to be interested in tend to be at a "higher level" than
| simple but true statements like 2+3=5. And they necessarily
| have a short description in the formal language of ZFC,
| otherwise we couldn't write them down (e.g., Fermat's last
| theorem).
|
| If the truth of these higher level statements instantly
| unlocks many other truths, then it makes sense to think of
| them in the same way that knowing BB(5) allows one to
| instantly classify any Turing machine configuration on the
| computation graph of all n <= 5 state Turing machines (on
| empty tape input) as halting/non-halting.
| wbl wrote:
| 2 is definitely true. 3 is much more interesting and likely
| true but even saying it takes us into deep philosophical
| waters.
|
| If every true theorem had a proof of computationally bounded
| length, the halting problem would be solvable. So the AI can't
| find some of those proofs.
|
| The reason I say 3 is deep is that ultimately our foundational
| reasons to assume ZFC+the bits we need for logic come from
| philosohical groundings and not everyone accepts the same ones.
| Ultrafinitists and large cardinal theorists are both kinds of
| people I've met.
| Xcelerate wrote:
| My understanding is that no model-dependent theorem of ZFC or
| its extensions (e.g., ZFC+CH, ZFC+!CH) provides any insight
| into the behavior of Turing machines. If our goal is to
| invent an algorithm that finds better algorithms, then the
| philosophical angle is irrelevant. For computational
| purposes, we would only care about new axioms independent of
| ZFC if they allow us to prove additional Turing machine
| configurations as non-halting.
| semolinapudding wrote:
| ZFC is way worse than Presburger arithmetic -- since it is
| undecidable, we know that the length of the minimal proof of a
| statement cannot be bounded by a computable function of the
| length of the statement.
|
| This has little to do with the usefulness of LLMs for research-
| level mathematics though. I do not think that anyone is hoping
| to get a decision procedure out of it, but rather something
| that would imitate human reasoning, which is heavily based on
| analogies ("we want to solve this problem, which shares some
| similarities with that other solved problem, can we apply the
| same proof strategy? if not, can we generalise the strategy so
| that it becomes applicable?").
| bambax wrote:
| > _As an academic mathematician who spent their entire life
| collaborating openly on research problems and sharing my ideas
| with other people, it frustrates me that I am not even able to
| give you a coherent description of some basic facts about this
| dataset, for example, its size. However there is a good reason
| for the secrecy. Language models train on large databases of
| knowledge, so the moment you make a database of maths questions
| public, the language models will train on it._
|
| Well, yes and no. This is only true because we are talking about
| closed models from closed companies like so-called "OpenAI".
|
| But if all models were truly open, then we could simply verify
| what they had been trained on, and make experiments with models
| that we could be sure had never seen the dataset.
|
| Decades ago Microsoft (in the words of Ballmer and Gates)
| famously accused open source of being a "cancer" because of the
| cascading nature of the GPL.
|
| But it's the opposite. In software, and in knowledge in general,
| the true disease is secrecy.
| ludwik wrote:
| > But if all models were truly open, then we could simply
| verify what they had been trained on
|
| How do you verify what a particular open model was trained on
| if you haven't trained it yourself? Typically, for open models,
| you only get the architecture and the trained weights. How can
| you reliably verify what the model was trained on from this?
|
| Even if they provide the training set (which is not typically
| the case), you still have to take their word for it--that's not
| really "verification."
| asadotzler wrote:
| The OP said "truly open" not "open model" or any of the other
| BS out there. If you are truly open you share the training
| corpora as well or at least a comprehensive description of
| what it is and where to get it.
| ludwik wrote:
| It seems like you skipped the second paragraph of my
| comment?
| bambax wrote:
| If they provide the training set it's reproducible and
| therefore verifiable.
|
| If not, it's not really "open", it's bs-open.
| 4ad wrote:
| > FrontierMath is a secret dataset of "hundreds" of hard maths
| questions, curated by Epoch AI, and announced last month.
|
| The database stopped being secret when it was fed to proprietary
| LLMs running in the cloud. If anyone thinks that OpenAI has not
| trained and tuned o3 on the "secret" problems people fed to
| GPT-4o, I have a bridge to sell you.
| fn-mote wrote:
| This level of conspiracy thinking requires evidence to be
| useful.
|
| Edit: I do see from your profile that you are a real person
| though, so I say this with more respect.
| dns_snek wrote:
| What evidence do we need that AI companies are exploiting
| every bit of information they can use to get ahead in the
| benchmarks to generate more hype? Ignoring terms/agreements,
| violating copyright, and otherwise exploiting information for
| personal gain is the foundation of that entire industry for
| crying out loud.
| ashoeafoot wrote:
| AI has an interior world model, so it can do math if a chain of
| proof can walk without uncertainty from room to room. The
| problem is its inability to reflect on its own uncertainty and
| then override that uncertainty, should a new room-entrance
| method be self-similar to a previous entrance.
| voidhorse wrote:
| Eventually we may produce a collection of problems exhaustive
| enough that these tools can solve almost any problem that isn't
| novel in practice, but I doubt that they will ever become general
| problem solvers capable of what we consider to be reasoning in
| humans.
|
| Historically, the claim that neural nets were actual models of
| the human brain and human thinking was always epistemically
| dubious. It still is. Even as the _practical_ problems of
| producing better and better algorithms, architectures, and output
| have been solved, there is no reason to believe a connection
| between the mechanical model and what happens in organisms has
| been established. The most important point, in my view, is that
| all of the representation and interpretation still has to happen
| outside the computational units. Without human interpreters, none
| of the AI outputs have any meaning. Unless you believe in
| determinism and an overseeing god, the story for human beings is
| much different. AI will not be capable of reason until, like
| humans, it can develop socio-rational collectivities of meaning
| that are _independent_ of the human being.
|
| Researchers seemed to have a decent grasp on this in the 90s, but
| today, everyone seems all too ready to make the same ridiculous
| leaps as the original creators of neural nets. They did not show,
| as they claimed, that thinking is reducible to computation. All
| they showed was that a neural net can realize a _boolean
| function_ --which is not even logic, since, again, the entire
| semantic interpretive side of the logic is ignored.
| nmca wrote:
| Can you define what you mean by novel here?
| red75prime wrote:
| > there is no reason to believe a connection between the
| mechanical model and what happens in organisms has been
| established
|
| The universal approximation theorem. And that's basically it.
| The rest is empirical.
|
| No matter which physical processes happen inside the human
| brain, a sufficiently large neural network can approximate
| them. Barring unknowns like super-Turing computational
| processes in the brain.
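|
| (For reference, the statement being invoked, in its
| Cybenko/Hornik form: for any continuous f on a compact set K,
| suitable non-polynomial activation σ, and ε > 0, there is a
| one-hidden-layer network
|
|     g(x) = Σ_{i=1..N} a_i σ(w_i·x + b_i)
|
| with sup_{x in K} |f(x) − g(x)| < ε.)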
| lupire wrote:
| That's not useful by itself, because "anything can model
| anything else" doesn't put any upper bound on emulation cost,
| which for one small task could be larger than the total
| energy available in the entire Universe.
| pixl97 wrote:
| I mean, that is why they mention super-Turing processes
| like quantum-based computing.
| dinosaurdynasty wrote:
| Quantum computing actually isn't super-Turing, it "just"
| computes some things faster. (Strictly speaking it's
| somewhere between a standard Turing machine and a
| nondeterministic Turing machine in speed, and the first
| can emulate the second.)
| red75prime wrote:
| Either the brain violates the physical Church-Turing thesis
| or it doesn't.
|
| If it does, well, it will take more time to incorporate
| those physical mechanisms into computers to get them on par
| with the brain.
|
| I leave the possibility that it's "magic"[1] aside. It's
| just impossible to predict, because it will violate
| everything we know about our physical world.
|
| [1] One example of "magic": we live in a simulation and the
| brain is not fully simulated by the physics engine, but
| creators of the simulation for some reason gave it access
| to computational resources that are impossible to harness
| using the standard physics of the simulated world. Another
| example: interactionistic soul.
| exprofmaddy wrote:
| The universal approximation theorem is set in a precise
| mathematical context; I encourage you to limit its
| applicability to that context despite the marketing label
| "universal" (which it isn't). Consider your concession about
| empiricism. There's no empirical way to prove (i.e. there's
| no experiment that can demonstrate beyond doubt) that all
| brain or other organic processes are deterministic and can be
| represented completely as functions.
| red75prime wrote:
| Function is the most general way of describing relations.
| Non-deterministic processes can be represented as functions
| with a probability distribution codomain. Physics seems to
| require only continuous functions.
|
| Sorry, but there's not much evidence that can support human
| exceptionalism.
| exprofmaddy wrote:
| Some differential equations that model physics admit
| singularities and multiple solutions. Therefore,
| functions are not the most general way of describing
| relations. Functions are a subset of relations.
|
| Although "non-deterministic" and "stochastic" are often
| used interchangeably, they are not equivalent.
| Probability is applied analysis whose objects are
| distributions. Analysis is a form of deductive, i.e.
| mechanical, reasoning. Therefore, it's more accurate
| (philosophically) to identify mathematical probability
| with determinism. Probability is a model for our
| experience. That doesn't mean our experience is truly
| probabilistic.
|
| Humans aren't exceptional. Math modeling and reasoning
| are human activities.
| tananan wrote:
| > Unless you believe in determinism and an overseeing god
|
| Or perhaps, determinism and mechanistic materialism - which in
| STEM-adjacent circles has a relatively prevalent adherence.
|
| Worldviews which strip a human being of agency in the sense you
| invoke crop up quite a lot today in such spaces. If you start
| off adopting a view like this, you have a deflationary sword
| which can cut down most any notion that's not mechanistic in
| terms of mechanistic parts. "Meaning? Well that's just an
| emergent phenomenon of the influence of such and such causal
| factors in the unrolling of a deterministic physical system."
|
| Similar for reasoning, etc.
|
| Now obviously large swathes of people don't really subscribe to
| this - but it is prevalent and ties in well with utopian
| progress stories. If something is amenable to mechanistic
| dissection, possibly it's amenable to mechanistic control. And
| that's what our education is really good at teaching us. So
| such stories end up having intoxicating "hype" effects and
| drive fundraising, and so we get where we are.
|
| For one, I wish people were just excited about making computers
| do things they couldn't do before, without needing to dress it
| up as something more than it is. "This model can prove a set of
| theorems in this format with such and such limits and
| efficiency"
| exprofmaddy wrote:
| Agreed. If someone believes the world is purely mechanistic,
| then it follows that a sufficiently large computing machine
| can model the world---like Leibniz's Ratiocinator. The
| intoxication may stem from the potential for predictability
| and control.
|
| The irony is: why would someone want control if they don't
| have true choice? Unfortunately, such a question rarely
| pierces the intoxicated mind when this mind is preoccupied
| with pass the class, get an A, get a job, buy a house, raise
| funds, sell the product, win clients, gain status, eat right,
| exercise, check insta, watch the game, binge the show, post
| on Reddit, etc.
| Quekid5 wrote:
| > If someone believes the world is purely mechanistic, then
| it follows that a sufficiently large computing machine can
| model the world
|
| Is this controversial in some way? The problem is that to
| simulate a universe you need a bigger universe -- which
| doesn't exist (or is certainly out of reach due to
| information theoretical limits)
|
| > ---like Leibniz's Ratiocinator. The intoxication may stem
| from the potential for predictability and control.
|
| I really don't understand the 'control' angle here. It
| seems pretty obvious that even in a purely mechanistic view
| of the universe, information theory forbids using the
| universe to simulate itself. Limited simulations, sure...
| but that leaves lots of gaps wherein you lose determinism
| (and control, whatever that means).
| tananan wrote:
| > Is this controversial in some way?
|
| It's not "controversial", it's just not a given that the
| universe is to be thought a deterministic machine. Not to
| everyone, at least.
| HDThoreaun wrote:
| Choice is overrated. This gets to an issue I've long had
| with Nozick's experience machine. Not only would I happily
| spend my days in such a machine, I'm pretty sure most other
| people would too. Maybe they say they wouldn't, but if you
| let them try it out and then offered them the question
| again, I think they'd say yes. The real conclusion of the
| experience machine is that the unknown is scary.
| gmadsen wrote:
| I hear these arguments a lot from law and philosophy students,
| never from those trained in mathematics. It seems to me
| "literary" people will still be discussing these theoretical
| hypotheticals while those building the technology pass them by.
| voidhorse wrote:
| I straddle both worlds. Consider that using the lens of
| mathematical reasoning to understand everything is a bit like
| trying to use a single mathematical theory (eg that of
| groups) to comprehend mathematics as a whole. You will almost
| always benefit and enrich your own understanding by daring to
| incorporate outside perspectives.
|
| Consider also that even as digital technology and the
| ratio-mathematical understanding of the world has advanced, it
| is still rife with dynamics and problems that require a
| humanistic approach. In particular, a mathematical conception
| cannot resolve _teleological_ problems which require the
| establishment of consensus and the actual determination of
| what we, as a species, want the world to look like. Climate
| change and general economic imbalance are already evidence of
| the kind of disasters that mount when you limit yourself to a
| reductionistic, overly mathematical and technological
| understanding of life and existence. Being is not a solely
| technical problem.
| gmadsen wrote:
| I don't disagree, I just don't think it is done well, or at
| least as seriously as it used to be. In modern philosophy,
| there are many mathematically specious arguments that just
| make clear how large the mathematical gap has become, e.g.
| improper application of Godel's incompleteness theorems.
| Yet Godel was a philosopher himself, and he would disagree
| with their current hand-wavy usage.
|
| The 19th/20th century was a golden era of philosophy, with a
| coherent and rigorous mathematical lens to apply alongside
| other lenses: Russell, Turing, Godel, etc. However, this just
| doesn't exist anymore.
| exprofmaddy wrote:
| I'm with you. Interpreting a problem as a problem requires a
| human (1) to recognize the problem and (2) to convince other
| humans that it's a problem worth solving. Both involve value,
| and value has no computational or mechanistic description
| (other than "given" or "illusion"). Once humans have identified
| a problem, they might employ a tool to find the solution. The
| tool has no sense that the problem is important or even hard;
| such values are imposed by the tool's users.
|
| It's worth considering why "everyone seems all too ready to
| make ... leaps ..." "Neural", "intelligence", "learning", and
| others are metaphors that have performed very well as marketing
| slogans. Behind the marketing slogans are deep-pocketed,
| platformed corporate and government (i.e. socio-rational
| collective) interests. Educational institutions (another socio-
| rational collective) and their leaders have on the whole
| postured as trainers and preparers for the "real world" (i.e. a
| job), which means they accept, support, and promote the
| corporate narratives about techno-utopia. Which institutions
| are left to check the narratives? Who has time to ask questions
| given the need to learn all the technobabble (by paying
| hundreds of thousands for 120 university credits) to become a
| competitive job candidate?
|
| I've found there are many voices speaking against the hype---
| indeed, even (rightly) questioning the epistemic underpinnings
| of AI. But they're ignored and out-shouted by tech marketing,
| fundraising politicians, and engagement-driven media.
| alphan0n wrote:
| As far as ChatGPT goes, you may as well be asking: Can AI use a
| calculator?
|
| The answer is yes: it can utilize a stateful Python environment
| and solve complex mathematical equations with ease.
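|
| For instance, asked for the roots of a quadratic, it will
| typically write and run something like the following (sympy
| here is my illustration, not a claim about the exact code it
| generates):
|
|     from sympy import symbols, solve
|
|     x = symbols("x")
|     print(solve(x**2 - 5*x + 6, x))  # prints [2, 3]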
| lcnPylGDnU4H9OF wrote:
| There is a difference between correctly _stating_ that 2 + 2 =
| 4 within a set of logical rules and _proving_ that 2 + 2 = 4
| must be true given the rules.
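|
| A minimal sketch of the distinction in Lean 4 (my own toy
| example):
|
|     #eval 2 + 2                 -- computes the value: 4
|     example : 2 + 2 = 4 := rfl  -- a machine-checked proof
|
| The first line merely evaluates; the second is a proof term
| the kernel verifies by definitional reduction.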
| alphan0n wrote:
| I think you misunderstood: ChatGPT can utilize Python to
| solve a mathematical equation and provide a proof.
|
| https://chatgpt.com/share/676980cb-d77c-8011-b469-4853647f98.
| ..
|
| More advanced solutions:
|
| https://chatgpt.com/share/6769895d-7ef8-8011-8171-6e84f33103.
| ..
| cruffle_duffle wrote:
| It still has to know what to code in that environment. And
| based on my years of math as a wee little undergrad, the actual
| arithmetic was the least interesting part. LLM's are horrible
| at basic arithmetic, but they can use python for the
| calculator. But python wont help them write the correct
| equations or even solve for the right thing (wolfram alpha can
| do a bit of that though)
| alphan0n wrote:
| You'll have to show me what you mean.
|
| I've yet to encounter an equation that 4o couldn't answer in
| 1-2 prompts unless it timed out. Even then it can provide the
| solution in a Jupyter notebook that can be run locally.
| cruffle_duffle wrote:
| Never really pushed it. I have no reason to believe it
| wouldn't get most of that stuff correct. Math is very
| much like programming, and I'm sure it can output really
| good Python for its notebook to execute.
| upghost wrote:
| I didn't see anyone else ask this but.. isn't the FrontierMath
| dataset compromised now? At the very least OpenAI now knows the
| questions if not the answers. I would expect that the next
| iteration will "magically" get over 80% on the FrontierMath test.
| I imagine that experiment was pretty closely monitored.
| jvanderbot wrote:
| I figured their model was independently evaluated against the
| questions/answers. That's not to say it's not compromised by
| "Here's a bag of money" type methods, but I don't even think
| it'd be a reasonable test if they just handed over the dataset.
| upghost wrote:
| I'm sure it was independently evaluated, but I'm sure the
| folks running the test were not given an on-prem installation
| of ChatGPT to mess with. It was still done via API calls,
| presumably through the chat interface UI.
|
| That means the questions went over the fence to OpenAI.
|
| I'm quite certain they are aware of that, and it would be
| pretty foolish not to take advantage of at least knowing what
| the questions are.
| jvanderbot wrote:
| Now that you put it that way, it is laughably easy.
| optimalsolver wrote:
| This was my first thought when I saw the results:
|
| https://news.ycombinator.com/item?id=42473470
| upghost wrote:
| Insightful comment. What's extremely frustrating is all the
| energy poured into this conversation around benchmarks.
| There is a fundamental assumption of honesty and integrity
| in the benchmarking process, at least by some people. But
| when the dataset is compromised and generation N+1 has
| miraculous performance gains, how can we see this as
| anything other than a ploy to pump up valuations? Some
| people have millions of dollars at stake here, and they
| don't care about the naysayers in the peanut gallery like us.
| optimalsolver wrote:
| It's sadly inevitable that when billions in funding and
| industry hype are tied to performance on a handful of
| benchmarks, scores will somehow, magically, continue to go
| up.
|
| Needless to say, it doesn't bring us any closer to AGI.
|
| The only solution I see here is people crafting their own,
| private benchmarks that the big players don't care about
| enough to train on. That, at least, gives you a clearer
| view of the field.
| upghost wrote:
| Not sure why your comment was downvoted, but it certainly
| shows the pressure going against people who point out
| fundamental flaws. This is pushing us towards "AVI"
| rather than AGI -- "Artificially Valued Intelligence". The
| optimization target here is the market.
|
| I'm being completely serious. You are correct, despite
| the downvotes, that this could not be pushing us towards
| AGI because if the dataset is leaked you can't claim the
| G-- generalizability.
|
| The point of the benchmark is to lead us to believe that
| this is a substantial breakthrough. But a reasonable
| person would be forced to conclude that the results are
| misleading due to optimizing around the training data.
| sincerecook wrote:
| No it can't, and there's no such thing as AI. How is a thing that
| predicts the next-most-likely word going to do novel math? It
| can't even do existing math reliably because logical operations
| and statistical approximation are fundamentally different. It is
| fun watching grifters put lipstick on this thing and shop it
| around as a magic pig though.
| retrocryptid wrote:
| When did we decide that AI == LLM? Oh don't answer. I know, The
| VC world noticed CNNs and LLMs about 10 years ago and it's the
| only thing anyone's talked about ever since.
|
| Seems to me the answer to 'Can AI do maths yet?' depends on what
| you call AI and what you call maths. Our old departmental VAX
| running at a handful of megahertz could do some very clever
| symbol manipulation on binomials and, if you gave it a few
| seconds, it could even do something like theorem proving via
| proto-Prolog. Neither is anywhere close to the glorious AGI
| future we hope to sell to industry and government, but it seems
| worth considering how they're different, why they worked, and
| whether there's room for some hybrid approach. Do LLMs need to
| know how to do math if they know how to write Prolog or CoC
| statements that can do interesting things?
|
| I've heard people say they want to build software that emulates
| (simulates?) how humans do arithmetic, but ask a human to add
| anything bigger than two digit numbers and the first thing they
| do is reach for a calculator.
| ivan_ah wrote:
| Yesterday, I saw a thought-provoking talk about the future of
| "math jobs", assuming automated theorem proving becomes more
| prevalent in the future.
|
| [ (Re)imagining mathematics in a world of reasoning machines by
| Akshay Venkatesh]
|
| https://www.youtube.com/watch?v=vYCT7cw0ycw [54min]
|
| Abstract: In the coming decades, developments in automated
| reasoning will likely transform the way that research mathematics
| is conceptualized and carried out. I will discuss some ways we
| might think about this. The talk will not be about current or
| potential abilities of computers to do mathematics--rather I will
| look at topics such as the history of automation and mathematics,
| and related philosophical questions.
|
| See discussion at https://news.ycombinator.com/item?id=42465907
| swalsh wrote:
| Every profession seems to have a pessimistic view of AI as soon
| as it starts to make progress in their domain. Denial, Anger,
| Bargaining, Depression, and Acceptance. Artists seem to be in the
| depression state, many programmers are still in the denial phase.
| Pretty solid denial here from a mathematician. o3 was a proof of
| concept, like every other domain AI enters, it's going to keep
| getting better.
|
| Society is CLEARLY not ready for what AI's impact is going to be.
| We've been through change before, but never at this scale and
| speed. I think Musk/Vivek's DOGE thing is important; our
| government has gotten quite large and bureaucratic. But the clock
| has started on AI, and this is a social structural issue we've
| gotta figure out. Putting it off means we probably become
| subjects to a default set of rulers, if not the shoggoth itself.
| haolez wrote:
| I think it's a little of both. Maybe generative AI algorithms
| won't overcome their initial limitations. But maybe we don't
| need to overcome them to transform society in a very
| significant way.
| WanderPanda wrote:
| Or is it just white collar workers experiencing what blue
| collar workers have been experiencing for decades?
| esafak wrote:
| So will that make society shift to the left, in demand of
| stronger safety nets, or to the right, in search of a
| strongman to rescue them?
| mensetmanusman wrote:
| The reason this is so disruptive is that it will affect
| hundreds of fields simultaneously.
|
| Previously workers in a field disrupted by automation would
| retrain to a different part of the economy.
|
| If AI pans out to the point that there are mass layoffs in
| hundreds of sectors of the economy at once, then I'm not sure
| the process we have haphazardly set up now will work. People
| will have no idea where to go beyond manual labor (which will
| be difficult due to the obesity crisis, though maybe it will
| save lives in a weird way).
| hash872 wrote:
| If there are 'mass layoffs in hundreds of sectors of the
| economy at once', then the economy immediately goes into
| Great Depression 2.0 or worse. Consumer spending is two-
| thirds of the US economy; when everyone loses their jobs and
| stops having disposable income, that's literally what a
| depression is.
| mensetmanusman wrote:
| This will create a prisoner's dilemma for corporations,
| then: the government will have to step in to provide
| incentives for insanely profitable corporations to keep the
| proper number of people employed, or to limit the rate of
| layoffs.
| jebarker wrote:
| > I am dreading the inevitable onslaught in a year or two of
| language model "proofs" of the Riemann hypothesis which will just
| contain claims which are vague or inaccurate in the middle of 10
| pages of correct mathematics which the human will have to wade
| through to find the line which doesn't hold up.
|
| I wonder what the response of working mathematicians will be to
| this. If the proofs look credible, it might be too tempting to
| try and validate them, but if there's a deluge, that could be a
| huge time sink. Imagine if Wiles or Perelman had produced a
| thousand different proofs for their respective problems.
| bqmjjx0kac wrote:
| Maybe the coming onslaught of AI slop "proofs" will give a
| little bump to proof assistants like Coq. Of course, it would
| still take a human mathematician some time to verify theorem
| definitions.
| Hizonner wrote:
| Don't waste time looking at it unless a formal proof checker
| can verify it.
| yodsanklai wrote:
| I understand the appeal of having a machine helping us with maths
| and expanding the frontier of knowledge. They can assist
| researchers and make them more productive, just as they already
| make programmers more productive.
|
| But maths is also a fun and fulfilling activity. Very often, when
| we learn a math theory, it's because we want to understand and
| gain intuition on the concepts, or we want to solve a puzzle (for
| which we can already look up the solution). Maybe it's similar to
| chess. We didn't develop chess engines to replace human players
| and make them play together, but they helped us become better
| chess players and understand the game better.
|
| So the recent progress is impressive, but I still don't see how
| we'll use this tech practically, what impacts it can have, and
| in which fields.
| vouaobrasil wrote:
| My favourite moments of being a graduate student in math was
| showing my friends (and sometimes professors) proofs of
| propositions and theorems that we discussed together. To be the
| first to put together a coherent piece of reasoning that would
| convince them of the truth was immensely exciting. Those were
| great bonding moments amongst colleagues. The very fact that we
| needed each other to figure out the basics of the subject was
| part of what made the journey so great.
|
| Now, all of that will be done by AI.
|
| Reminds me of the time when I finally enabled invincibility in
| Goldeneye 007. Rather boring.
|
| I think we've stopped appreciating the human struggle and
| experience and have placed all the value on the end product,
| and that's why we're developing AI so much.
|
| Yeah, there is the possibility of working with an AI but at that
| point, what is the point? Seems rather pointless to me in an art
| like mathematics.
| busyant wrote:
| As someone who has an 18 yo son who wants to study math, this has
| me (and him) ... worried ... about becoming obsolete?
|
| But I'm wondering what other people think of this analogy.
|
| I used to be a bench scientist (molecular genetics).
|
| There were world class researchers who were more creative than I
| was. I even had a Nobel Laureate once tell me that my research
| was simply "dotting 'i's and crossing 't's".
|
| Nevertheless, I still moved the field forward in my own small
| ways. I still did respectable work.
|
| So, will these LLMs make us _completely_ obsolete? Or will there
| still be room for those of us who can dot the "i"?--if only for
| the fact that LLMs don't have infinite time/resources to solve
| "everything."
|
| I don't know. Maybe I'm whistling past the graveyard.
| deepsun wrote:
| By the way, don't trust Nobel laureates or even winners. E.g.
| Linus Pauling was talking absolute garbage, harmful and evil,
| after winning the Nobel.
| Radim wrote:
| > _don 't trust Nobel laureates or even winners_
|
| Nobel laureate and winner are the same thing.
|
| > _Linus Pauling was talking absolute garbage, harmful and
| evil, after winning the Nobel._
|
| Can you be more specific, what garbage? And which Nobel prize
| do you mean - Pauling got two, one for chemistry and one for
| peace.
| bongodongobob wrote:
| Eugenics and vitamin C as a cure all.
| lern_too_spel wrote:
| If Pauling's eugenics policies were bad, then the laws
| against incest that are currently on the books in many
| states (which are also eugenics policies that use the
| same mechanism) are also bad. There are different forms
| of eugenics policies, and Pauling's proposal to restrict
| the mating choices of people carrying certain recessive
| genes so their children don't suffer is ethically
| different from Hitler exterminating people with certain
| genes and also ethically different from other governments
| sterilizing people with certain genes. He later supported
| voluntary abortion with genetic testing, which is now
| standard practice in the US today, though no longer in a
| few states with ethically questionable laws restricting
| abortion. This again is ethically different from forced
| abortion.
|
| https://scarc.library.oregonstate.edu/coll/pauling/blood/
| nar...
| deepsun wrote:
| Thank you, my bad.
|
| I was referring to Linus's harmful and evil promotion of
| Vitamin C as the cure for everything and cancer. I don't
| think Linus was attaching that garbage to any particular
| Nobel prize. But people did say to their doctors: "Are you
| a Nobel winner, doctor?". Don't think they cared about a
| particular prize either.
| pfisherman wrote:
| I used to do bench top work too; and was blessed with "the
| golden hands" in that I could almost always get protocols
| working. To me this always felt more like intuition than
| deductive reasoning. And it made me a terrible TA. My advice to
| students in lab was always something along the lines of "just
| mess around with it, and see how it works." Not very helpful
| for the stressed and struggling student -_-
|
| Digression aside, my point is that I don't think we know
| exactly what makes or defines "the golden hands". And if that
| is the case, can we optimize for it?
|
| Another point is that scalable fine tuning only works for
| verifiable stuff. Think a priori knowledge. To me that seems to
| be at the opposite end of the spectrum from "mess with it and
| see what happens".
| busyant wrote:
| > blessed with "the golden hands" in that I could almost
| always get protocols working.
|
| Very funny. My friends and I never used the phrase "golden
| hands" but we used to say something similar: "so-and-so has
| 'great hands'".
|
| But it meant the same thing.
|
| I, myself, did not have great hands, but my comment was more
| about the intellectual process of conducting research.
|
| I guess my point was that:
|
| * I've already dealt with more talented researchers, but I
| still contributed meaningfully.
|
| * Hopefully, the "AI" will simply add another layer of
| talent, but the rest of us lesser mortals will still be able
| to contribute.
|
| But I don't know if I'm correct.
| vouaobrasil wrote:
| I was just thinking about this. I already posted a comment
| here, but I will say, as a mathematician (PhD in number
| theory), that for me AI significantly takes away the beauty of
| doing mathematics within a realm in which AI is used.
|
| The best part of math (again, just for me) was that it was a
| journey done by hand, with only the human intellect, through
| territory that computers didn't understand. The beauty of the
| subject was precisely that it was a journey of human intellect.
|
| As I said elsewhere, my friends used to ask me why something
| was true and it was fun to explain it to them, or ask them and
| have them explain it to me. Now most will just use some AI.
|
| Soulless, in my opinion. Pure mathematics should be about the
| art of the thing, not producing results on an assembly line
| like it will be with AI. Of course, the best mathematicians are
| going into this because it helps their current careers, not
| because it helps the future of the subject. Math done with AI
| will be a lot like Olympic running done with performance-
| enhancing drugs.
|
| Yes, we will get a few more results, faster. But the results
| will be entirely boring.
| zmgsabst wrote:
| Presumably people who get into math going forward will feel
| differently.
|
| For myself, chasing lemmas was always boring -- and there's
| little interest in doing the busywork of fleshing out a
| theory. For me, LLMs are a great way to do the fun parts
| (conceptual architecture) without the boring parts.
|
| And I expect we'll see much the same change as with physics:
| computers increase the complexity of the objects we study,
| which tend to be rather simple when done by hand -- e.g.,
| people don't investigate patterns in the diagrams of
| group(oids) because drawing million-element diagrams isn't
| tractable by hand. And you only notice the patterns in them
| when you see examples of the diagrams at scale.
| ndriscoll wrote:
| Even current people will feel differently. I don't bemoan
| the fact that Lean/Mathlib has `simp` and `linarith` to
| automate trivial computations. A "copilot for Lean" that
| can turn "by induction, X" or "evidently Y" into a formal
| proof sounds great.
|
| The trick is teaching the thing how high-powered a set of
| theorems to use, or how to factor out details or not,
| depending on the user's level of understanding. We'll have
| to find a pedagogical balance (e.g. you don't give
| `linarith` to someone practicing basic proofs), but I'm
| sure it will be a great tool to aid human understanding.
|
| A tool to help translate natural language to formal
| propositions/types also sounds great, and could help more
| people to use more formal methods, which could make for
| more robust software.
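|
| For a concrete taste of the automation being described (a
| small Mathlib sketch; the lemma statements are my own toy
| examples):
|
|     import Mathlib.Tactic
|
|     -- linarith closes linear arithmetic goals from hypotheses
|     example (a b : Q) (h : a < b) : a + 1 < b + 1 := by
|       linarith
|
|     -- simp discharges trivial computations via rewrite rules
|     example (n : N) : n + 0 = n := by simp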
| vouaobrasil wrote:
| Just a counterpoint, but I wonder how much you'll really
| understand if you can't even prove the whole thing
| yourself. Personally, I learn by proving but I guess
| everyone is different.
| hn3er1q wrote:
| There are many similarities in your comment to how
| grandmasters discuss engines. I have a hunch the arc of AI in
| math will be very similar to the arc of engines in chess.
|
| https://www.wired.com/story/defeated-chess-champ-garry-
| kaspa...
| vouaobrasil wrote:
| I agree with that, in the sense that math will become more
| about who can use AI the fastest to generate the most
| theories, which sort of side-steps the whole point of math.
| hn3er1q wrote:
| As a chess aficionado and a former tournament player, who
| didn't get very far, I can see pros & cons. They helped
| me train and get significantly better than I would've
| gotten without them. On the other hand, so did the
| competition. :) The average level of the game is so much
| higher than when I was a kid (30+ years ago) and new ways
| of playing that were unthinkable before are possible now.
| On the other hand cheating (online anyway) is rampant and
| all the memorization required to begin to be competitive
| can be daunting, and that sucks.
| vouaobrasil wrote:
| Hey I play chess too. Not a very good player though. But
| to be honest, I enjoy playing with people who are not
| serious because I do think an overabundance of knowledge
| makes the game too mechanical. Just my personal
| experience, but I think the risk of cheaters who use
| programs and the overmechanization of chess is not worth
| becoming a better player. (And in fact, I think MOST
| people can gain satisfaction by improving just by
| studying books and playing. But I do think that a few who
| don't have access to opponents benefit from a chess-
| playing computer).
| nyrikki wrote:
| What LLMs can do is limited. They are superior to wetware in
| some tasks, like finding and matching patterns in higher-
| dimensional space, but they are still fundamentally limited
| to a tiny class of problems outside of that pattern finding
| and matching.
|
| LLMs will be tools for some math needs, and even if we ever
| get quantum computers, they will be limited in what they can
| do.
|
| LLMs, without pattern matching, can only do up to about integer
| division, and while they can calculate parity, they can't use
| it in their calculations.
|
| There are several groups sitting on the known limitations of
| LLMs, waiting to take advantage of those who don't understand
| the fundamental limitations, simplicity bias, etc.
|
| The hype will meet reality soon and we will figure out where
| they work and where they are problematic over the next few
| years.
|
| But even the most celebrated achievements, like proof finding
| with Lean, depend heavily on smart people producing hints that
| machines can use.
|
| Basically, lots of the fundamental results on the limits of
| computation still hold.
|
| Modal logic may be an accessible way to approach the limits
| of statistical inference, if you want to explore one path
| yourself.
|
| A lot of what is in this article relates to some of the known
| fundamental limitations.
|
| Remember that for all the amazing progress, one of the core
| founders of the perceptron, Pitts, drank himself to death
| after it was shown that perceptrons were insufficient to
| accurately model biological neurons.
|
| Optimism is high, but reality will hit soon.
|
| So think of it as new tools that will be available to your
| child, not a replacement.
| ComplexSystems wrote:
| "LLMs, without pattern matching, can only do up to about
| integer division, and while they can calculate parity, they
| can't use it in their calculations." - what do you mean by
| this? Counting the number of 1's in a bitstring and
| determining if it's even or odd?
| nyrikki wrote:
| Yes, in this case PARITY is determining if the number of 1s
| in a binary input is odd or even
|
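| As a toy illustration (ordinary Python, of course -- this
| says nothing about any model's internals), PARITY is trivial
| to compute directly:
|
|     def parity(bits: str) -> int:
|         # 1 iff the number of 1s in the bitstring is odd
|         return bits.count("1") % 2
|
|     assert parity("1011") == 1
|     assert parity("1001") == 0
|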
| It is an effect of the complex-to-unpack descriptive
| complexity class DLOGTIME-uniform TC0, which has AND, OR,
| and MAJORITY gates.
|
| http://arxiv.org/abs/2409.13629
|
| The point being that the ability to use parity gates is
| different from being able to calculate parity, which is where
| the union of the typical RAM-machine class DLOGTIME with the
| circuit complexity class uniform TC0 comes into play.
|
| PARITY, MAJ, AND, and OR are all symmetric, and are in TC0,
| but PARITY is not in DLOGTIME-uniform TC0, which is first-
| order logic with Majority quantifiers.
|
| Another path: if you think about semantic properties and
| Rice's theorem, this may make sense, especially as PAC-
| learning even depth-2 nets is equivalent to the approximate
| SVP (shortest vector problem).
|
| PAC-learning even depth-2 threshold circuits is NP-hard.
|
| https://www.cs.utexas.edu/~klivans/crypto-hs.pdf
|
| For me, it helps to think about how ZFC was structured so we
| can keep the niceties of the law of the excluded middle, and
| how statistics pretty much depends on it for the central
| limit theorem, the law of large numbers, IID, etc.
|
| But that path runs the risk of reliving the Brouwer-Hilbert
| controversy.
| TheRealPomax wrote:
| What part do you think is going to become obsolete? Because
| Math isn't about "working out the math", it's about finding the
| relations between seemingly unrelated things to bust open a
| problem. Short of AGI, there is no amount of neural net that's
| going to realize that a seemingly impossible probabilistic
| problem is actually equivalent to a projection of an easy to
| work with 4D geometry. "Doing the math" is what we have
| computers for, and the better they get, the easier the tedious
| parts of the job become, but "doing math" is still very much a
| human game.
| busyant wrote:
| > What part do you think is going to become obsolete?
|
| Thank you for the question.
|
| I guess what I'm saying is:
|
| Will LLMs (or whatever comes after them) be _so_ good and
| _so_ pervasive that we will simply be able to say, "Hey
| ChatGPT-9000, I'd like to see if the xyz conjecture is
| correct." And then ChatGPT-9000 just does the work without us
| contributing beyond asking a question.
|
| Or will the technology be limited/bound in some way such that
| we will still be able to use ChatGPT-9000 as a tool of our
| own intellectual augmentation and/or we could still
| contribute to research even without it.
|
| Hopefully, my comment clarifies my original post.
|
| Also, writing this stuff has helped me think about it more. I
| don't have any grand insight, but the more I write, the more
| I lean toward the outcome that these machines will allow us
| to augment our research.
| hyhconito wrote:
| Let's put it this way, from another mathematician, and I'm sure
| I'll probably be shot for this one.
|
| Every LLM release moves half of the remaining way to the
| minimum viable goal of replacing a third-class undergrad. If
| your business or research initiative is fine with that level
| of competence then you will find utility.
|
| The problem is that I don't know anyone who would find that
| useful. Nor does it fit within any existing working methodology
| we have. And on top of that the verification of any output can
| take considerably longer than just doing it yourself in the
| first place, particularly where it goes off the rails, which it
| does all the time. I mean, it was 3 months ago that I was
| arguing with a model over it not understanding place-value
| systems properly, something we teach 7-year-olds here.
|
| But the abstract problem is at a higher level. If it doesn't
| become a general utility for people outside of mathematics,
| which is very very evident at the moment by the poor overall
| adoption and very public criticism of the poor result quality,
| then the funding will dry up. Models cost lots of money to
| train and if you don't have customers it's not happening and no
| one is going to lend you the money any more. And then it's
| moot.
| binarymax wrote:
| This is a great point that nobody will shoot you over :)
|
| But the main question is still: assuming you replace an
| undergrad with a model, who checks the work? If you have a
| good process around that already, and find utility as an
| augmented system, then you'll get value -- but I still
| think it's better for the undergrad to still have the job,
| be at the wheel, and do things faster and better by
| leveraging a powerful tool.
| hyhconito wrote:
| Shot already for criticising the shiny thing (happened with
| crypto and blockchain already...)
|
| Well to be fair no one checks what the graduates do
| properly, even if we hired KPMG in. That is until we get
| sued. But at least we have someone to blame then. What we
| don't want is something for the graduate to blame. The buck
| stops at someone corporeal because that's what the
| customers want and the regulators require.
|
| That's the reality and it's not quite as shiny and happy as
| the tech industry loves to promote itself.
|
| My main point, probably cleared up with a simple point: no
| one gives a shit about this either way.
| peterbonney wrote:
| If you looked at how the average accountant spent their time
| before the arrival of the digital spreadsheet, you might have
| predicted that automated calculation would make the profession
| obsolete. But it didn't.
|
| This time could be different, of course. But I'll need a lot
| more evidence before I start telling people to base their major
| life decisions on projected technological change.
|
| That's before we even consider that only a very slim minority
| of the people who study math (or physics or statistics or
| biology or literature or...) go on to work in the field of math
| (or physics or statistics or biology or literature or...). AI
| could completely take over math research and still have next
| to no impact on the value of the skills one acquires from
| studying math.
|
| Or if you want to be more fatalistic about it: if AI is going
| to put everyone out of work then it doesn't really matter what
| you do now to prepare for it. Might as well follow your
| interests in the meantime.
| blagie wrote:
| It's important to base life decisions on very real
| technological change. We don't know what the change will be,
| but it's coming. At the very least, that suggests more
| diverse skills.
|
| We're all usually (but not always) better off, with more
| productivity, eventually, but in the meantime, jobs do
| disappear. Robotics did not fully displace machinists and
| factory workers, but single-skilled people in Detroit did not
| do well. The loom, the steam engine... all of them displaced
| often highly-trained often low-skilled artisans.
| rafaelmn wrote:
| If AI reaches this level, the socioeconomic impact is going
| to be so immense that choosing what subject you study will
| have no impact on your outcome - no matter what it is - so
| it's a pointless consideration.
| jokoon wrote:
| I wish scientists who study the psychology and cognition of
| actual brains could engage with these AI things and talk
| about them, and maybe make suggestions.
|
| I really really wish AI would make some breakthrough and be
| really useful, but I am so skeptical and negative about it.
| joe_the_user wrote:
| Unfortunately, the scientists who study actual brains have
| all sorts of interesting models but ultimately very little
| clue _how_ these actual brains work at the level of problem
| solving. I mean, there's all sorts of "this area is
| associated with that kind of process" and "here's evidence
| this area does this algorithm" stuff, but it's all at the
| level you'd imagine of steam-engine engineers trying to
| understand a warp drive.
|
| The "open worm project" was an effort years ago to get computer
| scientists involved in trying to understand what "software" a
| very small actual brain could run. I believe progress here has
| been very slow and that an idea of ignorance that much larger
| brains involve.
|
| https://en.wikipedia.org/wiki/OpenWorm
| bongodongobob wrote:
| If you can't find useful things for LLMs or AI at this point,
| you must just lack imagination.
| 0points wrote:
| > How much longer this will go on for nobody knows, but there are
| lots of people pouring lots of money into this game so it would
| be a fool who bets on progress slowing down any time soon.
|
| Money cannot solve the issues faced by the industry, which
| mainly revolve around a lack of training data.
|
| They have already used the entirety of the internet and all
| available video, audio, and books, and they are now dealing
| with the fact that most content online is generated by these
| models, thus making it useless as training data.
| charlieyu1 wrote:
| One thing I know is that there won't be machines entering IMO
| 2025. The concept of a "marker" does not exist in the IMO -
| scores are decided by negotiations between the team leaders of
| each country and the juries. It is important to get each team
| leader involved in grading the work of their country's
| students, for accountability as well as to acknowledge
| cultural differences. And those hundreds of people are not
| going to stay longer to grade AI work.
| witnesser2 wrote:
| I was not refuted sufficiently a couple of years ago. I claimed
| "training is open boundary" etc.
| witnesser2 wrote:
| Like a few years ago, I'll just boringly add again that "you
| need modeling" to close it.
___________________________________________________________________
(page generated 2024-12-23 23:00 UTC)