[HN Gopher] Can AI do maths yet? Thoughts from a mathematician
___________________________________________________________________
Can AI do maths yet? Thoughts from a mathematician
Author : mathgenius
Score : 372 points
Date : 2024-12-23 10:50 UTC (1 days ago)
(HTM) web link (xenaproject.wordpress.com)
(TXT) w3m dump (xenaproject.wordpress.com)
| noFaceDiscoG668 wrote:
| "once" the training data can do it, LLMs will be able to do it.
| and AI will be able to do math once it comes to check out the
| lights of our day and night. until then it'll probably wonder
| continuously and contiguously: "wtf! permanence! why?! how?! by
| my guts, it actually fucking works! why?! how?!"
| tossandthrow wrote:
| I do think it is time to start questioning whether the utility
| of AI can be reduced solely to the quality of the training
| data.
|
| This might be a dogma that needs to die.
| noFaceDiscoG668 wrote:
| I tried. I don't have the time to formulate and scrutinise
| adequate arguments, though.
|
| Do you? Anything anywhere you could point me to?
|
| The algorithms live entirely off the training data. They
| consistently fail to "abduct" (infer) anything beyond the
| language-specific information in/of the training data.
| jstanley wrote:
| The best way to predict the next word is to accurately
| model the underlying system that is being described.
| tossandthrow wrote:
| It is a gradual thing. Presumably the models are inferring
| things at runtime that were not part of their training
| data.
|
| Anyhow, philosophically speaking you are also only exposed
| to what your senses pick up, but presumably you are able to
| infer things?
|
| As written: this is a dogma that stems from a limited
| understanding of what algorithmic processes are and the
| insistence that emergence can not happen from algorithmic
| systems.
| croes wrote:
| If not, then bad training data shouldn't be a problem
| kergonath wrote:
| There can be more than one problem. The history of
| computing (or even just the history of AI) is full of
| things that worked better and better right until they hit a
| wall. We get diminishing returns adding more and more
| training data. It's really not hard to imagine a series of
| breakthroughs bringing us way ahead of LLMs.
| Flenkno wrote:
| AWS announced, 2 or 3 weeks ago, a way of formulating rules in
| a formal language.
|
| AI doesn't need to learn everything; our LLM models already
| contain EVERYTHING, including ways to find a solution
| step by step.
|
| Which means you can tell an LLM to translate whatever you
| want into a logical language and use an external logic
| verifier. The only thing an LLM or AI needs to 'understand' at
| this point is how to make the statistical translation from one
| form to the other accurate enough.
|
| Your brain doesn't just do logic out of the box either: you
| conclude things and then formulate them.
|
| And plenty of companies work on this. It's the same with
| programming: if you are able to write code and execute it, you
| iterate until the compiler errors are gone. Now your LLM can
| write valid code out of the box. Let the LLM write unit tests,
| and now it can verify itself.
|
| Claude, for example, offers out of the box to write a
| validation script. You can then give Claude back the output of
| the script it suggested to you.
|
| Don't underestimate LLMs
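|
| As a rough illustration of that loop (a sketch only; `ask_llm`
| is a hypothetical placeholder for whatever chat API you use,
| the rest is plain Python plus pytest):
|
|     import os
|     import subprocess
|     import tempfile
|
|     def ask_llm(prompt: str) -> str:
|         """Hypothetical placeholder: call whatever chat API you use."""
|         raise NotImplementedError
|
|     def run_tests(code: str, tests: str) -> tuple[bool, str]:
|         """Write candidate code and its unit tests to disk, run pytest,
|         and return (passed, output)."""
|         with tempfile.TemporaryDirectory() as d:
|             with open(os.path.join(d, "candidate.py"), "w") as f:
|                 f.write(code)
|             with open(os.path.join(d, "test_candidate.py"), "w") as f:
|                 f.write(tests)
|             r = subprocess.run(["pytest", "-q", d],
|                                capture_output=True, text=True)
|         return r.returncode == 0, r.stdout + r.stderr
|
|     def solve(task: str, max_rounds: int = 5):
|         code = ask_llm(f"Write a Python module that solves: {task}")
|         tests = ask_llm(f"Write pytest unit tests for: {task}")
|         for _ in range(max_rounds):
|             ok, output = run_tests(code, tests)
|             if ok:
|                 return code  # the external check passed
|             # feed the verifier's output back, as described above
|             code = ask_llm(f"Task: {task}\nThe code failed:\n{output}\nFix it.")
|         return None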
| TeamDman wrote:
| Is this the AWS thing you referenced?
| https://aws.amazon.com/what-is/automated-reasoning/
| casenmgreen wrote:
| I may be wrong, but I think it a silly question. AI is basically
| auto-complete. It can do math to the extent you can find a
| solution via auto-complete based on an existing corpus of text.
| Bootvis wrote:
| You're underestimating the emergent behaviour of these LLMs.
| See for example what Terence Tao thinks about o1:
|
| https://mathstodon.xyz/@tao/113132502735585408
| WhyOhWhyQ wrote:
| I'm always just so pleased that the most famous mathematician
| alive today is also an extremely kind human being. That has
| often not been the case.
| roflc0ptic wrote:
| Pretty sure this is out of date now
| mdp2021 wrote:
| > _AI is basically_
|
| Very many things, conventionally so labelled since the '50s.
|
| You are speaking of LLMs.
| casenmgreen wrote:
| Yes - I mean only to say "AI" as the term is commonly used
| today.
| mdp2021 wrote:
| > _as the term is commonly used today_
|
| Which is, _wrongly_: so, don't spread the bad notion and
| habit.
|
| Bad notion and habit which has a counter-helpful impact on
| debate.
| esafak wrote:
| Humans can autocomplete sentences too because we understand
| what's going on. Prediction is a necessary criterion for
| intelligence, not an irrelevant one.
| aithrowawaycomm wrote:
| I am fairly optimistic about LLMs as a human math -> theorem-
| prover translator, and as a fan of Idris I am glad that the AI
| community is investing in Lean. As the author shows, the answer
| to "Can AI be useful for automated mathematical work?" is clearly
| "yes."
|
| But I am confident the answer to the question in the headline is
| "no, not for several decades." It's not just the underwhelming
| benchmark results discussed in the post, or the general concern
| about hard undergraduate math using different skillsets than
| ordinary research math. IMO the deeper problem still seems to be
| a basic gap where LLMs can seemingly do formal math at the level
| of a smart graduate student but fail at quantitative/geometric
| reasoning problems designed for fish. I suspect this holds for
| O3, based on one of the ARC problems it wasn't able to solve:
| https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_pr...
| (via https://www.interconnects.ai/p/openais-o3-the-2024-finale-
| of...) ANNs are simply not able to form abstractions, they can
| only imitate them via enormous amounts of data and compute. I
| would say there has been _zero_ progress on "common sense" math
| in computers since the invention of Lisp: we are still faking it
| with expert systems, even if LLM expert systems are easier to
| build at scale with raw data.
|
| It is the same old problem where an ANN can attain superhuman
| performance on level 1 of Breakout, but it has to be retrained
| for level 2. I am not convinced it makes sense to say AI can do
| math if AI doesn't understand what "four" means with the same
| depth as a rat, even if it can solve sophisticated modular
| arithmetic problems. In human terms, does it make sense to say a
| straightedge-and-compass AI understands Euclidean geometry if
| it's not capable of understanding the physical intuition behind
| Euclid's axioms? It makes more sense to say it's a brainless tool
| that helps with the tedium and drudgery of actually proving
| things in mathematics.
| asddubs wrote:
| it can take my math and point out a step I missed and then show
| me the correct procedure but still get the wrong result because
| it can't reliably multiply 2-digit numbers
| fifilura wrote:
| Better than an average human then.
| actionfromafar wrote:
| Different than an average human.
| watt wrote:
| it's a "language" model (LLM), not a "math" model. when it is
| generating your answer, predicting and outputting word after
| word, it is _not_ multiplying your numbers internally.
| asddubs wrote:
| Yes, I know. It's just kind of interesting how it can make
| inferences about complicated things but not get
| multiplications correct that would almost definitely have
| been in its training set many times (two digit by two
| digit)
| QuadmasterXLII wrote:
| To give a sense of scale: it's not that o3 failed to solve that
| red-blue rectangle problem once: o3 spent thousands of GPU
| hours putting out text about that problem, creating by my math
| about a million pages of text, and did not find the answer
| anywhere in those pages. For other problems it did find the
| answer around the million-page mark, as at the ~$3000-per-
| problem spend setting the score was still slowly creeping up.
| josh-sematic wrote:
| If the trajectory of the past two years is any guide, things
| that can be done at great compute expense now will rapidly
| become possible for a fraction of the cost.
| asadotzler wrote:
| The trajectory is not a guide, unless you count the recent
| plateauing.
| aithrowawaycomm wrote:
| Just a comment: the example o1 got wrong was actually
| underspecified: https://anokas.substack.com/p/o3-and-arc-agi-
| the-unsolved-ta...
|
| Which is actually a problem I have with ARC (and IQ tests more
| generally): it is computationally cheaper to go from ARC
| transformation rule -> ARC problem than it is the other way
| around. But this means it's pretty easy to generate ARC
| problems with non-unique solutions.
| est wrote:
| At this stage I assume everything having a sequential pattern can
| and will be automated by LLM AIs.
| Someone wrote:
| I think that's provably incorrect for the current approach to
| LLMs. They all have a horizon over which they correlate tokens
| in the input stream.
|
| So, for any LLM, if you intersperse more than that number of
| 'X' tokens between each useful token, they won't be able to do
| anything resembling intelligence.
|
| The current LLMs are a bit like n-gram databases that do not
| use letters, but larger units.
| red75prime wrote:
| The follow-up question is "Does it require a paradigm shift
| to solve it?". And the answer could be "No". Episodic memory,
| hierarchical learnable tokenization, online learning or
| whatever works well on GPUs.
| beng-nl wrote:
| Isn't that a bit of an unfair sabotage?
|
| Naturally, humans couldn't do it, even though they could edit
| the input to remove the X's, but shouldn't we evaluate the
| ability (even intelligent ability) of LLMs on what they can
| generally do rather than amplifying their weaknesses?
| Someone wrote:
| Why is that unfair in reply to the claim _"At this stage I
| assume everything having a sequential pattern can and will
| be automated by LLM AIs."_?
|
| I am not claiming LLMs aren't or cannot be intelligent, not
| even that they cannot do magical things; I just rebutted a
| statement about the lack of limits of LLMs.
|
| > Naturally, humans couldn't do it, even though they could
| edit the input to remove the X's
|
| So, what are you claiming: that they cannot or that they
| can? I think most people can and many would. Confronted
| with a file containing millions of X's, many humans will
| wonder whether there's something else than X's in the file,
| do a 'replace all', discover the question hidden in that
| sea of X's, and answer it.
|
| There even are simple files where most humans would easily
| spot things without having to think of removing those X's.
| Consider a file
|            How X X X X X X
|            many X X X X X X
|            days X X X X X X
|            are X X X X X X
|            there X X X X X X
|            in X X X X X X
|            a X X X X X X
|            week? X X X X X X
|
| with a million X's on the end of each line. Spotting the
| question in that is easy for humans, but impossible for the
| current bunch of LLMs
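|
| A quick sketch of that dilution test, for anyone who wants to
| try it (padding size chosen arbitrarily):
|
|     # Hide a short question inside a sea of 'X' tokens.
|     words = "How many days are there in a week?".split()
|     padding = " ".join(["X"] * 1000)   # scale up past the context window
|     prompt = "\n".join(f"{w} {padding}" for w in words)
|
|     # A human with find-and-replace recovers the question instantly:
|     recovered = " ".join(t for t in prompt.split() if t != "X")
|     print(recovered)   # How many days are there in a week?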
| int_19h wrote:
| If you have a million Xs on the end of each line, when a
| human is looking at that file, he's not looking at the
| entirety of it, but only at the part that is actually
| visible on-screen, so the equivalent task for an LLM
| would be to feed it the same subset as input. In which
| case they can all answer this question just fine.
| orbital-decay wrote:
| This is only easy because the software does line wrapping
| for you, mechanistically transforming the hard pattern of
| millions of symbols into another that happens to be easy
| for your visual system to match. Do the same for any
| visually capable model and it will get that easily too.
| Conversely, make that a single line (like the one a
| transformer sees) and you will struggle much more than
| the transformer because you'll have to scan millions of
| symbols sequentially looking for patterns.
|
| Humans have weak attention compared to it, this is a poor
| example.
| palata wrote:
| At this stage I _hope_ everything that needs to be reliable
| won't be automated by LLM AIs.
| ned99 wrote:
| I think this is a silly question; you could find AIs doing very
| simple maths back in the 1960s-1970s.
| mdp2021 wrote:
| It's just the worrisome linguistic confusion between AI and
| LLMs.
| jampekka wrote:
| I just spent a few days trying to figure out some linear algebra
| with the help of ChatGPT. It's very useful for finding conceptual
| information from literature (which for a not-professional-
| mathematician at least can be really hard to find and decipher).
| But in the actual math it constantly makes very silly errors.
| E.g. indexing a vector beyond its dimension, trying to do matrix
| decomposition for scalars and insisting on multiplying matrices
| with mismatching dimensions.
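|
| (A small illustration of the kind of error meant here; numpy
| rejects immediately what the chat transcript happily asserts:)
|
|     import numpy as np
|     A = np.random.rand(3, 2)
|     B = np.random.rand(3, 2)
|     try:
|         A @ B                  # (3,2) @ (3,2): inner dimensions don't match
|     except ValueError as e:
|         print("shape error:", e)
|     print((A @ B.T).shape)     # (3, 3): the dimensionally consistent product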
|
| O1 is a lot better at spotting its errors than 4o but it too
| still makes a lot of really stupid mistakes. It seems to be quite
| far from producing results itself consistently without at least a
| somewhat clueful human doing hand-holding.
| glimshe wrote:
| Isn't Wolfram Alpha a better "ChatGPT of Math"?
| Filligree wrote:
| Wolfram Alpha is better at actually doing math, but far worse
| at explaining what it's doing, and why.
| dartos wrote:
| What's worse about it?
|
| It never tells you the wrong thing, at the very least.
| fn-mote wrote:
| Its understanding of problems was very bad last time I
| used it. Meaning it was difficult to communicate what you
| wanted it to do. Usually I try to write in the
| Mathematica language, but even that is not foolproof.
|
| Hopefully they have incorporated more modern LLM since
| then, but it hasn't been that long.
| jampekka wrote:
| Wolfram Alpha's "smartness" is often Clippy level
| enraging. E.g. it makes assumptions of symbols based on
| their names (e.g. a is assumed to be a constant,
| derivatives are taken w.r.t. x). Even with Mathematica
| syntax it tends to make such assumptions and refuses to
| lift them even when explicitly directed. Quite often one
| has to change the variable symbols used to try to make
| Alpha do what's meant.
| jvanderbot wrote:
| When you give it a large math problem and the answer is
| "seven point one three five ... ", and it shows a plot of
| the result v some randomly selected domain, well there
| could be more I'd like to know.
|
| You can unlock a full derivation of the solution, for
| cases where you say "Solve" or "Simplify", but what I
| (and I suspect GP) might want, is to know why a few of
| the key steps might work.
|
| It's a fantastic tool that helped get me through my
| (engineering) grad work, but ultimately the breakthrough
| inequalities that helped me write some of my best stuff
| were out of a book I bought in desperation that basically
| cataloged known linear algebra inequalities and
| simplifications.
|
| When I try that kind of thing with the best LLM I can use
| (as of a few months ago, albeit), the results can get
| incorrect pretty quickly.
| kens wrote:
| What book was it that you found helpful?
| seattleeng wrote:
| Im reviewing linear algebra now and would also love to
| know that book!
| PeeMcGee wrote:
| > [...], but what I (and I suspect GP) might want, is to
| know why a few of the key steps might work.
|
| It's been some time since I've used the step-by-step
| explainer, and it was for calculus or intro physics
| problems at best, but IIRC the pro subscription will at
| least mention the method used to solve each step and link
| to reference materials (e.g., a clickable tag labeled
| "integration by parts"). Doesn't exactly explain _why_
| but does provide useful keywords in a sequence that can
| be used to derive the why.
| amelius wrote:
| I wish there was a way to tell ChatGPT where it has made a
| mistake, with a single mouse click.
| akoboldfrying wrote:
| What's surprising to me is that this would surely be in
| OpenAI's interests, too -- free RLHF!
|
| Of course there would be the risk of adversaries giving
| bogus feedback, but my gut says it's relatively
| straightforward to filter out most of this muck.
| a3w wrote:
| Is the explanation a pro feature? At the very end it says
| "step by step? Pay here"
| jampekka wrote:
| Wolfram Alpha is mostly for "trivia" type problems. Or giving
| solutions to equations.
|
| I was figuring out some mode decomposition methods such as
| ESPRIT and Prony and how to potentially extend/customize
| them. Wolfram Alpha doesn't seem to have a clue about such.
| lupire wrote:
| No. Wolfram Alpha can't solve anything that isn't a function
| evaluation or equation. And it can't do modular arithmetic to
| save its unlife.
|
| WolframOne/Mathematica is better, but that requires the user
| (or ChatGPT!) to write complicated code, not natural language
| queries.
| GuB-42 wrote:
| Wolfram Alpha can solve equations well, but it is terrible at
| understanding natural language.
|
| For example I asked Wolfram Alpha "How heavy a rocket has to
| be to launch 5 tons to LEO with a specific impulse of 400s",
| which is a straightforward application of the Tsiolkovsky
| rocket equation. Wolfram Alpha gave me some nonsense about
| particle physics (result: 95 MeV/c^2), GPT-4o did it right
| (result: 53.45 tons).
|
| Wolfram Alpha knows about the Tsiolkovsky rocket equation, it
| knows about LEO (low earth orbit), but I found no way to get
| a delta-v out of it, again, more nonsense. It tells me about
| Delta airlines, mentions satellites that it knows are not in
| LEO. The "natural language" part is a joke. It is more like
| an advanced calculator, and for that, it is great.
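|
| (For reference, a back-of-the-envelope version of that
| calculation; the delta-v budget below is an assumed figure:)
|
|     import math
|     delta_v = 9300.0     # m/s, rough delta-v to LEO (assumption)
|     isp, g0 = 400.0, 9.81
|     payload = 5.0        # tons, treated as the final mass
|     # Tsiolkovsky: m0/mf = exp(delta_v / (isp * g0))
|     mass_ratio = math.exp(delta_v / (isp * g0))
|     print(payload * mass_ratio)   # ~53.5 tons, in line with GPT-4o's answer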
| bongodongobob wrote:
| You're using it wrong, you can use natural language in your
| equation, but afaik it's not supposed to be able to do what
| you're asking of it.
| CamperBob2 wrote:
| You know, "You're using it wrong" is usually meant to
| carry an ironic or sarcastic tone, right?
|
| It dates back to Steve Jobs blaming an iPhone 4 user for
| "holding it wrong" rather than acknowledging a flawed
| antenna design that was causing dropped calls. The
| closest Apple ever came to admitting that it was their
| problem was when they subsequently ran an employment ad
| to hire a new antenna engineering lead. Maybe it's time
| for Wolfram to hire a new language-model lead.
| bongodongobob wrote:
| It's not an LLM. You're simply asking too much of it. It
| doesn't work the way you want it to, sorry.
| CamperBob2 wrote:
| Tell Wolfram. They're the ones who've been advertising it
| for years, well before LLMs were a thing, using English-
| language prompts like these examples:
| https://www.pcmag.com/news/23-cool-non-math-things-you-
| can-d...
|
| The problem has always been that you only get good
| answers if you happen to stumble on a specific question
| that it can handle. Combining Alpha with an LLM could
| actually be pretty awesome, but I'm sure it's easier said
| than done.
| Sharlin wrote:
| Before LLMs exploded nobody really _expected_ WA to
| perform well at natural language comprehension. The
| expectations were at the level of "an ELIZA that knows
| math".
| edflsafoiewq wrote:
| Correct, so it isn't a "ChatGPT of Math", which was the
| point.
| kortilla wrote:
| No, "holding it wrong" is the sarcastic version. "You're
| using it wrong" is a super common way to tell people they
| are literally using something wrong.
| CamperBob2 wrote:
| But they're not using it wrong. They are using it as
| advertised by Wolfram themselves (read: himself).
|
| The GP's rocket equation question is _exactly_ the sort
| of use case for which Alpha has been touted for years.
| spacemanspiff01 wrote:
| I wonder if these are tokenization issues? I really am curious
| about Meta's byte tokenization scheme...
| jampekka wrote:
| Probably mostly not. The errors tend to be
| logical/conceptual. E.g. mixing up scalars and matrices is
| unlikely to be from tokenization. Especially if using spaces
| between the variables and operators, as AFAIK GPTs don't form
| tokens over spaces (although tokens may start or end with
| them).
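|
| (Easy to check for a given expression, e.g. with the tiktoken
| package; a sketch, with no claims about what the splits will
| actually be:)
|
|     import tiktoken
|     enc = tiktoken.get_encoding("cl100k_base")
|     for expr in ["A B + C D", "AB+CD", "1234*5678"]:
|         tokens = enc.encode(expr)
|         print(expr, "->", [enc.decode([t]) for t in tokens])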
| lordnacho wrote:
| The only thing I've consistently had issues with while using AI
| is graphs. If I ask it to plot some simple function, it produces
| a really weird image that has nothing to do with the graph I
| want. It will be a weird swirl of lines and words, and it never
| corrects itself no matter what I say to it.
|
| Has anyone had any luck with this? It seems like the only thing
| that it just can't do.
| KeplerBoy wrote:
| You're doing it wrong. It can't produce proper graphs with
| its diffusion-style image generation.
|
| Ask it to produce graphs with python and matplotlib. That
| will work.
| lanstin wrote:
| And works very well - it made me a nice general "draw
| successively accurate Fourier series approximations given
| this lambda for coefficients and this lambda for the
| constant term". PNG output, no real programming errors (I
| wouldn't remember if it had some stupid error, I'm a python
| programmer). Even TikZ in LaTeX isn't hopeless (although I
| did end up reading the TikZ manual)
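|
| (Roughly the sort of script being described; a sketch using a
| square wave's coefficients, not the actual generated code:)
|
|     import numpy as np
|     import matplotlib.pyplot as plt
|
|     a0 = lambda: 0.0                                        # constant term
|     bn = lambda n: 4 / (np.pi * n) if n % 2 == 1 else 0.0   # sine coefficients
|
|     x = np.linspace(-np.pi, np.pi, 1000)
|     for terms in (1, 3, 9, 27):
|         y = a0() + sum(bn(n) * np.sin(n * x) for n in range(1, terms + 1))
|         plt.plot(x, y, label=f"{terms} terms")
|     plt.legend()
|     plt.savefig("fourier.png")   # PNG output, as mentioned above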
| thomashop wrote:
| Ask it to plot the graph with python plotting utilities. Not
| using its image generator. I think you need a ChatGPT
| subscription though for it to be able to run python code.
| lupire wrote:
| You seem to get 2(?) free Python program runs per week(?)
| as part of the o1 preview.
|
| When you visit chatgpt on the free account it automatically
| gives you the best model and then disables it after some
| amount of work and says to come back later or upgrade.
| amelius wrote:
| Just install Python locally, and copy paste the code.
| xienze wrote:
| Shouldn't ChatGPT be smart enough to know to do this
| automatically, based on context?
| CamperBob2 wrote:
| It was, for a while. I think this is an area where there
| may have been some regression. It can still write code to
| solve problems that are a poor fit for the language
| model, but you may need to ask it to do that explicitly.
| HDThoreaun wrote:
| The agentic reasoning models should be able to fix this if
| they have the ability to run code instead of giving each task
| to itself. "I need to make a graph" "LLMs have difficulty
| graphing novel functions" "Call python instead" is a line of
| reasoning I would expect after seeing what O1 has come up
| with on other problems.
|
| Giving AI the ability to execute code is the safety people's
| nightmare though, wonder if we'll hear anything from them as
| this is surely coming
| amelius wrote:
| Don't most mathematical papers contain at least one such error?
| aiono wrote:
| Where is this data from?
| amelius wrote:
| It's a question, and to be fair to AI it should actually
| refer to papers before review.
| monktastic1 wrote:
| Yes, it's a question, but you haven't answered what you
| read that makes you suspect so.
| mseri wrote:
| It also reliably fails basic real analysis proofs, but I think
| this is not too surprising since those require a mix of logic
| and computation that is likely hard to just infer from
| statistical likelihood of tokens
| cheald wrote:
| LLMs have been very useful for me in explorations of linear
| algebra, because I can have an idea and say "what's this
| operation called?" or "how do I go from this thing to that
| thing?", and it'll give me the mechanism and an explanation,
| and then I can go read actual human-written literature or
| documentation on the subject.
|
| It often gets the actual math wrong, but it is good enough at
| connecting the dots between my layman's intuition and the
| "right answer" that I can get myself over humps that I'd
| previously have been hopelessly stuck on.
|
| It does make those mistakes you're talking about very
| frequently, but once I'm told that the thing I'm trying to do
| is achievable with the Gram-Schmidt process, I can go self-
| educate on that further.
|
| The big thing I've had to watch out for is that it'll usually
| agree that my approach is a good or valid one, even when it
| turns out not to be. I've learned to ask my questions in the
| shape of "how do I", rather than "what if I..." or "is it a
| good idea to...", because most of the time it'll twist itself
| into shapes to affirm the direction I'm taking rather than
| challenging and refining it.
| lproven wrote:
| Betteridge's Law applies.
| LittleTimothy wrote:
| It's fascinating that this has run into the exact same problem as
| the Quantum research. Ie, in the quantum research to demonstrate
| any valuable forward progress you must compute something that is
| impossible to do with a traditional computer. If you can't do it
| with a traditional computer, it suddenly becomes difficult to
| verify correctness (i.e., you can't just check that it matches
| the traditional computer's answer).
|
| In the same way ChatGPT scores 25% on this and the question is
| "How close were those 25% to questions in the training set". Or
| to put it another way we want to answer the question "Is ChatGPT
| getting better at applying its reasoning to out-of-set problems
| or is it pulling more data into its training set". Or "Is the
| test leaking into the training".
|
| Maybe the whole question is academic and it doesn't matter, we
| solve the entire problem by pulling all human knowledge into the
| training set and that's a massive benefit. But maybe it implies a
| limit to how far it can push human knowledge forward.
| lazide wrote:
| If constrained by existing human knowledge to come up with an
| answer, won't it fundamentally be unable to push human
| knowledge forward?
| actionfromafar wrote:
| Then much of human research and development is also
| fundamentally impossible.
| AnerealDew wrote:
| Only if you think current "AI" is on the same level as
| human creativity and intelligence, which it clearly is not.
| actionfromafar wrote:
| I think current "AI" (i.e. LLMs) is unable to push human
| knowledge forward, but not because it's constrained by
| existing human knowledge. It's more like peeking into a
| very large magic-8 ball, new answers everytime you shake
| it. Some useful.
| SJC_Hacker wrote:
| It may be able to push human knowledge forward to an
| extent.
|
| In the past, there was quite a bit of low hanging fruit
| such that you could have polymaths able to contribute to
| a wide variety of fields, such as Newton.
|
| But in the past 100 years or so, the problem is there is
| so much known, it is impossible for any single person to
| have deep knowledge of everything. E.g. it's rare to find
| a really good mathematician who also has a deep knowledge
| (beyond intro courses) about say, chemistry.
|
| Would a sufficiently powerful AI / ML model be able to
| come up with this synthesis across fields?
| lupire wrote:
| That's not a strong reason. Yes, that means ChatGPT isn't
| good at wholly independently pushing knowledge forward,
| but a good brainstormer that is right even 10% of the
| time is an incredible fount of knowledge.
| Havoc wrote:
| I don't think many expect AI to push knowledge forward? A
| thing that basically just regurgitates consensus historic
| knowledge seems badly suited to that
| calmoo wrote:
| But apparently these new frontier models can 'reason' - so
| with that logic, they should be able to generate new
| knowledge?
| tomjen3 wrote:
| O1 was able to find the math problem in a recently
| published paper, so yes.
| LittleTimothy wrote:
| Depends on your understanding of human knowledge I guess?
| People talk about the frontier of human knowledge and if your
| view of knowledge is like that of a unique human genius
| pushing forward the frontier then yes - it'd be stuck. But if
| you think of knowledge as more complex than that you could
| have areas that are kind of within our frontier of knowledge
| (that we could reasonably know, but don't actually know) -
| taking concepts that we already know in one field and
| applying them to some other field. Today the reason that
| doesn't happen is because genius A in physics doesn't know
| about the existence of genius B in mathematics (let alone
| understand their research), but if it's all imbibed by "The
| Model" then it's trivial to make that discovery.
| lazide wrote:
| I was referring specifically to the parent comments
| statements around current AI systems.
| wongarsu wrote:
| Reasoning is essentially the creation of new knowledge from
| existing knowledge. The better the model can reason the less
| constrained it is to existing knowledge.
|
| The challenge is how to figure out if a model is genuinely
| reasoning
| lupire wrote:
| Reasoning is a very minor (but essential) part of knowledge
| creation.
|
| Knowledge creation comes from collecting data from the real
| world, and cleaning it up somehow, and brainstorming
| creative models to explain it.
|
| NN/LLM's version of model building is frustrating because
| it is quite good, but not highly "explainable". Human
| models have higher explainability, while machine models
| have high predictive value on test examples due to an
| impenetrable mountain of algebra.
| dinosaurdynasty wrote:
| There are likely lots of connections that could be made that
| no individual has made because no individual has _all of
| existing human knowledge_ at their immediate disposal.
| eagerpace wrote:
| How much of this could be resolved if its training set were
| reduced? Conceivably, most of the training serves only to
| confuse the model when only aiming to solve a math equation.
| newpavlov wrote:
| >in the quantum research to demonstrate any valuable forward
| progress you must compute something that is impossible to do
| with a traditional computer
|
| This is factually wrong. The most interesting problems
| motivating the quantum computing research are hard to solve,
| but easy to verify on classical computers. The factorization
| problem is the most classical example.
|
| The problem is that existing quantum computers are not powerful
| enough to solve the interesting problems, so researchers have
| to invent semi-artificial problems to demonstrate "quantum
| advantage" to keep the funding flowing.
|
| There is a plethora of opportunities for LLMs to show their
| worth. For example, finding interesting links between different
| areas of research or being a proof assistant in a
| math/programming formal verification system. There is a lot of
| ongoing work in this area, but at the moment signal-to-noise
| ratio of such tools is too low for them to be practical.
| aleph_minus_one wrote:
| > This is factually wrong. The most interesting problems
| motivating the quantum computing research are hard to solve,
| but easy to verify on classical computers.
|
| Your parent did not talk about quantum _computers_. I guess he
| rather had predictions of novel quantum-field theories or
| theories of quantum gravity in the back of his mind.
| newpavlov wrote:
| Then his comment makes even less sense.
| bondarchuk wrote:
| No, it is factually right, at least if Scott Aaronson is to
| be believed:
|
| > _Having said that, the biggest caveat to the "10^25 years"
| result is one to which I fear Google drew insufficient
| attention. Namely, for the exact same reason why (as far as
| anyone knows) this quantum computation would take ~10^25
| years for a classical computer to simulate, it would also
| take ~10^25 years for a classical computer to directly verify
| the quantum computer's results!! (For example, by computing
| the "Linear Cross-Entropy" score of the outputs.) For this
| reason, all validation of Google's new supremacy experiment
| is indirect, based on extrapolations from smaller circuits,
| ones for which a classical computer can feasibly check the
| results. To be clear, I personally see no reason to doubt
| those extrapolations. But for anyone who wonders why I've
| been obsessing for years about the need to design efficiently
| verifiable near-term quantum supremacy experiments: well,
| this is why! We're now deeply into the unverifiable regime
| that I warned about._
|
| https://scottaaronson.blog/?p=8525
| newpavlov wrote:
| It's a property of the "semi-artificial" problem chosen by
| Google. If anything, it means that we should heavily
| discount this claim of "quantum advantage", especially in
| the light of inherent probabilistic nature of quantum
| computations.
|
| Note that the OP wrote "you MUST compute something that is
| impossible to do with a traditional computer". I
| demonstrated a simple counter-example to this statement:
| you CAN demonstrate forward progress by factorizing big
| numbers, but the problem is that no one can do it despite
| billions of investments.
| bondarchuk wrote:
| Apparently they can't, right now, as you admit. Anyway
| this is turning into a stupid semantic argument, have a
| nice day.
| joshuaissac wrote:
| If they can't, then is it really quantum supremacy?
|
| They claimed it last time in 2019 with Sycamore, which
| could perform in 200 seconds a calculation that Google
| claimed would take a classical supercomputer 10,000
| years.
|
| That was debunked when a team of scientists replicated
| the same thing on an ordinary computer in 15 hours with a
| large number of GPUs. Scott Aaronson said that on a
| supercomputer, the same technique would have solved the
| problem in seconds.[1]
|
| So if they now come up with another problem which they
| say cannot even be verified by a classical computer and
| uses it to claim quantum advantage, then it is right to
| be suspicious of that claim.
|
| 1. https://www.science.org/content/article/ordinary-
| computers-c...
| lmm wrote:
| > If they can't, then is it really quantum supremacy?
|
| Yes, quantum supremacy on an artificial problem is
| quantum supremacy (even if it's "this quantum computer
| can simulate itself faster than a classical computer").
| Quantum supremacy on problems that are easy to verify
| would of course be nicer, but unfortunately not all
| problems happen to have an easy verification.
| noqc wrote:
| the unverifiable regime is a _great_ way to extract
| funding.
| cowl wrote:
| That applies specifically to the artificial problem Google
| created to be hard for classical computers, and in fact in
| the end it turned out it was not so hard: IBM came up with
| a method to do in just 2 days what Google said would take
| 10,000 years on a classical computer. I would not be
| surprised if a similar reduction happened to their second
| attempt, if anyone were motivated enough to look at it.
|
| In general we have thousands of optimisations problems that
| are hard to solve but immediate to verify.
| derangedHorse wrote:
| > This is factually wrong.
|
| What's factually wrong about it? OP said "you must compute
| something that is impossible to do with a traditional
| computer" which is true, regardless of the output produced.
| Verifying an output is very different from verifying the
| proper execution of a program. The difference between testing
| a program and seeing its code.
|
| What is being computed is fundamentally different from
| classical computers, therefore the verification methods of
| proper adherence to instructions becomes increasingly
| complex.
| ajmurmann wrote:
| They left out the key part which was incorrect and the
| sentence right after "If you can't do it with a traditional
| computer, it suddenly becomes difficult to verify
| correctness"
|
| The point stands that for actually interesting problems
| verifying correctness of the results is trivial. I don't
| know if "adherence to instructions" transudates at all to
| quantum computing.
| 0xfffafaCrash wrote:
| I agree with the issue of "is the test dataset leaking into the
| training dataset" being an issue with interpreting LLM
| capabilities in novel contexts, but not sure I follow what you
| mean on the quantum computing front.
|
| My understanding is that many problems have solutions that are
| easier to verify than to solve using classical computing. e.g.
| prime factorization
| LittleTimothy wrote:
| Oh it's a totally different issue on the quantum side that
| leads to the same issue with difficulty verifying. There, the
| algorithms that Google for example is using today, aren't
| like prime factorization, they're not easy to directly verify
| with traditional computers, so as far as I'm aware they kind
| of check the result for a suitably small run, and then do the
| performance metrics on a large run that they _hope_ gave a
| correct answer but aren't able to directly verify.
| intellix wrote:
| I haven't checked in a while, but the last time I checked,
| ChatGPT struggled on very basic things like: how many Fs are in
| this word? Not sure if they've managed to fix that, but since
| then I had lost hope in getting it to do any sort of math.
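|
| (For contrast, outside token space the same question is a
| one-liner:)
|
|     print("fluffiest".count("f"))   # 3; trivial for code, awkward for a
|                                     # model that only sees whole tokens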
| sylware wrote:
| How to train an AI strapped to a formal solver.
| puttycat wrote:
| No: https://github.com/0xnurl/gpts-cant-count
| sebzim4500 wrote:
| I can't reliably multiply four digit numbers in my head either,
| what's your point?
| reshlo wrote:
| Nobody said you have to do it in your head.
| sebzim4500 wrote:
| That's the equivalent to what we are asking the model to
| do. If you give the model a calculator it will get 100%. If
| you give it a pen and paper (e.g. let it show its working)
| then it will get near 100%.
| reshlo wrote:
| Citation needed.
| sebzim4500 wrote:
| Which bit do you need a citation for? I can run the
| experiment in 10 mins.
| reshlo wrote:
| > That's the equivalent to what we are asking the model
| to do.
|
| Why?
|
| What does it mean to give a model a calculator?
|
| What do you mean "let it show its working"? If I ask an
| LLM to do a calculation, I never said it can't express
| the answer to me in long-form text or with intermediate
| steps.
|
| If I ask a human to do a calculation that they can't
| reliably do in their head, they are intelligent enough to
| know that they should use a pen and paper without needing
| my preemptive permission.
| rishicomplex wrote:
| Who is the author?
| williamstein wrote:
| Kevin Buzzard
| nebulous1 wrote:
| There was a little more information in that reddit thread. Of the
| three difficulty tiers, 25% are T1 (easiest) and 50% are T2. Of
| the five public problems that the author looked at, two were T1
| and two were T2. Glazer on reddit described T1 as
| "IMO/undergraduate problems", but the article author says that
| they don't consider them to be undergraduate problems. So the LLM
| is _already_ doing what the author says they would be surprised
| about.
|
| Also Glazer seemed to regret calling T1 "IMO/undergraduate", and
| not only because of the disparity between IMO and typical
| undergraduate. He said that "We bump problems down a tier if we
| feel the difficulty comes too heavily from applying a major
| result, even in an advanced field, as a black box, since that
| makes a problem vulnerable to naive attacks from models"
|
| Also, all of the problems shown to Tao were T3.
| riku_iki wrote:
| > So the LLM is already doing what the author says they would
| be surprised about.
|
| that's if you unconditionally believe the result without any
| proofreading, confirmation, reproducibility, and with barely any
| details (we are given only one slide).
| joe_the_user wrote:
| The reddit thread is ... interesting (direct link[1]). It seems
| to be a debate among mathematicians some of whom do have access
| to the secret set. But they're debating publicly and so
| naturally avoiding any concrete examples that would give the
| set away, so they wind up with fuzzy, fiddly language for the
| qualities of the problem tiers.
|
| The "reality" of keeping this stuff secret 'cause someone would
| train on it is itself bizarre and certainly shouldn't be above
| questioning.
|
| https://www.reddit.com/r/OpenAI/comments/1hiq4yv/comment/m30...
| obastani wrote:
| It's not about training directly on the test set, it's about
| people discussing questions in the test set online (e.g., in
| forums), and then this data is swept up into the training
| set. That's what makes test set contamination so difficult to
| avoid.
| joe_the_user wrote:
| Yes,
|
| That is the "reality" - that because companies can train
| their models on the whole Internet, companies will train
| their (base) models on the entire Internet.
|
| And in this situation, "having heard the problem" actually
| serves as a barrier to understanding of these harder
| problems since any variation of known problem will receive
| a standard "half-assed guestimate".
|
| And these companies "can't not" use these base models since
| they're resigned to the "bitter lesson" (better the "bitter
| lesson viewpoint" imo) that they need large scale
| heuristics for the start of their process and only then can
| they start symbolic/reasoning manipulations.
|
| But hold-up! Why couldn't an organization freeze their
| training set and their problems and release both to the
| public? That would give us an idea where the research
| stands. Ah, the answer comes out, 'cause they don't own the
| training set and the result they want to train is a
| commercial product that needs every drop of data to be the
| best. As Yann LeCun has said, _this isn't research, this is
| product development_.
| phkahler wrote:
| >> It's not about training directly on the test set, it's
| about people discussing questions in the test set online
|
| Don't kid yourself. There are tens of billions of dollars
| going into AI. Some of the humans involved would happily
| cheat on comparative tests to boost investment.
| xmprt wrote:
| The incentives are definitely there, but even CEOs and
| VCs know that if they cheat the tests just to get more
| investment, they're only cheating themselves. No one is
| liquidating within the next 5 years so either they end up
| getting caught and lose everything or they spent all this
| energy trying to cheat while having a subpar model which
| results in them losing to competitors who actually
| invested in good technology.
|
| Having a higher valuation could help with attracting
| better talent or more funding to invest in GPUs and
| actual model improvements but I don't think that
| outweighs the risks unless you're a tiny startup with
| nothing to show (but then you wouldn't have the money to
| bribe anyone).
| earnestinger wrote:
| People like to cheat. See the VW case. Company is big and
| established and still cheated.
|
| It depends a lot on individuals making up the companies
| command chain and their values.
| davidcbc wrote:
| Why is this any different from say, Theranos?
|
| CEOs and VCs will happily lie because they are convinced
| they are smarter than everyone else and will solve the
| problem before they get caught.
| zifpanachr23 wrote:
| Not having access to the dataset really makes the whole thing
| seem incredibly shady. Totally valid questions you are
| raising
| whimsicalism wrote:
| it's a key aspect of the entire project. we have gone
| through many cycles of evals where the dataset is public
| seafoamteal wrote:
| I don't have much to opine from an advanced maths perspective,
| but I'd like to point out a couple examples of where ChatGPT made
| basic errors in questions I asked it as an undergrad CS student.
|
| 1. I asked it to show me the derivation of a formula for the
| efficiency of Stop-and-Wait ARQ and it seemed to do it, but a day
| later, I realised that in one of the steps, it just made a term
| vanish to get to the next step. Obviously, I should have verified
| more carefully, but when I asked it to spot the mistake in that
| step, it did the same thing twice more with bs explanations of
| how the term is absorbed.
|
| 2. I asked it to provide me syllogisms that I could practice
| proving. An overwhelming number of the syllogisms it gave me were
| inconsistent and did not hold. This surprised me more because
| syllogisms are about the most structured arguments you can find,
| having been formalized centuries ago and discussed extensively
| since then. In this case, asking it to walk step-by-step actually
| fixed the issue.
|
| Both of these were done on the free plan of ChatGPT, but I can't
| remember if it was 4o or 4.
| voiper1 wrote:
| The first question is always: which model? Which fortunately
| you at least addressed: >free plan of ChatGPT, but I can't
| remember if it was 4o or 4.
|
| Since chatgpt-4o, there has been o1-preview, and o1 (full) is
| out. They just announced o3 got a 25% on frontiermath which is
| what this article is a reaction to. So, any tests on 4o are at
| least TWO (or three) AI releases behind, each with new capabilities.
| Xcelerate wrote:
| So here's what I'm perplexed about. There are statements in
| Presburger arithmetic that take time doubly exponential (or
| worse) in the size of the statement to reach via _any path_ of
| the formal system whatsoever. These are arithmetic truths about
| the natural numbers. Can these statements be reached faster in
| ZFC? Possibly--it's well-known that there exist shorter proofs
| of true statements in more powerful consistent systems.
|
| But the problem then is that one can suppose there are also true
| short statements in ZFC which likewise require doubly exponential
| time to reach via any path. Presburger Arithmetic is decidable
| whereas ZFC is not, so these statements would require the
| additional axioms of ZFC for shorter proofs, but I think it's
| safe to assume such statements exist.
|
| Now let's suppose an AI model can resolve the truth of these
| short statements quickly. That means one of three things:
|
| 1) The AI model can discover doubly exponential length proof
| paths within the framework of ZFC.
|
| 2) There are certain short statements in the formal language of
| ZFC that the AI model cannot discover the truth of.
|
| 3) The AI model operates outside of ZFC to find the truth of
| statements in the framework of some other, potentially unknown
| formal system (and for arithmetical statements, the system must
| necessarily be sound).
|
| How likely are each of these outcomes?
|
| 1) is not possible within any coherent, human-scale timeframe.
|
| 2) IMO is the most likely outcome, but then this means there are
| some _really_ interesting things in mathematics that AI cannot
| discover. Perhaps the same set of things that humans find
| interesting. Once we have exhausted the theorems with short
| proofs in ZFC, there will still be an infinite number of short
| and interesting statements that we cannot resolve.
|
| 3) This would be the most bizarre outcome of all. If AI operates
| in a consistent way outside the framework of ZFC, then that would
| be equivalent to solving the halting problem for certain
| (infinite) sets of Turing machine configurations that ZFC cannot
| solve. That in itself isn't too strange (e.g., it might
| turn out that ZFC lacks an axiom necessary to prove something as
| simple as the Collatz conjecture), but what would be strange is
| that it could find these new formal systems _efficiently_. In
| other words, it would have discovered an algorithmic way to
| procure new axioms that lead to efficient proofs of true
| arithmetic statements. One could also view that as an efficient
| algorithm for computing BB(n), which obviously we think isn't
| possible. See Levin's papers on the feasibility of extending PA
| in a way that leads to quickly discovering more of the halting
| sequence.
| aleph_minus_one wrote:
| > There are statements in Presburger arithmetic that take time
| doubly exponential (or worse) in the size of the statement to
| reach via any path of the formal system whatsoever.
|
| This is a correct statement about the _worst_ case runtime.
| What is interesting for practical applications is whether such
| statements are among those that you are practically interested
| in.
| Xcelerate wrote:
| I would certainly think so. The statements mathematicians
| seem to be interested in tend to be at a "higher level" than
| simple but true statements like 2+3=5. And they necessarily
| have a short description in the formal language of ZFC,
| otherwise we couldn't write them down (e.g., Fermat's last
| theorem).
|
| If the truth of these higher level statements instantly
| unlocks many other truths, then it makes sense to think of
| them in the same way that knowing BB(5) allows one to
| instantly classify any Turing machine configuration on the
| computation graph of all n <= 5 state Turing machines (on
| empty tape input) as halting/non-halting.
| wbl wrote:
| 2 is definitely true. 3 is much more interesting and likely
| true but even saying it takes us into deep philosophical
| waters.
|
| If every true theorem had a proof of computationally bounded
| length, the halting problem would be solvable. So the AI can't
| find some of those proofs.
|
| The reason I say 3 is deep is that ultimately our foundational
| reasons to assume ZFC+the bits we need for logic come from
| philosophical groundings and not everyone accepts the same ones.
| Ultrafinitists and large cardinal theorists are both kinds of
| people I've met.
| Xcelerate wrote:
| My understanding is that no model-dependent theorem of ZFC or
| its extensions (e.g., ZFC+CH, ZFC+!CH) provides any insight
| into the behavior of Turing machines. If our goal is to
| invent an algorithm that finds better algorithms, then the
| philosophical angle is irrelevant. For computational
| purposes, we would only care about new axioms independent of
| ZFC if they allow us to prove additional Turing machine
| configurations as non-halting.
| semolinapudding wrote:
| ZFC is way worse than Presburger arithmetic -- since it is
| undecidable, we know that the length of the minimal proof of a
| statement cannot be bounded by a computable function of the
| length of the statement.
|
| This has little to do with the usefulness of LLMs for research-
| level mathematics though. I do not think that anyone is hoping
| to get a decision procedure out of it, but rather something
| that would imitate human reasoning, which is heavily based on
| analogies ("we want to solve this problem, which shares some
| similarities with that other solved problem, can we apply the
| same proof strategy? if not, can we generalise the strategy so
| that it becomes applicable?").
| lmm wrote:
| > and for arithmetical statements, the system must necessarily
| be sound
|
| Why do you say this? The AI doesn't know or care about
| soundness. Probably it has mathematical intuition that makes
| unsound assumptions, like human mathematicians do.
|
| > How likely are each of these outcomes?
|
| I think they'll all be true to a certain extent, just as they
| are for human mathematicians. There will probably be certain
| classes of extremely long proofs that the AI has no trouble
| discovering (because they have some kind of structure, just not
| structure that can be expressed in ZFC), certain truths that
| the AI makes an intuitive leap to despite not being able to
| prove them in ZFC (just as human mathematicians do), and
| certain short statements that the AI cannot prove one way or
| another (like Goldbach or twin primes or what have you, again,
| just as human mathematicians can't).
| bambax wrote:
| > _As an academic mathematician who spent their entire life
| collaborating openly on research problems and sharing my ideas
| with other people, it frustrates me_ [that] _I am not even able
| to give you a coherent description of some basic facts about this
| dataset, for example, its size. However there is a good reason
| for the secrecy. Language models train on large databases of
| knowledge, so the moment you make a database of maths questions
| public, the language models will train on it._
|
| Well, yes and no. This is only true because we are talking about
| closed models from closed companies like so-called "OpenAI".
|
| But if all models were truly open, then we could simply verify
| what they had been trained on, and make experiments with models
| that we could be sure had never seen the dataset.
|
| Decades ago Microsoft (in the words of Ballmer and Gates)
| famously accused open source of being a "cancer" because of the
| cascading nature of the GPL.
|
| But it's the opposite. In software, and in knowledge in general,
| the true disease is secrecy.
| ludwik wrote:
| > But if all models were truly open, then we could simply
| verify what they had been trained on
|
| How do you verify what a particular open model was trained on
| if you haven't trained it yourself? Typically, for open models,
| you only get the architecture and the trained weights. How can
| you reliably verify what the model was trained on from this?
|
| Even if they provide the training set (which is not typically
| the case), you still have to take their word for it--that's not
| really "verification."
| asadotzler wrote:
| The OP said "truly open" not "open model" or any of the other
| BS out there. If you are truly open you share the training
| corpora as well or at least a comprehensive description of
| what it is and where to get it.
| ludwik wrote:
| It seems like you skipped the second paragraph of my
| comment?
| SpaceManNabs wrote:
| Because it is mostly hogwash.
|
| Lots of ai researchers have shown that you can both give
| credit and discredit "open models" when you are given a
| dataset and training steps.
|
| Many lauded papers fell into Reddit ML or Twitter ire
| when people couldn't reproduce the model or results.
|
| If you are given the training set, the weights, the steps
| required, and enough compute, you can do it.
|
| Having enough compute and people releasing the steps is
| the main impediment.
|
| For my research I always release all of my code, and the
| order of execution steps, and of course the training set.
| I also give confidence intervals based on my runs so
| people can reproduce and see if we get similar intervals.
| bambax wrote:
| If they provide the training set it's reproducible and
| therefore verifiable.
|
| If not, it's not really "open", it's bs-open.
| lmm wrote:
| > Even if they provide the training set (which is not
| typically the case), you still have to take their word for it
| --that's not really "verification."
|
| If they've done it right, you can re-run the training and get
| the same weights. And maybe you could spot-check parts of it
| without running the full training (e.g. if there are glitch
| tokens in the weights, you'd look for where they came from in
| the training data, and if they weren't there at all that
| would be a red flag). Is it possible to release the wrong
| training set (or the wrong instructions) and hope you don't
| get caught? Sure, but demanding that it be published and
| available to check raises the bar and makes it much more
| risky to cheat.
| 4ad wrote:
| > FrontierMath is a secret dataset of "hundreds" of hard maths
| questions, curated by Epoch AI, and announced last month.
|
| The database stopped being secret when it was fed to proprietary
| LLMs running in the cloud. If anyone thinks that OpenAI has not
| trained and tuned o3 on the "secret" problems people fed to
| GPT-4o, I have a bridge to sell you.
| fn-mote wrote:
| This level of conspiracy thinking requires evidence to be
| useful.
|
| Edit: I do see from your profile that you are a real person
| though, so I say this with more respect.
| dns_snek wrote:
| What evidence do we need that AI companies are exploiting
| every bit of information they can use to get ahead in the
| benchmarks to generate more hype? Ignoring terms/agreements,
| violating copyright, and otherwise exploiting information for
| personal gain is the foundation of that entire industry for
| crying out loud.
| threeseed wrote:
| Some people are also forgetting who is the CEO of OpenAI.
|
| Sam Altman has long talked about believing in the "move
| fast and break things" way of doing business. Which is just
| a nicer way of saying do whatever dodgy things you can get
| away with.
| cheald wrote:
| OpenAI's also in the position of having to compete
| against other LLM trainers - including the open-weights
| Llama models and their community derivatives, which have
| been able to do extremely well with a tiny fraction of
| OpenAI's resources - and to justify their astronomical
| valuation. The economic incentive to cheat is _extreme_ ;
| I think that cheating has to be the default presumption.
| advisedwang wrote:
| It's perfectly possible for OpenAI to run the model (or provide
| others the means to run it) without storing queries/outputs for
| future use. I expect Epoch AI would insist on this. Perhaps OpenAI
| would lie about it, but that's opening up serious charges.
| ashoeafoot wrote:
| AI has an interior world model, thus it can do math if a chain of
| proof walks without uncertainty from room to room. The problem is
| its inability to reflect on its own uncertainty and to then
| override that uncertainty, should a new room-entrance method be
| self-similar to a previous entrance.
| voidhorse wrote:
| Eventually we may produce a collection of problems exhaustive
| enough that these tools can solve almost any problem that isn't
| novel in practice, but I doubt that they will ever become general
| problem solvers capable of what we consider to be reasoning in
| humans.
|
| Historically, the claim that neural nets were actual models of
| the human brain and human thinking was always epistemically
| dubious. It still is. Even as the _practical_ problems of
| producing better and better algorithms, architectures, and output
| have been solved, there is no reason to believe a connection
| between the mechanical model and what happens in organisms has
| been established. The most important point, in my view, is that
| all of the representation and interpretation still has to happen
| outside the computational units. Without human interpreters, none
| of the AI outputs have any meaning. Unless you believe in
| determinism and an overseeing god, the story for human beings is
| much different. AI will not be capable of reason until, like
| humans, it can develop socio-rational collectivities of meaning
| that are _independent_ of the human being.
|
| Researchers seemed to have a decent grasp on this in the 90s, but
| today, everyone seems all too ready to make the same ridiculous
| leaps as the original creators of neural nets. They did not show,
| as they claimed, that thinking is reducible to computation. All
| they showed was that a neural net can realize a _boolean
| function_ --which is not even logic, since, again, the entire
| semantic interpretive side of the logic is ignored.
| nmca wrote:
| Can you define what you mean by novel here?
| red75prime wrote:
| > there is no reason to believe a connection between the
| mechanical model and what happens in organisms has been
| established
|
| The universal approximation theorem. And that's basically it.
| The rest is empirical.
|
| No matter which physical processes happen inside the human
| brain, a sufficiently large neural network can approximate
| them. Barring unknowns like super-Turing computational
| processes in the brain.
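|
| For reference, one standard form of the statement (the
| single-hidden-layer version, for a non-polynomial continuous
| activation \sigma):
|
|     \forall f \in C(K, \mathbb{R}),\ K \subset \mathbb{R}^n
|     \text{ compact},\ \forall \varepsilon > 0,\ \exists N,
|     a_i, w_i, b_i :\quad
|     \sup_{x \in K} \Big| f(x) - \sum_{i=1}^{N} a_i\,
|     \sigma(w_i \cdot x + b_i) \Big| < \varepsilon.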
| lupire wrote:
| That's not useful by itself, because "anything can model
| anything else" doesn't put any upper bound on emulation cost,
| which for one small task could be larger than the total
| energy available in the entire Universe.
| pixl97 wrote:
| I mean, that is why they mention super-Turing processes
| like quantum-based computing.
| dinosaurdynasty wrote:
| Quantum computing actually isn't super-Turing, it "just"
| computes some things faster. (Strictly speaking it's
| somewhere between a standard Turing machine and a
| nondeterministic Turing machine in speed, and the first
| can emulate the second.)
| staunton wrote:
| If we're nitpicking: quantum computing algorithms could
| (if implemented) compute _certain things_ faster _than
| the best classical algorithms we know_. We don't know
| any quantum algorithms that are provably faster than _all
| possible_ classical algorithms.
| dinosaurdynasty wrote:
| Well yeah, we haven't even proved that P != NP yet.
| red75prime wrote:
| Either the brain violates the physical Church-Turing thesis
| or it doesn't.
|
| If it does, well, it will take more time to incorporate
| those physical mechanisms into computers to get them on par
| with the brain.
|
| I leave the possibility that it's "magic"[1] aside. It's
| just impossible to predict, because it will violate
| everything we know about our physical world.
|
| [1] One example of "magic": we live in a simulation and the
| brain is not fully simulated by the physics engine, but
| creators of the simulation for some reason gave it access
| to computational resources that are impossible to harness
| using the standard physics of the simulated world. Another
| example: interactionistic soul.
| exprofmaddy wrote:
| The universal approximation theorem is set in a precise
| mathematical context; I encourage you to limit its
| applicability to that context despite the marketing label
| "universal" (which it isn't). Consider your concession about
| empiricism. There's no empirical way to prove (i.e. there's
| no experiment that can demonstrate beyond doubt) that all
| brain or other organic processes are deterministic and can be
| represented completely as functions.
| red75prime wrote:
| Function is the most general way of describing relations.
| Non-deterministic processes can be represented as functions
| with a probability distribution codomain. Physics seems to
| require only continuous functions.
|
| Sorry, but there's not much evidence that can support human
| exceptionalism.
| exprofmaddy wrote:
| Some differential equations that model physics admit
| singularities and multiple solutions. Therefore,
| functions are not the most general way of describing
| relations. Functions are a subset of relations.
|
| Although "non-deterministic" and "stochastic" are often
| used interchangeably, they are not equivalent.
| Probability is applied analysis whose objects are
| distributions. Analysis is a form of deductive, i.e.
| mechanical, reasoning. Therefore, it's more accurate
| (philosophically) to identify mathematical probability
| with determinism. Probability is a model for our
| experience. That doesn't mean our experience is truly
| probabilistic.
|
| Humans aren't exceptional. Math modeling and reasoning
| are human activities.
| red75prime wrote:
| > Some differential equations that model physics admit
| singularities and multiple solutions.
|
| And physicists regard those as unphysical: the theory
| breaks down, we need a better one.
| exprofmaddy wrote:
| For example, the Euler equations model compressible flow
| with discontinuities (shocks in the flow field variables)
| and rarefaction waves. These theories are accepted and
| used routinely.
| red75prime wrote:
| Great. A useful approximation of what really happens in
| the fluid. But I'm sure there are no shocks and
| rarefactions in physicists' neurons while they are
| thinking about it.
|
| Switching into a less facetious mode...
|
| Do you understand that in the context of this dialogue it's
| not enough to show some examples of functions that are
| discontinuous or otherwise unrepresentable by NNs? You need
| at least to give a hint as to why such functions cannot be
| avoided while approximating the functionality of the human
| brain.
|
| Many things are possible, but I'm not going to keep my
| mind open to a possibility of a teal Russell's teapot
| before I get a hint at its existence, so to speak.
| voidhorse wrote:
| I don't understand your point here. A (logical) relation
| is, by definition, a more general way of describing
| relations than a function, and it is telling that we
| still suck at using and developing truly _relational_
| models that are not univalent (i.e. functions). Only a
| few old logicians really took the calculus of relations
| proper seriously (Peirce, for one). We use functions
| precisely because they are less general, they are rigid,
| and simpler to work with. I do not think anyone is
| working under the impression that a function is a high
| fidelity means to model the world as it is experienced
| and actually exists. It is necessarily reductionistic
| (and abstract). Any truth we achieve through functional
| models is necessarily a general, abstracted, truth, which
| in many ways proves to be extremely useful but in others
| (e.g. when an essential piece of information in the
| _particular_ is not accounted for in the _general
| reductive model_ ) can be disastrous.
| red75prime wrote:
| I'm not a big fan of philosophy. The epistemology you are
| talking about is another abstraction on top of the
| physical world. But the evolution of the physical world,
| as far as we know, can be described as a function of time
| (at least in a weak gravitational field, when the energies
| involved are well below the grand unification energy
| level; that is, for objects like brains).
|
| The brain is a physical system, so whatever it does
| (including philosophy) can be replicated by modelling (a
| (vastly) simplified version of) underlying physics.
|
| Anyway, I am not especially interested in discussing
| possible impossibility of an LLM-based AGI. It might be
| resolved empirically soon enough.
| tananan wrote:
| > Unless you believe in determinism and an overseeing god
|
| Or perhaps, determinism and mechanistic materialism - which in
| STEM-adjacent circles has a relatively prevalent adherence.
|
| Worldviews which strip a human being of agency in the sense you
| invoke crop up quite a lot today in such spaces. If you start
| off adopting a view like this, you have a deflationary sword
| which can cut down most any notion that's not mechanistic,
| reducing it to mechanistic parts. "Meaning? Well that's just an
| emergent phenomenon of the influence of such and such causal
| factors in the unrolling of a deterministic physical system."
|
| Similar for reasoning, etc.
|
| Now obviously large swathes of people don't really subscribe to
| this - but it is prevalent and ties in well with utopian
| progress stories. If something is amenable to mechanistic
| dissection, possibly it's amenable to mechanistic control. And
| that's what our education is really good at teaching us. So
| such stories end up having intoxicating "hype" effects and
| drive fundraising, and so we get where we are.
|
| For one, I wish people were just excited about making computers
| do things they couldn't do before, without needing to dress it
| up as something more than it is. "This model can prove a set of
| theorems in this format with such and such limits and
| efficiency"
| exprofmaddy wrote:
| Agreed. If someone believes the world is purely mechanistic,
| then it follows that a sufficiently large computing machine
| can model the world---like Leibniz's Ratiocinator. The
| intoxication may stem from the potential for predictability
| and control.
|
| The irony is: why would someone want control if they don't
| have true choice? Unfortunately, such a question rarely
| pierces the intoxicated mind when this mind is preoccupied
| with pass the class, get an A, get a job, buy a house, raise
| funds, sell the product, win clients, gain status, eat right,
| exercise, check insta, watch the game, binge the show, post
| on Reddit, etc.
| Quekid5 wrote:
| > If someone believes the world is purely mechanistic, then
| it follows that a sufficiently large computing machine can
| model the world
|
| Is this controversial in some way? The problem is that to
| simulate a universe you need a bigger universe -- which
| doesn't exist (or is certainly out of reach due to
| information theoretical limits)
|
| > ---like Leibniz's Ratiocinator. The intoxication may stem
| from the potential for predictability and control.
|
| I really don't understand the 'control' angle here. It
| seems pretty obvious that even in a purely mechanistic view
| of the universe, information theory forbids using the
| universe to simulate itself. Limited simulations, sure...
| but that leaves lots of gaps wherein you lose determinism
| (and control, whatever that means).
| tananan wrote:
| > Is this controversial in some way?
|
| It's not "controversial", it's just not a given that the
| universe is to be thought a deterministic machine. Not to
| everyone, at least.
| exprofmaddy wrote:
| People wish to feel safe. One path to safety is
| controlling or managing the environment. Lack of
| sufficient control produces anxiety. But control is only
| possible if the environment is predictable, i.e.,
| relatively certain knowledge that if I do X then the
| environment responds with Y. Humans use models for
| prediction. Loosely speaking, if the universe is truly
| mechanistic/deterministic, then the goal of modeling is
| to get the correct model (though notions of "goals" are
| problematic in determinism without real counterfactuals).
| However, if we can't know whether the universe is truly
| deterministic, then modeling is a pragmatic exercise in
| control (or management).
|
| My comments are not about simulating the universe on a
| real machine. They're about the validity and value of
| math/computational modeling in a universe where
| determinism is scientifically indeterminable.
| HDThoreaun wrote:
| Choice is over rated. This gets to an issue Ive long had
| with Nozicks experience machine. Not only would I happily
| spend my days in such a machine, Im pretty sure most other
| people would too. Maybe they say they wouldnt but if you
| let them try it out and then offered them the question
| again I think theyd say yes. The real conclusion of the
| experience machine is that the unknown is scary.
| fire_lake wrote:
| > Agreed. If someone believes the world is purely
| mechanistic, then it follows that a sufficiently large
| computing machine can model the world---like Leibniz's
| Ratiocinator.
|
| I don't think it does. Taking computers as an analogy... if
| you have a computer with 1GB memory, then you can't
| simulate a computer with more than 1GB memory inside of it.
| exprofmaddy wrote:
| "sufficiently large machine" ... It's a thought
| experiment. Leibniz didn't have a computer, but he still
| imagined it.
| gmadsen wrote:
| I hear these arguments a lot from law and philosophy students,
| never from those trained in mathematics. It seems to me that
| "literary" people will still be discussing these theoretical
| hypotheticals while those building the technology pass them by.
| voidhorse wrote:
| I straddle both worlds. Consider that using the lens of
| mathematical reasoning to understand everything is a bit like
| trying to use a single mathematical theory (eg that of
| groups) to comprehend mathematics as a whole. You will almost
| always benefit and enrich your own understanding by daring to
| incorporate outside perspectives.
|
| Consider also that even as digital technology and the
| ratio-mathematical understanding of the world have advanced, it
| is still rife with dynamics and problems that require a
| humanistic approach. In particular, a mathematical conception
| cannot resolve _teleological_ problems which require the
| establishment of consensus and the actual determination of
| what we, as a species, want the world to look like. Climate
| change and general economic imbalance are already evidence of
| the kind of disasters that mount when you limit yourself to a
| reductionistic, overly mathematical and technological
| understanding of life and existence. Being is not a solely
| technical problem.
| gmadsen wrote:
| I don't disagree, I just don't think it is done well, or at
| least as seriously as it used to be. In modern philosophy,
| there are many mathematically specious arguments that just
| make clear how large the mathematical gap has become, e.g.
| improper application of Godel's incompleteness theorems.
| Yet Godel was a philosopher himself, who would disagree
| with its current hand-wavy usage.
|
| The 19th/20th century was a golden era of philosophy with a
| coherent and rigorous mathematical lens to apply alongside
| other lenses: Russell, Turing, Godel, etc. However, this just
| doesn't exist anymore.
| voidhorse wrote:
| While I agree that these are titans of 20th c.
| philosophy, particularly of the philosophy of mathematics
| and logic, the overarching school they belonged to
| (logical positivism) has been thoroughly and rightly
| criticized, and it is informative to read these
| criticisms to understand why a view of life that is
| _overly_ mathematical is in many ways inadequate. Your
| comment still argues from a very limited perspective.
| There is no reason that correct application of Godel's
| theorem should be any indication of the richness of
| someone's philosophical views _unless_ you are already a
| staunchly committed reductionist who values mathematical
| arguments above all else (why? can maths help you explain
| and understand the phenomena of love in a way that will
| actually help you _experience love_? this is just one
| example domain where it does not make much sense), _or_
| unless they are specifically attempting a philosophy of
| mathematics. The question of whether or not we can
| effectively model cognition and human mental function
| using mathematical models is not a question of
| mathematical philosophy, but rather one of
| _epistemology_. If you really want to read a spurious
| argument, read McCulloch and Pitts. They essentially
| present an argument of two premises, the brain is finite,
| and we can create a machine of formal "neurons" (which
| are not even complete models of real neurons) that
| computes a boolean function, they then _conclude_ that
| they must have a model of cognition, that cognition must
| be nothing more than computation, and that the brain must
| basically be a Turing machine.
|
| The relevance of mathematics to the cognitive problem
| must be decided _outside of_ mathematics. As another
| poster said, even if you buy the theorems, it is still an
| _empirical question_ as to whether or not they really
| _model_ what they claim to model, and whether or not that
| model is of a fidelity that we find acceptable for a
| definition of general intelligence. Often, people reach
| claims of adequacy today _not_ by producing really
| fantastic models but instead by _lowering the bar
| enormously_. They claim that these models approximate
| humans by severely reducing the idea of what it means to
| be an intelligent human to the specific talents their
| tech happens to excel at (e.g. apparently being a
| language parrot is all that intelligence is, ignoring all
| the very nuanced views and definitions of intelligence we
| have come up with over the course of history. A machine
| that is not embodied in a skeletal structure and cannot
| even _experience_ , let alone solve, the vast number of
| physical, anatomical problems we contend with on a daily
| basis is, in my view, still very far from anything I
| would call general intelligence).
| exprofmaddy wrote:
| I'm with you. Interpreting a problem as a problem requires a
| human (1) to recognize the problem and (2) to convince other
| humans that it's a problem worth solving. Both involve value,
| and value has no computational or mechanistic description
| (other than "given" or "illusion"). Once humans have identified
| a problem, they might employ a tool to find the solution. The
| tool has no sense that the problem is important or even hard;
| such values are imposed by the tool's users.
|
| It's worth considering why "everyone seems all too ready to
| make ... leaps ..." "Neural", "intelligence", "learning", and
| others are metaphors that have performed very well as marketing
| slogans. Behind the marketing slogans are deep-pocketed,
| platformed corporate and government (i.e. socio-rational
| collective) interests. Educational institutions (another socio-
| rational collective) and their leaders have on the whole
| postured as trainers and preparers for the "real world" (i.e. a
| job), which means they accept, support, and promote the
| corporate narratives about techno-utopia. Which institutions
| are left to check the narratives? Who has time to ask questions
| given the need to learn all the technobabble (by paying
| hundreds of thousands for 120 university credits) to become a
| competitive job candidate?
|
| I've found there are many voices speaking against the hype---
| indeed, even (rightly) questioning the epistemic underpinnings
| of AI. But they're ignored and out-shouted by tech marketing,
| fundraising politicians, and engagement-driven media.
| alphan0n wrote:
| As far as ChatGPT goes, you may as well be asking: Can AI use a
| calculator?
|
| The answer is yes, it can utilize a stateful python environment
| and solve complex mathematical equations with ease.
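|
| For instance, the kind of thing its Python tool ends up running
| under the hood looks roughly like this (illustrative sympy, not
| a transcript of an actual session):
|
|     import sympy as sp
|
|     x = sp.symbols("x")
|     # Solve a quartic symbolically...
|     roots = sp.solve(sp.Eq(x**4 - 5*x**2 + 4, 0), x)
|     print(roots)  # the four roots: -2, -1, 1, 2
|     # ...and check each root by substitution.
|     print([sp.simplify(r**4 - 5*r**2 + 4) for r in roots])  # all 0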
| lcnPylGDnU4H9OF wrote:
| There is a difference between correctly _stating_ that 2 + 2 =
| 4 within a set of logical rules and _proving_ that 2 + 2 = 4
| must be true given the rules.
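|
| In a proof assistant the distinction is explicit. A minimal
| Lean 4 sketch (using only core Nat lemmas):
|
|     -- Checking the statement by computation:
|     example : 2 + 2 = 4 := rfl
|
|     -- Spelling out the successor steps that make it true:
|     example :
|         Nat.succ (Nat.succ 0) + Nat.succ (Nat.succ 0) =
|         Nat.succ (Nat.succ (Nat.succ (Nat.succ 0))) := by
|       rw [Nat.add_succ, Nat.add_succ, Nat.add_zero]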
| alphan0n wrote:
| I think you misunderstood, ChatGPT can utilize Python to
| solve a mathematical equation and provide proof.
|
| https://chatgpt.com/share/676980cb-d77c-8011-b469-4853647f98.
| ..
|
| More advanced solutions:
|
| https://chatgpt.com/share/6769895d-7ef8-8011-8171-6e84f33103.
| ..
| cruffle_duffle wrote:
| It still has to know what to code in that environment. And
| based on my years of math as a wee little undergrad, the actual
| arithmetic was the least interesting part. LLMs are horrible
| at basic arithmetic, but they can use python for the
| calculator. But python won't help them write the correct
| equations or even solve for the right thing (wolfram alpha can
| do a bit of that, though).
| alphan0n wrote:
| You'll have to show me what you mean.
|
| I've yet to encounter an equation that 4o couldn't answer in
| 1-2 prompts unless it timed out. Even then it can provide the
| solution in a Jupyter notebook that can be run locally.
| cruffle_duffle wrote:
| Never really pushed it. I have no reason to believe it
| wouldn't get most of that stuff correct. Math is very
| much like programming and I'm sure it can output really
| good python for its notebook to execute.
| alphan0n wrote:
| Awful lot of shy downvotes.. Why not say something if you
| disagree?
| upghost wrote:
| I didn't see anyone else ask this but.. isn't the FrontierMath
| dataset compromised now? At the very least OpenAI now knows the
| questions if not the answers. I would expect that the next
| iteration will "magically" get over 80% on the FrontierMath test.
| I imagine that experiment was pretty closely monitored.
| jvanderbot wrote:
| I figured their model was independently evaluated against the
| questions/answers. That's not to say it's not compromised by
| "Here's a bag of money" type methods, but I don't even think
| it'd be a reasonable test if they just handed over the dataset.
| upghost wrote:
| I'm sure it was independently evaluated, but I'm sure the
| folks running the test were not given an on-prem installation
| of ChatGPT to mess with. It was still done via API calls,
| presumably through the chat interface UI.
|
| That means the questions went over the fence to OpenAI.
|
| I'm quite certain they are aware of that, and it would be
| pretty foolish not to take advantage of at least knowing what
| the questions are.
| jvanderbot wrote:
| Now that you put it that way, it is laughably easy.
| ls612 wrote:
| Depending on the plan the researchers used they may have
| contractual protections against OpenAI training on their
| inputs.
| upghost wrote:
| Sure, but given the resourcing at OpenAI, it would not be
| hard to clean[1] the inputs. I'm just trying to be
| realistic here, there are plenty of ways around
| contractual obligations and a significant incentive to do
| so.
|
| [1]: https://en.wikipedia.org/wiki/Clean-room_design
| optimalsolver wrote:
| This was my first thought when I saw the results:
|
| https://news.ycombinator.com/item?id=42473470
| upghost wrote:
| Insightful comment. The thing that's extremely frustrating is
| looking at all the energy poured into this conversation around
| benchmarks. There is a fundamental assumption of honesty and
| integrity in the benchmarking process by at least some
| people. But when the dataset is compromised and generation
| N+1 has miraculous performance gains, how can we see this as
| anything other than a ploy to pump up valuations? Some people
| have millions of dollars at stake here and they don't care
| about the naysayers in the peanut gallery like us.
| optimalsolver wrote:
| It's sadly inevitable that when billions in funding and
| industry hype are tied to performance on a handful of
| benchmarks, scores will somehow, magically, continue to go
| up.
|
| Needless to say, it doesn't bring us any closer to AGI.
|
| The only solution I see here is people crafting their own,
| private benchmarks that the big players don't care about
| enough to train on. That, at least, gives you a clearer
| view of the field.
| upghost wrote:
| Not sure why your comment was downvoted, but it certainly
| shows the pressure going against people who point out
| fundamental flaws. This is pushing us towards "AVI"
| rather than AGI-- "Artificially Valued Intelligence". The
| optimization function here is around the market.
|
| I'm being completely serious. You are correct, despite
| the downvotes, that this could not be pushing us towards
| AGI because if the dataset is leaked you can't claim the
| G-- generalizability.
|
| The point of the benchmark is to lead us to believe that
| this is a substantial breakthrough. But a reasonable
| person would be forced to conclude that the results are
| misleading due to optimizing around the training data.
| sincerecook wrote:
| No it can't, and there's no such thing as AI. How is a thing that
| predicts the next-most-likely word going to do novel math? It
| can't even do existing math reliably because logical operations
| and statistical approximation are fundamentally different. It is
| fun watching grifters put lipstick on this thing and shop it
| around as a magic pig though.
| bwfan123 wrote:
| openai and epochai (frontier math) are startups with a strong
| incentive to push such narratives. the real test will be in
| actual adoption in real world use cases.
|
| the management class has a strong incentive to believe in this
| narrative, since it helps them reduce labor cost. so they are
| investing in it.
|
| eventually, the emperor will be seen to have no clothes at
| least in some usecases for which it is being peddled right now.
| comp_throw7 wrote:
| Epoch is a non-profit research institute, not a startup.
| retrocryptid wrote:
| When did we decide that AI == LLM? Oh don't answer. I know, The
| VC world noticed CNNs and LLMs about 10 years ago and it's the
| only thing anyone's talked about ever since.
|
| Seems to me the answer to 'Can AI do maths yet?' depends on what
| you call AI and what you call maths. Our old departmental VAX
| running at a handful of megahertz could do some very clever
| symbol manipulation on binomials and, if you gave it a few
| seconds, it could even do something like theorem proving via
| proto-prolog. Neither are anywhere close to the glorious GAI
| future we hope to sell to industry and government, but it seems
| worth considering how they're different, why they worked, and
| whether there's room for some hybrid approach. Do LLMs need to
| know how to do math if they know how to write Prolog or Coq
| statements that can do interesting things?
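|
| (For a sense of scale: the binomial pushing in question is the
| kind of thing a few lines of sympy handle today. Illustrative
| Python, not what the VAX actually ran:)
|
|     import sympy as sp
|
|     x, y = sp.symbols("x y")
|     # Classic CAS-style symbol manipulation on binomials:
|     print(sp.expand((x + y)**4))           # x**4 + 4*x**3*y + ...
|     print(sp.factor(x**2 - 2*x*y + y**2))  # (x - y)**2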
|
| I've heard people say they want to build software that emulates
| (simulates?) how humans do arithmetic, but ask a human to add
| anything bigger than two digit numbers and the first thing they
| do is reach for a calculator.
| ivan_ah wrote:
| Yesterday, I saw a thought-provoking talk about the future of
| "math jobs" assuming automated theorem proving becomes more
| prevalent in the future.
|
| [ (Re)imagining mathematics in a world of reasoning machines by
| Akshay Venkatesh]
|
| https://www.youtube.com/watch?v=vYCT7cw0ycw [54min]
|
| Abstract: In the coming decades, developments in automated
| reasoning will likely transform the way that research mathematics
| is conceptualized and carried out. I will discuss some ways we
| might think about this. The talk will not be about current or
| potential abilities of computers to do mathematics--rather I will
| look at topics such as the history of automation and mathematics,
| and related philosophical questions.
|
| See discussion at https://news.ycombinator.com/item?id=42465907
| qnleigh wrote:
| That was wonderful, thank you for linking it. For the benefit
| of anyone who doesn't have time to watch the whole thing, here
| are a few really nice quotes that convey some main points.
|
| "We might put the axioms into a reasoning apparatus like the
| logical machinery of Stanley Jevons, and see all geometry come
| out of it. That process of reasoning are replaced by symbols
| and formulas... may seem artificial and puerile; and it is
| needless to point out how disastrous it would be in teaching
| and how hurtful to the mental development; how deadening it
| would be for investigators, whose originality it would nip in
| the bud. But as used by Professor Hilbert, it explains and
| justifies itself if one remembers the end pursued." Poincare on
| the value of reasoning machines, but the analogy to mathematics
| once we have theorem-proving AI is clear (that the tools and
| their direct outputs are not the ends. Human understanding
| is).
|
| "Even if such a machine produced largely incomprehensible
| proofs, I would imagine that we would place much less value on
| proofs as a goal of math. I don't think humans will stop doing
| mathematics... I'm not saying there will be jobs for them, but
| I don't think we'll stop doing math."
|
| "Mathematics is the study of reproducible mental objects." This
| definition is human ("mental") and social (it implies
| reproducing among individuals). "Maybe in this world,
| mathematics would involve a broader range of inquiry... We need
| to renegotiate the basic goals and values of the discipline."
| And he gives some examples of deep questions we may tackle
| beyond just proving theorems.
| swalsh wrote:
| Every profession seems to have a pessimistic view of AI as soon
| as it starts to make progress in their domain. Denial, Anger,
| Bargaining, Depression, and Acceptance. Artists seem to be in the
| depression stage; many programmers are still in the denial phase.
| Pretty solid denial here from a mathematician. o3 was a proof of
| concept; like every other domain AI enters, it's going to keep
| getting better.
|
| Society is CLEARLY not ready for what AI's impact is going to be.
| We've been through change before, but never at this scale and
| speed. I think Musk/Vivek's DOGE thing is important, our
| government has gotten quite large and bureaucratic. But the clock
| has started on AI, and this is a social structural issue we've
| gotta figure out. Putting it off means we probably become
| subject to a default set of rulers, if not the shoggoth itself.
| haolez wrote:
| I think it's a little of both. Maybe generative AI algorithms
| won't overcome their initial limitations. But maybe we don't
| need to overcome them to transform society in a very
| significant way.
| WanderPanda wrote:
| Or is it just white collar workers experiencing what blue
| collar workers have been experiencing for decades?
| esafak wrote:
| So will that make society shift to the left in demand of
| stronger safety nets, or to the right in search of a
| strongman to rescue them?
| taneq wrote:
| Depends on the individual: do they think "look after _us_"
| or do they think "look after _ME_"?
| mensetmanusman wrote:
| The reason why this is so disruptive is that it will affect
| hundreds of fields simultaneously.
|
| Previously workers in a field disrupted by automation would
| retrain to a different part of the economy.
|
| If AI pans out to the point that there are mass layoffs in
| hundreds of sectors of the economy at once, then i'm not sure
| the process we have haphazardly set up now will work. People
| will have no idea where to go beyond manual labor. (But this
| will be difficult due to the obesity crisis - but maybe it will
| save lives in a weird way).
| hash872 wrote:
| If there are 'mass layoffs in hundreds of sectors of the
| economy at once', then the economy immediately goes into
| Great Depression 2.0 or worse. Consumer spending is two-
| thirds of the US economy, when everyone loses their jobs and
| stops having disposable income that's literally what a
| depression is
| mensetmanusman wrote:
| This will create a prisoner's dilemma for corporations
| then, the government will have to step in to provide
| incentives for insanely profitable corporations to keep the
| proper number of people employed or limit the rate of
| layoffs.
| zeroonetwothree wrote:
| Well it hasn't happened yet at least (unemployment is near
| historic lows). How much better does AI need to get? And do
| we actually expect it to happen? Improving on random
| benchmarks is not necessarily evidence of being able to do a
| specific job.
| admissionsguy wrote:
| It's because we then go check it out, and see how useless it is
| when applied to the domain.
|
| > programmers are still in the denial phase
|
| I am doing a startup and would jump on any way to make the
| development process more efficient. But the only thing LLMs
| are really good for is investor pitches.
| jebarker wrote:
| > I am dreading the inevitable onslaught in a year or two of
| language model "proofs" of the Riemann hypothesis which will just
| contain claims which are vague or inaccurate in the middle of 10
| pages of correct mathematics which the human will have to wade
| through to find the line which doesn't hold up.
|
| I wonder what the response of working mathematicians will be to
| this. If the proofs look credible it might be too tempting to try
| and validate them, but if there's a deluge that could be a huge
| time sink. Imagine if Wiles or Perelman had produced a thousand
| different proofs for their respective problems.
| bqmjjx0kac wrote:
| Maybe the coming onslaught of AI slop "proofs" will give a
| little bump to proof assistants like Coq. Of course, it would
| still take a human mathematician some time to verify theorem
| definitions.
| Hizonner wrote:
| Don't waste time on looking at it unless a formal proof checker
| can verify it.
| kevinventullo wrote:
| Honestly I think it won't be that different from today, where
| there is no shortage of cranks producing "proofs" of the
| Riemann Hypothesis and submitting them to prestigious journals.
| yodsanklai wrote:
| I understand the appeal of having a machine helping us with maths
| and expanding the frontier of knowledge. They can assist
| researchers and make them more productive. Just like they can
| already make programmers more productive.
|
| But maths is also a fun and fulfilling activity. Very often, when
| we learn a math theory, it's because we want to understand and
| gain intuition on the concepts, or we want to solve a puzzle (for
| which we can already look up the solution). Maybe it's similar to
| chess. We didn't develop chess engines to replace human players
| and make them play together, but they helped us become better
| chess players or understanding the game better.
|
| So the recent progress is impressive, but I still don't see how
| we'll use this tech practically and what impacts it can have and
| in which fields.
| vouaobrasil wrote:
| My favourite moments of being a graduate student in math was
| showing my friends (and sometimes professors) proofs of
| propositions and theorems that we discussed together. To be the
| first to put together a coherent piece of reasoning that would
| convince them of the truth was immensely exciting. Those were
| great bonding moments amongst colleagues. The very fact that we
| needed each other to figure out the basics of the subject was
| part of what made the journey so great.
|
| Now, all of that will be done by AI.
|
| Reminds of the time when I finally enabled invincibility in
| Goldeneye 007. Rather boring.
|
| I think we've stopped appreciating the human struggle and
| experience and have placed all the value on the end product, and
| that's why we're developing AI so much.
|
| Yeah, there is the possibility of working with an AI but at that
| point, what is the point? Seems rather pointless to me in an art
| like mathematics.
| sourcepluck wrote:
| > Now, all of that will be done by AI.
|
| No "AI" of any description is doing novel proofs at the moment.
| Not o3, or anything else.
|
| LLMs are good for chatting about basic intuition with, up to
| and including complex subjects, if and only if there are
| publicly available data on the topic which have been fed to
| the LLM during its training. They're good at doing summaries
| and overviews of specific things (if you push them around and
| insist they don't waffle and ignore garbage carefully and keep
| your critical thinking hat on, etc etc).
|
| It's like having a magnifying glass that focuses in on the
| small little maths question you might have, without you having
| to sift through ten blogs or videos or whatever.
|
| That's hardly going to replace graduate students doing proofs
| with professors, though, at least not with the methods being
| employed thus far!
| vouaobrasil wrote:
| I am talking about in 20-30 years.
| busyant wrote:
| As someone who has an 18 yo son who wants to study math, this has
| me (and him) ... worried ... about becoming obsolete?
|
| But I'm wondering what other people think of this analogy.
|
| I used to be a bench scientist (molecular genetics).
|
| There were world class researchers who were more creative than I
| was. I even had a Nobel Laureate once tell me that my research
| was simply "dotting 'i's and crossing 't's".
|
| Nevertheless, I still moved the field forward in my own small
| ways. I still did respectable work.
|
| So, will these LLMs make us _completely_ obsolete? Or will there
| still be room for those of us who can dot the "i"?--if only for
| the fact that LLMs don't have infinite time/resources to solve
| "everything."
|
| I don't know. Maybe I'm whistling past the graveyard.
| deepsun wrote:
| By the way, don't trust Nobel laureates or even winners. E.g.
| Linus Pauling was talking absolute garbage, harmful and evil,
| after winning the Nobel.
| Radim wrote:
| > _don't trust Nobel laureates or even winners_
|
| Nobel laureate and winner are the same thing.
|
| > _Linus Pauling was talking absolute garbage, harmful and
| evil, after winning the Nobel._
|
| Can you be more specific, what garbage? And which Nobel prize
| do you mean - Pauling got two, one for chemistry and one for
| peace.
| bongodongobob wrote:
| Eugenics and vitamin C as a cure all.
| lern_too_spel wrote:
| If Pauling's eugenics policies were bad, then the laws
| against incest that are currently on the books in many
| states (which are also eugenics policies that use the
| same mechanism) are also bad. There are different forms
| of eugenics policies, and Pauling's proposal to restrict
| the mating choices of people carrying certain recessive
| genes so their children don't suffer is ethically
| different from Hitler exterminating people with certain
| genes and also ethically different from other governments
| sterilizing people with certain genes. He later supported
| voluntary abortion with genetic testing, which is now
| standard practice in the US today, though no longer in a
| few states with ethically questionable laws restricting
| abortion. This again is ethically different from forced
| abortion.
|
| https://scarc.library.oregonstate.edu/coll/pauling/blood/
| nar...
| bongodongobob wrote:
| From what I remember, he wanted to mark people with
| tattoos or something.
| lern_too_spel wrote:
| This is mentioned in my link: "According to Pauling,
| carriers should have an obvious mark, (i.e. a tattoo on
| the forehead) denoting their disease, which would allow
| carriers to identify others with the same affliction and
| avoid marrying them."
|
| The goal wasn't to mark people for ostracism but to make
| it easier for people carrying these genes to find mates
| that won't result in suffering for their offspring.
| voltaireodactyl wrote:
| FWIW my understanding is that the policies against incest
| you mention actually have much less to do with
| controlling genetic reproduction and are more directed at
| combating familial rape/grooming/etc.
|
| Not a fun thing to discuss, but apparently a significant
| issue, which I guess should be unsurprising given some of
| the laws allowing underage marriage if the family signs
| off.
|
| Mentioning only to draw attention to the fact that
| theoretical policy is often undeniable in a vacuum, but
| runs aground when faced with real world conditions.
| deepsun wrote:
| Thank you, my bad.
|
| I was referring to Linus's harmful and evil promotion of
| Vitamin C as the cure for everything and cancer. I don't
| think Linus was attaching that garbage to any particular
| Nobel prize. But people did say to their doctors: "Are you
| a Nobel winner, doctor?". Don't think they cared about
| particular prize either.
| red75prime wrote:
| > Linus's harmful and evil promotion of Vitamin C
|
| Which is "harmful and evil" thanks to your
| afterknowledge. He had based his books on the research
| that failed to replicate. But given low toxicity of
| vitamin C it's not that "evil" to recommend treatment
| even if probabilistic estimation of positive effects is
| not that high.
|
| Sloppy, but not exceptionally bad. At least it was
| instrumental in teaching me to not expect marvels coming
| from dietary research.
| pfisherman wrote:
| I used to do bench top work too; and was blessed with "the
| golden hands" in that I could almost always get protocols
| working. To me this always felt more like intuition than
| deductive reasoning. And it made me a terrible TA. My advice to
| students in lab was always something along the lines of "just
| mess around with it, and see how it works." Not very helpful
| for the stressed and struggling student -_-
|
| Digression aside, my point is that I don't think we know
| exactly what makes or defines "the golden hands". And if that
| is the case, can we optimize for it?
|
| Another point is that scalable fine tuning only works for
| verifiable stuff. Think a priori knowledge. To me that seems to
| be at the opposite end of the spectrum from "mess with it and
| see what happens".
| busyant wrote:
| > blessed with "the golden hands" in that I could almost
| always get protocols working.
|
| Very funny. My friends and I never used the phrase "golden
| hands" but we used to say something similar: "so-and-so has
| 'great hands'".
|
| But it meant the same thing.
|
| I, myself, did not have great hands, but my comment was more
| about the intellectual process of conducting research.
|
| I guess my point was that:
|
| * I've already dealt with more talented researchers, but I
| still contributed meaningfully.
|
| * Hopefully, the "AI" will simply add another layer of
| talent, but the rest of us lesser mortals will still be able
| to contribute.
|
| But I don't know if I'm correct.
| vouaobrasil wrote:
| I was just thinking about this. I already posted a comment
| here, but I will say that as a mathematician (PhD in number
| theory), for me, AI significantly takes away the beauty of
| doing mathematics within a realm in which AI is used.
|
| The best part of math (again, just for me) is that it was a
| journey that was done by hand with only the human intellect
| that computers didn't understand. The beauty of the subject was
| precisely that it was a journey of human intellect.
|
| As I said elsewhere, my friends used to ask me why something
| was true and it was fun to explain it to them, or ask them and
| have them explain it to me. Now most will just use some AI.
|
| Soulless, in my opinion. Pure mathematics should be about the
| art of the thing, not producing results on an assembly line
| like it will be with AI. Of course, the best mathematicians are
| going into this because it helps their current careers, not
| because it helps the future of the subject. Math done with AI
| will be a lot like Olympic running done with performance-
| enhancing drugs.
|
| Yes, we will get a few more results, faster. But the results
| will be entirely boring.
| zmgsabst wrote:
| Presumably people who get into math going forward will feel
| differently.
|
| For myself, chasing lemmas was always boring -- and there's
| little interest in doing the busywork of fleshing out a
| theory. For me, LLMs are a great way to do the fun parts
| (conceptual architecture) without the boring parts.
|
| And I expect we'll see much the same change as with physics:
| computers increase the complexity of the objects we study,
| which tend to be rather simple when done by hand -- eg,
| people don't investigate patterns in the diagrams of
| group(oids) because drawing million element diagrams isn't
| tractable by hand. And you only notice the patterns in them
| when you see examples of the diagrams at scale.
| ndriscoll wrote:
| Even current people will feel differently. I don't bemoan
| the fact that Lean/Mathlib has `simp` and `linarith` to
| automate trivial computations. A "copilot for Lean" that
| can turn "by induction, X" or "evidently Y" into a formal
| proof sounds great.
|
| The trick is teaching the thing how high-powered the
| theorems it uses should be, or how much detail to factor out,
| depending on the user's level of understanding. We'll have
| to find a pedagogical balance (e.g. you don't give
| `linarith` to someone practicing basic proofs), but I'm
| sure it will be a great tool to aid human understanding.
|
| A tool to help translate natural language to formal
| propositions/types also sounds great, and could help more
| people to use more formal methods, which could make for
| more robust software.
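|
| For anyone who hasn't seen them, the existing automation
| already looks like this (a minimal Lean 4 + Mathlib sketch):
|
|     import Mathlib
|
|     -- `linarith` discharges routine linear-arithmetic goals:
|     example (a b : ℚ) (h : a ≤ b) : a + 1 ≤ b + 1 := by linarith
|
|     -- `simp` closes trivial rewrites/computations:
|     example (n : ℕ) : n + 0 = n := by simp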
| vouaobrasil wrote:
| Just a counterpoint, but I wonder how much you'll really
| understand if you can't even prove the whole thing
| yourself. Personally, I learn by proving but I guess
| everyone is different.
| daxfohl wrote:
| My hunch is it won't be much different, even when we can
| simply ask a machine that doesn't have a cached proof,
| "prove riemann hypothesis" and it thinks for ten seconds
| and spits out a fully correct proof.
|
| As Erdos(I think?) said, great math is not about the
| answers, it's about the questions. Or maybe it was
| someone else, and maybe "great mathematicians" rather
| than "great math". But, gist is the same.
|
| "What happens when you invent a thing that makes a
| function continuous (aka limit point)"? "What happens
| when you split the area under a curve into infinitesimal
| pieces and sum them up"? "What happens when you take the
| middle third out of an interval recursively"? "Can we
| define a set of axioms that underlie all mathematics"?
| "Is the graph of how many repetitions it takes for a
| complex number to diverge interesting"? I have a hard
| time imagining computers would ever have a strong enough
| understanding of the human experience with mathematics to
| even begin pondering such questions unprompted, let alone
| answer them and grok the implications.
|
| Ultimately the truths of mathematics, the answers, soon
| to be proved primarily by computers, already exist.
| Proving a truth does not create the truth; the truth
| exists independent of whether it has been proved or not.
| So fundamentally math is closer to archeology than it may
| appear. As such, AI is just a tool to help us dig with
| greater efficiency. But it should not be considered or
| feared as a replacement for mathematicians. AI can never
| take away the enlightenment of discovering something new,
| even if it does all the hard work itself.
| vouaobrasil wrote:
| > I have a hard time imagining computers would ever have
| a strong enough understanding of the human experience
| with mathematics to even begin pondering such questions
| unprompted, let alone answer them and grok the
| implications.
|
| The key is that the good questions however come from
| hard-won experience, not lazily questioning an AI.
| hn3er1q wrote:
| There are many similarities in your comment to how
| grandmasters discuss engines. I have a hunch the arc of AI in
| math will be very similar to the arc of engines in chess.
|
| https://www.wired.com/story/defeated-chess-champ-garry-
| kaspa...
| vouaobrasil wrote:
| I agree with that, in the sense that math will become more
| about who can use AI the fastest to generate the most
| theories, which sort of side-steps the whole point of math.
| hn3er1q wrote:
| As a chess aficionado and a former tournament player, who
| didn't get very far, I can see pros & cons. They helped
| me train and get significantly better than I would've
| gotten without them. On the other hand, so did the
| competition. :) The average level of the game is so much
| higher than when I was a kid (30+ years ago) and new ways
| of playing that were unthinkable before are possible now.
| On the other hand cheating (online anyway) is rampant and
| all the memorization required to begin to be competitive
| can be daunting, and that sucks.
| vouaobrasil wrote:
| Hey I play chess too. Not a very good player though. But
| to be honest, I enjoy playing with people who are not
| serious because I do think an overabundance of knowledge
| makes the game too mechanical. Just my personal
| experience, but I think the risk of cheaters who use
| programs and the overmechanization of chess is not worth
| becoming a better player. (And in fact, I think MOST
| people can gain satisfaction by improving just by
| studying books and playing. But I do think that a few who
| don't have access to opponents benefit from a chess-
| playing computer).
| _jayhack_ wrote:
| If you think the purpose of pure math is to provide
| employment and entertainment to mathematicians, this is a
| dark day.
|
| If you believe the purpose of pure math is to shed light on
| patterns in nature, pave the way for the sciences, etc., this
| is fantastic news.
| shadowerm wrote:
| We also seem to suffer these automation delusions right
| now.
|
| I could see how AI could assist me with learning pure math
| but the idea AI is going to do pure math for me is just
| absurd.
|
| Not only would I not know how to start, more importantly I
| have no interest in pure math. There will still be a huge
| time investment to get up to speed with doing anything with
| AI and pure math.
|
| You have to know what questions to ask. People with domain
| knowledge seem to really be selling themselves short. I am
| not going to randomly stumble on a pure math problem prompt
| when I have no idea what I am doing.
| vouaobrasil wrote:
| Well, 99% of pure math will never leave the domain of pure
| math so I'm really not sure what you are talking about.
| raincole wrote:
| > Now most will just use some AI.
|
| Do people with PhD in math really ask AI to explain math
| concepts to them?
| vouaobrasil wrote:
| They will, when it becomes good enough to prove tricky
| things.
| agentultra wrote:
| I think it will become apparent how bad they are at it.
| They're algorithms and not sentient beings. They do not think
| of themselves, their place in the world, and do not fathom
| the contents of the minds of others. They do not care what
| others think of them.
|
| Whatever they write only happens to contain some truth by
| virtue of the model and the training data. An algorithm
| doesn't know what truth is or why we value it. It's a
| bullshitter of the highest calibre.
|
| Then comes the question: will they write proofs that we will
| consider beautiful and elegant, that we will remember and
| pass down?
|
| Or will they generate what they've been asked to and nothing
| less? That would be utterly boring to read.
| jvvw wrote:
| I agree wholeheartedly about the beauty of doing mathematics.
| I will add though that the author of this article, Kevin
| Buzzard, doesn't need to do this for his career and from what
| I know of him is somebody who very much cares about
| mathematics and the future of the subject. The fact that a
| mathematician of that calibre is interested in this makes me
| more interested.
| nyrikki wrote:
| What LLMs can do is limited; they are superior to wetware in
| some tasks like finding and matching patterns in higher
| dimensional space, they are still fundamentally limited to a
| tiny class of problems outside of that pattern finding and
| matching.
|
| LLMs will be tools for some math needs and, even if we ever
| get quantum computers, will be limited in what they can do.
|
| LLMs, without pattern matching, can only do up to about integer
| division, and while they can calculate parity, they can't use
| it in their calculations.
|
| There are several groups sitting on what are known limitations
| of LLMs, waiting to take advantage of those who don't
| understand the fundamental limitations, simplicity bias etc...
|
| The hype will meet reality soon and we will figure out where
| they work and where they are problematic over the next few
| years.
|
| But even the most celebrated achievements like proof finding
| with Lean heavily depend on smart people producing hints that
| machines can use.
|
| Basically lots of the fundamental hints of the limits of
| computation still hold.
|
| Modal logic may be an accessible way to approach the limits
| statistical inference if you want to know one path yourself.
|
| A lot of what is in this article relates to some the known
| fundamental limitations.
|
| Remember that for all the amazing progress, one of the core
| founders of the perceptron, Pitts, drank himself to death in
| the 50s because it was shown that they were insufficient to
| accurately model biological neurons.
|
| Optimism is high, but reality will hit soon.
|
| So think of it as new tools that will be available to your
| child, not a replacement.
| ComplexSystems wrote:
| "LLMs, without pattern matching, can only do up to about
| integer division, and while they can calculate parity, they
| can't use it in their calculations." - what do you mean by
| this? Counting the number of 1's in a bitstring and
| determining if it's even or odd?
| nyrikki wrote:
| Yes, in this case PARITY is determining if the number of 1s
| in a binary input is odd or even
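|
| Concretely, it's just this (trivial Python, which is part of
| why it's such a telling probe of what these architectures can
| and can't compute):
|
|     def parity(bits: str) -> int:
|         """PARITY: 1 if the number of '1' bits is odd, else 0."""
|         return bits.count("1") % 2
|
|     print(parity("1011"))  # 1 (three ones)
|     print(parity("1001"))  # 0 (two ones)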
|
| It is an effect of the (complex to unpack) descriptive
| complexity class DLOGTIME-uniform TC0, which has AND, OR
| and MAJORITY gates.
|
| http://arxiv.org/abs/2409.13629
|
| The point being that the ability to use parity gates is
| different than being able to calculate it, which is where
| the union of the typically ram machine DLOGTIME with the
| circuit complexity of uniform TC0 comes into play.
|
| PARITY, MAJ, AND, and OR are all symmetric, and are in TC0,
| but PARITY is not in DLOGTIME-uniform TC0, which is first-
| order logic with Majority quantifiers.
|
| Another path: if you think about semantic properties and
| Rice's theorem, this may make sense especially as PAC
| learning even depth 2 nets is equivalent to the approximate
| SVP.
|
| PAC-learning even depth-2 threshold circuits is NP-hard.
|
| https://www.cs.utexas.edu/~klivans/crypto-hs.pdf
|
| For me, it helps to think about how ZFC was structured so we
| can keep the niceties of the law of the excluded middle, and
| how statistics pretty much depends on it for the central limit
| theorem and law of large numbers, IID, etc...
|
| But that path runs the risk of reliving the Brouwer-Hilbert
| controversy.
| TheRealPomax wrote:
| What part do you think is going to become obsolete? Because
| Math isn't about "working out the math", it's about finding the
| relations between seemingly unrelated things to bust open a
| problem. Short of AGI, there is no amount of neural net that's
| going to realize that a seemingly impossible probabilistic
| problem is actually equivalent to a projection of an easy to
| work with 4D geometry. "Doing the math" is what we have
| computers for, and the better they get, the easier the tedious
| parts of the job become, but "doing math" is still very much a
| human game.
| busyant wrote:
| > What part do you think is going to become obsolete?
|
| Thank you for the question.
|
| I guess what I'm saying is:
|
| Will LLMs (or whatever comes after them) be _so_ good and
| _so_ pervasive that we will simply be able to say, "Hey
| ChatGPT-9000, I'd like to see if the xyz conjecture is
| correct." And then ChatGPT-9000 just does the work without us
| contributing beyond asking a question.
|
| Or will the technology be limited/bound in some way such that
| we will still be able to use ChatGPT-9000 as a tool of our
| own intellectual augmentation and/or we could still
| contribute to research even without it.
|
| Hopefully, my comment clarifies my original post.
|
| Also, writing this stuff has helped me think about it more. I
| don't have any grand insight, but the more I write, the more
| I lean toward the outcome that these machines will allow us
| to augment our research.
| TheRealPomax wrote:
| As amazing as they may seem, they're _still_ just
| autocompletes, it's inherent to what an LLM is. So unless
| we come up with a completely new kind of technology, I don't
| see "test this conjecture for me" becoming more real than
| the computer assisted proof tooling we already have.
| hyhconito wrote:
| Let's put it this way, from another mathematician, and I'm sure
| I'll probably be shot for this one.
|
| Every LLM release moves half of the remaining way to the
| minimum viable goal of replacing a third-class undergrad. If
| your business or research initiative is fine with that level
| of competence, then you will find utility.
|
| The problem is that I don't know anyone who would find that
| useful. Nor does it fit within any existing working methodology
| we have. And on top of that, the verification of any output
| can take considerably longer than just doing it yourself in
| the first place, particularly where it goes off the rails,
| which it does all the time. I mean, it was only 3 months ago
| that I was arguing with a model over its not understanding
| place-value systems properly, something we teach 7-year-olds
| here.
|
| But the more abstract problem is at a higher level. If it
| doesn't become a general utility for people outside of
| mathematics (and the poor overall adoption and very public
| criticism of result quality make it very evident that it
| hasn't yet), then the funding will dry up. Models cost lots
| of money to train, and if you don't have customers it's not
| happening, and no one is going to lend you the money any
| more. And then it's moot.
| binarymax wrote:
| This is a great point that nobody will shoot you over :)
|
| But the main question is still: assuming you replace an
| undergrad with a model, who checks the work? If you have a
| good process around that already, and find utility as an
| augmented system, then you'll get value - but I still think
| it's better for the undergrad to keep the job and be at the
| wheel, doing things faster and better by leveraging a
| powerful tool.
| hyhconito wrote:
| Shot already for criticising the shiny thing (happened with
| crypto and blockchain already...)
|
| Well to be fair no one checks what the graduates do
| properly, even if we hired KPMG in. That is until we get
| sued. But at least we have someone to blame then. What we
| don't want is something for the graduate to blame. The buck
| stops at someone corporeal because that's what the
| customers want and the regulators require.
|
| That's the reality and it's not quite as shiny and happy as
| the tech industry loves to promote itself.
|
| My main point, put simply: no one gives a shit about this
| either way.
| meroes wrote:
| Well said. As someone with only a math undergrad and as a
| math RLHF'er, this speaks to my experience the most.
|
| That craving for understanding an elegant proof is nowhere
| to be found when verifying an LLM's proof.
|
| Like sure, you could put together a car by first building an
| airplane, disassembling all of it minus the two front seats,
| and having zero elegance and still get a car at the end. But
| if you do all that and don't provide novelty in results or
| useful techniques, there's no business.
|
| Hell, I can't even get a model to calculate compound interest
| for me (save for the technicality of prompt engineering a
| python function to do it). What do I expect?
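|
| (For the record, the kind of helper I end up prompting for is
| trivial to write by hand; a sketch with made-up numbers, not
| anything a model gave me:
|
|     def compound(principal: float, annual_rate: float,
|                  periods_per_year: int, years: float) -> float:
|         # A = P * (1 + r/n) ** (n * t)
|         return principal * (
|             1 + annual_rate / periods_per_year
|         ) ** (periods_per_year * years)
|
|     compound(1000, 0.05, 12, 10)  # -> ~1647.01
|
| The point is that the model stumbles on the arithmetic itself
| unless you make it write and run something like this.)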
| qnleigh wrote:
| I think there's a pretty good case to be made that LLMs
| paired with automated theorem provers will become a useful
| tool to working mathematicians in the next few years. Another
| thread here links to a lecture from a professor of
| mathematics who makes this point about halfway in, based
| solely on AlphaProof's current abilities
| (https://www.youtube.com/watch?v=vYCT7cw0ycw [54min]).
| Terence Tao, a well-known mathematician at UCLA, has been saying
| similar things for years. He's blogged about LLMs helping him
| learn new tools (like Lean) and occasionally helping with
| brainstorming.
|
| At this stage, the point they're making isn't 'OMG AGI!!!'
| but rather something like 'having an enthusiastic, often
| wrong undergrad assistant who's available 24/7 can be useful,
| if you use it carefully.'
| peterbonney wrote:
| If you looked at how the average accountant spent their time
| before the arrival of the digital spreadsheet, you might have
| predicted that automated calculation would make the profession
| obsolete. But it didn't.
|
| This time could be different, of course. But I'll need a lot
| more evidence before I start telling people to base their major
| life decisions on projected technological change.
|
| That's before we even consider that only a very slim minority
| of the people who study math (or physics or statistics or
| biology or literature or...) go on to work in the field of math
| (or physics or statistics or biology or literature or...). AI
| could completely take over math research and still have next to
| no impact on the value of the skills one acquires from studying
| math.
|
| Or if you want to be more fatalistic about it: if AI is going
| to put everyone out of work then it doesn't really matter what
| you do now to prepare for it. Might as well follow your
| interests in the meantime.
| blagie wrote:
| It's important to base life decisions on very real
| technological change. We don't know what the change will be,
| but it's coming. At the very least, that suggests more
| diverse skills.
|
| We're all usually (but not always) better off, with more
| productivity, eventually, but in the meantime, jobs do
| disappear. Robotics did not fully displace machinists and
| factory workers, but single-skilled people in Detroit did not
| do well. The loom, the steam engine... all of them displaced
| often highly-trained, often low-skilled artisans.
| rafaelmn wrote:
| If AI reaches this level, the socioeconomic impact is going
| to be so immense that choosing what subject you study will
| have no impact on your outcome - no matter what it is - so
| it's a pointless consideration.
| bawolff wrote:
| I doubt it.
|
| Most likely AI will be good at some things and not others, and
| mathematicians will just move to whatever AI isn't good at.
|
| Alternatively, if AI is able to do all math at a level above
| PhDs, then it's going to be a brave new world and basically the
| singularity. Everything will change so much that speculating
| about it will probably be useless.
| ykonstant wrote:
| > I even had a Nobel Laureate once tell me that my research was
| simply "dotting 'i's and crossing 't's".
|
| (.*<*.)
| ccppurcell wrote:
| The mathematicians of the future will still have to figure out
| the right questions, even if llms can give them the answers.
| And "prompt engineering" will require mathematical skills, at
| the very least.
|
| Evaluating the output of llms will also require mathematical
| skills.
|
| But I'd go further, if your son enjoys mathematics and has some
| ability in the area, it's wonderful for your inner life. Anyone
| who becomes sufficiently interested in anything will rediscover
| mathematics lurking at the bottom.
| jvvw wrote:
| Another PhD in maths here and I would say not to worry. It's
| the process of doing and understanding mathematics, and
| thinking mathematically that is ultimately important.
|
| There's never been the equivalent of the 'bench scientist' in
| mathematics and there aren't many direct careers in
| mathematics, or pure mathematics at least - so very few people
| ultimately become researchers. Instead, I think you take your
| way of thinking and apply it to whatever else you do (and it
| certainly doesn't do any harm to understand various
| mathematical concepts incredibly well).
| qnleigh wrote:
| Highly recommend this lecture by a working mathematician shared
| above (https://www.youtube.com/watch?v=vYCT7cw0ycw [54min]).
| It's very much grounded in history and experience, much more so
| than in speculation. I wrote a brief summary of some main
| points in that thread.
|
| But specifically to your worry about humans just dotting i's
| and crossing t's, he predicts that exactly the opposite will
| happen. At the end he emphasizes that the ultimate goal of
| mathematics is more about human understanding than proving
| theorems.
| busyant wrote:
| Thank you.
| jokoon wrote:
| I wish scientists who study the psychology and cognition of
| actual brains could approach these AI things and talk about
| them, and maybe make suggestions.
|
| I really really wish AI would make some breakthrough and be
| really useful, but I am so skeptical and negative about it.
| joe_the_user wrote:
| Unfortunately, the scientists who study actual brains have
| all sorts of interesting models but ultimately very little clue
| _how_ these actual brains work at the level of problem solving.
| I mean, there's all sorts of "this area is associated with that
| kind of process" and "here's evidence this area does this
| algorithm" stuff, but it's all at the level you'd imagine of
| steam engine engineers trying to understand a warp drive.
|
| The "open worm project" was an effort years ago to get computer
| scientists involved in trying to understand what "software" a
| very small actual brain could run. I believe progress here has
| been very slow, which gives an idea of how ignorant we still
| are about what much larger brains involve.
|
| https://en.wikipedia.org/wiki/OpenWorm
| bongodongobob wrote:
| If you can't find useful things for LLMs or AI at this point,
| you must just lack imagination.
| 0points wrote:
| > How much longer this will go on for nobody knows, but there are
| lots of people pouring lots of money into this game so it would
| be a fool who bets on progress slowing down any time soon.
|
| Money cannot solve the issues faced by the industry, which
| mainly revolve around a lack of training data.
|
| They already used the entirety of the internet, all available
| video, audio and books and they are now dealing with the fact
| that most content online is now generated by these models, thus
| making it useless as training data.
| charlieyu1 wrote:
| One thing I know is that there won't be machines entering IMO
| 2025. The concept of a "marker" does not exist at the IMO -
| scores are decided by negotiations between the team leaders of
| each country and the jury. It is important to get each team
| leader involved in grading their own country's students' work,
| for accountability as well as for acknowledging cultural
| differences. And those hundreds of people are not going to stay
| longer to grade AI work.
| witnesser2 wrote:
| I was not refuted sufficiently a couple of years ago. I claimed
| "training is open boundary" etc.
| witnesser2 wrote:
| Like as a few years ago, I just boringly add again "you need
| modeling" to close it.
| mangomountain wrote:
| In other news, we've discovered life (our bacteria) on Mars.
| Just joking.
| Syzygies wrote:
| "Can AI do math for us" is the canonical wrong question. People
| want self-driving cars so they can drink and watch TV. We should
| crave tools that enhance our abilities, as tools have done since
| prehistoric times.
|
| I'm a research mathematician. In the 1980's I'd ask everyone I
| knew a question, and flip through the hard bound library volumes
| of Mathematical Reviews, hoping to recognize something. If I was
| lucky, I'd get a hit in three weeks.
|
| Internet search has shortened this turnaround. One instead needs
| to guess what someone else might call an idea. "Broken circuits?"
| Score! Still, time consuming.
|
| I went all in on ChatGPT after hearing that Terry Tao had learned
| the Lean 4 proof assistant in a matter of weeks, relying heavily
| on AI advice. It's clumsy, but a very fast way to get
| suggestions.
|
| Now, one can hold involved conversations with ChatGPT or Claude,
| exploring mathematical ideas. AI is often wrong, never knows when
| it's wrong, but people are like this too. Have you read how
| insurance incident rates for self-driving taxis are well below
| human incident rates? Talking to fellow mathematicians can be
| frustrating, and so is talking with AI, but AI conversations go
| faster and can take place in the middle of the night.
|
| I don't want AI to prove theorems for me, those theorems will be
| as boring as most of the dreck published by humans. I want AI to
| inspire bursts of creativity in humans.
| ninetyninenine wrote:
| Your optimism should be tempered by the downside of progress:
| AI in the near future may not only inspire creativity in
| humans, it may replace human creativity altogether.
|
| Why would I need to hire an artist for my movie/video
| game/advertisement when AI can replicate all the creativity I
| need?
| wnc3141 wrote:
| There is research on AI limiting creative output in
| competitive arenas. Essentially it breaks expectancy and
| therefore deteriorates iteration.
|
| https://direct.mit.edu/rest/article-
| abstract/102/3/583/96779...
| immibis wrote:
| This was about mathematics.
| didibus wrote:
| I think I'm missing your point? You still want to enjoy doing
| math yourself? Is that what you are saying? So you equate "Can
| AI do math in my place?" with "Can AI drink and watch TV in my
| place?"
| bubble12345 wrote:
| AI will not do math for us, but maybe eventually it will lead
| to another mainstream tool for mathematicians. Along with R,
| Matlab, Sage, GAP, Magma, ...
|
| It would be interesting if in the future mathematicians are
| just as fluent in some (possibly AI-powered) proof verifying
| tool, as they are with LaTeX today.
| bufferoverflow wrote:
| AI can already do a bunch of math. So "AI will not do math
| for us" is just factually wrong.
| fooker wrote:
| Your idea of 'do math' is a bit different from this
| context.
|
| Here it means doing math research or, better, finding new math.
| vlovich123 wrote:
| Can AI solve "toy" math problems that computers have not
| been able to do? Yes. Can AI produce novel math research?
| No, it hasn't yet. So "AI will not do math for us" is
| only factually wrong if you take the weaker definition of
| "doing math for us". The stronger definition is not
| factually wrong yet.
|
| More problematic with that statement is that a timeline
| isn't specified. 1 year? Probably not. 10 years?
| Probably. 20 years? Very likely. 100 years? None of us
| here will be alive to be proven wrong but I'll venture
| that that's a certainty.
| khafra wrote:
| This is a pretty strong position to take in the comments
| of a post where a mathematician declared the 5 problems
| he'd seen to be PhD level, and speculated that the real
| difficulty with switching from numerical answers to
| proofs will be finding humans qualified to judge the AI's
| answers.
|
| I will agree that it's likely none of us here will be
| alive to be proven wrong, but that's in the 1 to 10 year
| range.
| elbear wrote:
| In a way, AI is part of the process, but it's a collaborative
| process. It doesn't do all the work.
| whimsicalism wrote:
| Ingredients for a top HN comment on AI include some nominal
| expert explaining why, actually, labor won't be replaced and
| it will be a collaborative process so you don't need to worry,
| sprinkled with a little bit of 'the status quo will stay
| still even though this tech only appeared in the last 2
| years'.
| FiberBundle wrote:
| It didn't appear in the last two years. We have had deep
| learning based autoregressive language models (like
| Word2Vec) for at least 10 years.
| fosk wrote:
| Early computer networks appeared in the 1960s and the
| public internet as we know it in the 1990s.
|
| We are still early in AI.
| whimsicalism wrote:
| totally, and i've been working with attention since at
| least 2017. but i'm colloquially referring to the real
| breakout and substantial scale up in resources being
| thrown at it
| amanda99 wrote:
| > AI is often wrong, never knows when it's wrong, but people
| are like this too.
|
| When talking with various models of ChatGPT about research
| math, my biggest gripe is that it's either confidently right
| (10% of my work) or confidently wrong (90%). A human researcher
| would be right 15% of the time, unsure 50% of the time, and
| give helpful ideas that are right/helpful (25%) or wrong/a red
| herring (10%). And only 5% of the time would a good researcher
| be confidently wrong in a way that ChatGPT is often.
|
| In other words, ChatGPT completely lacks the meta-layer of
| "having a feeling/knowing how confident it is", which is so
| useful in research.
| portaouflop wrote:
| A human researcher who is basically right 40%-95% of the
| time would probably be an Einstein-level genius.
|
| Just assume that the LLM is wrong and test its assumptions -
| math is one of the few disciplines where you can do that
| easily.
| rednerrus wrote:
| It's pretty easy to test when it makes coding mistakes as
| well. It's also really good at "Hey that didn't work,
| here's my error message."
| amanda99 wrote:
| I think you are imagining a different class of "questions".
|
| To clarify, I was doing research on applied math. My field
| is not analysis, but I needed to prove some bounds on
| certain messed up expressions (involving special functions,
| etc), and analyze an ODE that's not analytically solvable.
| I used the CoT (chain-of-thought) model a fair bit.
|
| I would ask ChatGPT for hints/ideas/direction in proving
| various bounds, asking it for theorems or similar results
| in literature. This is exactly the kind of thing where a
| researcher would go "yeah this looks like X" or "I think I
| saw something like this in (book/article name)", or just
| know a method; or alternatively say they have no clue.
| ChatGPT most often will confidently give me a "solution",
| being right 10% of the time (when there's a pretty standard
| way to do it that I didn't see/know).
|
| On the whole it was quite useful.
| eleveriven wrote:
| Do you think there's potential for AI to develop a kind of
| probabilistic reasoning?
| Sparkyte wrote:
| I think it is every sci-fi dreamer's wish to teach a robot
| to love.
|
| I don't think AI will think conventionally. It isn't
| thinking to begin with. It is weighing options. Those
| options permute, and that is why every response is
| different.
| halayli wrote:
| These numbers are just your perception. The way you ask the
| question will very much influence the output, for certain
| topics more than others. I get much better results when I
| share my certainty levels in my questions and say things like
| "if at all", "if any" etc.
| mrbungie wrote:
| Yeah, blame the users for "using it wrong" (phrase of the
| week I would say after the o3 discussions), and then sell
| the solution as almost-AGI.
|
| PS: I'm starting to see a lot of plausible deniability in
| some comments about LLMs' capabilities. When LLMs do great =>
| "cool, we are scaling AI". when LLMs do something wrong =>
| "user problem", "skill issues", "don't judge a fish for its
| ability to fly".
| vector_spaces wrote:
| I agree with this approach and use it myself, but these
| confidence markers can also skew output in undesirable
| ways. All of these heuristics are especially fragile when
| the subject matter touches the frontiers of what is known.
|
| In any case my best experiences with LLMs for pure math
| research have been for exploring the problem space and
| ideation -- queries along the line of "Here's a problem I'm
| working on ... . Do any other fields have a version of this
| problem, but framed differently?" or "Give me some totally
| left field methods, even if they are from different fields
| or unlikely to work. Assume I've exhausted all the
| 'obvious' approaches from field X"
| amanda99 wrote:
| > these numbers are just your perception.
|
| Of course they are, I hoped it was clear I was just sharing
| my experience trying to use it for research!
|
| I did in general word it as I would a question to a
| researcher, which includes an uncertainty in it being true.
| E.g. this is from a recent prompt: "is this true in
| general, if not, what are the conditions for this to be
| true?"
| heresie-dabord wrote:
| > Talking to fellow <humans> can be frustrating, and so is
| talking with AI, but AI conversations go faster and can take
| place in the middle of the night.
|
| I made a slight change to generalise your statement, I think
| you have summarised the actual marketing opportunity.
| eleveriven wrote:
| The analogy with self-driving cars is spot on
| pontus wrote:
| I agree. I think it comes down to the motivation behind why one
| does mathematics (or any other field for that matter). If it's
| a means to an end, then sure have the AI do the work and get
| rid of the researchers. However, that's not why everyone does
| math. For many it's more akin to why an artist paints. People
| still paint today even though a camera can produce much more
| realistic images. It was probably the case (I'm guessing!) that
| there was a significant drop in jobs for artists-for-hire, for
| whom painting was just a means to an end (e.g. creating a
| portrait), but the artists who were doing it for the sake of
| art survived and were presumably made better by the ability to
| see photos of other places they want to paint or art from other
| artists due to the invention of the camera.
| goalieca wrote:
| > People want self-driving cars so they can drink and watch TV.
| We should crave tools that enhance our abilities, as tools have
| done since prehistoric times.
|
| Improved tooling and techniques have given humans the free time
| and resources needed for arts, culture, philosophy, sports, and
| spending time to enjoy life! Fancy telecom technologies have
| allowed me to work from home and i love it :)
| rfurmani wrote:
| Absolutely agree. There are some interesting articles in a recent
| [AMS Bulletin](https://www.ams.org/journals/bull/2024-61-02/hom
| e.html?activ...) giving perspectives on this question: what
| does it do to math if there's a strong theorem prover out
| there, in what ways can AI help mathematicians, what is math
| exactly?
|
| I find that a lot of AI+Math work is focused on the end game
| where you have a clear problem to solve, rather than the early
| exploratory work where most of the time is spent. The challenge
| is in making the right connections and analogies, discovering
| hidden useful results, asking the right questions, translating
| between fields.
|
| I'm getting ready to launch [Sugaku](https://sugaku.net), where
| I'm trying to build tools for the above, based on processing
| the published math literature and training models on it. The
| kind of search of MR that you mentioned doing is exactly what a
| computer should do instead. I can create an account for you and
| would love some feedback.
| Onavo wrote:
| Considering that they have Terence Tao himself working on the
| problem, betting against it would be unwise.
| Sparkyte wrote:
| After playing with and using AI for almost two years now, I'd
| say it is not getting better from either a cost or a
| performance perspective.
|
| So the higher the cost, the better the performance. While
| models and hardware can be improved, the curve is still steep.
|
| The big question is: what are people using it for? Well, they
| are using lightweight, simplistic models to do targeted tasks,
| to do many smaller and easier-to-process tasks.
|
| Most of the news on AI is just there to promote a product to earn
| more cash.
| aomix wrote:
| No comment on the article; it's just always interesting to get
| hit with intense jargon from a field I know very little about.
|
| _I understood the statements of all five questions. I could do
| the third one relatively quickly (I had seen the trick before
| that the function mapping a natural n to alpha^n was p-adically
| continuous in n iff the p-adic valuation of alpha-1 was
| positive)_
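|
| (If I'm parsing the jargon right, the claim in symbols would be
| roughly: $n \mapsto \alpha^n$ is $p$-adically continuous in $n$
| $\iff v_p(\alpha - 1) > 0$. That's just my paraphrase of the
| quoted sentence, taken on faith.)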
| YeGoblynQueenne wrote:
| >> There were language models before ChatGPT, and on the whole
| they couldn't even write coherent sentences and paragraphs.
| ChatGPT was really the first public model which was coherent.
|
| If that's referring to Large Language Models, meaning everything
| after the first GPT and BERT, then that's absolutely not right.
| The first LLM that demonstrated the ability to generate coherent,
| fluently grammatical English was GPT-2. That story about the
| unicorns - that was the first time a statistical language model
| was able to generate text that stayed on the subject over a long
| distance _and_ made (some) sense.
|
| GPT-2 was followed by GPT-3 and GPT-3.5, which turned the hype dial
| up to 11 and were certainly "public" at least if that means
| publicly available. They were coherent enough that many people
| predicted all sorts of fancy things, like the end of programming
| jobs and the end of journalist jobs and so on.
|
| So, weird statement that one and it kind of makes me wary of
| Gell-Mann amnesia while reading the article.
___________________________________________________________________
(page generated 2024-12-24 23:00 UTC)