[HN Gopher] Can AI do maths yet? Thoughts from a mathematician
___________________________________________________________________
Can AI do maths yet? Thoughts from a mathematician
Author : mathgenius
Score : 231 points
Date : 2024-12-23 10:50 UTC (12 hours ago)
(HTM) web link (xenaproject.wordpress.com)
(TXT) w3m dump (xenaproject.wordpress.com)
| noFaceDiscoG668 wrote:
| "once" the training data can do it, LLMs will be able to do it.
| and AI will be able to do math once it comes to check out the
| lights of our day and night. until then it'll probably wonder
| continuously and contiguously: "wtf! permanence! why?! how?! by
| my guts, it actually fucking works! why?! how?!"
| tossandthrow wrote:
| I do think it is time to start questioning whether the utility
| of AI can be reduced solely to the quality of the training
| data.
|
| This might be a dogma that needs to die.
| noFaceDiscoG668 wrote:
| I tried. I don't have the time to formulate and scrutinise
| adequate arguments, though.
|
| Do you? Anything anywhere you could point me to?
|
| The algorithms live entirely off the training data. They
| consistently fail to "abduct" (infer) beyond any
| information specific to the language in/of the training.
| jstanley wrote:
| The best way to predict the next word is to accurately
| model the underlying system that is being described.
| tossandthrow wrote:
| It is a gradual thing. Presumably the models are inferring
| things at runtime that were not a part of their training
| data.
|
| Anyhow, philosophically speaking you are also only exposed
| to what your senses pick up, but presumably you are able to
| infer things?
|
| As written: this is a dogma that stems from a limited
| understanding of what algorithmic processes are and the
| insistence that emergence can not happen from algorithmic
| systems.
| croes wrote:
| If not, bad training data shouldn't be a problem
| kergonath wrote:
| There can be more than one problem. The history of
| computing (or even just the history of AI) is full of
| things that worked better and better right until they hit a
| wall. We get diminishing returns adding more and more
| training data. It's really not hard to imagine a series of
| breakthroughs bringing us way ahead of LLMs.
| Flenkno wrote:
| AWS announced, 2 or 3 weeks ago, a way of formulating rules in
| a formal language.
|
| AI doesn't need to learn everything; our LLM models already
| contain EVERYTHING, including ways of finding a solution
| step by step.
|
| Which means you can tell an LLM to translate whatever you
| want into a logical language and use an external logic
| verifier. The only thing an LLM or AI needs to 'understand' at
| this point is how to make sure that the statistical quality of
| the translation from one side to the other is high enough.
|
| Your brain doesn't just do logic out of the box either: you
| conclude things and then formulate them.
|
| And plenty of companies work on this. It's the same with
| programming: if you are able to write code and execute it, you
| iterate until the compiler errors are gone. Now your LLM can
| write valid code out of the box. Let the LLM write unit tests,
| and now it can verify itself.
|
| Claude, for example, offers out of the box to write a
| validation script. You can then give Claude back the output of
| the script it suggested.
|
| Don't underestimate LLMs.
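|
| The general shape of that translate-and-verify loop, as a
| minimal sketch (Z3 is my stand-in for the external verifier;
| the AWS service isn't named here and presumably differs):
|
|     from z3 import Solver, Int, sat  # pip install z3-solver
|
|     # hypothetical "translation" an LLM might emit for:
|     # "is there an integer x with x > 2 and x*x = x + 6?"
|     x = Int("x")
|     s = Solver()
|     s.add(x > 2, x * x == x + 6)
|
|     if s.check() == sat:
|         print("verified:", s.model())  # x = 3
|     else:
|         print("no such x")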
| TeamDman wrote:
| Is this the AWS thing you referenced?
| https://aws.amazon.com/what-is/automated-reasoning/
| casenmgreen wrote:
| I may be wrong, but I think it a silly question. AI is basically
| auto-complete. It can do math to the extent you can find a
| solution via auto-complete based on an existing corpus of text.
| Bootvis wrote:
| You're underestimating the emergent behaviour of these LLMs.
| See for example what Terence Tao thinks about o1:
|
| https://mathstodon.xyz/@tao/113132502735585408
| WhyOhWhyQ wrote:
| I'm always just so pleased that the most famous mathematician
| alive today is also an extremely kind human being. That has
| often not been the case.
| roflc0ptic wrote:
| Pretty sure this is out of date now
| noFaceDiscoG668 wrote:
| [flagged]
| kergonath wrote:
| Why would others provide proofs when you are yourself
| posting groundless opinions as facts in this very thread?
| mdp2021 wrote:
| > _AI is basically_
|
| Very many things, conventionally so labelled since the '50s.
|
| You are speaking of LLMs.
| casenmgreen wrote:
| Yes - I mean only to say "AI" as the term is commonly used
| today.
| esafak wrote:
| Humans can autocomplete sentences too because we understand
| what's going on. Prediction is a necessary criterion for
| intelligence, not an irrelevant one.
| aithrowawaycomm wrote:
| I am fairly optimistic about LLMs as a human math -> theorem-
| prover translator, and as a fan of Idris I am glad that the AI
| community is investing in Lean. As the author shows, the answer
| to "Can AI be useful for automated mathematical work?" is clearly
| "yes."
|
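| A toy example of the human-maths-to-prover translation I mean,
| in Lean 4 (the statement and names are mine; the proof simply
| defers to the library lemma):
|
|     -- "addition of natural numbers is commutative"
|     theorem my_add_comm (a b : Nat) : a + b = b + a :=
|       Nat.add_comm a b
|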
| But I am confident the answer to the question in the headline is
| "no, not for several decades." It's not just the underwhelming
| benchmark results discussed in the post, or the general concern
| about hard undergraduate math using different skillsets than
| ordinary research math. IMO the deeper problem still seems to be
| a basic gap where LLMs can seemingly do formal math at the level
| of a smart graduate student but fail at quantitative/geometric
| reasoning problems designed for fish. I suspect this holds for
| O3, based on one of the ARC problems it wasn't able to solve:
| https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_pr...
| (via https://www.interconnects.ai/p/openais-o3-the-2024-finale-
| of...) ANNs are simply not able to form abstractions; they can
| only imitate them via enormous amounts of data and compute. I
| would say there has been _zero_ progress on "common sense" math
| in computers since the invention of Lisp: we are still faking it
| with expert systems, even if LLM expert systems are easier to
| build at scale with raw data.
|
| It is the same old problem where an ANN can attain superhuman
| performance on level 1 of Breakout, but it has to be retrained
| for level 2. I am not convinced it makes sense to say AI can do
| math if AI doesn't understand what "four" means with the same
| depth as a rat, even if it can solve sophisticated modular
| arithmetic problems. In human terms, does it make sense to say a
| straightedge-and-compass AI understands Euclidean geometry if
| it's not capable of understanding the physical intuition behind
| Euclid's axioms? It makes more sense to say it's a brainless tool
| that helps with the tedium and drudgery of actually proving
| things in mathematics.
| asddubs wrote:
| It can take my math, point out a step I missed, and then show
| me the correct procedure, but still get the wrong result because
| it can't reliably multiply 2-digit numbers.
| fifilura wrote:
| Better than an average human then.
| actionfromafar wrote:
| Different than an average human.
| watt wrote:
| it's a "language" model (LLM), not a "math" model. when it is
| generating your answer, predicting and outputing a word after
| word it is _not_ multiplying your numbers internally.
| QuadmasterXLII wrote:
| To give a sense of scale: it's not that o3 failed to solve that
| red-blue rectangle problem once; o3 spent thousands of GPU
| hours putting out text about that problem, creating by my math
| about a million pages of text, and did not find the answer
| anywhere in those pages. For other problems it did find the
| answer around the million-page mark, as the score was still
| slowly creeping up at the ~$3,000-per-problem spend setting.
| josh-sematic wrote:
| If the trajectory of the past two years is any guide, things
| that can be done at great compute expense now will rapidly
| become possible for a fraction of the cost.
| asadotzler wrote:
| The trajectory is not a guide, unless you count the recent
| plateauing.
| aithrowawaycomm wrote:
| Just a comment: the example o3 got wrong was actually
| underspecified: https://anokas.substack.com/p/o3-and-arc-agi-
| the-unsolved-ta...
|
| Which is actually a problem I have with ARC (and IQ tests more
| generally): it is computationally cheaper to go from ARC
| transformation rule -> ARC problem than it is the other way
| around. But this means it's pretty easy to generate ARC
| problems with non-unique solutions.
| est wrote:
| At this stage I assume everything having a sequential pattern
| can and will be automated by LLM AIs.
| Someone wrote:
| I think that's provably incorrect for the current approach to
| LLMs. They all have a horizon over which they correlate tokens
| in the input stream.
|
| So, for any LLM, if you intersperse more than that number of
| 'X' tokens between each useful token, they won't be able to do
| anything resembling intelligence.
|
| The current LLMs are a bit like n-gram databases that do not
| use letters, but larger units.
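|
| A minimal sketch of that analogy (a toy bigram "model" that can
| only reproduce follower counts it has literally seen):
|
|     from collections import Counter, defaultdict
|
|     corpus = "the cat sat on the mat the cat ran".split()
|     counts = defaultdict(Counter)
|     for a, b in zip(corpus, corpus[1:]):
|         counts[a][b] += 1  # tally observed next-tokens
|
|     def predict(token: str) -> str:
|         # most frequent follower seen in training; nothing more
|         return counts[token].most_common(1)[0][0]
|
|     print(predict("the"))  # -> 'cat'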
| red75prime wrote:
| The follow-up question is "Does it require a paradigm shift
| to solve it?". And the answer could be "No". Episodic memory,
| hierarchical learnable tokenization, online learning or
| whatever works well on GPUs.
| beng-nl wrote:
| Isn't that a bit of an unfair sabotage?
|
| Naturally, humans couldn't do it either, though they could edit
| the input to remove the X's; but shouldn't we evaluate the
| ability (even intelligent ability) of LLMs on what they can
| generally do rather than amplify their weaknesses?
| Someone wrote:
| Why is that unfair in reply to the claim _"At this stage I
| assume everything having a sequencial pattern can and will
| be automated by LLM AIs."_?
|
| I am not claiming LLMs aren't or cannot be intelligent, not
| even that they cannot do magical things; I just rebuked a
| statement about the lack of limits of LLMs.
|
| > Naturally, humans couldn't do it, even though they could
| edit the input to remove the X's
|
| So, what are you claiming: that they cannot or that they
| can? I think most people can and many would. Confronted
| with a file containing millions of X's, many humans will
| wonder whether there's something else than X's in the file,
| do a 'replace all', discover the question hidden in that
| sea of X's, and answer it.
|
| There even are simple files where most humans would easily
| spot things without having to think of removing those X's.
| Consider a file
|
|     How X X X X X X
|     many X X X X X X
|     days X X X X X X
|     are X X X X X X
|     there X X X X X X
|     in X X X X X X
|     a X X X X X X
|     week? X X X X X X
|
| with a million X's on the end of each line. Spotting the
| question in that is easy for humans, but impossible for the
| current bunch of LLMs
| int_19h wrote:
| If you have a million Xs on the end of each line, when a
| human is looking at that file, he's not looking at the
| entirety of it, but only at the part that is actually
| visible on-screen, so the equivalent task for an LLM
| would be to feed it the same subset as input. In which
| case they can all answer this question just fine.
| palata wrote:
| At this stage I _hope_ everything that needs to be reliable
| won't be automated by LLM AIs.
| ned99 wrote:
| I think this is a silly question; you could find AIs doing very
| simple maths back in the 1960s-70s.
| mdp2021 wrote:
| It's just the worrisome linguistic confusion between AI and
| LLMs.
| jampekka wrote:
| I just spent a few days trying to figure out some linear algebra
| with the help of ChatGPT. It's very useful for finding conceptual
| information from literature (which for a not-professional-
| mathematician at least can be really hard to find and decipher).
| But in the actual math it constantly makes very silly errors.
| E.g. indexing a vector beyond its dimension, trying to do matrix
| decomposition for scalars and insisting on multiplying matrices
| with mismatching dimensions.
|
| o1 is a lot better at spotting its errors than 4o, but it too
| still makes a lot of really stupid mistakes. It seems to be quite
| far from consistently producing results itself without at least a
| somewhat clueful human doing the hand-holding.
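|
| A trivial illustration of the last kind of error (my
| reconstruction, not an actual transcript):
|
|     import numpy as np
|
|     A = np.ones((3, 2))
|     B = np.ones((3, 2))
|     try:
|         A @ B  # (3, 2) @ (3, 2): inner dimensions 2 != 3
|     except ValueError as e:
|         print("invalid:", e)
|     print(A @ B.T)  # (3, 2) @ (2, 3) is the valid product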
| glimshe wrote:
| Isn't Wolfram Alpha a better "ChatGPT of Math"?
| Filligree wrote:
| Wolfram Alpha is better at actually doing math, but far worse
| at explaining what it's doing, and why.
| dartos wrote:
| What's worse about it?
|
| It never tells you the wrong thing, at the very least.
| fn-mote wrote:
| Its understanding of problems was very bad last time I
| used it. Meaning it was difficult to communicate what you
| wanted it to do. Usually I try to write in the
| Mathematica language, but even that is not foolproof.
|
| Hopefully they have incorporated a more modern LLM since
| then, but it hasn't been that long.
| jampekka wrote:
| Wolfram Alpha's "smartness" is often Clippy-level
| enraging. E.g. it makes assumptions about symbols based on
| their names (e.g. a is assumed to be a constant, and
| derivatives are taken w.r.t. x). Even with Mathematica
| syntax it tends to make such assumptions and refuses to
| lift them even when explicitly directed. Quite often one
| has to change the variable symbols used to try to make
| Alpha do what's meant.
| jvanderbot wrote:
| When you give it a large math problem and the answer is
| "seven point one three five ...", and it shows a plot of
| the result vs some randomly selected domain, well, there
| could be more I'd like to know.
|
| You can unlock a full derivation of the solution for
| cases where you say "Solve" or "Simplify", but what I
| (and I suspect GP) might want is to know why a few of
| the key steps might work.
|
| It's a fantastic tool that helped get me through my
| (engineering) grad work, but ultimately the breakthrough
| inequalities that helped me write some of my best stuff
| came out of a book I bought in desperation, one that
| basically cataloged known linear algebra inequalities and
| simplifications.
|
| When I try that kind of thing with the best LLM I can use
| (admittedly as of a few months ago), the results can get
| incorrect pretty quickly.
| amelius wrote:
| I wish there were a way to tell ChatGPT where it has made a
| mistake, with a single mouse click.
| a3w wrote:
| Is the explanation a pro feature? At the very end it says
| "step by step? Pay here"
| jampekka wrote:
| Wolfram Alpha is mostly for "trivia" type problems. Or giving
| solutions to equations.
|
| I was figuring out some mode decomposition methods such as
| ESPRIT and Prony and how to potentially extend/customize
| them. Wolfram Alpha doesn't seem to have a clue about these.
| lupire wrote:
| No. Wolfram Alpha can't solve anything that isn't a function
| evaluation or equation. And it can't do modular arithmetic to
| save its unlife.
|
| WolframOne/Mathematica is better, but that requires the user
| (or ChatGPT!) to write complicated code, not natural language
| queries.
| GuB-42 wrote:
| Wolfram Alpha can solve equations well, but it is terrible at
| understanding natural language.
|
| For example I asked Wolfram Alpha "How heavy a rocket has to
| be to launch 5 tons to LEO with a specific impulse of 400s",
| which is a straightforward application of the Tsiolkovsky
| rocket equation. Wolfram Alpha gave me some nonsense about
| particle physics (result: 95 MeV/c^2), GPT-4o did it right
| (result: 53.45 tons).
|
| Wolfram Alpha knows about the Tsiolkovsky rocket equation and
| it knows about LEO (low Earth orbit), but I found no way to
| get a delta-v out of it; again, more nonsense. It tells me
| about Delta airlines and mentions satellites that it knows are
| not in LEO. The "natural language" part is a joke. It is more
| like an advanced calculator, and for that, it is great.
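|
| For reference, the calculation (a sketch: the ~9.3 km/s
| delta-v to LEO is my assumption, and it idealizes the 5 t
| payload as the rocket's entire final mass):
|
|     import math
|
|     isp = 400.0    # s, specific impulse
|     g0 = 9.81      # m/s^2, standard gravity
|     dv = 9300.0    # m/s, assumed delta-v to LEO incl. losses
|     payload = 5.0  # tons, treated as the final (dry) mass
|
|     # Tsiolkovsky: m0 / mf = exp(dv / (isp * g0))
|     mass_ratio = math.exp(dv / (isp * g0))
|     print(payload * mass_ratio)  # ~53.5 t, near the 53.45 quoted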
| bongodongobob wrote:
| You're using it wrong: you can use natural language in your
| equation, but AFAIK it's not supposed to be able to do what
| you're asking of it.
| CamperBob2 wrote:
| You know, "You're using it wrong" is usually meant to
| carry an ironic or sarcastic tone, right?
|
| It dates back to Steve Jobs blaming an iPhone 4 user for
| "holding it wrong" rather than acknowledging a flawed
| antenna design that was causing dropped calls. The
| closest Apple ever came to admitting that it was their
| problem was when they subsequently ran an employment ad
| to hire a new antenna engineering lead. Maybe it's time
| for Wolfram to hire a new language-model lead.
| bongodongobob wrote:
| It's not an LLM. You're simply asking too much of it. It
| doesn't work the way you want it to, sorry.
| spacemanspiff01 wrote:
| I wonder if these are tokenization issues? I really am curious
| about Meta's byte tokenization scheme...
| jampekka wrote:
| Probably mostly not. The errors tend to be
| logical/conceptual. E.g. mixing up scalars and matrices is
| unlikely to be from tokenization. Especially if using spaces
| between the variables and operators, as AFAIK GPTs don't form
| tokens over spaces (although tokens may start or end with
| them).
| lordnacho wrote:
| The only thing I've consistently had issues with while using AI
| is graphs. If I ask it to plot some simple function, it produces
| a really weird image that has nothing to do with the graph I
| want. It will be a weird swirl of lines and words, and it never
| corrects itself no matter what I say to it.
|
| Has anyone had any luck with this? It seems like the only thing
| that it just can't do.
| KeplerBoy wrote:
| You're doing it wrong. It can't produce proper graphs with
| its diffusion-style image generation.
|
| Ask it to produce graphs with Python and matplotlib. That
| will work.
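|
| E.g. the kind of code to ask it for (a minimal sketch):
|
|     import numpy as np
|     import matplotlib.pyplot as plt
|
|     x = np.linspace(-5, 5, 200)
|     plt.plot(x, np.sin(x))    # any simple function
|     plt.title("y = sin(x)")
|     plt.savefig("graph.png")  # or plt.show()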
| thomashop wrote:
| Ask it to plot the graph with python plotting utilities. Not
| using its image generator. I think you need a ChatGPT
| subscription though for it to be able to run python code.
| lupire wrote:
| You seem to get 2(?) free Python program runs per week(?)
| as part of the o1 preview.
|
| When you visit ChatGPT on the free account it automatically
| gives you the best model and then disables it after some
| amount of work and says to come back later or upgrade.
| amelius wrote:
| Just install Python locally, and copy paste the code.
| xienze wrote:
| Shouldn't ChatGPT be smart enough to know to do this
| automatically, based on context?
| CamperBob2 wrote:
| It was, for a while. I think this is an area where there
| may have been some regression. It can still write code to
| solve problems that are a poor fit for the language
| model, but you may need to ask it to do that explicitly.
| HDThoreaun wrote:
| The agentic reasoning models should be able to fix this if
| they have the ability to run code instead of handling each task
| themselves. "I need to make a graph" -> "LLMs have difficulty
| graphing novel functions" -> "Call python instead" is a line of
| reasoning I would expect after seeing what o1 has come up
| with on other problems.
|
| Giving AI the ability to execute code is the safety people's
| nightmare though; I wonder if we'll hear anything from them, as
| this is surely coming.
| amelius wrote:
| Don't most mathematical papers contain at least one such error?
| aiono wrote:
| Where is this data from?
| amelius wrote:
| It's a question; and to be fair to the AI, it should refer to
| papers before review.
| lproven wrote:
| Betteridge's Law applies.
| LittleTimothy wrote:
| It's fascinating that this has run into the exact same problem as
| the quantum research. I.e., in quantum research, to demonstrate
| any valuable forward progress you must compute something that is
| impossible to do with a traditional computer. If you can't do it
| with a traditional computer, it suddenly becomes difficult to
| verify correctness (i.e., you can't just check that it matches
| the traditional computer's answer).
|
| In the same way, ChatGPT scores 25% on this, and the question is
| "How close were those 25% to questions in the training set?". Or,
| to put it another way, we want to answer the question "Is ChatGPT
| getting better at applying its reasoning to out-of-set problems,
| or is it pulling more data into its training set?". Or "Is the
| test leaking into the training?".
|
| Maybe the whole question is academic and it doesn't matter, we
| solve the entire problem by pulling all human knowledge into the
| training set and that's a massive benefit. But maybe it implies a
| limit to how far it can push human knowledge forward.
| lazide wrote:
| If constrained by existing human knowledge to come up with an
| answer, won't it fundamentally be unable to push human
| knowledge forward?
| actionfromafar wrote:
| Then much of human research and development is also
| fundamentally impossible.
| AnerealDew wrote:
| Only if you think current "AI" is on the same level as
| human creativity and intelligence, which it clearly is not.
| actionfromafar wrote:
| I think current "AI" (i.e. LLMs) is unable to push human
| knowledge forward, but not because it's constrained by
| existing human knowledge. It's more like peeking into a
| very large magic-8 ball, new answers every time you shake
| it. Some useful.
| SJC_Hacker wrote:
| It may be able to push human knowledge forward to an
| extent.
|
| In the past, there was quite a bit of low hanging fruit
| such that you could have polymaths able to contribute to
| a wide variety of fields, such as Newton.
|
| But in the past 100 years or so, the problem is there is
| so much known, it is impossible for any single person to
| have deep knowledge of everything. E.g. it's rare to find
| a really good mathematician who also has a deep knowledge
| (beyond intro courses) about say, chemistry.
|
| Would a sufficiently powerful AI / ML model be able to
| come up with this synthesis across fields?
| lupire wrote:
| That's not a strong reason. Yes, that means ChatGPT isn't
| good at wholly independently pushing knowledge forward,
| but a good brainstormer that is right even 10% of the
| time is an incredible fount of knowledge.
| Havoc wrote:
| I don't think many expect AI to push knowledge forward? A
| thing that basically just regurgitates consensus historic
| knowledge seems badly suited to that
| calmoo wrote:
| But apparently these new frontier models can 'reason' - so
| with that logic, they should be able to generate new
| knowledge?
| tomjen3 wrote:
| O1 was able to find the math problem in a recently
| published paper, so yes.
| LittleTimothy wrote:
| Depends on your understanding of human knowledge I guess?
| People talk about the frontier of human knowledge and if your
| view of knowledge is like that of a unique human genius
| pushing forward the frontier then yes - it'd be stuck. But if
| you think of knowledge as more complex than that you could
| have areas that are kind of within our frontier of knowledge
| (that we could reasonably know, but don't actually know) -
| taking concepts that we already know in one field and
| applying them to some other field. Today the reason that
| doesn't happen is because genius A in physics doesn't know
| about the existence of genius B in mathematics (let alone
| understand their research), but if it's all imbibed by "The
| Model" then it's trivial to make that discovery.
| lazide wrote:
| I was referring specifically to the parent comments
| statements around current AI systems.
| wongarsu wrote:
| Reasoning is essentially the creation of new knowledge from
| existing knowledge. The better the model can reason the less
| constrained it is to existing knowledge.
|
| The challenge is how to figure out if a model is genuinely
| reasoning
| lupire wrote:
| Reasoning is a very minor (but essential) part of knowledge
| creation.
|
| Knowledge creation comes from collecting data from the real
| world, and cleaning it up somehow, and brainstorming
| creative models to explain it.
|
| NN/LLM's version of model building is frustrating because
| it is quite good, but not highly "explainable". Human
| models have higher explainability, while machine models
| have high predictive value on test examples due to an
| impenetrable mountain of algebra.
| dinosaurdynasty wrote:
| There are likely lots of connections that could be made that
| no individual has made because no individual has _all of
| existing human knowledge_ at their immediate disposal.
| eagerpace wrote:
| How much of this could be resolved if its training set were
| reduced? Conceivably, most of the training serves only to
| confuse the model when the only aim is to solve a math equation.
| newpavlov wrote:
| >in the quantum research to demonstrate any valuable forward
| progress you must compute something that is impossible to do
| with a traditional computer
|
| This is factually wrong. The most interesting problems
| motivating the quantum computing research are hard to solve,
| but easy to verify on classical computers. The factorization
| problem is the classic example.
|
| The problem is that existing quantum computers are not powerful
| enough to solve the interesting problems, so researchers have
| to invent semi-artificial problems to demonstrate "quantum
| advantage" to keep the funding flowing.
|
| There is a plethora of opportunities for LLMs to show their
| worth. For example, finding interesting links between different
| areas of research or being a proof assistant in a
| math/programming formal verification system. There is a lot of
| ongoing work in this area, but at the moment signal-to-noise
| ratio of such tools is too low for them to be practical.
| aleph_minus_one wrote:
| > This is factually wrong. The most interesting problems
| motivating the quantum computing research are hard to solve,
| but easy to verify on classical computers.
|
| Your parent did not talk about quantum _computers_. I guess he
| rather had predictions of novel quantum-field theories or
| theories of quantum gravity in the back of his mind.
| newpavlov wrote:
| Then his comment makes even less sense.
| bondarchuk wrote:
| No, it is factually right, at least if Scott Aaronson is to
| be believed:
|
| > _Having said that, the biggest caveat to the "10^25 years"
| result is one to which I fear Google drew insufficient
| attention. Namely, for the exact same reason why (as far as
| anyone knows) this quantum computation would take ~10^25
| years for a classical computer to simulate, it would also
| take ~10^25 years for a classical computer to directly verify
| the quantum computer's results!! (For example, by computing
| the "Linear Cross-Entropy" score of the outputs.) For this
| reason, all validation of Google's new supremacy experiment
| is indirect, based on extrapolations from smaller circuits,
| ones for which a classical computer can feasibly check the
| results. To be clear, I personally see no reason to doubt
| those extrapolations. But for anyone who wonders why I've
| been obsessing for years about the need to design efficiently
| verifiable near-term quantum supremacy experiments: well,
| this is why! We're now deeply into the unverifiable regime
| that I warned about._
|
| https://scottaaronson.blog/?p=8525
| newpavlov wrote:
| It's a property of the "semi-artificial" problem chosen by
| Google. If anything, it means that we should heavily
| discount this claim of "quantum advantage", especially in
| the light of inherent probabilistic nature of quantum
| computations.
|
| Note that the OP wrote "you MUST compute something that is
| impossible to do with a traditional computer". I
| demonstrated a simple counter-example to this statement:
| you CAN demonstrate forward progress by factorizing big
| numbers, but the problem is that no one can do it despite
| billions of investments.
| bondarchuk wrote:
| Apparently they can't, right now, as you admit. Anyway
| this is turning into a stupid semantic argument, have a
| nice day.
| joshuaissac wrote:
| If they can't, then is it really quantum supremacy?
|
| They claimed it last time in 2019 with Sycamore, which
| could perform in 200 seconds a calculation that Google
| claimed would take a classical supercomputer 10,000
| years.
|
| That was debunked when a team of scientists replicated
| the same thing on an ordinary computer in 15 hours with a
| large number of GPUs. Scott Aaronson said that on a
| supercomputer, the same technique would have solved the
| problem in seconds.[1]
|
| So if they now come up with another problem which they
| say cannot even be verified by a classical computer and
| uses it to claim quantum advantage, then it is right to
| be suspicious of that claim.
|
| 1. https://www.science.org/content/article/ordinary-
| computers-c...
| noqc wrote:
| the unverifiable regime is a _great_ way to extract
| funding.
| derangedHorse wrote:
| > This is factually wrong.
|
| What's factually wrong about it? OP said "you must compute
| something that is impossible to do with a traditional
| computer" which is true, regardless of the output produced.
| Verifying an output is very different from verifying the
| proper execution of a program. The difference between testing
| a program and seeing its code.
|
| What is being computed is fundamentally different from what
| classical computers compute; therefore the methods of verifying
| proper adherence to instructions become increasingly
| complex.
| ajmurmann wrote:
| They left out the key part which was incorrect and the
| sentence right after "If you can't do it with a traditional
| computer, it suddenly becomes difficult to verify
| correctness"
|
| The point stands that for actually interesting problems,
| verifying correctness of the results is trivial. I don't
| know if "adherence to instructions" translates at all to
| quantum computing.
| 0xfffafaCrash wrote:
| I agree that "is the test dataset leaking into the training
| dataset" is an issue when interpreting LLM capabilities in
| novel contexts, but I'm not sure I follow what you mean on the
| quantum computing front.
|
| My understanding is that many problems have solutions that are
| easier to verify than to solve using classical computing. e.g.
| prime factorization
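|
| (Toy numbers to illustrate the asymmetry: checking a claimed
| factorization is one multiplication, while finding it is
| believed to take superpolynomial time classically.)
|
|     n = 2021
|     p, q = 43, 47      # the claimed solution
|     assert p * q == n  # verifying it is trivial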
| LittleTimothy wrote:
| Oh it's a totally different issue on the quantum side that
| leads to the same issue with difficulty verifying. There, the
| algorithms that Google for example is using today, aren't
| like prime factorization, they're not easy to directly verify
| with traditional computers, so as far as I'm aware they kind
| of check the result for a suitably small run, and then do the
| performance metrics on a large run that they _hope_ gave a
| correct answer but aren't able to directly verify.
| intellix wrote:
| I haven't checked in a while, but last I checked, ChatGPT
| struggled on very basic things like: how many Fs are in this
| word? Not sure if they've managed to fix that, but since then I
| had lost hope in getting it to do any sort of math.
| sylware wrote:
| How to train an AI strapped to a formal solver.
| puttycat wrote:
| No: https://github.com/0xnurl/gpts-cant-count
| sebzim4500 wrote:
| I can't reliably multiply four digit numbers in my head either,
| what's your point?
| reshlo wrote:
| Nobody said you have to do it in your head.
| sebzim4500 wrote:
| That's equivalent to what we are asking the model to
| do. If you give the model a calculator it will get 100%. If
| you give it a pen and paper (e.g. let it show its working),
| then it will get near 100%.
| reshlo wrote:
| Citation needed.
| rishicomplex wrote:
| Who is the author?
| williamstein wrote:
| Kevin Buzzard
| nebulous1 wrote:
| There was a little more information in that reddit thread. Of the
| three difficulty tiers, 25% are T1 (easiest) and 50% are T2. Of
| the five public problems that the author looked at, two were T1
| and two were T2. Glazer on reddit described T1 as
| "IMO/undergraduate problems", but the article author says that
| they don't consider them to be undergraduate problems. So the LLM
| is _already_ doing what the author says they would be surprised
| about.
|
| Also Glazer seemed to regret calling T1 "IMO/undergraduate", and
| not only because of the disparity between IMO and typical
| undergraduate. He said that "We bump problems down a tier if we
| feel the difficulty comes too heavily from applying a major
| result, even in an advanced field, as a black box, since that
| makes a problem vulnerable to naive attacks from models"
|
| Also, all of the problems shown to Tao were T3.
| riku_iki wrote:
| > So the LLM is already doing what the author says they would
| be surprised about.
|
| that's if you unconditionally believe the result without any
| proofreading, confirmation, or reproducibility, and with barely
| any details (we are given only one slide).
| joe_the_user wrote:
| The reddit thread is ... interesting (direct link[1]). It seems
| to be a debate among mathematicians, some of whom do have access
| to the secret set. But they're debating publicly, and so
| naturally avoid any concrete examples that would give the
| set away, winding up with fuzzy-fiddly language for the
| qualities of the problem tiers.
|
| The "reality" of keeping this stuff secret 'cause someone would
| train on it is itself bizarre and certainly shouldn't be above
| questioning.
|
| https://www.reddit.com/r/OpenAI/comments/1hiq4yv/comment/m30...
| obastani wrote:
| It's not about training directly on the test set, it's about
| people discussing questions in the test set online (e.g., in
| forums), and then this data is swept up into the training
| set. That's what makes test set contamination so difficult to
| avoid.
| joe_the_user wrote:
| Yes,
|
| That is the "reality" - that because companies can train
| their models on the whole Internet, companies will train
| their (base) models on the entire Internet.
|
| And in this situation, "having heard the problem" actually
| serves as a barrier to understanding these harder
| problems, since any variation of a known problem will receive
| a standard "half-assed guesstimate".
|
| And these companies "can't not" use these base models since
| they're resigned to the "bitter lesson" (better the "bitter
| lesson viewpoint" imo) that they need large scale
| heuristics for the start of their process and only then can
| they start symbolic/reasoning manipulations.
|
| But hold up! Why couldn't an organization freeze their
| training set and their problems and release both to the
| public? That would give us an idea where the research
| stands. Ah, the answer comes out, 'cause they don't own the
| training set and the result they want to train is a
| commercial product that needs every drop of data to be the
| best. As Yann LeCun has said, _this isn't research, this is
| product development_.
| zifpanachr23 wrote:
| Not having access to the dataset really makes the whole thing
| seem incredibly shady. Totally valid questions you are
| raising
| seafoamteal wrote:
| I don't have much to opine from an advanced maths perspective,
| but I'd like to point out a couple examples of where ChatGPT made
| basic errors in questions I asked it as an undergrad CS student.
|
| 1. I asked it to show me the derivation of a formula for the
| efficiency of Stop-and-Wait ARQ and it seemed to do it, but a day
| later, I realised that in one of the steps, it just made a term
| vanish to get to the next step. Obviously, I should have verified
| more carefully, but when I asked it to spot the mistake in that
| step, it did the same thing twice more with bs explanations of
| how the term is absorbed.
|
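| (For reference, the standard result the derivation should land
| on, with T_t the frame transmission time and T_p the one-way
| propagation delay:
|
|     U = T_t / (T_t + 2*T_p) = 1 / (1 + 2a),  where a = T_p / T_t
|
| so a silently vanishing term is easy to catch against this.)
|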
| 2. I asked it to provide me syllogisms that I could practice
| proving. An overwhelming number of the syllogisms it gave me were
| inconsistent and did not hold. This surprised me more because
| syllogisms are about the most structured arguments you can find,
| having been formalized centuries ago and discussed extensively
| since then. In this case, asking it to walk step-by-step actually
| fixed the issue.
|
| Both of these were done on the free plan of ChatGPT, but I can't
| remember if it was 4o or 4.
| voiper1 wrote:
| The first question is always: which model? Which fortunately
| you at least addressed: >free plan of ChatGPT, but I can't
| remember if it was 4o or 4.
|
| Since chatgpt-4o, there has been o1-preview, and o1 (full) is
| out. They just announced o3 got 25% on FrontierMath, which is
| what this article is a reaction to. So, any tests on 4o are at
| least two (or three) AI releases behind the new capabilities.
| Xcelerate wrote:
| So here's what I'm perplexed about. There are statements in
| Presburger arithmetic that take time doubly exponential (or
| worse) in the size of the statement to reach via _any path_ of
| the formal system whatsoever. These are arithmetic truths about
| the natural numbers. Can these statements be reached faster in
| ZFC? Possibly--it's well-known that there exist shorter proofs
| of true statements in more powerful consistent systems.
|
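| (For reference, the bound I have in mind is the Fischer-Rabin
| result: there is a constant c > 0 such that, for statements of
| length n, worst-case decision time, and hence shortest-proof
| length in any reasonable proof system, grows at least like
|
|     2^(2^(c*n))
|
| which is where "doubly exponential (or worse)" comes from.)
|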
| But the problem then is that one can suppose there are also true
| short statements in ZFC which likewise require doubly exponential
| time to reach via any path. Presburger Arithmetic is decidable
| whereas ZFC is not, so these statements would require the
| additional axioms of ZFC for shorter proofs, but I think it's
| safe to assume such statements exist.
|
| Now let's suppose an AI model can resolve the truth of these
| short statements quickly. That means one of three things:
|
| 1) The AI model can discover doubly exponential length proof
| paths within the framework of ZFC.
|
| 2) There are certain short statements in the formal language of
| ZFC that the AI model cannot discover the truth of.
|
| 3) The AI model operates outside of ZFC to find the truth of
| statements in the framework of some other, potentially unknown
| formal system (and for arithmetical statements, the system must
| necessarily be sound).
|
| How likely are each of these outcomes?
|
| 1) is not possible within any coherent, human-scale timeframe.
|
| 2) IMO is the most likely outcome, but then this means there are
| some _really_ interesting things in mathematics that AI cannot
| discover. Perhaps the same set of things that humans find
| interesting. Once we have exhausted the theorems with short
| proofs in ZFC, there will still be an infinite number of short
| and interesting statements that we cannot resolve.
|
| 3) This would be the most bizarre outcome of all. If AI operates
| in a consistent way outside the framework of ZFC, then that would
| be equivalent to solving the halting problem for certain
| (infinite) sets of Turing machine configurations that ZFC cannot
| solve. That in itself isn't too strange (e.g., it might
| turn out that ZFC lacks an axiom necessary to prove something as
| simple as the Collatz conjecture), but what would be strange is
| that it could find these new formal systems _efficiently_. In
| other words, it would have discovered an algorithmic way to
| procure new axioms that lead to efficient proofs of true
| arithmetic statements. One could also view that as an efficient
| algorithm for computing BB(n), which obviously we think isn 't
| possible. See Levin's papers on the feasibility of extending PA
| in a way that leads to quickly discovering more of the halting
| sequence.
| aleph_minus_one wrote:
| > There are statements in Presburger arithmetic that take time
| doubly exponential (or worse) in the size of the statement to
| reach via any path of the formal system whatsoever.
|
| This is a correct statement about the _worst_ case runtime.
| What is interesting for practical applications is whether such
| statements are among those that you are practically interested
| in.
| Xcelerate wrote:
| I would certainly think so. The statements mathematicians
| seem to be interested in tend to be at a "higher level" than
| simple but true statements like 2+3=5. And they necessarily
| have a short description in the formal language of ZFC,
| otherwise we couldn't write them down (e.g., Fermat's last
| theorem).
|
| If the truth of these higher level statements instantly
| unlocks many other truths, then it makes sense to think of
| them in the same way that knowing BB(5) allows one to
| instantly classify any Turing machine configuration on the
| computation graph of all n <= 5 state Turing machines (on
| empty tape input) as halting/non-halting.
| wbl wrote:
| 2 is definitely true. 3 is much more interesting and likely
| true but even saying it takes us into deep philosophical
| waters.
|
| If every true theorem had a proof of computationally bounded
| length, the halting problem would be solvable. So the AI can't
| find some of those proofs.
|
| The reason I say 3 is deep is that ultimately our foundational
| reasons to assume ZFC+the bits we need for logic come from
| philosohical groundings and not everyone accepts the same ones.
| Ultrafinitists and large cardinal theorists are both kinds of
| people I've met.
| Xcelerate wrote:
| My understanding is that no model-dependent theorem of ZFC or
| its extensions (e.g., ZFC+CH, ZFC+!CH) provides any insight
| into the behavior of Turing machines. If our goal is to
| invent an algorithm that finds better algorithms, then the
| philosophical angle is irrelevant. For computational
| purposes, we would only care about new axioms independent of
| ZFC if they allow us to prove additional Turing machine
| configurations as non-halting.
| semolinapudding wrote:
| ZFC is way worse than Presburger arithmetic -- since it is
| undecidable, we know that the length of the minimal proof of a
| statement cannot be bounded by a computable function of the
| length of the statement.
|
| This has little to do with the usefulness of LLMs for research-
| level mathematics though. I do not think that anyone is hoping
| to get a decision procedure out of it, but rather something
| that would imitate human reasoning, which is heavily based on
| analogies ("we want to solve this problem, which shares some
| similarities with that other solved problem, can we apply the
| same proof strategy? if not, can we generalise the strategy so
| that it becomes applicable?").
| bambax wrote:
| > _As an academic mathematician who spent their entire life
| collaborating openly on research problems and sharing my ideas
| with other people, it frustrates me that I am not even able to
| give you a coherent description of some basic facts about this
| dataset, for example, its size. However there is a good reason
| for the secrecy. Language models train on large databases of
| knowledge, so the moment you make a database of maths questions
| public, the language models will train on it._
|
| Well, yes and no. This is only true because we are talking about
| closed models from closed companies like so-called "OpenAI".
|
| But if all models were truly open, then we could simply verify
| what they had been trained on, and make experiments with models
| that we could be sure had never seen the dataset.
|
| Decades ago Microsoft (in the words of Ballmer and Gates)
| famously accused open source of being a "cancer" because of the
| cascading nature of the GPL.
|
| But it's the opposite. In software, and in knowledge in general,
| the true disease is secrecy.
| ludwik wrote:
| > But if all models were truly open, then we could simply
| verify what they had been trained on
|
| How do you verify what a particular open model was trained on
| if you haven't trained it yourself? Typically, for open models,
| you only get the architecture and the trained weights. How can
| you reliably verify what the model was trained on from this?
|
| Even if they provide the training set (which is not typically
| the case), you still have to take their word for it--that's not
| really "verification."
| asadotzler wrote:
| The OP said "truly open" not "open model" or any of the other
| BS out there. If you are truly open you share the training
| corpora as well or at least a comprehensive description of
| what it is and where to get it.
| ludwik wrote:
| It seems like you skipped the second paragraph of my
| comment?
| bambax wrote:
| If they provide the training set it's reproducible and
| therefore verifiable.
|
| If not, it's not really "open", it's bs-open.
| 4ad wrote:
| > FrontierMath is a secret dataset of "hundreds" of hard maths
| questions, curated by Epoch AI, and announced last month.
|
| The database stopped being secret when it was fed to proprietary
| LLMs running in the cloud. If anyone thinks that OpenAI has not
| trained and tuned o3 on the "secret" problems people fed to
| GPT-4o, I have a bridge to sell you.
| fn-mote wrote:
| This level of conspiracy thinking requires evidence to be
| useful.
|
| Edit: I do see from your profile that you are a real person
| though, so I say this with more respect.
| dns_snek wrote:
| What evidence do we need that AI companies are exploiting
| every bit of information they can use to get ahead in the
| benchmarks to generate more hype? Ignoring terms/agreements,
| violating copyright, and otherwise exploiting information for
| personal gain is the foundation of that entire industry for
| crying out loud.
| ashoeafoot wrote:
| AI has an interior world model, so it can do math if a chain of
| proof can walk without uncertainty from room to room. The
| problem is its inability to reflect on its own uncertainty and
| then override that uncertainty, should a new room-entrance
| method be self-similar to a previous entrance.
| voidhorse wrote:
| Eventually we may produce a collection of problems exhaustive
| enough that these tools can solve almost any problem that isn't
| novel in practice, but I doubt that they will ever become general
| problem solvers capable of what we consider to be reasoning in
| humans.
|
| Historically, the claim that neural nets were actual models of
| the human brain and human thinking was always epistemically
| dubious. It still is. Even as the _practical_ problems of
| producing better and better algorithms, architectures, and output
| have been solved, there is no reason to believe a connection
| between the mechanical model and what happens in organisms has
| been established. The most important point, in my view, is that
| all of the representation and interpretation still has to happen
| outside the computational units. Without human interpreters, none
| of the AI outputs have any meaning. Unless you believe in
| determinism and an overseeing god, the story for human beings is
| much different. AI will not be capable of reason until, like
| humans, it can develop socio-rational collectivities of meaning
| that are _independent_ of the human being.
|
| Researchers seemed to have a decent grasp on this in the 90s, but
| today, everyone seems all too ready to make the same ridiculous
| leaps as the original creators of neural nets. They did not show,
| as they claimed, that thinking is reducible to computation. All
| they showed was that a neural net can realize a _boolean
| function_ --which is not even logic, since, again, the entire
| semantic interpretive side of the logic is ignored.
| nmca wrote:
| Can you define what you mean by novel here?
| red75prime wrote:
| > there is no reason to believe a connection between the
| mechanical model and what happens in organisms has been
| established
|
| The universal approximation theorem. And that's basically it.
| The rest is empirical.
|
| No matter which physical processes happen inside the human
| brain, a sufficiently large neural network can approximate
| them. Barring unknowns like super-Turing computational
| processes in the brain.
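|
| (For reference, the statement being invoked, in its
| Cybenko/Hornik form: for any continuous f on a compact set K,
| suitable non-polynomial activation σ, and ε > 0, there is a
| one-hidden-layer network
|
|     g(x) = Σ_{i=1..N} a_i σ(w_i·x + b_i)
|
| with sup_{x in K} |f(x) − g(x)| < ε.)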
| lupire wrote:
| That's not useful by itself, because "anything can model
| anything else" doesn't put any upper bound on emulation cost,
| which for one small task could be larger than the total
| energy available in the entire Universe.
| pixl97 wrote:
| I mean, that is why they mention super-Turing processes
| like quantum-based computing.
| dinosaurdynasty wrote:
| Quantum computing actually isn't super-Turing, it "just"
| computes some things faster. (Strictly speaking it's
| somewhere between a standard Turing machine and a
| nondeterministic Turing machine in speed, and the first
| can emulate the second.)
| red75prime wrote:
| Either the brain violates the physical Church-Turing thesis
| or it doesn't.
|
| If it does, well, it will take more time to incorporate
| those physical mechanisms into computers to get them on par
| with the brain.
|
| I leave the possibility that it's "magic"[1] aside. It's
| just impossible to predict, because it will violate
| everything we know about our physical world.
|
| [1] One example of "magic": we live in a simulation and the
| brain is not fully simulated by the physics engine, but
| creators of the simulation for some reason gave it access
| to computational resources that are impossible to harness
| using the standard physics of the simulated world. Another
| example: interactionistic soul.
| exprofmaddy wrote:
| The universal approximation theorem is set in a precise
| mathematical context; I encourage you to limit its
| applicability to that context despite the marketing label
| "universal" (which it isn't). Consider your concession about
| empiricism. There's no empirical way to prove (i.e. there's
| no experiment that can demonstrate beyond doubt) that all
| brain or other organic processes are deterministic and can be
| represented completely as functions.
| red75prime wrote:
| Function is the most general way of describing relations.
| Non-deterministic processes can be represented as functions
| with a probability distribution codomain. Physics seems to
| require only continuous functions.
|
| Sorry, but there's not much evidence that can support human
| exceptionalism.
| exprofmaddy wrote:
| Some differential equations that model physics admit
| singularities and multiple solutions. Therefore,
| functions are not the most general way of describing
| relations. Functions are a subset of relations.
|
| Although "non-deterministic" and "stochastic" are often
| used interchangeably, they are not equivalent.
| Probability is applied analysis whose objects are
| distributions. Analysis is a form of deductive, i.e.
| mechanical, reasoning. Therefore, it's more accurate
| (philosophically) to identify mathematical probability
| with determinism. Probability is a model for our
| experience. That doesn't mean our experience is truly
| probabilistic.
|
| Humans aren't exceptional. Math modeling and reasoning
| are human activities.
| tananan wrote:
| > Unless you believe in determinism and an overseeing god
|
| Or perhaps, determinism and mechanistic materialism - which in
| STEM-adjacent circles has a relatively prevalent adherence.
|
| Worldviews which strip a human being of agency in the sense you
| invoke crop up quite a lot today in such spaces. If you start
| off adopting a view like this, you have a deflationary sword
| which can cut down most any notion that's not mechanistic in
| terms of mechanistic parts. "Meaning? Well that's just an
| emergent phenomenon of the influence of such and such causal
| factors in the unrolling of a deterministic physical system."
|
| Similar for reasoning, etc.
|
| Now obviously large swathes of people don't really subscribe to
| this - but it is prevalent and ties in well with utopian
| progress stories. If something is amenable to mechanistic
| dissection, possibly it's amenable to mechanistic control. And
| that's what our education is really good at teaching us. So
| such stories end up having intoxicating "hype" effects and
| drive fundraising, and so we get where we are.
|
| For one, I wish people were just excited about making computers
| do things they couldn't do before, without needing to dress it
| up as something more than it is. "This model can prove a set of
| theorems in this format with such and such limits and
| efficiency"
| exprofmaddy wrote:
| Agreed. If someone believes the world is purely mechanistic,
| then it follows that a sufficiently large computing machine
| can model the world---like Leibniz's Ratiocinator. The
| intoxication may stem from the potential for predictability
| and control.
|
| The irony is: why would someone want control if they don't
| have true choice? Unfortunately, such a question rarely
| pierces the intoxicated mind when this mind is preoccupied
| with pass the class, get an A, get a job, buy a house, raise
| funds, sell the product, win clients, gain status, eat right,
| exercise, check insta, watch the game, binge the show, post
| on Reddit, etc.
| Quekid5 wrote:
| > If someone believes the world is purely mechanistic, then
| it follows that a sufficiently large computing machine can
| model the world
|
| Is this controversial in some way? The problem is that to
| simulate a universe you need a bigger universe -- which
| doesn't exist (or is certainly out of reach due to
| information theoretical limits)
|
| > ---like Leibniz's Ratiocinator. The intoxication may stem
| from the potential for predictability and control.
|
| I really don't understand the 'control' angle here. It
| seems pretty obvious that even in a purely mechanistic view
| of the universe, information theory forbids using the
| universe to simulate itself. Limited simulations, sure...
| but that leaves lots of gaps wherein you lose determinism
| (and control, whatever that means).
| tananan wrote:
| > Is this controversial in some way?
|
| It's not "controversial", it's just not a given that the
| universe is to be thought a deterministic machine. Not to
| everyone, at least.
| HDThoreaun wrote:
| Choice is overrated. This gets to an issue I've long had
| with Nozick's experience machine. Not only would I happily
| spend my days in such a machine, I'm pretty sure most other
| people would too. Maybe they say they wouldn't, but if you
| let them try it out and then offered them the question
| again, I think they'd say yes. The real conclusion of the
| experience machine is that the unknown is scary.
| gmadsen wrote:
| I hear these arguments a lot from law and philosophy students,
| never from those trained in mathematics. It seems to me
| "literary" people will still be discussing these theoretical
| hypotheticals while those building the technology pass them by.
| voidhorse wrote:
| I straddle both worlds. Consider that using the lens of
| mathematical reasoning to understand everything is a bit like
| trying to use a single mathematical theory (eg that of
| groups) to comprehend mathematics as a whole. You will almost
| always benefit and enrich your own understanding by daring to
| incorporate outside perspectives.
|
| Consider also that even as digital technology and the
| ratio-mathematical understanding of the world has advanced, it
| is still rife with dynamics and problems that require a
| humanistic approach. In particular, a mathematical conception
| cannot resolve _teleological_ problems which require the
| establishment of consensus and the actual determination of
| what we, as a species, want the world to look like. Climate
| change and general economic imbalance are already evidence of
| the kind of disasters that mount when you limit yourself to a
| reductionistic, overly mathematical and technological
| understanding of life and existence. Being is not a solely
| technical problem.
| gmadsen wrote:
| I don't disagree, I just don't think it is done well, or at
| least as seriously as it used to be. In modern philosophy,
| there are many mathematically specious arguments that just
| make clear how large the mathematical gap has become, e.g.
| improper application of Godel's incompleteness theorems.
| Yet Godel was a philosopher himself, and he would disagree
| with their current hand-wavy usage.
|
| The 19th/20th century was a golden era of philosophy, with a
| coherent and rigorous mathematical lens to apply alongside
| other lenses: Russell, Turing, Godel, etc. However, this just
| doesn't exist anymore.
| exprofmaddy wrote:
| I'm with you. Interpreting a problem as a problem requires a
| human (1) to recognize the problem and (2) to convince other
| humans that it's a problem worth solving. Both involve value,
| and value has no computational or mechanistic description
| (other than "given" or "illusion"). Once humans have identified
| a problem, they might employ a tool to find the solution. The
| tool has no sense that the problem is important or even hard;
| such values are imposed by the tool's users.
|
| It's worth considering why "everyone seems all too ready to
| make ... leaps ..." "Neural", "intelligence", "learning", and
| others are metaphors that have performed very well as marketing
| slogans. Behind the marketing slogans are deep-pocketed,
| platformed corporate and government (i.e. socio-rational
| collective) interests. Educational institutions (another socio-
| rational collective) and their leaders have on the whole
| postured as trainers and preparers for the "real world" (i.e. a
| job), which means they accept, support, and promote the
| corporate narratives about techno-utopia. Which institutions
| are left to check the narratives? Who has time to ask questions
| given the need to learn all the technobabble (by paying
| hundreds of thousands for 120 university credits) to become a
| competitive job candidate?
|
| I've found there are many voices speaking against the hype---
| indeed, even (rightly) questioning the epistemic underpinnings
| of AI. But they're ignored and out-shouted by tech marketing,
| fundraising politicians, and engagement-driven media.
| alphan0n wrote:
| As far as ChatGPT goes, you may as well be asking: Can AI use a
| calculator?
|
| The answer is yes: it can utilize a stateful Python environment
| and solve complex mathematical equations with ease.
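|
| For instance, asked for the roots of a quadratic, it will
| typically write and run something like the following (sympy
| here is my illustration, not a claim about the exact code it
| generates):
|
|     from sympy import symbols, solve
|
|     x = symbols("x")
|     print(solve(x**2 - 5*x + 6, x))  # prints [2, 3]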
| lcnPylGDnU4H9OF wrote:
| There is a difference between correctly _stating_ that 2 + 2 =
| 4 within a set of logical rules and _proving_ that 2 + 2 = 4
| must be true given the rules.
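|
| A minimal sketch of the distinction in Lean 4 (my own toy
| example):
|
|     #eval 2 + 2                 -- computes the value: 4
|     example : 2 + 2 = 4 := rfl  -- a machine-checked proof
|
| The first line merely evaluates; the second is a proof term
| the kernel verifies by definitional reduction.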
| alphan0n wrote:
| I think you misunderstood: ChatGPT can utilize Python to
| solve a mathematical equation and provide a proof.
|
| https://chatgpt.com/share/676980cb-d77c-8011-b469-4853647f98.
| ..
|
| More advanced solutions:
|
| https://chatgpt.com/share/6769895d-7ef8-8011-8171-6e84f33103.
| ..
| cruffle_duffle wrote:
| It still has to know what to code in that environment. And
| based on my years of math as a wee little undergrad, the actual
| arithmetic was the least interesting part. LLM's are horrible
| at basic arithmetic, but they can use python for the
| calculator. But python wont help them write the correct
| equations or even solve for the right thing (wolfram alpha can
| do a bit of that though)
| alphan0n wrote:
| You'll have to show me what you mean.
|
| I've yet to encounter an equation that 4o couldn't answer in
| 1-2 prompts unless it timed out. Even then it can provide the
| solution in a Jupyter notebook that can be run locally.
| cruffle_duffle wrote:
| Never really pushed it. I have no reason to believe it
| wouldn't get most of that stuff correct. Math is very
| much like programming, and I'm sure it can output really
| good Python for its notebook to execute.
| upghost wrote:
| I didn't see anyone else ask this but.. isn't the FrontierMath
| dataset compromised now? At the very least OpenAI now knows the
| questions if not the answers. I would expect that the next
| iteration will "magically" get over 80% on the FrontierMath test.
| I imagine that experiment was pretty closely monitored.
| jvanderbot wrote:
| I figured their model was independently evaluated against the
| questions/answers. That's not to say it's not compromised by
| "Here's a bag of money" type methods, but I don't even think
| it'd be a reasonable test if they just handed over the dataset.
| upghost wrote:
| I'm sure it was independently evaluated, but I'm sure the
| folks running the test were not given an on-prem installation
| of ChatGPT to mess with. It was still done via API calls,
| presumably through the chat interface UI.
|
| That means the questions went over the fence to OpenAI.
|
| I'm quite certain they are aware of that, and it would be
| pretty foolish not to take advantage of at least knowing what
| the questions are.
| jvanderbot wrote:
| Now that you put it that way, it is laughably easy.
| optimalsolver wrote:
| This was my first thought when I saw the results:
|
| https://news.ycombinator.com/item?id=42473470
| upghost wrote:
| Insightful comment. What's extremely frustrating is all the
| energy poured into this conversation around benchmarks.
| There is a fundamental assumption of honesty and integrity
| in the benchmarking process, at least by some people. But
| when the dataset is compromised and generation N+1 has
| miraculous performance gains, how can we see this as
| anything other than a ploy to pump up valuations? Some
| people have millions of dollars at stake here, and they
| don't care about the naysayers in the peanut gallery like us.
| optimalsolver wrote:
| It's sadly inevitable that when billions in funding and
| industry hype are tied to performance on a handful of
| benchmarks, scores will somehow, magically, continue to go
| up.
|
| Needless to say, it doesn't bring us any closer to AGI.
|
| The only solution I see here is people crafting their own,
| private benchmarks that the big players don't care about
| enough to train on. That, at least, gives you a clearer
| view of the field.
| upghost wrote:
| Not sure why your comment was downvoted, but it certainly
| shows the pressure going against people who point out
| fundamental flaws. This is pushing us towards "AVI"
| rather than AGI -- "Artificially Valued Intelligence". The
| optimization target here is the market.
|
| I'm being completely serious. You are correct, despite
| the downvotes, that this could not be pushing us towards
| AGI because if the dataset is leaked you can't claim the
| G-- generalizability.
|
| The point of the benchmark is to lead us to believe that
| this is a substantial breakthrough. But a reasonable
| person would be forced to conclude that the results are
| misleading due to optimizing around the training data.
| sincerecook wrote:
| No it can't, and there's no such thing as AI. How is a thing that
| predicts the next-most-likely word going to do novel math? It
| can't even do existing math reliably because logical operations
| and statistical approximation are fundamentally different. It is
| fun watching grifters put lipstick on this thing and shop it
| around as a magic pig though.
| retrocryptid wrote:
| When did we decide that AI == LLM? Oh don't answer. I know, The
| VC world noticed CNNs and LLMs about 10 years ago and it's the
| only thing anyone's talked about ever since.
|
| Seems to me the answer to 'Can AI do maths yet?' depends on what
| you call AI and what you call maths. Our old departmental VAX
| running at a handful of megahertz could do some very clever
| symbol manipulation on binomials and, if you gave it a few
| seconds, it could even do something like theorem proving via
| proto-Prolog. Neither is anywhere close to the glorious AGI
| future we hope to sell to industry and government, but it seems
| worth considering how they're different, why they worked, and
| whether there's room for some hybrid approach. Do LLMs need to
| know how to do math if they know how to write Prolog or CoC
| statements that can do interesting things?
|
| I've heard people say they want to build software that emulates
| (simulates?) how humans do arithmetic, but ask a human to add
| anything bigger than two digit numbers and the first thing they
| do is reach for a calculator.
| ivan_ah wrote:
| Yesterday, I saw a thought-provoking talk about the future of
| "math jobs", assuming automated theorem proving becomes more
| prevalent in the future.
|
| [ (Re)imagining mathematics in a world of reasoning machines by
| Akshay Venkatesh]
|
| https://www.youtube.com/watch?v=vYCT7cw0ycw [54min]
|
| Abstract: In the coming decades, developments in automated
| reasoning will likely transform the way that research mathematics
| is conceptualized and carried out. I will discuss some ways we
| might think about this. The talk will not be about current or
| potential abilities of computers to do mathematics--rather I will
| look at topics such as the history of automation and mathematics,
| and related philosophical questions.
|
| See discussion at https://news.ycombinator.com/item?id=42465907
| swalsh wrote:
| Every profession seems to have a pessimistic view of AI as soon
| as it starts to make progress in their domain. Denial, Anger,
| Bargaining, Depression, and Acceptance. Artists seem to be in the
| depression state, many programmers are still in the denial phase.
| Pretty solid denial here from a mathematician. o3 was a proof of
| concept, like every other domain AI enters, it's going to keep
| getting better.
|
| Society is CLEARLY not ready for what AI's impact is going to be.
| We've been through change before, but never at this scale and
| speed. I think Musk/Vivek's DOGE thing is important; our
| government has gotten quite large and bureaucratic. But the clock
| has started on AI, and this is a social structural issue we've
| gotta figure out. Putting it off means we probably become
| subjects to a default set of rulers, if not the shoggoth itself.
| haolez wrote:
| I think it's a little of both. Maybe generative AI algorithms
| won't overcome their initial limitations. But maybe we don't
| need to overcome them to transform society in a very
| significant way.
| WanderPanda wrote:
| Or is it just white collar workers experiencing what blue
| collar workers have been experiencing for decades?
| esafak wrote:
| So will that make society shift to the left, in demand of
| stronger safety nets, or to the right, in search of a
| strongman to rescue them?
| mensetmanusman wrote:
| The reason this is so disruptive is that it will affect
| hundreds of fields simultaneously.
|
| Previously workers in a field disrupted by automation would
| retrain to a different part of the economy.
|
| If AI pans out to the point that there are mass layoffs in
| hundreds of sectors of the economy at once, then I'm not sure
| the process we have haphazardly set up now will work. People
| will have no idea where to go beyond manual labor (which will
| be difficult due to the obesity crisis, though maybe it will
| save lives in a weird way).
| hash872 wrote:
| If there are 'mass layoffs in hundreds of sectors of the
| economy at once', then the economy immediately goes into
| Great Depression 2.0 or worse. Consumer spending is two-
| thirds of the US economy; when everyone loses their jobs and
| stops having disposable income, that's literally what a
| depression is.
| mensetmanusman wrote:
| This will create a prisoner's dilemma for corporations,
| then: the government will have to step in to provide
| incentives for insanely profitable corporations to keep the
| proper number of people employed, or to limit the rate of
| layoffs.
| jebarker wrote:
| > I am dreading the inevitable onslaught in a year or two of
| language model "proofs" of the Riemann hypothesis which will just
| contain claims which are vague or inaccurate in the middle of 10
| pages of correct mathematics which the human will have to wade
| through to find the line which doesn't hold up.
|
| I wonder what the response of working mathematicians will be to
| this. If the proofs look credible, it might be too tempting to
| try and validate them, but if there's a deluge, that could be a
| huge time sink. Imagine if Wiles or Perelman had produced a
| thousand different proofs for their respective problems.
| bqmjjx0kac wrote:
| Maybe the coming onslaught of AI slop "proofs" will give a
| little bump to proof assistants like Coq. Of course, it would
| still take a human mathematician some time to verify theorem
| definitions.
| Hizonner wrote:
| Don't waste time looking at it unless a formal proof checker
| can verify it.
| yodsanklai wrote:
| I understand the appeal of having a machine helping us with maths
| and expanding the frontier of knowledge. They can assist
| researchers and make them more productive, just as they already
| make programmers more productive.
|
| But maths is also a fun and fulfilling activity. Very often, when
| we learn a math theory, it's because we want to understand and
| gain intuition on the concepts, or we want to solve a puzzle (for
| which we can already look up the solution). Maybe it's similar to
| chess. We didn't develop chess engines to replace human players
| and make them play together, but they helped us become better
| chess players and understand the game better.
|
| So the recent progress is impressive, but I still don't see how
| we'll use this tech practically, what impacts it can have, and
| in which fields.
| vouaobrasil wrote:
| My favourite moments of being a graduate student in math was
| showing my friends (and sometimes professors) proofs of
| propositions and theorems that we discussed together. To be the
| first to put together a coherent piece of reasoning that would
| convince them of the truth was immensely exciting. Those were
| great bonding moments amongst colleagues. The very fact that we
| needed each other to figure out the basics of the subject was
| part of what made the journey so great.
|
| Now, all of that will be done by AI.
|
| Reminds me of the time when I finally enabled invincibility in
| Goldeneye 007. Rather boring.
|
| I think we've stopped appreciating the human struggle and
| experience and have placed all the value on the end product,
| and that's why we're developing AI so much.
|
| Yeah, there is the possibility of working with an AI but at that
| point, what is the point? Seems rather pointless to me in an art
| like mathematics.
| busyant wrote:
| As someone who has an 18 yo son who wants to study math, this has
| me (and him) ... worried ... about becoming obsolete?
|
| But I'm wondering what other people think of this analogy.
|
| I used to be a bench scientist (molecular genetics).
|
| There were world class researchers who were more creative than I
| was. I even had a Nobel Laureate once tell me that my research
| was simply "dotting 'i's and crossing 't's".
|
| Nevertheless, I still moved the field forward in my own small
| ways. I still did respectable work.
|
| So, will these LLMs make us _completely_ obsolete? Or will there
| still be room for those of us who can dot the "i"?--if only for
| the fact that LLMs don't have infinite time/resources to solve
| "everything."
|
| I don't know. Maybe I'm whistling past the graveyard.
| deepsun wrote:
| By the way, don't trust Nobel laureates or even winners. E.g.
| Linus Pauling was talking absolute garbage, harmful and evil,
| after winning the Nobel.
| Radim wrote:
| > _don 't trust Nobel laureates or even winners_
|
| Nobel laureate and winner are the same thing.
|
| > _Linus Pauling was talking absolute garbage, harmful and
| evil, after winning the Nobel._
|
| Can you be more specific, what garbage? And which Nobel prize
| do you mean - Pauling got two, one for chemistry and one for
| peace.
| bongodongobob wrote:
| Eugenics and vitamin C as a cure all.
| lern_too_spel wrote:
| If Pauling's eugenics policies were bad, then the laws
| against incest that are currently on the books in many
| states (which are also eugenics policies that use the
| same mechanism) are also bad. There are different forms
| of eugenics policies, and Pauling's proposal to restrict
| the mating choices of people carrying certain recessive
| genes so their children don't suffer is ethically
| different from Hitler exterminating people with certain
| genes and also ethically different from other governments
| sterilizing people with certain genes. He later supported
| voluntary abortion with genetic testing, which is now
| standard practice in the US today, though no longer in a
| few states with ethically questionable laws restricting
| abortion. This again is ethically different from forced
| abortion.
|
| https://scarc.library.oregonstate.edu/coll/pauling/blood/
| nar...
| deepsun wrote:
| Thank you, my bad.
|
| I was referring to Linus's harmful and evil promotion of
| Vitamin C as the cure for everything and cancer. I don't
| think Linus was attaching that garbage to any particular
| Nobel prize. But people did say to their doctors: "Are you
| a Nobel winner, doctor?". Don't think they cared about a
| particular prize either.
| pfisherman wrote:
| I used to do bench top work too; and was blessed with "the
| golden hands" in that I could almost always get protocols
| working. To me this always felt more like intuition than
| deductive reasoning. And it made me a terrible TA. My advice to
| students in lab was always something along the lines of "just
| mess around with it, and see how it works." Not very helpful
| for the stressed and struggling student -_-
|
| Digression aside, my point is that I don't think we know
| exactly what makes or defines "the golden hands". And if that
| is the case, can we optimize for it?
|
| Another point is that scalable fine tuning only works for
| verifiable stuff. Think a priori knowledge. To me that seems to
| be at the opposite end of the spectrum from "mess with it and
| see what happens".
| busyant wrote:
| > blessed with "the golden hands" in that I could almost
| always get protocols working.
|
| Very funny. My friends and I never used the phrase "golden
| hands" but we used to say something similar: "so-and-so has
| 'great hands'".
|
| But it meant the same thing.
|
| I, myself, did not have great hands, but my comment was more
| about the intellectual process of conducting research.
|
| I guess my point was that:
|
| * I've already dealt with more talented researchers, but I
| still contributed meaningfully.
|
| * Hopefully, the "AI" will simply add another layer of
| talent, but the rest of us lesser mortals will still be able
| to contribute.
|
| But I don't know if I'm correct.
| vouaobrasil wrote:
| I was just thinking about this. I already posted a comment
| here, but I will say, as a mathematician (PhD in number
| theory), that for me AI significantly takes away the beauty of
| doing mathematics within a realm in which AI is used.
|
| The best part of math (again, just for me) was that it was a
| journey done by hand, with only the human intellect, through
| territory that computers didn't understand. The beauty of the
| subject was precisely that it was a journey of human intellect.
|
| As I said elsewhere, my friends used to ask me why something
| was true and it was fun to explain it to them, or ask them and
| have them explain it to me. Now most will just use some AI.
|
| Soulless, in my opinion. Pure mathematics should be about the
| art of the thing, not producing results on an assembly line
| like it will be with AI. Of course, the best mathematicians are
| going into this because it helps their current careers, not
| because it helps the future of the subject. Math done with AI
| will be a lot like Olympic running done with performance-
| enhancing drugs.
|
| Yes, we will get a few more results, faster. But the results
| will be entirely boring.
| zmgsabst wrote:
| Presumably people who get into math going forward will feel
| differently.
|
| For myself, chasing lemmas was always boring -- and there's
| little interest in doing the busywork of fleshing out a
| theory. For me, LLMs are a great way to do the fun parts
| (conceptual architecture) without the boring parts.
|
| And I expect we'll see much the same change as with physics:
| computers increase the complexity of the objects we study,
| which tend to be rather simple when done by hand -- e.g.,
| people don't investigate patterns in the diagrams of
| group(oids) because drawing million-element diagrams isn't
| tractable by hand. And you only notice the patterns in them
| when you see examples of the diagrams at scale.
| ndriscoll wrote:
| Even current people will feel differently. I don't bemoan
| the fact that Lean/Mathlib has `simp` and `linarith` to
| automate trivial computations. A "copilot for Lean" that
| can turn "by induction, X" or "evidently Y" into a formal
| proof sounds great.
|
| The trick is teaching the thing how high-powered a set of
| theorems to use, or how to factor out details or not,
| depending on the user's level of understanding. We'll have
| to find a pedagogical balance (e.g. you don't give
| `linarith` to someone practicing basic proofs), but I'm
| sure it will be a great tool to aid human understanding.
|
| A tool to help translate natural language to formal
| propositions/types also sounds great, and could help more
| people to use more formal methods, which could make for
| more robust software.
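|
| For a concrete taste of the automation being described (a
| small Mathlib sketch; the lemma statements are my own toy
| examples):
|
|     import Mathlib.Tactic
|
|     -- linarith closes linear arithmetic goals from hypotheses
|     example (a b : Q) (h : a < b) : a + 1 < b + 1 := by
|       linarith
|
|     -- simp discharges trivial computations via rewrite rules
|     example (n : N) : n + 0 = n := by simp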
| vouaobrasil wrote:
| Just a counterpoint, but I wonder how much you'll really
| understand if you can't even prove the whole thing
| yourself. Personally, I learn by proving but I guess
| everyone is different.
| hn3er1q wrote:
| There are many similarities in your comment to how
| grandmasters discuss engines. I have a hunch the arc of AI in
| math will be very similar to the arc of engines in chess.
|
| https://www.wired.com/story/defeated-chess-champ-garry-
| kaspa...
| vouaobrasil wrote:
| I agree with that, in the sense that math will become more
| about who can use AI the fastest to generate the most
| theories, which sort of side-steps the whole point of math.
| hn3er1q wrote:
| As a chess aficionado and a former tournament player, who
| didn't get very far, I can see pros & cons. They helped
| me train and get significantly better than I would've
| gotten without them. On the other hand, so did the
| competition. :) The average level of the game is so much
| higher than when I was a kid (30+ years ago) and new ways
| of playing that were unthinkable before are possible now.
| On the other hand cheating (online anyway) is rampant and
| all the memorization required to begin to be competitive
| can be daunting, and that sucks.
| vouaobrasil wrote:
| Hey I play chess too. Not a very good player though. But
| to be honest, I enjoy playing with people who are not
| serious because I do think an overabundance of knowledge
| makes the game too mechanical. Just my personal
| experience, but I think the risk of cheaters who use
| programs and the overmechanization of chess is not worth
| becoming a better player. (And in fact, I think MOST
| people can gain satisfaction by improving just by
| studying books and playing. But I do think that a few who
| don't have access to opponents benefit from a chess-
| playing computer).
| nyrikki wrote:
| What LLMs can do is limited. They are superior to wetware in
| some tasks, like finding and matching patterns in higher-
| dimensional space, but they are still fundamentally limited
| to a tiny class of problems outside of that pattern finding
| and matching.
|
| LLMs will be tools for some math needs, and even if we ever
| get quantum computers, they will be limited in what they can
| do.
|
| LLMs, without pattern matching, can only do up to about integer
| division, and while they can calculate parity, they can't use
| it in their calculations.
|
| There are several groups sitting on the known limitations of
| LLMs, waiting to take advantage of those who don't understand
| the fundamental limitations, simplicity bias, etc.
|
| The hype will meet reality soon and we will figure out where
| they work and where they are problematic over the next few
| years.
|
| But even the most celebrated achievements, like proof finding
| with Lean, depend heavily on smart people producing hints that
| machines can use.
|
| Basically, lots of the fundamental results on the limits of
| computation still hold.
|
| Modal logic may be an accessible way to approach the limits
| of statistical inference, if you want to explore one path
| yourself.
|
| A lot of what is in this article relates to some of the known
| fundamental limitations.
|
| Remember that for all the amazing progress, one of the core
| founders of the perceptron, Pitts, drank himself to death
| after it was shown that perceptrons were insufficient to
| accurately model biological neurons.
|
| Optimism is high, but reality will hit soon.
|
| So think of it as new tools that will be available to your
| child, not a replacement.
| ComplexSystems wrote:
| "LLMs, without pattern matching, can only do up to about
| integer division, and while they can calculate parity, they
| can't use it in their calculations." - what do you mean by
| this? Counting the number of 1's in a bitstring and
| determining if it's even or odd?
| nyrikki wrote:
| Yes, in this case PARITY is determining if the number of 1s
| in a binary input is odd or even
|
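| As a toy illustration (ordinary Python, of course -- this
| says nothing about any model's internals), PARITY is trivial
| to compute directly:
|
|     def parity(bits: str) -> int:
|         # 1 iff the number of 1s in the bitstring is odd
|         return bits.count("1") % 2
|
|     assert parity("1011") == 1
|     assert parity("1001") == 0
|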
| It is an effect of the complex-to-unpack descriptive
| complexity class DLOGTIME-uniform TC0, which has AND, OR,
| and MAJORITY gates.
|
| http://arxiv.org/abs/2409.13629
|
| The point being that the ability to use parity gates is
| different from being able to calculate parity, which is where
| the union of the typical RAM-machine class DLOGTIME with the
| circuit complexity class uniform TC0 comes into play.
|
| PARITY, MAJ, AND, and OR are all symmetric, and are in TC0,
| but PARITY is not in DLOGTIME-uniform TC0, which is first-
| order logic with Majority quantifiers.
|
| Another path: if you think about semantic properties and
| Rice's theorem, this may make sense, especially as PAC-
| learning even depth-2 nets is equivalent to the approximate
| SVP (shortest vector problem).
|
| PAC-learning even depth-2 threshold circuits is NP-hard.
|
| https://www.cs.utexas.edu/~klivans/crypto-hs.pdf
|
| For me, it helps to think about how ZFC was structured so we
| can keep the niceties of the law of the excluded middle, and
| how statistics pretty much depends on it for the central
| limit theorem, the law of large numbers, IID, etc.
|
| But that path runs the risk of reliving the Brouwer-Hilbert
| controversy.
| TheRealPomax wrote:
| What part do you think is going to become obsolete? Because
| Math isn't about "working out the math", it's about finding the
| relations between seemingly unrelated things to bust open a
| problem. Short of AGI, there is no amount of neural net that's
| going to realize that a seemingly impossible probabilistic
| problem is actually equivalent to a projection of an easy to
| work with 4D geometry. "Doing the math" is what we have
| computers for, and the better they get, the easier the tedious
| parts of the job become, but "doing math" is still very much a
| human game.
| busyant wrote:
| > What part do you think is going to become obsolete?
|
| Thank you for the question.
|
| I guess what I'm saying is:
|
| Will LLMs (or whatever comes after them) be _so_ good and
| _so_ pervasive that we will simply be able to say, "Hey
| ChatGPT-9000, I'd like to see if the xyz conjecture is
| correct." And then ChatGPT-9000 just does the work without us
| contributing beyond asking a question.
|
| Or will the technology be limited/bound in some way such that
| we will still be able to use ChatGPT-9000 as a tool of our
| own intellectual augmentation and/or we could still
| contribute to research even without it.
|
| Hopefully, my comment clarifies my original post.
|
| Also, writing this stuff has helped me think about it more. I
| don't have any grand insight, but the more I write, the more
| I lean toward the outcome that these machines will allow us
| to augment our research.
| hyhconito wrote:
| Let's put it this way, from another mathematician, and I'm sure
| I'll probably be shot for this one.
|
| Every LLM release moves half of the remaining way to the
| minimum viable goal of replacing a third-class undergrad. If
| your business or research initiative is fine with that level
| of competence then you will find utility.
|
| The problem is that I don't know anyone who would find that
| useful. Nor does it fit within any existing working methodology
| we have. And on top of that the verification of any output can
| take considerably longer than just doing it yourself in the
| first place, particularly where it goes off the rails, which it
| does all the time. I mean, it was 3 months ago that I was
| arguing with a model over it not understanding place-value
| systems properly, something we teach 7-year-olds here.
|
| But the abstract problem is at a higher level. If it doesn't
| become a general utility for people outside of mathematics,
| which is very very evident at the moment by the poor overall
| adoption and very public criticism of the poor result quality,
| then the funding will dry up. Models cost lots of money to
| train and if you don't have customers it's not happening and no
| one is going to lend you the money any more. And then it's
| moot.
| binarymax wrote:
| This is a great point that nobody will shoot you over :)
|
| But the main question is still: assuming you replace an
| undergrad with a model, who checks the work? If you have a
| good process around that already, and find utility as an
| augmented system, then you'll get value -- but I still
| think it's better for the undergrad to still have the job,
| be at the wheel, and do things faster and better by
| leveraging a powerful tool.
| hyhconito wrote:
| Shot already for criticising the shiny thing (happened with
| crypto and blockchain already...)
|
| Well to be fair no one checks what the graduates do
| properly, even if we hired KPMG in. That is until we get
| sued. But at least we have someone to blame then. What we
| don't want is something for the graduate to blame. The buck
| stops at someone corporeal because that's what the
| customers want and the regulators require.
|
| That's the reality and it's not quite as shiny and happy as
| the tech industry loves to promote itself.
|
| My main point, probably cleared up with a simple point: no
| one gives a shit about this either way.
| peterbonney wrote:
| If you looked at how the average accountant spent their time
| before the arrival of the digital spreadsheet, you might have
| predicted that automated calculation would make the profession
| obsolete. But it didn't.
|
| This time could be different, of course. But I'll need a lot
| more evidence before I start telling people to base their major
| life decisions on projected technological change.
|
| That's before we even consider that only a very slim minority
| of the people who study math (or physics or statistics or
| biology or literature or...) go on to work in the field of math
| (or physics or statistics or biology or literature or...). AI
| could completely take over math research and still have next
| to no impact on the value of the skills one acquires from
| studying math.
|
| Or if you want to be more fatalistic about it: if AI is going
| to put everyone out of work then it doesn't really matter what
| you do now to prepare for it. Might as well follow your
| interests in the meantime.
| blagie wrote:
| It's important to base life decisions on very real
| technological change. We don't know what the change will be,
| but it's coming. At the very least, that suggests more
| diverse skills.
|
| We're all usually (but not always) better off, with more
| productivity, eventually, but in the meantime, jobs do
| disappear. Robotics did not fully displace machinists and
| factory workers, but single-skilled people in Detroit did not
| do well. The loom, the steam engine... all of them displaced
| often highly-trained often low-skilled artisans.
| rafaelmn wrote:
| If AI reaches this level, the socioeconomic impact is going
| to be so immense that choosing what subject you study will
| have no impact on your outcome - no matter what it is - so
| it's a pointless consideration.
| jokoon wrote:
| I wish scientists who study the psychology and cognition of
| actual brains could engage with these AI things and talk
| about them, and maybe make suggestions.
|
| I really really wish AI would make some breakthrough and be
| really useful, but I am so skeptical and negative about it.
| joe_the_user wrote:
| Unfortunately, the scientists who study actual brains have
| all sorts of interesting models but ultimately very little
| clue _how_ these actual brains work at the level of problem
| solving. I mean, there's all sorts of "this area is
| associated with that kind of process" and "here's evidence
| this area does this algorithm" stuff, but it's all at the
| level you'd imagine of steam-engine engineers trying to
| understand a warp drive.
|
| The "open worm project" was an effort years ago to get computer
| scientists involved in trying to understand what "software" a
| very small actual brain could run. I believe progress here has
| been very slow and that an idea of ignorance that much larger
| brains involve.
|
| https://en.wikipedia.org/wiki/OpenWorm
| bongodongobob wrote:
| If you can't find useful things for LLMs or AI at this point,
| you must just lack imagination.
| 0points wrote:
| > How much longer this will go on for nobody knows, but there are
| lots of people pouring lots of money into this game so it would
| be a fool who bets on progress slowing down any time soon.
|
| Money cannot solve the issues faced by the industry, which
| mainly revolve around a lack of training data.
|
| They have already used the entirety of the internet and all
| available video, audio, and books, and they are now dealing
| with the fact that most content online is generated by these
| models, thus making it useless as training data.
| charlieyu1 wrote:
| One thing I know is that there won't be machines entering IMO
| 2025. The concept of a "marker" does not exist in the IMO -
| scores are decided by negotiations between the team leaders of
| each country and the juries. It is important to get each team
| leader involved in grading the work of their country's
| students, for accountability as well as to acknowledge
| cultural differences. And those hundreds of people are not
| going to stay longer to grade AI work.
| witnesser2 wrote:
| I was not refuted sufficiently a couple of years ago. I claimed
| "training is open boundary" etc.
| witnesser2 wrote:
| Like a few years ago, I'll just boringly add again that "you
| need modeling" to close it.
___________________________________________________________________
(page generated 2024-12-23 23:00 UTC)