[HN Gopher] A neural network solves and generates mathematics pr...
___________________________________________________________________
A neural network solves and generates mathematics problems by
program synthesis
Author : geox
Score : 193 points
Date : 2022-01-08 17:24 UTC (5 hours ago)
(HTM) web link (arxiv.org)
(TXT) w3m dump (arxiv.org)
| mkl95 wrote:
| That's powerful. When I was in uni Wolfram seemed to have a
| monopoly on this sort of thing. I would have loved to have some
| free alternative back then.
| kzrdude wrote:
| Maxima has been free software for ages! It has a rubbish UI in
| my opinion, but back then, just like now, the free options were
| there; they are less easy to use, though getting better. Both
| SymPy and Sagemath are worth a look.
| sritchie wrote:
| Give my SICMUtils computer algebra system a look as well, if
| you like Lisp / Clojure:
| https://github.com/sicmutils/sicmutils
|
| Works on the web too, which is a big boost for sharing work.
| vanusa wrote:
| _This represents a milestone for higher education._
|
| Just from that sentence, you know the paper is suspect.
|
| Whatever the merits of the algorithm -- there's no reason to
| believe that what it produces (a set of random auto-generated
| questions) will be useful for actual _education_ in any way.
|
| There are more red flags -- see autogen question U9, for example:
| If f(x) = (x^2 - x) / (x^2 + x), what is f(2)?
|
| When was the last time your human teacher (at university level)
| handed you a rational polynomial that was not in reduced terms?
|
| And my favorite, X7: Find the smallest prime
| number that is divisible by 7 and 13.
| maxwell86 wrote:
| > Find the smallest prime number that is divisible by 7 and 13.
|
| So what did the model answer?
| tsimionescu wrote:
| The model generated this as a question that the MIT undergrad
| Number Theory course could ask of its human students -
| helping develop the course is part of their goal. It's a
| laughably easy question at first glance, though.
| dane-pgp wrote:
| > Find the smallest prime number that is divisible by 7 and 13.
|
| A truly intelligent machine would parse that question as:
| Find `the smallest prime number that is divisible by 7` and
| `13`.
|
| then answer with the correct set of integers:
| 7 and 13
| vanusa wrote:
| It wouldn't, because in English as in programming -- the
| 'and' operator has precedence.
| kzrdude wrote:
| it's an intelligence so it can make sense of it to the best
| of its ability, even if the input is malformed. :)
| vanusa wrote:
| A truly intelligent machine would hand the test back to
| the instructor, saying:
|
| "Nice try, but I think you need to edit these questions a
| bit more carefully. Now if you'll excuse me, I'd like to
| treat myself to an early recess."
| kzrdude wrote:
| Intelligent students know that it doesn't work to try
| this on the teacher, so it sounds like the machine still
| isn't quite AGI :)
| c1ccccc1 wrote:
| So I guess the answer would be 5 then. 7 & 13 = 0b0111 &
| 0b1101 = 0b0101 = 5
| Jensson wrote:
| In python 7 and 13 is 13. 7 is true, so it returns the
| second part of the and statement which is 13. So the
| answer is 13.
|
| Code below as proof:
|
| > print(13 % (7 and 13))
|
| outputs: 0.
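|
| To spell it out (a quick REPL check, nothing specific to the
| paper, just standard Python 'and' semantics):
|
|     >>> 7 and 13          # both operands truthy, so 'and' yields the second
|     13
|     >>> 13 % (7 and 13)   # i.e. 13 % 13
|     0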
| gus_massa wrote:
| Here in Buenos Aires, in the first-year calculus course at the
| university, we usually ask something like:
|
| > Find all the horizontal and vertical asymptotes of f(x) =
| (x^2 - x) / (x^2 + x)
|
| Probably half of the exams get a rational function that can be
| simplified, and the other half get a surprise in another
| exercise.
| vanusa wrote:
| Yeah, I can see how surprise formulations like these can be
| useful.
|
| The thing is, it almost certainly wasn't intended by the
| algorithm in this case. Nor does it seem to have any
| understanding of what may be "off" about problem statements
| like these.
| andi999 wrote:
| X7 is pure gold.
| [deleted]
| knuthsat wrote:
| They should do this for competitive programming tasks.
| qumpis wrote:
| No code available?
| Filligree wrote:
| No code created. They used OpenAI Codex for this, AIUI.
| MaxikCZ wrote:
| Can we try it anywhere?
| Jensson wrote:
| They rewrite the questions before feeding them into the AI. That
| makes "100% correct" significantly less impressive.
|
| For example, they manually rewrote the question:
|
| > Find the derivative of the function using the definition of a
| derivative. f(x) = (x^2 - 1)/(2x - 3)
|
| To
|
| > Use Sympy to find the derivative of f(x)=(x**2-1)/(2*x-3)
|
| Which completely changes the question, and also their rewrite
| ensures that the AI doesn't fail to parse any data etc.
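|
| For illustration, the kind of SymPy call that rewrite is fishing
| for would look roughly like this (my own sketch, not the paper's
| actual output):
|
|     from sympy import symbols, diff
|
|     x = symbols('x')
|     # the library applies the quotient rule; no limit definition anywhere
|     print(diff((x**2 - 1) / (2*x - 3), x))
|
| Which is a one-liner for the library, nothing like working
| through the limit definition by hand.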
| kn8a wrote:
| If you have beta access to codex you can try and replicate
| their results. I have not been able to get the same results out
| of codex.
| Jensson wrote:
| Looking further, it is even worse than that. For example, this
| question:
|
| > Calculate the probability of getting a two-pair poker hand.
|
| Was rewritten to this, which reads like code translated into human text:
|
| > A hand is a set of 5 cards that are drawn randomly from a
| standard 52 card deck with 13 ranks of 4 cards each. A two-pair
| poker hand is a hand that contains 3 unique ranks, where no
| more than 2 cards in the hand can share the same rank. That is,
| 3 or more cards cannot share the same rank. Write a program
| that generates simulations for calculating the average
| probability of getting a two-pair poker hand.
|
| They even changed the question to allow simulations and not
| just exact answers. So then the AI is "correct" here when it
| generates 1000 random samples of taking 5 elements from a set
| of 52 elements and seeing how many are two pairs.
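|
| To make the contrast concrete, the sort of simulation that
| rewritten prompt is asking for is something like this (my own
| sketch, not the model's actual output):
|
|     import random
|     from collections import Counter
|
|     def is_two_pair(hand):
|         # two pairs plus one odd card => rank multiplicities 2, 2, 1
|         counts = sorted(Counter(rank for rank, suit in hand).values())
|         return counts == [1, 2, 2]
|
|     deck = [(rank, suit) for rank in range(13) for suit in range(4)]
|     trials = 100_000
|     hits = sum(is_two_pair(random.sample(deck, 5)) for _ in range(trials))
|     print(hits / trials)  # exact answer is 123552/2598960, about 0.0475
|
| All the probabilistic content is in the prompt; the code just
| counts.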
|
| Not sure what the point of this paper is then. It doesn't do
| anything novel with the mathematics, since it either just uses
| libraries or doesn't really solve the problem, nor does it do
| anything novel with text recognition.
| delaaxe wrote:
| Still sounds to me like the early stages of The Last Question
| from Isaac Asimov
| solveit wrote:
| Yeah, this isn't an AI that does maths, it's an AI that takes
| a natural language description of an algorithm and converts
| it into sympy. Which is still impressive but not very
| surprising given things like github copilot.
|
| It's just as well. If an artificial neural network was
| actually doing undergraduate mathematics I would have dropped
| out of my mathematics PhD yesterday.
| tomrod wrote:
| It's not worse, it is compartmentalization to split different
| components of the ML system (parse question | find appropriate
| solution to question).
|
| Now the challenge is to mimic the human creating the "right"
| question with high fidelity so the whole ML system returns
| appropriate search results.
| Jensson wrote:
| > Now the challenge is to mimic the human creating the
| "right" question with high fidelity so the whole ML system
| returns appropriate search results.
|
| This was always the hard part. If you can generate the
| "right" question then you can generate the code, and since
| there are already many good math libraries for solving
| undergrad problems that is all you have to do.
| dan-robertson wrote:
| Do you accept that the questions are qualitatively different?
|
| I think the first phrasing is asking you to directly use the
| definition:
|
|     f'(x) = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h}
|
| And the second phrasing is asking for the application of some
| more mechanical rules (e.g. the derivative of a
| constant/power/scaling or the product rule) or for sympy to
| apply them.
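|
| You can even make the distinction concrete in SymPy itself: the
| limit-definition route and the mechanical route are different
| computations (a sketch of my own, not from the paper):
|
|     from sympy import symbols, limit, diff
|
|     x, h = symbols('x h')
|     f = lambda t: (t**2 - 1) / (2*t - 3)
|
|     by_definition = limit((f(x + h) - f(x)) / h, h, 0)  # difference quotient
|     by_rules = diff(f(x), x)                            # quotient/power rules
|     print(by_definition.equals(by_rules))               # True
|
| The original question is testing whether the student can do the
| former; the rewritten prompt only needs the latter.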
|
| FWIW, this is also not what I would expect 'University-level
| mathematics' to mean. The second meaning fits into high
| school, at least where I grew up. I was expecting proofs, to
| be honest.
| tsimionescu wrote:
| No, because literally _all_ their AI does is translate a
| sufficiently structured question into Python. It's not
| showing a single ounce of mathematical knowledge, it doesn't
| even seem to (or need to) know that 1 + 1 = 2. Literally, if
| you asked it "what is 1 + 1", it would answer with "print(1 +
| 1)".
|
| It's not doing anything more than that, in any of the
| examples shown in the paper.
| resiros wrote:
| I expected them to have trained a novel network to solve the
| problems. In fact, it seems they have used OpenAI Codex (a
| similar network to the one used by Github copilot) to generate
| the answer.
|
| Their insight was to edit the question-answer pairs to include
| domain knowledge and solve the questions as python code.
|
| Quite impressive results nevertheless! It makes me wonder whether
| this kind of problem can be solved using a custom-made solution,
| or whether training large networks that generalize from vast
| quantities of information is the only way. Because if it is, then
| all these solutions would be monopolized by the large players..
| tubby12345 wrote:
| >Quite impressive results nevertheless!
|
| How does it remain impressive? Is this just deference to the
| institutions of the authors? Or the general aura around
| AI/ML/DL?
|
| A couple of years ago when FB trained some kind of architecture
| to compute derivatives by presenting it with problem,answer
| pairs I called it out as "just rote memorization" and got
| promptly downvoted. Now here we have people using GPT3 (a
| network widely recognized to be just memorizing) for a similar
| task and people are still impressed? Like you said, their only
| insight was to augment the data and translate to a computable
| (parsable) form.
|
| I'm guessing people don't understand what a papermill is,
| especially at these schools that have access to the gobs of
| compute that you need to make these projects go through. It's
| junk science - probably not reproducible, definitely not
| extensible, doesn't transfer out of the sample set - purely for
| the sake of incrementing publication count for all involved
| (cf. the number of authors on the paper). And before people
| label me a hater: I speak from experience as someone at one of
| these types of schools that has their name on several of the
| same sorts of "turn the crank" papers.
| nightski wrote:
| They probably weren't judging the unique contribution here
| but rather the system as a whole. If you set out to solve
| this from scratch it would be very difficult. It's just
| recognizing what has been achieved not just by this group but
| the entire field which made this possible. This isn't an
| academic conference paper review session.
| tubby12345 wrote:
| This is so charitable, thank you. But please tell me, are you
| this charitable when some derivative node package gets
| published? Or Python library? Or when the 100th data
| science startup is announced?
| guard0g wrote:
| My daughter is going to love this...but I suspect MIT psets may
| evolve from solving the questions to devising automated methods
| (like this) to solve them
| ryan93 wrote:
| They obviously won't move away from solving math problems the
| old fashioned way at MIT.
| sp527 wrote:
| No they'll just more heavily weight exams in the final grade.
| And not doing psets will screw you over in many if not most
| STEM courses. They're often essential preparation for exams.
| Pandabob wrote:
| Oh wow, Gilbert Strang is one of the authors.
| chestertn wrote:
| I noticed that as well. It is scary.
|
| There is a big push in academia to work on AI because of
| funding. Many mathematicians are a bit fed up with AI because
| papers in the field tend to be a bit liberal with claims and
| are not held to the same standard as fields such as
| statistics or numerical analysis.
|
| Many mathematical researchers I have known have abstained from
| jumping on the AI bandwagon until they see the name of a well-
| known professor on an AI paper. Then they cave.
| bcaine wrote:
| Not to pour too much cold water on this, but the claim of 100%
| accuracy has a huge caveat. In the paper (Page 4) they state:
|
| _Interaction. The original question may not be a prompt that
| synthesizes a program whose execution results in the correct
| answer. In addition, the answer may require multiple steps with
| clear plots or other modalities. We therefore may interactively
| prompt Codex until reaching the correct answer or visualizations,
| making the minimum necessary changes from the original question_
|
| Which to me basically sounds like they had a human in the loop
| (that knows how to solve these math problems) that kept changing
| the question until it gave the correct answer. They do measure
| the distance (using a sentence embedding model) of the original
| question to the one that yielded the correct answer, but that
| feels a bit contrived to me.
|
| Nevertheless, it's still really cool that the correct answer is
| indeed inside the model.
| yummypaint wrote:
| I was hoping their breakthrough was that they had found a
| general way to parse conceptual problems into the language of
| math and logic. That is the truly hard part, and what people
| spend a lot of time learning to do. Software like Octave and
| mathematica can already evaluate tons of things once parsed.
| th0ma5 wrote:
| Maybe not entirely unlike
| https://en.wikipedia.org/wiki/Clever_Hans
| modeless wrote:
| Proving Douglas Adams correct. The question is harder than the
| answer.
|
| This makes the "at scale" claim in the abstract clearly false
| IMO. Any AI system that requires that much human intervention
| is not scalable. When they have a second AI to produce the
| prompts automatically from the original questions, then they
| can claim to have achieved scalability.
|
| But even without that, a system like this can still certainly
| be useful. And I expect rapid progress in the next few years.
| tsimionescu wrote:
| But the correct answer isn't inside the model at all, in any of
| their examples. The correct answer is inside SymPy or NumPy,
| at least 99% of the time. That is, the model doesn't respond
| with a demonstration or with the answer itself: it responds
| with a Python program that poses the given question to SymPy or
| NumPy, and then they run that program and report the answer.
|
| Here is a basic example:
|
| MIT Course question: Solve each equation for x. ln(x^2 - 1) = 3
|
| Model input: Using Sympy, solve Eq ln(x**2-1)=3 for x.
|
| Model output:
|
|     from sympy import *
|     x = symbols('x')
|     solve(log(x**2-1) - 3, x)
|
| As you can see, the model has simply translated a mechanized
| form of the original question to equivalent Python code. The
| model has 0 idea of how to solve an equation: it's using a
| symbolic equation solver.
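|
| (For the record: that program just hands the whole problem to
| SymPy. Working it by hand, ln(x^2 - 1) = 3 gives x^2 = 1 + e^3,
| so x = +/- sqrt(1 + e^3), which is exactly what solve() should
| print back; the model never sees, let alone derives, that answer.)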
|
| In other cases, they even "tidy" the original question to a
| representation of the solution. Here is their example E in
| Figure 2:
|
| Original question:
|
| > Outside of their humdrum duties as 6.042 TAs, Sayan is trying
| to learn to levitate using only intense concentration and
| Jelani is launching a "Nelson 2008" presidential campaign.
| Suppose that Sayan's probability of levitating is 1/6, Jelani's
| chance of becoming president is 1/4, and the success of one
| does not alter the other's chances. If at most one of them
| succeeds, what is the probability that Jelani becomes the
| president of the United States?
|
| "Tidyed question":
|
| > Suppose that Sayan's probability of succeeding is 1/6, and
| Jelani's probability of succeeding is 1/4.
|
| > Use numpy to find the probability at most one of Jelani and
| Sayan succeed.
|
| > Use numpy to find the probability Jelani succeeds but Sayan
| does not succeed.
|
| > Divide the former by the latter probability.
|
| Model solution:
|
|     import numpy as np
|     # Probability of Sayan succeeding
|     p_sayan = 1/6
|     # Probability of Jelani succeeding
|     p_jelani = 1/4
|     # Probability of at most one of Jelani and Sayan succeeding
|     p_at_most_one = 1 - (p_sayan * p_jelani)
|     # Probability of Jelani succeeding but Sayan not succeeding
|     p_jelani_succeed_sayan_not = p_jelani * (1 - p_sayan)
|     # Probability of Jelani succeeding but Sayan not succeeding
|     # divided by probability of at most one of Jelani and Sayan succeeding
|     p_jelani_succeed_sayan_not / p_at_most_one
|     # Probability of at least one of Jelani and Sayan succeeding
|     p_at_least_one = 1 - p_at_most_one
|
| Tidying up the extra verbiage of the question is absolutely
| fair. But then, they also explain exactly how to compute the
| result using the data in the question; the model then generates
| code that perfectly matches the described algorithm, it's again
| not using even the tiniest bit of mathematical understanding.
|
| I have browsed their examples, and I have not seen even a
| single one where the model does more than rephrase the question
| into a 1:1 Python representation of the question itself.
|
| None of the answers would pass even the simplest undergrad
| exam. They are literally of the form "how would you solve
| equation E?" "I would write a program that says
| sympy.solve(E)".
| YeGoblynQueenne wrote:
| Well, they do say very clearly that they "solve" problems by
| _program synthesis_ and what they describe is perfectly legit
| program synthesis.
|
| To clarify, program synthesis (or automatic programming) is
| the task of generating programs from specifications. There
| are two kinds of program synthesis: deductive program
| synthesis, from a complete specification of the target
| program; and inductive program synthesis, or program
| induction, from an incomplete specification (such as sets of
| program inputs and outputs, or traces). An example of
| deductive program synthesis is the generation of low-level
| code from a high-level language by a compiler.
|
| What the paper describes is a kind of deductive program
| synthesis from a complete specification in natural language.
| I suspect the true contribution of the work is the
| demonstration of using natural language as a complete
| specification, where earlier work generally only demonstrated
| the use of natural language as incomplete specification (for
| example, comments describing intent rather than
| implementation) and the combination of natural language with
| code; as in the original Codex work.
|
| On the other hand it's clear to me that the training has made
| the model memorise answers and all the work in prompt
| engineering, described under "Workflow" serves to find the
| right prompts to retrieve the desired memorisations, much
| like one must fire just the right SQL query to get back the
| right data. Certainly interesting to see in action and useful
| for everyday work, but far from "solving" anything in the
| grandiose way that it is announced by the authors (e.g.
| "These astounding results..." in the Conclusion section, etc).
| lumost wrote:
| although, the correct answer is also likely on the web. With a
| suitable search query you would see the correct
| paper/textbook/wiki page with the right answer. A text
| highlighting model could also likely extract this answer from
| the text. The training probably achieves a good degree of
| memorization for these known results.
|
| This begs the question, would we be impressed with a similar
| compression algorithm for storing past web documents?
| amelius wrote:
| The main achievement is not the compression, but the search
| functionality (search==solve).
| kortilla wrote:
| Well the trivial test to make sure it's not memorized would
| be to change constants in the input that alter the correct
| answer but don't make the problem any more difficult if it is
| actually doing the calculation.
| kaijia wrote:
| I dislike this line of research. It just demonstrates large
| models are capable of memorizing a large number of things,
| without any understanding.
|
| So I tried the first problem in Appendix A on OpenAI playground.
|
| When I use the prompt "# Sketch the graph of the function f(x) =
| x + |x|", with the model davinci-codex and a larger response
| length (other parameters as default), the result seems fine:
| https://pastebin.com/VT8tPbu6
|
| When I change the prompt to "# Sketch the graph of the function
| f(x) = x + |x| + x*2", it becomes garbage. It changes the prompt
| to "# Sketch the graph of the function f(x) = x + |x| + x^2 + x^3
| + x^4 + x^5 + x^6 + x^7 + x^8 + x^9 + x^10" and then writes new
| empty comment lines: https://pastebin.com/2bNEuqaH
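|
| For comparison, a straightforwardly correct completion for the
| first prompt would only need a few lines of matplotlib (a sketch
| of my own, not Codex's output from the pastebins):
|
|     import numpy as np
|     import matplotlib.pyplot as plt
|
|     x = np.linspace(-5, 5, 500)
|     y = x + np.abs(x)      # 0 for x <= 0, 2x for x >= 0
|     plt.plot(x, y)
|     plt.title('f(x) = x + |x|')
|     plt.show()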
| 13415 wrote:
| How do they guarantee that an answer is correct? That requires a
| small verified kernel in theorem provers. This seems hard to
| achieve with a neural network. Or is the goal to produce
| solutions that could be correct / are likely correct?
| lupire wrote:
| rg111 wrote:
| .
| pfortuny wrote:
| This has changed and now the last author in a long list is the
| most valuable.
| Jensson wrote:
| What do you mean? The last author spot is the most prestigious
| one, it is usually reserved for the most senior person on the
| team/the team leader.
| Vetch wrote:
| This is more like solving programming than doing math; it either
| outsources the work by writing a sympy program or generates a
| brute-force or Monte Carlo simulation based answer.
|
| My biggest gripe with this paper is how unclear it is on its
| methods. How many rewrites per question? How did they select the
| solved questions? They state they used a model trained on
| language and fine-tuned on code, but fine-tuning could describe
| either their or codex's process.
|
| The biggest hint in favor of fine-tuning is that AFAICT, their
| dataset adds up to 205 questions but they've solved 235
| questions. In which case I'd suspect overfitting on question
| form. In intro level math, problems are usually in template form
| and solving them boils down to matching on the correct template
| and slot filling the answer.
|
| To prove whether it's been fine-tuned, people with davinci codex
| access should try to see if they can replicate this.
|
| To prove it's not overfit, authors should release their training
| dataset if there is one and allow people to test with it.
|
| How many parameters does davinci codex have? The original codex
| was 12 billion IIRC and certainly wasn't this capable.
|
| ---
|
| Some of the answers look wrong or incomplete.
|
| Table 221 seems to be missing a factorial; such leniency suggests
| they must not be scoring these automatically.
|
| Not sure what's going on in Table 144. Table 44 too.
|
| In 42 and 47, particularly 42, the solution program seems
| incomplete.
|
| 211 is impressive; it also writes code for LU decomposition,
| permutations, Runge-Kutta, card combinatorics and other clever
| stuff.
|
| Even though for most of the more challenging problems the hard
| part was pre-digested, or worse, the model was hand-fed the
| answer's algorithm, there were a few where the network had to
| have a solid understanding of the math and the code requirements.
| A program like they claim would revolutionize and hugely simplify
| mathematically and algorithmically involved programming.
|
| The biggest cause for worry that it might have overfit is that it
| works with no sampling at all and gets a perfect score (they
| claim).
| p1esk wrote:
| I haven't read the paper, but the obvious question is - how can
| we be sure that the solution is correct?
| [deleted]
| lupire wrote:
| MaxikCZ wrote:
| Well, "Well, its more correct than what I came up with" I guess
| boxfire wrote:
| Not knocking the significance of this, but there are a lot of
| 'answers' that would get a straight up F for a university test:
|
| > Original Question: Let r and s denote the two real roots of
| x^2 - x*sqrt(5) + 1 = 0. Then determine r^8 + s^8.
|
| > Codex Input: Calculate the roots of $x^2 - x \sqrt{5} + 1 = 0$.
| Call the two roots r and s. Calculate z = r^8 + s^8. Evaluate z as
| an integer. Use sympy.
|
| > Codex Output:
|
|     from sympy import *
|     x = Symbol('x')
|     r = solve(x**2 - x*sqrt(5) + 1, x)[0]
|     s = solve(x**2 - x*sqrt(5) + 1, x)[1]
|     z = r**8 + s**8
|     print(z.evalf())
|
| > Solution: 47
|
| If the solution does not include the algebraic derivation, it is
| wrong by university math standards. For solving these problems in
| applications, though, this is quite interesting and powerful.
|
| Demonstrating understanding of why we get to the solution IS
| mathematical reasoning (at this level). This paper demonstrates
| being able to leap across mathematical reasoning, but not the
| reasoning itself.
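|
| For reference, the derivation a student would be expected to
| write (standard Newton's-identities style, not from the paper):
| with r + s = sqrt(5) and rs = 1,
|
|     r^2 + s^2 = (r + s)^2 - 2rs         = 5 - 2  = 3
|     r^4 + s^4 = (r^2 + s^2)^2 - 2(rs)^2 = 9 - 2  = 7
|     r^8 + s^8 = (r^4 + s^4)^2 - 2(rs)^4 = 49 - 2 = 47
|
| which is exactly the reasoning that print(z.evalf()) skips over.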
| pjsg wrote:
| I wouldn't call this a maths question either. This is
| arithmetic.
|
| I learned from someone much wiser than I that "Mathematics is
| the art of avoiding calculation."
|
| This is a first year university math question (taken from an
| exam paper which I took):
|
| A positron and an electron, each of rest mass m, annihilate,
| creating two photons each of energy pc. Show that in the frame
| S in which the electron was at rest, the angle between the
| directions of motion of the two photons is
|
| 2 sin^-1 (mc/p)^0.5
|
| One of the photons then scatters off a second electron, also at
| rest in S, and subsequently has energy qc. Show that if the
| photons are now moving in the same direction, then q = p/3
| posix86 wrote:
| This isn't solving the problem at all, in my view. It's
| translating human speech into programmed code; it's sympy that
| actually solves the problem. As you said, this can be very
| useful, but knowing how to type the problem into wolfram alpha
| is not university level math. Problems that can be entered into
| sympy can be considered "solved", what's remaining (and what
| they have done) is UX.
|
| Also, this problem arguably is a problem of computation, not
| math. Math is about finding proofs, which is a much harder
| class of problems (NP-complete at best, undecidable at worst).
| Jensson wrote:
| > It's translating human speech into programmed code
|
| That would be very valuable, but it doesn't even do that.
| First the human researcher translates the original human
| speech to very structured code like human speech, and then
| the AI translates the very structured human speech into code.
| semigroupoid wrote:
| If I understand this correctly, the question (Codex Input) is
| asking for a solution using SymPy. So that would be the correct
| answer, right?
| tsimionescu wrote:
| The procedure they use is this:
|
| 1. Start with a real question from an MIT math course.
|
| 2. Human operator translates the question as Codex Input.
|
| 3. Codex outputs Python code, usually with references to
| SymPy to actually solve the problem (e.g.
| sympy.solve(equation, initial_conditions) ).
|
| 4. Python code is run, result is taken as result of question.
|
| If step 3 or 4 are not giving the expected answer, repeat
| step 2 with a different phrasing until answer is correct.
|
| Claim 100% accuracy.
| MauranKilom wrote:
| The Codex Input was written by a human.
| noah_buddy wrote:
| I think that codex input is an intermediary representation of
| the original question.
| chongli wrote:
| These are problems from first year intro mathematics courses. For
| a lot of students these are mostly review from high school.
|
| To me, a "university-level" problem is more like this:
|
| Let _W_ be an infinite-dimensional normed linear space over the
| real numbers. Use the Baire Category Theorem to show that if _W_
| is countable-dimensional then _W_ is not complete.
|
| The above is a typical problem from the 3rd year real analysis
| course I took in the fall.
| tubby12345 wrote:
| Man some hn posts bring the cringiest comments; please do tell
| us which school has functional analysis as a "typical" topic
| for juniors taking real analysis. Even if you're getting a
| second helping of real analysis by then, you're probably
| looking at Lebesgue integration on R and such, rather than
| general topological spaces.
|
| I'll never understand why some people try to flex on an
| anonymous forum.
| chongli wrote:
| This is from PMATH 351 at University of Waterloo. Every pure
| math student takes the course in 3rd year. Lebesgue
| integration isn't covered until 4th year, though it is not
| restricted to R at that point.
|
| I'm sorry you think my comment was intended to be a "flex". I
| was trying to make a point about university mathematics which
| is this: at university level students should be going beyond
| solving simple computational problems. Synthesizing a proof
| requires a higher level of understanding than the application
| of a standard problem-solving technique. See Bloom's taxonomy
| [1] for details.
|
| [1] https://en.wikipedia.org/wiki/Bloom's_taxonomy#The_cognitive...
| da39a3ee wrote:
| I think chongli's point was fair. "University-level
| mathematics" usually means proving claims. But they were
| doing calculations.
| jstx1 wrote:
| Not just solving the problems without any mistakes but also
| generating the Python code that gives the solution; and it can
| generate new questions on the same material. That's pretty
| impressive.
| tsimionescu wrote:
| But it's not: it's rephrasing the question in terms of Python
| code that asks SymPy or other libraries to answer the question;
| or uses very very basic formulas of its own (such as
| "probability of X not happening = 1 - P(X)"). That's not really
| impressive, especially since it only achieves this with very
| explicit prompting, and all intermediary steps covered in the
| prompts.
| jkic47 wrote:
| This is impressive. I wonder how quickly and how well this could
| be adapted to other domains in science and engineering. It could
| be a really useful tool for engineering teams
| tsimionescu wrote:
| Have you looked at the actual questions or prompts? It's doing
| only trivial translations of the actual input question (which
| is often arbitrarily more complex than the original textbook
| question) into Python code. A lot of the time, they don't even
| attempt exact solutions, they directly ask for iterative
| approaches:
|
| Original textbook question (number theory course):
|
| > The product of the positive integer divisors of a positive
| integer n is 729. Find n.
|
| Actual model input/prompt:
|
| > Write a function which finds the divisors (including 1 and
| itself) of the input and multiplies them together. Iterate over
| all possible integers until it finds an input where the product
| is 729.
|
| Model output program:
|
|     def divisors(n):
|         divisors = []
|         for i in range(1, n+1):
|             if n % i == 0:
|                 divisors.append(i)
|         return divisors
|
|     def product(n):
|         return reduce(lambda x, y: x*y, n)
|
|     def find_n(n):
|         for i in range(1, n):
|             if product(divisors(i)) == 729:
|                 return i
|
|     print find_n(1000000)
|
| Not only is the prompt explaining exactly what to do, it's
| using an iterative approach instead of an analytical approach,
| as might be expected from a math class.
|
| Note: any errors in the program are my own transcribing error.
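|
| For contrast, the analytical route a number theory class would
| expect (my own working, not anything the model produced): the
| product of the divisors of n is n^(d(n)/2), and since that
| product is 729 = 3^6, n itself must be a power of 3, say n = 3^k.
| Then
|
|     product of divisors = 3^(0 + 1 + ... + k) = 3^(k(k+1)/2) = 3^6
|
| so k(k+1)/2 = 6, k = 3, and n = 27 (divisors 1, 3, 9, 27, product
| 729). Two lines of reasoning instead of a million-iteration loop.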
___________________________________________________________________
(page generated 2022-01-08 23:01 UTC)