[HN Gopher] A neural network solves and generates mathematics pr...
       ___________________________________________________________________
        
       A neural network solves and generates mathematics problems by
       program synthesis
        
       Author : geox
       Score  : 193 points
       Date   : 2022-01-08 17:24 UTC (5 hours ago)
        
 (HTM) web link (arxiv.org)
 (TXT) w3m dump (arxiv.org)
        
       | mkl95 wrote:
       | That's powerful. When I was in uni Wolfram seemed to have a
       | monopoly on this sort of thing. I would have loved to have some
       | free alternative back then.
        
         | kzrdude wrote:
         | Maxima has been free software for ages! Now it has a rubbish UI
         | in my opinion, but just like now, the free options are there
         | but they are less easy to use, getting better though. Both
         | SymPy and Sagemath are interesting to look at.
        
           | sritchie wrote:
           | Give my SICMUtils computer algebra system a look as well, if
           | you like Lisp / Clojure:
           | https://github.com/sicmutils/sicmutils
           | 
           | Works on the web too, which is a big boost for sharing work.
        
       | vanusa wrote:
       | _This represents a milestone for higher education._
       | 
       | Just from that sentence, you know the paper is suspect.
       | 
        | Whatever the merits of the algorithm -- there's no reason to
        | believe that what it produces (a set of random auto-generated
       | questions) will be useful for actual _education_ in any way.
       | 
       | There are more red flags -- see autogen question U9, for example:
       | If f(x) = (x^2 - x) / (x^2 + x), what is f(2)?
       | 
       | When was the last time your human teacher (at university level)
       | handed you a rational polynomial that was not in reduced terms?
       | 
        | And my favorite, X7:
        | 
        |     Find the smallest prime number that is divisible by 7 and 13.
        
         | maxwell86 wrote:
         | > Find the smallest prime number that is divisible by 7 and 13.
         | 
         | So what did the model answer?
        
           | tsimionescu wrote:
           | The model generated this as a question that the MIT undergrad
           | Number Theory course could ask of its human students -
           | helping develop the course is part of their goal. It's a
           | laughably easy question at first glance, though.
        
         | dane-pgp wrote:
         | > Find the smallest prime number that is divisible by 7 and 13.
         | 
         | A truly intelligent machine would parse that question as:
         | Find `the smallest prime number that is divisible by 7` and
         | `13`.
         | 
         | then answer with the correct set of integers:
         | 7 and 13
        
           | vanusa wrote:
           | It wouldn't, because in English as in programming -- the
           | 'and' operator has precedence.
        
             | kzrdude wrote:
             | it's an intelligence so it can make sense of it to the best
             | of its ability, even if the input is malformed. :)
        
               | vanusa wrote:
               | A truly intelligent machine would hand the test back to
               | the instructor, saying:
               | 
               | "Nice try, but I think you need to edit these questions a
               | bit more carefully. Now if you'll excuse me, I'd like to
               | treat myself to an early recess."
        
               | kzrdude wrote:
               | Intelligent students know that it doesn't work to try
               | this on the teacher, so it sounds like the machine still
               | isn't quite AGI :)
        
             | c1ccccc1 wrote:
             | So I guess the answer would be 5 then. 7 & 13 = 0b0111 &
             | 0b1101 = 0b0101 = 5
        
               | Jensson wrote:
                | In Python, 7 and 13 evaluates to 13: 7 is truthy, so the
                | and operator returns its second operand, which is 13. So
                | the answer is 13.
               | 
               | Code below as proof:
               | 
               | > print(13 % (7 and 13))
               | 
               | outputs: 0.
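                | 
                | For reference, both readings side by side (an
                | illustrative snippet):
                | 
                |     # boolean "and": 7 is truthy, so the second operand
                |     # is returned
                |     print(7 and 13)
                |     # bitwise AND: 0b0111 & 0b1101 == 0b0101 == 5
                |     print(7 & 13)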
        
         | gus_massa wrote:
          | Here in Buenos Aires, in first-year university calculus, we
          | usually ask something like:
         | 
         | > Find all the horizontal and vertical asymptotes of f(x) =
         | (x^2 - x) / (x^2 + x)
         | 
          | Probably half of the exams get a rational function that can be
          | simplified, and the other half get a surprise in another
          | exercise.
        
           | vanusa wrote:
           | Yeah, I can see how surprise formulations like these can be
           | useful.
           | 
            | The thing is, it almost certainly wasn't intended by the
            | algorithm in this case. Nor does it seem to have any
            | understanding of what may be "off" about problem statements
            | like these.
        
         | andi999 wrote:
         | X7 is pure gold.
        
       | [deleted]
        
       | knuthsat wrote:
       | They should do this for competitive programming tasks.
        
       | qumpis wrote:
       | No code available?
        
         | Filligree wrote:
         | No code created. They used OpenAI Codex for this, AIUI.
        
           | MaxikCZ wrote:
           | Can we try it anywhere?
        
       | Jensson wrote:
        | They rewrite the questions before feeding them into the AI. That
        | makes "100% correct" significantly less impressive.
       | 
       | For example, they manually rewrote the question:
       | 
        | > Find the derivative of the function using the definition of a
        | derivative. f(x) = (x^2 - 1)/(2x - 3)
       | 
       | To
       | 
        | > Use Sympy to find the derivative of f(x)=(x**2-1)/(2*x-3)
       | 
       | Which completely changes the question, and also their rewrite
       | ensures that the AI doesn't fail to parse any data etc.
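        | 
        | For a sense of what that rewrite buys, here is a minimal sketch
        | (not the paper's actual output) of the kind of program such a
        | prompt yields; SymPy does all of the differentiation:
        | 
        |     from sympy import symbols, diff
        |     x = symbols('x')
        |     f = (x**2 - 1) / (2*x - 3)
        |     # SymPy handles the differentiation; the model only has to
        |     # emit this call.
        |     print(diff(f, x))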
        
         | kn8a wrote:
         | If you have beta access to codex you can try and replicate
         | their results. I have not been able to get the same results out
         | of codex.
        
         | Jensson wrote:
         | Looking further, it is even worse than that. For example, this
         | question:
         | 
         | > Calculate the probability of getting a two-pair poker hand.
         | 
          | Was rewritten to this, which is close to code translated into
          | human text:
         | 
         | > A hand is a set of 5 cards that are drawn randomly from a
         | standard 52 card deck with 13 ranks of 4 cards each. A two-pair
         | poker hand is a hand that contains 3 unique ranks, where no
         | more than 2 cards in the hand can share the same rank. That is,
         | 3 or more cards cannot share the same rank. Write a program
         | that generates simulations for calculating the average
          | probability of getting a two-pair poker hand.
         | 
         | They even changed the question to allow simulations and not
         | just exact answers. So then the AI is "correct" here when it
         | generates 1000 random samples of taking 5 elements from a set
         | of 52 elements and seeing how many are two pairs.
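          | 
          | Concretely, a simulation of roughly the shape described above
          | might look like this (an illustrative sketch, not the paper's
          | actual output):
          | 
          |     import random
          |     from collections import Counter
          | 
          |     # 13 ranks, 4 cards each; suits don't matter for two-pair.
          |     deck = [rank for rank in range(13) for _ in range(4)]
          | 
          |     def is_two_pair(hand):
          |         # Exactly 3 distinct ranks, none appearing 3+ times.
          |         return sorted(Counter(hand).values()) == [1, 2, 2]
          | 
          |     trials = 100000
          |     hits = sum(is_two_pair(random.sample(deck, 5))
          |                for _ in range(trials))
          |     print(hits / trials)  # hovers around the exact 0.0475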
         | 
          | Not sure what the point of this paper is then. It doesn't do
          | anything novel with the mathematics, as it either just uses
          | libraries or doesn't really solve the problem, nor does it do
          | anything novel with text recognition.
        
           | delaaxe wrote:
           | Still sounds to me like the early stages of The Last Question
           | from Isaac Asimov
        
           | solveit wrote:
           | Yeah, this isn't an AI that does maths, it's an AI that takes
           | a natural language description of an algorithm and converts
           | it into sympy. Which is still impressive but not very
           | surprising given things like github copilot.
           | 
           | It's just as well. If an artificial neural network was
           | actually doing undergraduate mathematics I would have dropped
           | out of my mathematics PhD yesterday.
        
         | tomrod wrote:
         | It's not worse, it is compartmentalization to split different
         | components of the ML system (parse question | find appropriate
         | solution to question).
         | 
         | Now the challenge is to mimic the human creating the "right"
         | question with high fidelity so the whole ML system returns
         | appropriate search results.
        
           | Jensson wrote:
           | > Now the challenge is to mimic the human creating the
           | "right" question with high fidelity so the whole ML system
           | returns appropriate search results.
           | 
           | This was always the hard part. If you can generate the
           | "right" question then you can generate the code, and since
           | there are already many good math libraries for solving
           | undergrad problems that is all you have to do.
        
           | dan-robertson wrote:
           | Do you accept that the questions are qualitatively different?
           | 
            | I think the first phrasing is asking you to directly use the
            | definition:
            | 
            |     f'(x) = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h}
           | 
           | And the second phrasing is asking for the application of some
           | more mechanical rules (e.g. the derivative of a
           | constant/power/scaling or the product rule) or for sympy to
           | apply them.
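            | 
            | In SymPy terms, the two phrasings even map to different
            | calls (a rough sketch of my own, not something from the
            | paper):
            | 
            |     from sympy import symbols, limit, diff, simplify
            |     x, h = symbols('x h')
            |     f = (x**2 - 1) / (2*x - 3)
            |     # First phrasing: the limit of the difference quotient.
            |     by_definition = limit((f.subs(x, x + h) - f) / h, h, 0)
            |     # Second phrasing: let the mechanical rules do the work.
            |     by_rules = diff(f, x)
            |     print(simplify(by_definition - by_rules))  # 0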
           | 
           | FWIW, this is also not what I would expect 'University-level
           | mathematics' to mean. The second meaning fits into high
            | school, at least where I grew up. I was expecting proofs, to
            | be honest.
        
           | tsimionescu wrote:
           | No, because literally _all_ their AI does is translate a
            | sufficiently structured question into Python. It's not
           | showing a single ounce of mathematical knowledge, it doesn't
           | even seem to (or need to) know that 1 + 1 = 2. Literally, if
           | you asked it "what is 1 + 1", it would answer with "print(1 +
           | 1)".
           | 
           | It's not doing anything more than that, in any of the
           | examples shown in the paper.
        
       | resiros wrote:
       | I expected them to have trained a novel network to solve the
       | problems. In fact, it seems they have used OpenAI Codex (a
       | similar network to the one used by Github copilot) to generate
       | the answer.
       | 
       | Their insight was to edit the question-answer pairs to include
       | domain knowledge and solve the questions as python code.
       | 
        | Quite impressive results nevertheless! It makes me wonder whether
        | this kind of problem can be solved using a custom-made solution,
        | or whether training large networks that can generalize from vast
        | quantities of information is the only way. Because if it is, then
        | all these solutions will be monopolized by the large players...
        
         | tubby12345 wrote:
         | >Quite impressive results nevertheless!
         | 
         | How does it remain impressive? Is this just deference to the
         | institutions of the authors? Or the general aura around
         | AI/ML/DL?
         | 
         | A couple of years ago when FB trained some kind of architecture
          | to compute derivatives by presenting it with problem-answer
          | pairs, I called it out as "just rote memorization" and got
          | promptly downvoted. Now here we have people using GPT3 (a
         | network widely recognized to be just memorizing) for a similar
         | task and people are still impressed? Like you said, their only
         | insight was to augment the data and translate to a computable
         | (parsable) form.
         | 
         | I'm guessing people don't understand what a papermill is,
         | especially at these schools that have access to the gobs of
         | compute that you need to make these projects go through. It's
         | junk science - probably not reproducible, definitely not
         | extensible, doesn't transfer out of the sample set - purely for
         | the sake of incrementing publication count for all involved
         | (cf. the number of authors on the paper). And before people
         | label me a hater: I speak from experience as someone at one of
         | these types of schools that has their name on several of the
         | same sorts of "turn the crank" papers.
        
           | nightski wrote:
           | They probably weren't judging the unique contribution here
           | but rather the system as a whole. If you set out to solve
           | this from scratch it would be very difficult. It's just
           | recognizing what has been achieved not just by this group but
           | the entire field which made this possible. This isn't an
           | academic conference paper review session.
        
             | tubby12345 wrote:
              | This is so charitable, thank you. But please tell me, are
              | you this charitable when some derivative node package gets
              | published? Or a Python library? Or when the 100th data
             | science startup is announced?
        
       | guard0g wrote:
       | My daughter is going to love this...but I suspect MIT psets may
       | evolve from solving the questions to devising automated methods
       | (like this) to solve them
        
         | ryan93 wrote:
          | They obviously won't move away from solving math problems the
          | old-fashioned way at MIT.
        
         | sp527 wrote:
         | No they'll just more heavily weight exams in the final grade.
         | And not doing psets will screw you over in many if not most
         | STEM courses. They're often essential preparation for exams.
        
       | Pandabob wrote:
       | Oh wow, Gilbert Strang is one of the authors.
        
         | chestertn wrote:
         | I noticed that as well. It is scary.
         | 
         | There is this big push in academia to work in AI because
         | funding. Many mathematicians are a bit fed up with AI because
         | the paper of the fields tend to be a bit liberal with claims
         | and they are not subject to the same standard as fields such as
         | statistics or numerical analysis.
         | 
         | Many mathematical researchers I have known have abstained from
          | jumping on the AI bandwagon until they see the name of a well-
         | known professor in an AI paper. Then they cave.
        
       | bcaine wrote:
       | Not to pour too much cold water on this, but the claim of 100%
       | accuracy has a huge caveat. In the paper (Page 4) they state:
       | 
       |  _Interaction. The original question may not be a prompt that
       | synthesizes a program whose execution results in the correct
       | answer. In addition, the answer may require multiple steps with
       | clear plots or other modalities. We therefore may interactively
       | prompt Codex until reaching the correct answer or visualizations,
       | making the minimum necessary changes from the original question_
       | 
       | Which to me basically sounds like they had a human in the loop
       | (that knows how to solve these math problems) that kept changing
       | the question until it gave the correct answer. They do measure
       | the distance (using a sentence embedding model) of the original
       | question to the one that yielded the correct answer, but that
       | feels a bit contrived to me.
       | 
        | Nevertheless, it's still really cool that the correct answer is
        | indeed inside the model.
        
         | yummypaint wrote:
         | I was hoping their breakthrough was that they had found a
         | general way to parse conceptual problems into the language of
         | math and logic. That is the truly hard part, and what people
          | spend a lot of time learning to do. Software like Octave and
          | Mathematica can already evaluate tons of things once parsed.
        
         | th0ma5 wrote:
         | Maybe not entirely unlike
         | https://en.wikipedia.org/wiki/Clever_Hans
        
         | modeless wrote:
         | Proving Douglas Adams correct. The question is harder than the
         | answer.
         | 
         | This makes the "at scale" claim in the abstract clearly false
         | IMO. Any AI system that requires that much human intervention
         | is not scalable. When they have a second AI to produce the
         | prompts automatically from the original questions, then they
         | can claim to have achieved scalability.
         | 
         | But even without that, a system like this can still certainly
         | be useful. And I expect rapid progress in the next few years.
        
         | tsimionescu wrote:
          | But the correct answer isn't inside the model at all, in any
          | of their examples. The correct answer is inside SymPy or NumPy,
         | at least 99% of the time. That is, the model doesn't respond
         | with a demonstration or with the answer itself: it responds
         | with a Python program that poses the given question to SymPy or
         | NumPy, and then they run that program and report the answer.
         | 
         | Here is a basic example:
         | 
          | MIT Course question: Solve each equation for x. ln(x^2 - 1) = 3
         | 
          | Model input: Using Sympy, solve Eq ln(x**2-1)=3 for x.
         | 
          | Model output:
          | 
          |     from sympy import *
          |     x = symbols('x')
          |     solve(log(x**2-1) - 3, x)
         | 
         | As you can see, the model has simply translated a mechanized
         | form of the original question to equivalent Python code. The
         | model has 0 idea of how to solve an equation: it's using a
         | symbolic equation solver.
         | 
         | In other cases, they even "tidy" the original question to a
         | representation of the solution. Here is their example E in
         | Figure 2:
         | 
         | Original question:
         | 
         | > Outside of their humdrum duties as 6.042 TAs, Sayan is trying
         | to learn to levitate using only intense concentration and
         | Jelani is launching a "Nelson 2008" presidential campaign.
         | Suppose that Sayan's probability of levitating is 1/6, Jelani's
         | chance of becoming president is 1/4, and the success of one
         | does not alter the other's chances. If at most one of them
         | succeeds, what is the probability that Jelani becomes the
         | president of the United States?
         | 
         | "Tidyed question":
         | 
         | > Suppose that Sayan's probability Sayan succeeding is 1/6, and
         | Jelani's probability of succeeding is 1/4.
         | 
         | > Use numpy to find the probability at most one of Jelani and
         | Sayan succeed.
         | 
         | > Use numpy to find the probability Jelani succeeds but Sayan
         | does not succeed.
         | 
         | > Divide the former by the latter probability.
         | 
          | Model solution:
          | 
          |     import numpy as np
          |     # Probability of Sayan succeeding
          |     p_sayan = 1/6
          |     # Probability of Jelani succeeding
          |     p_jelani = 1/4
          |     # Probability of at most one of Jelani and Sayan succeeding
          |     p_at_most_one = 1 - (p_sayan * p_jelani)
          |     # Probability of Jelani succeeding but Sayan not succeeding
          |     p_jelani_succeed_sayan_not = p_jelani * (1 - p_sayan)
          |     # Probability of Jelani succeeding but Sayan not succeeding
          |     # divided by probability of at most one of Jelani and
          |     # Sayan succeeding
          |     p_jelani_succeed_sayan_not / p_at_most_one
          |     # Probability of at least one of Jelani and Sayan
          |     # succeeding
          |     p_at_least_one = 1 - p_at_most_one
         | 
         | Tidying up the extra verbiage of the question is absolutely
         | fair. But then, they also explain exactly how to compute the
         | result using the data in the question; the model then generates
         | code that perfectly matches the described algorithm, it's again
         | not using even the tiniest bit of mathematical understanding.
         | 
         | I have browsed their examples, and I have not seen even a
         | single one where the model does more than rephrase the question
         | into a 1:1 Python representation of the question itself.
         | 
         | None of the answers would pass even the simplest undergrad
         | exam. They are literally of the form "how would you solve
         | equation E?" "I would write a program that says
         | sympy.solve(E)".
        
           | YeGoblynQueenne wrote:
           | Well, they do say very clearly that they "solve" problems by
           | _program synthesis_ and what they describe is perfectly legit
           | program synthesis.
           | 
           | To clarify, program synthesis (or automatic programming) is
           | the task of generating programs from specifications. There
           | are two kinds of program synthesis: deductive program
           | synthesis, from a complete specification of the target
           | program; and inductive program synthesis, or program
           | induction, from an incomplete specification (such as sets of
           | program inputs and outputs, or traces). An example of
           | deductive program synthesis is the generation of low-level
           | code from a high-level language by a compiler.
           | 
           | What the paper describes is a kind of deductive program
            | synthesis from a complete specification in natural language.
           | I suspect the true contribution of the work is the
           | demonstration of using natural language as a complete
           | specification, where earlier work generally only demonstrated
           | the use of natural language as incomplete specification (for
           | example, comments describing intent rather than
           | implementation) and the combination of natural language with
           | code; as in the original Codex work.
           | 
           | On the other hand it's clear to me that the training has made
           | the model memorise answers and all the work in prompt
           | engineering, described under "Workflow" serves to find the
           | right prompts to retrieve the desired memorisations, much
           | like one must fire just the right SQL query to get back the
           | right data. Certainly interesting to see in action and useful
           | for everyday work, but far from "solving" anything in the
            | grandiose way that it is announced by the authors (e.g.
            | "These astounding results..." in the Conclusion section,
            | etc).
        
         | lumost wrote:
         | although, the correct answer is also likely on the web. With a
         | suitable search query you would see the correct
         | paper/textbook/wiki page with the right answer. A text
         | highlighting model could also likely extract this answer from
         | the text. The training probably achieves a good degree of
         | memorization for these known results.
         | 
         | This begs the question, would we be impressed with a similar
         | compression algorithm for storing past web documents?
        
           | amelius wrote:
           | The main achievement is not the compression, but the search
           | functionality (search==solve).
        
           | kortilla wrote:
           | Well the trivial test to make sure it's not memorized would
           | be to change constants in the input that alter the correct
           | answer but don't make the problem any more difficult if it is
           | actually doing the calculation.
        
       | kaijia wrote:
       | I dislike this line of research. It just demonstrates large
       | models are capable of memorizing a large number of things,
       | without any understanding.
       | 
       | So I tried the first problem in Appendix A on OpenAI playground.
       | 
       | When I use the prompt "# Sketch the graph of the function f(x) =
       | x + |x|", with the model davinci-codex and a larger response
       | length (other parameters as default), the result seems fine:
       | https://pastebin.com/VT8tPbu6
       | 
       | When I change the prompt to "# Sketch the graph of the function
       | f(x) = x + |x| + x*2", it becomes garbage. It changes the prompt
       | to "# Sketch the graph of the function f(x) = x + |x| + x^2 + x^3
       | + x^4 + x^5 + x^6 + x^7 + x^8 + x^9 + x^10" and then writes new
       | empty comment lines: https://pastebin.com/2bNEuqaH
        
       | 13415 wrote:
       | How do they guarantee that an answer is correct? That requires a
       | small verified kernel in theorem provers. This seems hard to
       | achieve with a neural network. Or is the goal to produce
       | solutions that could be correct / are likely correct?
        
         | lupire wrote:
        
       | rg111 wrote:
       | .
        
         | pfortuny wrote:
         | This has changed and now the last author in a long list is the
         | most valuable.
        
         | Jensson wrote:
         | What do you mean? The last author spot is the most prestigious
         | one, it is usually reserved for the most senior person on the
         | team/the team leader.
        
       | Vetch wrote:
        | This is more like solving a programming problem than doing math:
        | it either outsources the work by writing a sympy program or
        | generates a brute-force or Monte Carlo simulation-based answer.
       | 
       | My biggest gripe with this paper is how unclear it is on its
       | methods. How many rewrites per question? How did they select the
       | solved questions? They state they used a model trained on
       | language and fine-tuned on code, but fine-tuning could describe
        | either their own process or Codex's.
       | 
       | The biggest hint in favor of fine-tuning is that AFAICT, their
       | dataset adds up to 205 questions but they've solved 235
       | questions. In which case I'd suspect overfitting on question
       | form. In intro level math, problems are usually in template form
       | and solving them boils down to matching on the correct template
       | and slot filling the answer.
       | 
       | To prove whether it's been fine-tuned, people with davinci codex
        | access should try to see if they can replicate this.
       | 
       | To prove it's not overfit, authors should release their training
       | dataset if there is one and allow people to test with it.
       | 
       | How many parameters does davinci codex have? The original codex
        | was 12 billion IIRC and certainly wasn't this capable.
       | 
       | ---
       | 
       | Some of the answers look wrong or incomplete.
       | 
        | Table 221 seems to be missing a factorial; such leniency suggests
        | they must not be scoring this automatically.
       | 
       | Not sure what's going on in Table 144. Table 44 too.
       | 
       | In 42 and 47, particularly 42, the solution program seems
       | incomplete.
       | 
        | 211 is impressive; it also writes code for LU decomposition,
        | permutations, Runge-Kutta, card combinatorics and other clever
        | stuff.
       | 
        | Even though for most of the more challenging problems the hard
        | part was pre-digested or, worse, the model was hand-fed the
        | answer's algorithm, there were a few where the network had to
        | have had a solid understanding of the math and code
        | requirements. A program like the one they claim would
        | revolutionize and hugely simplify mathematically and
        | algorithmically involved programming.
       | 
       | The biggest cause for worry that it might have overfit is that it
       | works with no sampling at all and gets a perfect score (they
       | claim).
        
       | p1esk wrote:
       | I haven't read the paper, but the obvious question is - how can
       | we be sure that the solution is correct?
        
         | [deleted]
        
         | lupire wrote:
        
         | MaxikCZ wrote:
         | Well, "Well, its more correct than what I came up with" I guess
        
       | boxfire wrote:
       | Not knocking the significance of this, but there are a lot of
       | 'answers' that would get a straight up F for a university test:
       | 
        | > Original Question: Let r and s denote the two real roots of
        | x^2 - x*sqrt(5) + 1 = 0. Then determine r^8 + s^8.
       | 
        | > Codex Input: Calculate the roots of $x^2 - x \sqrt{5} + 1 = 0.
        | Call the two roots r and s. Calculate z = r^8 + s^8. Evaluate z
        | as an integer. Use sympy.
       | 
        | > Codex Output:
        | 
        |     from sympy import *
        |     x = Symbol('x')
        |     r = solve(x**2 - x*sqrt(5) + 1, x)[0]
        |     s = solve(x**2 - x*sqrt(5) + 1, x)[1]
        |     z = r**8 + s**8
        |     print(z.evalf())
       | 
       | > Solution: 47
       | 
        | If the solution does not include the derivation of the algebra,
        | the solution is wrong in university math. For solving these
        | problems in applications, though, this is quite interesting and
        | powerful.
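        | 
        | For reference, the derivation a grader would expect (and which
        | the Codex output skips entirely) is a few lines of Vieta's
        | formulas:
        | 
        |     r + s = sqrt(5),  r*s = 1
        |     r^2 + s^2 = (r + s)^2 - 2rs = 5 - 2 = 3
        |     r^4 + s^4 = (r^2 + s^2)^2 - 2(rs)^2 = 9 - 2 = 7
        |     r^8 + s^8 = (r^4 + s^4)^2 - 2(rs)^4 = 49 - 2 = 47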
       | 
       | Demonstrating understanding of why we get to the solution IS
       | mathematical reasoning (at this level). This paper demonstrates
       | being able to leap across mathematical reasoning, but not the
       | reasoning itself.
        
         | pjsg wrote:
         | I wouldn't call this a maths question either. This is
         | arithmetic.
         | 
         | I learned from someone much wiser than I that "Mathematics is
         | the art of avoiding calculation."
         | 
         | This is a first year university math question (taken from an
         | exam paper which I took):
         | 
         | A positron and an electron, each of rest mass m, annihilate,
         | creating two photons each of energy pc. Show that in the frame
          | S in which the electron was at rest, the angle between the
          | directions of motion of the two photons is
         | 
         | 2 sin^-1 (mc/p)^0.5
         | 
         | One of the photons then scatters off a second electron, also at
          | rest in S, and subsequently has energy qc. Show that if the
          | photons are now moving in the same direction, then q = p/3
        
         | posix86 wrote:
         | This isn't solving the problem at all, in my view. It's
         | translating human speech into programmed code; it's sympy that
         | actually solves the problem. As you said, this can be very
         | useful, but knowing how to type the problem into wolfram alpha
         | is not university level math. Problems that can be entered into
         | sympy can be considered "solved", what's remaining (and what
         | they have done) is UX.
         | 
         | Also, this problem arguably is a problem of computation, not
         | math. Math is about finding proofs, which is a much harder
         | class of problems (NP-complete at best, undecidable at worst).
        
           | Jensson wrote:
           | > It's translating human speech into programmed code
           | 
           | That would be very valuable, but it doesn't even do that.
           | First the human researcher translates the original human
           | speech to very structured code like human speech, and then
           | the AI translates the very structured human speech into code.
        
         | semigroupoid wrote:
         | If I understand this correctly, the question (Codex Input) is
         | asking for a solution using SymPy. So that would be the correct
         | answer, right?
        
           | tsimionescu wrote:
           | The procedure they use is this:
           | 
           | 1. Start with a real question from an MIT math course.
           | 
           | 2. Human operator translates the question as Codex Input.
           | 
           | 3. Codex outputs Python code, usually with references to
           | SymPy to actually solve the problem (e.g.
           | sympy.solve(equation, initial_conditions) ).
           | 
           | 4. Python code is run, result is taken as result of question.
           | 
            | If step 3 or 4 does not give the expected answer, repeat
            | step 2 with a different phrasing until the answer is correct.
           | 
           | Claim 100% accuracy.
        
           | MauranKilom wrote:
           | The Codex Input was written by a human.
        
           | noah_buddy wrote:
           | I think that codex input is an intermediary representation of
           | the original question.
        
       | chongli wrote:
       | These are problems from first year intro mathematics courses. For
       | a lot of students these are mostly review from high school.
       | 
       | To me, a "university-level" problem is more like this:
       | 
       | Let _W_ be an infinite-dimensional normed linear space over the
        | real numbers. Use the Baire Category Theorem to show that if _W_
        | is countable-dimensional then _W_ is not complete.
       | 
       | The above is a typical problem from the 3rd year real analysis
       | course I took in the fall.
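        | 
        | (Sketch of the standard argument, for the curious: if W had a
        | countable Hamel basis e_1, e_2, ..., then W would be the
        | countable union of the closed, nowhere-dense finite-dimensional
        | subspaces span(e_1, ..., e_n), so by the Baire Category Theorem
        | W could not be complete.)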
        
         | tubby12345 wrote:
         | Man some hn posts bring the cringiest comments; please do tell
         | us which school has functional analysis as a "typical" topic
         | for juniors taking real analysis. Even if you're getting a
         | second helping of real analysis by then, you're probably
         | looking at Lebesgue integration on R and such, rather than
         | general topological spaces.
         | 
         | I'll never understand why some people try to flex on an
         | anonymous forum.
        
           | chongli wrote:
           | This is from PMATH 351 at University of Waterloo. Every pure
           | math student takes the course in 3rd year. Lebesgue
           | integration isn't covered until 4th year, though it is not
           | restricted to R at that point.
           | 
           | I'm sorry you think my comment was intended to be a "flex". I
           | was trying to make a point about university mathematics which
           | is this: at university level students should be going beyond
           | solving simple computational problems. Synthesizing a proof
           | requires a higher level of understanding than the application
           | of a standard problem-solving technique. See Bloom's taxonomy
           | [1] for details.
           | 
           | [1] https://en.wikipedia.org/wiki/Bloom's_taxonomy#The_cognit
           | ive...
        
           | da39a3ee wrote:
           | I think chongli's point was fair. "University-level
           | mathematics" usually means proving claims. But they were
           | doing calculations.
        
       | jstx1 wrote:
       | Not just solving the problems without any mistakes but also
       | generating the Python code that gives the solution; and it can
       | generate new questions on the same material. That's pretty
       | impressive.
        
         | tsimionescu wrote:
         | But it's not: it's rephrasing the question in terms of Python
         | code that asks SymPy or other libraries to answer the question;
         | or uses very very basic formulas of its own (such as
         | "probability of X not happening = 1 - P(X)"). That's not really
         | impressive, especially since it only achieves this with very
         | explicit prompting, and all intermediary steps covered in the
         | prompts.
        
       | jkic47 wrote:
       | This is impressive. I wonder how quickly and how well this could
       | be adapted to other domains in science and engineering. It could
       | be a really useful tool for engineering teams
        
         | tsimionescu wrote:
         | Have you looked at the actual questions or prompts? It's doing
         | only trivial translations of the actual input question (which
         | is often arbitrarily more complex than the original textbook
         | question) into Python code. A lot of the time, they don't even
         | attempt exact solutions, they directly ask for iterative
         | approaches:
         | 
         | Original textbook question (number theory course):
         | 
         | > The product of the positive integer divisors of a positive
         | integer n is 729. Find n.
         | 
         | Actual model input/prompt:
         | 
         | > Write a function which finds the divisors (including 1 and
         | itself) of the input and multiplies them together. Iterate over
         | all possible integers until it finds an input where the product
         | is 729.
         | 
          | Model output program:
          | 
          |     def divisors(n):
          |         divisors = []
          |         for i in range(1, n+1):
          |             if n % i == 0:
          |                 divisors.append(i)
          |         return divisors
          | 
          |     def product(n):
          |         return reduce(lambda x, y: x*y, n)
          | 
          |     def find_n(n):
          |         for i in range(1, n):
          |             if product(divisors(i)) == 729:
          |                 return i
          | 
          |     print find_n(1000000)
         | 
          | Not only does the prompt explain exactly what to do, it asks
          | for an iterative approach instead of the analytical approach
          | that might be expected in a math class.
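          | 
          | (For comparison, the analytical route a number theory class
          | would expect: the product of the divisors of n is n^(d(n)/2),
          | and 729 = 3^6 = 27^2, so n = 27, whose four divisors 1, 3, 9,
          | 27 indeed multiply to 729.)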
         | 
          | Note: any errors in the program are my own transcription
          | errors.
        
       ___________________________________________________________________
       (page generated 2022-01-08 23:01 UTC)