[HN Gopher] GPT-3 can run code
___________________________________________________________________
GPT-3 can run code
Author : maytc
Score : 163 points
Date : 2022-03-29 15:31 UTC (7 hours ago)
(HTM) web link (mayt.substack.com)
(TXT) w3m dump (mayt.substack.com)
| unixhero wrote:
| Great so how do I run GPT-3 on my own hardware at home?
| mr_toad wrote:
| It's not available to the public or open source, so you can't.
| Only the smallest models might run on a single GPU; the largest
| would need a large grid.
| DC-3 wrote:
| Very far from an expert on ML, but isn't GPT-3 trivially not
| Turing Complete since it halts deterministically?
| [deleted]
| charcircuit wrote:
| >Is GPT-3 Turing complete? Maybe.
|
| It's obviously not. To handle infinite loops it would need to
| solve the halting problem, which is not possible.
| anyfoo wrote:
| I don't quite understand your answer. You don't need to solve
| the halting problem to be Turing complete, quite obviously. Why
| would GPT-3 need to in order to be?
| mountainriver wrote:
| This is such an interesting field but I think there needs to be
| more focus on determinism and correctness. The stuff that's
| happening with retrieval transformers is likely where this is
| heading
| a-dub wrote:
| is there a search engine for the training data so that one can
| verify that it is actually performing novel operations and not
| just quoting back stuff from its incredibly large training set?
| algon33 wrote:
| If I remember rightly, the AlphaCode paper includes a list of
| benchmarks, including the results of a finetuned GPT-3 for
| coding. I think they did it because Codex wasn't available to
| them when they were doing their tests, but I might be wrong there.
| imranq wrote:
| An interesting research direction would be to see how much
| GPT-3 deviates as we get more precise on various computational
| tasks. Possibly this would give some measure of the concepts
| the model has learned.
| sho_hn wrote:
| Do we today have any test suites/benchmarks for models along
| those lines?
| kaetemi wrote:
| It has a ton of programming books in its training data. It only
| "runs" code that is close enough to samples it has seen alongside
| their output. Anything complex, and it fails, because it
| does not reason about it logically. It's bad at the same things
| humans are bad at.
| mr_toad wrote:
| Human programmers rely on intuition and experience much more
| than some people give them credit for. An experienced
| programmer can find common errors quickly, simply because
| they've seen (and made) so many.
|
| Being able to intuit what a block of code does is actually a
| core skill; having to actually step through code in your head
| is slow and difficult.
| Avalaxy wrote:
| Just because you can, doesn't mean that you should. For some
| things it's just better to use a rules-based engine that is
| always correct, rather than a heuristics based algorithm that
| gives answers that are merely close.
| tasty_freeze wrote:
| I don't think the author of the piece (or anyone for that
| matter) thinks GPT-3 should be used for running programs or
| evaluating functions.
|
| It is being discussed because it is surprising that GPT-3 can
| do it at all. It is worth investigating what types of emergent
| knowledge and behavior are encoded in the trained network, as
| the boundaries of its capabilities may help illuminate future
| neural network architecture design.
| kevincox wrote:
| I find it quite interesting that in the JSON to YAML example it
| reordered the list. If this were an access control list, that
| could be a serious security issue, one that could easily be
| missed in review (especially if dozens of files like this were
| changed at once). Of course, a malicious user could have done
| this as well and likely gotten it past code review, but the fact
| that it was accidental is scarier in a way.
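| For contrast, a deterministic converter would never do this. A
| minimal sketch with PyYAML (my choice of tool here, not something
| the article uses):
|
|     import json
|     import yaml  # pip install pyyaml
|
|     acl_json = '{"allow": ["alice", "bob", "carol"]}'
|     data = json.loads(acl_json)
|
|     # sort_keys=False keeps mapping key order; list order is
|     # always preserved by the dumper.
|     print(yaml.safe_dump(data, sort_keys=False))
|
| A rules-based tool either preserves the ordering or fails loudly;
| it never silently shuffles entries.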
| timdellinger wrote:
| I assume that GPT-3 is just exhibiting rote memory. For small
| numbers, it has accurate answers memorized from the training set,
| but for larger numbers, it just "remembers" whatever is close...
| hence the ability to estimate.
|
| My take is not that GPT-3 can run code, but rather that GPT-3 has
| memorized what code looks like and what the output looks like.
| lopatin wrote:
| Can someone explain for a dummy how this is possible? How does it
| know that range() is zero indexed? Was it specifically trained on
| Python input/function/output data? Or did it just "learn" it? Do
| the researchers know how it learned it?
|
| Does it actually "run" the code? Like, if it was looping over 1
| billion iterations would it take 1B times longer than if it was
| just one iteration? I have so many questions.
| MauranKilom wrote:
| > How does it know that range() is zero indexed?
|
| If you read through all of the internet once, would _you_ know
| that range() is zero indexed?
|
| > Like, if it was looping over 1 billion iterations would it
| take 1B times longer than if it was just one iteration?
|
| It clearly cannot, because querying the network for a token
| executes the exact same sequence of operations every time.
|
| But it's very impressive that it can basically recognize the
| Collatz Conjecture in the code and mostly guess in the right
| ballpark for the results.
|
| The fact that it's just likening (in a loose sense) inputs to
| inputs it has seen is quite visible in the f(g(x)) vs g(f(x))
| behavior - the former is significantly more common, so it
| struggles to work with the latter.
| etskinner wrote:
| It seems more likely that it learned it. If you knew nothing
| about Python, but understood the word "for" a little, and
| understood code a little, you're likely to figure out that
| range() is zero-indexed after you see something like this a few
| times
|
| >>> for i in range(3): print(i)
| 0
| 1
| 2
| lopatin wrote:
| My mind is just blown that it learned a language runtime
| based on examples. What would happen if you gave it an
| infinitely recursive function? It can't stack overflow,
| there's no stack! Wait, is there?
| stevenhuang wrote:
| My guess is it would respond with the standard stack
| overflow error, from examples of similar output posted in
| its training set.
| lucidrains wrote:
| https://alphacode.deepmind.com/ gives you a glimpse inside of
| what emerged from a similar attention net trained on code.
| however, whether the attention net has been applied to pixels,
| language, or amino acid sequences, the resultant representations
| are a bit beyond human reasoning, even if we can examine what
| individual attention heads are 'looking' at
| [deleted]
| [deleted]
| [deleted]
| mhh__ wrote:
| How do you know a range is zero indexed? (As in how is it
| stored in your brain)
| lopatin wrote:
| I have no idea how it's stored in my brain. Is that the same
| way it's stored in GPT-3?
| ogogmad wrote:
| Theorem: There are infinitely many prime numbers.
|
| Proof:
|
| _[GPT-3's "proof"]_
|
| Let p be a prime number. Then by definition, p is a positive
| integer that has no positive divisors other than 1 and p.
|
| Now consider the integer q = p + 1. We will show that q is not a
| prime number, which will prove that there are infinitely many
| prime numbers.
|
| Since p is a positive divisor of q, q cannot be prime.
|
| _[...another attempt with the same prompt]_
|
| We know that there are infinitely many natural numbers. Let's
| suppose that there are only finitely many prime numbers. Then we
| can list them all:
|
| p_1, p_2, p_3, ..., p_n
|
| Now, consider the number
|
| N = p_1 p_2 p_3 ... p_n + 1
|
| This number is greater than any of the prime numbers in our list,
| so it cannot be divisible by any of them. But it is also a
| natural number, so it must be prime. This contradicts our
| assumption that there are only finitely many prime numbers, so we
| must conclude that there are infinitely many prime numbers.
|
| ~~ _Personal remark_ : I had to give it the same prompt like 5
| times to get a correct-ish proof. The first attempts came out
| like the first example.~~ The proof is still wrong.
| [deleted]
| nonameiguess wrote:
| This isn't really the "human level mathematician" equivalent
| task anyway. A human mathematician's main purpose isn't to
| memorize and reproduce proofs generated by other people. It's
| to prove original results no one else has proven before. To
| remember and reproduce existing proofs, I just typed "proof
| infinitely many primes" into DuckDuckGo and it gave me plenty
| of correct results.
| karpierz wrote:
| That's like saying "standing still" isn't a human-level
| sprinter's task. In principle, yes, nothing in the 100m
| sprint requires that you need to be able to stand still. In
| practice, I would be very skeptical of someone who can't
| stand claiming they can sprint.
| zardo wrote:
| It's a human level mathematics student problem. If it can't
| determine that its proof is nonsense here, there's little hope
| it could produce any worthwhile original work.
| jameshart wrote:
| What does GPT-3 come up with if you ask it for a proof that
| there are a finite number of primes? Or that pi is rational?
|
| I guess it would stitch together some more seemingly sensible
| statements that also don't quite add up to a rigorous proof?
| [deleted]
| gnulinux wrote:
| Both proofs are wrong; the second one is closest. The second one
| should not claim that N is prime (it likely isn't). It should say
| that N is not divisible by any of the p_i, and that, by the
| Fundamental Theorem of Arithmetic, N = q_1^c_1 ... q_k^c_k for
| some primes q_i, none of which are in {p_i}, which shows a finite
| list of all primes is impossible.
| brian_cloutier wrote:
| Interestingly, these attempts are about the same as what pops
| up when I try to remember the proof:
|
| - It's a proof by contradiction
|
| - The key step is in taking the finite list of primes,
| multiplying them together, and adding 1
|
| I then try to flesh out the details, it might take a second to
| realize that this new number is also prime, and then a few
| moments more to remember the exact rationale why.
|
| Along the way the proof lives in a kind of superposition where
| I'm not clear on the exact details. The "proofs" you gave here
| seem to be serializations of a similar superposition! GPT-3
| seems to remember the proof about as well as I do, but it's
| missing the final sanity check which tweaks the proof until all
| the pieces correctly fit together.
|
| In this case, you seem to be performing a version of this
| sanity check by running the prompt multiple times until a
| correct answer comes out. I wonder if it's possible to prove
| something more obscure using a similar process: GPT-3 comes up
| with ideas and the human sanity checks.
| ctoth wrote:
| I believe this recent paper demonstrates a method for
| allowing these large language models to perform this "sanity
| check" automatically[0].
|
| [0]: Self-Consistency Improves Chain of Thought Reasoning in
| Language Models https://arxiv.org/abs/2203.11171
| actually_a_dog wrote:
| The thing I find interesting about the proof attempts in the
| GP comment is that they very much resemble what you'd expect
| to see coming from a hypothetical somewhat confused
| undergrad. I think that ties into what you say about the
| proof living "in a kind of superposition where I'm not clear
| on the exact details," because that's where I imagine said
| hypothetical confused undergrad's understanding being.
| mr_toad wrote:
| It's imitation rather than true understanding. Still, even
| imitation is a remarkable ability for a computer.
| Banana699 wrote:
| >this new number is also prime
|
| Not necessarily, it might be composite, but in this case one
| of its prime factors will necessarily not lie in the
| supposed list of primes, therefore also a contradiction.
|
| The first counter example to "If L := {P0,P1,..,Pn} is a list
| of primes, then prod(L)+1 is prime" is {2,3,5,7,11,13}, their
| product is 30030, and 30031 is a composite of 2 primes, none
| of which are in the list.
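| A quick Python check of that counterexample:
|
|     from math import prod  # Python 3.8+
|
|     primes = [2, 3, 5, 7, 11, 13]
|     n = prod(primes) + 1
|     print(n, n % 59, n % 509)  # 30031 0 0, i.e. 30031 = 59 * 509
|
| Neither 59 nor 509 is in the original list, which is exactly the
| extra prime the argument needs.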
| falcor84 wrote:
| It's somewhat silly semantics, but I believe it is a valid
| deductive step on the way to the contradiction - if the
| number is not divisible by any other prime, then it must be
| a new prime.
| ivegotnoaccount wrote:
| The issue is that it is not divisible by any other prime
| *from the list*. The two cases (prime or composite) must
| be handled separately since they do not use the same
| logic to infer there is one more prime.
|
| For instance, 2 * 3 * 5 * 7 * 11 * 13 + 1 = 30031 = 59 *
| 509.
| ravi-delia wrote:
| But to get the contradiction, you assume a finite number
| of primes. As each of them does _not_ divide the new one,
| the new one is not divisible by a prime. It seems like
| your method is some kind of induction? Which probably
| gets a little closer to the "reason" for it, but isn't
| the standard proof I've seen.
| Tainnor wrote:
| You don't need two separate cases.
|
| Assume p1, ..., pn is a finite list of primes. The number
| p1*...*pn + 1 is divisible by a prime, because every natural
| number > 1 is. However, it's not divisible by any of p1, ...,
| pn, hence there must be an additional prime not in the list.
|
| (I think you're right though that GP's "contradiction"
| doesn't work)
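| For reference, the standard Euclid argument written out compactly
| (LaTeX):
|
|     Suppose $p_1, \dots, p_n$ were all of the primes and let
|     $N = p_1 p_2 \cdots p_n + 1$. Since $N > 1$, it has a prime
|     divisor $q$. But $N \equiv 1 \pmod{p_i}$ for every $i$, so
|     $q \notin \{p_1, \dots, p_n\}$: a prime missing from the
|     list, contradicting the assumption.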
| ogogmad wrote:
| I keep asking GPT-3 to prove that the LR algorithm (for finding
| eigenvalues and eigenvectors) converges for PSD matrices. It
| keeps insisting that it's a form of gradient descent. Is that
| true?
| daenz wrote:
| Nit, but YAML is a superset of JSON, so no conversion required :)
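| Most YAML parsers will load JSON as-is. A tiny sketch with PyYAML
| (assuming it's installed):
|
|     import yaml
|
|     doc = '{"users": ["alice", "bob"], "admin": true}'
|     print(yaml.safe_load(doc))
|     # {'users': ['alice', 'bob'], 'admin': True}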
| jefftk wrote:
| This sort of "do what I mean" situation, where doing the thing
| the user intended is different from doing something technically
| correct, is a place GPT-3 excels. Even though returning the
| input would be easiest, it has the pragmatic judgement to
| predict that's not what the user wants.
| mbowcut2 wrote:
| So, for people unfamiliar with deep language models like GPT,
| it's essentially a program that takes in a prompt and predicts
| the next set of words based on a training corpus -- which in
| GPT-3's case is a large portion of the internet. In these
| examples GPT is not executing any python code, it has just been
| trained on enough Python code/output to successfully predict what
| kinds of outputs these functions would produce.
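| In pseudocode the generation loop is roughly the following (the
| names here are illustrative, not any real API):
|
|     def generate(model, prompt_tokens, n_new):
|         tokens = list(prompt_tokens)
|         for _ in range(n_new):
|             # The model only ever scores "what token comes next?";
|             # any appearance of "running" code falls out of that.
|             next_token = model.predict_next(tokens)  # hypothetical
|             tokens.append(next_token)
|         return tokens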
| kcorbitt wrote:
| For folks wanting to play around with the GPT-3 code-editing
| capabilities referenced in the article within your own codebase,
| I wrote a simple open source VS Code plugin that lets you run
| commands against your currently-open file and get GPT-3's
| suggested edits back in a diff:
| https://marketplace.visualstudio.com/items?itemName=clippy-a...
| 58x14 wrote:
| I think I'm going to pair this with Copilot and see what
| happens. Hopefully I don't accidentally unlock something
| bizarre.
| zora_goron wrote:
| A quick question for anyone familiar with the architecture of
| these Transformer-based models -- I've heard that one reason why
| they don't work well with numbers is how the inputs are tokenized
| (i.e. as "chunks" rather than individual words/numbers). Is there
| anything architecturally preventing an exception to this form of
| tokenization in the data preprocessing step, passing numbers
| into the model as one digit == one token? It seems like
| such a change could possibly result in a better semantic
| "understanding" of digits by the model.
| [deleted]
| Veedrac wrote:
| Nothing prevents it, no. Transformers are certainly capable of
| learning mathematical tasks; consider [1] as an example, which
| uses big but regular token lengths.
|
| Alternatively you could just scale 'till the problem solves
| itself.
|
| [1] https://arxiv.org/abs/2201.04600
| learndeeply wrote:
| Anyone have any ideas on how they're doing text insertion using
| an auto-regressive model?
| lucidrains wrote:
| yes, they are most likely finetuning with this type of
| pretraining https://arxiv.org/abs/2103.10360 quite easy to
| build
| PaulHoule wrote:
| It would be remarkable if it got the right answers.
|
| But it can't because it doesn't have the right structure (e.g.
| GPT-3 finishes in a finite time, a program in a real programming
| language doesn't necessarily!)
|
| GPT-3's greatest accomplishment is that it has "neurotypical
| privilege", that is if it gets an answer that is 25% or 95%
| correct people give it credit for the whole thing. People see a
| spark of intelligence in it the way that people see faces in leaf
| axils or in Martian rock formations or how G.W. Bush looked in
| Vladimir Putin's eyes and said he got a sense of Putin's soul.
| (That was about the only thing in his presidency that he later
| said he regretted!)
|
| As an awkward person I am envious because sometimes it seems I
| get an answer 98% correct or 99.8% correct and get no credit at
| all.
| Micoloth wrote:
| GPT3 does _not_ think like a human, but it definitely executes
| code in a way that is more similar to a human than a computer..
|
| Proof is, that indeed humans _do_ get the wrong answer in
| quizzes like these sometimes!
|
| So I cannot understand this point of view of diminishing it as a
| "spark of intelligence". It is exactly what is advertised: a very
| big step forward towards real AI, even if definitely not the
| last one?
| PaulHoule wrote:
| It is the Emperor's New Clothes incarnate.
|
| It has the special talent of hijacking your own intelligence
| to make you think it is intelligent.
|
| People understood this about the 1966 ELIZA program but
| intellectual standards have dropped greatly since then.
| thrtythreeforty wrote:
| > GPT-3 struggles with large numbers, decimal numbers, and
| negative numbers. When used it returns answers that are close but
| often incorrect.
|
| Regarding GPT-3's "guesstimates," intuitively it feels like the
| network _has_ to guess because it hasn't been given a way to do
| exact computation--a neural network is built out of nonlinear
| functions--even if it "understands" the prompt (for whatever
| value you want to give to "understand").
|
| Are there any techniques that involve giving the model access to
| an oracle and allowing it to control it? To continue the analogy,
| this would be the equivalent of giving GPT-3 a desk calculator.
|
| If this is a thing, I have other questions. How do you train
| against it? Would the oracle have to be differentiable? (There
| are multiple ways to operate a desk calculator to evaluate the
| same expression.) Also, what control interface would the model
| need so that it can learn to use the oracle? (Would GPT-3 emit a
| sequence of 1-hot vectors that represent functions to do, and
| would the calculator have "registers" that can be fed directly
| from the input text? Some way of indirectly referring to operands
| so the model doesn't have to lossily handle them.)
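| One low-tech version of the idea, purely hypothetical (the CALC
| marker and the helper below are made up for illustration; nothing
| like this exists in GPT-3): let the model emit a marker for exact
| work and have ordinary code evaluate it.
|
|     import re
|
|     def fill_in_calculator_calls(model_output: str) -> str:
|         """Replace CALC(<arithmetic>) with the exact result."""
|         def _eval(m):
|             expr = m.group(1)
|             # Sketch only: allow digits, whitespace and + - * / ( ).
|             if not re.fullmatch(r"[\d\s+\-*/().]+", expr):
|                 return m.group(0)
|             return str(eval(expr))
|         return re.sub(r"CALC\(([^)]*)\)", _eval, model_output)
|
|     print(fill_in_calculator_calls("12 * 34 is CALC(12 * 34)."))
|     # 12 * 34 is 408.
|
| Because the oracle sits outside the network, it never needs to be
| differentiable; the open question is how to train the model to
| emit the marker in the first place.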
| ravi-delia wrote:
| I believe the dominant thinking is that GPT-3 has trouble with
| math because it doesn't see individual digits. It obviously has
| no trouble working on words, which are much more discrete than
| numbers. I wouldn't be surprised if it had trouble carrying a
| long equation though. When writing it can reconsider the whole
| context with each new word, externalizing that memory, but with
| most computations it would have to carry out the whole thing in
| one go. That's a lot of dedicated parameters for a single
| subtask.
| thrtythreeforty wrote:
| > with most computations it would have to carry out the whole
| thing in one go
|
| Is there a way to allow models to say "let me think about
| this some more"? With language models like GPT-3 you emit one
| token per inference iteration, with its previous output fed
| back in as input/state. Can models opt out of providing a
| token, but still update state? That would allow it to break
| up the computation into discrete steps.
| durovo wrote:
| I believe GPT-3 has a transformer-based architecture. So it
| doesn't recursively ingest its own output in each
| iteration. I believe attention-based transformer models
| have enough complexity to be able to learn what you are
| talking about on their own.
| thrtythreeforty wrote:
| Thank you for pointing out the difference. I went and
| reread about transformers; previously I thought they were
| a kind of RNN. (I am not an ML engineer.)
| ravi-delia wrote:
| I think it would work, but backprop would be computed in a
| different way every time. I'm not an expert, so there may
| be sneaky ways around it, but I'm pretty sure you'd lose
| out on a long history of little efficiency improvements
| when you could just make it more recurrent instead.
| daniel-cussen wrote:
| And that's where you see the man behind the curtain.
| AitchEmArsey wrote:
| Next year: GPT-NG offloads it's answers to Amazon
| Mechanical Turk, and we've come full circle.
| daniel-cussen wrote:
| Yeah for sure. With energy prices soaring, Moore's law
| being morally over since 2010, wages being so
| completely destroyed by the hatred Democrats have for
| them, and the sneaky little misconceptions and errors the
| golem's makers did not fight hard enough to let in, AI
| will be supplanted by plain I.
| edflsafoiewq wrote:
| Can it do math on "prose numbers", eg. "two thousand three
| hundred and four"?
| mirker wrote:
| Even the tokenization is wonky. Imagine if you had no concept
| of math characters and instead had a lookup table of common
| n-grams (BPE encoding). For example, the binary addition
| function "a+b" may be tokenized as a unary "3+b" because
| "3+b" occurs commonly. That tokenization is vastly different
| from "3.00000001+b". GPT has to invert this tokenization
| artifact with finite training data.
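| You can see this with the GPT-2 tokenizer, which uses the same
| style of BPE (a sketch with Hugging Face transformers; the exact
| splits depend on the vocabulary):
|
|     from transformers import GPT2TokenizerFast
|
|     tok = GPT2TokenizerFast.from_pretrained("gpt2")
|     for s in ["3+b", "3.00000001+b", "12345", "12,345"]:
|         # Superficially similar strings split into quite
|         # different token sequences.
|         print(s, "->", tok.tokenize(s))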
| visarga wrote:
| There are many papers trying to couple language models with
| external modules.
|
| In the Retrieval-Enhanced Transformer (RETRO) paper a large
| language model was coupled with a similarity based text index.
| It can populate the prompt with relevant information from the
| index thus being more grounded and update-able.
|
| In another paper (AlphaCode) the language model was coupled
| with a compiler and could run programs and check if they match
| the expected outputs for a few test cases. The model was able
| to solve competition style coding problems above average human
| score.
|
| In another paper (Language Models as Zero Shot Planners) a
| language model generates commands to navigate a virtual home
| environment and performs tasks. The knowledge in the LM helps
| in quickly learning tasks.
|
| A recent one can learn new concepts by simple conversation,
| then apply them where necessary. You can talk-train your model.
| (Memory assisted prompt editing to improve GPT 3 after
| deployment)
|
| So the trend is to add "toys" on language models - a simulator,
| a compiler, a search engine, a long term memory module.
|
| I'd like to see a recursive language model, that can sub-call
| itself to decompose problems.
| gwern wrote:
| You forgot all the inner monologue
| (https://www.gwern.net/docs/ai/gpt/inner-monologue/index) &
| scratchpad papers which give it additional steps or access to
| Python REPL etc: eg https://arxiv.org/abs/2112.15594
| https://arxiv.org/abs/2111.08267
| https://arxiv.org/abs/2111.08171
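| The simplest version is just a prompt that asks for intermediate
| steps (an illustrative prompt, not taken from those papers):
|
|     prompt = (
|         "Q: What is 123 * 45?\n"
|         "Work it out step by step, then give the answer.\n"
|         "Step 1: 123 * 40 = 4920\n"
|         "Step 2: 123 * 5 = 615\n"
|         "Step 3: 4920 + 615 = 5535\n"
|         "Answer: 5535\n"
|         "\n"
|         "Q: What is 231 * 52?\n"
|         "Work it out step by step, then give the answer.\n"
|     )
|
| Each emitted step becomes part of the context for the next token,
| so the model never has to carry the whole computation in a single
| forward pass.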
| visarga wrote:
| AI Chains really takes it to the next level.
| emmelaich wrote:
| An intriguing thought is that a GAI will behave very much like
| a well-read smart individual. With the faults, mystery and
| foibles that implies.
| spupe wrote:
| This is fascinating. I feel that we are still in the infancy of
| the field, however. These observations are analogous to
| naturalists of the past describing an animal's behavior, but we
| need to get to the point where more accurate estimates are made
| (i.e., how often does it do each thing, how accurate it is after
| 100+ tries, etc.). Every day we see a new observation showing
| what GPTs can do; we also need a good way to make these
| observations systematic.
| berryg wrote:
| I struggle to understand how GPT-3 executes code. Is it simply
| running a python (or any other language) interpreter? Or is GPT-3
| itself interpreting and executing python code? If the latter
| question is true that would be amazing.
| [deleted]
| bidirectional wrote:
| It is the latter.
| Veedrac wrote:
| > GPT-3 seems to have issues with large numbers. Moyix's gist
| covers this in detail. GPT-3 tends to guesstimate an algebraic
| function instead of evaluating the numbers, so the answer is only
| correct to a certain approximation.
|
| There are two issues here. One is the lack of working memory,
| which means that there is very little scratch space for
| calculating things with a meaningful sequential depth. GPT-3 is
| very unlike traditional evaluation methods in this regard, in
| that it is easier for it to interpret the meaning of a program
| you give it and then intuit the result given the context than it
| is to mechanically execute its steps.
|
| The other issue is the text encoding, which makes it much harder
| for GPT-3 to do digit-by-digit operations. Many arbitrary numbers
| are just their own token. A fixed-length number looks to us like
| a fixed number of characters, but for GPT-3 it can be an almost
| arbitrary number of tokens divided into almost arbitrary
| chunks. Using thousands separators is very helpful for it.
|
| If you account for these and design a prompt that mitigates them
| you can get much stronger results. Here is an example:
| https://news.ycombinator.com/item?id=30299360#30309302. I managed
| an accuracy of 42% for 3-by-3 digit multiplication.
| bitwize wrote:
| GPT-3 is starting to remind me of SCP-914. Give it an input, and
| its millions of tiny wheels churn and it produces something like
| what you want, but otherwise quite unexpected.
|
| Let's hope it doesn't turn into something like SCP-079...
| csmeder wrote:
| What year will GPT be able to take an app written in
| Swift/SwiftUI and output a spectacular Android translation?
| 3 years? 5 years? 10 years?
|
| This is an interesting benchmark because it is a very difficult
| problem. However, GPT has everything it needs to do this without
| a fundamental improvement to its core (this process is more of a
| science than an art), and using automated UI testing GPT can
| check whether its solution worked.
|
| Thus this challenge is in the realm of what GPT already is;
| however, once it can do this it will have massive implications
| for how software is built.
| anyfoo wrote:
| A terrible prospect.
|
| It's hard enough for people to faithfully port an application.
| People who participate and live in the world that makes up our
| reality. Leaving this up to an AI will at best flood us with
| low quality junk. At worst it's actively harmful.
| ivegotnoaccount wrote:
| > For example, it seems to understand how to find a sum, mean,
| > median, and mode.
| > Input: 1, 4, 5, 6, 2, 1, 1
| > Output: 2.28571428571
|
| Well, even with those small numbers, it's wrong. The first "2"
| after the dot should not be there. The result it gives is 16/7,
| not 20/7.
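| The exact values, for reference:
|
|     xs = [1, 4, 5, 6, 2, 1, 1]
|     print(sum(xs), sum(xs) / len(xs))
|     # 20 2.857142857142857  (20/7; GPT-3's 2.2857... is 16/7)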
| loganmhb wrote:
| I wonder how much of this is an illusion of precision that
| comes from pattern matching on content from filler sites like
| https://www.free-hosting.biz/division/16-divided-7.html (I do
| not recommend clicking the link, but the result appears there).
| aplanas wrote:
| Seems that it can convert from Python to Perl:
|
| https://beta.openai.com/playground/p/o4qZWSXVz8JMmVaI9j9NMIK...
| 7373737373 wrote:
| Has anyone tried using it for SAT problems yet?
| timdellinger wrote:
| my recollection is that the original journal article announcing
| GPT-3 included some data on how it performed against SAT-style
| questions
___________________________________________________________________
(page generated 2022-03-29 23:00 UTC)