[HN Gopher] Competitive Programming with AlphaCode
___________________________________________________________________
Competitive Programming with AlphaCode
Author : yigitdemirag
Score : 456 points
Date : 2022-02-02 16:13 UTC (6 hours ago)
(HTM) web link (deepmind.com)
(TXT) w3m dump (deepmind.com)
| pretendscholar wrote:
| I am a little bitter that it is trained on stuff that I gave away
| for free and will be used by a billion dollar company to make
| more money. I contributed the majority of that code before it was
| even owned by Microsoft.
| visarga wrote:
| Paying it forward, it will help others in turn.
| pretendscholar wrote:
| Yes it will help the already powerful players
| disproportionately.
| alphabetting wrote:
| They open-sourced AlphaFold for anyone to use commercially
| despite a big financial incentive to keep it private and use
| it in their new drug discovery lab. No idea how this works or
| differs from AlphaFold, but I imagine they'll do the same here
| if possible.
| pretendscholar wrote:
| Only after another lab made their own open source one
| that was comparable.
| kzrdude wrote:
| The problem is not really that microsoft owns github, or
| that licenses allow corporations free use, but that the
| tech giants are so big and have so much power.
| Permit wrote:
| Can you elaborate and give some history? What code did you
| contribute, and how did it end up being used by Microsoft and
| then DeepMind?
| arendtio wrote:
| > We pre-train our model on selected public GitHub code and
| fine-tune it on our relatively small competitive programming
| dataset.
|
| But since the code was 'selected' you don't know if your code
| was used. However, they seem to have used Python and C++, so
| my code is probably not part of it.
| [deleted]
| hmate9 wrote:
| Between this and GitHub Copilot (built on OpenAI's Codex),
| "programming" will probably slowly start dying. What I mean is:
| sure, you have to learn how to program, but our time will be
| spent much more on just the design part and writing detailed
| documentation/specs, and then we just have one of these AIs
| generate the code.
|
| It's the next step. Binary code < assembly < C < Python <
| AlphaCode
|
| Historically it's always been about abstracting and writing less
| code to do more.
| mhzsh wrote:
| Creating a higher level abstraction is something people have
| been trying to do for decades with so-called 4th-generation
| languages. At some point, abstracting away too much makes a
| tool too cookie-cutter, and suddenly deviating from it causes
| more difficulty.
| visarga wrote:
| Maybe it's not more abstraction we need, just automating the
| drudgery. Abstractions are limited - by definition they
| abstract things away, they are brittle.
| vvilliamperez wrote:
| Read: Ruby on Rails
| streetcat1 wrote:
| First, if this is correct and AlphaCode has succeeded, it will
| bring about its own demise.
|
| I.e. as soon as it starts replacing humans, it will not have
| enough human generated training data, since all of programming
| will be done by models like itself.
|
| Second, AlphaCode was specifically trained for competitive
| programming:
|
| 1. Short programs. 2. Each problem has hundreds of human-generated
| solutions.
|
| However, commercial programs are:
|
| 1. Long. 2. Have no predefined answer, or even a correct answer.
| 3. Need to use/reuse a lot of legacy code.
| chroem- wrote:
| Reinforcement learning and adversarial training can render
| both of those concerns as non-issues in practice.
| ialyos wrote:
| The phrase "in practice" doesn't really work when you're
| referring to highly finicky strategies like RL and
| adversarial training
| AnIdiotOnTheNet wrote:
| > as soon as it starts replacing humans, it will not have
| enough human generated training data, since all of
| programming will be done by models like itself.
|
| As a natural born pessimist, I can't help but feel that by
| the time we get to that point we'll just keep blundering
| forward and adapting our world around the wild nonsense
| garbage code the model ends up producing in this scenario.
|
| After all, that's basically what we've done with the entire
| web stack.
| pjmorris wrote:
| I'd note that assembly, C, and Python didn't replace
| 'programming' but were expected to do so. I'd wager that what
| you now call 'detailed documentation/specs' will still be
| called programming in 10 or even 20 years.
| falcor84 wrote:
| If you could change a sentence in the documentation and then
| run a ~1min compilation to see the resulting software, it
| would be a very different kind of programming. I suppose
| it'll give a new meaning to Readme-Driven-Development.
| wittycardio wrote:
| Solving competitive programming problems is essentially solving
| hard combinatorial optimization problems. Throwing a massive
| amount of compute and gradient descent at the problem has
| always been possible. If I'm not mistaken what this does is
| reduce the representation of the problem to a state where it
| can run gradient descent and then tune parameters. The real
| magic is in finding structurally new approaches. If anything
| I'd say algorithms and math continue to be the core of
| programming. The particular syntax or level of abstraction
| don't matter so much.
| jdlshore wrote:
| > If anything I'd say algorithms and math continue to be the
| core of programming.
|
| I disagree; I think the core of programming is analyzing
| things people want and expressing solutions to those wants
| clearly, unambiguously, and in a way that is easy to change
| in the future. I'd say algorithms and math are a very small
| part of this work.
| wittycardio wrote:
| That's not programming, that's called being a good
| employee. Any person in any role should be doing that.
| Programming is about algorithms and math. Now a good
| employee who's in a technical role should have both.
| jdlshore wrote:
| > Programming is about algorithms and math.
|
| You've simply restated your opinion without providing any
| supporting arguments, and as I already said, I disagree.
| The vast majority of programming I see (and as a
| consultant, I see a fairly wide variety) is not about
| algorithms and math, but instead gluing together systems
| and expressing domain logic.
|
| Now, I suppose you could argue that domain logic is
| "algorithms and math," but in my experience, it's less
| about the specific algorithms and more about precisely
| describing fuzzy human behavior.
|
| It's that "precisely describing" and "easy to change in
| the future" parts that makes what programmers do
| different than what any good employee does.
|
| (I do agree that there is some programming that is
| focused on algorithms and math, but it's in the minority,
| in my experience. Perhaps the type of work you do _is_
| focused on algorithms and math, but I believe that's a
| relatively small part of the software development
| ecosystem.)
| chroem- wrote:
| > Solving competitive programming problems is essentially
| solving hard combinatorial optimization problems.
|
| True, but if you relax your hard requirements of optimality
| to admit "good enough" solutions, you can use heuristic
| approaches that are much more tractable. High quality
| heuristic solutions to NP-hard problems, enabled by ML, are
| going to be a big topic over the next decade, I think.
| wittycardio wrote:
| I should correct myself, this isn't even that. This is just
| text analysis on codeforces solutions, which makes it even
| worse than I thought. Very pessimistic about it's
| generalizability.
| Inufu wrote:
| I agree, I expect programmers will just move up the levels of
| abstraction. I enjoyed this recent blog post on the topic:
| https://eli.thegreenplace.net/2022/asimov-programming-and-th...
| hackinthebochs wrote:
| The "problem" is that as you move up the levels of
| abstraction, you need fewer people to do the same amount of
| work. Unless the complexity of the work scales as well. I've
| always felt that programmers would be the first class of
| knowledge workers to be put out of work by automation. This
| may be the beginning of the end for the programming gravy
| train.
| NicoJuicy wrote:
| There aren't enough developers either way.
| bmh100 wrote:
| On the other hand, as the value of an hour of programming
| increases, the quantity demanded may also increase.
| paxys wrote:
| > as you move up the levels of abstraction, you need fewer
| people to do the same amount of work
|
| Yes, but the total amount of work (and surrounding
| complexity) also increases with it. Just look at the
| evolution of the software industry over the last few
| decades.
| hackinthebochs wrote:
| History isn't a great guide here. Historically the
| abstractions that increased efficiency begat further
| complexity. Coding in Python elides over low-level issues
| but the complexity of how to arrange the primitives of
| python remains for the programmer to engage with. AI
| coding has the potential to elide over all the complexity
| that we identify as programming. I strongly suspect this
| time is different.
| visarga wrote:
| > The "problem" is that as you move up the levels of
| abstraction, you need fewer people to do the same amount of
| work.
|
| This will lower the entry barrier to developing software so
| more people will go into the field. Before you needed to
| know a programming language, now you will just have a
| dialogue with a language model.
|
| > I've always felt that programmers would be the first
| class of knowledge workers to be put out of work by
| automation.
|
| We've been automating our work for 70 years, and look how
| many programmers are employed now. The more we automate,
| the more capable our field becomes and more applications
| pop up.
| hackinthebochs wrote:
| >This will lower the entry barrier to developing software
| so more people will go into the field.
|
| Indeed. The ideal future of programming is something out
| of Star Trek. I often noticed how everyone on the ship is
| a programmer of a sort: they whip up a simulation as the
| problem warrants, regardless of their field. But in this
| future, the job of programmer basically doesn't exist. As
| a programmer, I should be allowed to have mixed feelings
| about that.
| visarga wrote:
| Let your imagination fly. We always want more than is
| possible; our wishes fill up any volume like an expanding
| gas. Humans are going to be crucial to orchestrate AI and
| extract the most utility out of it.
| hmate9 wrote:
| Or you can do things at a faster pace and increase your
| productivity.
| Inufu wrote:
| Yes, this is how you increase prosperity (see: agricultural
| revolution, industrial revolution, etc). You can now create
| more with the same number of people.
| elwell wrote:
| > writing detailed documentation/specs
|
| That's what code is.
| bmc7505 wrote:
| I disagree that programming is dying -- tools like Copilot will
| lead to a Renaissance in the art of computer programming by
| enabling a larger population to design programs and explore the
| implications of their design choices. I wrote a short essay [1]
| on the history of automated programming and where I think it is
| heading in the future.
|
| [1]:
| https://breandan.net/public/programming_with_intelligent_mac...
| 62951413 wrote:
| Model-driven development and code generation from UML were once
| supposed to be the future. It will be interesting to see how
| much further this approach takes us.
|
| Assuming ANNs resemble the way the human brain functions, you'd
| also expect them to introduce bugs. And so actual human beings
| would partake in debugging too.
| diehunde wrote:
| My bet would be that it will never happen in a reasonable time
| frame. And also by that logic, writing that
| "documentation/spec" would just mean learning a new programming
| language the AI engine can parse, making it as useful as a
| compiler. Anyone who has been writing and designing software
| for a while knows the cycle is way more complex than take some
| input and write code.
|
| Let me know when the AI engine is able to do complex
| refactoring, add features while keeping backwards
| compatibility, find a bug in a giant codebase by debugging a
| test case, or write code that's performant but also
| maintainable.
| ctoth wrote:
| You ever notice how the "let me know when" part of this keeps
| changing? Let me know when computers can ... play
| Go/understand a sentence/compose music/write a program/ ...
|
| But surely they'll never be able to do this new reference
| class you have just now come up with, right?
| diehunde wrote:
| Not really? I mean I would never say "let me know when
| computer can do X" when X is something that doesn't require
| too much creativity and imagination. Like, a computer
| composing music doesn't impress me too much because music
| itself has structure. A computer creating music that would
| wow a professional composer? That would be impressive. Same
| with this topic. A computer that solves some (because it
| failed several) short programming challenges and OP says it
| will kill programming entirely? Not even close. Pretty cool
| though.
| Jensson wrote:
| It keeps changing since our imagination of what tasks
| require intelligence is weak. We think that when a
| computer can do X it can also do Y. But then someone builds
| a computer that can do X but can't do Y, and we say "oh, so
| that doesn't require intelligence, let me know when it can
| do Z and we can talk again.". That doesn't mean that Z
| means the computer is intelligent, just that Z is a point
| where we can look at it and discuss again if we made any
| progress. What we really want is a computer that can do Y,
| but we make small mini tasks that are easier to test
| against.
|
| The Turing test is a great example of this. Turing thought
| that a computer needs to be intelligent to solve this task.
| But it was solved by hard coding a lot of values and better
| understanding of human psychology and what kind of
| conversation would seem plausible when most things are
| hardcoded. That solution obviously isn't AI, I bet you
| don't think so either, but it still passed the Turing test.
| ctoth wrote:
| At what point do we give up and realize that there is no
| one thing called intelligence, just a bunch of hacks that
| work pretty well for different things sometimes? I think
| that's probably where people keep failing here. The
| reason that we keep failing to find the special thing in
| every new field that AI conquers is because there's
| nothing special to actually find? I mean, we could keep
| moving the goalposts, a sort of intelligence of the gaps
| argument? But this doesn't seem productive.
| Enginerrrd wrote:
| I agree, from a totally different angle. Let's take something
| I know better as an example: Structural engineering.
| Structural engineering should be a "solved problem". It
| seems, ostensibly, relatively simple compared to a more open
| ended activity like "programming".(For "technical reasons",
| it ends up being more similar than you might think.) Still,
| you are ultimately dealing with the same materials, the same
| physics, and very similar configurations.
|
| And yet, despite the fact that we have programs to help
| calculate all the things, test code-required load-
| combinations, even run simulations and size individual
| components... it turns out that, it doesn't actually save
| that much work, and you still need an engineer to do most of
| it. And not just because of regulatory requirements. It's
| just, that's not the hard part. The hard part is assembling
| the components and specifications, specifying the correct
| loads based on location-specific circumstances, coming up
| with coherent and sensible design ideas, chasing down every
| possible creative nook and cranny of code to make something
| that was originally a mistake actually work, and know when
| the model is just wrong for some reason and the computer
| isn't simulating load paths accurately.
|
| Specifying the inputs and interpreting results is still about
| as much work as it was before you started with all the fancy
| tools. Those tools still have advantages mind you, and they
| do make one slightly more efficient. Substantially so in some
| cases, but most of the time it still comes out as a slight
| assist rather than a major automation.
| fvold wrote:
| I hear that.
|
| Machine Learning also has a long way to go before it can take
| a long, rambling mess of a meeting and somehow generate a
| halfway usable spec from it. I mean, the customer says they
| want X, but X is silly in this context, so we'll give them Y
| and tell them it's "X-like, but faster". For example, SQL is
| "Blockchain-like, but faster" for a lot of buzzword use-cases
| of blockchain.
| mirrorlake wrote:
| I've been wondering this for a while:
|
| In the future, code-writing AI could be tasked with generating
| the most reliable and/or optimized code to pass your unit tests.
| Human programmers will decide what we want the software to do,
| make sure that we find all the edge cases and define as many unit
| tests as possible, and let the AI write significant portions of
| the product. Not only that, but you could include benchmarks that
| pit AI against itself to improve runtime or memory performance.
| Programmers can spend more time thinking about what they want the
| final product to do, rather than getting mired in mundane
| details, and be guaranteed that portions of software will perform
| extremely well.
|
| Is this a naive fantasy on my part, or actually possible?
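|
| (A minimal sketch of what that selection step could look like; the
| candidate functions, the test list, and the benchmark input are all
| hypothetical stand-ins, not any real tool's API:)
|
|     import time
|
|     def pick_best(candidates, tests, benchmark_input):
|         """Keep only candidates that pass every unit test, then rank
|         the survivors by measured runtime on a benchmark input."""
|         survivors = [f for f in candidates
|                      if all(f(*args) == expected for args, expected in tests)]
|         def runtime(f):
|             start = time.perf_counter()
|             f(*benchmark_input)
|             return time.perf_counter() - start
|         return min(survivors, key=runtime) if survivors else None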
| phreeza wrote:
| And a second AI to generate additional test cases similar to
| yours (which you accept as also in scope) to avoid the first AI
| gaming the test.
| machiaweliczny wrote:
| First you need really good infra to make it easy to test
| multiple working solutions for the AI, but I think this will be
| bleeding edge in 2030.
|
| EDIT: with in-memory DBs I can imagine an AI-assisted mainframe
| that can solve 90% of business problems.
| EVa5I7bHFq9mnYK wrote:
| It seems to me that writing an exhaustive set of unit test cases is
| harder than writing the actual code.
| mrsuprawsm wrote:
| Does this mean that we can all stop grinding leetcode now?
| BoardsOfCanada wrote:
| Do I understand it correctly that it generated (in the end) ten
| solutions that then were examined by humans and one picked? Still
| absolutely amazing though.
| thomasahle wrote:
| No human examination was done.
|
| But it generated 10 solutions which it ran against the example
| inputs, and picked the one that passed.
|
| Actually I'm not sure if it ran the solutions against the
| example inputs or the real inputs.
| [deleted]
| aliceryhl wrote:
| They used the real inputs. The example inputs were used to
| filter out which candidates to submit for the 10 tries.
| aliceryhl wrote:
| No, they gave the algorithm 10 tries and tested all of them,
| and said that it was solved if any one of them worked.
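|
| (Concretely, the evaluation rule amounts to roughly this; a sketch,
| with `run` standing in for whatever executes a candidate program:)
|
|     def solved(candidates, example_tests, hidden_tests, run, k=10):
|         """A problem counts as solved if any of up to k candidates that
|         pass the public example tests also passes the hidden tests."""
|         shortlist = [c for c in candidates
|                      if all(run(c, i) == o for i, o in example_tests)][:k]
|         return any(all(run(c, i) == o for i, o in hidden_tests)
|                    for c in shortlist)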
| mcast wrote:
| The year is 2025, Google et al. are now conducting technical on-
| site interviews purely with AI tools and no human bias behind the
| camera (aside from GPT-3's quirky emotions). The interview starts
| with a LC hard, you're given 20 minutes -- good luck!
| jakey_bakey wrote:
| I think Amazon already tried this and it had surprisingly
| racist results
| qualudeheart wrote:
| Calling it now: If current language models can solve competitive
| programming at an average human level, we're only a decade or
| less off from competitive programming being as solved as Go or
| Chess.
|
| Deepmind or openAI will do it. If not them, it will be a Chinese
| research group on par with them.
|
| I'll be considering a new career. It will still be in computer
| science but it won't be writing a lot of code. There'll be
| several new career paths made possible by this technology as
| greater worker productivity makes possible greater
| specialization.
| keewee7 wrote:
| AI is being aggressively applied to areas where AI
| practitioners are domain experts. Think programming, data
| analysis etc.
|
| Programmers and data scientists might find ourselves among the
| first half of knowledge workers to be replaced and not among
| the last as we previously thought.
| muds wrote:
| It can be really tempting to think about research progression
| on a "linear" timescale but more often than not it eventually
| ends up following an "exponential" curve because of technical
| debt. And there appears to be a _lot_ of techniques used here
| which we don't fully understand.
|
| I wouldn't be surprised if a specifically engineered system ten
| years from now wins an ICPC gold medal but I'm pretty sure that
| a general purpose specification -> code synthesizer that would
| actually threaten software engineering would require us to
| settle a lot of technical debts first -- especially in the area
| of verifying code/text generation using large language models.
| EVa5I7bHFq9mnYK wrote:
| Don't worry, there are a lot of much simpler jobs, like drivers
| or cashiers, that will surrender to AI before coders' jobs do.
| So UBI will be implemented long before that happens.
| solididiot wrote:
| I wouldn't be so sure. Programmers (and drivers and cashiers)
| can "survive" in poverty like millions others already do.
| This transformation is coming in waves that keep the
| proverbial frog in the pan.
| simpleguitar wrote:
| It doesn't even have to be average human.
|
| Let's say AI only gets to 10% (or 20% or 30% or whatever, it
| doesn't really matter), that's a huge number of jobs being
| lost.
|
| Imagine having a machine write all the "simple/boring" code for
| you. Your productivity will go through the roof. The smartest
| programmer who can most effectively leverage the machine could
| replace many hundreds of programmers.
|
| I should brush up on my plumbing and apply for a plumbing
| license soon. (I think plumbing is safer than electrical work,
| because many CS people have good EE foundations).
| phendrenad2 wrote:
| Calling it now: Your prediction is off by an order of magnitude
| or two (10 years -> 100 years, or 1000 years)
| abecedarius wrote:
| Three months ago in the Copilot thread I was saying
|
| > in 5 years will there be an AI that's better than 90% of
| unassisted working programmers at solving new leetcode-type
| coding interview questions posed in natural language?
|
| and getting pooh-poohed.
| https://news.ycombinator.com/item?id=29020401 (And writing
| that, I felt nervous that it might not be aggressive enough.)
|
| There's this general bias in discussions of AI these days:
| people forget that the advance they're now pooh-poohing was,
| surprisingly recently, dismissed in the same way as probably way
| off in the indefinite future.
| hackinthebochs wrote:
| The issue is these techniques are growing in capabilities
| exponentially, while we have a habit of extrapolating
| linearly. Some saw the glaring deficits in copilot then
| reasoned that linear improvements is still glaring deficits.
| I don't know that this bias can ever be corrected. A large
| number of intelligent people simply will never be convinced
| general AI is coming soon no matter what evidence is
| presented.
| Jensson wrote:
| > techniques are growing in capabilities exponentially,
| while we have a habit of extrapolating linearly
|
| What does this even mean? How do you put a number on AI
| capability? You can say it is growing faster than people
| expect, but what is even exponential or linear growth in AI
| capability?
| hackinthebochs wrote:
| I take your point that the linear/exponential terminology
| is a bit dubious. But the simple way to make sense of it
| is just going by various benchmarks. E.g. the power-law
| relationship between the model accuracy and the model
| size: https://eliaszwang.com/paper-reviews/scaling-laws-neural-lm/
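|
| (The functional form meant is a simple power law; a toy
| illustration, with placeholder constants rather than any paper's
| fitted values:)
|
|     # Power-law scaling of loss with model size: L(N) = (N_c / N) ** alpha.
|     def loss(n_params, n_c=8.8e13, alpha=0.076):
|         return (n_c / n_params) ** alpha
|
|     for n in (1e6, 1e8, 1e10):
|         print(f"{n:.0e} params -> loss {loss(n):.2f}")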
| pkaye wrote:
| How long before it can write the code without plagiarizing code
| from online?
| stnmtn wrote:
| Humans study CS for 5 years, reading code from online to be
| able to solve these problems.
| falcor84 wrote:
| How long before the typical human coder can do so?
| pkaye wrote:
| Are you saying you cannot write code from scratch?
| sheikheddy wrote:
| Not the parent comment, but I cannot code from scratch
| (outside of very simple and small applications).
| Competitive Programming is at about the limit of what I
| can do without looking things up, and only because I've
| had practice specifically for that kind of artificial
| environment.
| falcor84 wrote:
| I can write some code from scratch, but my ability to
| write code is improved by an order of magnitude when I
| can refer to online resources, including example code.
| Jensson wrote:
| This is in line with what other code generation AIs have
| accomplished.
|
| To reach average level at codeforces you need to be able to
| apply a standard operation like a sort, or apply a standard
| math formula, as the first 1-2 problems in the easy contests
| are just that. It is impressive that they managed to get this
| result in real contests with real unaltered questions and see
| that it works. But generalizing this to harder problems isn't
| as easy, as there you need to start to devise original
| algorithms instead of just applying standard ones; for such
| problems the model needs to understand computer science
| instead of just mapping language to algorithms.
| zerr wrote:
| The thing is, Competitive Programming (CP) is a completely
| different discipline/subject with its own trivia knowledge and
| tricks. CP uses Computer Science the same way as e.g. Biology
| uses Mathematics. It has very little in common with real-world
| software development.
| qualudeheart wrote:
| I said as much in another comment.
|
| Automating the software development profession proper is
| going to be much harder and will require autonomous agents
| with coherent world models, because that's what you need to
| act in a business context.
| f38zf5vdt wrote:
| A programming genie that grants programming wishes to the
| general public. Since most of what I do on a daily basis is
| engineering solutions based on tradeoffs, I can only imagine
| the number of programmers needed to debug solutions given by
| the programming genie in response to poorly described feature
| requests.
|
| If we become mechanics of the software AI vehicles of the
| future, so be it.
| csee wrote:
| You're extrapolating across very different types of problems.
| Go and Chess have unlimited training data. Competitive
| programming does not.
| raphlinus wrote:
| To me, that's actually one of the more interesting questions.
| It's possible to grade the output of the AI against objective
| criteria, like does it run, and resources consumed (RAM, CPU
| time, and, particularly of interest to me, parallel scaling,
| as GPU algorithms are too hard for most programmers). To what
| extent can you keep training by having the AI generate better
| and better solutions to a relatively smaller input pool of
| problems? I skimmed the paper to see how much they relied on
| this but didn't get a clear read.
| solididiot wrote:
| >> There'll be several new career paths made possible by this
| technology as greater worker productivity makes possible
| greater specialization.
|
| Can you list a few?
| Der_Einzige wrote:
| I'm already anticipating having the job title of "Query
| Engineer" sometime in the next 30 years, and I do NLP including
| large scale language model training. :(
| qualudeheart wrote:
| One of the big venture capitalists predicted "prompt
| engineering" as a future high paid and high status position.
|
| Essentially handling large language models.
|
| Early prompt engineers will probably be drawn from "data
| science" communities and will be similarly high status, well
| paid (though not as well paid), and require less mathematical
| knowledge.
|
| I'm personally expecting an "Alignment Engineer" role
| monitoring AI systems for unwanted behavior.
|
| This will be structurally similar to current cyber security
| roles but mostly recruited from Machine Learning communities,
| and embedded in a broader ML ecosystem.
| jonas_kgomo wrote:
| I like this description better, considering that companies
| like Anthropic are working specifically on Alignment and AI
| Safety. Being that the team actually spun out of Deep Mind,
| it is interesting.
| qualudeheart wrote:
| Alignment is going to be a giant industry and will also
| include many people not originally in Stem. The
| humanities and "civil society" will both have their
| contributions to make.
|
| It's likely that alignment jobs won't themselves be
| automated because no one will trust AI systems to align
| themselves.
| sjg007 wrote:
| >"Alignment Engineer" role monitoring AI systems for
| unwanted behavior.
|
| ha, I know people already doing this..
| lugu wrote:
| Depending on what you want to do, you can either choose an
| industry with very fuzzy requirements (to stay near the
| programming side) or one with very complex but strict
| requirements (to benefit from those coding robots). I guess we
| will need simulators for most of what we do in order to train
| those robots.
| buscoquadnary wrote:
| The problem is that this view treats software engineers
| as people who write code. That's not what my job is; it is
| figuring out how to solve a business problem using technology,
| getting people on board with that solution, and updating and
| refining it.
|
| This viewpoint seems to me to be very similar to the idea of
| 3rd generation languages replacing developers because
| programming will be so easy. It isn't about how easy it is to
| write code: I function as a limited mentat, taking all the
| possible requirements, tradeoffs, and constraints, analyzing them,
| and then building the model; then I write out the code. The
| code artifact is not the value I add; the artifact is how I
| communicate the value to the world.
|
| This doesn't make programmers redundant any more than Ruby, PHP,
| or Java made developers redundant by freeing them from
| having to manually remember and track memory usage and
| pointers; it is at most a tool to reduce the friction of
| getting what is in my head into the world.
|
| I control the code and whoever controls the code controls the
| business. I possess the ability to make out the strands of flow
| control and see the future state of the application. For I am
| the Sr. Software engineer and I have seen where no Project
| Manager can see.
|
| Apologies to Frank Herbert; I just finished listening to Dune.
|
| EDIT:
|
| I got off track at the end but my point is that no matter how
| good the tools for developing the code are, they will never
| replace a software engineer any more than electric drills and
| power saws replace home builders. It merely elevates our work.
| qualudeheart wrote:
| I actually agree with you on that. I had another comment
| further down the thread where I said that software
| engineering can't be fully automated by anything short of
| artificial general intelligence.
|
| As humans we have a coherent world model that current AI
| systems are nowhere near close to having.
|
| That coherent world model is a necessary precondition for
| both understanding a business goal and implementing a program
| to solve it. AlphaCode can do the second part but not the
| first.
|
| AlphaCode doesn't have that world model and even if it did it
| still wouldn't autonomously act on it, just follow orders
| from humans.
|
| Competitive programming is going to be solved much earlier
| than programming in a business context will, because it's
| completely independent of business requirements. It's at most
| half as hard of a problem.
| udev wrote:
| Yes, for very precise, comprehensive text descriptions of
| problems.
|
| It will take a far, far more advanced AI to write such
| descriptions for real-world problems.
|
| Writing requirements for a project is difficult work, and not
| for technical reasons, but for human reasons (people don't know
| what they want exactly, people have trouble imagining things
| they haven't seen yet, people are irrational, people might want
| something that is different from what they need, etc.)
|
| In this regard, we are safe for a few more decades at least.
| andy_ppp wrote:
| I would actually argue the programmers job has never been
| 100% writing the code, it's always been interpreting, fixing
| and decoding the ideas of others.
| bcrosby95 wrote:
| I would argue that we figured this out over 50 years ago
| but oddly enough some people still hold onto the idea.
| tluyben2 wrote:
| The older I get the more I see it has not been about
| programming for most tasks for quite a long time. In the
| early 80s it was a bit more (but not even much more); at
| that time as well I spent most of my time debugging and
| changing behaviour slightly (but in a lot of pages) instead
| of just cranking out huge bags of code.
| tluyben2 wrote:
| Yes, they have been trying to create 'sufficiently formal
| human readable text' to spec out projects; not detailed
| enough to execute by a computer but formal and precise enough
| so humans know exactly what they are getting. That still
| doesn't work at all and that is between humans. If the specs
| are clear enough, the act of programming is already mostly
| not the issue, however, they never are. I am looking forward
| to ML helping me writing boring code (which CoPilot already
| does, but again, that's not really where time/energy is spent
| anyway) and protect against security issues, scalability
| issues and all kinds of bugs (it could rewrite algo's it
| knows; it could recommend libraries that I should use instead
| of the crap I rolled myself etc).
| qualudeheart wrote:
| Fully automating software engineering won't happen until AGI.
| As a good Yuddite I expect us to have bigger problems when
| that happens.
|
| You need an agent with a large and coherent world model, in
| order to understand how your programs relate to the real
| world, in order to solve business tasks.
|
| This isn't something any program synthesis tech currently
| available can do, because none of it has a coherent world
| model.
|
| GPT-3 comes closest to this, but isn't able to engage in any
| kind of planning or abstract modeling, beyond semi-coherent
| extrapolations from training data.
|
| Maybe scaling up GPT by a few more orders of magnitude would
| work, by generating an emergent world model along the way.
| CobrastanJorji wrote:
| What is a "Yuddite?" I tried Googling for it and got the
| impression it was LessWrong forum terminology for people
| who believed too strongly in LessWrong, but I couldn't find
| many references.
| nikkwong wrote:
| I believe he's referring to "luddites" -- a group of
| people who resisted technological innovation during the
| industrial revolution.
| indiv0 wrote:
| Luddite but mixed with "Eliezer Yudkowsky" who is a
| researcher working on the problem of friendly AI (or
| whatever they're calling it these days). Basically trying
| to prevent skynet.
|
| The GP is saying that once we have AGI, then "AGI is
| going to make the human race irrelevant" outweighs "AGI
| makes software devs irrelevant".
| qualudeheart wrote:
| That's the idea.
| qualudeheart wrote:
| I am a follower of Eliezer Yudkowsky.
| NicoJuicy wrote:
| I would stop programming if all we needed to write was unit tests
| :p
| FartyMcFarter wrote:
| To compensate, lots of people would _start_ programming if that
| happened though. Many scientists would be interested in solving
| their field's problems so easily - certainly maths would
| benefit from it.
| rmujica wrote:
| wasn't this the motivation for Prolog?
| [deleted]
| 37ef_ced3 wrote:
| The example problem (essentially, is T a subsequence of S with
| deletions of size N) is a classic problem with no doubt dozens of
| implementations in AlphaCode's training set.
|
| And yet, what a garbage solution it produces.
|
| To illustrate the difference between intelligence and
| regurgitation, someone tell me what CoPilot generates for this:
|
|     // A Go function to swap the sixth bit and seventeenth bit
|     // of a 32-bit signed integer.
|
| Here is a human solution:
|
|     func swap(x int32) int32 {
|         const mask = 1 << 5
|         var (
|             xor1 = (x>>11 ^ x) & mask
|             xor2 = xor1 << 11
|         )
|         return x ^ xor1 ^ xor2
|     }
|
| CoPilot cannot reason numerically like this (understand
| "seventeenth bit" and "sixth bit" and generate the right code for
| that combination). It needs to understand the size of the gap
| between the bits, i.e., 11, and that's too hard.
| [deleted]
| deanmen wrote:
| You can do it without a subtraction:
|
|     unsigned int swapbits(unsigned int a) {
|         bool bit6 = a & (1 << 5);
|         bool bit17 = a & (1 << 16);
|         if (bit6 == bit17)
|             return a;                       // bits are the same, do nothing
|         return (a ^ (1 << 5) ^ (1 << 16));  // flip both 6th and 17th bits
|     }
| 37ef_ced3 wrote:
| And, to be clear, this is a human solution.
|
| Not as efficient as mine, but kudos.
| dskloet wrote:
| There's really no need for an 11 in the code. I'd say that
| makes the code worse, not better.
| 37ef_ced3 wrote:
| This is a toy problem to illustrate that CoPilot cannot write
| code that requires mathematical reasoning. It regurgitates
| solutions from the training set, via a mixed internal
| representation.
| deanmen wrote:
|     unsigned int swapbits(unsigned int a) {
|         bool bit6 = a & (1 << 5);
|         bool bit17 = a & (1 << 16);
|         if (bit6 == bit17)
|             return a;                       // bits are the same, do nothing
|         return (a ^ (1 << 5) ^ (1 << 16));  // flip both 6th and 17th bits
|     }
| dskloet wrote:
| What requires mathematical reasoning? Getting or setting
| the nth bit? Or swapping two variables? What am I missing?
| altcognito wrote:
| Would we be able to generate unit tests? Strikes me that this
| would be important to verify given that we didn't even "write"
| the code. At some point we might not even be looking at the
| generated code? I almost guarantee that's what is going to
| happen eventually.
| 37ef_ced3 wrote:
| You can see it happening already.
|
| Solutions are posted, and they're wrong.
|
| But the CoPilot user can't see the code is wrong.
| [deleted]
| quenix wrote:
| Here is what CoPilot wrote in Python:
|
|     # A function to swap the sixth bit and seventeenth bit of a 32-bit signed integer.
|     def swap_bits(x):
|         # Get the value of the sixth bit.
|         bit6 = x & (1 << 5)
|         # Get the value of the seventeenth bit.
|         bit17 = x & (1 << 16)
|         # Swap the values of the sixth and seventeenth bit.
|         bit6 = bit6 >> 5
|         bit17 = bit17 >> 16
|         # Combine the swapped values of the sixth and seventeenth bit.
|         x = x ^ (bit6 << 16)
|         x = x ^ (bit17 << 5)
|         return x
|
| It actually synthesised about 10 viable solutions. Some
| interesting ones:
|
|     def swap_six_seventeen(x):
|         # Get the binary representation of the integer.
|         binary = bin(x)[2:]
|         # Add zeros to the beginning of the binary representation.
|         binary = '0' * (32 - len(binary)) + binary
|         # Swap the sixth and seventeenth bit.
|         binary = binary[:5] + binary[17] + binary[5:17] + binary[18:]
|         # Convert the binary back to an integer.
|         return int(binary, 2)
| omnicognate wrote:
| The first one (swap_bits) sets both bits to the same value,
| which is the original two bits XORed together. E.g.
|
|     >>> bin(swap_bits(0b_1_0000000000_0_00000))
|     '0b10000000000100000'
|     >>> bin(swap_bits(0b_0_0000000000_1_00000))
|     '0b10000000000100000'
|     >>> bin(swap_bits(0b_1_0000000000_1_00000))
|     '0b0'
|     >>> bin(swap_bits(0b_0_0000000000_0_00000))
|     '0b0'
|
| The second one converts the value to a string and uses string
| operations, which is wildly inefficient and a very common
| mistake made by inexperienced programmers unaware of bitwise
| operations (so presumably common in the training set). It
| also attempts to swap the 6th and 17th _most_ significant
| bits rather than the 6th and 17th _least_ significant bits,
| i.e. counts in the opposite direction to the first one (the
| comment doesn't specify but typically you count from the
| least significant bit in these situations).
|
| Worse, though, it gets the string manipulation completely
| wrong. I think it's trying for `binary[:5] + binary[16] +
| binary[6:16] + binary[5] + binary[17:]`, i.e. characters 1-5,
| then character 17, then characters 7-16, then character 6,
| then characters 18-32. The manipulation it does just
| completely mangles the string.
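|
| (For reference, a corrected string-based version, keeping its
| most-significant-bit-first counting, would be something like the
| following; this is my reconstruction, not Copilot output:)
|
|     def swap_six_seventeen_fixed(x):
|         binary = bin(x)[2:].zfill(32)
|         # swap the 6th and 17th characters (0-indexed positions 5 and 16)
|         binary = binary[:5] + binary[16] + binary[6:16] + binary[5] + binary[17:]
|         return int(binary, 2)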
|
| I'm very keen to try Github Copilot if they ever admit me to
| the beta (I've been waiting forever) and will adopt it
| enthusiastically if it's useful. However, this is exactly
| what I've pessimistically expected. Analysing these truly
| awful implementations to identify the subtle and bizarre
| misbehaviours has taken me far, far longer than it would have
| taken me to just write and test a working implementation
| myself. And I'm supposed to evaluate 10 of these to see if
| one of them might possibly do the right thing?!?!
| Veedrac wrote:
| The first example is almost correct, conditioned off a
| sentence description. The second example is the right idea,
| it just bit off more than it could chew when slicing it all
| together. Using string ops for binary manipulation in
| Python isn't even stupid; it can be faster in a lot of
| cases.
|
| This feels a lot like screaming at a child for imperfect
| grammar.
| 37ef_ced3 wrote:
| It illustrates that CoPilot is generating maximum
| likelihood token strings and has no real understanding of
| the code.
|
| That's what is happening here. There is no intelligence,
| just regurgitation. Randomization and maximum likelihood
| completion.
|
| Just like with the competitive programming example, we're
| asking it to produce solutions that it has seen in its
| training set. If you ask for a nontrivial twist on one of
| those solutions, it fails.
| hackinthebochs wrote:
| >It illustrates that CoPilot is generating maximum
| likelihood token strings and has no real understanding of
| the code.
|
| Funny, today I was just thinking of people's tendencies
| to dismiss AI advances with this very pattern of
| reasoning: take a reductive description of the system and
| then dismiss it as obviously insufficient for
| understanding or whatever the target is. The assumption
| is that understanding is fundamentally non-reductive, or
| that there is insufficient complexity contained within
| the reductive description. But this is a mistake.
|
| The fallacy is that the reductive description is glossing
| over the source of the complexity, and hence where the
| capabilities of the model reside. "Generating maximum
| likelihood token strings" doesn't capture the complexity
| of the process that generates the token strings, and so
| an argument that is premised on this reductive
| description cannot prove the model deficient. For
| example, the best way to generate maximum likelihood
| human text is just to simulate a human mind. Genuine
| understanding is within the solution-space of the problem
| definition in terms of maximum likelihood strings, thus
| you cannot dismiss the model based on this reductive
| description.
| 37ef_ced3 wrote:
| The difference between me and you is that I implement
| neural nets professionally. Here is one of my (non-
| professional) open source projects: https://NN-512.com
|
| I'm sure if you understood what the transformer was
| doing, you would be less impressed.
| hackinthebochs wrote:
| This is the wrong context to go with an appeal to
| authority. I know what the transformer is doing, I've
| also developed neural networks before (though not
| professionally). Your experience is working against you
| in developing your intuition. There's another common
| fallacy that because we're somehow "inside" the system,
| we understand exactly what is going on, or in this
| case what isn't going on. Language models are composed of
| variations of matrix multiplications, but that isn't a
| complete description of their behavior. It's like saying
| because we've looked inside the brain and there's just
| electrical and chemical signals, the mind must reside
| somewhere else. It's just a specious argument.
| Veedrac wrote:
| It got the value of the sixth and seventeenth bits, moved
| them into the right positions, and inserted them into the
| original value. Off a one-line description _written in
| English_! I really cannot empathize with the idea that
| this is not a meaningful capability. If intelligence only
| means to you "equal in all capabilities to an experienced
| human", you are never going to be able to see anything
| coming ever.
| 37ef_ced3 wrote:
| If you ask CoPilot to solve something it hasn't seen, it
| won't be able to solve it.
|
| It's a transformer. Do you understand what that means?
| It's just matrix multiplication.
|
| It generates maximum likelihood token strings, based on
| its training data.
|
| It doesn't "understand" what those token strings mean.
|
| You are amazed because you're testing the transformer by
| asking the transformer to generate human-written code
| THAT IT WAS TRAINED ON. To make CoPilot fail, all you
| have to do is ask it to generate something unlikely,
| something it hasn't seen in training.
|
| Maximum likelihood token strings. Period.
| omnicognate wrote:
| You're misunderstanding my point. Nobody's screaming at
| anything. Whether this thing is impressive isn't at
| issue. It's utterly astonishing.
|
| I'm trying to figure out whether copilot in its current
| form is a tool that will be useful to me in my job. (I'd
| be able to do this evaluation properly if they'd just let
| me on the damned beta.)
|
| Nearly right isn't good enough for this afaics. In fact,
| I expect there to be a slightly paradoxical effect where
| nearly-right is worse than obviously-wrong. An analysis
| of a piece of code like I did above is time consuming and
| cognitively taxing. An obviously wrong solution I can
| just reject immediately. An almost-right (or at least
| vaguely plausible) one like these takes _thought_ to
| reject. Much more thought, in this case (for me, at
| least) than just writing the thing myself in the first
| place.
|
| Edit: BTW, I don't get what you're saying with
|
| "The first example is almost correct, conditioned off a
| sentence description. The second example is the right
| idea, it just bit off more than it could chew when
| slicing it all together."
|
| The first one is completely (if subtly) wrong. It's
| supposed to swap two bits but it sets them to the same
| value. There's no interpretation of the description in
| which that's correct.
|
| The second one is definitely not "the right idea". It
| tries to do it with string manipulations, which
| (regardless of the fact that it does so incorrectly) is
| completely the wrong approach. This one is actually
| "better" than the other in the paradoxical sense I
| mentioned above, because I could reject it the moment I
| saw it convert the number to a string.
| Veedrac wrote:
| > The second one is definitely not "the right idea". It
| tries to do it with string manipulations, which
| (regardless of the fact that it does so incorrectly) is
| completely the wrong approach. This one is actually
| "better" than the other in the paradoxical sense I
| mentioned above, because I could reject it the moment I
| saw it convert the number to a string.
|
| In this case string ops are a worse idea, but as I said
| before, this is not generally true of Python, at least
| when using CPython. Eg. the string method is
| significantly the faster in this example:
|
|     # https://stackoverflow.com/a/20918545/1763356
|     def reverse_mask(x):
|         x = ((x & 0x55555555) << 1) | ((x & 0xAAAAAAAA) >> 1)
|         x = ((x & 0x33333333) << 2) | ((x & 0xCCCCCCCC) >> 2)
|         x = ((x & 0x0F0F0F0F) << 4) | ((x & 0xF0F0F0F0) >> 4)
|         x = ((x & 0x00FF00FF) << 8) | ((x & 0xFF00FF00) >> 8)
|         x = ((x & 0x0000FFFF) << 16) | ((x & 0xFFFF0000) >> 16)
|         return x
|
|     # My ver
|     def reverse_format(x):
|         return int(f"{x:032b}"[::-1], 2)
|
| Python's dynamic object overhead (and to a lesser extent,
| interpreter overhead) makes a lot of seemingly-expensive
| operations not matter very much.
| Veedrac wrote:
| Well, that also seems like the wrong question to ask.
| Whether it's currently useful to you for writing short
| algorithms, rather than as the non-programmer's API
| interface it's primarily marketed as, seems about the
| least interesting take-away for it. We'll get to
| smoothing over the cracks later, once it's not a
| capability we literally just discovered exists. Heck,
| Codex is already not SOTA for that, AlphaCode is.
| omnicognate wrote:
| It may not be the question that interests you but who are
| you to say it's the "wrong question" for me to ask? I
| want to know if I'm on the verge of having access to a
| tool that is going to transform the way I do my job, as
| people keep claiming.
| 37ef_ced3 wrote:
| All wrong. These "solutions" are not correct.
| UncleMeat wrote:
| The xors are wrong. Stick in the integer 1<<16.
| c4wrd wrote:
| Ask and you shall receive, here is the output it gave me:
|
|     // Write a Go function to swap the sixth bit and seventeenth bit
|     // of a 32-bit signed integer.
|     package main
|
|     import (
|         "fmt"
|         "math"
|     )
|
|     func main() {
|         var a int32 = 0b1010101010101010101010101010101010101010101010101010101010101010
|         fmt.Printf("%b\n", a)
|         fmt.Printf("%b\n", swapBits(a, 6, 17))
|     }
|
|     func swapBits(a int32, i int, j int) int32 {
|         // convert to binary
|         bin := fmt.Sprintf("%b", a)
|         // get the bits
|         bit1 := bin[i-1 : i]
|         bit2 := bin[j-1 : j]
|         // swap the bits
|         bin = bin[:i-1] + bit2 + bin[i:]
|         bin = bin[:j-1] + bit1 + bin[j:]
|         // convert back to int
|         return int32(bin2int(bin))
|     }
|
|     func bin2int(bin string) int64 {
|         var sum int64
|         for i, v := range bin {
|             if v == '1' {
|                 sum += int64(math.Pow(2, float64(len(bin)-i-1)))
|             }
|         }
|         return sum
|     }
| 37ef_ced3 wrote:
| Ridiculous. It's a joke, right? Perhaps the most inefficient
| and naive solution ever?
|
| CoPilot is regurgitating some solution from its training set,
| the solution of an inept programmer who would manipulate bits
| via conversion to string... yikes.
| skulk wrote:
| The next iteration of code assistant needs to be able to
| parse responses like your comment and update the code
| accordingly. Once a human+computer pair can converge on a
| correct and admissible solution to _any_ tractable
| programming task through natural language dialogue, we
| should start worrying about our jobs going away. Until
| then, for each line of code generated by AI, there will be
| two jobs created to maintain that code.
| electroly wrote:
| Copilot can do that, sorta. You undo the completion and
| add something like "... but don't convert it to a string"
| to the comment, then have it try completing again.
| hackinthebochs wrote:
| Which direction in feature space do you move in response
| to "you inept POS"?
| jdrc wrote:
| "And so in 2022 the species programmus programmicus went extinct"
| udev wrote:
| I am wondering whether this result can create a type of loop that
| can self-optimize.
|
| We have AI to generate reasonable code from text problem
| description.
|
| Now what if the problem description text is to generate such a
| system in the first place?
|
| Would it be possible to close the loop, so to speak, so that over
| many iterations:
|
| - text description is improved
|
| - output code is improved
|
| Would it be possible to create something that converges to
| something better?
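|
| Mechanically, the loop would be something like this (a toy sketch;
| `generate` and `evaluate` stand in for a code-generating model and a
| test harness):
|
|     def refine(description, generate, evaluate, rounds=5):
|         """Generate code from a description, score it, and feed the
|         feedback into the next round's description."""
|         best_code, best_score = None, float("-inf")
|         for _ in range(rounds):
|             code = generate(description)
|             score, feedback = evaluate(code)
|             if score > best_score:
|                 best_code, best_score = code, score
|             description = f"{description}\n# Known issues: {feedback}"
|         return best_code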
| machiaweliczny wrote:
| I am actually trying this. Basically by asking questions to AI
| and teaching it to generate code / google when it doesn't know
| something. The other process checks if the code is valid and either
| asks it for more context or executes the code and feeds the result
| back to a file :)
| machiaweliczny wrote:
| I think one can make the problem "differentiable" via some
| heuristics: if you have a NN trained to rate code quality,
| plus some understanding of what should be used for each type of
| problem (memory and speed), and it can classify the problem into
| a group and then rate the solution, it should be able to guide
| the process (in competitive programming).
| indiv0 wrote:
| Do you have a blog or a github or something? This sounds
| really neat.
| wilde wrote:
| Oh sweet! When can we skip the bullshit puzzle phone screens?
| doctor_eval wrote:
| I sometimes read these and wonder if I need to retrain. At my
| age, I'll struggle to get a job at a similar level in a new
| industry.
|
| And then I remember that the thing I bring to the table is the
| ability to turn domain knowledge into code.
|
| Being able to do competitive coding challenges is impressive, but
| a very large segment of software engineering is about eliciting
| what the squishy humans in management actually want, putting it
| into code, and discovering as quickly as possible that it's not
| what they really wanted after all.
|
| It's going to take a sufficiently long time for AI to take over
| management that I don't think oldies like me need to worry too
| much.
| prideout wrote:
| It is obvious to me that computer programming is an interesting
| AI goal, but at the same time I wonder if I'm biased, because I'm
| a programmer. The authors of AlphaCode might be biased in this
| same way.
|
| I guess this makes sense though, from a practical point of view.
| Verifying correctness would be difficult in other intellectual
| disciplines like physics and higher mathematics.
| thomasahle wrote:
| Just make it output a proof together with the program.
| EGreg wrote:
| To me, coding in imperative languages is one of the hardest
| things to produce an AI for with current approaches (CNN's, MCTS
| and various backpropagation). Something like Cyc would seem to be
| a lot more promising...
|
| And yet, I am starting to see (with GitHub's Copilot, and now
| this) a sort of "GPT-4 for code". I do see many problems with
| this, including:
|
| 1. It doesn't actually "invent" solutions on its own like
| AlphaZero, it just uses and remixes from a huge body of work that
| humans put together,
|
| 2. It isn't really ever sure if it solved the problem, unless it
| can run against a well-defined test suite, because it could have
| subtle problems in both the test suite and the solution if it
| generated both
|
| This is a bit like readyplayer.me trying to find the closest
| combination of noses and lips to match a photo (do you know any
| open source alternatives to that site btw?)
|
| But this isn't really "solving" anything in an imperative
| language.
|
| Then again, perhaps human logic is just an approach with
| operations using low-dimensional vectors, able to capture simple
| "explainable" models, while the AI classifiers and adversarial
| training produce far bigger vectors that help model the
| "messiness" of the real world and also find simpler patterns as a
| side effect.
|
| In this case, maybe our goal shouldn't be to get solutions in the
| form of imperative language or logic, but rather unleash the
| computer on "fuzzy" inputs and outputs where things are "mostly
| correct 99.999% of the time". The only areas where this could
| fail is when some intelligent adversarial network exploits
| weaknesses in that 0.001% and makes it more common. But for
| natural phenomena it should be good enough !
| qualudeheart wrote:
| Can you write more about how Cyc would help? The idea behind
| Cyc is cool but I don't think I've seen anyone discuss using it
| for program synthesis.
| gfd wrote:
| Relevant blogpost on codeforces.com (the competitive programming
| site used): https://codeforces.com/blog/entry/99566
|
| Apparently the bot would have a rating of 1300. Although the elo
| rating between sites is not comparable, for some perspective,
| Mark Zuckerberg had a rating of ~1k when he was in college on
| topcoder: https://www.topcoder.com/members/mzuckerberg
| baobabKoodaa wrote:
| The median rating is not descriptive of median ability, because
| a large number of Codeforces competitors only do one or a few
| competitions. A very small number of competitors hone their
| skills over multiple competitions. If we were to restrict our
| sample to competitors with more than 20 competitions, the
| median rating would be much higher than 1300. It's amazing that
| Alphacode achieved a 1300 rating, but compared to humans who
| actually practice competitive coding, this is a low rating.
|
| To clarify, this is a HUGE leap in AI and computing in general.
| I don't mean to play it down.
| YeGoblynQueenne wrote:
| >> To clarify, this is a HUGE leap in AI and computing in
| general. I don't mean to play it down.
|
| Sorry, but it's nothing of the sort. The approach is
| primitive, obsolete, and its results are very poor.
|
| I've posted this three times already but the arxiv preprint
| includes an evaluation against a formal benchmark dataset,
| APPS. On that more objective measure of performance, the best
| performing variant of AlphaCode tested, solved 25% of the
| easiest tasks ("introductory") and less than 10% of the
| intermediary ("interview") and advanced ("competition")
| tasks.
|
| What's more, the approach that AlphaCode takes to program
| generation is primitive. It generates _millions_ of candidate
| programs and then it "filters" them by running them against
| input-output examples of the target programs taken from the
| problem descriptions. The filtering still leaves thousands of
| candidate programs (because there are very few I/O examples
| and the almost random generation can generate too many
| programs that pass the tests, but still don't solve the
| problem) so there's an additional step of clustering applied
| to pare this down to 10 programs that are finally submitted.
| Overall, that's a brute-force, almost random approach that is
| ignoring entire decades of program synthesis work.
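|
| Stripped down, the pipeline described in the preprint amounts to
| something like the sketch below (my paraphrase, not DeepMind's
| code; the model, runner and clustering step are passed in as
| placeholder callables):
|
|     def alphacode_style_solve(problem, io_examples, sample_program,
|                               run, cluster_by_behaviour,
|                               n_samples=1_000_000):
|         # 1. Sample a huge number of candidate programs from the
|         #    language model.
|         candidates = [sample_program(problem) for _ in range(n_samples)]
|
|         # 2. Filter: keep candidates that reproduce the few given
|         #    input/output examples (thousands still survive).
|         survivors = [c for c in candidates
|                      if all(run(c, x) == y for x, y in io_examples)]
|
|         # 3. Cluster survivors by behaviour on generated inputs
|         #    (assume clusters come back largest-first) and submit
|         #    one program from each of the 10 largest clusters.
|         clusters = cluster_by_behaviour(survivors)
|         return [cluster[0] for cluster in clusters[:10]]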
|
| To make an analogy, it's as if DeepMind had just published an
| article boasting of its invention of a new sorting
| algorithm... bubblesort.
| gfd wrote:
| You can find the rating distribution filtered for >5 contests
| here: https://codeforces.com/blog/entry/71260
|
| I am rated at 2100+ so I do agree that a 1300 rating is low.
| But at the same time it solved
| https://codeforces.com/contest/1553/problem/D which is rated
| at 1500 and was actually non-trivial for me already. I had
| one wrong submission before getting that problem correct, and I
| estimate that 50% of the regular competitors (and probably
| the vast majority of the programmers commenting in this
| thread right now) would not be able to solve it within 2 hours.
| rfoo wrote:
| 1553D is a quite confusing case though.
|
| On the AlphaCode Attention Visualization website [1], the
| _Accepted_ code shown for 1553D is an O(n^2) Python one,
| which is supposed to be TLE. It correctly implements a two-
| pointer solution, but failed to "realize" that list.pop(0)
| is O(n) in Python. I'm not sure how it passed.
|
| [1] https://alphacode.deepmind.com/#layer=30,problem=34,hea
| ds=11...
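|
| For anyone who hasn't hit this before: list.pop(0) shifts every
| remaining element, so repeatedly popping from the front of a list
| is quadratic overall, while collections.deque.popleft() is O(1).
| A tiny standalone illustration (nothing to do with AlphaCode's
| generated code):
|
|     from collections import deque
|     import timeit
|
|     xs = list(range(100_000))
|
|     def drain_list(data):
|         data = list(data)
|         while data:
|             data.pop(0)      # shifts all remaining items: O(n) per pop
|
|     def drain_deque(data):
|         data = deque(data)
|         while data:
|             data.popleft()   # O(1) per pop
|
|     # The list version is quadratic overall, the deque version linear.
|     print(timeit.timeit(lambda: drain_list(xs), number=1))
|     print(timeit.timeit(lambda: drain_deque(xs), number=1))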
| Jensson wrote:
| Likely the python runtime has a strange string
| implementation for cases like this, just like javascript
| strings.
| the-smug-one wrote:
| I'm trying to solve this for fun, but I'm stuck! I've got a
| recursive definition that solves the problem by building a
| result string. I think it's a dynamic programming problem,
| but right now I can't see the shared sub-problems so :).
| Some real sour cherries being experienced from not getting
| this one!
| johndough wrote:
| The proposed O(N^2) solution contains many unnecessary
| operations, e.g. the creation of list c or reversal of the
| input strings. Maybe it has been copied from a related
| problem? You can easily solve the task with half as many
| lines in O(N):
|
|     for _ in range(int(input())):
|         a = list(input())
|         b = list(input())
|         while a and b:
|             if a[-1] == b[-1]:
|                 a.pop()
|                 b.pop()
|             else:
|                 a.pop()
|                 if a:
|                     a.pop()
|         print("NO" if b else "YES")
| pedrosorio wrote:
| > But at the same time it solved
| https://codeforces.com/problemset/problem/1553/D
|
| To be fair, it generated a set of (10) possible solutions,
| and at least one of them solved the problem.
| captain_price7 wrote:
| For comparison, I used to be a very average, but pretty regular
| user about 5 years ago. I could reliably solve easiest 2 out of
| 5 problems, 3 in my lucky days.
|
| My rating is 1562.
| jakey_bakey wrote:
| At the risk of sounding relentlessly skeptical - surely by
| training the code on GitHub data you're not actually creating an
| AI to solve problems, but creating an extremely obfuscated
| database of coding puzzle solutions?
| ogogmad wrote:
| _We validated our performance using competitions hosted on
| Codeforces, a popular platform which hosts regular competitions
| that attract tens of thousands of participants from around the
| world who come to test their coding skills. We selected for
| evaluation 10 recent contests, each newer than our training
| data. AlphaCode placed at about the level of the median
| competitor, marking the first time an AI code generation system
| has reached a competitive level of performance in programming
| competitions._
|
| [edit] Is "10 recent contests" a large enough sample size to
| prove whatever point is being made?
| [deleted]
| YeGoblynQueenne wrote:
| The test against human contestants doesn't tell us anything
| because we have no objective measure of the ability of those
| human coders (they're just the median in some unknown
| distribution of skill).
|
| There are more objective measures of performance, like a good,
| old-fashioned, benchmark dataset. For such an evaluation, see
| table 10 in the arxiv preprint (page 21 of the pdf), listing
| the results against the APPS dataset of programming tasks.
| The best performing variant of AlphaCode solves 25% of the
| simplest ("introductory") APPS tasks and less than 10% of the
| intermediary ("interview") and more advanced ones
| ("competition").
|
| So it's not very good.
|
| Note also that the article above doesn't report the results
| on APPS. _Because_ they're not that good.
| solididiot wrote:
| Does it need to solve original problems? Most of the code we
| write is dealing with the same problems in a slightly different
| context each time.
|
| As others say in comments, it might be the case that we meet in
| the middle, with us writing some form of tests for AI-produced
| code to pass.
| qualudeheart wrote:
| That's been a common objection to Copilot and other recent
| program synthesis papers.
|
| The models regurgitate solutions to problems already
| encountered in the training set. This is very common with
| Leetcode problems and seems to still happen with harder
| competitive programming problems.
|
| I think someone else in this thread even pointed out an example
| of AlphaCode doing the same thing.
| FiberBundle wrote:
| It never ceases to amaze me what you can do with these
| transformer models. They created millions of potential solutions
| for each problem, used the provided examples for the problems to
| filter out 99% of incorrect solutions and then applied some more
| heuristics and the 10 available submissions to try to find a
| solution.
|
| All these approaches just seem like brute-force approaches: Let's
| just throw our transformer on this problem and see if we can get
| anything useful out of this.
|
| Whatever it is, you can't deny that these unsupervised models
| learn some semantic representations, but we have no clue at all
| what that actually is and how these models learn that. But I'm
| also very sceptical that you can actually get anywhere close to
| human (expert) capability in any sufficiently complex domain by
| using this approach.
| bricemo wrote:
| What do you think then is the difference between going from
| 50th to 99.9th percentile in their other domains? Is there
| something materially different between Go, protein folding, or
| coding? (I don't know the answer, just curious if anyone else
| does)
| jahewson wrote:
| That's a big question but I'm tempted to answer it with a
| yes. A protein sequence contains a complete description of
| the structure of a protein but a coding question contains
| unknowns and the answers contain subjective variability.
| FiberBundle wrote:
| Well with respect to Go the fundamental difference afaict is
| that you can apply self-supervised learning, which is an
| incredibly powerful approach (But note e.g. that even this
| approach wasn't successful in "solving" Starcraft).
| Unfortunately it's extremely difficult to frame real-world
| problems in that setting. I don't know anything about
| protein-folding and don't know what Deepmind uses to try to
| solve that problem, so I cannot comment on that.
| cjbprime wrote:
| > this approach wasn't successful in "solving" Starcraft)
|
| Why do you say that? As I understand it, AlphaStar beat
| pros consistently, including a not widely reported
| showmatch against Serral when he was BlizzCon champ.
| zwaps wrote:
| Not once humans adapted to it afaik. AlphaStar got to top
| grandmaster level and then that was it, as people found
| ways to beat it. Now, it may be that the team considered
| the project complete and stopped training it. But
| technically - as it stands - Starcraft is still the one
| game where humans beat AI.
| gavagai691 wrote:
| Two possible reasons.
|
| 1. First, though I am not sure of this (i.e. this should
| be verified), I heard that the team working on AlphaStar
| initially tried to create a Starcraft AI entirely through
| "self-play," but this was not successful. (Intuitively,
| in a real-time game, there are too many bad options too
| early on that even with a LOT of time to learn, if your
| approach is too "random" you will quickly enter an
| unwinnable position and not learn anything useful.) As a
| result, they replaced this approach with an approach
| which incorporated learning from human games.
|
| 2. "including a not widely reported showmatch against
| Serral when he was BlizzCon champ." is a
| mischaracterization. It was not a "showmatch," rather
| there was a setup at Blizzcon where anyone could sit down
| and play against AlphaStar, and Serral at some point sat
| down to play AlphaStar there. He went 0-4 vs AlphaStar's
| protoss and zerg, and 1-0 vs its Terran. However, not
| only was he not using his own keyboard and mouse, but he
| could not use any custom hotkeys. If you do not play
| Starcraft it may not be obvious just how large of a
| difference this could make. BTW, when Serral played
| (perhaps an earlier iteration of) AlphaStar's terran on
| the SC2 ladder, he demolished it.
|
| I remember when seeing the final report, I was a bit
| disappointed. It seemed like they cut the project off at
| a strange point, before AlphaStar was clearly better than
| humans. I feel that if they had continued they could have
| gotten to that point, but now we will never know.
| briga wrote:
| Another way to frame it is that these models still perform very
| poorly at the task they're designed to do. Imagine if a real
| programmer needed to write a solution a hundred times before
| they were able to achieve (average) performance. You'd probably
| wonder if it was just blind luck that got them to the solution.
| You'd also fire them. What these models are very good at doing
| is plagiarizing content, so part of me wonders if they aren't
| just copying previous solutions with slight adjustments.
| zmmmmm wrote:
| Has nobody yet asked it to write itself?
| ensan wrote:
| Wake me up when an AI creates an operating system on the same
| level of functionality as early-years Linux.
| timetotea wrote:
| If you want some video explanation https://youtu.be/Qr_PCqxznB0
| pedrobtz wrote:
| What about finding bugs, zero-day exploits?
| erwincoumans wrote:
| It would be interesting if a future 'AlphaZeroCode' with access
| to a compiler and debugger can learn to code, generating data
| using self-play. Haven't read the paper yet; seems like an
| impressive milestone.
| [deleted]
| throwaway5752 wrote:
| Most people here are programmers (or otherwise involved in the
| production of software). We shouldn't look at RPA and other job
| automation trends dispassionately. SaaS valuations aren't where
| they are (and accounting doesn't treat engineering salary as cost
| of goods sold) because investors believe that they will require
| armies of very well paid developers in perpetuity.
| countvonbalzac wrote:
| what?
| londons_explore wrote:
| > AlphaCode placed at about the level of the median competitor,
|
| In many programming contests, a large number of people can't
| solve the problem at all, and drop out without submitting
| anything. Frequently that means the median scoring solution is a
| blank file.
|
| Therefore, without further information, this statement shouldn't
| be taken to be as impressive as it sounds.
| [deleted]
| d0mine wrote:
| It reminds me that median reputation on StackOverflow is 1. All
| AlphaSO would have to do is to register to receive median
| reputation on SO ;) (kidding aside AlphaCode sounds like magic)
|
| Inventing relational DBs hasn't replaced programmers, we just
| write custom DB engines less often. Inventing electronic
| spreadsheets hasn't deprecated programmers, it just means that we
| don't need programmers for corresponding tasks (where
| spreadsheets work well).
|
| AI won't replace programmers until it grows to replace the
| humanity as a whole.
| falcor84 wrote:
| >AI won't replace programmers until it grows to replace the
| humanity as a whole.
|
| Yes, but after seeing this progress in the former, my estimate
| of the time remaining until the latter has just significantly
| shortened.
| qualudeheart wrote:
| I don't even think the "will AI replace human programmers"
| question is that interesting anymore. My prediction is that a
| full replacement won't happen until we achieve general
| artificial intelligence, and have it treat programming as it
| would any other problem.
|
| Elsewhere ITT I've claimed that to fully automate programming
| you also need a model of the external world that's on par with
| a human's.
|
| Otherwise you can't work a job because you don't know how to do
| the many other tasks that aren't coding.
|
| You need to understand what the business goals are and how your
| program solves them.
| a-dub wrote:
| > In our preprint, we detail AlphaCode, which uses transformer-
| based language models to generate code at an unprecedented scale,
| and then smartly filters to a small set of promising programs
|
| if you're using a large corpus of code chunks from working
| programs as symbols in your alphabet, i wonder how much entropy
| there actually is in the space of syntactically correct solution
| candidates.
| softwaredoug wrote:
| I think CoPilot, etc will be revolutionary tools AND I think
| human coders are needed. Specifically I love CoPilot for the task
| of "well specified algorithm to solve problem with well-defined
| inputs and outputs". The kind of problem you could describe as a
| coding challenge.
|
| BUT, our jobs have a lot more complexity
|
| - Local constraints - We almost always work in a large, complex
| existing code base with specific constraints
|
| - Correctness is hard - writing lots of code is usually not the
| hard part, it's proving it correct against amorphous
| requirements, communicated in a variety of human social contexts,
| and bookmarked.
|
| - Precision is extremely important - Even if 99% of the time,
| CoPilot can spit out a correct solution, the 1% of the time it
| doesn't creates a bevy of problems
|
| Are those insurmountable problems? We'll see I suppose, but we
| begin to verge on general AI if we can gather and understand half
| a dozen modalities of social context to build a correct solution.
|
| Not to mention much of the skill needed in our jobs has much more
| to do with soft skills, and the bridge between the technical and
| the non technical, and less to do with hardcore heads-down
| coding.
|
| Exciting times!
| tasubotadas wrote:
| I just hope that this shows how useless competitive programming
| is, given that it can be replaced by a Transformer model.
|
| Additionally, people should REALLY rethink their coding
| interviews if they can be solved by a program.
| msoad wrote:
| This seems to have a narrower scope than GitHub Copilot. It
| generates more lines of code to a more holistic problem vs.
| GitHub Copilot that works as a "more advanced autocomplete" in
| code editors. Sure Copilot can synthesize full functions and
| classes but for me, it's the most useful when it suggests another
| test case's title or writes repetitive code like this.foo = foo;
| this.bar = bar etc...
|
| Having used Copilot I can assure you that this technology won't
| replace you as a programmer but it will make your job easier by
| doing things that programmers don't like to do as much, like
| writing tests and comments.
| ipnon wrote:
| The big question seems to be whether par with professional
| programmers is a matter of increasing training set and flop
| size, or whether different model or multi-model architectures
| are required.
|
| It does look like we've entered an era where programmers who
| don't use AI assistants will be disadvantaged, and that this
| era has an expiration date.
| stupidcar wrote:
| Having used Copilot for a while, I am quite certain it _will_
| replace me as a programmer.
|
| It appears to me that when it comes to language models,
| intelligence = experience * context, where experience is the
| amount of what's encoded in the model, and context is the
| prompt. The biggest limitation on Copilot currently is context.
| It behaves as an "advanced autocomplete" because all it has to
| go on is what regular autocomplete sees, e.g. the last few
| characters and lines of code.
|
| So, you can write a function name called createUserInDB() and
| it will attempt to complete it for you. But how does it know
| what DB technology you're using? Or what your user record looks
| like? It doesn't, and so you typically end up with a "generic"
| looking function using the most common DB tech and naming
| conventions for your language of choice.
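|
| To illustrate, you tend to get something like the sketch below (a
| made-up completion, not actual Copilot output): it guesses the
| most common database and the most common column names because it
| has no real context.
|
|     # Hypothetical "generic" completion, invented purely for
|     # illustration; assumes a "users" table already exists.
|     import sqlite3
|
|     def create_user_in_db(name, email):
|         conn = sqlite3.connect("app.db")   # guesses a database
|         cur = conn.cursor()
|         cur.execute(
|             "INSERT INTO users (name, email) VALUES (?, ?)",
|             (name, email),                  # guesses a schema
|         )
|         conn.commit()
|         conn.close()
|         return cur.lastrowid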
|
| But now imagine a future version of Copilot that is
| automatically provided with a lot _more_ context. It also gets
| fed a list of your dependencies, from which it can derive which
| DB library you're using. It gets any locatable SQL schema
| file, so it can determine the columns in the user table. It
| gets the text of the Jira ticket, so it can determine the
| requirements.
|
| As a programmer a great deal of time is spent checking these
| different sources and synthesising them in your head into an
| approach, which you then code. But they are all just text, of
| one form or another, and language models can work with them
| just as easily, and much faster, than you can.
|
| And once the ML coding train gets running, it'll only get
| faster. Sooner or later Github will have a "Copilot bot" that
| can automatically make a stab at fixing issues, which you then
| approve, reject, or fix. And as thousands of these issues pile
| up, the training set will get bigger, and the model will get
| better. Sooner or later it'll be possible to create a repo,
| start filing issues, and rely on the bot to implement
| everything.
| solarmist wrote:
| I'm skeptical it'll replace programmers, as in no more human
| programmers, but agree in the sense 100% human programmers ->
| 50%, 25%, 10% human programmers + computers doing most of the
| writing of actual code.
|
| I see it continuing to evolve and becoming a far superior
| auto-complete with full context, but, short of actual general
| AI, there will always be a step that takes a high-level
| description of a problem and turns it into something a
| computer can implement.
|
| So while it will make the remaining programmers MUCH more
| productive, thereby reducing the needed number of
| programmers, I can't see it driving that number to zero.
| mabub24 wrote:
| It will probably change the types of things a programmer
| does, and what it looks like to be a programmer. The nitty
| gritty of code _writing_ will probably get more and more
| automated. But the architecture of the code, and
| establishing and selecting its purpose in the larger
| scheme of a business, will probably be more what
| programmers do. Essentially, they might just become
| managers for automated code writers, similar to the
| military's idea of future fighter pilots relating to
| autonomous fighters/drones as described in this article:
|
| https://www.newyorker.com/magazine/2022/01/24/the-rise-of-
| ai...
|
| Maybe. It might never get to that level though.
| solarmist wrote:
| Yup, I think that's it exactly. I just described this in
| another comment as the reverse of the evolution that
| graphic design has undergone in bringing designers into
| front-end programming.
|
| I can't wait to see how far we're able to go down that
| path.
| TSiege wrote:
| I have a feeling this is the correct read in terms of
| progression. But I'm skeptical if it'll ever be able to
| synthesize a program entirely. I imagine that in the future
| we'll have some sort of computer language more like written
| language that will be used by some sort of AI to generate
| software to meet certain demands, but might need some manual
| connections when requirements are hazy or needs a more human
| touch in the UI/UX
| Veedrac wrote:
| > But I'm skeptical if it'll ever be able to synthesize a
| program entirely.
|
| Emotional skepticism carries a lot more weight in worlds
| where AI isn't constantly doing things that are meant to be
| infeasible, like placing at the 54th percentile in a competitive
| programming competition.
|
| People need to remember that AlexNet is 10 years old. At no
| point in this span have neural networks stopped solving
| things they weren't meant to be able to solve.
| solarmist wrote:
| I feel like you're taking that sentence a bit too
| literally. I read it as "I'm skeptical that AI will ever
| be able to take a vague human description from a product
| manager/etc. and solve it without an engineer-type person
| in the loop." The issue is humans don't know what they
| want and realistically programs require a lot of
| iteration to get right; no amount of AI can solve that.
|
| I agree with you; it seems obvious to me that once you
| get to a well-specified solution a computer will be able
| to create entire programs that solve user requirements.
| And that they'll start small, but expand to larger and
| more complex solutions over time in the same way that no-
| code tools have done.
| Hgsb wrote:
| Google Ambiguity.
| sharemywin wrote:
| To me it's not about its current capabilities. It's the
| trajectory. This tech wasn't even a thing 2 years ago. There's
| billions being poured into it and every time someone uses these
| tools there's more free training data.
| chongli wrote:
| _repetitive code like this.foo = foo; this.bar = bar etc..._
|
| This sort of boilerplate code is best solved by the programming
| language. Either via better built-in syntax or macros. Using an
| advanced machine learning model to generate this code is both
| error-prone and a big source of noise and code bloat. This is
| not an issue that will go away with better tooling; it will
| only get worse. Python's dataclasses are one example of the
| language-level fix; see the small sketch below.
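|
|     from dataclasses import dataclass
|
|     # The decorator generates __init__, __repr__ and __eq__, so the
|     # "self.foo = foo; self.bar = bar" ceremony never gets written
|     # (or generated) at all.
|     @dataclass
|     class Record:
|         foo: str
|         bar: int = 0
|
|     r = Record("hello", 42)   # Record(foo='hello', bar=42)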
| xmprt wrote:
| I don't think I agree. Most people spend more time reading
| than writing code so programming languages should be
| optimized to be easier to read whereas tooling should be made
| to simplify writing code. New syntax or macros sounds like it
| would make the language harder to read. I agree that an
| advanced machine learning model for generating boilerplate
| code isn't the right approach but I also don't think we
| should extend languages for this. Tooling like code
| generators and linters are a good middle ground.
| RangerScience wrote:
| FYI+IMO: Both Ruby and Scala have excellent ways to reduce
| these issues that occur at the language level, and make it
| easier to both read and write. I don't know either way if
| that means you should extend languages to handle it, but at
| least it's definitively possible to write the language that
| way from the beginning.
|
| Otherwise yup, agree with you; ML for problematic
| boilerplate isn't the right approach, but other code
| generators and linters are really good and get you most of
| the way there.
| orangecat wrote:
| _New syntax or macros sounds like it would make the
| language harder to read._
|
| Often the opposite is true. For example Java records are
| far easier to read and understand than the pages of
| boilerplate that they replace.
| valyagolev wrote:
| it is a very similar argument to the one for powerful IDEs
| and underwhelming languages. to be fair, it's not necessarily
| fruitless - e.g. with smalltalk. i fail to see the analogous
| smalltalk-style empowerment of language using AI but perhaps
| something is there.
|
| anyway. programming is automation; automation of programming
| is abstraction. using AI to write your code is just a bad
| abstraction - we are used to them
| jxcole wrote:
| I feel like you are very defensive here and I want to be sure
| we take time to recognize this as a real accomplishment.
|
| Seriously though, I do doubt I can be fully replaced by a robot
| any time soon, but it may be the case that soon enough I can
| make high-level written descriptions of programs and hand them
| off to an AI to do most of the work. This wouldn't completely
| replace me, but it could make developers 50x more productive.
| The question is how elastic the market is... can the market grow
| in step with our increase in productivity?
|
| Also, please remember that as with anything, within 5 years we
| should see vast improvements to this AI. I think it will be an
| important thing to watch.
| nsxwolf wrote:
| Yesterday, I spent several hours figuring out if the business
| requirement for "within the next 3 days" meant 3 calendar
| days or 72 hours from now. Then about 10 minutes actually
| writing the code. Everyone thought my efforts were very
| valuable.
| RangerScience wrote:
| 100%. What makes us what we are is the mindset (in this
| case, this kind of "attention to detail"); that didn't
| change with (first) compilers, (then) scripting languages,
| or (future?) AI-assisted programming.
|
| PS - Lawyers aren't even as detail-oriented as we are, it's
| surprising.
| solarmist wrote:
| Really?
|
| Maybe that's true in general, because making a living as a
| lawyer depends far less on attention to detail as a core
| skill than making a living as a programmer does. Still, I
| wonder if that also
| holds at the high levels of the profession. I get the
| impression that at the FAANG-level, lawyers would compare
| pretty favorably to programmers in detail orientation. In
| particular, patent and contract law.
|
| That said, it's just my general impression of what
| lawyers get up to.
|
| ...Hmm, thinking about the contract law thing a bit more.
| Yeah, I do believe you are right. Lawyers aren't writing
| nearly as many extremely detail-oriented texts as
| programmers are on a day-to-day basis. Their jobs are
| much more around finding, reading, and understanding
| those things and building stories around them.
| visarga wrote:
| The GPT family has already shown more than 50x productivity
| increase by being able to solve not one, but hundreds and
| perhaps thousands of tasks on the same model. We used to need
| much more data, and the model would be more fragile, and
| finding the right architecture would be a problem. Now we
| plug in a transformer with a handful of samples and it works.
|
| I just hope LMs will prove to be just as useful in software
| development as they are in their own field.
| thomasahle wrote:
| If you make developers 50x more efficient, won't you need 50x
| fewer developers?
| bmh100 wrote:
| Not necessarily. Demand may be much higher than available
| supply right now. Tech companies will continue to compete,
| requiring spending on developers to remain competitive.
| Software is unlike manufacturing, in that the output is a
| service, not a widget. Worker productivity in general has
| not decreased the demand for full work weeks, despite
| projections in the early 20th century to the contrary. Of
| course, it is possible that fewer developers would be
| needed, but I don't think it's likely, yet.
| alasdair_ wrote:
| >If you make developers 50x more efficient, won't you need
| 50x fewer developers?
|
| Developers today are 50X more efficient than when they had
| to input machine code on punched tape, yet the number of
| developers needed today is far larger than it was in those
| times.
| throw10920 wrote:
| There's no reason to believe that we'll need _another_
| 50x more developers, though.
| solarmist wrote:
| There isn't? I feel like there's still a ton of places
| software hasn't even touched and not because it doesn't
| make sense, but because no one's gotten to it. It's not
| the most profitable thing people could write software
| for.
| alasdair_ wrote:
| Even if not, the original claim was that we may see a 50X
| _decrease_ and I personally don 't think that is likely,
| pre-Singularity anyway :)
| qualudeheart wrote:
| But think how large of a job program that would have
| been.
|
| Hundreds of people manually writing assembly and paid
| middle class wages. Not a compiler in sight.
|
| In the years leading up to the singularity I'd expect to
| see a lot of Graeberian "Bullshit Jobs".
|
| Everyone knows they're BS but as a society we allow them
| because we aren't willing to implement socialism or UBI.
| woadwarrior01 wrote:
| https://en.m.wikipedia.org/wiki/Jevons_paradox
| kevlened wrote:
| Greater efficiency leads to greater consumption unless
| demand is saturated. Given software's ability to uncover
| more problems that are solvable by software, we're more
| likely to build 50x more software.
| RangerScience wrote:
| This happened with the introduction of power tools to set
| building in Hollywood back in the day - literally this same
| question.
|
| People just built bigger sets, and smaller productions
| became financially feasible. Ended up creating demand, not
| reducing it.
| 0xdeadbeefbabe wrote:
| > but it could make developers 50x more productive
|
| More likely it will translate the abstraction level by some
| vector of 50 elements.
| blt wrote:
| I am always surprised by the amount of skepticism towards deep
| learning on HN. When I joined the field around 10 years ago,
| image classification was considered a grand challenge problem
| (e.g. https://xkcd.com/1425/). 5 years ago, only singularity
| enthusiast types were envisioning things like GPT-3 and Copilot
| in the short term.
|
| I think many people are uncomfortable with the idea that their
| own "intelligent" behavior is not that different from pattern
| recognition.
|
| I do not enjoy running deep learning experiments. Doing resource-
| hungry empirical work is not why I got into CS. But I still
| believe it is very powerful.
| jonas_kgomo wrote:
| Genuine question: what are the reasons to be a software engineer
| without much ML knowledge in 2022. Seems like a wake up call for
| developers
| eulers_secret wrote:
| > what are the reasons to be a software engineer without much
| ML knowledge in 2022.
|
| I'm not quite sure what you're asking, but my reason is that I
| _do not enjoy_ working on/with ML. I'd personally rather quit
| the industry.
|
| But I work in embedded/driver development. I do not worry about
| ML models replacing me yet, but if I were just gluing together
| API calls I would be a bit worried and try to specialize.
| qualudeheart wrote:
| Find something that's hard and interesting. Someone will
| probably have a business trying to solve it and will hire you.
| jonas_kgomo wrote:
| 7 months ago, I asked natfriedman the same question, of which
| he responded: "We think that software development is entering
| its third wave of productivity change. The first was the
| creation of tools like compilers, debuggers, garbage
| collectors, and languages that made developers more productive.
| The second was open source where a global community of
| developers came together to build on each other's work. The
| third revolution will be the use of AI in coding. The problems
| we spend our days solving may change. But there will always be
| problems for humans to solve."
|
| https://news.ycombinator.com/item?id=27676266&p=2
| slingnow wrote:
| Genuine question: what are the reasons to be a carpenter
| without much robotics / automation knowledge in 2022. Seems
| like a wakeup call for carpenters.
| 0xdeadbeefbabe wrote:
| I hope you are right, but just to answer the question: all
| those other AI winters.
| jonas_kgomo wrote:
| That's a good meditation. I think the winters were driven more
| by research dichotomy; for example, Marvin Minsky's critique
| of the perceptron really slowed the research by 10 years.
| Advances made thus far have so much commercial relevance
| that the companies invested don't look like they are going to
| stop soon. But it's a valid point. Looks like there is more
| upside being in subsets of computing like quantum computing,
| web3, metaverse etc. than being a regular front-end engineer.
| agentultra wrote:
| This is kind of neat. I wonder if it will one day be possible for
| it to find programs that maintain invariant properties we state
| in proofs. This would allow us to feel confident that even though
| it's generating huge programs that do weird things a human might
| not think of... well that it's still _correct_ for the stated
| properties we care about, i.e. that it's not doing anything
| underhanded.
| jdrc wrote:
| I think it would be interesting to train a system end-to-end
| with assembly code instead of various programming languages. This
| would make it a much more generic compiler
| ahgamut wrote:
| I find almost every new advance in deep learning is accompanied
| by contrasting comments: it's either "AI will soon automate
| programming/<insert task here>", or "let me know when AI can
| actually do <some-difficult-task>". There are many views on this
| spectrum, but these two are sure to be present in every comment
| section.
|
| IIUC, AlphaCode was trained on Github code to solve competitive
| programming challenges on Codeforces, some of which are
| "difficult for a human to do". Suppose AlphaCode was trained on
| Github code that contains the entire set of solutions on
| Codeforces, is it actually doing anything "difficult"? I don't
| believe it would be difficult for a human to solve problems on
| Codeforces when given access to the entirety of Github (indexed
| and efficiently searchable).
|
| The general question I have been trying to understand is this: is
| the ML model doing something that we can _quantify_ as
| "difficult to do (given this particular training set)"? I would
| like to compute a number that measures how difficult it is for a
| model to do task X given a large training set Y. If the X is part
| of the training set, the difficulty should be _zero_. If X is
| obtained only by combining elements in the training, maybe it is
| harder to do. My efforts to answer this question:
| https://arxiv.org/abs/2109.12075
|
| In recent literature, the RETRO Transformer
| (https://arxiv.org/pdf/2112.04426.pdf) talks about "quantifying
| dataset leakage", which is related to what I mentioned in the
| above paragraph. If many training samples are also in the test
| set, what is the model actually learning?
|
| Until deep learning methods provide a measurement of
| "difficulty", it will be difficult to gauge the prowess of any
| new model that appears on the scene.
| pedrosorio wrote:
| > Suppose AlphaCode was trained on Github code that contains
| the entire set of solutions on Codeforces, is it actually doing
| anything "difficult"?
|
| They tested it on problems from recent contests. The
| implication being: the statements and solutions to these
| problems were not available when the Github training set was
| collected.
|
| From the paper [0]: "Our pre-training dataset is based on a
| snapshot of selected public GitHub repositories taken on
| 2021/07/14" and "Following our GitHub pre-training dataset
| snapshot date, all training data in CodeContests was publicly
| released on or before 2021/07/14. Validation problems appeared
| between 2021/07/15 and 2021/09/20, and the test set contains
| problems published after 2021/09/21. This temporal split means
| that only information humans could have seen is available for
| training the model."
|
| At the very least, even if some of these problems had been
| solved exactly before, you still need to go from "all of the
| code in Github" + "natural language description of the problem"
| to "picking the correct code snippet that solves the problem".
| Doesn't seem trivial to me.
|
| > I don't believe it would be difficult for a human to solve
| problems on Codeforces when given access to the entirety of
| Github (indexed and efficiently searchable).
|
| And yet, many humans who participate in these contests are
| unable to do so (although I guess the issue here is that Github
| is not properly indexed and searchable for humans?).
|
| [0] https://storage.googleapis.com/deepmind-
| media/AlphaCode/comp...
| ahgamut wrote:
| > They tested it on problems from recent contests. The
| implication being: the statements and solutions to these
| problems were not available when the Github training set was
| collected.
|
| Yes, and I would like to know how similar the dataset(s)
| were. Suppose the models were trained only on greedy
| algorithms and then I provided a dynamic programming problem
| in the test set, (how) would the model solve it?
|
| > And yet, many humans who participate in these contests are
| unable to do so (although I guess the issue here is that
| Github is not properly indexed and searchable for humans?).
|
| Indeed, so we don't know what "difficult" means for
| <human+indexed Github>, and hence we cannot compare it to
| <model trained on Github>.
|
| My point is, whenever I see a new achievement of deep
| learning, I have no frame of reference (apart from my
| personal biases) of how "trivial" or "awesome" it is. I would
| like to have a quantity that measures this - I call it
| generalization difficulty.
|
| Otherwise the datasets and models just keep getting larger,
| and we have no idea of the full capability of these models.
| pedrosorio wrote:
| > Suppose the models were trained only on greedy algorithms
| and then I provided a dynamic programming problem in the
| test set, (how) would the model solve it?
|
| How many human beings do you personally know who were able
| to solve a dynamic programming problem at first sight
| without ever having seen anything but greedy algorithms?
|
| Deepmind is not claiming they have a machine capable of
| performing original research here.
|
| Many human programmers are unable to solve DP problems even
| after having them explained several times. If you could get
| a machine that takes in all of Github and can solve "any"
| DP problem you describe in natural language with a couple
| of examples, that is AI above and beyond what many humans
| can do, which is "awesome" no matter how you put it.
| sibeshk96 wrote:
| > that is AI above and beyond what many humans can do,
| which is "awesome" no matter how you put it.
|
| That's not the point being made. The point OP is making
| is that it is not possible to understand how impressive
| at "generalizing" to uncertainty a model is if you don't
| know how different the training set is from the test set.
| If they are extremely similar to each other, then the
| model generalizes weakly (this is also why the world's
| smartest chess bot needs to play a million games to beat
| the average grandmaster, who has played less than 10,000
| games in her lifetime). Weak generalization vs strong
| generalization.
|
| Perhaps all such published results should contain info
| about this "difference" so it becomes easier to judge the
| model's true learning capabilities.
| ahgamut wrote:
| > How many human beings do you personally know who were
| able to solve a dynamic programming problem at first
| sight without ever having seen anything but greedy
| algorithms?
|
| Zero, which is why if a trained network could do it, that
| would be "impressive" to me, given my personal biases.
|
| >. If you could get a machine that takes in all of Github
| and can solve "any" DP problem you describe in natural
| language with a couple of examples, that is AI above and
| beyond what many humans can do, which is "awesome" no
| matter how you put it.
|
| I agree with you that such a machine would be awesome,
| and AlphaCode is certainly a great step closer towards
| that ideal. However, I would like to have a number
| measures the "awesomeness" of the machine (not elo rating
| because that depends on a human reference), so I will
| have something as a benchmark to refer to when the next
| improvement arrives.
| pedrosorio wrote:
| I understand wanting to look at different metrics to
| gauge progress, but what is the issue with this?
|
| > not elo rating because that depends on a human
| reference
| sibeshk96 wrote:
| Using my previous chess analogy, the world's smartest
| chess bot has played a million games to beat the average
| grandmaster, who has played less than 10,000 games in her
| lifetime. So while they both will have the same elo
| rating, which is a measure of how good they are at the
| narrow domain of chess, there is clearly something
| superior about the how the human grandmaster learns from
| just a few data points i.e. strong generalization vs the
| AI's weak generalization. Hence the task-specific elo
| rating does not give enough context to understand how
| well a model adapts to uncertainty. For instance - a
| Roomba would beat a human hands down if there was an elo
| rating for vacuuming floors.
| ahgamut wrote:
| The Turing Test
| (https://en.wikipedia.org/wiki/Turing_test) for
| artificial intelligence required the machine to convince
| a human questioner that it was a human. Since then, most
| AI methods rely on a human reference of performance to
| showcase their prowess. I don't find this appealing
| because:
|
| 1) It's an imprecise target: believers can always hype
| and skeptics can always downplay improvements. Humans can
| do lots of different things somewhat well at the same
| time, so a machine beating human-level performance in one
| field (like identifying digits) says little about other
| fields (like identifying code vulnerabilities).
|
| 2) ELO ratings, or similar metrics are measurements of
| _skill_ , and can be brute-forced to some extent,
| equivalent to grinding up levels in a video game. Brute-
| forcing a solution is "bad", but how do we know a new
| method is "better/more elegant/more efficient"? For
| algorithms we have Big-O notation, so we know (brute
| force < bubble sort < quick sort), perhaps there is an
| analogue for machine learning.
|
| I would like performance comparisons that focus on
| quantities unique to machines. I don't compare the
| addition of computer processors with reference to human
| addition, so why not treat machine intelligence
| similarly?
|
| There are many interesting quantities with which we can
| compare ML models. Energy usage is a popular metric, but
| we can also compare the structure of the network, the
| code used, the hardware, the amount of training data, the
| amount of training time, and the similarity between
| training and test data. I think a combination of these
| would be useful to look at every time a new model
| arrives.
| mwattsun wrote:
| Seems to me that this accelerates the trend towards a more
| declarative style of programming where you tell the computer what
| you want to do, not how to do it
| aidenn0 wrote:
| > Creating solutions to unforeseen problems is second nature in
| human intelligence
|
| If this is true then a lot of the people I know lack human
| intelligence...
| algon33 wrote:
| How surprising did you guys find this? I'd have said there was a
| 20% chance of this performing at the median+ level if I was asked
| to predict things beforehand.
| Isinlor wrote:
| There is a prediction market called Metaculus.
|
| On Dec 31, 2016 in partnership with Center for the Study of
| Existential Risk, Machine Intelligence Research Institute, and
| The Future of Life Institute they asked:
|
| How long until a machine-learning system can take a simple text
| description and turn it into a program coded in C/Python?
|
| https://www.metaculus.com/questions/405/when-will-programs-w...
|
| First 19 forecasters in March 2017 were predicting mid-2021,
| the best forecasters were predicting late 2024. When the
| question closed in 2020 the community was predicting January
| 2027 and the best forecasters were predicting March 2030.
|
| The question resolved on July 2021 when Codex was published.
|
| Community and the best forecasters were assigning ~15% that it
| will happen by July 2021.
|
| I'm currently 14th best forecaster there and I was predicting
| 33% before July 2021. It was my last prediction, and it was
| made in October 2018.
|
| I'm also predicting 75% that we will have AGI by 2040 as
| defined in this question:
|
| https://www.metaculus.com/questions/3479/when-will-the-first...
|
| 20% that it will happen before 2030.
|
| There is also stronger operationalization:
|
| https://www.metaculus.com/questions/5121/when-will-the-first...
|
| My prediction here is 60% before 2040 and 5% before 2030.
|
| I have also "canary in the coal mine" questions:
|
| When will AI achieve competency on multi-choice questions
| across diverse fields of expertise? Community predicts 50%
| before 2030, I agree.
|
| https://www.metaculus.com/questions/5276/ai-competence-in-di...
|
| When will AI be able to learn to play Montezuma's Revenge in
| less than 30 min? Community predicts 50% before 2025, I think
| 50% before 2027.
|
| https://www.metaculus.com/questions/5460/ai-rapidly-learning...
| baobabKoodaa wrote:
| I would have said there is a ~0% chance of this happening
| within our lifetimes.
| hackinthebochs wrote:
| I didn't find it very surprising, but then I tend to be more
| optimistic than average about the capabilities of transformer
| models and the prospect of general AI in the relatively near
| term.
| machiaweliczny wrote:
| I am surprised, as recently OpenAI had ~25% of easy problems
| and ~2% in competitive problems. Seems like DeepMind is ahead
| in this topic as well.
|
| Actually I think Meta AI had some interesting discovery
| recently that could possibly improve NNs in general, so probably
| this as well.
|
| I am not in field but wonder if some other approaches like
| Tsetlin machines would be more useful for programming.
| marcusbuffett wrote:
| I would have guessed around the same chance, this was
| surprising to me after playing around with copilot and not
| being impressed at all.
| knowmad wrote:
| I agree with most of the comments I've read in this thread.
| Writing code to solve a well defined narrowly scoped problem
| isn't that hard or valuable. It's determining what the problem
| actually is and how software could be used to solve it that is
| challenging and valuable.
|
| I would really like to see more effort in the AI/ML code
| generation space being put into things like code review, and
| system observation. It seems significantly more useful to use
| these tools to augment human software engineers rather than
| trying to tackle the daunting and improbable task of completely
| replacing them.
|
| *Note: as a human software engineer I am biased
| [deleted]
| FemmeAndroid wrote:
| This is extremely impressive, but I do think it's worth noting
| that these two things were provided:
|
| - a very well defined problem. (One of the things I like about
| competitive programming and the like is just getting to implement
| a clearly articulated problem, not something I experience on most
| days.)
|
| - existing test data.
|
| This is definitely a great accomplishment, but I think those two
| features of competitive programming are notably different than my
| experience of daily programming. I don't mean to suggest these
| will always be limitations of this kind of technology, though.
| baobabKoodaa wrote:
| > One of the things I like about competitive programming and
| the like is just getting to implement a clearly articulated
| problem
|
| English versions of Codeforces problems may be well-defined but
| they are often very badly articulated and easy to misunderstand
| as a human reader. I still can't understand how they got AI to
| be able to generate plausible solutions from these problem
| statements.
| jakub_g wrote:
| 100% agree. Someone (who?) had to take time and write the
| detailed requirements. In real jobs you rarely get good tickets
| with well defined expectations; it's one of a developer's most
| important jobs to transform a fuzzy requirement into a good
| ticket.
|
| (Side note: I find that many people skip this step, and go
| straight from fuzzy-requirement-only-discussed-on-zoom-with-Bob
| to code; open a pull request without much context or comments;
| and then a code reviewer is supposed to review it properly
| without really knowing what problem is actually being solved,
| and whether the code is solving a proper problem at all).
| jensensbutton wrote:
| Maybe the problem transformation will be both the beginning
| _and_ end of the developer's role.
| ctoth wrote:
| So what happens when OpenAI releases TicketFixer 0.8 which
| synthesizes everything from transcripts of your meetings to
| the comments to the JIRA ticket to the existing codebase and
| spits out better tickets to feed into the programming side?
| solarmist wrote:
| Yup, I hope that'll happen. Then engineers would just end
| up working at a higher level of abstraction, closer to
| what designers do with wireframes and mockups.
|
| Kind of the opposite of the way graphic design has evolved.
| Instead of getting more involved in the process and, in
| many cases, becoming front-end developers, it'll become
| more abstract where humans make the decisions and reason
| about what to include/exclude, how it'll flow, etc.
|
| Even TicketFixer wouldn't be able to do more than offer a
| handful of possible solutions to design-type issues.
| bmhin wrote:
| Yeah, we need our TicketFixer to also include the No_Bob
| 0.2 plugin that figures out that a decent percentage of
| the time whatever "Bob" is asking for in that meeting is
| not what "Bob" thinks he is asking for or should be
| asking for and can squash those tickets. Without that
| we're gonna somehow end up with spreadsheets in
| everything.
| solarmist wrote:
| Haha, yeah, there's that, but there are also things like
| "adding a dark mode." There are a dozen ways to
| accomplish that kind of thing, and every company's
| solution will diverge when you get down to the details.
| jakub_g wrote:
| Take my money.
| machiaweliczny wrote:
| But it's easy to create an AI conversation that will refine the
| problem.
| ohwellhere wrote:
| Is the next step in the evolution of programming having the
| programmer become the specifier?
|
| Fuzzy business requirements -> programmer specifies and
| writes tests -> AI codes
| buscoquadnary wrote:
| That's all we've ever been since we invented software.
|
| First we specified the exact flow of the bits with punch
| cards.
|
| Then we got assembly and we specified the machine
| instructions.
|
| Then we got higher level languages and we specified how the
| memory was to be managed and what data to store where.
|
| Now we have object oriented languages that allow us to work
| with domain models, and functional languages that allow us
| to work with data structures and algorithms.
|
| The next level may be writing business rules, and
| specifying how services talk to each other, who knows, but
| it will be no different than it is now just a higher level.
| chinabot wrote:
| If it's anything like my job:
|
| while(1) { Fuzzy business requirements -> programmer
| specifies and writes tests -> AI codes }
| e4e78a06 wrote:
| I don't think it's quite as impressive as you make it out to
| be. Median performance in a Codeforces programming competition
| is solving the easiest 1-2 problems out of 5-6 problems. Like
| all things programming the top 1% is much, much better than the
| median.
|
| There's also the open problem of verifying correctness in
| solutions and providing some sort of flag when the model is not
| confident in its correctness. I give it another 5 years in the
| optimistic case before AlphaCode can reliably compete at the
| top 1% level.
| ctoth wrote:
| This is technology that simply didn't exist in any form 2
| years ago. For no amount of money could you buy a program
| that did what this one does. Having been watching the growth
| of Transformer-based models for a couple years now really has
| hammered home that just as soon as we figure out how an AI
| can do X, X is no longer AI, or at least no longer
| impressive. How this happens is with comments like yours, and
| I'd really like to push back against it for once. Also 5
| years? So assuming that we have all of the future ahead of
| us, to think that we only have 5 years left of being the top
| in programming competitions seems like it's somehow important
| and shouldn't be dismissed with "I don't think it's quite as
| impressive as you make it out to be."
| BobbyJo wrote:
| I don't think that's what's happening. Let's talk about this
| case: programming. It's not that people are saying "an AI
| programming" isn't impressive or isn't AI, it's that when
| people say "an AI programming" they aren't talking about
| ridiculously controlled environments like in this case.
|
| It's like self-driving cars. A car driving itself for the
| first time in a controlled environment, I'm sure, was an
| impressive feat, and it wouldn't be inaccurate to call it a
| self-driving car. However, that's not what we're all
| waiting for when we talk about the arrival of self-driving
| cars.
| ctoth wrote:
| And if AI programming were limited to completely
| artificial contexts you would have a point, though I'd
| still be concerned. We live in a world, however, where
| programmers routinely call on the powers of an AI to
| complete their real code and get real value out of it.
| This is based on the same technology that brought us this
| particular win, so clearly this technology is useful
| outside "ridiculously controlled environments."
| Retric wrote:
| Programmers do set up completely artificial contexts so AI
| can work.
|
| None of the self-driving systems were set up by giving
| the AI access to sensors, a car, and the drivers handbook
| and saying well you figure it out from there. The general
| trend is solve this greatly simplified problem, this more
| complex one, up to dealing with the real world.
| ctoth wrote:
| By AI programming I mean the AI doing programming, not
| programming the AI. Though soon enough the first will be
| doing the second and that's where the loop really
| closes...
| BobbyJo wrote:
| That's not significantly different than how programming
| has worked for the last 40 years though. We slowly push
| certain types of decisions and tasks down into the tools
| we use, and what's left over is what we call
| 'programming'. It's cool, no doubt, but as long as
| companies need to hire 'programmers', then it's not the
| huge thing we're all looking out over the horizon waiting
| for.
| YeGoblynQueenne wrote:
| >> This is technology that simply didn't exist in any form
| 2 years ago.
|
| A few examples of neural program synthesis from at least 2
| years ago:
|
| https://sunblaze-ucb.github.io/program-synthesis/index.html
|
| Another example from June 2020:
|
| _DreamCoder: Growing generalizable, interpretable
| knowledge with wake-sleep Bayesian program learning_
|
| https://arxiv.org/abs/2006.08381
|
| RobustFill, from 2017:
|
| _RobustFill: Neural Program Learning under Noisy I /O_
|
| https://www.microsoft.com/en-us/research/wp-
| content/uploads/...
|
| I could go on.
|
| And those are only examples from neural program synthesis.
| Program synthesis, in general, is a field that goes way
| back. I'd suggest as usual not making big proclamations
| about its state of the art without being acquainted with
| the literature. Because if you don't know what others have
| done every announcement by DeepMind, OpenAI et al seems
| like a huge advance... when it really isn't.
| qualudeheart wrote:
| Has someone tried classical program synthesis techniques
| on competitive programming problems? I wonder what would
| have been possible with tech from more than 2 years ago.
| YeGoblynQueenne wrote:
| I don't know if anyone has tried it, but it's not a very
| objective evaluation. We have no good measure of the
| coding ability of the "median level competitor" so doing
| better or worse than that, doesn't really tell us
| anything useful about the coding capability of an
| automated system.
|
| So my hunch is that it probably hasn't been done, or
| hasn't been done often, because the program synthesis
| community would recognise it's pointless.
|
| What you really want to look at is formal program
| synthesis benchmarks and how systems like AlphaCode do on
| them (hint: not so good).
| ctoth wrote:
| Of course program synthesis has been a thing for years, I
| remember some excellent papers out of MSR 10 years ago.
| But which of those could read a prompt and build the
| program from the prompt? Setting up a whole bunch of
| constraints and having your optimizer spit out a program
| that fulfills them is program synthesis and is super
| interesting, but not at all what I think of when I'm told
| we can make the computer program for us. For instance,
| RobustFill takes its optimization criteria from a bundle
| of pre-completed inputs and outputs of how people want
| the program to behave instead of having the problem
| described in natural language and creating the solution
| program.
| YeGoblynQueenne wrote:
| Program synthesis from natural language specifications
| has existed for many years, also. It's not my specialty
| (neither am I particularly interested in it), but here's
| a paper I found from 2017, with a quick search:
|
| https://www.semanticscholar.org/paper/Program-Synthesis-
| from...
|
| AlphaCode is not particularly good at it, either. In the
| arxiv preprint, besides the subjective and pretty
| meaningless "evaluation" against human coders it's also
| tested on a formal program synthesis benchmark, the APPS
| dataset. The best performing AlphaCode variant reported
| in the arxiv preprint solves 25% of the "introductory"
| APPS tasks (the least challenging ones). All AlphaCode
| variants tested solve less than 10% of the "interview"
| and "competition" (intermediary and advanced) tasks.
| These more objective results are not reported in the
| article above, I think for obvious reasons (because they
| are extremely poor).
|
| So it's not doing anything radically new and it's not
| doing it particularly well either. Please be better
| informed before propagating hype.
|
| Edit: really, from a technical point of view, AlphaCode
| is a brute-force, generate-and-test approach to program
| synthesis that was state-of-the-art 40 years ago. It's
| just a big generator that spams programs hoping it will
| hit a good one. I have no idea who came up with this.
| Oriol Vinyals is the last author and I've seen enough of
| that guy's work to know he knows better than to bet on such
| a primitive, even backwards approach. I'm really shocked
| that this is DeepMind work.
| Jensson wrote:
| Top 1% competitive programming level means it could start
| solving research problems; the difficulty and creativity
| needed goes up exponentially for harder problems, and
| programming contests have led to research papers before.
| It would be cool if we got there in 5 years but I doubt it.
| But if we got there it would revolutionize so many things in
| society.
| xorcist wrote:
| You don't think it's impressive, yet you surmise that a
| computer program could compete at a level of the top 1% of
| all humans _in_ _five_ _years_?
|
| That's wildly overstating the promise of this technology, and
| I'd be very surprised if the authors of this wouldn't agree.
| bricemo wrote:
| Agree. If an AI could code within the top 1%, every single
| person whose career touches code would have their lives
| completely upended. If that's only 5 years out...ooof.
| Groxx wrote:
| I do kinda wonder if it'd lead to as good results if you just
| did a standard "matches the most terms the most times" search
| against all of GitHub.
|
| I have a suspicion it would - kinda like Stack Overflow,
| problems/solutions are not that different "in the small".
| It'd have almost certainly given us the fast inverse
| square root trick verbatim, like GitHub's AI is doing
| routinely.
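|
| (A minimal sketch of the kind of term-overlap search
| described above, purely as an illustration; the corpus
| format and the scoring are assumptions, not how GitHub
| search or Copilot actually works.)
|
|     # Hypothetical sketch: rank code snippets by how many
|     # terms they share with a problem statement.
|     import re
|     from collections import Counter
|
|     def tokenize(text):
|         # Crude lowercased word tokens, enough for a sketch.
|         return re.findall(r"[a-z0-9_]+", text.lower())
|
|     def rank_by_term_overlap(problem_statement, corpus, top_k=10):
|         """corpus: dict mapping snippet name -> source text."""
|         query = Counter(tokenize(problem_statement))
|         scored = []
|         for name, source in corpus.items():
|             terms = Counter(tokenize(source))
|             # Score = term occurrences shared with the query.
|             score = sum(min(query[t], terms[t]) for t in query)
|             scored.append((score, name))
|         return sorted(scored, reverse=True)[:top_k]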
| YeGoblynQueenne wrote:
| >> AlphaCode ranked within the top 54% in real-world programming
| competitions, an advancement that demonstrates the potential of
| deep learning models for tasks that require critical thinking.
|
| Critical thinking? Oh, wow. That sounds amazing!
|
| Let's read further on...
|
| >> At evaluation time, we create a massive amount of C++ and
| Python programs for each problem, orders of magnitude larger than
| previous work. Then we filter, cluster, and rerank those
| solutions to a small set of 10 candidate programs that we submit
| for external assessment.
|
| Ah. That doesn't sound like "critical thinking", or any thinking.
| It sounds like massive brute-force guessing.
|
| A quick look at the arxiv preprint linked from the article
| reveals that the "massive" number of programs generated is in the
| millions (see Section 4.4). These are "filtered" by testing them
| against program input-output (I/O) examples given in the problem
| descriptions. This "filtering" still leaves a few thousands of
| candidate programs that are further reduced by clustering to
| "only" 10 (which are finally submitted).
|
| So it's a generate-and-test approach rather than anything to do
| with reasoning (as claimed elsewhere in the article) let alone
| "thinking". But why do such massive numbers of programs need to
| be generated? And why are there still thousands of candidate
| programs left after "filtering" on I/O examples?
|
| The reason is that the generation step is constrained by the
| natural-language problem descriptions, but those are not enough
| to generate appropriate solutions because the generating language
| model doesn't understand what the problem descriptions mean; so
| the system must generate millions of solutions hoping to "get
| lucky". Most of those don't pass the I/O tests so they must be
| discarded. But there are only very few I/O tests for each problem
| so there are many programs that can pass them, and still not
| satisfy the problem spec. In the end, clustering is needed to
| reduce the overwhelming number of pretty much randomly generated
| programs to a small number. This is a method of generating
| programs that's not much more precise than drawing numbers at
| random from a hat.
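|
| (To make that pipeline concrete, here is a minimal sketch of
| generate-and-test with I/O filtering and behavioural
| clustering. The function names, the callable "program"
| representation, and the probe inputs are illustrative
| assumptions, not DeepMind's actual implementation.)
|
|     # Hypothetical sketch of filter-then-cluster selection.
|     from collections import defaultdict
|
|     def passes_examples(program, io_examples):
|         # Keep a candidate only if it reproduces every public
|         # input/output example from the problem statement.
|         return all(program(x) == y for x, y in io_examples)
|
|     def select_submissions(candidates, io_examples,
|                            probe_inputs, k=10):
|         # 1. Filter the sampled programs on the few public
|         #    I/O examples; most candidates fail here.
|         survivors = [p for p in candidates
|                      if passes_examples(p, io_examples)]
|         # 2. Cluster survivors by their behaviour on extra
|         #    probe inputs, so behaviourally identical programs
|         #    collapse into one group.
|         clusters = defaultdict(list)
|         for p in survivors:
|             signature = tuple(p(x) for x in probe_inputs)
|             clusters[signature].append(p)
|         # 3. Submit one representative from each of the k
|         #    largest clusters.
|         biggest = sorted(clusters.values(), key=len,
|                          reverse=True)[:k]
|         return [group[0] for group in biggest]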
|
| Inevitably, the results don't seem to be particularly accurate,
| hence the evaluation against programs written by participants in
| coding competitions, which is not an objective measure of
| program correctness. Table 10 on the arxiv preprint lists results
| on a more formal benchmark, the APPS dataset, where it's clear
| that the results are extremely poor (the best performing
| AlphaCode variant solves 20% of the "introductory" level
| problems, though outperforming earlier approaches).
|
| Overall, pretty underwhelming and a bit surprising to see such
| lackluster results from DeepMind.
| thomasahle wrote:
| Next they can train it on kaggle, and we'll start getting closer
| to the singularity
___________________________________________________________________
(page generated 2022-02-02 23:00 UTC)