[HN Gopher] Program Synthesis and Large Language Models
___________________________________________________________________
Program Synthesis and Large Language Models
Author : ChristofferSN
Score : 39 points
Date : 2024-12-12 08:56 UTC (3 days ago)
(HTM) web link (cacm.acm.org)
(TXT) w3m dump (cacm.acm.org)
| bjornsing wrote:
| I'm beginning to see LLMs more as a solution to the code reuse
| problem: they can implement anything that's common enough to
| appear a number of times in their training data.
|
| If you're hoping they will soon implement entirely novel complex
| systems based on a loose specification, I think you'll be
| disappointed.
| danenania wrote:
| This kind of begs the question though in the "there's nothing
| new under the sun" sense. Maybe your particular task is novel,
| but are its component parts and algorithms novel? In some
| cases, perhaps some of them are, but I think that's more the
| exception than the rule.
|
| LLMs aren't just tape recorders. They can mix and match and
| adapt. What percentage of the code you write do you think is
| truly original?
| golly_ned wrote:
| This article compares the hardness of program synthesis to the
| utility of chatbots to show chatbots fall short.
|
| The difference is we don't expect chatbots to be 100% right but
| we do expect program synthesis to be 100% right. For chatbots,
| 99% is amazing in terms of utility. That other 1% is really hard
| to get.
|
| Given that English cannot robustly specify a program, which is
| why program synthesis requires its specifications to be formal
| descriptions, the author is committing a category error by
| comparing two incommensurable solutions to two distinct
| problems.
| jebarker wrote:
| > For chatbots, 99% is amazing in terms of utility.
|
| Is it really though? If it's 99% of generated tokens, then 1
| out of 100 being wrong is not great for code generation since
| you're often going to generate more than that. But let's
| suppose it's 1 in 100 whole functions or programs that's wrong.
| Unless there's a way to automatically verify when a function is
| wrong, the error rate basically makes full automation (e.g.
| automatically fixing GitHub issues) out of the question.
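|
| As a rough back-of-the-envelope sketch (assuming, purely for
| illustration, independent per-token errors, which real models
| don't have), the chance of an error-free generation collapses
| fast with length:
|
|     # probability an n-token generation is entirely error-free
|     # at 99% per-token accuracy
|     for n in (10, 100, 500, 1000):
|         print(n, round(0.99 ** n, 3))
|     # -> 0.904, 0.366, 0.007, 0.0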
|
| So it seems like at 99% we're left with a system that requires
| constant vigilance from the user to check its work. This is a
| far cry from the promise of true program synthesis.
|
| What would be amazingly useful is a 99% reliable system that
| can tell you when it can't answer the question correctly
| instead of providing a wrong answer.
| simonw wrote:
| If you expect to be able to prompt a chatbot and then not
| TEST that the code it wrote for you works, you're going to
| have a bad time.
|
| If you're willing to put in that QA work yourself, the
| productivity boost you can get from them is immense.
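|
| A minimal sketch of the kind of QA gate I mean (slugify and the
| generated module are hypothetical stand-ins for whatever the
| model produced): treat the output as untrusted and only accept
| it once tests you wrote yourself pass.
|
|     # test_slugify.py -- run with pytest
|     from generated import slugify  # hypothetical module holding the model's code
|
|     def test_basic():
|         assert slugify("Hello, World!") == "hello-world"
|
|     def test_idempotent():
|         assert slugify(slugify("Already Clean")) == "already-clean"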
|
| And honestly, working with the best human programmer in the
| world won't guarantee that the software they produce is the
| exact software that you wanted. Details are always lost in
| the communication between the person who wants the software
| and the person who builds it.
|
| What matters most is the dialog and speed of iteration that
| you can achieve to converge on the right product.
| jebarker wrote:
| I think the reason that I haven't felt the productivity
| boost is because my work is mostly debugging and fixing
| existing code in a complicated codebase. Rarely is it
| writing new greenfield code. LLMs have so far struggled to
| be useful in this setting as it's more like the QA part of
| the workflow you're suggesting.
| mondrian wrote:
| Tl;dr (a) program synthesis is provably hard and (b) English is
| too ambiguous as a specification language. Therefore we're
| nowhere close.
|
| Ultimately not a convincing argument. We could use this same
| argument to argue that a human cannot write a program to spec.
| That may be strictly true, but not interesting.
| lmeyerov wrote:
| It's frustrating whenever folks throw up proof complexity as why
| LLMs can't work. If most programs most people want to write can
| map into predictable & verifiable abstractions, or we recognize
| almost no software is verified to begin with, we realize the world
| is already moving on irrespective of personal hobby horses here.
| Shocker: Much of the world we interact with every day already
| runs on PHP, JavaScript, and untyped Python that is not verified,
| not type checked, and has repos overflowing with bugs and CVEs.
|
| Prof. Dawn Song IMO has been articulating a more productive view.
| LLMs are generating half the new code on GitHub anyways, so lean
| in: use this as an opportunity to make it easy for new code to
| use formal methods where before it would have been too hard.
| Progress will happen either way, and at least this way we have a
| shot at bringing the verifiability in to more user code.
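|
| Property-based testing isn't full verification, but it's a
| concrete step in that direction that LLMs can already help
| with. A hypothetical sketch: ask the model for a checkable
| property alongside the implementation, then let a tool like
| Hypothesis hammer on both.
|
|     from hypothesis import given, strategies as st
|
|     def insertion_sort(xs):  # model-generated implementation
|         out = []
|         for x in xs:
|             i = len(out)
|             while i > 0 and out[i - 1] > x:
|                 i -= 1
|             out.insert(i, x)
|         return out
|
|     @given(st.lists(st.integers()))
|     def test_matches_spec(xs):  # model-generated spec-as-property
|         assert insertion_sort(xs) == sorted(xs)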
| jebarker wrote:
| Do you have any links for Prof Song talking or writing about
| this? Sounds interesting.
| qsort wrote:
| I don't think this is in direct contradiction with the article.
| This was written in response to ideas like:
|
| (a) the machine is so powerful that you can just tell it what
| to do and it does it with no supervision
|
| or
|
| (b) we can ditch programming languages and just use English as
| the "source of truth", like you would read, modify and check in
| the source code rather than compiled object code.
|
| (a) is very much yet to be seen, and (b) is just monumentally
| stupid. Productive uses of LLMs for code are either as a super-
| autocomplete or as a pair programmer, which doesn't make
| programming languages obsolete, much less computer science
| education as a whole. This is even called out in TFA:
|
| '''Even if chatbots should become a useful tool for
| programmers, their existence cannot mean the end of
| programming. It may well be the case that this combination of a
| built-in repository of code examples and the ability to easily
| retrieve code examples based on it can be used for generating
| ideas and for forms of "pair programming" as recently studied
| by Bird et al.2
|
| But this is not program synthesis, rather it gives software
| using LLMs the role of more modest and possibly useful tools.
| We are now seeing a flurry of interest in combining traditional
| approaches to program analysis with LLMs, and if anything, this
| points to the limitations inherent in both approaches.3,8'''
| lmeyerov wrote:
| I'm mostly responding to the middle third of the article,
| which IMO interprets the word "correct" in an academic way
| that is unanchored to how, to a rounding error, software is
| written:
|
| > "Even simple versions of the approach are intractable in
| the very precise sense of computational complexity theory if
| we are concerned about generating code that is correct."
| (Then 5 paragraphs about that being PSPACE-complete.)
|
| The notion of correctness used in the article is a verified
| property. My point is probably less than 1% of the
| programmers on this site have ever verified their code in
| that sense, especially since they left a university setting.
| Instead, most commercial developers write tests based on
| their domain knowledge as developers and some limited
| understanding of their company's business domain... if at
| all. Which LLMs can also do.
___________________________________________________________________
(page generated 2024-12-15 23:00 UTC)