[HN Gopher] CodeCompose: A large-scale industrial deployment of ...
___________________________________________________________________
CodeCompose: A large-scale industrial deployment of AI-assisted
code authoring
Author : azhenley
Score : 112 points
Date : 2023-06-03 13:38 UTC (9 hours ago)
(HTM) web link (arxiv.org)
(TXT) w3m dump (arxiv.org)
| m3kw9 wrote:
| Who else is tired of these trillion-dollar companies that talk
| and talk and have no products?
| neuronexmachina wrote:
| It's a paper about a (for now) internal product:
|
| "In this paper we present CodeCompose, an AI-assisted code
| authoring tool developed and deployed at Meta internally.
| CodeCompose is based on the InCoder LLM that merges generative
| capabilities with bi-directionality. We have scaled up
| CodeCompose to serve tens of thousands of developers at Meta,
| across 10+ programming languages and several coding surfaces."
|
| Looks like the InCoder model it's based on can be downloaded
| here: https://huggingface.co/facebook/incoder-6B
| Veuxdo wrote:
| Does publishing papers about internal tools serve any purpose
| other than PR?
| chromatin wrote:
| There are many scholars working in industry (from software
| engineering to biotech) who believe in the ideals of
| information sharing and publication of research.
| [deleted]
| IshKebab wrote:
| Of course it does. They give lots of details about the tools
| and their use, which is obviously helpful to anyone wanting
| to do something similar.
|
| A great example is
| https://www.uber.com/blog/research/keeping-master-green-
| at-s...
|
| I think maybe Gitlab Merge Trains predate it, but it was
| definitely influential.
| lmeyerov wrote:
| It was a useful read for me, especially seeing numbers on
| fine-tuning & use. We are piloting a DB analyst tool where
| users can use natural language to do DB queries, generate AI
| analyses, make interactive GPU charts, etc, so these are
| nearby questions we think about a lot. Previously, as a PhD
| publishing on program synthesis, most of our writeups were at
| a much smaller scale wrt live user evals, so all combined...
| super cool to see.
|
| For FB... It probably helps keep the team valued internally
| + helps with retention & recruiting. For PhD trained types,
| this kind of paper is almost table stakes.
|
| Less obvious... FB has been laying off teams like this
| despite productivity ROI intuitions, so if I were there, I'd
| be careful to quantify current + future ROI - I'm sure there
| are key #'s not being shared.
| williamstein wrote:
| This paper says: "Customized for the organization:
| CodeCompose is fine-tuned on Meta's internal code
| repository, and is thus able to handle Meta-specific
| languages such as Hack and Flow." If you work at an org
| that might want to build their own LLM trained on their own
| internal code base, then the lessons of this paper would be
| of value to you.
| Veuxdo wrote:
| Makes sense. For Meta, though, by publishing papers like
| this, are they hoping for something other than PR? My only
| other guess would be attracting talent.
| bee_rider wrote:
| Lots of companies have research departments and release
| papers; the people working in these departments have some
| academic roots at least. The incentives for releasing
| papers are:
|
| * To raise your profile and reputation generally
|
| * The specific publish or perish incentives in academia
|
| * Because you really think you've done something
| interesting and novel, and want to share it with the
| world
|
| Only the middle one is removed when going to industry.
| alan-stark wrote:
| The abstract says _...we present metrics from our large-scale
| deployment of CodeCompose that shows its impact on Meta's
| internal code authoring experience over a 15-day time window,
| where 4.5 million suggestions were made by CodeCompose.
| Quantitative metrics reveal that (i) CodeCompose has an
| acceptance rate of 22% across several languages, and (ii) 8% of
| the code typed by users of CodeCompose is through accepting code
| suggestions from CodeCompose. Qualitative feedback indicates an
| overwhelming 91.5% positive reception for CodeCompose._
|
| In other words, out of 4.5 million suggestions about 80% were
| off, yet there is 91% positive reception. That's 3.6 million
| rejected suggestions that potentially distracted programmers
| from doing their work. Yet users are happy. Is there a
| contradiction in these figures?
| YetAnotherNick wrote:
| If you take random questions from Stack Overflow, my guess is
| that 80% of them don't have a correct answer, yet I am very
| happy Stack Overflow exists.
| Mountain_Skies wrote:
| I've had Bing provide me with code from SO that was taken
| from the question itself: code that was explicitly stated not
| to work, where the poster wanted to know what was wrong with
| it. Bing's AI didn't understand this and claimed it was a
| solution.
| afro88 wrote:
| Think of it like traditional code completion. It's mostly wrong
| but still useful. You either type through it, or tab/arrow to
| select the correct completion.
|
| AI code completion (like GitHub Copilot) is like this: still
| a time saver overall, even with a low acceptance rate.
| alan-stark wrote:
| Reading these answers reminded me why I love HN - actually
| thoughtful perspectives :) Guess a lot boils down to two
| variables: (a) suggestion UX quality and (b) the definition of
| a 'rejection' event. I skimmed through the paper and it turns
| out the 91% figure is based on feedback from 70 people, and
| anonymous feedback wasn't allowed. So 'overwhelming 91%
| favorable' can be paraphrased as '64 people out of the total
| 16k user base said they liked it'. Would be interesting to see
| indirect metrics like retention on day 15.
| idiotsecant wrote:
| Quite an insightful comment. In an institution that large,
| it's surprising there were only 64 brown-nosers. Out of a
| captive audience of 16k employees, you could probably get 64
| people to give a positive opinion of replacing paychecks with
| Meta store scrip.
| seanmcdirmid wrote:
| A lot of the time, suggestions are provided but not used
| because you already knew the answer and typed fast enough not
| to take them.
| moonchrome wrote:
| It's easy to:
|
| - anticipate when the suggestions are likely to be useless and
| not even bother
|
| - scan the proposals to see if they are what you want in cases
| it's useful
|
| It's a boilerplate generator and you're happy when it saves you
| tedious mental effort.
| rychco wrote:
| I treat it the same way I do pre-LLM LSP suggestions, which is
| basically inline documentation lookup. 'Oh what was that
| function name for inserting something at the end? PushB- no,
| InsertAft- no, App - end! Yea that's it'
|
| In this case it gave me 3 suggestions but I only accepted 1. I
| could see this taking 5-10 suggestions from an LLM when it's
| not something as straightforward as a function name. It's
| still very useful despite this low acceptance rate.
| fnordpiglet wrote:
| I'd say it's hard to argue with the positive impression of the
| engineer using it. If they find its suggestions helpful, it's
| not a distraction; it's helpful.
|
| Using GitHub Copilot daily, I find its suggestions are often
| nonsense but interesting to see regardless. Often for
| boilerplate it's spot on and saves me dozens of lines of
| typing. But it also suggests stuff on every keystroke, much of
| which I just type through, similar to intellisense. Assuming
| Meta's code thingy is better, I would find myself in that 91%,
| as I'm already there with what's available to the general
| public.
|
| My only gripe, fwiw, with Copilot in vscode is that it
| interferes with intellisense. Often I want to see the code
| completion from both, but Copilot jumps in before intellisense
| and the intellisense never renders, and I use that as an
| inline api reference. Sometimes it's so frustrating I have to
| turn off Copilot. But Copilot is generally useful enough that
| I reenable it once I've understood the api stuff I was unsure
| of. There's some escape-backspace-period dance I can do that
| sometimes lets intellisense win. I've not dug deeply enough
| into vscode configuration to know if there's some parameter to
| tweak the race condition. I'd note that when intellisense
| renders first, Copilot still renders its suggestions, but the
| other way around doesn't work.
| pavlov wrote:
| I think the 8% number better explains why users were so
| overwhelmingly happy. Assuming the suggestions in general are
| not distractingly wrong, having 8% of your code written
| automatically is a decent amount of time saved researching
| solutions.
| visarga wrote:
| Interesting that 91% find it useful but only 8% of the code
| is generated by the LLM. This is even with an LLM tuned on
| the internal codebase. It will give a mild boost, but it
| won't replace anyone.
| layer8 wrote:
| But only 22% of suggestions are accepted to produce those 8%,
| which means the 78% of suggestions that are rejected
| correspond to the equivalent of over 28% of all code written.
| Not sure that having to spend time evaluating an additional
| 28% of code in vain amounts to an overall win.
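|
| (Rough back-of-the-envelope in Python, treating the 8% and
| 22% as shares of the same unit of code, which is my own
| simplifying assumption:)
|
|     accepted_share = 0.08    # typed code from accepted suggestions
|     acceptance_rate = 0.22   # suggestions that get accepted
|     # all suggested code, as a fraction of code written
|     suggested_total = accepted_share / acceptance_rate   # ~0.36
|     # rejected suggestions, as a fraction of code written
|     rejected_equiv = suggested_total * (1 - acceptance_rate)  # ~0.28
|     print(suggested_total, rejected_equiv)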
|
| Though I guess the success rates when using Stack Overflow
| aren't too dissimilar.
| cloudking wrote:
| Have you tried GitHub Copilot? You don't have to accept the
| code suggestions, so they don't really distract you or get in
| the way once you get used to the UX.
| tablatom wrote:
| I find them extremely distracting. Evaluating a suggestion
| is, for me, an entirely different mental process from the
| creative process I'm in the middle of. The tagline that
| copilot helps you stay in the flow is very much not my
| experience.
|
| I am well aware that others are having a different experience
| with it.
| bredren wrote:
| The Industrial Challenges section of the paper addresses the
| specific areas of flow disruption they focused on.
|
| Some folks may never accept AI code completion / suggestions
| (just as some prefer vim over modern IDEs), but at least the
| people working on this stuff can describe the pain points
| they know to focus on.
| cloudking wrote:
| I've found I am naturally ignoring the large complex
| suggestions because they usually have mistakes, and
| accepting the small easy suggestions. I respect your
| experience though, to each their own.
| irthomasthomas wrote:
| Mine doesn't even make complex suggestions. I can't get
| it to suggest more than one line at a time. Wonder what's
| different? I'm on the beta.
| baq wrote:
| The thing can generate whole unit tests if you leave it a
| one-line description in a comment next to the function you
| want tested. It's actually amazing.
| cloudking wrote:
| For example, sometimes I'll start out with a code comment for
| a function, hit enter, and the next suggestion will be the
| entire function.
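|
| Something like this (a made-up illustration of that flow, not
| actual Copilot output): you type the comment, hit enter, and
| the whole function body shows up as the suggestion:
|
|     # parse KEY=VALUE pairs from a .env-style file into a dict
|     def load_env(path):
|         values = {}
|         with open(path) as f:
|             for line in f:
|                 line = line.strip()
|                 if not line or "=" not in line:
|                     continue
|                 key, _, value = line.partition("=")
|                 values[key.strip()] = value.strip()
|         return values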
| wpride wrote:
| Everyone in this space seems to be building on the LSP and
| classic auto-complete in particular as their UI. But I've
| found this to be non-ideal.
|
| - As mentioned in this paper I definitely do not want the AI
| suggestion crowding out a suggestion generated directly from the
| type bindings
|
| - I often do want the AI to write an entirely new block of
| boilerplate. To do this you have to write a comment string
| targeted at the AI, then delete this afterwards
|
| - Sometimes I'd just like the AI to explain to me what some code
| does without writing anything
|
| - This isn't something I always want on; I find myself turning
| the plugin on and off depending on the context
|
| Overall I think we need a novel UX to really unlock the AI's
| helpfulness
| anotherpaulg wrote:
| I have been enjoying a chat-based AI coding modality. I built
| some tooling that gets rid of the need to cut & paste code
| between the chat and your files. This makes chatting about
| code changes much more ergonomic. My tool also integrates
| directly with git, which provides a safety net: it's easy to
| undo changes if the AI does something silly.
|
| Here are some chat transcripts that give a flavor of what it's
| like to code with AI this way:
|
| https://aider.chat/examples/
|
| My tool is open source, and currently only works if you have a
| gpt-4 api key.
| fnordpiglet wrote:
| In vscode the Genie extension does these things and you can
| provide your own contextual hooks with custom prompts. It's
| particularly good at explaining syntax and semantic errors.
| florbo wrote:
| This echoes my sentiment exactly. My biggest gripe is when
| type suggestions are replaced with AI suggestions, as I more
| often just want to auto-complete a method/attribute. I
| frequently find myself toggling AI suggestions via hotkey.
|
| As for getting a suggestion by writing comments: an "insert
| from prompt" action, perhaps, or just a separate prompt
| pane/popup/whatever-you-prefer combined with good ol'
| copy+paste would suffice.
| stepanhruda wrote:
| Does it need to be that novel of a UX?
|
| If you want to know what some code does, just select it & hit a
| keyboard shortcut (or right click and choose explain from
| menu).
|
| If you want AI to write code for you, write a comment starting
| with a specific word; it suggests the implementation, and you
| can choose to accept it, replacing the comment.
| rytill wrote:
| What kind of novel UX are you imagining?
| Animats wrote:
| Hm. It seems to be like automated Stack Overflow. Only 8% of the
| code comes from the AI system, but it's useful for getting
| examples of how to do something.
|
| Hallucination about API calls was reported as a problem. I've
| seen that one. There's an amusing, and seriously annoying,
| tendency for these systems to make up some plausible API call
| that does what you need, but doesn't exist. Maybe something
| should collect up such suggestions as proposals for new API
| calls.
| freeone3000 wrote:
| The future of API design -- "yes it would make SENSE if that
| existed, but it doesn't" => now it does
| ChatGTP wrote:
| It's a funny game because they all need their own clones of
| each model / product.
|
| Feels like tech is making billions but is a little lost?
| zeedude wrote:
| Limit training to Stack Overflow input and wham! We have
| automated modern programming ;)
| regularfry wrote:
| I would very much like a local code assist tool. Assuming
| integration with editors is my problem, what's best in class this
| week if a) I have a respectable GPU; b) I don't, and need CPU-
| only?
| wsxiaoys wrote:
| Check out https://github.com/TabbyML/tabby, which is fully
| self-hostable and comes with niche features.
|
| On M1/M2, it offers a convenient single binary deployment,
| thanks to Rust. You can find the latest release at
| https://github.com/TabbyML/tabby/releases/tag/latest
|
| (Disclaimer: I am the author)
| MisterAV wrote:
| On Visual Studio there's an extension (by Microsoft) called
| IntelliCode, which is a small AI assistant that runs locally
| on the CPU. It doesn't come close to these new large GPU
| models, but it's quite handy. It looks at what you're typing
| on the current line, your previous activity, and the current
| project, and tries to predict the full line, or even the same
| change on multiple lines if that makes sense.
| azhenley wrote:
| We recently published a paper on IntelliCode and shared some
| of the usage numbers.
|
| https://austinhenley.com/pubs/Vaithilingam2023ICSE_IntelliCo.
| ..
|
| Disclaimer: I'm one of the co-authors.
| mormegil wrote:
| I have just installed Fauxpilot
| <https://github.com/fauxpilot/fauxpilot> (nVidia GPU-only) and
| it works... OK. Still evaluating, and I'm basically sceptical
| about the whole concept, but... let's see.
| Filligree wrote:
| Nothing even comes close to copilot. I realise you said
| "local", but if you insist on that you're going to be
| disappointed.
| synthiq wrote:
| For anyone interested in related research, I used
| https://mirrorthink.ai to find some background on the state-of-
| the-art.
|
| (disclaimer: this is AI generated, but grounded in the
| contents of the papers, with real references, so I'd say it is
| still constructive)
|
| The state-of-the-art in code generation has seen significant
| advancements with the deployment of large language models (LLMs)
| in various code authoring tools. One such example is the study on
| GitHub Copilot, Amazon CodeWhisperer, and ChatGPT [1], which
| evaluates the code quality of these AI-assisted code generation
| tools. The study reveals that ChatGPT generates correct code
| 65.2% of the time, while GitHub Copilot and Amazon CodeWhisperer
| achieve 46.3% and 31.1% correctness, respectively. These results
| indicate that LLMs have made substantial progress in generating
| high-quality code, but there is still room for improvement.
|
| Other research in the field has explored various techniques to
| enhance code generation and assistance. For instance, RepoCoder
| [2] focuses on repository-level code completion by integrating
| code generation and retrieval models in an iterative paradigm.
| This approach considers the repository-level context, including
| customized information such as API definitions and identifier
| names, to improve code completion suggestions. Serenity [3]
| leverages library-based Python code analysis for code completion
| and automated machine learning. The authors explore the potential
| of data flow analysis produced by Serenity to improve code
| completion when combined with neural models.
|
| In addition to these advancements, the field has seen progress in
| incorporating contextual information into code completion models.
| The paper on enriching source code with contextual data [4]
| investigates the impact of incorporating contextual information
| on the performance of code completion models. The authors conduct
| an empirical study to analyze the effectiveness of this approach.
| These achievements, along with the advancements in LLMs,
| contribute to the ongoing progress in code generation and
| assistance. As the field continues to evolve, it is expected that
| AI-assisted tools will become increasingly sophisticated and
| effective in assisting developers with various aspects of the
| software development process.
|
| [1] Evaluating the Code Quality of AI-Assisted Code Generation
| Tools: An Empirical Study on GitHub Copilot, Amazon
| CodeWhisperer, and ChatGPT - 2023:
| https://arxiv.org/abs/2304.10778
|
| [2] RepoCoder: Repository-Level Code Completion Through Iterative
| Retrieval and Generation - 2023: https://arxiv.org/abs/2303.12570
|
| [3] Serenity: Library Based Python Code Analysis for Code
| Completion and Automated Machine Learning - 2023:
| https://arxiv.org/abs/2301.05108
|
| [4] Enriching Source Code with Contextual Data for Code
| Completion Models: An Empirical Study - 2023:
| https://arxiv.org/abs/2304.12269
| fabmilo wrote:
| I would like to work in this code copilot space; I think it
| will be one of the fastest-growing applications of LLMs in the
| near future. I have been working on a tool to autogenerate
| docstrings for a Python method in Google format.
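|
| i.e., given a bare method, the target output is roughly like
| this (a hand-written example of the Google docstring format,
| not my tool's actual output):
|
|     def moving_average(values, window):
|         """Compute the simple moving average of a sequence.
|
|         Args:
|             values (list[float]): Input numbers, in order.
|             window (int): Number of trailing values to average.
|
|         Returns:
|             list[float]: One average per position where a full
|                 window fits.
|
|         Raises:
|             ValueError: If window is not positive.
|         """
|         if window <= 0:
|             raise ValueError("window must be positive")
|         return [sum(values[i - window:i]) / window
|                 for i in range(window, len(values) + 1)]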
| bolinfest wrote:
| If you want to skip the paper and watch the video:
| https://youtu.be/ANDJ0TKjyWw
|
| Disclaimer: I am the person in the video.
| muglug wrote:
| It was a great video, and a great paper.
|
| As someone who writes quite a lot of Hack, I'm selfishly
| interested in whether you plan to open-source this work (not
| the weights, obviously, but everything else).
___________________________________________________________________
(page generated 2023-06-03 23:00 UTC)