[HN Gopher] Chunking 2M files a day for code search using syntax...
___________________________________________________________________
Chunking 2M files a day for code search using syntax trees
Author : kevinlu1248
Score : 58 points
Date : 2023-07-31 20:35 UTC (2 hours ago)
(HTM) web link (docs.sweep.dev)
| yding wrote:
| This is really cool and a much needed contribution to helping
| LLMs run better on large code bases.
| kevinlu1248 wrote:
| Thanks! Would love to see this algorithm in LlamaIndex.
| intalentive wrote:
| Next step is to train models directly on syntax trees. Higher
| probability of correct output.
| eldenring wrote:
| I'd guess these models' understanding works more like people's,
| so encoding code as text is more token-efficient, and things like
| comments help.
|
| Also, syntax seems a lot easier for them to understand than
| semantics/logic. If you've used GPT-4, it almost never makes
| syntax errors. Logical errors, on the other hand...
| kevinlu1248 wrote:
| In my experience, GPT-4 never makes syntax errors when writing
| code directly, but when it edits existing code, syntax errors
| are harder to prevent. We used to run a second pass to check
| for them.
|
| It does, however, frequently introduce undefined variables
| and the like.
| reitzensteinm wrote:
| Did you get rid of the second pass? I'm working on
| something quite similar and find that a pass which inspects
| and rejects erroneous code gives a big boost to correctness.
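A "second pass" that rejects erroneous code can be as simple as a parse check. The sketch below assumes the edited code is Python and uses the standard-library `ast` module; `check_syntax` and `accept_edit` are hypothetical names, and neither commenter's actual implementation appears in the thread.

```python
import ast
from typing import Optional


def check_syntax(source: str) -> Optional[str]:
    """Return None if `source` parses as Python, otherwise a short error description."""
    try:
        ast.parse(source)
    except SyntaxError as err:
        return f"line {err.lineno}: {err.msg}"
    return None


def accept_edit(original: str, edited: str) -> str:
    """Keep the model's edit only if it still parses; otherwise fall back to the original."""
    error = check_syntax(edited)
    if error is not None:
        # A real pipeline might instead feed `error` back to the model and retry.
        return original
    return edited
```

A parse check only catches syntax errors. Undefined variables, which kevinlu1248 notes GPT-4 also frequently introduces, would need a linter or type checker on top of it.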
| kevinlu1248 wrote:
| We got rid of it. Our new edit framework is built around
| search-and-replace pairs, based on an idea from aider, and
| looks something like:
|
| <<<< ORIGINAL
| old_code
| ====
| new_code
| >>>> UPDATED
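A minimal sketch of parsing and applying blocks in the search-and-replace format described above, assuming each marker sits on its own line and that the ORIGINAL text must match the file exactly once. The marker lengths follow the thread's quote; aider's own markers and Sweep's actual parser may differ, and `parse_blocks`/`apply_blocks` are hypothetical names.

```python
import re
from typing import List, Tuple

# Hypothetical marker syntax modeled on the block quoted in the thread:
# "<<<< ORIGINAL", "====" and ">>>> UPDATED", each on its own line.
BLOCK_RE = re.compile(
    r"^<{4,} ORIGINAL\n(.*?)^={4,}\n(.*?)^>{4,} UPDATED$",
    re.DOTALL | re.MULTILINE,
)


def parse_blocks(reply: str) -> List[Tuple[str, str]]:
    """Extract (old_code, new_code) pairs from a model reply."""
    return [(m.group(1), m.group(2)) for m in BLOCK_RE.finditer(reply)]


def apply_blocks(source: str, blocks: List[Tuple[str, str]]) -> str:
    """Apply each pair in order, requiring old_code to occur exactly once."""
    for old, new in blocks:
        count = source.count(old)
        if count != 1:
            raise ValueError(f"ORIGINAL text matched {count} times: {old!r}")
        source = source.replace(old, new, 1)
    return source
```

Requiring a unique match is one design choice: it rejects an ambiguous edit instead of guessing which occurrence the model meant.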
| kevinlu1248 wrote:
| That's interesting; I've seen a few papers about this. I'm
| personally curious about editing syntax trees using language
| models, since it would prevent syntax errors altogether.
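To illustrate why editing the tree rather than the text rules out syntax errors, here is a sketch using Python's `ast` module (Python 3.9+ for `ast.unparse`). It assumes the model emits a structured operation, here a hypothetical rename, instead of raw text; the rename is naive and ignores scoping. Because the output is regenerated from the tree, it parses as long as the new name is a valid identifier, which the code checks.

```python
import ast
import keyword


class RenameFunction(ast.NodeTransformer):
    """Naive rename: ignores scoping, so a variable with the old name is renamed too."""

    def __init__(self, old: str, new: str) -> None:
        self.old = old
        self.new = new

    def visit_FunctionDef(self, node: ast.FunctionDef) -> ast.AST:
        self.generic_visit(node)  # rename references inside the body first
        if node.name == self.old:
            node.name = self.new
        return node

    def visit_Name(self, node: ast.Name) -> ast.AST:
        if node.id == self.old:
            node.id = self.new
        return node


def rename(source: str, old: str, new: str) -> str:
    """Apply a hypothetical 'rename' edit to the syntax tree and regenerate source."""
    # Reject names that would make the regenerated code invalid.
    if not new.isidentifier() or keyword.iskeyword(new):
        raise ValueError(f"not a valid identifier: {new!r}")
    tree = RenameFunction(old, new).visit(ast.parse(source))
    return ast.unparse(ast.fix_missing_locations(tree))
```

One trade-off: `ast.unparse` drops comments and original formatting, which matters given the earlier point that comments help these models.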
| Zambyte wrote:
| John McCarthy was right
| kevinlu1248 wrote:
| This is interesting. I'm reading up on it.
___________________________________________________________________
(page generated 2023-07-31 23:00 UTC)