[HN Gopher] Chunking 2M files a day for code search using syntax...
       ___________________________________________________________________
        
       Chunking 2M files a day for code search using syntax trees
        
       Author : kevinlu1248
       Score  : 58 points
       Date   : 2023-07-31 20:35 UTC (2 hours ago)
        
 (HTM) web link (docs.sweep.dev)
 (TXT) w3m dump (docs.sweep.dev)
        
       | yding wrote:
       | This is really cool and a much needed contribution to helping
       | LLMs run better on large code bases.
        
         | kevinlu1248 wrote:
         | Thanks! Would love to see this algorithm in LlamaIndex.
        
       | intalentive wrote:
       | Next step is to train models directly on syntax trees. Higher
       | probability of correct output.
        
         | eldenring wrote:
         | I'd guess these models' understanding works more like people's,
         | so encoding code as text is more token-efficient, and things like
         | comments help.
         | 
         | Also syntax seems a lot easier to understand for them than
         | semantics/logic. If you've used GPT-4 it almost never makes
         | syntax errors. Logical errors on the other hand...
        
           | kevinlu1248 wrote:
           | From my experience, GPT-4 never makes syntax errors directly
           | but when making edits to existing code it's harder to prevent
           | these syntax errors from appearing. We used to add a second
           | pass to check for these syntax errors.
           | 
           | It does, however, frequently reference undefined variables
           | and the like.
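
For readers wondering what such a second pass might look like, here is a minimal sketch for Python sources only. It is not Sweep's implementation: the function names `passes_syntax_check` and `accept_edit` are hypothetical, and it only catches syntax errors, not the undefined variables mentioned above.

```python
import ast


def passes_syntax_check(source: str) -> bool:
    """Return True if `source` parses as Python, False otherwise."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


def accept_edit(original: str, edited: str) -> str:
    """Keep the edited code only if it still parses.

    Hypothetical policy: fall back to the original text when the
    edit introduces a syntax error.
    """
    return edited if passes_syntax_check(edited) else original
```

A real pipeline would more likely re-prompt the model with the parser error than silently discard the edit; the fallback here just keeps the sketch short.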
        
             | reitzensteinm wrote:
             | Did you get rid of the second pass? I'm working on
             | something quite similar and find a pass that inspects and
             | rejects erroneous code to be a big boost to correctness.
        
               | kevinlu1248 wrote:
               | We got rid of it. Our new edit framework is built around
               | search-and-replace pairs, based on an idea from aider, that
               | look something like:
               | 
               | <<<< ORIGINAL
               | old_code
               | ====
               | new_code
               | >>>> UPDATED
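
A rough sketch of how search-and-replace pairs in this shape could be applied to a file. This is an illustration, not Sweep's or aider's actual code. It assumes each marker sits on its own line, and the names `BLOCK` and `apply_edits` are hypothetical. Requiring each original snippet to match exactly once is also an assumption made here to keep edits unambiguous.

```python
import re

# Assumed layout: marker, old code, separator, new code, marker,
# each starting on its own line.
BLOCK = re.compile(
    r"<<<< ORIGINAL\n(?P<old>.*?)\n====\n(?P<new>.*?)\n>>>> UPDATED",
    re.DOTALL,
)


def apply_edits(source: str, model_output: str) -> str:
    """Apply every ORIGINAL/UPDATED pair found in `model_output` to `source`."""
    for match in BLOCK.finditer(model_output):
        old, new = match.group("old"), match.group("new")
        # Reject edits whose search text is missing or ambiguous.
        if source.count(old) != 1:
            raise ValueError(f"expected exactly one match for: {old!r}")
        source = source.replace(old, new, 1)
    return source
```

Because the model only emits the changed snippet, a failed match is easy to detect and retry, which is one plausible reason a separate syntax-checking pass becomes less necessary.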
        
         | kevinlu1248 wrote:
         | That's interesting, I've seen a few papers about this. I'm
         | personally curious about editing syntax trees using language
         | models, since it would prevent syntax errors altogether.
        
         | Zambyte wrote:
         | John McCarthy was right
        
           | kevinlu1248 wrote:
           | This is interesting. I'll give it a read.
        
       | [deleted]
        
       | [deleted]
        
       ___________________________________________________________________
       (page generated 2023-07-31 23:00 UTC)