[HN Gopher] Show HN: Open-source code search with OpenAI's funct...
       ___________________________________________________________________
        
       Show HN: Open-source code search with OpenAI's function calling
        
       We're excited to share a tool we've been working on called
       gpt-code-search. It lets you search any codebase using natural
       language, locally on your machine. We leverage OpenAI's GPT-4
       and function calling to retrieve, search, and answer queries
       about your code.

       All you need to do is install the package with `pip install
       gpt-code-search`, set your `OPENAI_API_KEY` as an environment
       variable, and start asking questions with `gpt-code-search
       query <your question>`. For example, you can ask "How do I use
       the analytics module?" or "Document all the API routes related
       to authentication."
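
       If you'd rather script it than type at a shell, here is a
       minimal sketch of the same invocation driven from Python. The
       CLI entry point and example question come straight from above;
       setting the key via `os.environ` is just one option, and the
       key value is a placeholder.

         import os
         import subprocess

         # Placeholder key; use your own. This sets it for this
         # process (and its children) only.
         os.environ["OPENAI_API_KEY"] = "sk-..."

         # Run the query from the directory you want to search.
         subprocess.run(
             ["gpt-code-search", "query",
              "How do I use the analytics module?"],
             check=True,
         )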
       This is still early and was hacked together over the past week,
       but we wanted to get it out there and get feedback.
       We utilize OpenAI's function calling to let GPT-4 call certain
       predefined functions in our library; you do not need to
       implement any of these functions yourself. They are designed to
       interact with your codebase and return enough context for the
       LLM to perform code searches without pre-indexing your code or
       uploading your repo to any third party other than OpenAI, so
       you only need to run the tool from the directory you want to
       search. The functions currently available for the LLM to call
       are:

       `search_codebase` - searches the codebase using a TF-IDF
       vectorizer
       `get_file_tree` - provides the file tree of the codebase
       `get_file_contents` - provides the contents of a file
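
       To make the setup concrete, here is a sketch of how these three
       functions might be declared as schemas for OpenAI's chat
       completions API. The parameter names (`query`, `path`) are our
       illustrative assumptions, not necessarily the project's actual
       schema.

         # Hypothetical schema declarations for the three functions
         # above, in the format OpenAI's function calling API expects.
         FUNCTIONS = [
             {
                 "name": "search_codebase",
                 "description": "Search the codebase with TF-IDF.",
                 "parameters": {
                     "type": "object",
                     "properties": {"query": {"type": "string"}},
                     "required": ["query"],
                 },
             },
             {
                 "name": "get_file_tree",
                 "description": "Return the file tree of the codebase.",
                 "parameters": {"type": "object", "properties": {}},
             },
             {
                 "name": "get_file_contents",
                 "description": "Return the contents of a single file.",
                 "parameters": {
                     "type": "object",
                     "properties": {"path": {"type": "string"}},
                     "required": ["path"],
                 },
             },
         ]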
       These functions are implemented in `gpt-code-search` and are
       triggered through chat completions: the LLM is prompted to call
       `search_codebase` and `get_file_tree` as needed to find the
       context required to answer your query, and then loops, pulling
       in more context with `get_file_contents`, until it can respond.
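
       The loop is essentially the standard function-calling dispatch
       pattern. A sketch, using the pre-1.0 `openai` Python client
       that is current as of this writing; the `handlers` wiring is a
       hypothetical stand-in, not the project's actual code.

         import json
         import openai

         def run_query(question, functions, handlers):
             """Loop until GPT-4 answers instead of calling a function."""
             messages = [{"role": "user", "content": question}]
             while True:
                 response = openai.ChatCompletion.create(
                     model="gpt-4-0613",
                     messages=messages,
                     functions=functions,  # e.g. FUNCTIONS from above
                     function_call="auto",
                 )
                 message = response["choices"][0]["message"]
                 call = message.get("function_call")
                 if call is None:
                     return message["content"]  # final answer; stop
                 # Run the requested function locally and feed the
                 # result back as a "function" role message.
                 args = json.loads(call["arguments"])
                 result = handlers[call["name"]](**args)
                 messages.append(message)
                 messages.append({
                     "role": "function",
                     "name": call["name"],
                     "content": result,
                 })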
       One limitation of this approach: GPT cannot load context from
       multiple files in a single prompt, because each function call
       passes in the contents of only one file. To gather context from
       several files, GPT has to call `get_file_contents` repeatedly,
       which increases both the latency and the cost of the tool.
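
       To put rough numbers on the cost point (illustrative figures,
       not measurements): at GPT-4 8K's current rate of $0.03 per 1K
       prompt tokens, re-sending a handful of files adds up quickly.

         # Back-of-the-envelope with hypothetical numbers: five files
         # of ~2,000 tokens each, loaded into the prompt one by one.
         files, tokens_per_file = 5, 2000
         prompt_cost = files * tokens_per_file / 1000 * 0.03
         print(f"${prompt_cost:.2f} in prompt tokens alone")  # $0.30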
       Another thing we realized as we were building is that the scope
       of search and retrieval is limited by the model's context
       window: we only traverse five levels deep in the file system,
       and we can only pass in the contents of one file at a time. So
       it is best to run the tool from the package or directory
       closest to the code you want to search.
       We plan to add support for local vector embeddings to improve
       search and retrieval. Combining vector embeddings with function
       calling should give much faster, higher-quality results.

       Also on our backlog: support for other models, chat
       interactions in the command line, and code generation!

       Please check out gpt-code-search and let me know your thoughts,
       feedback, or suggestions.
        
       Author : narenkmano
       Score  : 12 points
       Date   : 2023-06-29 15:25 UTC (7 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | inertiatic wrote:
       | You're saying in the documentation:
       | 
       | > nor send your code to another third-party service.
       | 
       | But aren't you actually doing that? Sending things to OpenAI to
       | use as context?
        
       ___________________________________________________________________
       (page generated 2023-06-29 23:01 UTC)