[HN Gopher] Show HN: Open-source code search with OpenAI's function calling
___________________________________________________________________
Show HN: Open-source code search with OpenAI's function calling
We're excited to share a tool we've been working on called
gpt-code-search. It allows you to search any codebase using natural
language, locally on your machine. We leverage OpenAI's GPT-4 and
function calling to retrieve, search, and answer queries about your
code. All you need to do is install the package with `pip install
gpt-code-search`, set your `OPENAI_API_KEY` environment variable,
and start asking questions with `gpt-code-search query <your
question>`. For example, you can ask questions like "How do I use
the analytics module?" or "Document all the API routes related to
authentication."
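Concretely, getting started looks like this (run from the
directory you want to search; the query is just an example):

    pip install gpt-code-search
    export OPENAI_API_KEY=<your-key>
    cd path/to/your/repo
    gpt-code-search query "How do I use the analytics module?"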
This is still early and hacked together in the past week, but we
wanted to get it out there and get feedback.

We utilize OpenAI's function calling to let GPT-4
call certain predefined functions in our library. You do not need
to implement any of these functions yourself. These functions are
designed to interact with your codebase and return enough context
for the LLM to perform code searches without pre-indexing your
codebase or uploading your repo to any third party other than
OpenAI. So, you only need to run the tool from the directory you
want to search.

The functions currently available for the LLM to call are:
  - `search_codebase` - searches the codebase using a TF-IDF
    vectorizer (sketched below)
  - `get_file_tree` - provides the file tree of the codebase
  - `get_file_contents` - provides the contents of a file
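As a rough illustration of what a TF-IDF-based file search can
look like, here is a minimal sketch using scikit-learn. It is a
hypothetical stand-in, not the actual gpt-code-search
implementation; the `*.py` glob and top-k scoring are assumptions.

    # Minimal TF-IDF file search sketch; hypothetical, not the
    # actual gpt-code-search internals.
    from pathlib import Path
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    def search_codebase(query: str, root: str = ".", top_k: int = 5):
        """Rank files under `root` by TF-IDF similarity to `query`."""
        paths = [p for p in Path(root).rglob("*.py") if p.is_file()]
        docs = [p.read_text(errors="ignore") for p in paths]
        # Fit the vectorizer on the files plus the query itself.
        matrix = TfidfVectorizer().fit_transform(docs + [query])
        # The last row is the query; score it against every file.
        scores = cosine_similarity(matrix[-1], matrix[:-1]).ravel()
        return sorted(zip(paths, scores), key=lambda x: -x[1])[:top_k]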
These functions are implemented in `gpt-code-search` and are
triggered by chat completions. The LLM is prompted to use
`search_codebase` and `get_file_tree` as needed to find the
context required to answer your query, then loops, calling
`get_file_contents` to collect more context, until it can respond.
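For the curious, the overall control flow looks roughly like the
sketch below, written against the openai 0.x Python SDK that is
current as of this post. The FUNCTIONS schema and run_function
dispatcher are simplified placeholders, not our exact code.

    # Simplified function-calling loop (openai 0.x SDK). FUNCTIONS
    # and run_function are placeholders, not the exact internals.
    import json
    import openai

    FUNCTIONS = [{
        "name": "get_file_contents",
        "description": "Return the contents of a file.",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    }]  # search_codebase and get_file_tree are declared similarly

    def run_function(name, arguments):
        ...  # dispatch to the local Python implementation

    def answer(query: str) -> str:
        messages = [{"role": "user", "content": query}]
        while True:
            response = openai.ChatCompletion.create(
                model="gpt-4-0613",
                messages=messages,
                functions=FUNCTIONS,
                function_call="auto",
            )
            message = response["choices"][0]["message"]
            call = message.get("function_call")
            if not call:
                # No function call: the model has enough context.
                return message["content"]
            # Run the requested function locally and feed the
            # result back so the model can keep going.
            messages.append(message)
            messages.append({
                "role": "function",
                "name": call["name"],
                "content": run_function(call["name"],
                                        json.loads(call["arguments"])),
            })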
A couple of limitations of this approach: GPT cannot load context
from multiple files in a single prompt, since we pass in the
contents of only one file per function call. So GPT calls
`get_file_contents` repeatedly to load context from multiple
files, which increases the latency and cost of the tool. Another
thing we realized as we were building is that the scope of search
and retrieval is limited by the context window: we can only search
five levels deep in the file system and can only pass in the
contents of one file at a time. So it is best to run the tool from
the package/directory closest to the code you want to search.
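To make the depth limit concrete: a get_file_tree along these
lines would stop descending after five levels. This is a
hypothetical sketch, not our exact implementation.

    # Hypothetical depth-limited file tree walk illustrating the
    # five-level limit; not the exact get_file_tree implementation.
    import os

    def get_file_tree(root: str = ".", max_depth: int = 5) -> str:
        lines = []
        for dirpath, dirnames, filenames in os.walk(root):
            rel = os.path.relpath(dirpath, root)
            depth = 0 if rel == "." else rel.count(os.sep) + 1
            if depth >= max_depth:
                dirnames[:] = []  # prune: do not descend further
                continue
            indent = "  " * depth
            lines.append(indent + os.path.basename(dirpath) + "/")
            lines.extend(indent + "  " + name for name in filenames)
        return "\n".join(lines)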
We plan to add support for local vector embeddings to improve
search and retrieval. Combining vector embeddings with function
calling should result in much faster and higher-quality results.
Support for other models, chat interactions in the command line,
and code generation are also on our backlog! Please check out
gpt-code-search and let me know your thoughts, feedback, or
suggestions.
Author : narenkmano
Score : 12 points
Date : 2023-06-29 15:25 UTC (7 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| inertiatic wrote:
| You're saying in the documentation:
|
| > nor send your code to another third-party service.
|
| But aren't you actually doing that? Sending things to OpenAI to
| use as context?
___________________________________________________________________
(page generated 2023-06-29 23:01 UTC)