# pg_vectorize: a VectorDB for Postgres

A Postgres extension that automates the transformation and orchestration of text to embeddings and provides hooks into the most popular LLMs. This allows you to do vector search and build LLM applications on existing data with as little as two function calls.

This project relies heavily on the work by pgvector for vector similarity search, pgmq for orchestration in background workers, and SentenceTransformers.
API Documentation: https://tembo-io.github.io/pg_vectorize/

Source: https://github.com/tembo-io/pg_vectorize

## Features

* Workflows for both vector search and RAG
* Integrations with OpenAI's embeddings and chat-completion endpoints, and a self-hosted container for running Hugging Face Sentence-Transformers
* Automated creation of Postgres triggers to keep your embeddings up to date
* High-level API: one function to initialize embeddings transformations, and another function to search

## Table of Contents

* Features
* Table of Contents
* Installation
* Vector Search Example
* RAG Example
* Trigger based updates
* Try it on Tembo Cloud

## Installation

The fastest way to get started is by running the Tembo docker container and the vector server with docker compose:

```shell
docker compose up -d
```

Then connect to Postgres:

```shell
docker compose exec -it postgres psql
```

Enable the extension and its dependencies:

```sql
CREATE EXTENSION vectorize CASCADE;
```

### Install into an existing Postgres instance

If you're installing in an existing Postgres instance, you will need the following dependencies:

Rust:

* pgrx toolchain

Postgres extensions:

* pg_cron ^1.5
* pgmq ^1
* pgvector ^0.5.0

Then set the following either in postgresql.conf or as a configuration parameter:

```sql
-- requires restart of Postgres
alter system set shared_preload_libraries = 'vectorize,pg_cron';
alter system set cron.database_name = 'postgres';
```

And if you're running the vector-serve container, set the following url as a configuration parameter in Postgres. The host may need to change from localhost to something else, depending on where you are running the container.

```sql
alter system set vectorize.embedding_service_url = 'http://localhost:3000/v1/embeddings';

SELECT pg_reload_conf();
```

## Vector Search Example

Text-to-embedding transformation can be done with either Hugging Face's Sentence-Transformers or OpenAI's embeddings.
The following examples use Hugging Face's Sentence-Transformers. See the project documentation for OpenAI examples.

Follow the installation steps if you haven't already.

Set up a products table. Copy from the example data provided by the extension.

```sql
CREATE TABLE products (LIKE vectorize.example_products INCLUDING ALL);
INSERT INTO products SELECT * FROM vectorize.example_products;
SELECT * FROM products limit 2;
```

```plaintext
 product_id | product_name |                      description                       |        last_updated_at
------------+--------------+--------------------------------------------------------+-------------------------------
          1 | Pencil       | Utensil used for writing and often works best on paper | 2023-07-26 17:20:43.639351-05
          2 | Laptop Stand | Elevated platform for laptops, enhancing ergonomics    | 2023-07-26 17:20:43.639351-05
```

Create a job to vectorize the products table. We'll specify the table's primary key (product_id) and the columns that we want to search (product_name and description).

```sql
SELECT vectorize.table(
    job_name    => 'product_search_hf',
    "table"     => 'products',
    primary_key => 'product_id',
    columns     => ARRAY['product_name', 'description'],
    transformer => 'sentence-transformers/multi-qa-MiniLM-L6-dot-v1'
);
```

This adds a new column to your table, in our case named product_search_embeddings, then populates that data with the transformed embeddings from the product_name and description columns.
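The stored embeddings are what similarity search compares against: the query is embedded with the same model, and rows are ranked by similarity. As a toy, hypothetical sketch of dot-product scoring (the model above is dot-product-tuned, per its name) — the 3-dimensional vectors below are made up for illustration, not real model output, which has hundreds of dimensions:

```python
# Toy illustration of dot-product similarity ranking, the scoring style
# used by dot-product-tuned models such as multi-qa-MiniLM-L6-dot-v1.
# Vectors here are invented for illustration only.

def dot(a, b):
    """Dot-product similarity between two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

query_embedding = [0.2, 0.9, 0.1]
row_embeddings = {
    "Phone Charger": [0.3, 0.8, 0.2],
    "Pencil": [0.9, 0.1, 0.0],
}

# Rank rows by similarity to the query, highest score first
ranked = sorted(row_embeddings.items(),
                key=lambda kv: dot(query_embedding, kv[1]),
                reverse=True)
print(ranked[0][0])  # best-matching row: Phone Charger
```

The extension performs this comparison inside Postgres via pgvector; the sketch only shows the idea behind the similarity_score values returned by search.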
Then search:

```sql
SELECT * FROM vectorize.search(
    job_name       => 'product_search_hf',
    query          => 'accessories for mobile devices',
    return_columns => ARRAY['product_id', 'product_name'],
    num_results    => 3
);
```

```plaintext
                                        search_results
---------------------------------------------------------------------------------------------
 {"product_id": 13, "product_name": "Phone Charger", "similarity_score": 0.8147814132322894}
 {"product_id": 6, "product_name": "Backpack", "similarity_score": 0.7743061352550308}
 {"product_id": 11, "product_name": "Stylus Pen", "similarity_score": 0.7709902653575383}
```

## RAG Example

Ask raw text questions of the example products dataset and get chat responses from an OpenAI LLM.

Follow the installation steps if you haven't already.

Set the OpenAI API key; this is required for use with OpenAI's chat-completion models.

```sql
ALTER SYSTEM SET vectorize.openai_key TO '<your api key>';
SELECT pg_reload_conf();
```

Create an example table if it does not already exist.

```sql
CREATE TABLE products (LIKE vectorize.example_products INCLUDING ALL);
INSERT INTO products SELECT * FROM vectorize.example_products;
```

Initialize a table for RAG. We'll use an open source Sentence Transformer to generate embeddings.

Create a new column that we want to use as the context. In this case, we'll concatenate both product_name and description.

```sql
ALTER TABLE products
    ADD COLUMN context TEXT
    GENERATED ALWAYS AS (product_name || ': ' || description) STORED;
```

```sql
SELECT vectorize.init_rag(
    agent_name       => 'product_chat',
    table_name       => 'products',
    "column"         => 'context',
    unique_record_id => 'product_id',
    transformer      => 'sentence-transformers/all-MiniLM-L12-v2'
);
```

```sql
SELECT vectorize.rag(
    agent_name => 'product_chat',
    query      => 'What is a pencil?'
) -> 'chat_response';
```

```plaintext
"A pencil is an item that is commonly used for writing and is known to be most effective on paper."
```
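Both vectorize.search and vectorize.rag return JSON, so their results are straightforward to consume from application code. A minimal, hypothetical Python sketch of post-processing the search results shown above — the JSON strings are copied from the example output, and fetching them from Postgres with a driver such as psycopg is assumed and omitted:

```python
import json

# Rows as they appear in the search_results column above; in a real
# application these would be fetched from Postgres via a driver.
raw_results = [
    '{"product_id": 13, "product_name": "Phone Charger", "similarity_score": 0.8147814132322894}',
    '{"product_id": 6, "product_name": "Backpack", "similarity_score": 0.7743061352550308}',
    '{"product_id": 11, "product_name": "Stylus Pen", "similarity_score": 0.7709902653575383}',
]

results = [json.loads(r) for r in raw_results]

# Keep only strong matches above an arbitrary, application-chosen threshold
strong = [r["product_name"] for r in results if r["similarity_score"] > 0.78]
print(strong)  # ['Phone Charger']
```

The threshold here is illustrative; what counts as a "strong" match depends on the transformer model and your data.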
## Trigger based updates

When a vectorize job is set up as realtime (the default behavior, via vectorize.table(..., schedule => 'realtime')), vectorize will create triggers on your table that keep your embeddings up to date. When the text inputs are updated or new rows are inserted, the triggers create a background job that updates the embeddings. Since the transformation is executed in a background job and the transformer model is invoked in a separate container, there is minimal impact on the performance of the update or insert statement.

```sql
INSERT INTO products (product_id, product_name, description)
VALUES (12345, 'pizza', 'dish of Italian origin consisting of a flattened disk of bread');

UPDATE products
SET description = 'sling made of fabric, rope, or netting, suspended between two or more points, used for swinging, sleeping, or resting'
WHERE product_name = 'Hammock';
```

## Try it on Tembo Cloud

Try it for yourself! Install with a single click on a Vector DB Stack (or any other instance) in Tembo Cloud today.