# pg_vectorize: a VectorDB for Postgres

A Postgres extension that automates the transformation and orchestration of text to embeddings and provides hooks into the most popular LLMs. This allows you to do vector search and build LLM applications on existing data with as little as two function calls.

This project relies heavily on the work by pgvector for vector similarity search, pgmq for orchestration in background workers, and SentenceTransformers.
API Documentation: https://tembo-io.github.io/pg_vectorize/

Source: https://github.com/tembo-io/pg_vectorize

## Features

* Workflows for both vector search and RAG
* Integrations with OpenAI's embeddings and chat-completion endpoints, and a self-hosted container for running Hugging Face Sentence-Transformers
* Automated creation of Postgres triggers to keep your embeddings up to date
* High-level API: one function to initialize embeddings transformations, and another function to search

## Table of Contents

* Features
* Table of Contents
* Installation
* Vector Search Example
* RAG Example
* Trigger based updates
* Try it on Tembo Cloud

## Installation

The fastest way to get started is by running the Tembo docker container and the vector server with docker compose:

```shell
docker compose up -d
```

Then connect to Postgres:

```shell
docker compose exec -it postgres psql
```

Enable the extension and its dependencies:

```sql
CREATE EXTENSION vectorize CASCADE;
```

### Install into an existing Postgres instance

If you're installing in an existing Postgres instance, you will need the following dependencies:

Rust:

* pgrx toolchain

Postgres extensions:

* pg_cron ^1.5
* pgmq ^1
* pgvector ^0.5.0

Then set the following either in postgresql.conf or as a configuration parameter:

```sql
-- requires restart of Postgres
alter system set shared_preload_libraries = 'vectorize,pg_cron';
alter system set cron.database_name = 'postgres';
```

And if you're running the vector-serve container, set the following url as a configuration parameter in Postgres. The host may need to change from localhost to something else, depending on where you are running the container.

```sql
alter system set vectorize.embedding_service_url = 'http://localhost:3000/v1/embeddings';

SELECT pg_reload_conf();
```

## Vector Search Example

Text-to-embedding transformation can be done with either Hugging Face's Sentence-Transformers or OpenAI's embeddings.
The following examples use Hugging Face's Sentence-Transformers. See the project documentation for OpenAI examples.

Follow the installation steps if you haven't already.

Set up a products table. Copy from the example data provided by the extension.

```sql
CREATE TABLE products (LIKE vectorize.example_products INCLUDING ALL);
INSERT INTO products SELECT * FROM vectorize.example_products;
SELECT * FROM products limit 2;
```

```plaintext
 product_id | product_name |                      description                       |        last_updated_at
------------+--------------+--------------------------------------------------------+-------------------------------
          1 | Pencil       | Utensil used for writing and often works best on paper | 2023-07-26 17:20:43.639351-05
          2 | Laptop Stand | Elevated platform for laptops, enhancing ergonomics    | 2023-07-26 17:20:43.639351-05
```

Create a job to vectorize the products table. We'll specify the table's primary key (product_id) and the columns that we want to search (product_name and description).

```sql
SELECT vectorize.table(
    job_name    => 'product_search_hf',
    "table"     => 'products',
    primary_key => 'product_id',
    columns     => ARRAY['product_name', 'description'],
    transformer => 'sentence-transformers/multi-qa-MiniLM-L6-dot-v1'
);
```

This adds a new column to your table, in our case named product_search_embeddings, then populates that data with the transformed embeddings from the product_name and description columns.
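The stored embeddings are what similarity search compares against: the query is embedded with the same model, and rows are ranked by similarity. As a toy, hypothetical sketch of dot-product scoring (the model above is dot-product-tuned, per its name) — the 3-dimensional vectors below are made up for illustration, not real model output, which has hundreds of dimensions:

```python
# Toy illustration of dot-product similarity ranking, the scoring style
# used by dot-product-tuned models such as multi-qa-MiniLM-L6-dot-v1.
# Vectors here are invented for illustration only.

def dot(a, b):
    """Dot-product similarity between two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

query_embedding = [0.2, 0.9, 0.1]
row_embeddings = {
    "Phone Charger": [0.3, 0.8, 0.2],
    "Pencil": [0.9, 0.1, 0.0],
}

# Rank rows by similarity to the query, highest score first
ranked = sorted(row_embeddings.items(),
                key=lambda kv: dot(query_embedding, kv[1]),
                reverse=True)
print(ranked[0][0])  # best-matching row: Phone Charger
```

The extension performs this comparison inside Postgres via pgvector; the sketch only shows the idea behind the similarity_score values returned by search.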
Then search:

```sql
SELECT * FROM vectorize.search(
    job_name       => 'product_search_hf',
    query          => 'accessories for mobile devices',
    return_columns => ARRAY['product_id', 'product_name'],
    num_results    => 3
);
```

```plaintext
                                        search_results
---------------------------------------------------------------------------------------------
 {"product_id": 13, "product_name": "Phone Charger", "similarity_score": 0.8147814132322894}
 {"product_id": 6, "product_name": "Backpack", "similarity_score": 0.7743061352550308}
 {"product_id": 11, "product_name": "Stylus Pen", "similarity_score": 0.7709902653575383}
```

## RAG Example

Ask raw text questions of the example products dataset and get chat responses from an OpenAI LLM.

Follow the installation steps if you haven't already.

Set the OpenAI API key; this is required for use with OpenAI's chat-completion models.

```sql
ALTER SYSTEM SET vectorize.openai_key TO '<your api key>';
SELECT pg_reload_conf();
```

Create an example table if it does not already exist.

```sql
CREATE TABLE products (LIKE vectorize.example_products INCLUDING ALL);
INSERT INTO products SELECT * FROM vectorize.example_products;
```

Initialize a table for RAG. We'll use an open source Sentence Transformer to generate embeddings.

Create a new column that we want to use as the context. In this case, we'll concatenate both product_name and description.

```sql
ALTER TABLE products
    ADD COLUMN context TEXT
    GENERATED ALWAYS AS (product_name || ': ' || description) STORED;
```

```sql
SELECT vectorize.init_rag(
    agent_name       => 'product_chat',
    table_name       => 'products',
    "column"         => 'context',
    unique_record_id => 'product_id',
    transformer      => 'sentence-transformers/all-MiniLM-L12-v2'
);
```

```sql
SELECT vectorize.rag(
    agent_name => 'product_chat',
    query      => 'What is a pencil?'
) -> 'chat_response';
```

```plaintext
"A pencil is an item that is commonly used for writing and is known to be most effective on paper."
```
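Both vectorize.search and vectorize.rag return JSON, so their results are straightforward to consume from application code. A minimal, hypothetical Python sketch of post-processing the search results shown above — the JSON strings are copied from the example output, and fetching them from Postgres with a driver such as psycopg is assumed and omitted:

```python
import json

# Rows as they appear in the search_results column above; in a real
# application these would be fetched from Postgres via a driver.
raw_results = [
    '{"product_id": 13, "product_name": "Phone Charger", "similarity_score": 0.8147814132322894}',
    '{"product_id": 6, "product_name": "Backpack", "similarity_score": 0.7743061352550308}',
    '{"product_id": 11, "product_name": "Stylus Pen", "similarity_score": 0.7709902653575383}',
]

results = [json.loads(r) for r in raw_results]

# Keep only strong matches above an arbitrary, application-chosen threshold
strong = [r["product_name"] for r in results if r["similarity_score"] > 0.78]
print(strong)  # ['Phone Charger']
```

The threshold here is illustrative; what counts as a "strong" match depends on the transformer model and your data.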
## Trigger based updates

When a vectorize job is set up as realtime (the default behavior, via vectorize.table(..., schedule => 'realtime')), vectorize will create triggers on your table that keep your embeddings up to date. When the text inputs are updated or new rows are inserted, the triggers create a background job that updates the embeddings. Since the transformation is executed in a background job and the transformer model is invoked in a separate container, there is minimal impact on the performance of the update or insert statement.

```sql
INSERT INTO products (product_id, product_name, description)
VALUES (12345, 'pizza', 'dish of Italian origin consisting of a flattened disk of bread');

UPDATE products
SET description = 'sling made of fabric, rope, or netting, suspended between two or more points, used for swinging, sleeping, or resting'
WHERE product_name = 'Hammock';
```

## Try it on Tembo Cloud

Try it for yourself! Install with a single click on a Vector DB Stack (or any other instance) in Tembo Cloud today.