[HN Gopher] Show HN: HelixDB - Open-source vector-graph database...
       ___________________________________________________________________
        
       Show HN: HelixDB - Open-source vector-graph database for AI
       applications (Rust)
        
       Hey HN, we want to share HelixDB
       (https://github.com/HelixDB/helix-db/), a project a college friend
       and I are working on. It's a new
       database that natively intertwines graph and vector types, without
       sacrificing performance. It's written in Rust and our initial focus
       is on supporting RAG. Here's a video runthrough:
       https://screen.studio/share/szgQu3yq.  Why a hybrid? Vector
       databases are useful for similarity queries, while graph databases
       are useful for relationship queries. Each stores data in a way
       that's best for its main type of query (e.g. key-value stores vs.
       node-and-edge tables). However, many AI-driven applications need
       _both_ similarity and relationship queries. For example, you might
       use vector-based semantic search to retrieve relevant legal
       documents, and then use graph traversal to identify relationships
       between cases.  Developers of such apps face the quandary of
       building on top of two different databases--a vector one and a
       graph one--and then having to link them together and keep the
       data in sync. Even then, the two databases aren't designed to work
       together--for example, there's no native way to perform joins or
       queries that span both systems. You'll need to handle that logic at
       the application level.  Helix started when we realized that there
       are ways to integrate vector and graph data that are both fast and
       suitable for AI applications, especially RAG-based ones. See this
       cool research paper: https://arxiv.org/html/2408.04948v1. After
       reading that and some other papers on graph and hybrid RAG, we
       decided to build a hybrid DB. Our aim was to make something better
       to use from a developer standpoint, while also making it fast as
       hell.  After a few months of working on this as a side project, our
       benchmarking shows that we are on par with Pinecone and Qdrant for
       vectors, and our graph is up to three orders of magnitude faster
       than Neo4j.
       
       Problems where a hybrid approach works particularly well include:
       
       - Indexing codebases: you can vectorize code snippets within a
         function (connected by edges) based on context and then create
         an AST (in a graph) from function calls, imports, dependencies,
         etc. Agents can look up code by similarity or keyword and then
         traverse the AST to get only the relevant code, which reduces
         hallucinations and prevents the LLM from guessing object shapes
         or variable/function names.
       - Molecule discovery: model biological interactions (e.g.,
         proteins - genes - diseases) using graph types and then embed
         molecule structures to find similar compounds or case studies.
       - Enterprise knowledge management: you can represent
         organisational structure, projects, and people (e.g., employee -
         team - project) in graph form, then index internal documents,
         emails, or notes as vectors for semantic search and link them
         directly to employees/teams/projects in the graph.
       
       I naively
       assumed when learning about databases for the first time that
       queries would be compiled and executed like functions in
       traditional programming. It turns out I was wrong, and the usual
       approach creates unnecessary latency by sending extra data (the
       whole written query), compiling it at run time, and then executing
       it. With Helix, you write queries in our query language (HelixQL);
       they are then transpiled into Rust code and built directly into the
       database server, where you can call them through a generated API
       endpoint.  Many
       people have a thing against "yet another query language" (doubtless
       for good reason!) but we went ahead and did it anyway, because we
       think it makes working with our database so much easier that it's
       worth a bit of a learning curve. HelixQL takes from other query
       languages such as Gremlin, Cypher and SQL with some extra ideas
       added in. It is declarative while the traversals themselves are
       functional. This allows complete control over the traversal flow
       while also having a cleaner syntax. HelixQL returns JSON to make
       things easy for clients. Also, it uses a schema, so the queries are
       type-checked.  We took a crude approach to building the original
       graph engine as a way to get an MVP out, so we are now working on
       improving the graph engine by making traversals massively parallel
       and pipelined. This means data is only ever decoded from disk when
       it is needed, and parts of reads are all processed in parallel.  If
       you'd like to try it out in a simple RAG demo, you can follow this
       guide and run our Jupyter notebook:
       https://github.com/HelixDB/helix-db/tree/main/examples/rag_d...
       Many thanks! Comments and feedback welcome!
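       
       To make the "similarity first, then relationships" pattern from
       the post concrete, here is a minimal Python sketch. It is not
       HelixDB code and does not use HelixQL: the document names, the
       embeddings, the citation edges, and every function name are
       made up for illustration. It only shows the shape of a hybrid
       query when vectors and edges live in the same store.

```python
import math

# Hypothetical toy data: each document has an embedding (vector side)
# and outgoing citation edges to other documents (graph side).
EMBEDDINGS = {
    "case_a": [0.9, 0.1, 0.0],
    "case_b": [0.8, 0.2, 0.1],
    "case_c": [0.0, 0.1, 0.9],
    "case_d": [0.1, 0.9, 0.2],
}
CITES = {
    "case_a": ["case_d"],
    "case_b": ["case_a"],
    "case_c": [],
    "case_d": ["case_c"],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_k(query, k):
    """Brute-force similarity search (a real engine would use an index such as HNSW)."""
    ranked = sorted(EMBEDDINGS, key=lambda d: cosine(query, EMBEDDINGS[d]), reverse=True)
    return ranked[:k]


def expand(seeds, hops):
    """Follow citation edges outward from the seed documents for `hops` steps."""
    seen = set(seeds)
    frontier = list(seeds)
    for _ in range(hops):
        next_frontier = []
        for node in frontier:
            for neighbour in CITES.get(node, []):
                if neighbour not in seen:
                    seen.add(neighbour)
                    next_frontier.append(neighbour)
        frontier = next_frontier
    return seen


def hybrid_query(query, k=2, hops=1):
    """Vector search picks the entry points; graph traversal adds related documents."""
    seeds = top_k(query, k)
    return seeds, expand(seeds, hops)
```

       With this toy data, hybrid_query([1.0, 0.0, 0.0]) picks case_a
       and case_b as seeds and adds case_d through one citation hop.
       With two separate databases, the hand-off between top_k and
       expand is the part you would have to write and keep in sync at
       the application level.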
        
       Author : GeorgeCurtis
       Score  : 96 points
       Date   : 2025-05-13 17:26 UTC (5 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | sync wrote:
       | Looks nice! Are you looking to compete with
       | https://www.falkordb.com or do something a bit different?
        
         | GeorgeCurtis wrote:
         | Pretty much, our biggest focus is on Graph and Hybrid RAG. They
         | seem to have really homed in on Graph RAG since the last time I
         | checked their website.
         | 
         | One of the problems I know people experience with them is that
         | they're super slow at bulk reading.
         | 
         | Oh also, they aren't built in Rust haha
        
       | esafak wrote:
       | How does it compare with https://kuzudb.com/ ?
        
         | GeorgeCurtis wrote:
         | Kuzu doesn't support incremental indexing on the vectors. The
         | vector index is completely separate and decoupled from the
         | graph.
         | 
         | I.e., you have to re-index all of the vectors when you make an
         | update to them.
        
       | SchwKatze wrote:
       | Super cool!!! I'll try it this week and come back to give
       | feedback.
        
         | GeorgeCurtis wrote:
         | I look forward to it :)
        
       | hbcondo714 wrote:
       | Congrats! Any chance HelixDB can be run in the browser too, maybe
       | via WASM? I'm looking for a vector db that can be pre-populated
       | on the server and then be searched on the client so user queries
       | (chat) stay on-device for privacy / compliance reasons.
        
         | GeorgeCurtis wrote:
         | Interesting, we've had a few people ask about this. So
         | essentially you'd call the server to retrieve the HNSW and then
         | store it in the browser and use WASM to query it?
         | 
         | Currently the roadblock for that is the LMDB storage engine.
         | We have our own storage engine on our roadmap, and we want it
         | to include WASM support. If you wanna talk about it, reach
         | out to me on Twitter: https://x.com/georgecurtiss
        
       | J_Shelby_J wrote:
       | How do you think about building the graph relationships? Any
       | special approaches you use?
        
         | GeorgeCurtis wrote:
         | Pretty much the same way you would with any graph DB, with the
         | added benefit of being able to treat a vector as a node by
         | creating those explicit relationships between them.
         | 
         | Does that answer your question properly?
        
       | carlhjerpe wrote:
       | Nice "I'll have this name" when there's already the helix editor
       | :)
        
         | GeorgeCurtis wrote:
         | First I'm hearing of it. The Beatles must've been super
         | pissed when Apple took their name :(
        
           | carlhjerpe wrote:
           | https://crates.io/search?q=Helix
           | 
           | I'm surprised no one on the team searched crates.io once before
           | picking the name. Good luck!
        
             | GeorgeCurtis wrote:
             | we just started off as a side project and thought the name
             | fitted well. With the strands, graph type structure,
             | connections...
             | 
             | We didn't think of getting people to use it until we found
             | it was solving a real pain point for people, so we weren't
             | worried about trademarks or names. There was no other helix
             | db so that was good enough for us at the time.
        
               | carlhjerpe wrote:
               | It's not the end of the world, just me being a bit
               | grumpy. I mean it when I say good luck! :)
        
               | GeorgeCurtis wrote:
               | Thank you :)
        
               | tavianator wrote:
               | > There was no other helix db
               | 
               | https://en.wikipedia.org/wiki/Helix_(database)
        
               | GeorgeCurtis wrote:
               | There was no active one. We saw this and thought it would
               | be a nice nod to history. We've actually spoken to some
               | developers at Apple who thought this was really neat :)
        
             | itishappy wrote:
             | I don't think `helix-editor` is even on crates.io, just
             | placeholders.
             | 
             | https://github.com/helix-editor/helix/discussions/7038
             | 
             | That being said, when I saw `helix-db` I was thrown too.
             | "What's a text editor doing writing a vector-graph
             | database, I thought they were working on plugins?"
        
           | bbatsell wrote:
           | I can't tell if this is droll sarcasm, but just in case
           | not...
           | 
           | https://en.wikipedia.org/wiki/Apple_Corps_v_Apple_Computer
        
         | cormullion wrote:
         | perhaps it's a homage to the famous Helix database (see
         | Wikipedia)
        
           | GeorgeCurtis wrote:
           | well noted
        
       | javierluraschi wrote:
       | What is the max number of dimensions supported for a vector?
        
         | GeorgeCurtis wrote:
         | There is currently no cap. We will probably impose a cap
         | similar to Qdrant's or Pinecone's (~64k) some time soon. There's
         | obviously a performance trade-off as you go up, but we hope to
         | massively offset this by doing binary quantisation within the
         | next couple of months.
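         | 
         | For readers unfamiliar with binary quantisation, a minimal
         | sketch of the general idea follows. It is not HelixDB's
         | implementation, and the function names are invented for
         | illustration: each dimension is reduced to a single bit
         | (here, whether the value is positive), and vectors are
         | compared by Hamming distance on the packed bits instead of
         | floating-point arithmetic.

```python
def binarize(vec):
    """Pack a float vector into an int, one bit per dimension (1 if the value is positive)."""
    bits = 0
    for i, x in enumerate(vec):
        if x > 0:
            bits |= 1 << i
    return bits


def hamming(a, b):
    """Number of bit positions in which two packed codes differ."""
    return bin(a ^ b).count("1")


def nearest(query, codes):
    """Return the key in `codes` whose binary code is closest to the query's code."""
    q = binarize(query)
    return min(codes, key=lambda name: hamming(q, codes[name]))


# Hypothetical 4-dimensional vectors, purely for illustration.
codes = {
    "x": binarize([0.5, -0.2, 0.1, -0.9]),
    "y": binarize([-0.3, 0.4, -0.1, 0.8]),
}
closest = nearest([0.4, -0.1, 0.3, 0.2], codes)
```

         | The trade is precision for size and speed: one bit per
         | dimension, and distance becomes an XOR plus a bit count. In
         | practice the binary pass is usually followed by rescoring a
         | shortlist with the full-precision vectors.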
        
       | huevosabio wrote:
       | Can I run this as an embedded DB like sqlite?
       | 
       | Can I sidestep the DSL? I want my LLMs to generate queries and
       | using a new language is going to make that hard or expensive.
        
         | GeorgeCurtis wrote:
         | Currently you can't run us embedded and I'm not sure how you
         | could sidestep the DSL :/
         | 
         | We're working on putting our grammar into llama.cpp so
         | that it only outputs grammatically correct HQL. But even
         | without that, it shouldn't be hard or expensive to do. I wrote a
         | Claude wrapper that had our docs in its context window, and it
         | did a good job of writing queries most of the time.
        
           | tough wrote:
           | you could refactor your claude-wrapper into a mcp-server
           | maybe
           | 
           | how does llama.cpp's special sauce work to enforce output
           | syntax?
        
       | elpalek wrote:
       | What method/model are you using for sparse search?
        
         | GeorgeCurtis wrote:
         | We're going to use BM25. Currently it is just dense search.
         | Coming very soon
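         | 
         | For context on what BM25 computes, here is a small
         | self-contained sketch of the standard Okapi BM25 scoring
         | formula. It is not HelixDB code: the whitespace tokenizer,
         | the default k1 and b values, and the function name are
         | assumptions made for the example, and the IDF uses the
         | common non-negative variant.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document in `docs` against `query` with Okapi BM25."""
    if not docs:
        return []
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs

    # Document frequency: in how many documents each term appears.
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))

    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Rare terms get a larger weight than common ones.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates via k1 and is normalised by length via b.
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores
```

         | Because the score depends on exact token matches, it
         | complements dense search for queries with rare keywords or
         | identifiers, but it is only as good as the tokenizer for
         | the language in question.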
        
           | elpalek wrote:
           | have you thought about SPLADE models? ex:
           | https://arxiv.org/abs/2109.10086
        
             | GeorgeCurtis wrote:
             | Looks really interesting, I'll have a proper read. What
             | would be your reasoning to incorporate this if we already
             | have vector functionality and semantic search?
        
               | elpalek wrote:
               | my project deals w/ non-English text, bm25 performance is
               | middling. A language-specific sparse model helps.
        
       | mdaniel wrote:
       | > so much easier that it's worth a bit of a learning curve
       | 
       | I think you misspelled "vendor lock in"
        
         | GeorgeCurtis wrote:
         | You can literally use us for free haha. There's not a language
         | that properly encapsulates graph and vector functionality, so
         | we needed to make our own. Also, we thought it was dumb that
         | query languages weren't type-safe... So we changed that
        
       | basonjourne wrote:
       | why not surrealdb?
        
         | GeorgeCurtis wrote:
         | General consensus is it's really slow, I like the concept of
         | surreal though. Our first, and extremely bare bones, version of
         | the graph db was 1-2 orders of magnitude faster than surreal
         | (we haven't run benchmarks against surreal recently, but I'll
         | put them here when we're done)
        
       | Attummm wrote:
       | It sounds very intriguing indeed. However, the README makes some
       | claims. Are there any benchmarks to support them?
       | 
       | > Built for performance we're currently 1000x faster than Neo4j,
       | 100x faster than TigerGraph
        
         | GeorgeCurtis wrote:
         | Those were actual benchmarks that we ran; we didn't get a
         | chance to write them up properly before posting. I'll get on
         | it now and notify by replying to this comment when they're on
         | the readme :)
        
       | rohanrao123 wrote:
       | Congrats on the launch! I'm one of the authors of that paper you
       | cited; glad it was useful and inspiring for building this :) Let
       | me know if we can support you in any way!
        
         | GeorgeCurtis wrote:
         | Wow! I enjoyed reading it a lot and it was definitely inspiring
         | for this project!
         | 
         | Would love to talk to you about it and make sure we capture all
         | of the pain points if you're open to it? :)
        
           | rohanrao123 wrote:
           | Absolutely, will DM you on X!
        
       | tmpfs wrote:
       | This is very interesting, are there any examples of interacting
       | with LLMs? If the queries are compiled and loaded into the
       | database ahead of time the pattern of asking an LLM to generate a
       | query from a natural language request seems difficult because
       | current LLMs aren't going to know your query language yet and
       | compiling each query for each prompt would add unnecessary
       | overhead.
        
         | GeorgeCurtis wrote:
         | This is definitely a problem we want to work on fixing quickly.
         | We're currently planning an MCP tool that can traverse the
         | graph and decide for itself at each step where to go to next.
         | As opposed to having to generate actual text written queries.
         | 
         | I mentioned in another comment that you can provide a grammar
         | with constrained decoding to force the LLM to generate tokens
         | that comply with the grammar. This ensures that only valid
         | syntactic constructs are produced.
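         | 
         | A toy sketch of what that masking step looks like, assuming
         | a grammar small enough to write as a state machine. The
         | token names, the grammar, and fake_model are invented for
         | illustration; this is not HelixQL syntax and not how any
         | particular inference library implements it.

```python
# Toy grammar as a state machine: state -> {allowed token: next state}.
# The tokens are made up and are NOT real HelixQL syntax.
GRAMMAR = {
    "start": {"FIND": "kind"},
    "kind": {"node": "tail", "edge": "tail"},
    "tail": {"RETURN": "end"},
}


def fake_model(prefix):
    """Stand-in for an LLM: returns a score per token, ignoring the prefix.

    It deliberately prefers "DROP", which the grammar never allows.
    """
    return {"DROP": 9.0, "edge": 2.0, "node": 1.0, "FIND": 0.5, "RETURN": 0.1}


def constrained_decode(score_fn, max_steps=10):
    """Greedy decoding where tokens the grammar forbids are masked out at each step."""
    state, output = "start", []
    for _ in range(max_steps):
        if state == "end":
            break
        allowed = GRAMMAR[state]
        scores = score_fn(output)
        # Only tokens valid in the current grammar state compete for selection.
        token = max(allowed, key=lambda t: scores.get(t, float("-inf")))
        output.append(token)
        state = allowed[token]
    return output


result = constrained_decode(fake_model)
```

         | Here the unconstrained top choice would be "DROP" at every
         | step, but the masked decoder returns FIND, edge, RETURN.
         | The grammar guarantees syntax only; whether the query means
         | what the user asked for is still up to the model.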
        
       | raufakdemir wrote:
       | How can I migrate neo4j to this?
        
         | GeorgeCurtis wrote:
         | We can build an ingestion engine for you :)
         | 
         | We've built SQL and PGVector ones already, just waiting for
         | someone who could make use of other ones before we build them.
         | 
         | Let us know! Twitter in my bio
        
       | lennertjansen wrote:
       | how did you get it 3 OOMs faster than neo4j?
        
         | GeorgeCurtis wrote:
         | Because neo4j sucks! Partly because they're working with a
         | monolith that I imagine is difficult to iterate on and it's
         | written in Java. We've had the benefit of working on this in
         | Rust which lets us get really nitty and gritty with different
         | optimisations.
         | 
         | My friend who I worked on this with is putting together a
         | technical blog on those graph optimisations so I'll link it
         | here when he's done
        
       | youdont wrote:
       | Looks very interesting, but I've seen these kinds of
       | multi-paradigm databases like Gel, Helix and Surreal and I'm not
       | sure that any of them quite hit the graph spot.
       | 
       | Does Helix support much of the graph algorithm world? For things
       | like GraphRAG.
       | 
       | Either way, I'd be all over it if there was a Python SDK which
       | worked with the generated types!
        
       | dietr1ch wrote:
       | Graph DB OOMing 101. Can it do Erdos/Bacon numbers?
       | 
       | Graph DBs have been plagued with exploding complexity of queries
       | as doing things like allowing recursion or counting paths isn't
       | as trivial as it may sound. Do you have benchmarks and
       | comparisons against other engines and query languages?
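       | 
       | For reference, an Erdos/Bacon number is a shortest-path hop
       | count, which a plain breadth-first search answers without
       | enumerating paths. The sketch below is generic Python over a
       | hypothetical in-memory adjacency list, not a HelixQL query;
       | the names and the toy graph are made up.

```python
from collections import deque


def hop_distance(graph, source, target):
    """Fewest edges between source and target, or None if target is unreachable."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour == target:
                return dist + 1
            if neighbour not in seen:
                # Marking nodes as seen keeps the work linear in nodes plus edges.
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


# Hypothetical co-appearance graph: an edge means "worked together".
CO_STARS = {
    "bacon": ["alice"],
    "alice": ["bacon", "bob"],
    "bob": ["alice"],
    "carol": [],
}
bob_number = hop_distance(CO_STARS, "bacon", "bob")
```

       | The visited set is what keeps this bounded. Counting or
       | listing all paths, as opposed to finding the shortest one,
       | has no such bound, which is where query complexity tends to
       | explode.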
        
       | riku_iki wrote:
       | How scalable is your DB in your tests? Could it be performant on
       | graphs with 1B/10B/100B connections?
        
       ___________________________________________________________________
       (page generated 2025-05-13 23:00 UTC)