[HN Gopher] Grafeo - A fast, lean, embeddable graph database bui...
___________________________________________________________________
Grafeo - A fast, lean, embeddable graph database built in Rust
Author : 0x1997
Score : 161 points
Date : 2026-03-21 14:50 UTC (8 hours ago)
(HTM) web link (grafeo.dev)
(TXT) w3m dump (grafeo.dev)
| satvikpendem wrote:
| There seem to be a lot of these; how does it compare to Helix DB,
| for example? Also, why would you ever want to query a _database_
| with GraphQL, which was explicitly not made for that purpose?
| adsharma wrote:
| There are 25 graph databases, all going "me too" in the AI/LLM-
| driven cycle.
|
| Writing it in Rust gets visibility because of the popularity of
| the language on HN.
|
| Here's why we are not doing it for LadybugDB.
|
| Would love to explore a more gradual/incremental path.
|
| Also focusing on just one query language: strongly typed cypher.
|
| https://github.com/LadybugDB/ladybug/discussions/141
| tadfisher wrote:
| Is LadybugDB not one of these 25 projects?
| adsharma wrote:
| LadybugDB is backed by this tech (I didn't write it)
|
| https://vldb.org/cidrdb/2023/kuzu-graph-database-
| management-...
|
| You can judge for yourself what work has been done in the
| last 5 months. Many short videos here. New open source
| contributors whom I didn't know before are ramping up.
|
| https://youtube.com/@ladybugdb
| Aurornis wrote:
| Does anyone have any experience with this DB? Or context about
| where it came from?
|
| From the commit history it's obvious that this is an AI coded
| project. It was started a few months ago, 99% of commits are from
| 1 contributor, and that 1 contributor has sometimes committed
| 100,000 lines of code per week. (EDIT: 200,000 lines of code in
| the first week)
|
| I'm not anti-LLM, but I've done enough AI coding to know that one
| person submitting 100,000 lines of code a week is not doing deep
| thought and review on the AI output. I also know from experience
| that letting AI code the majority of a complex project leads to
| something very fragile, overly complicated, and not well thought
| out. I've been burned enough times by investigating projects that
| turned out to be AI slop with polished landing pages. In some
| cases the claimed benchmarks were improperly run or just
| hallucinated by the AI.
|
| So is anyone actually using this? Or is this someone's personal
| experiment in building a resume portfolio project by letting AI
| run against a problem for a few months?
| gdotv wrote:
| Agreed, there's been a literal explosion in the last 3 months
| of new graph databases coded from scratch, clearly largely LLM
| assisted. I'm having to keep track of the industry quite a bit
| to decide what to add support for on https://gdotv.com and
| frankly these days it's getting tedious.
| piyh wrote:
| I'm turning off my brain and using neo4j
| UltraSane wrote:
| Neo4j is pretty nice.
| gdotv wrote:
| proof that Neo4j won the popularity contest!
| aorth wrote:
| Figurative!
| jandrewrogers wrote:
| That is a lot of code for what appears to be a vanilla graph
| database with a conventional architecture. The thing I would be
| cautious about is that graph database engines in particular are
| known for hiding many sharp edges without a lot of subtle and
| sophisticated design. It isn't obvious that the necessary level
| of attention to detail has been paid here.
| justonceokay wrote:
| Yes, a graph database will happily lead you down an n^3 (or
| worse!) path when trying to query for a single relation if
| you are not wise about your indexes, etc.
| adsharma wrote:
| Are you talking about the query plan for scanning the rel
| table? Kuzu used a hash index and a join.
|
| Trying to make it optional.
|
| Try
|
| explain match (a)-[b]->(c) return a.rowid, b.rowid,
| c.rowid;
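The caveat upthread about unindexed relation scans can be sketched in a few lines (a toy model, not Kuzu or Grafeo internals): without an index on the rel table, matching (a)-[b]->(c) degenerates into a full scan per bound node, while a hash index on the source column makes each probe proportional to out-degree.

```python
from collections import defaultdict

# (src, dst) pairs standing in for the rel table
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]

# Unindexed: every probe scans the whole rel table, O(E) per node,
# so a multi-hop pattern multiplies these scans together.
def match_scan(a):
    return [(a, dst) for (src, dst) in edges if src == a]

# Hash-indexed: build once, then each probe is O(out-degree).
index = defaultdict(list)
for src, dst in edges:
    index[src].append(dst)

def match_indexed(a):
    return [(a, dst) for dst in index[a]]

assert match_scan(0) == match_indexed(0) == [(0, 1), (0, 2)]
```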
| cluckindan wrote:
| That sounds like a "graph" DB which implements edges as
| separate tables, like building a graph in a standard SQL
| RDB.
|
| If you wish to avoid that particular caveat, look for a
| graph DB which materializes edges within vertices/nodes.
| The obvious caveat there is that the edges are not
| normalized, which may or may not be an issue for your
| particular application.
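The two layouts being contrasted can be sketched roughly like this (hypothetical records, not any specific engine's storage format):

```python
# Layout A: edges in a separate table, like a relational RDB.
edge_table = [("alice", "bob", "FOLLOWS"), ("bob", "carol", "FOLLOWS")]

# Layout B: edges materialized inside each vertex record. This is
# denormalized: every edge is stored twice, once per endpoint.
vertices = {
    "alice": {"out": [("bob", "FOLLOWS")], "in": []},
    "bob":   {"out": [("carol", "FOLLOWS")], "in": [("alice", "FOLLOWS")]},
    "carol": {"out": [], "in": [("bob", "FOLLOWS")]},
}

# Traversal in layout B is a direct lookup on the vertex record...
assert [d for d, _ in vertices["alice"]["out"]] == ["bob"]

# ...but a delete must keep both copies consistent, which is the
# normalization caveat mentioned above.
def delete_edge(src, dst, label):
    vertices[src]["out"].remove((dst, label))
    vertices[dst]["in"].remove((src, label))
```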
| adsharma wrote:
| Are you talking about the Andy Pavlo bet here?
|
| https://news.ycombinator.com/item?id=29737326
|
| Kuzu folks took some of these discussions and implemented
| them. SIP, ASP joins, factorized joins and WCOJ.
|
| Internally it's structured very similarly to DuckDB, except
| for the differences noted above.
|
| DuckDB 1.5 implemented sideways information passing (SIP).
| And LadybugDB is bringing in support for DuckDB node tables.
|
| So the idea that graph databases have shaky internals stems
| primarily from the pre-2021 incumbents.
|
| 4 more years to go to 2030!
| adsharma wrote:
| Source: https://www.theregister.com/2023/03/08/great_graph_
| debate_we...
|
| > There are some additional optimizations that are specific
| to graphs that a relational DBMS needs to incorporate:
| [...]
|
| This is essentially what Kuzu implemented and DuckDB tried
| to implement (DuckPGQ), without touching relational
| storage.
|
| The jury is out on which one is a better approach.
| jandrewrogers wrote:
| I wasn't referring to the Pavlo bet but I would make the
| same one! Poor algorithm and architecture scalability is a
| serious bottleneck. I was part of a research program
| working on the fundamental computer science of high-scale
| graph databases ~15 years ago. Even back then we could show
| that the architectures you mention couldn't scale even in
| theory. Just about everyone has been re-hashing the same
| basic design for decades.
|
| As I like to point out, for two decades DARPA has offered
| to pay many millions of dollars to anyone who can
| demonstrate a graph database that can handle a sparse
| trillion-edge graph. That data model easily fits on a
| single machine. No one has been able to claim the money.
|
| Inexplicably, major advances in this area 15-20 years ago
| under the auspices of government programs never bled into
| the academic literature even though it materially improved
| the situation. (This case is the best example I've seen of
| obviously valuable advanced research that became lost for
| mundane reasons, which is pretty wild if you think about
| it.)
| adsharma wrote:
| > many millions of dollars to anyone who can demonstrate
| a graph database that can handle a sparse trillion-edge
| graph.
|
| I wonder why no one has claimed it. It's possible to
| compress large graphs to 1 byte per edge via graph
| reordering techniques. So a trillion-scale graph becomes
| 1TB, which can fit into high-end machines.
|
| Obviously it won't handle high write rates and mutations
| well. But with Apache Arrow based compression, it's
| certainly possible to handle read-only and read-mostly
| graphs.
|
| Also the single machine constraint feels artificial. For
| any columnar database written in the last 5 years,
| implementing object store support is table stakes.
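The ~1 byte/edge figure can be illustrated with a minimal delta-plus-varint encoding of sorted adjacency lists (a back-of-envelope sketch, not a production codec): once reordering gives neighbor IDs locality, most deltas fit in a single byte.

```python
def varint(n: int) -> bytes:
    """Standard LEB128-style variable-length encoding: 7 bits per byte."""
    out = bytearray()
    while True:
        b = n & 0x7F
        n >>= 7
        if n:
            out.append(b | 0x80)  # continuation bit set
        else:
            out.append(b)
            return bytes(out)

def encode_adjacency(neighbors):
    """Delta-encode one vertex's sorted neighbor list, then varint each delta."""
    prev, out = 0, bytearray()
    for v in neighbors:
        out += varint(v - prev)  # small deltas -> 1 byte each
        prev = v
    return bytes(out)

# With locality from reordering, deltas after the first are tiny:
blob = encode_adjacency([1000, 1001, 1003, 1007])
assert len(blob) == 5  # first delta (1000) is 2 bytes, the rest 1 byte each
```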
| jandrewrogers wrote:
| Achieving adequate performance at 1T edges in one aspect
| requires severe tradeoffs in other aspects, making every
| implementation impractical at that scale. You touched on
| a couple of the key issues when I was working in this
| domain.
|
| There is no single machine constraint, just the
| observation that we routinely run non-graph databases at
| similar scale on single machines without issue. It
| doesn't scale on in-memory supercomputers either, so the
| hardware details are unrelated to the problem:
|
| - A graph database with good query performance typically
| has terrible write performance. It doesn't matter how
| fast queries are if it takes too long to get data into
| the system. At this scale there can be no secondary
| indexing structures into the graph; you need a graph
| cutting algorithm efficient for both scalable writes and
| join recursion. This was solved.
|
| - Graph workloads break cache replacement algorithms for
| well-understood theory reasons. Avoiding disk just
| removes one layer of broken caching among many but
| doesn't address the abstract purpose for which a cache
| exists. This is why in-memory systems still scale poorly.
| We've known how to solve this in theory since at least
| the 1980s. The caveat is it is surprisingly difficult to
| fully reduce to practice in software, especially at
| scale, so no one really has. This is a work in progress.
|
| - Most implementations use global synchronization
| barriers when parallelizing algorithms such as BFS, which
| greatly increases resource consumption while throttling
| hardware scalability and performance. My contribution to
| research was actually in this area: I discovered a way to
| efficiently use error correction algorithms to elide the
| barriers. I think there is room to make this even better
| but I don't think anyone has worked on it since.
|
| The pathological cache replacement behavior is the real
| killer here. It is what is left even if you don't care
| about write performance or parallelization.
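The cache-replacement pathology described above has a classic minimal demonstration: a working set one page larger than an LRU cache, touched cyclically (as graph scans often are), gets a zero percent hit rate (toy simulation, assuming plain LRU):

```python
from collections import OrderedDict

class LRU:
    def __init__(self, capacity):
        self.cap = capacity
        self.d = OrderedDict()
        self.hits = self.misses = 0

    def access(self, key):
        if key in self.d:
            self.d.move_to_end(key)  # mark as most recently used
            self.hits += 1
        else:
            self.misses += 1
            self.d[key] = True
            if len(self.d) > self.cap:
                self.d.popitem(last=False)  # evict least-recently-used

cache = LRU(capacity=99)
for _ in range(10):            # cyclically touch 100 pages, 10 times
    for page in range(100):
        cache.access(page)

# LRU always evicts exactly the page the cycle needs next:
assert cache.hits == 0
```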
|
| I haven't worked in this area for many years but I do
| keep tabs on new graph databases to see if someone is
| exploiting that prior R&D, even if developed
| independently.
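For reference, the conventional level-synchronous BFS being critiqued looks like this; the point between frontier swaps is the global barrier, since no level-(k+1) work may start until every level-k expansion finishes (sequential sketch; parallel versions split the frontier loop across workers):

```python
def bfs_levels(adj, source):
    """Return vertices grouped by BFS level from `source`."""
    visited = {source}
    frontier = [source]
    levels = []
    while frontier:
        levels.append(frontier)
        next_frontier = []
        for u in frontier:          # parallel versions split this loop
            for v in adj[u]:
                if v not in visited:
                    visited.add(v)
                    next_frontier.append(v)
        # <-- implicit global barrier: all workers must finish the
        #     current level before anyone touches the next frontier
        frontier = next_frontier
    return levels

adj = {0: [1, 2], 1: [3], 2: [3], 3: []}
assert bfs_levels(adj, 0) == [[0], [1, 2], [3]]
```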
| rossjudson wrote:
| I guess it all depends on the meaning of the word
| "handle", and what the use cases are.
| stult wrote:
| It certainly does seem problematic to have a graph database
| hiding edges, sharp or not
| arthurjean wrote:
| Sounds about right for someone who ships fast and iterates. 54
| days for a v0 that probably needs refactoring isn't that crazy
| if the dev has a real DB background. We've all seen open source
| projects drag on for 3 years without shipping anything; that's
| not necessarily better.
| Aurornis wrote:
| 200,000 lines of code on week 1 is not a sign of a quality
| codebase with careful thought put into it.
|
| > We've all seen open source projects drag on for 3 years
| without shipping anything, that's not necessarily better
|
| There are more options than "never ship anything" and "use AI
| to slip 200,000 lines of code into a codebase"
| TheJord wrote:
| shipping fast matters a lot less than shipping something you
| actually understand. 200k lines in a week means nobody knows
| what's in there, including the author. that's not a codebase,
| it's a liability
| ozgrakkurt wrote:
| Using a LLM coded database sounds like hell considering even
| major databases can have some rough edges and be painful to
| use.
| hrmtst93837 wrote:
| Six figures a week is a giant red flag. That kind of commit log
| usually means codegen slop or bulk reformatting, and even if
| some of it works I wouldn't trust the design, test coverage, or
| long-term maintenance story enough to put that DB anywhere near
| prod.
| measurablefunc wrote:
| This looks like another avant-garde "art" project.
| nexxuz wrote:
| I was ready to learn more about this but I saw "written in Rust"
| and I literally rolled my eyes and said never mind.
| ComputerGuru wrote:
| I think "written by genAI" should be a bigger turnoff than
| "written in Rust".
| andriy_koval wrote:
| alternative opinion:
|
| * it is possible to write high quality software using GenAI
|
| * not using GenAI could mean project won't be competitive in
| current landscape
| quantumHazer wrote:
| > not using GenAI could mean project won't be competitive
| in current landscape
|
| why? this is false in my opinion, iterating fast is not a
| good indicator of quality nor competitiveness
| andriy_koval wrote:
| iterating fast on quality (e.g. refactoring, test
| coverage, benchmarks, documentation, trying new
| nontrivial ideas) is a good indicator of quality.
| Aurornis wrote:
| > * it is possible to write high quality software using
| GenAI
|
| From examining this codebase, it doesn't appear to have been
| written carefully _with_ AI.
|
| It looks like code that was prompted into existence as fast
| as possible.
| andriy_koval wrote:
| sure, there are bad genAI projects and there are good
| genAI projects. You can remove genAI term from previous
| sentence.
| chuckadams wrote:
| Too bad you don't do the same for commenting on HN.
| OtomotO wrote:
| Interesting... Need to check how this differs from agdb, with
| which I had some success for a sideproject in the past.
|
| https://github.com/agnesoft/agdb
|
| Ah, yeah, a different query language.
| cluckindan wrote:
| The d:Document syntax looks so happy!
| cjlm wrote:
| Overwhelmed by the sheer number of graph databases? I released a
| new site this week that lists and categorises them:
| https://gdb-engines.com
| dbacar wrote:
| Did you generate the list using an LLM?
| cjlm wrote:
| I was inspired by https://arxiv.org/abs/2505.24758 and
| collated their assessment into a table and then just kept
| adding databases :)
|
| Claude helped a lot but it's all reviewed and curated by me.
| natdempk wrote:
| Serious question: are there any actually good and useful graph
| databases that people would trust in production at reasonable
| scale and are available as a vendor or as open source? eg. not
| Meta's TAO
| cjlm wrote:
| Serious answer: limiting to just Open Source: JanusGraph,
| DGraph, Apache AGE, HugeGraph, MemGraph and ArcadeDB all meet
| those criteria.
| adsharma wrote:
| What is open source and what is a graph database are both
| hotly debated topics.
|
| Author of ArcadeDB critiques many nominally open source
| licenses here:
|
| https://www.linkedin.com/posts/garulli_why-arcadedb-will-
| nev...
|
| What is a graph database is also relevant:
|
| - Does it need index-free adjacency?
| - Does it need to implement compressed sparse rows?
| - Does it need to implement ACID?
| - Does translating Cypher to SQL count as a graph database?
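For readers unfamiliar with the second criterion, compressed sparse rows (CSR) can be sketched in a few lines: all edges live in one flat neighbor array, with a per-vertex offset array indexing into it (minimal illustration, not any particular engine's layout).

```python
# (src, dst) edges, sorted by src so neighbors are already grouped
edges = [(0, 1), (0, 2), (1, 2), (2, 0)]
num_vertices = 3

# Build CSR: count out-degrees, prefix-sum into offsets, flatten dsts.
offsets = [0] * (num_vertices + 1)
for src, _ in edges:
    offsets[src + 1] += 1
for i in range(num_vertices):
    offsets[i + 1] += offsets[i]
neighbors = [dst for _, dst in edges]

def out_neighbors(v):
    """Neighbor slice for vertex v: a contiguous, cache-friendly read."""
    return neighbors[offsets[v]:offsets[v + 1]]

assert out_neighbors(0) == [1, 2]
assert out_neighbors(2) == [0]
```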
| pphysch wrote:
| Yeah: Postgres, etc.
|
| When you actually need to run _graph algorithms_ against your
| relational data, you export the subset of that data into
| something like Grafeo (embedded mode is a big plus here) and
| run your analysis.
| adsharma wrote:
| That importing is expensive and prevents you from handling
| billion-scale graphs.
|
| It's possible to run cypher against duckdb (soon postgres as
| well via duckdb's postgres extension) without having to
| import anything. That's a game changer when everything is in
| the same process.
| szarnyasg wrote:
| That's a difficult question and I would like to avoid giving a
| direct answer (because I co-lead a nonprofit benchmarking graph
| databases) but even knowing what you need for a graph database
| can be a tricky decision. See my FOSDEM 2025 talk, where I
| tried to make sense of the field:
|
| https://archive.fosdem.org/2025/schedule/event/fosdem-2025-5...
| adsharma wrote:
| What people perceive as "Facebook production graph" is not just
| TAO. There is an ecosystem around it and I wrote one piece of
| it.
|
| Full history here: https://www.linkedin.com/pulse/brief-
| history-graphs-facebook...
| gdotv wrote:
| plenty of those - I've had to work with dozens of different
| graph databases integrating them on https://gdotv.com, save for
| maybe 1-2 exceptions in the list of supported databases on our
| website, they're all production ready and either backed by a
| vendor or open-source (or sometimes both, e.g. Apache AGE for
| Azure PostgreSQL). There are some technologies that have been
| around for a long time but really flying under the radar,
| despite being used a lot in enterprise (e.g. JanusGraph).
| mark_l_watson wrote:
| I just spent an hour with Grafeo, trying to also get the
| associated library grafeo_langchain working with a local Ollama
| model. Mixed results. I really like the Python Kuzu graph
| database, still use it even though the developers no longer
| support it.
| lmeyerov wrote:
| Speaking of embeddable, we just announced cypher syntax for gfql,
| making it the first OSS CPU/GPU cypher query engine you can use
| on dataframes.
|
| Typically used with scaleout DBs like databricks & splunk for
| analytical apps: security/fraud/event/social data analysis
| pipelines, ML+AI embedding & enrichment pipelines, etc. We
| originally built it for the compute-tier gap here to help
| Graphistry users making embeddable interactive GPU graph viz apps
| and dashboards and not wanting to add an external graph DB phase
| into their interactive analytics flows.
|
| Single GPU can do 1B+ edges/s, no need for a DB install, and can
| work straight on your dataframes / apache arrow / parquet:
| https://pygraphistry.readthedocs.io/en/latest/gfql/benchmark...
|
| We took a multilayer approach to the GPU & vectorization
| acceleration, including a more parallelism-friendly core
| algorithm. This makes fancy features pay-as-you-go vs dragging
| everything down as in most columnar engines that are appearing.
| Our vectorized core conforms to over half of TCK already, and we
| are working to add trickier bits on different layers now that
| flow is established.
|
| The core GFQL engine has been in production for a year or two now
| with a lot of analyst teams around the world (NATO, banks, US
| gov, ...) because it is part of Graphistry. The open-source
| cypher support is us starting to make it easy for others to
| directly use as well, including LLMs :)
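The dataframe-native idea can be illustrated independently of GFQL's actual API (this sketch uses plain columns-as-lists standing in for an Arrow/pandas dataframe): a graph hop is just a filter-and-project over the edge columns, which is why it vectorizes without a database server.

```python
# Edge columns, as a dataframe would hold them (column-oriented).
edges = {"src": ["a", "a", "b", "c"], "dst": ["b", "c", "c", "d"]}

def hop(frontier, edges):
    """One traversal step: keep edge rows whose src is in the frontier,
    project the dst column, and dedupe."""
    fset = set(frontier)
    return sorted({d for s, d in zip(edges["src"], edges["dst"]) if s in fset})

assert hop(["a"], edges) == ["b", "c"]
assert hop(hop(["a"], edges), edges) == ["c", "d"]  # two hops from "a"
```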
| xlii wrote:
| I wonder if people are using (or intend to use) vibe-coded
| projects like the one linked.
|
| I mean - I understand, some people have fun looking at new tech
| no matter the source, but my question is: is there a person who
| would be designated to pick a graph database and would ignore
| all the LLM flags and put it in production?
| brunoborges wrote:
| Why is everything "... built in Rust" trending so easily on HN?
| IshKebab wrote:
| Because Rust is an excellent language that pushes you into the
| "pit of success", and consequently software written in Rust
| tends to be fast, robust and easy to deploy.
|
| There's no big mystery. No conspiracy or organised evangelism.
| Rust is just really good.
| mattvr wrote:
| It implies high performance, reliability, and a higher degree
| of mastery of the developer.
|
| (Which may not all be true, but perhaps moreso than your
| average project)
| foota wrote:
| I added a super cheap and bad embedding database in a project
| that allows the agent to call a tool for searching all the
| content it's built; it seems to work pretty well! This way the
| agent doesn't need to call a bunch of list tools (which I was
| worried would introduce lots of data to the context), and can
| find things based on fuzzy search.
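A "super cheap and bad" embedding search along these lines can be sketched with bag-of-words vectors and cosine similarity (illustrative only; the commenter's actual approach is unknown):

```python
import math

def tokenize(text):
    return text.lower().split()

def embed(text, vocab):
    """Bag-of-words vector over a fixed vocabulary: no model required."""
    tokens = tokenize(text)
    return [tokens.count(w) for w in vocab]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

docs = ["rust graph database engine", "chocolate cake recipe",
        "embedded key value store"]
vocab = sorted({t for d in docs for t in tokenize(d)})
index = [(d, embed(d, vocab)) for d in docs]

def search(query):
    """Return the stored document most similar to the query (fuzzy search)."""
    q = embed(query, vocab)
    return max(index, key=lambda item: cosine(q, item[1]))[0]

assert search("graph database") == "rust graph database engine"
```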
___________________________________________________________________
(page generated 2026-03-21 23:00 UTC)