[HN Gopher] Datalog in Rust
       ___________________________________________________________________
        
       Datalog in Rust
        
       Author : brson
       Score  : 317 points
       Date   : 2025-06-15 11:18 UTC (1 days ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | Leynos wrote:
       | It's funny seeing this as the top story.
       | 
       | I'm in the middle of putting together a realtime strategy game
       | using Differential Datalog[1] and Rust, with DDL managing the
       | game's logic. Mostly as an excuse to expose myself to new ideas
       | and engage in a whole lot of yak shaving.
       | 
       | [1] https://github.com/vmware-archive/differential-datalog
        
         | Yoric wrote:
         | On, nice!
         | 
         | I'll be interested in reading how this goes!
        
         | cmrdporcupine wrote:
         | Very cool, I'm curious to see what the state of that
         | implementation is and how far you get, since DDLog is not being
         | actively maintained anymore.
        
         | foota wrote:
         | I wonder if you could make a frankenstein version of
         | differential datalog by combing the OP repo with salsa[1] (the
         | crate that powers rust-analyzer)
         | 
         | [1] https://github.com/salsa-rs/salsa
        
         | lsuresh wrote:
         | That's a cool demo you're building using ddlog! FWIW, we, the
         | ddlog team have moved on to found Feldera
         | (https://github.com/feldera/feldera). You could consider using
         | DBSP directly through Rust.
        
       | rienbdj wrote:
       | A new McSharry post! Excellent
       | 
       | Last I checked, VMWare had moved away from differential datalog?
        
         | jitl wrote:
         | The Differential Datalog team founded Feldera:
         | https://www.feldera.com/
         | 
         | They switched from differential Datalog to differential SQL, I
         | think because they realized Datalog is a really tough sell.
        
           | rebanevapustus wrote:
           | They did, and their product is great.
           | 
           | It is the only database/query engine that allows you to use
           | the same SQL for both batch and streaming (with UDFs).
           | 
           | I have made an accessible version of a subset of Differential
           | Dataflow (DBSP) in Python right here:
           | https://github.com/brurucy/pydbsp
           | 
           | DBSP is so expressive that I have implemented a fully
           | incremental dynamic datalog engine as a DBSP program.
           | 
           | Think of SQL/Datalog where the query can change in runtime,
           | and the changes themselves (program diffs) are incrementally
           | computed: https://github.com/brurucy/pydbsp/blob/master/noteb
           | ooks/data...
        
             | gunnarmorling wrote:
             | > It is the only database/query engine that allows you to
             | use the same SQL for both batch and streaming (with UDFs).
             | 
             | Flink SQL also checks that box.
        
               | rebanevapustus wrote:
               | Not true.
               | 
               | There has to be some change in the code, and they will
               | not share the same semantics (and perhaps won't work when
               | retractions/deletions also appear whilst streaming). And
               | let's not even get to the leaky abstractions for good
               | performance (watermarks et al).
        
               | jitl wrote:
               | Flink SQL is quite limited compared to Feldera/DBSP or
               | Frank's Materialize.com, and has some correctness
               | limitations: it's "eventually consistent" but until you
               | stop the data it's unlikely to ever be actually correct
               | when working with streaming joins. https://www.scattered-
               | thoughts.net/writing/internal-consiste...
        
       | rc00 wrote:
       | Posted 1 day ago
       | 
       | https://news.ycombinator.com/item?id=44274592
        
       | tulio_ribeiro wrote:
       | "I, a notorious villain, was invited for what I was half sure was
       | my long-due comeuppance." -- Best opening line of a technical
       | blog post I've read all year.
       | 
       | The narrator's interjections were a great touch. It's rare to see
       | a post that is this technically deep but also so fun to read. The
       | journey through optimizing the aliasing query felt like a
       | detective story. We, the readers, were right there with you,
       | groaning at the 50GB memory usage and cheering when you got it
       | down to 5GB.
       | 
       | Fantastic work, both on the code and the prose.
        
       | 29athrowaway wrote:
       | If you wish to use Datalog and Rust, cozodb is written in Rust
       | and has a Datalog query syntax.
        
         | jitl wrote:
         | Cozodb seems cool but also inactive. I poked around about in
         | November 2024 and found some low hanging fruit in the sqlite
         | storage backend: https://github.com/cozodb/cozo/issues/285
        
           | 29athrowaway wrote:
           | It's not a lot of code so it's easy to tinker with.
        
       | maweki wrote:
       | It is nice to see a core group of Datalog enthusiasts persist,
       | even though the current Datalog revival seems to be on the
       | decline. The recent Datalog 2.0 conference was quite small
       | compared to previous years and the second HYTRADBOI conference
       | was very light on Datalog as well, while the first one had a
       | quarter of submissions with Datalog connection.
       | 
       | I'm encouraged by the other commenters sharing their recent
       | Datalog projects. I am currently building a set of data quality
       | pipelines for a legacy SQL database in preparation of a huge
       | software migration.
       | 
       | We find Datalog much more useful in identifying and looking for
       | data quality issues thatn SQL, as the queries can be incredibly
       | readable when well-structured.
        
         | kmicinski wrote:
         | No offense, but I wouldn't take Datalog 2.0's small attendance
         | as an exemplar of Datalog's decline, even if I agree with that
         | high-level point. Datalog 2.0 is a satellite workshop of LPNMR,
         | a relatively-unknown European conference that was randomly held
         | in Dallas. I myself attended Datalog 2.0 and also felt the
         | event felt relatively sparse. I also had a paper (not my
         | primary work, the first author is the real wizard of course :-)
         | at the workshop. I myself saw relatively few folks in that
         | space even attending that event--with the notable exception of
         | some European folks (e.g., introducing the Nemo solver).
         | 
         | All of this is to say, I think Datalog 2.0's sparse attendance
         | this year may be more indicative of the fact that it is a
         | satellite workshop of an already-lesser-prestigious conference
         | (itself not even the main event! That was ICLP!) rather than a
         | lack of Datalog implementation excitement.
         | 
         | For what it's worth, none of what I'm saying is meant to rebut
         | your high-level point that there is little novelty left in
         | implementing raw Datalog engines. Of course I agree, the
         | research space has moved far beyond that (arguably it did a
         | while ago) and into more exotic problems involving things like
         | streaming (HydroFlow), choice (Dusa), things that get closer to
         | the general chase (e.g., Egglog's chase engine), etc. I don't
         | think anyone disagrees that vanilla Datalog is boring, it's
         | just that monotonic, chain-forward saturation (Horn clauses!)
         | are a rich baseline with a well-understood engineering
         | landscape (esp in the high-performance space) to build out more
         | interesting theories (semirings, Z-sets, etc..).
        
           | BartjeD wrote:
           | The USA is a hostile environment to visit, for Europeans.
           | It's a theme all over western and northern Europe to avoid
           | it.
           | 
           | If you were expecting European attendance, that would explain
           | why it's different from the past.
           | 
           | It's the same in science circles, tourism and even business
           | travel.
        
             | freilanzer wrote:
             | As a European, I'm definitely avoiding the US. I was asked
             | whether I wanted to visit our US branch and I declined -
             | which was simply accepted and probably expected in advance.
             | Almost everyone I know does the same.
        
             | kmicinski wrote:
             | I completely agree with you, but this event was held in
             | October, 2024--Trump's horrendous behavior wasn't an effect
             | on that. There are other reasons folks internationally
             | probably don't want to travel to Texas aside from Trump--
             | but in this specific case I think it might have more to do
             | with the fact that LPMNR is a traditionally European
             | conference randomly held in the US.
        
       | burakemir wrote:
       | I made some progress porting mangle datalog to Rust
       | https://github.com/google/mangle/tree/main/rust - it is in the
       | same repo as the golang implementation.
       | 
       | It is slow going, partly since it is not a priority, partly
       | because I suffer from second system syndrome. Mangle Rust should
       | deal with any size data through getting and writing facts to disk
       | via memory mapping. The golang implementation is in-memory.
       | 
       | This post is nice because it parses datalog and mentions the LSM
       | tree, and much easier to follow than the data frog stuff.
       | 
       | There are very many datalog implementations in Rust (ascent,
       | crepe) that use proc-macros. The downside is that they won't
       | handle getting queries at runtime. For the static analysis use
       | case where queries/programs are fixed, the proc macro approach
       | might be better.
        
       | banana_feather wrote:
       | I like the author's datalog work generally, but I really wish his
       | introductory material did not teach using binary join, which I
       | found to get very messy internally as soon as you get away from
       | the ideal case. I found the generic join style methods to be
       | much, much simpler to generalize in one's head (see
       | https://en.wikipedia.org/wiki/Worst-case_optimal_join_algori...).
        
         | davery22 wrote:
         | related: McSherry's preceding blog post was all about
         | demonstrating how binary joins can achieve worst-case optimal
         | runtime, given suitable adjustments to the query plan.
         | 
         | -
         | https://github.com/frankmcsherry/blog/blob/master/posts/2025...
        
           | kmicinski wrote:
           | For materialization-heavy workloads (program analysis, etc.),
           | we often find that optimized binary join plans (e.g.,
           | profile-optimized, hand-optimized, etc.) beat worst-case
           | optimal plans due to the ability to get better scalability
           | (less locking) without the need to use a trie-based
           | representation. Within the space of worst-case optimal plans,
           | there are still lots of choices: but a bad worst-case optimal
           | plan can often beat a bad (randomly-chosen) binary plan. And
           | of course (the whole point of this exercise), there are some
           | queries where every binary plan explodes and you do need
           | WCOJ. There's also some work on making more traditional
           | binary joins robust (https://db.in.tum.de/people/sites/birler
           | /papers/diamond.pdf), among other interesting work
           | (https://arxiv.org/html/2502.15181v1). Effectively
           | parallelizing WCOJs is still an open problem as far as I am
           | aware (at least, this is what folks working on it tell me),
           | but there are some exciting potential directions in tackling
           | that that several folks are working on I believe.
        
       | rnikander wrote:
       | Some Clojure fans once told me they thought datalog was better
       | than SQL and it was a shame that the relational DBs all used SQL.
       | I never dug into it enough to find out why they thought that way.
        
         | jitl wrote:
         | I struggle to understand the Clojure/Datomic dialect, but I
         | agree generally. I recommend Percival for playing around with
         | Datalog in a friendly notebook environment online:
         | https://percival.ink/
         | 
         | Although there's no "ANSI SQL" equivalent standard across
         | Datalog implementations, once you get a hang of the core idea
         | it's not too hard to understand another Datalog.
         | 
         | I started a Percival fork that compiles the Datalog to SQLite,
         | if you want to check out how the two can express the same
         | thing: https://percival.jake.tl/ (unfinished when it comes to
         | aggregates and more advanced joins but the basic forms work
         | okay). Logica is a much more serious / complete Datalog->SQL
         | compiler written by a Google researcher that compiles to
         | BigTable, DuckDB, and a few other SQL dialects
         | (https://logica.dev/).
         | 
         | One area Datalog is an order of magnitude easier is when
         | working with recursive queries / rules; this is possible in SQL
         | but feels a bit like drinking playdough through a straw.
         | Frank's Materialize.com has a "WITH MUTUALLY RECURSIVE" SQL
         | form (https://materialize.com/blog/recursion-in-materialize/)
         | that's much nicer than the ancient ANSI SQL recursive approach,
         | we're evaluating it for page load queries & data sync at
         | Notion.
         | 
         | Feldera has a similar form for recursive views as well
         | (https://www.feldera.com/blog/recursive-sql-queries-in-
         | felder...). I like that Feldera lets you make each "rule" or
         | subview its own statement rather than needing to pack
         | everything into a single huge statement. Main downside I found
         | when testing Feldera is that their SQL dialect has a bunch of
         | limitations inherited from Apache Calcite, the Materialize SQL
         | dialect tries very hard to be PostgresSQL compatible.
        
           | ben_pfaff wrote:
           | > Main downside I found when testing Feldera is that their
           | SQL dialect has a bunch of limitations inherited from Apache
           | Calcite
           | 
           | At Feldera, we're adding features to our SQL over time, by
           | contributing them upstream to Calcite, making it better for
           | everyone. Mihai Budiu, who is the author of the Feldera SQL
           | compiler, is a Calcite committer.
        
             | jitl wrote:
             | Thanks for contributing. I see Mihai implemented the UUID
             | type in Calcite
             | (https://issues.apache.org/jira/browse/CALCITE-6738) back
             | in January which is one of the issues I hit, so for sure my
             | experience with Feldera is 6 months out of date and y'all
             | move pretty quick.
             | 
             | Most of what I mean is places where Feldera/Calcite has
             | slightly different syntax from Postgres for things. For
             | example, Postgres syntax for cast to bigint is
             | `some_expresion::bigint` although Postgres also supports
             | ANSI SQL `CAST(some_expression AS bigint)`, most examples I
             | find in the wild and in my own Postgres SQL use the
             | Postgres special syntax. JSON syntax also differs; Feldera
             | uses its own pretty elegant `VARIANT` type and
             | `some_expression[key_expression]` to access properties,
             | where Postgres calls this `json` or `jsonb`, and uses
             | `some_expression->key_expression` to access properties. In
             | those cases it's not like Feldera is wrong or lacks some
             | support, but it's a bit harder to work with for me because
             | I'm so used to Postgres syntax and I need to do some regex
             | replace whenever I bring a query from Postgres over to
             | Feldera.
             | 
             | Definitely not a deal-breaker, I am a Feldera enjoyer, but
             | it does add some friction.
        
               | lsuresh wrote:
               | Thanks for the kind words. :) We hear you on the dialect
               | differences.
               | 
               | An interesting case of a user dealing with this problem:
               | they use LLMs to mass migrate SparkSQL code over to
               | Feldera (it's often json-related constructs as you also
               | ran into). They then verify that both their original
               | warehouse and Feldera compute the same results for the
               | same inputs to ensure correctness.
        
         | kragen wrote:
         | Basically Datalog is much less verbose than SQL, imposes much
         | lighter tariffs on factoring out views, and supports transitive
         | closure enormously better. I started
         | http://canonical.org/~kragen/binary-relations off with a simple
         | nonrecursive query for which the SQL translation (given below)
         | is already criminal and whose properly factored SQL solution
         | merits the death penalty.
         | 
         | Recent additions to ANSI SQL have added the capacity for
         | recursion, so it's no longer completely impossible. But they
         | have three big disadvantages:
         | 
         | 1. They accidentally made SQL Turing-complete. Datalog queries,
         | by contrast, are guaranteed to terminate.
         | 
         | 2. They're still extremely clumsy to use.
         | 
         | 3. Because of #1, they're often not implemented fully, so they
         | are hard to rely on.
        
           | ulrikrasmussen wrote:
           | Yes, #1 basically means that they screwed up the design from
           | the get go, since it is impossible to reap the actual
           | benefits of Datalog when the language you evaluate is not, in
           | fact, Datalog. Recursive queries have the ability to perform
           | arbitrary computation in projections, so for starters any
           | top-down evaluation strategy or hybrid evaluation such as
           | magic sets is ruled out.
        
           | twoodfin wrote:
           | Curious: How would you usefully & naturally add recursion to
           | SQL without making it Turing-complete?
        
             | kragen wrote:
             | At first blush that sounds impossible. Maybe there's a
             | clever solution that would occur to someone if they spent a
             | few months working on it, or maybe not.
        
       | akavel wrote:
       | I had some light contact with Prolog long ago during my studies -
       | I have a rough idea how it is used and what it can be useful for,
       | but only on surface, not deep at all. I keep hearing about
       | Datalog since, as some amazing thing, but I don't seem able to
       | understand what it is - i.e. to grasp an answer to a simple
       | question:
       | 
       |  _what is that Datalog improves over Prolog?_
       | 
       | Just now I tried to skim the Wikipedia page of Datalog; the vague
       | theory I'm getting from it, is that maybe Prolog has relatively
       | poor performance, whereas Datalog dramatically improves
       | performance (presumably allowing much bigger datasets and much
       | more parallelized processing), at the cost of reducing
       | expressiveness and features in some other important ways?
       | (including making it no longer Turing-complete?) Is that what
       | it's about, or am I completely missing the mark?
        
         | codesnik wrote:
         | from what I know, prolog looked declarative, in a way that you
         | just encode relations and it figures out the answers, but it
         | really depended on the order of those rules, and some
         | additional instructions like "cut" which not only prevented
         | waste computations, but could affect the results.
         | 
         | datalog on the other hands is more or less a relation db with a
         | different syntax.
        
         | johnnyjeans wrote:
         | Datalog is simpler, not turing complete , and IIRC uses forward
         | chaining which has knock-on effects in its performance and
         | memory characteristics. Huge search spaces that a trivial in
         | Prolog are impossible to represent in Datalog because it eats
         | too much memory.
         | 
         | Datalog is a commuter car with a CVT. Prolog is an F1 car.
         | Basically, it's not about improvement. It's about lobotomizing
         | Prolog into something people won't blow their legs off with.
         | Something that's also much easier to implement and embed in
         | another application (though Prologs can be very easy to embed.)
         | 
         | If you're used to Prolog, you'll mostly just find Datalog to be
         | claustrophobic. No call/3? No term/goal expansion? Datalog is
         | basically designed to pull out the LCD featureset of Prolog for
         | use as an interactive database search.
         | 
         | > Prolog has relatively poor performance, whereas Datalog
         | dramatically improves performance
         | 
         | It's easier to write fast Datalog code but the ceiling is also
         | way lower. Prolog can be written in a way to allow for
         | concurrency, but that's an intermediate level task that
         | requires understanding of your implementation. Guarded Horn
         | Clauses and their derived languages[2] were developed to
         | formalize some of that, but Japanese advancements over Prolog
         | are extremely esoteric. Prolog performance really depends on
         | the programmer and the implementation being used and where it's
         | being used. Prolog, like a Lisp, can be used to generate native
         | machine code from a DSL at compile-time.
         | 
         | If you know an understand how the underlying implementation of
         | your Prolog works, and how to write code with the grain of your
         | implementation, it's absolutely "fast enough". Unfortunately,
         | that requires years of writing Prolog code with a single
         | implementation. There's a lot of work on optimizing[3][4]
         | prolog compilers out there, as well as some proprietary
         | examples[5]. But even something like SWIPL[6] is absolutely
         | fast enough to use (and it has a cute (nameless?) mascot :3)
         | SWIPL is easy to embed, actually disturbingly easy. I don't
         | know if SICSTUS is, or how well it copes with dynamic updates
         | to the knowledge base.
         | 
         | [1] - http://logicprogramming.stanford.edu/readings/ullman.pdf
         | 
         | [2] -
         | https://www.ueda.info.waseda.ac.jp/AITEC_ICOT_ARCHIVES/ICOT/...
         | 
         | [3] -
         | https://www.sciencedirect.com/science/article/pii/S074310669...
         | 
         | [4] -
         | https://link.springer.com/content/pdf/10.1007/3-540-18024-9_...
         | 
         | [5] - https://sicstus.sics.se/
         | 
         | [6] - https://www.swi-prolog.org/
        
       | Xeoncross wrote:
       | If you enjoyed the state machine + parsing part and also I
       | recommend the older "Lexical Scanning in Go" talk that Rob Pike
       | gave: https://www.youtube.com/watch?v=HxaD_trXwRE
       | 
       | It's in Go, but you can apply most of it easily in other
       | languages.
       | 
       | I'm so happy that modern languages like Rust, Zig, and Go support
       | unicode/runes/graphemes natively. So many problem just disappear
       | compared to Java, .net, C++, or scripting languages.
        
       ___________________________________________________________________
       (page generated 2025-06-16 23:01 UTC)