[HN Gopher] Datalog in Rust
___________________________________________________________________
Datalog in Rust
Author : brson
Score : 317 points
Date : 2025-06-15 11:18 UTC (1 days ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| Leynos wrote:
| It's funny seeing this as the top story.
|
| I'm in the middle of putting together a realtime strategy game
| using Differential Datalog[1] and Rust, with DDL managing the
| game's logic. Mostly as an excuse to expose myself to new ideas
| and engage in a whole lot of yak shaving.
|
| [1] https://github.com/vmware-archive/differential-datalog
| Yoric wrote:
| On, nice!
|
| I'll be interested in reading how this goes!
| cmrdporcupine wrote:
| Very cool, I'm curious to see what the state of that
| implementation is and how far you get, since DDLog is not being
| actively maintained anymore.
| foota wrote:
| I wonder if you could make a frankenstein version of
| differential datalog by combing the OP repo with salsa[1] (the
| crate that powers rust-analyzer)
|
| [1] https://github.com/salsa-rs/salsa
| lsuresh wrote:
| That's a cool demo you're building using ddlog! FWIW, we, the
| ddlog team have moved on to found Feldera
| (https://github.com/feldera/feldera). You could consider using
| DBSP directly through Rust.
| rienbdj wrote:
| A new McSharry post! Excellent
|
| Last I checked, VMWare had moved away from differential datalog?
| jitl wrote:
| The Differential Datalog team founded Feldera:
| https://www.feldera.com/
|
| They switched from differential Datalog to differential SQL, I
| think because they realized Datalog is a really tough sell.
| rebanevapustus wrote:
| They did, and their product is great.
|
| It is the only database/query engine that allows you to use
| the same SQL for both batch and streaming (with UDFs).
|
| I have made an accessible version of a subset of Differential
| Dataflow (DBSP) in Python right here:
| https://github.com/brurucy/pydbsp
|
| DBSP is so expressive that I have implemented a fully
| incremental dynamic datalog engine as a DBSP program.
|
| Think of SQL/Datalog where the query can change in runtime,
| and the changes themselves (program diffs) are incrementally
| computed: https://github.com/brurucy/pydbsp/blob/master/noteb
| ooks/data...
| gunnarmorling wrote:
| > It is the only database/query engine that allows you to
| use the same SQL for both batch and streaming (with UDFs).
|
| Flink SQL also checks that box.
| rebanevapustus wrote:
| Not true.
|
| There has to be some change in the code, and they will
| not share the same semantics (and perhaps won't work when
| retractions/deletions also appear whilst streaming). And
| let's not even get to the leaky abstractions for good
| performance (watermarks et al).
| jitl wrote:
| Flink SQL is quite limited compared to Feldera/DBSP or
| Frank's Materialize.com, and has some correctness
| limitations: it's "eventually consistent" but until you
| stop the data it's unlikely to ever be actually correct
| when working with streaming joins. https://www.scattered-
| thoughts.net/writing/internal-consiste...
| rc00 wrote:
| Posted 1 day ago
|
| https://news.ycombinator.com/item?id=44274592
| tulio_ribeiro wrote:
| "I, a notorious villain, was invited for what I was half sure was
| my long-due comeuppance." -- Best opening line of a technical
| blog post I've read all year.
|
| The narrator's interjections were a great touch. It's rare to see
| a post that is this technically deep but also so fun to read. The
| journey through optimizing the aliasing query felt like a
| detective story. We, the readers, were right there with you,
| groaning at the 50GB memory usage and cheering when you got it
| down to 5GB.
|
| Fantastic work, both on the code and the prose.
| 29athrowaway wrote:
| If you wish to use Datalog and Rust, cozodb is written in Rust
| and has a Datalog query syntax.
| jitl wrote:
| Cozodb seems cool but also inactive. I poked around about in
| November 2024 and found some low hanging fruit in the sqlite
| storage backend: https://github.com/cozodb/cozo/issues/285
| 29athrowaway wrote:
| It's not a lot of code so it's easy to tinker with.
| maweki wrote:
| It is nice to see a core group of Datalog enthusiasts persist,
| even though the current Datalog revival seems to be on the
| decline. The recent Datalog 2.0 conference was quite small
| compared to previous years and the second HYTRADBOI conference
| was very light on Datalog as well, while the first one had a
| quarter of submissions with Datalog connection.
|
| I'm encouraged by the other commenters sharing their recent
| Datalog projects. I am currently building a set of data quality
| pipelines for a legacy SQL database in preparation of a huge
| software migration.
|
| We find Datalog much more useful in identifying and looking for
| data quality issues thatn SQL, as the queries can be incredibly
| readable when well-structured.
| kmicinski wrote:
| No offense, but I wouldn't take Datalog 2.0's small attendance
| as an exemplar of Datalog's decline, even if I agree with that
| high-level point. Datalog 2.0 is a satellite workshop of LPNMR,
| a relatively-unknown European conference that was randomly held
| in Dallas. I myself attended Datalog 2.0 and also felt the
| event felt relatively sparse. I also had a paper (not my
| primary work, the first author is the real wizard of course :-)
| at the workshop. I myself saw relatively few folks in that
| space even attending that event--with the notable exception of
| some European folks (e.g., introducing the Nemo solver).
|
| All of this is to say, I think Datalog 2.0's sparse attendance
| this year may be more indicative of the fact that it is a
| satellite workshop of an already-lesser-prestigious conference
| (itself not even the main event! That was ICLP!) rather than a
| lack of Datalog implementation excitement.
|
| For what it's worth, none of what I'm saying is meant to rebut
| your high-level point that there is little novelty left in
| implementing raw Datalog engines. Of course I agree, the
| research space has moved far beyond that (arguably it did a
| while ago) and into more exotic problems involving things like
| streaming (HydroFlow), choice (Dusa), things that get closer to
| the general chase (e.g., Egglog's chase engine), etc. I don't
| think anyone disagrees that vanilla Datalog is boring, it's
| just that monotonic, chain-forward saturation (Horn clauses!)
| are a rich baseline with a well-understood engineering
| landscape (esp in the high-performance space) to build out more
| interesting theories (semirings, Z-sets, etc..).
| BartjeD wrote:
| The USA is a hostile environment to visit, for Europeans.
| It's a theme all over western and northern Europe to avoid
| it.
|
| If you were expecting European attendance, that would explain
| why it's different from the past.
|
| It's the same in science circles, tourism and even business
| travel.
| freilanzer wrote:
| As a European, I'm definitely avoiding the US. I was asked
| whether I wanted to visit our US branch and I declined -
| which was simply accepted and probably expected in advance.
| Almost everyone I know does the same.
| kmicinski wrote:
| I completely agree with you, but this event was held in
| October, 2024--Trump's horrendous behavior wasn't an effect
| on that. There are other reasons folks internationally
| probably don't want to travel to Texas aside from Trump--
| but in this specific case I think it might have more to do
| with the fact that LPMNR is a traditionally European
| conference randomly held in the US.
| burakemir wrote:
| I made some progress porting mangle datalog to Rust
| https://github.com/google/mangle/tree/main/rust - it is in the
| same repo as the golang implementation.
|
| It is slow going, partly since it is not a priority, partly
| because I suffer from second system syndrome. Mangle Rust should
| deal with any size data through getting and writing facts to disk
| via memory mapping. The golang implementation is in-memory.
|
| This post is nice because it parses datalog and mentions the LSM
| tree, and much easier to follow than the data frog stuff.
|
| There are very many datalog implementations in Rust (ascent,
| crepe) that use proc-macros. The downside is that they won't
| handle getting queries at runtime. For the static analysis use
| case where queries/programs are fixed, the proc macro approach
| might be better.
| banana_feather wrote:
| I like the author's datalog work generally, but I really wish his
| introductory material did not teach using binary join, which I
| found to get very messy internally as soon as you get away from
| the ideal case. I found the generic join style methods to be
| much, much simpler to generalize in one's head (see
| https://en.wikipedia.org/wiki/Worst-case_optimal_join_algori...).
| davery22 wrote:
| related: McSherry's preceding blog post was all about
| demonstrating how binary joins can achieve worst-case optimal
| runtime, given suitable adjustments to the query plan.
|
| -
| https://github.com/frankmcsherry/blog/blob/master/posts/2025...
| kmicinski wrote:
| For materialization-heavy workloads (program analysis, etc.),
| we often find that optimized binary join plans (e.g.,
| profile-optimized, hand-optimized, etc.) beat worst-case
| optimal plans due to the ability to get better scalability
| (less locking) without the need to use a trie-based
| representation. Within the space of worst-case optimal plans,
| there are still lots of choices: but a bad worst-case optimal
| plan can often beat a bad (randomly-chosen) binary plan. And
| of course (the whole point of this exercise), there are some
| queries where every binary plan explodes and you do need
| WCOJ. There's also some work on making more traditional
| binary joins robust (https://db.in.tum.de/people/sites/birler
| /papers/diamond.pdf), among other interesting work
| (https://arxiv.org/html/2502.15181v1). Effectively
| parallelizing WCOJs is still an open problem as far as I am
| aware (at least, this is what folks working on it tell me),
| but there are some exciting potential directions in tackling
| that that several folks are working on I believe.
| rnikander wrote:
| Some Clojure fans once told me they thought datalog was better
| than SQL and it was a shame that the relational DBs all used SQL.
| I never dug into it enough to find out why they thought that way.
| jitl wrote:
| I struggle to understand the Clojure/Datomic dialect, but I
| agree generally. I recommend Percival for playing around with
| Datalog in a friendly notebook environment online:
| https://percival.ink/
|
| Although there's no "ANSI SQL" equivalent standard across
| Datalog implementations, once you get a hang of the core idea
| it's not too hard to understand another Datalog.
|
| I started a Percival fork that compiles the Datalog to SQLite,
| if you want to check out how the two can express the same
| thing: https://percival.jake.tl/ (unfinished when it comes to
| aggregates and more advanced joins but the basic forms work
| okay). Logica is a much more serious / complete Datalog->SQL
| compiler written by a Google researcher that compiles to
| BigTable, DuckDB, and a few other SQL dialects
| (https://logica.dev/).
|
| One area Datalog is an order of magnitude easier is when
| working with recursive queries / rules; this is possible in SQL
| but feels a bit like drinking playdough through a straw.
| Frank's Materialize.com has a "WITH MUTUALLY RECURSIVE" SQL
| form (https://materialize.com/blog/recursion-in-materialize/)
| that's much nicer than the ancient ANSI SQL recursive approach,
| we're evaluating it for page load queries & data sync at
| Notion.
|
| Feldera has a similar form for recursive views as well
| (https://www.feldera.com/blog/recursive-sql-queries-in-
| felder...). I like that Feldera lets you make each "rule" or
| subview its own statement rather than needing to pack
| everything into a single huge statement. Main downside I found
| when testing Feldera is that their SQL dialect has a bunch of
| limitations inherited from Apache Calcite, the Materialize SQL
| dialect tries very hard to be PostgresSQL compatible.
| ben_pfaff wrote:
| > Main downside I found when testing Feldera is that their
| SQL dialect has a bunch of limitations inherited from Apache
| Calcite
|
| At Feldera, we're adding features to our SQL over time, by
| contributing them upstream to Calcite, making it better for
| everyone. Mihai Budiu, who is the author of the Feldera SQL
| compiler, is a Calcite committer.
| jitl wrote:
| Thanks for contributing. I see Mihai implemented the UUID
| type in Calcite
| (https://issues.apache.org/jira/browse/CALCITE-6738) back
| in January which is one of the issues I hit, so for sure my
| experience with Feldera is 6 months out of date and y'all
| move pretty quick.
|
| Most of what I mean is places where Feldera/Calcite has
| slightly different syntax from Postgres for things. For
| example, Postgres syntax for cast to bigint is
| `some_expresion::bigint` although Postgres also supports
| ANSI SQL `CAST(some_expression AS bigint)`, most examples I
| find in the wild and in my own Postgres SQL use the
| Postgres special syntax. JSON syntax also differs; Feldera
| uses its own pretty elegant `VARIANT` type and
| `some_expression[key_expression]` to access properties,
| where Postgres calls this `json` or `jsonb`, and uses
| `some_expression->key_expression` to access properties. In
| those cases it's not like Feldera is wrong or lacks some
| support, but it's a bit harder to work with for me because
| I'm so used to Postgres syntax and I need to do some regex
| replace whenever I bring a query from Postgres over to
| Feldera.
|
| Definitely not a deal-breaker, I am a Feldera enjoyer, but
| it does add some friction.
| lsuresh wrote:
| Thanks for the kind words. :) We hear you on the dialect
| differences.
|
| An interesting case of a user dealing with this problem:
| they use LLMs to mass migrate SparkSQL code over to
| Feldera (it's often json-related constructs as you also
| ran into). They then verify that both their original
| warehouse and Feldera compute the same results for the
| same inputs to ensure correctness.
| kragen wrote:
| Basically Datalog is much less verbose than SQL, imposes much
| lighter tariffs on factoring out views, and supports transitive
| closure enormously better. I started
| http://canonical.org/~kragen/binary-relations off with a simple
| nonrecursive query for which the SQL translation (given below)
| is already criminal and whose properly factored SQL solution
| merits the death penalty.
|
| Recent additions to ANSI SQL have added the capacity for
| recursion, so it's no longer completely impossible. But they
| have three big disadvantages:
|
| 1. They accidentally made SQL Turing-complete. Datalog queries,
| by contrast, are guaranteed to terminate.
|
| 2. They're still extremely clumsy to use.
|
| 3. Because of #1, they're often not implemented fully, so they
| are hard to rely on.
| ulrikrasmussen wrote:
| Yes, #1 basically means that they screwed up the design from
| the get go, since it is impossible to reap the actual
| benefits of Datalog when the language you evaluate is not, in
| fact, Datalog. Recursive queries have the ability to perform
| arbitrary computation in projections, so for starters any
| top-down evaluation strategy or hybrid evaluation such as
| magic sets is ruled out.
| twoodfin wrote:
| Curious: How would you usefully & naturally add recursion to
| SQL without making it Turing-complete?
| kragen wrote:
| At first blush that sounds impossible. Maybe there's a
| clever solution that would occur to someone if they spent a
| few months working on it, or maybe not.
| akavel wrote:
| I had some light contact with Prolog long ago during my studies -
| I have a rough idea how it is used and what it can be useful for,
| but only on surface, not deep at all. I keep hearing about
| Datalog since, as some amazing thing, but I don't seem able to
| understand what it is - i.e. to grasp an answer to a simple
| question:
|
| _what is that Datalog improves over Prolog?_
|
| Just now I tried to skim the Wikipedia page of Datalog; the vague
| theory I'm getting from it, is that maybe Prolog has relatively
| poor performance, whereas Datalog dramatically improves
| performance (presumably allowing much bigger datasets and much
| more parallelized processing), at the cost of reducing
| expressiveness and features in some other important ways?
| (including making it no longer Turing-complete?) Is that what
| it's about, or am I completely missing the mark?
| codesnik wrote:
| from what I know, prolog looked declarative, in a way that you
| just encode relations and it figures out the answers, but it
| really depended on the order of those rules, and some
| additional instructions like "cut" which not only prevented
| waste computations, but could affect the results.
|
| datalog on the other hands is more or less a relation db with a
| different syntax.
| johnnyjeans wrote:
| Datalog is simpler, not turing complete , and IIRC uses forward
| chaining which has knock-on effects in its performance and
| memory characteristics. Huge search spaces that a trivial in
| Prolog are impossible to represent in Datalog because it eats
| too much memory.
|
| Datalog is a commuter car with a CVT. Prolog is an F1 car.
| Basically, it's not about improvement. It's about lobotomizing
| Prolog into something people won't blow their legs off with.
| Something that's also much easier to implement and embed in
| another application (though Prologs can be very easy to embed.)
|
| If you're used to Prolog, you'll mostly just find Datalog to be
| claustrophobic. No call/3? No term/goal expansion? Datalog is
| basically designed to pull out the LCD featureset of Prolog for
| use as an interactive database search.
|
| > Prolog has relatively poor performance, whereas Datalog
| dramatically improves performance
|
| It's easier to write fast Datalog code but the ceiling is also
| way lower. Prolog can be written in a way to allow for
| concurrency, but that's an intermediate level task that
| requires understanding of your implementation. Guarded Horn
| Clauses and their derived languages[2] were developed to
| formalize some of that, but Japanese advancements over Prolog
| are extremely esoteric. Prolog performance really depends on
| the programmer and the implementation being used and where it's
| being used. Prolog, like a Lisp, can be used to generate native
| machine code from a DSL at compile-time.
|
| If you know an understand how the underlying implementation of
| your Prolog works, and how to write code with the grain of your
| implementation, it's absolutely "fast enough". Unfortunately,
| that requires years of writing Prolog code with a single
| implementation. There's a lot of work on optimizing[3][4]
| prolog compilers out there, as well as some proprietary
| examples[5]. But even something like SWIPL[6] is absolutely
| fast enough to use (and it has a cute (nameless?) mascot :3)
| SWIPL is easy to embed, actually disturbingly easy. I don't
| know if SICSTUS is, or how well it copes with dynamic updates
| to the knowledge base.
|
| [1] - http://logicprogramming.stanford.edu/readings/ullman.pdf
|
| [2] -
| https://www.ueda.info.waseda.ac.jp/AITEC_ICOT_ARCHIVES/ICOT/...
|
| [3] -
| https://www.sciencedirect.com/science/article/pii/S074310669...
|
| [4] -
| https://link.springer.com/content/pdf/10.1007/3-540-18024-9_...
|
| [5] - https://sicstus.sics.se/
|
| [6] - https://www.swi-prolog.org/
| Xeoncross wrote:
| If you enjoyed the state machine + parsing part and also I
| recommend the older "Lexical Scanning in Go" talk that Rob Pike
| gave: https://www.youtube.com/watch?v=HxaD_trXwRE
|
| It's in Go, but you can apply most of it easily in other
| languages.
|
| I'm so happy that modern languages like Rust, Zig, and Go support
| unicode/runes/graphemes natively. So many problem just disappear
| compared to Java, .net, C++, or scripting languages.
___________________________________________________________________
(page generated 2025-06-16 23:01 UTC)