[HN Gopher] Datalog in Rust
___________________________________________________________________
Datalog in Rust
Author : brson
Score : 225 points
Date : 2025-06-15 11:18 UTC (11 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| Leynos wrote:
| It's funny seeing this as the top story.
|
| I'm in the middle of putting together a realtime strategy game
| using Differential Datalog[1] and Rust, with DDL managing the
| game's logic. Mostly as an excuse to expose myself to new ideas
| and engage in a whole lot of yak shaving.
|
| [1] https://github.com/vmware-archive/differential-datalog
| Yoric wrote:
| Oh, nice!
|
| I'll be interested in reading how this goes!
| cmrdporcupine wrote:
| Very cool, I'm curious to see what the state of that
| implementation is and how far you get, since DDLog is not being
| actively maintained anymore.
| rienbdj wrote:
| A new McSherry post! Excellent
|
| Last I checked, VMware had moved away from differential datalog?
| jitl wrote:
| The Differential Datalog team founded Feldera:
| https://www.feldera.com/
|
| They switched from differential Datalog to differential SQL, I
| think because they realized Datalog is a really tough sell.
| rebanevapustus wrote:
| They did, and their product is great.
|
| It is the only database/query engine that allows you to use
| the same SQL for both batch and streaming (with UDFs).
|
| I have made an accessible version of a subset of Differential
| Dataflow (DBSP) in Python right here:
| https://github.com/brurucy/pydbsp
|
| DBSP is so expressive that I have implemented a fully
| incremental dynamic datalog engine as a DBSP program.
|
| Think of SQL/Datalog where the query can change at runtime,
| and the changes themselves (program diffs) are incrementally
| computed:
| https://github.com/brurucy/pydbsp/blob/master/notebooks/data...
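To illustrate the incremental idea behind DBSP discussed in this subthread, here is a minimal Rust sketch of Z-sets. It is not the pydbsp or Feldera API: the names `ZSet`, `add`, `join`, and `join_delta` are hypothetical, and the join is a naive nested loop. Each tuple carries an integer weight (+1 for an insertion, -1 for a retraction), and because the join multiplies weights it is bilinear, so the change to its output can be computed from the input changes alone.

```rust
use std::collections::BTreeMap;

/// A Z-set: each tuple carries an integer weight (+1 insert, -1 retract).
/// Hypothetical illustrative type, not the API of Feldera or pydbsp.
pub type ZSet<T> = BTreeMap<T, i64>;

/// Adds two Z-sets pointwise, dropping tuples whose weight becomes zero.
pub fn add<T: Ord + Clone>(a: &ZSet<T>, b: &ZSet<T>) -> ZSet<T> {
    let mut out = a.clone();
    for (k, w) in b {
        *out.entry(k.clone()).or_insert(0) += w;
    }
    out.retain(|_, w| *w != 0);
    out
}

/// Joins (k, x) with (k, y) on k. Weights multiply, so the join is bilinear.
/// Naive nested loop for clarity; a real engine would use an index.
pub fn join(a: &ZSet<(u32, u32)>, b: &ZSet<(u32, u32)>) -> ZSet<(u32, u32, u32)> {
    let mut out: ZSet<(u32, u32, u32)> = BTreeMap::new();
    for (&(ka, x), wa) in a {
        for (&(kb, y), wb) in b {
            if ka == kb {
                *out.entry((ka, x, y)).or_insert(0) += wa * wb;
            }
        }
    }
    out.retain(|_, w| *w != 0);
    out
}

/// Output change of a join, given the old inputs and their changes:
///   d(A join B) = (dA join B) + (A join dB) + (dA join dB)
pub fn join_delta(
    a: &ZSet<(u32, u32)>,
    da: &ZSet<(u32, u32)>,
    b: &ZSet<(u32, u32)>,
    db: &ZSet<(u32, u32)>,
) -> ZSet<(u32, u32, u32)> {
    add(&add(&join(da, b), &join(a, db)), &join(da, db))
}
```

Adding `join_delta(a, da, b, db)` to the old output `join(a, b)` gives the same Z-set as recomputing `join(a + da, b + db)` from scratch, including when `da` or `db` contain retractions.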
| gunnarmorling wrote:
| > It is the only database/query engine that allows you to
| use the same SQL for both batch and streaming (with UDFs).
|
| Flink SQL also checks that box.
| rebanevapustus wrote:
| Not true.
|
| There has to be some change in the code, and they will
| not share the same semantics (and perhaps won't work when
| retractions/deletions also appear whilst streaming). And
| let's not even get to the leaky abstractions for good
| performance (watermarks et al).
| jitl wrote:
| Flink SQL is quite limited compared to Feldera/DBSP or
| Frank's Materialize.com, and has some correctness
| limitations: it's "eventually consistent" but until you
| stop the data it's unlikely to ever be actually correct
| when working with streaming joins.
| https://www.scattered-thoughts.net/writing/internal-consiste...
| rc00 wrote:
| Posted 1 day ago
|
| https://news.ycombinator.com/item?id=44274592
| tulio_ribeiro wrote:
| "I, a notorious villain, was invited for what I was half sure was
| my long-due comeuppance." -- Best opening line of a technical
| blog post I've read all year.
|
| The narrator's interjections were a great touch. It's rare to see
| a post that is this technically deep but also so fun to read. The
| journey through optimizing the aliasing query felt like a
| detective story. We, the readers, were right there with you,
| groaning at the 50GB memory usage and cheering when you got it
| down to 5GB.
|
| Fantastic work, both on the code and the prose.
| 29athrowaway wrote:
| If you wish to use Datalog and Rust, cozodb is written in Rust
| and has a Datalog query syntax.
| jitl wrote:
| Cozodb seems cool but also inactive. I poked around in
| November 2024 and found some low hanging fruit in the sqlite
| storage backend: https://github.com/cozodb/cozo/issues/285
| 29athrowaway wrote:
| It's not a lot of code so it's easy to tinker with.
| maweki wrote:
| It is nice to see a core group of Datalog enthusiasts persist,
| even though the current Datalog revival seems to be on the
| decline. The recent Datalog 2.0 conference was quite small
| compared to previous years and the second HYTRADBOI conference
| was very light on Datalog as well, while the first one had a
| quarter of submissions with a Datalog connection.
|
| I'm encouraged by the other commenters sharing their recent
| Datalog projects. I am currently building a set of data quality
| pipelines for a legacy SQL database in preparation for a huge
| software migration.
|
| We find Datalog much more useful in identifying and looking for
| data quality issues than SQL, as the queries can be incredibly
| readable when well-structured.
| kmicinski wrote:
| No offense, but I wouldn't take Datalog 2.0's small attendance
| as an exemplar of Datalog's decline, even if I agree with that
| high-level point. Datalog 2.0 is a satellite workshop of LPNMR,
| a relatively-unknown European conference that was randomly held
| in Dallas. I myself attended Datalog 2.0 and also found the
| event relatively sparse. I also had a paper (not my
| primary work, the first author is the real wizard of course :-)
| at the workshop. I myself saw relatively few folks in that
| space even attending that event--with the notable exception of
| some European folks (e.g., introducing the Nemo solver).
|
| All of this is to say, I think Datalog 2.0's sparse attendance
| this year may be more indicative of the fact that it is a
| satellite workshop of an already-lesser-prestigious conference
| (itself not even the main event! That was ICLP!) rather than a
| lack of Datalog implementation excitement.
|
| For what it's worth, none of what I'm saying is meant to rebut
| your high-level point that there is little novelty left in
| implementing raw Datalog engines. Of course I agree, the
| research space has moved far beyond that (arguably it did a
| while ago) and into more exotic problems involving things like
| streaming (HydroFlow), choice (Dusa), things that get closer to
| the general chase (e.g., Egglog's chase engine), etc. I don't
| think anyone disagrees that vanilla Datalog is boring, it's
| just that monotonic, forward-chaining saturation (Horn clauses!)
| is a rich baseline with a well-understood engineering
| landscape (esp in the high-performance space) to build out more
| interesting theories (semirings, Z-sets, etc..).
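For readers unfamiliar with the baseline described above, here is a minimal Rust sketch of monotonic saturation over Horn clauses. It hard-codes a hypothetical two-rule program (transitive closure) and uses a semi-naive loop, joining only the facts derived in the previous round. The function name `transitive_closure` is made up for illustration and does not come from any engine mentioned in this thread.

```rust
use std::collections::{BTreeSet, HashMap};

/// Saturates the two Horn clauses
///   path(x, y) :- edge(x, y).
///   path(x, z) :- path(x, y), edge(y, z).
/// by deriving facts until no new ones appear (a fixpoint).
/// `transitive_closure` is a hypothetical name used for illustration.
pub fn transitive_closure(edges: &[(u32, u32)]) -> BTreeSet<(u32, u32)> {
    // Index edge(y, z) by y so the join on y is a hash lookup.
    let mut by_src: HashMap<u32, Vec<u32>> = HashMap::new();
    for &(y, z) in edges {
        by_src.entry(y).or_default().push(z);
    }

    // Rule 1 seeds both the full relation and the first delta.
    let mut path: BTreeSet<(u32, u32)> = edges.iter().copied().collect();
    let mut delta: Vec<(u32, u32)> = path.iter().copied().collect();

    // Semi-naive step: join only the facts that were new last round.
    while !delta.is_empty() {
        let mut next = Vec::new();
        for &(x, y) in &delta {
            if let Some(zs) = by_src.get(&y) {
                for &z in zs {
                    // `insert` returns true only for a genuinely new fact.
                    if path.insert((x, z)) {
                        next.push((x, z));
                    }
                }
            }
        }
        delta = next;
    }
    path
}
```

Because facts are only ever added, the loop terminates on any finite input, including cyclic graphs: once a round derives nothing new, the relation is saturated.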
| burakemir wrote:
| I made some progress porting mangle datalog to Rust
| https://github.com/google/mangle/tree/main/rust - it is in the
| same repo as the golang implementation.
|
| It is slow going, partly since it is not a priority, partly
| because I suffer from second system syndrome. Mangle Rust should
| deal with any size data through getting and writing facts to disk
| via memory mapping. The golang implementation is in-memory.
|
| This post is nice because it parses datalog and mentions the LSM
| tree, and is much easier to follow than the datafrog stuff.
|
| There are very many datalog implementations in Rust (ascent,
| crepe) that use proc-macros. The downside is that they won't
| handle getting queries at runtime. For the static analysis use
| case where queries/programs are fixed, the proc macro approach
| might be better.
| banana_feather wrote:
| I like the author's datalog work generally, but I really wish his
| introductory material did not teach using binary join, which I
| found to get very messy internally as soon as you get away from
| the ideal case. I found the generic join style methods to be
| much, much simpler to generalize in one's head (see
| https://en.wikipedia.org/wiki/Worst-case_optimal_join_algori...).
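To make the contrast above concrete, here is a small Rust sketch of the triangle query evaluated both ways. It assumes three binary relations `r`, `s`, `t` with set semantics, and all function names are hypothetical. The binary plan materializes `r JOIN s` before filtering against `t`; the generic-join style binds one variable at a time by intersecting candidate sets. It shows the shape of the approach only: a real worst-case optimal implementation also has to pick the smallest candidate set at each step, which this sketch does not do carefully.

```rust
use std::collections::{BTreeMap, BTreeSet};

type Index = BTreeMap<u32, BTreeSet<u32>>;

/// Builds a two-level index: first column -> set of second-column values.
fn index(rel: &[(u32, u32)]) -> Index {
    let mut idx = Index::new();
    for &(k, v) in rel {
        idx.entry(k).or_default().insert(v);
    }
    idx
}

/// tri(a, b, c) :- r(a, b), s(b, c), t(a, c).
/// Binary plan: materialize r JOIN s on b, then filter against t.
/// Returns the triangles and the size of the intermediate result.
pub fn triangles_binary(
    r: &[(u32, u32)],
    s: &[(u32, u32)],
    t: &[(u32, u32)],
) -> (Vec<(u32, u32, u32)>, usize) {
    let s_idx = index(s);
    let t_set: BTreeSet<(u32, u32)> = t.iter().copied().collect();
    let mut intermediate = Vec::new();
    for &(a, b) in r {
        if let Some(cs) = s_idx.get(&b) {
            for &c in cs {
                intermediate.push((a, b, c));
            }
        }
    }
    let mut out: Vec<(u32, u32, u32)> = intermediate
        .iter()
        .copied()
        .filter(|&(a, _, c)| t_set.contains(&(a, c)))
        .collect();
    out.sort();
    out.dedup();
    (out, intermediate.len())
}

/// Same query, one variable at a time, with no intermediate relation.
pub fn triangles_generic(
    r: &[(u32, u32)],
    s: &[(u32, u32)],
    t: &[(u32, u32)],
) -> Vec<(u32, u32, u32)> {
    let (r_idx, s_idx, t_idx) = (index(r), index(s), index(t));
    let mut out = Vec::new();
    // Bind a: it must be a first-column value of both r and t.
    for (a, bs) in &r_idx {
        if let Some(t_cs) = t_idx.get(a) {
            // Bind b: candidates come from r[a], kept only if s has key b.
            for b in bs {
                if let Some(s_cs) = s_idx.get(b) {
                    // Bind c: intersect s[b] with t[a].
                    for c in s_cs.intersection(t_cs) {
                        out.push((*a, *b, *c));
                    }
                }
            }
        }
    }
    out
}
```

Both functions return the same sorted list of triangles. The second value returned by `triangles_binary` shows how many intermediate tuples the binary plan built along the way, which is the quantity that can blow up on adversarial inputs.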
| davery22 wrote:
| related: McSherry's preceding blog post was all about
| demonstrating how binary joins can achieve worst-case optimal
| runtime, given suitable adjustments to the query plan.
|
| -
| https://github.com/frankmcsherry/blog/blob/master/posts/2025...
| kmicinski wrote:
| For materialization-heavy workloads (program analysis, etc.),
| we often find that optimized binary join plans (e.g.,
| profile-optimized, hand-optimized, etc.) beat worst-case
| optimal plans due to the ability to get better scalability
| (less locking) without the need to use a trie-based
| representation. Within the space of worst-case optimal plans,
| there are still lots of choices: but a bad worst-case optimal
| plan can often beat a bad (randomly-chosen) binary plan. And
| of course (the whole point of this exercise), there are some
| queries where every binary plan explodes and you do need
| WCOJ. There's also some work on making more traditional
| binary joins robust
| (https://db.in.tum.de/people/sites/birler/papers/diamond.pdf),
| among other interesting work
| (https://arxiv.org/html/2502.15181v1). Effectively
| parallelizing WCOJs is still an open problem as far as I am
| aware (at least, this is what folks working on it tell me),
| but there are some exciting potential directions for tackling
| it that several folks are working on, I believe.
| rnikander wrote:
| Some Clojure fans once told me they thought datalog was better
| than SQL and it was a shame that the relational DBs all used SQL.
| I never dug into it enough to find out why they thought that way.
| jitl wrote:
| I struggle to understand the Clojure/Datomic dialect, but I
| agree generally. I recommend Percival for playing around with
| Datalog in a friendly notebook environment online:
| https://percival.ink/
|
| Although there's no "ANSI SQL" equivalent standard across
| Datalog implementations, once you get the hang of the core idea
| it's not too hard to understand another Datalog.
|
| I started a Percival fork that compiles the Datalog to SQLite,
| if you want to check out how the two can express the same
| thing: https://percival.jake.tl/ (unfinished when it comes to
| aggregates and more advanced joins but the basic forms work
| okay)
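As a rough picture of how a basic Datalog rule can map onto SQL, here is a minimal Rust sketch. It is not how the Percival fork above works; it is an illustration under stated assumptions: rules are non-recursive, every argument is a variable, every head variable occurs in the body, and table columns are named `c0`, `c1`, and so on. The `Atom` type and `rule_to_sql` function are hypothetical names.

```rust
use std::collections::HashMap;

/// A Datalog atom such as `edge(x, y)`; all arguments are variables here.
pub struct Atom<'a> {
    pub rel: &'a str,
    pub vars: Vec<&'a str>,
}

/// Translates one non-recursive rule `head :- body` into a SQL SELECT.
/// Assumes columns are named c0, c1, ... (a hypothetical convention) and
/// that every head variable also occurs in the body (panics otherwise).
pub fn rule_to_sql(head: &Atom, body: &[Atom]) -> String {
    // Maps each variable to the first column where it was bound.
    let mut first_seen: HashMap<String, String> = HashMap::new();
    let mut from = Vec::new();
    let mut conds = Vec::new();
    for (i, atom) in body.iter().enumerate() {
        from.push(format!("{} AS t{}", atom.rel, i));
        for (j, var) in atom.vars.iter().enumerate() {
            let col = format!("t{}.c{}", i, j);
            let prev = first_seen.get(*var).cloned();
            match prev {
                // A repeated variable becomes an equality (join) condition.
                Some(p) => conds.push(format!("{} = {}", p, col)),
                None => {
                    first_seen.insert(var.to_string(), col);
                }
            }
        }
    }
    let select: Vec<String> = head.vars.iter().map(|v| first_seen[*v].clone()).collect();
    // DISTINCT keeps Datalog's set semantics.
    let mut sql = format!(
        "SELECT DISTINCT {} FROM {}",
        select.join(", "),
        from.join(", ")
    );
    if !conds.is_empty() {
        sql.push_str(&format!(" WHERE {}", conds.join(" AND ")));
    }
    sql
}
```

For the rule `hop2(x, z) :- edge(x, y), edge(y, z).` the shared variable `y` turns into the join condition `t0.c1 = t1.c0`, and the head variables select `t0.c0` and `t1.c1`. Recursion, negation, and aggregates need more machinery than this sketch covers.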
___________________________________________________________________
(page generated 2025-06-15 23:00 UTC)