[HN Gopher] Learn Datalog Today
___________________________________________________________________
Learn Datalog Today
Author : tosh
Score : 125 points
Date : 2024-01-21 16:38 UTC (6 hours ago)
(HTM) web link (www.learndatalogtoday.org)
(TXT) w3m dump (www.learndatalogtoday.org)
| SJC_Hacker wrote:
| I learned a bit of Datalog back in university, too many years
| ago. It was impressive how powerful the query language is. You
| could do in a single line what required several lines of SQL, and
| far more intuitively.
|
| But ... the problem is how many DBs support it, and how useful of
| a skill it is to know.
| FreeFull wrote:
| It's a shame that there doesn't seem to be any decent open-source
| implementation of Datalog. If you go for full Prolog instead of
| Datalog, there are several (Scryer Prolog being my personal
| favourite).
| cmrdporcupine wrote:
| https://en.wikipedia.org/wiki/Souffl%C3%A9_(programming_lang...
|
| https://github.com/souffle-lang/souffle
| nezaj wrote:
| Here's a blog post showing you how to roll your own in ~100
| lines of JS
|
| https://www.instantdb.com/essays/datalogjs
| kjqgqkejbfefn wrote:
| 1. Datomic - While not open-source, it has an open-source
| version called Datomic Free, which is a distributed database
| designed to enable scalable, flexible, and intelligent data
| storage and queries. Datomic's query language is closely
| inspired by Datalog.
|
| 2. DataScript - An open-source in-memory database and query
| engine for Clojure, ClojureScript, and JavaScript that is
| heavily influenced by Datalog and Datomic.
|
| 3. Crux (now XTDB) - A bitemporal database with Datalog-
| inspired querying capabilities. It is designed for efficient
| querying of historical data and offers ACID transactions.
|
| 4. Racket's miniKanren - While not strictly a database,
| miniKanren is an open-source logic programming extension to the
| Racket language, which is inspired by Datalog and can be used
| to manipulate and query data in a manner similar to Prolog.
|
| 5. LogicBlox - An open-source platform that combines a database
| system, a Datalog-based modeling language, and application
| server facilities. It allows developers to build complex, data-
| intensive applications.
|
| 6. Souffle - A Datalog-inspired language that is designed for
| static analysis problems. It can be viewed as a database query
| language with a focus on performance, allowing for parallel
| execution of queries.
|
| 7. Dedalus - A Datalog-like temporal logic language used to
| express complex distributed systems. It is primarily a research
| tool but has informed the design of other Datalog-inspired
| systems.
|
| 8. Flora-2 - An open-source object-oriented knowledge
| representation and reasoning system that integrates a variant
| of Datalog with objects and frames.
|
| Top 3 are from the Clojure ecosystem. Additionally, in this same
| space there are Datalevin & Datahike, among many others
| persnickety wrote:
| Cozo uses Datalog for queries, and has several backends,
| including SQLite
| cmrdporcupine wrote:
| Cozo is very attractive. I just wish there was a native
| Rust DSL API for it, so it could be embedded in Rust
| programs without using datalog queries in strings.
| soraki_soladead wrote:
| https://github.com/cozodb/pycozo/blob/main/pycozo/test_build...
|
| Here's the python version of what I think you're looking
| for. Shouldn't be too difficult to port to rust.
| cmrdporcupine wrote:
| ok but that's not what i want.
|
| the thing is written in Rust. but does not expose a Rust
| query API, you have to query it through Datalog queries
| in strings; what you shared there just builds those
| strings from python.. it'd be nice to have a directly
| native API, with Horn clauses constructed in Rust.
| summarity wrote:
| Also Rego, which is Datalog with structured extensions, in
| use everywhere where OPA is used (as in many k8s
| environments)
| j-pb wrote:
| Are you sure that LogicBlox is open-source? I couldn't find
| anything confirming this.
|
| I'd be very surprised if they were, because they even
| patented their join algorithm.
| cmrdporcupine wrote:
| It's definitely not open source.
|
| Not only is the join algorithm patented, but my
| understanding is the original authors of it can't even use
| it, because the LogicBlox IP was acquired but the people
| moved on.
|
| But some have since gone on to create new stuff @
| RelationalAI
| macmac wrote:
| Ad 1. All versions of Datomic are now free, but none are Open
| Source.
| kevindamm wrote:
| Also GDL and its variants, but that is more of a domain-
| specific language for game descriptions and general game-
| playing runtimes. Still, they refer to Datalog as its basis.
| cmrdporcupine wrote:
| Another one is Differential Datalog, for streaming data.
|
| https://github.com/vmware/differential-datalog
| habitue wrote:
| I'm sad their last commit was 2 years ago, seemed like a
| really cool idea
| tylerhou wrote:
| The authors spun it out into a startup, Feldera. A paper
| describing their idea also won Best Paper at VLDB 2023.
| The idea is very far from dead.
| cmrdporcupine wrote:
| Neat. Had run into them before (the "careers" page was
| marked as visited in my Firefox history ;-) ), but didn't
| make the connection.
| odipar wrote:
| CodeQL is another datalog with the domain of code analysis as
| its use case. Too bad you cannot create a custom fact
| database with CodeQL. Otherwise, the implementation of CodeQL
| is pretty advanced and efficient.
| infima wrote:
| While not trivial because it is not documented, you can
| create a database with your own facts. Some of the
| extractors that create the required files are open source:
| https://github.com/github/codeql/blob/main/ruby/extractor/sr...
| lukev wrote:
| Is LogicBlox open-source now? I encountered it on a project
| several years ago and at that point it was very much
| closed/commercial.
|
| Now the website isn't even loading... has the project been
| shuttered? I know LogicBlox was acquired by Predictix a long
| time ago, and recently Infor acquired Predictix. Hoping the
| project is still a going concern, there was some very cool
| tech in there.
| marcle wrote:
| ErgoAI is "an enterprise-level extension of the Flora-2
| system" which was recently open-sourced:
| https://github.com/ErgoAI . It seems to be well documented.
| Jonovono wrote:
| CozoDB: https://github.com/cozodb/cozo
| manu3000 wrote:
| you can use Datalog within Flix https://flix.dev/
| refset wrote:
| For comparison, I previously translated that cart parts
| scheduling example on the Flix homepage to Datomic-style
| Datalog syntax:
| https://gist.github.com/refset/21b3fc1dec9a6928943073809e133...
| dagipflihax0r wrote:
| Mangle https://github.com/google/mangle is an open-source
| implementation in golang, it was an explicit goal to make it
| easy to learn. Meaning: it is easy to recognize the pure
| datalog part, the syntax is following the good old course
| material.
|
| It was discussed here:
| https://news.ycombinator.com/item?id=33756800
| grepexdev wrote:
| I thought that syntax looked familiar! Looks like Logseq uses
| Datalog for advanced queries.
|
| https://hub.logseq.com/features/av5LyiLi5xS7EFQXy4h4K8/getti...
| packetlost wrote:
| More specifically, Logseq uses DataScript, a Datomic-inspired
| Datalog engine for ClojureScript.
| achileas wrote:
| They do! IIRC they were inspired by Roam's use of it with
| Clojurescript
| brendanyounger wrote:
| I wish people would stop referring to Datomic as datalog. Datomic
| is many things, but only the query format (Horn clauses with
| unification of variables, similar to prolog) has anything to do
| with datalog.
|
| Real datalog is far more interesting since it implicitly encodes
| recursion allowing you to chain rules. Rule A derives new facts,
| which rule B uses to derive new facts, which rules A and C use to
| derive new facts, and so on. Datomic has a notion of rules which
| are mostly syntax sugar and do not support this sort of recursive
| reasoning.
|
| Why is that a big deal? When rules are run automatically, you can
| build live, reactive systems, not just a database that sits
| around waiting for you to query it. Hellerstein's work at UC
| Berkeley
| (https://dsf.berkeley.edu/papers/sigrec10-declimperative.pdf)
| explores this in some detail.
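|
| The rule chaining described above can be sketched in a few
| lines of Python. This is only an illustration of naive
| bottom-up evaluation, not how any particular engine works; the
| `parent` facts and the `derive_ancestors` helper are made up
| for the example.

```python
# Facts (the extensional database): direct parent relationships.
# These names are hypothetical example data.
parent = {("alice", "bob"), ("bob", "carol"), ("carol", "dave")}


def derive_ancestors(parent_facts):
    """Naive bottom-up evaluation of two Datalog rules:

        ancestor(X, Y) :- parent(X, Y).
        ancestor(X, Z) :- parent(X, Y), ancestor(Y, Z).

    Both rules are re-applied until a round derives nothing new
    (a fixpoint). The second rule feeds on its own output, which
    is the recursion the comment refers to.
    """
    ancestor = set()
    while True:
        # Rule 1: every parent fact is an ancestor fact.
        new_facts = set(parent_facts)
        # Rule 2: join parent with the ancestor facts derived so far.
        new_facts |= {
            (x, z)
            for (x, y) in parent_facts
            for (y2, z) in ancestor
            if y == y2
        }
        if new_facts <= ancestor:
            return ancestor
        ancestor |= new_facts


ancestors = derive_ancestors(parent)
```

| Each pass over the rules may unlock facts that other rules
| (or the same rule) can use on the next pass, so ("alice",
| "dave") only appears after the shorter chains exist.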
| 6gvONxR4sf7o wrote:
| Sounds cool. What's the complexity of running this kind of
| recursive reasoning? Reasonable? Can you suggest any tools to
| not have to implement it ourselves?
| brendanyounger wrote:
| Souffle and Cozo mentioned below already implement the whole
| of "traditional" datalog.
|
| Percival (https://github.com/ekzhang/percival) has some very
| nice examples showing how you can interactively write and
| test rules on top of a datalog interpreter.
|
| Bud (http://bloom-lang.net/bud/) is Hellerstein's proof of
| concept playground. It has bit-rotted in the past few years,
| but the examples are readable even if you can't easily get it
| working.
|
| The complexity can be quite good. You can syntactically
| determine when you've written linear recursion (equivalent to
| a for loop) vs not. Otherwise, the complexity is what you'd
| expect from incremental view maintenance in a normal SQL
| database. Which is to say O(n^k) with k being the number of
| relations joined, but usually much, much less with
| appropriate indexes and skew in the data. All the usual
| tricks concerning data normalization and indexes from
| databases apply.
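|
| One way to picture why linear recursion with an index stays
| cheap is semi-naive evaluation, a standard technique that the
| comment above does not name explicitly. The sketch below is a
| toy under that assumption: each round joins only the facts
| derived in the previous round against an indexed `edge`
| relation. The function and variable names are invented here.

```python
from collections import defaultdict


def transitive_closure(edges):
    """Semi-naive evaluation of a linear-recursive rule pair:

        path(X, Y) :- edge(X, Y).
        path(X, Z) :- path(X, Y), edge(Y, Z).

    Returns the derived path facts and the number of rounds run.
    """
    # Index edge on its first column so each join step is a lookup
    # instead of a scan over every edge.
    successors = defaultdict(set)
    for src, dst in edges:
        successors[src].add(dst)

    path = set(edges)
    delta = set(edges)  # facts that were new in the previous round
    rounds = 0
    while delta:
        rounds += 1
        # Only the new facts are joined; older facts were already
        # extended in earlier rounds.
        derived = {
            (x, z)
            for (x, y) in delta
            for z in successors.get(y, ())
        }
        delta = derived - path
        path |= delta
    return path, rounds


edges = {(1, 2), (2, 3), (3, 4)}
path, rounds = transitive_closure(edges)
```

| The loop stops as soon as a round produces no unseen facts, so
| cyclic input terminates too.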
| refset wrote:
| RDFox offers a rather impressive sounding Datalog inferencing
| engine: https://www.oxfordsemantic.tech/rdfox
|
| > We present a novel approach to parallel materialisation
| (i.e., fixpoint computation) of datalog programs in
| centralised, main-memory, multi-core RDF systems. Our
| approach comprises an algorithm that evenly distributes the
| workload to cores, and an RDF indexing data structure that
| supports efficient, 'mostly' lock-free parallel updates.
|
| > Materialisation is PTIME-complete in data complexity and is
| thus believed to be inherently sequential. Nevertheless, many
| practical parallelisation techniques have been developed
| [...]
|
| There have been several papers and patents describing their
| approach, e.g.
| http://www.cs.ox.ac.uk/dan.olteanu/papers/mnpho-aaai14.pdf
| refset wrote:
| > Datomic has a notion of rules which are mostly syntax sugar
| and do not support this sort of recursive reasoning.
|
| > Why is that a big deal? When rules are run automatically, you
| can build live, reactive systems, not just a database that sits
| around waiting for you to query it.
|
| There was at least one serious attempt to bring these worlds
| together: https://github.com/sixthnormal/clj-3df
| dang wrote:
| Related:
|
| _Learn Datalog Today_ -
| https://news.ycombinator.com/item?id=27173890 - May 2021 (34
| comments)
|
| _Learn Datalog_ - https://news.ycombinator.com/item?id=19154997
| - Feb 2019 (1 comment)
|
| _Learn Datalog Today_ -
| https://news.ycombinator.com/item?id=17109105 - May 2018 (2
| comments)
|
| _Learn Datalog Today_ -
| https://news.ycombinator.com/item?id=14434457 - May 2017 (10
| comments)
|
| _Learn Datalog Today Ported to DataScript and Clojure (JVM)_ -
| https://news.ycombinator.com/item?id=13037199 - Nov 2016 (1
| comment)
|
| _Learn Datalog Today - An interactive Datomic query tutorial_ -
| https://news.ycombinator.com/item?id=6171722 - Aug 2013 (7
| comments)
| Clever321 wrote:
| Datalog feels so much more intuitive than SQL or any other query
| language I've used. I'm able to write concise, complex
| expressions pretty easily. In a SQL-based system, there seems to
| be a (low) complexity metric where it's easier to
| write/debug/maintain what was supposed to be a 'declarative' SQL
| query in a functional/imperative language instead. It feels like
| datalog is the next evolution of a declarative query language,
| one that is much more declarative than SQL itself.
|
| In the "day of datomic" videos, there is a segment where Stu
| debugs a slow query. He does the debugging without even looking
| at the data model, only by rearranging the clauses. It is really,
| really impressive, and I can't imagine having that capability in
| SQL.
| brendanyounger wrote:
| I greatly respect what Stu and Rich have done to make Datomic.
|
| However, they made an explicit design decision to not include a
| query optimizer and execute the clauses as they were written.
| This is usually fine since the author has some idea of what the
| best order is, but there are k! different permutations of k
| clauses so doing it by hand will fail at some point (if you
| want the optimal ordering).
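|
| To see why clause order matters when clauses run exactly as
| written, here is a toy left-to-right evaluator in Python. It
| is a sketch under simplifying assumptions (nested-loop joins,
| no indexes) and is not Datomic's engine; `run_in_order`,
| `lives_in` and `vip` are hypothetical names and data.

```python
def run_in_order(clauses):
    """Evaluate clauses left to right, joining on shared variable names.

    Each clause is (rows, variables). Returns the final bindings and
    the number of bindings that survive after each clause.
    """
    bindings = [{}]
    sizes = []
    for rows, variables in clauses:
        next_bindings = []
        for binding in bindings:
            for row in rows:
                # A row matches if it agrees with every variable that
                # is already bound; unbound variables match anything.
                if all(
                    binding.get(var, value) == value
                    for var, value in zip(variables, row)
                ):
                    next_bindings.append({**binding, **dict(zip(variables, row))})
        bindings = next_bindings
        sizes.append(len(bindings))
    return bindings, sizes


# Hypothetical data: 1000 residents, 2 of whom are VIPs.
lives_in = [(f"user{i}", f"city{i % 10}") for i in range(1000)]
vip = [("user7",), ("user42",)]

# Same query, two clause orders.
selective_first, sizes_a = run_in_order([(vip, ("p",)), (lives_in, ("p", "c"))])
selective_last, sizes_b = run_in_order([(lives_in, ("p", "c")), (vip, ("p",))])
```

| Both orders return the same two rows, but the first carries 2
| intermediate bindings into the second clause while the other
| carries 1000. Rearranging clauses by hand is about shrinking
| those intermediate sets early.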
| account-5 wrote:
| For the idiot in the thread, why would I use datalog (which I've
| never heard of before) over SQL?
|
| Having looked quickly at it just now it seems (Wikipedia article)
| similar to Web Ontology Language (OWL), though I believe datalog
| may have been around long before owl.
| brendanyounger wrote:
| On a syntax level, parsing, generating, and templating datalog
| is _much_ simpler than doing the same to SQL. DBT would never
| exist if every SQL database accepted datalog queries and SQL
| injection attacks would be rare to non-existent.
|
| The more interesting answer is to think of datalog as making it
| easy to encode nearly all of your application logic as a bunch
| of self-referencing, incrementally updated, materialized views.
| Some examples:
|
|     # view of Users table for currently logged in user
|     LoggedInUserView(name, email, id) :-
|       Users(id: payload["userId"], name, email),
|       Cookies(name: "login", payload).
|
|     # view of Users for admin
|     AdminUserView(name, email, id) :-
|       Users(id, name, email),
|       Cookies(name: "login", payload),
|       payload["isAdmin"] = true.
|
|     # posts a user can see
|     PostsView(title, content, id) :-
|       Posts(title, content, public: true).
|     PostsView(title, content, id) :-
|       Posts(title, content, author: payload["userId"]),
|       Cookies(name: "login", payload).
|
| And then you write your UI code to explicitly reference these
| derived views rather than manually wrapping an API around
| querying the Posts table and doing the filtering.
|
| The examples above can be neatly replicated in Supabase or
| Postgraphile (the OG of auto-generated GraphQL over Postgres),
| but you can do a lot more with datalog as a language. The
| Hellerstein paper mentioned above is a good starting place.
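|
| As a rough Python rendering of the PostsView idea, the two
| rules become two conditions whose results are unioned. This is
| a plain recomputation, not the incrementally updated view the
| comment has in mind, and it assumes a Posts table with `id`,
| `public` and `author` columns plus a `cookies` mapping; all of
| that sample data is hypothetical.

```python
# Hypothetical base tables.
posts = [
    {"id": 1, "title": "Hello", "content": "hi all", "public": True, "author": 10},
    {"id": 2, "title": "Draft", "content": "wip", "public": False, "author": 10},
    {"id": 3, "title": "Secret", "content": "shh", "public": False, "author": 20},
    {"id": 4, "title": "News", "content": "update", "public": True, "author": 20},
]


def posts_view(posts, cookies):
    """Union of the two PostsView rules, keyed by id to avoid duplicates."""
    payload = cookies.get("login")  # None when nobody is logged in
    visible = {}
    for post in posts:
        # Rule 1: public posts are visible to everyone.
        if post["public"]:
            visible[post["id"]] = post
        # Rule 2: a logged-in user also sees their own posts.
        if payload is not None and post["author"] == payload["userId"]:
            visible[post["id"]] = post
    return [(p["title"], p["content"], p["id"]) for p in visible.values()]


logged_in = posts_view(posts, {"login": {"userId": 10, "isAdmin": False}})
anonymous = posts_view(posts, {})
```

| UI code would then read from `posts_view` alone and never
| touch the Posts table or repeat the visibility filter.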
| refset wrote:
| Datalog can be very effective for expressing certain kinds of
| problems and for generating efficient solutions to those
| problems. Particularly anything that is even mildly recursive,
| and therefore especially "knowledge graphs" that rely heavily
| on rules to infer, model and retrieve information. However if
| your problem domain amounts to CRUD storage without a need for
| complex recursion then mature SQL systems usually have all the
| advantages (aside from the syntax!). For a more formal answer:
|
| > The intersection of databases, logic, and artificial
| intelligence gave raise to deductive databases. Deductive
| database systems are database management systems built around a
| logical model of data, and their query languages allow
| expressing logical queries. A deductive database system
| includes procedures for defining deductive rules which can
| infer information (in the so-called intensional database) in
| addition to the facts loaded in the (so-called extensional)
| database. The logic model for deductive databases is closely
| related to the relational model and, in particular, with the
| domain relational calculus. Datalog is the most known deductive
| query language (which syntactically is a Prolog subset) where
| constructed terms are not allowed as other non-declarative
| constructs such as the cut.
|
| > Also following the relational model, relational database
| systems are well-known and widespread nowadays. Their formal
| query languages include relational algebra and relational
| calculi but, in practical systems, the de-facto and ANSI/ISO
| standard SQL is the language of choice of every relational
| database vendor. Whilst SQL and relational formal languages
| implement a limited form of logic, deductive database languages
| implement advanced forms of logic.
|
| https://www.fdi.ucm.es/profesor/fernan/des/html/manual/manua...
___________________________________________________________________
(page generated 2024-01-21 23:00 UTC)