[HN Gopher] SQL, Homomorphisms and Constraint Satisfaction Problems
___________________________________________________________________
SQL, Homomorphisms and Constraint Satisfaction Problems
Author : xlinux
Score : 99 points
Date : 2024-11-20 17:07 UTC (5 hours ago)
(HTM) web link (www.philipzucker.com)
(TXT) w3m dump (www.philipzucker.com)
| babel_ wrote:
| For anyone curious: the performance difference between Clang and
| GCC on the example C solution for verbal arithmetic comes down to
| Clang's auto-vectorisation (deducing SIMD) whilst GCC here sticks
| with scalar, which is why the counter brings Clang closer in line
| to GCC (https://godbolt.org/z/xfdxGvMYP), and it's actually a
| pretty nice example of auto-vectorisation (and its limitations)
| in action, which is a fun tangent from this article (given its
| relevance to high-performance SMT/SAT solving for CSP)
| mbid wrote:
| The post mentions the idea that querying a database D can be
| understood algebraically as enumerating all morphisms Q -> D,
| where Q is the "classifying" database of the query, i.e. a
| minimal database instance that admits a single "generic" result
| of the query. You can use this to give a neat formulation of
| Datalog evaluation. A Datalog rule then corresponds a morphism P
| -> H, where P is the classifying database instance of the rule
| body and H is the classifying database instance for matches of
| both body and head. For example, for the the transitivity rule
| edge(x, z) :- edge(x, y), edge(y, z).
|
| you'd take for P the database instance containing two rows (a_1,
| a_2) and (a_2, a_3), and the database instance H contains
| additionally (a_1, a_3). Now saying that a Database D satisfies
| this rule means that every morphism P -> D (i.e., every match of
| the premise of the rule) can be completed to a commuting diagram
| P --> D | ^ | / [?] / Q
|
| where the additional map is the arrow Q -> D, which corresponds
| to a match of both body and head.
|
| This kind of phenomenon is known in category theory as a "lifting
| property", and there's rich theory around it. For example, you
| can show in great generality that there's always a "free" way to
| add data to a database D so that it satisfies the lifting
| property (the orthogonal reflection construction/the small object
| argument). Those are the theoretical underpinnings of the Datalog
| engine I'm sometimes working on [1], and there they allow you to
| prove that Datalog evaluation is also well-defined if you allow
| adjoining new elements during evaluation in a controlled way. I
| believe the author of this post is involved in the egglog project
| [2], which might have similar features as well.
|
| [1] https://github.com/eqlog/eqlog [2]
| https://github.com/egraphs-good/egglog
| lovasoa wrote:
| The topic of huge queries on tiny databases makes me think of
| this recent discussion on the SQLite forum:
| https://sqlite.org/forum/forumpost/0d18320369
|
| Someone had an issue because SQLite failed to optimize the
| following query select * from t where x = 'x'
| or '' = 'x'
|
| Someone said that SQLite could not optimize out the "or '' = 'x'"
| because it would be too expensive to compute. Which is obviously
| true only for huge queries on tiny datasets.
| hinkley wrote:
| Why would it be too expensive to optimize out static
| subexpressions?
| jjice wrote:
| My guess is that the expense can be tricky to calculate since
| the additional optimization prior to executing the query may
| take longer than if the query was just able to run (depending
| on the dataset, of course). I wonder if it's too expensive to
| calculate a heuristic as well, so it just allows it to
| execute.
|
| Just a guess.
| recursive wrote:
| It's not obviously true at all. Optimizing out `'' = 'x'` can
| be done for a fixed cost regardless of record count.
| lovasoa wrote:
| Optimizing out static expressions can be done in linear time
| at best. So if the number of clauses in WHERE is huge and the
| size of the underlying table is tiny (such as in the examples
| shown in the article we are commenting on), it will be better
| not to run the optimization.
|
| But of course, in normal life, outside of the world of people
| having fun with Homomorphisms, queries are much smaller than
| databases.
| recursive wrote:
| Parsing the expression in the first place is already linear
| time.
| jiggawatts wrote:
| > SQLite
|
| Well... there's your problem. SQLite is _not_ a general-purpose
| RDBMS, it is marketed as a replacement for "fopen()", a
| purpose for which it excels.
|
| A similar product is the Microsoft Jet database engine, used in
| products such as Microsoft Exchange and Active Directory.
| Queries have to be more-or-less manually optimised by the
| developer, but they run faster and more consistently than they
| would with a general-purpose query engine designed for ad-hoc
| queries.
___________________________________________________________________
(page generated 2024-11-20 23:00 UTC)