[HN Gopher] Semantic Diff for SQL
___________________________________________________________________
Semantic Diff for SQL
Author : s0ck_r4w
Score : 154 points
Date : 2022-07-29 15:14 UTC (7 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| tessierashpool wrote:
| this is very cool, but I believe this bit of the README is
| incorrect:
|
| _Text-based diff tools such as git diff, when applied to a code
| base, have certain limitations. First, they can only detect
| insertions and deletions, not movements or updates of individual
| pieces of code._
|
| git diff can detect movements. looking at my .gitconfig, I think
| it's the "frag = magenta" line.
| Something1234 wrote:
| Post it or link it so we can better understand!
| chrisjc wrote:
| Very nice/interesting.
|
| Somewhat related question and apologies if this is already stated
| in the documentation (it's rather dense and I haven't had the
| time to read through it completely)...
|
| Can you use sqlglot to create custom DDL dialects that have
| custom first class objects? For instance, if I want to build a
| custom SQL/DDL/DML dialect that had a new kind of object such as
| a "pipe", "kitchensink", etc, would sqlglot be a good tool to
| use?
|
| I've tried playing around with Apache Calcite, but it lost me
| pretty quickly since the examples to customize/extend DDLs were
| quite lacking in my opinion.
| captaintobs wrote:
| Yes. SQLGlot is very customizable and you can pretty much
| override everything. It kind of needs to be flexible because
| even common sql dialects vary greatly.
|
| Here's an example of how we use SQLGlot to output raw Python
| code directly from SQL.
|
| https://github.com/tobymao/sqlglot/blob/main/sqlglot/executo...
| chrisjc wrote:
| Very nice, and exactly along the lines of what I was
| thinking... I want to be able to create a custom SQL dialect
| that can output some code.
|
| Thanks very much, looking forward to spending some time
| reading through all of this!
| AeroNotix wrote:
| What about difftastic?
| s0ck_r4w wrote:
| Difftastic seems like a really cool tool. There are a few
| reasons, however, why it doesn't apply well to use cases I had
| in mind:
|
| 1. It's in JS and not Python. These days a common data
| (including data tooling) stack revolves around Python and
| fitting JS into this ecosystem is not straightforwad.
|
| 2. Limited dialect support. As far as I can see it only
| supports "PostgreSQL flavor" (not sure what exactly is meant by
| "flavor" here). Support for dialects like Spark, Trino, Hive,
| etc SQL was crucial.
|
| Definitely a worthy mention, though, thank you!
| leeoniya wrote:
| > It's in JS and not Python
|
| it's in Rust:
|
| https://github.com/Wilfred/difftastic
|
| what am i missing?
| rmccue wrote:
| Difftastic's wiki also has a breakdown of some structured
| diff algorithms:
| https://github.com/Wilfred/difftastic/wiki/Structural-Diffs
|
| (I've been working on a similar problem, effectively diffing
| an XML tree.)
| difflens wrote:
| Interesting, will give sqlglot a look when we get to adding SQL
| support in DiffLens [https://github.com/marketplace/difflens]. Or
| perhaps DiffLens can just use sqlglot :) Either way we're very
| happy to see another semantic diff tool.
|
| P.S: We work on DiffLens. It currently supports TS, JS, CSS and
| text diffs. We're working on making a VS Code extension currently
| stochtastic wrote:
| I've been very impressed with sqlglot, and am looking forward to
| trying this feature. The only issue I've had with sqlglot is
| transpiling for use with a specific spark version: in my
| experience Spark is not great about surfacing obvious 'not
| registered' errors when a function isn't supported (especially in
| >=2.4). I ran into this with width_bucket, which is only in the
| most recent release. I am curious whether there's a
| straightforward way to write with a specific release and catch
| the error in transpilation rather than execution.
|
| Side note: Iaroslav (post author) and Toby (sqlglot creator) are
| both amazing, and I'm so glad that they're working on open source
| projects like this.
| captaintobs wrote:
| It's easy to add errors for dialects by calling #unsupported
| when a function is used.
|
| In terms of versioning of engines, I haven't implemented that
| yet, but presumably it could be done by adding a dialect
| subclass and having versioning route to it, so we could do
| something like parse(sql, dialect="spark", version=...) which
| could then route to a 2.3 version of spark.
|
| Happy to chat more about this and we can see about adding it
| (or feel free to make a pr). You can DM me on twitter or some
| other avenue as well if you want to dive in deep.
| nerpderp82 wrote:
| https://github.com/tobymao/sqlglot
| karmakaze wrote:
| I thought this was going to be something else like being able to
| tell that a rewritten query returns the same set of rows, but
| with potentially a very different query plan. E.g. dependent
| EXISTS subquery vs IN subquery.
| captaintobs wrote:
| This is what the sqlglot optimizer is used for. The optimizer
| converts EXISTS and IN into a canonicalized SQL (some variant
| of left join) which can then be compared to another query.
|
| So if you run the optimizer first and then the diff tool, it
| could solve this kind of use case.
| gavinray wrote:
| This exists for Postgres, let me try to find the name of the
| tool
|
| EDIT: I can't find it, I searched for 30 mins, I promise this
| exists though. If anyone else can remember the name of it,
| please post.
| [deleted]
___________________________________________________________________
(page generated 2022-07-29 23:00 UTC)