[HN Gopher] Pql, a pipelined query language that compiles to SQL...
___________________________________________________________________
Pql, a pipelined query language that compiles to SQL (written in
Go)
Author : ejcx
Score : 136 points
Date : 2024-02-28 15:33 UTC (7 hours ago)
(HTM) web link (pql.dev)
(TXT) w3m dump (pql.dev)
| RedShift1 wrote:
| InfluxDB tried to do this with InfluxQL but abandoned it, and are
| now back to SQL. The biggest problem I had with it when I tried
| it, was that is was simply too slow, queries were on average 6x
| slower than their SQL equivalents. I think a language like this
| is just too hard to optimize well.
| loic-sharma wrote:
| This is incorrect. It was their query engine that was hard to
| optimize, not the language. InfluxDB has been working on a new
| query engine based off Apache DataFusion to fix this.
|
| If you squint, this query language is very similar to Polars,
| which is state-of-the-art for performance. I expect Pql could
| be as performant with sufficient investment.
|
| The real problem is that creating a new query language is a
| _ton_ of work. You need to create tooling, language servers,
| integrate with notebooks, etc... If you use SQL you get all of
| this for free.
| crooked-v wrote:
| What's the reason to go for this over PRQL?
| zhiboz wrote:
| I had the exact same question when I saw the post.
| IshKebab wrote:
| Looks like PRQL doesn't have a Go library so I guess they just
| really wanted something in Go?
|
| I would guess they didn't wrap the main PRQL library (which is
| written in Rust) because Go code is a lot easier to deal with
| when it's pure Go. And they probably didn't just write a Go
| version of PRQL because that would be a mountain of work.
|
| Still I think that's a mistake. PRQL is a far more mature
| project and has things like IDE support and an online
| playground which they are never going to do...
|
| Better just to bite the bullet and wrap the Rust library.
| tstack wrote:
| > Looks like PRQL doesn't have a Go library so I guess they
| just really wanted something in Go?
|
| There's some C bindings and the example in the README shows
| integration with Go:
|
| https://github.com/PRQL/prql/tree/main/prqlc/bindings/prqlc-.
| ..
| IshKebab wrote:
| Ah yeah I wonder why they don't mention that in their docs.
| pkolaczk wrote:
| It's not the problem with binding to Rust. Python can do it,
| Zig can do it, C can do it, even JS can do. It is Go which
| doesn't integrate well with anything that isn't Go.
| randomdata wrote:
| Where did you dream up that idea? The PRQL project
| integrates with Go just fine. The project even provides an
| example in Go.
| caust1c wrote:
| The simple answer is that it's too distinct from where we're
| trying to meet our users.
|
| We're not anti-PRQL, but our users (folks in security) coming
| from Kusto, Splunk, SumoLogic, LogScale, and others have
| expressed that they have a preference for this syntax over PRQL
| syntax.
|
| I wouldn't be surprised if we end up supporting both and
| letting folks choose the one they're most happy with using.
| memset wrote:
| The big advantage of this project in my opinion is that it will
| pass functions it doesn't recognize to the underlying DB.
|
| With prql, if they don't support your favorite operator then
| you're out of luck.
| beeskip wrote:
| https://prql-lang.org/book/reference/syntax/s-strings.html
| memset wrote:
| Did not know about that!
| anotherhue wrote:
| Similarities to PRQL: https://prql-lang.org/
| danpalmer wrote:
| The "Why?" questions are getting downvoted, but to dissect the
| why section from the page a little...
|
| "designed to be small and efficient" - adding a layer on top of
| SQL is necessarily less efficient, adding a layer of indirection
| on underlying optimisations means it is likely (but not
| guaranteed) to also generate less efficient queries.
|
| "make developing queries simple" - this seems to be just
| syntactic preference. The examples are certainly shorter, but in
| part that's the SQL style used.
|
| I think it either needs more evidence that the syntax is actually
| better or cases it simplifies, the ways in which it can optimise
| that are hard in SQL, or it perhaps needs to be more honest about
| the intent of the project being to just be a different interface
| for those who prefer it that way.
|
| It's an interesting exercise, and I'm glad it exists in that
| respect, and hope the author enjoyed making it and learnt
| something from it. That can be enough of a why!
| ejcx wrote:
| The main goal was to help security engineers / analysts, who
| _loathe_ sql (for better or worse).
|
| I tend to think this is a little more user friendly,
| personally, and it's nice to give some open-source competition
| to the major languages that are used in security (SPL,
| Sumologic, KQL, and ES|QL).
|
| We were surprised that there weren't syntactic competitiors
| (i.e. -- while prql has some similar goals, the syntax and
| audience in mind were very different)
| throwanem wrote:
| How's perf for the compiled queries? The first thing I see in
| the examples is what appears to be a CTE-by-default approach
| that, in most (all?) engines, means the generated query
| ultimately runs over an unindexed (and maybe materialized!)
| intermediary resultset.
| fwip wrote:
| Higher level languages often have opportunities for additional
| optimisation over the straightforward implementation in the
| target language. This is because the semantics of the high-
| level language offer guarantees that the target doesn't, or
| because the intent is more clearly preserved in the source
| language.
|
| Whether or not this is true for PQL/SQL, I don't know enough to
| say. But I do know that I don't write SQL at a high-enough
| level to be sure that a wrapper couldn't compile to something
| more efficient than what I produce, especially for complicated
| queries.
| crooked-v wrote:
| SQL also has a lot of edge cases where, because half the
| syntax got bolted on after the language was initially
| finalized, a conceptually simple request can turn into a
| multi-level mess of a query.
| munk-a wrote:
| The join syntax in particular seems really clunky compared to
| what it could have been - it's unclear if they support joining
| on different column names (i.e. fooid = foo.id or even fooid =
| parentid) which would be really restricting if unsupported.
| It'd also be nice if they used the USING/ON distinction that
| SQL has instead of just supporting USING but calling it ON
| since that's a weird thing to mentally translate.
|
| Pipe-driven SQLes have been interesting in the past but I much
| prefer when you start with a big-blob of joins to define the
| data source before getting into the applied operations - the
| SQL PQL produces looks like it would have really large issues
| with performance due to forcing certain join orders and
| limiting how much the query planner can rearrange operations.
| gtroja wrote:
| The only thing I like in SQL is that is almost the same language
| in decades. Learn it once and you're done. If you really need,
| you could write macros yourself. I don't see the value of
| learning a new language to do the same thing
| quaunaut wrote:
| SQL's nonsensical handling of null is reason enough to learn
| other query languages.
| FridgeSeal wrote:
| Using this language on top won't solve that though, it still
| compiles into sql, warts and all.
| munk-a wrote:
| Do you mean that NULL <> NULL and NULL infects boolean logic?
|
| NULL is always an awkward thing to deal with - how you want
| to handle it depends on the specific thing you're trying to
| accomplish. I'd probably prefer it if NULL equaled NULL when
| dealing with where conditions but it actually makes join
| evaluations a lot cleaner - if NULL equaled NULL then joining
| on columns with nulls would get really weird.
|
| At the end of the day IS NULL and IS DISTINCT FROM/IS NOT
| DISTINCT FROM exist so you can handle cases where it'd be
| weird.
| nextaccountic wrote:
| the best way to handle nulls is with Option / Maybe types.
| that is, without null at all
|
| unfortunately they were not invented at the time sql was
| created
| munk-a wrote:
| I think that's just a question on syntactic sugaring here
| - so, concretely, what would that mean for comparison
| operators? If I wanted to `id = id` and both were
| nullable would I need to express that as two layers of
| statements where I tried to unwrap both sides first or
| would we have a maybe vs maybe comparison operator - if
| we had such an operator what would it do in this case?
| setr wrote:
| You'd have to unwrap, usually with convenient mechanism
| for it --pattern matching
|
| The problem is that id = id is fundamentally incorrect
| for a nullable column. You should have done id is not
| null and id = id. And you shouldn't have been allowed to
| do the first anyways, because nothing good can come of it
| (there is no sane semantics to stuffing a trinary logic
| into a boolean algebra, and SQL chooses one of the many
| insane options, leading to both false positive and false
| negative matches depending.) the only correct answer is
| not to do that.
| munk-a wrote:
| I can understand how that would make things more explicit
| - but given how common nullability is in the realm of
| databases I think it'd be a bad idea to force that much
| syntax in a situation where id=id is essentially what we
| want to discuss - especially since columns becoming
| potentially null is so trivial to achieve (even when
| impossible due to model constraints) as soon as you start
| throwing left joins in the mix.
| layer8 wrote:
| Pql uses the same NUll handling, since it just passes it
| through to the generated SQL.
| dvdkon wrote:
| Given how many new programming languages have become widely
| used in the time since SQL's creation, it seems many others
| _do_ se some value. For a mostly direct parallel to PQL, see
| CoffeeScript and TypeScript.
| pjerem wrote:
| > The only thing I like in SQL is that is almost the same
| language in decades
|
| So you do agree that the rest of SQL is broken. That's why
| there is a value in creating (and learning) such new languages.
| kubota wrote:
| I strongly disagree that SQL is broken. Try using JPA or your
| favorite ORM on a large project with a complex entity graph.
| It is a nightmare having to ctrl-click deep into the entity
| graph to see what annotations are on some great-grandchildren
| to get an idea as to what query is being executed. When
| working with relational data SQL has always been the ideal
| tool for the job on the projects I've worked on.
| setr wrote:
| ORMs are fundamentally broken. Hiding the relational
| structure of a relational database is the act of madmen.
|
| Using the psychosis of ORMs to defend the psychosis of SQL
| is itself a form of psychosis
| darcien wrote:
| This is actually pretty awesome! I use KQL every few days for
| reading some logs from Azure App Insight. The syntax is pretty
| nice and you can make pretty complex stuff out of it. But that's
| it, I can't use KQL anywhere else outside Azure. With this, I can
| show off my KQL-fu to my teammates and surprise them with how
| fast you can write KQL-like syntax compared to SQL.
| alfalfasprout wrote:
| At this stage I feel that the natural evolution for SQL is
| instead to use english to describe what you want and have an LLM
| generate SQL. Often with comments.
|
| For some reason, a lot of these SQL alternatives seem to be
| syntactic preference and not much simpler or clearer than the
| original.
| caust1c wrote:
| We built LLM-to-SQL before this at RunReveal, and while it's
| useful and gets queries mostly correct 80% of the time, 20% of
| the time it's way off or requires nontrivial manual
| intervention.
|
| We're still fairly bullish on the LLM-to-SQL front though, but
| in the meantime PQL is a good bridge.
| munk-a wrote:
| As a company that's invested into this. Would you mind
| talking as to why you don't want to use raw SQL - are there
| particular deficiencies you've found in it?
| jodrellblank wrote:
| Here is a long blogpost "against SQL" which lists many
| deficiencies of it: https://www.scattered-
| thoughts.net/writing/against-sql/
|
| In short, it has a longer spec than famously-complex C++
| while making a much less expressive language out of it.
| ejcx wrote:
| We do use raw sql, but we're a security business which
| tends to have heavy reliance on other languages that have a
| similar syntax to pql
| __mharrison__ wrote:
| See tools like pandas and Polars. These database libraries
| are abstractions that give you a spray of SQL
| functionality. I prefer using these libraries because it
| feels much more intuitive (and works with the Python/arrow
| ecosystem). (I'm also biased since I make a portion of my
| living off of pandas training material.)
| dhosek wrote:
| SQL is part of a generation of languages which attempted to
| have (within strict syntactical constraints), a natural
| language-sounding approach to begin with (another one that
| comes to mind is AppleScript).
| munk-a wrote:
| Does this really appeal to developers? SQL is an extremely
| expressive language and while there are things in it I would
| change they are mostly minor points. Writing a query in english
| where the actual SQL it translates to is dependent on the
| specific version of the LLM seems dangerous - and forcing the
| query expression to be in a language that not everyone speaks
| either as a first language or at all seems to make code less
| accessible.
|
| When properly styled with good indentation and syntax habits
| SQL is extremely readable.
| alfalfasprout wrote:
| It's good for the case where you want a somewhat complex
| analytical query generated on the spot.
|
| But this is only for initial generation. After that, you
| should be using pure SQL.
| rgmerk wrote:
| A thought experiment:
|
| Rather than an LLM, you can send your request for an SQL query
| directly to Donald D. Chamberlin, one of the original designers
| of SQL. Furthermore, he gets an ERD for your database.
|
| What odds you get back a query that gives you correct answers?
| importantbrian wrote:
| We might eventually get a good LLM to SQL tool, but my
| experience with them is that they make slick sales demos, but
| are worse than useless in the real world.
|
| You have to know SQL to use them. They produce a lot of code
| that looks correct and produces correct-looking results with
| subtle errors. So you can't just hand it to someone who doesn't
| know SQL and let them query the database, but that's the use
| case where something like this would be valuable. You have to
| be experienced with SQL and know all the peccadillos of the DB
| you're working with to check the query and output for
| correctness.
|
| For someone like me who is experienced with SQL, I can write
| simple queries just as fast as I can figure out how to prompt
| the LLM to get what I want. Where a tool like this would be
| really helpful is if it could help me write more complex
| queries more quickly. However, it is non-trivial to get the LLM
| to generate complex queries that take into account all the
| idiosyncrasies of your specific data model. So again it ends up
| being much faster for me to just write the query myself and not
| involve the LLM.
|
| Where I think LLMs go wrong with SQL is that to write good SQL
| you have to have a deep knowledge of the underlying data model,
| and the LLMs aren't good at that yet.
| mritchie712 wrote:
| we (https://www.definite.app/) do this.
|
| The SQL generation works well out of the box and works better
| as you update the semantic layer. The semantic layer includes
| things like joins and measures (e.g. aggregate functions) that
| you'd want standard definitions for. For example, you don't
| want an LLM creating a definition for MRR on the fly. All the
| semantic definitions are plain SQL.
|
| quick demo:
| https://www.loom.com/share/a0d3c0e273004d7982b2aed24628ef40?...
| sgammon wrote:
| I wish people would stop naming things Pql, Pkl, or Pickle. There
| are already languages and softwares out there with this name that
| are well adopted, beloved, and specific.
|
| Apple's Pkl project is, in my opinion, the one that should get to
| keep this name. It's the most interesting of all of them and most
| likely to get adopted/used. This Pql and Python Pickle are both
| sort of exotic use cases. Anyway, please realize there are
| already two popular languages named this.
| munk-a wrote:
| Tcl wants its namespace back - it predates all these.
| Additionally there is PECL (again, ancient) but the popularity
| of that has been waning for quite some time with composer being
| the main package management system for PHP now.
| sgammon wrote:
| Tcl and PECL are ancient and don't have the community
| described above. Neither have serious name collisions that I
| know of. Syntax highlighting is going to be hard when you
| name things this way. Documentation is going to be hard to
| find.
|
| I'm just saying: why do this to your users? It's a name. It
| is one of those things you have complete and total creative
| control over.
| kylecazar wrote:
| Maybe it's meant to be pronounced 'pea-qul'
| smurda wrote:
| This is cool. Splunk Search Processing Language (SPL) is a real
| vendor lock-in feature. Once the team has invested time to get
| ramped up on SPL, and it gets integrated in your workflows,
| ripping out Splunk has an even higher switching cost.
| helloericsf wrote:
| Wow, so many query languages, right? Do we really need another
| one? What's the story behind that decision? Cheers.
| brettv2 wrote:
| This is answered on their blog:
|
| https://blog.runreveal.com/introducing-pql/
| helloericsf wrote:
| Cheers, mate! The blog cleared up a chunk of my question and
| the chat here gave me a better grasp of why it's over PRQL.
| vincnetas wrote:
| for anyone using anything more than basic SQL functionality so
| far this looks very limiting. No window functions, no agregate
| filtering, no data type specific functions (ranges).
| sigmonsays wrote:
| why not just use sql?
| brikym wrote:
| Did you even visit the link? pql is much more concise and
| doesn't need a bunch of nested scopes. Better autocompletion
| because the data source is always present before defining the
| output columns.
| pknerd wrote:
| Sorry I might be dumb but why do we need this?
| falserum wrote:
| Because SQL is a nightmare. (Standard is in thousands of pages;
| nobody fully implements it; not actually a single language as
| usually each db has deviations and extensions; nonmodular,
| composability is hard - an afterthought)
|
| And the worst part: nothing better exist; single'ish bad
| language is better than dozens of new shortlived ones that have
| quirks in various other places.
|
| But somebody needs to be idealist and keep trying.
| RoyTyrell wrote:
| That is quite a hyperbolic statement and I have to disagree
| with you. I've used SQL on a variety of databases over 15
| years, and yes while some like MySQL have poorly implemented
| anything other than basic SQL/RDBMS features, most are very
| similar in feature sets. There are vendors-specific
| additional features like how Oracle supports hierarchical
| queries with CONNECT, but you don't have to use them.
|
| - CTEs are very close if not the same across Oracle,
| PostgreSQL, DB2, Hive, Snowflake, and MS SQL Server - I
| believe even Sybase too but it's been a while. - Joins work
| all largely the same even though a couple of those support
| additional join types, especially when you want to join on
| functions that return data sets. - Window functions are
| supported by every major DB with similar or the same syntax
| too. Any differences take 5sec to lookup in documentation.
|
| My only complaint is loading data is highly vendor specific.
| throwanem wrote:
| Most of these problems are pretty trivial to solve by simply
| reading the manual for the database engine you happen to be
| using, which typically will document deviations from the
| standard in serviceable detail.
|
| I'd be more sympathetic to concerns about switching engines
| if I had, at any point in a career now heading for its 25th
| year, ever seen that occur.
|
| Given the frequency with which ORMs are used in greenfield to
| defend against this notional problem, and the many problems
| that using an ORM _always_ inflicts, this may well be the
| costliest form of premature optimization I 've ever seen. It
| certainly can't be outside the top three.
| pknerd wrote:
| many other DSLs transformed to SQL. And now we have LLMs that
| can do it much better way.
| Alifatisk wrote:
| Reminds me of how ActiveRecord works in Rails
| memset wrote:
| This is really great! Maybe I'll incorporate this into my own
| software (scratchdata/scratchdb)
|
| Question: it looks like you wrote the parser by hand. How did you
| decide that that was the right approach? I myself am new to
| parsers and am working on implementing the PostgREST syntax in go
| using PEG to translate to Clickhouse, which is to say, a similar
| mission as this project. Would love to learn how you approached
| this problem!
| brikym wrote:
| It looks a lot like Kusto query language. Here is a kusto query:
| StormEvents | where StartTime between
| (datetime(2007-01-01) .. datetime(2007-12-31)) and
| DamageCrops > 0 | summarize EventCount = count() by
| bin(StartTime, 7d)
|
| edit... yes it indeed was inspired by Kusto as they mention on
| the github Readme https://github.com/runreveal/pql
| nerdponx wrote:
| Is that what AWS CloudWatch Insights uses?
| theragra wrote:
| Azure appinsights logs
| beoberha wrote:
| Wow, so many haters in here...
|
| This is awesome! I work at Microsoft and use Kusto daily. I would
| love to see a pipelined query language become mainstream and
| replace SQL for the simple querying that most people have to do.
| loic-sharma wrote:
| Agreed! Azure's best product isn't CosmosDB, AKS, or their AI
| cognitive services - it's Kusto! Sadly no one knows about it.
|
| If you're interested, you can try it here:
| https://learn.microsoft.com/en-us/azure/data-explorer/kusto/...
| nitrix wrote:
| Pipelining is cool, though this could've easily just been a
| library with nice chaining and combinators in your language of
| choice (seems to be Go here).
| mcphage wrote:
| Their very first example has issues:
|
| > users > | where like(email, 'gmail') > | count
|
| becomes
|
| > WITH > "__subquery0" AS ( > SELECT > * > FROM > "users" > WHERE
| > like ("email", 'gmail') > ) > SELECT > COUNT(*) AS "count()" >
| FROM > "__subquery0";
|
| Fetching everything from the users table can be a ton slower than
| just running a count on that table, if the table is indexed on
| email. I had to deal with that very problem this week.
| kccqzy wrote:
| Should not be a problem on modern Postgres.
| paulddraper wrote:
| That is the same query plan for any contemporary query planner.
|
| (Just like any C compiler will produce the same output for `x
| += 2` and `x += 1 + 1`.)
|
| ---
|
| A notable exception was PostgreSQL prior to version 12, which
| treated CTEs as an optimization fences.
| throwanem wrote:
| I'd be hesitant to assume the generated CTEs are always going
| to be amenable to optimization. The examples on the linked
| page are pretty trivial queries - I wonder what happens when
| that ceases to be the case, as seems very likely with a tool
| that apparently doesn't do a great deal to promote
| understanding what goes on under the hood.
| dkga wrote:
| The R {dbplyr} package is also a very good way in practice to
| pipe SQL.
| civilized wrote:
| Also AFAICT the only one that is fully integrated with a
| general purpose programming language with an excellent IDE.
| mavam wrote:
| We're developing TQL (Tenzir Query Language, "tea-quel") that is
| very similar to PQL: https://docs.tenzir.com/pipelines
|
| Also a pipeline language, PRQL-inspired, but differing in that
| (i) TQL supports multiple data types between operators, both
| unstructured blocks of bytes and structured data frames as Arrow
| record batches, (ii) TQL is multi-schema, i.e., a single pipeline
| can have different "tables", as if you're processing semi-
| structured JSON, and (iii) TQL has support for batch and stream
| processing, with a light-weight indexed storage layer on top of
| Parquet/Feather files for historical workloads and a streaming
| executor. We're in the middle of getting TQL v2 [@] out of the
| door with support for expressions and more advanced control flow,
| e.g., match-case statements. There's a blog post [#] about the
| core design of the engine as well.
|
| While it's a general-purpose ETL tool, we're targeting primary
| operational security use case where people today use Splunk,
| Sentinel/ADX, Elastic, etc. So some operators are very
| security'ish, like Sigma, YARA, or Velociraptor.
|
| Comparison: users | where eventTime >
| minus(now(), toIntervalDay(1)) | project user_id,
| user_email
|
| vs TQL: export where eventTime > now()
| - 1d select user_id, user_email
|
| [@]
| https://github.com/tenzir/tenzir/blob/64ef997d736e9416e859bf...
|
| [#] https://docs.tenzir.com/blog/five-design-principles-for-
| buil...
| ben_jones wrote:
| I'm glad this exists but would caution extensibility as the most
| important thing for devs to consider when picking there "ORM"
| stack especially in terse Golang.
|
| For that I use squirrel which uses the builder pattern to compose
| sql strings. Keeping it as strings and interfaces allow it to be
| very easily extended, for example I was able to customize it to
| speak SOQL (salesforce). plenty of downide though.
___________________________________________________________________
(page generated 2024-02-28 23:00 UTC)