[HN Gopher] Pql, a pipelined query language that compiles to SQL...
       ___________________________________________________________________
        
       Pql, a pipelined query language that compiles to SQL (written in
       Go)
        
       Author : ejcx
       Score  : 136 points
       Date   : 2024-02-28 15:33 UTC (7 hours ago)
        
 (HTM) web link (pql.dev)
 (TXT) w3m dump (pql.dev)
        
       | RedShift1 wrote:
       | InfluxDB tried to do this with InfluxQL but abandoned it, and are
       | now back to SQL. The biggest problem I had with it when I tried
       | it, was that is was simply too slow, queries were on average 6x
       | slower than their SQL equivalents. I think a language like this
       | is just too hard to optimize well.
        
         | loic-sharma wrote:
         | This is incorrect. It was their query engine that was hard to
         | optimize, not the language. InfluxDB has been working on a new
         | query engine based off Apache DataFusion to fix this.
         | 
         | If you squint, this query language is very similar to Polars,
         | which is state-of-the-art for performance. I expect Pql could
         | be as performant with sufficient investment.
         | 
         | The real problem is that creating a new query language is a
         | _ton_ of work. You need to create tooling, language servers,
         | integrate with notebooks, etc... If you use SQL you get all of
         | this for free.
        
       | crooked-v wrote:
       | What's the reason to go for this over PRQL?
        
         | zhiboz wrote:
         | I had the exact same question when I saw the post.
        
         | IshKebab wrote:
         | Looks like PRQL doesn't have a Go library so I guess they just
         | really wanted something in Go?
         | 
         | I would guess they didn't wrap the main PRQL library (which is
         | written in Rust) because Go code is a lot easier to deal with
         | when it's pure Go. And they probably didn't just write a Go
         | version of PRQL because that would be a mountain of work.
         | 
         | Still I think that's a mistake. PRQL is a far more mature
         | project and has things like IDE support and an online
         | playground which they are never going to do...
         | 
         | Better just to bite the bullet and wrap the Rust library.
        
           | tstack wrote:
           | > Looks like PRQL doesn't have a Go library so I guess they
           | just really wanted something in Go?
           | 
           | There's some C bindings and the example in the README shows
           | integration with Go:
           | 
           | https://github.com/PRQL/prql/tree/main/prqlc/bindings/prqlc-.
           | ..
        
             | IshKebab wrote:
             | Ah yeah I wonder why they don't mention that in their docs.
        
           | pkolaczk wrote:
           | It's not the problem with binding to Rust. Python can do it,
           | Zig can do it, C can do it, even JS can do. It is Go which
           | doesn't integrate well with anything that isn't Go.
        
             | randomdata wrote:
             | Where did you dream up that idea? The PRQL project
             | integrates with Go just fine. The project even provides an
             | example in Go.
        
         | caust1c wrote:
         | The simple answer is that it's too distinct from where we're
         | trying to meet our users.
         | 
         | We're not anti-PRQL, but our users (folks in security) coming
         | from Kusto, Splunk, SumoLogic, LogScale, and others have
         | expressed that they have a preference for this syntax over PRQL
         | syntax.
         | 
         | I wouldn't be surprised if we end up supporting both and
         | letting folks choose the one they're most happy with using.
        
         | memset wrote:
         | The big advantage of this project in my opinion is that it will
         | pass functions it doesn't recognize to the underlying DB.
         | 
         | With prql, if they don't support your favorite operator then
         | you're out of luck.
        
           | beeskip wrote:
           | https://prql-lang.org/book/reference/syntax/s-strings.html
        
             | memset wrote:
             | Did not know about that!
        
       | anotherhue wrote:
       | Similarities to PRQL: https://prql-lang.org/
        
       | danpalmer wrote:
       | The "Why?" questions are getting downvoted, but to dissect the
       | why section from the page a little...
       | 
       | "designed to be small and efficient" - adding a layer on top of
       | SQL is necessarily less efficient, adding a layer of indirection
       | on underlying optimisations means it is likely (but not
       | guaranteed) to also generate less efficient queries.
       | 
       | "make developing queries simple" - this seems to be just
       | syntactic preference. The examples are certainly shorter, but in
       | part that's the SQL style used.
       | 
       | I think it either needs more evidence that the syntax is actually
       | better or cases it simplifies, the ways in which it can optimise
       | that are hard in SQL, or it perhaps needs to be more honest about
       | the intent of the project being to just be a different interface
       | for those who prefer it that way.
       | 
       | It's an interesting exercise, and I'm glad it exists in that
       | respect, and hope the author enjoyed making it and learnt
       | something from it. That can be enough of a why!
        
         | ejcx wrote:
         | The main goal was to help security engineers / analysts, who
         | _loathe_ sql (for better or worse).
         | 
         | I tend to think this is a little more user friendly,
         | personally, and it's nice to give some open-source competition
         | to the major languages that are used in security (SPL,
         | Sumologic, KQL, and ES|QL).
         | 
         | We were surprised that there weren't syntactic competitiors
         | (i.e. -- while prql has some similar goals, the syntax and
         | audience in mind were very different)
        
           | throwanem wrote:
           | How's perf for the compiled queries? The first thing I see in
           | the examples is what appears to be a CTE-by-default approach
           | that, in most (all?) engines, means the generated query
           | ultimately runs over an unindexed (and maybe materialized!)
           | intermediary resultset.
        
         | fwip wrote:
         | Higher level languages often have opportunities for additional
         | optimisation over the straightforward implementation in the
         | target language. This is because the semantics of the high-
         | level language offer guarantees that the target doesn't, or
         | because the intent is more clearly preserved in the source
         | language.
         | 
         | Whether or not this is true for PQL/SQL, I don't know enough to
         | say. But I do know that I don't write SQL at a high-enough
         | level to be sure that a wrapper couldn't compile to something
         | more efficient than what I produce, especially for complicated
         | queries.
        
           | crooked-v wrote:
           | SQL also has a lot of edge cases where, because half the
           | syntax got bolted on after the language was initially
           | finalized, a conceptually simple request can turn into a
           | multi-level mess of a query.
        
         | munk-a wrote:
         | The join syntax in particular seems really clunky compared to
         | what it could have been - it's unclear if they support joining
         | on different column names (i.e. fooid = foo.id or even fooid =
         | parentid) which would be really restricting if unsupported.
         | It'd also be nice if they used the USING/ON distinction that
         | SQL has instead of just supporting USING but calling it ON
         | since that's a weird thing to mentally translate.
         | 
         | Pipe-driven SQLes have been interesting in the past but I much
         | prefer when you start with a big-blob of joins to define the
         | data source before getting into the applied operations - the
         | SQL PQL produces looks like it would have really large issues
         | with performance due to forcing certain join orders and
         | limiting how much the query planner can rearrange operations.
        
       | gtroja wrote:
       | The only thing I like in SQL is that is almost the same language
       | in decades. Learn it once and you're done. If you really need,
       | you could write macros yourself. I don't see the value of
       | learning a new language to do the same thing
        
         | quaunaut wrote:
         | SQL's nonsensical handling of null is reason enough to learn
         | other query languages.
        
           | FridgeSeal wrote:
           | Using this language on top won't solve that though, it still
           | compiles into sql, warts and all.
        
           | munk-a wrote:
           | Do you mean that NULL <> NULL and NULL infects boolean logic?
           | 
           | NULL is always an awkward thing to deal with - how you want
           | to handle it depends on the specific thing you're trying to
           | accomplish. I'd probably prefer it if NULL equaled NULL when
           | dealing with where conditions but it actually makes join
           | evaluations a lot cleaner - if NULL equaled NULL then joining
           | on columns with nulls would get really weird.
           | 
           | At the end of the day IS NULL and IS DISTINCT FROM/IS NOT
           | DISTINCT FROM exist so you can handle cases where it'd be
           | weird.
        
             | nextaccountic wrote:
             | the best way to handle nulls is with Option / Maybe types.
             | that is, without null at all
             | 
             | unfortunately they were not invented at the time sql was
             | created
        
               | munk-a wrote:
               | I think that's just a question on syntactic sugaring here
               | - so, concretely, what would that mean for comparison
               | operators? If I wanted to `id = id` and both were
               | nullable would I need to express that as two layers of
               | statements where I tried to unwrap both sides first or
               | would we have a maybe vs maybe comparison operator - if
               | we had such an operator what would it do in this case?
        
               | setr wrote:
               | You'd have to unwrap, usually with convenient mechanism
               | for it --pattern matching
               | 
               | The problem is that id = id is fundamentally incorrect
               | for a nullable column. You should have done id is not
               | null and id = id. And you shouldn't have been allowed to
               | do the first anyways, because nothing good can come of it
               | (there is no sane semantics to stuffing a trinary logic
               | into a boolean algebra, and SQL chooses one of the many
               | insane options, leading to both false positive and false
               | negative matches depending.) the only correct answer is
               | not to do that.
        
               | munk-a wrote:
               | I can understand how that would make things more explicit
               | - but given how common nullability is in the realm of
               | databases I think it'd be a bad idea to force that much
               | syntax in a situation where id=id is essentially what we
               | want to discuss - especially since columns becoming
               | potentially null is so trivial to achieve (even when
               | impossible due to model constraints) as soon as you start
               | throwing left joins in the mix.
        
           | layer8 wrote:
           | Pql uses the same NUll handling, since it just passes it
           | through to the generated SQL.
        
         | dvdkon wrote:
         | Given how many new programming languages have become widely
         | used in the time since SQL's creation, it seems many others
         | _do_ se some value. For a mostly direct parallel to PQL, see
         | CoffeeScript and TypeScript.
        
         | pjerem wrote:
         | > The only thing I like in SQL is that is almost the same
         | language in decades
         | 
         | So you do agree that the rest of SQL is broken. That's why
         | there is a value in creating (and learning) such new languages.
        
           | kubota wrote:
           | I strongly disagree that SQL is broken. Try using JPA or your
           | favorite ORM on a large project with a complex entity graph.
           | It is a nightmare having to ctrl-click deep into the entity
           | graph to see what annotations are on some great-grandchildren
           | to get an idea as to what query is being executed. When
           | working with relational data SQL has always been the ideal
           | tool for the job on the projects I've worked on.
        
             | setr wrote:
             | ORMs are fundamentally broken. Hiding the relational
             | structure of a relational database is the act of madmen.
             | 
             | Using the psychosis of ORMs to defend the psychosis of SQL
             | is itself a form of psychosis
        
       | darcien wrote:
       | This is actually pretty awesome! I use KQL every few days for
       | reading some logs from Azure App Insight. The syntax is pretty
       | nice and you can make pretty complex stuff out of it. But that's
       | it, I can't use KQL anywhere else outside Azure. With this, I can
       | show off my KQL-fu to my teammates and surprise them with how
       | fast you can write KQL-like syntax compared to SQL.
        
       | alfalfasprout wrote:
       | At this stage I feel that the natural evolution for SQL is
       | instead to use english to describe what you want and have an LLM
       | generate SQL. Often with comments.
       | 
       | For some reason, a lot of these SQL alternatives seem to be
       | syntactic preference and not much simpler or clearer than the
       | original.
        
         | caust1c wrote:
         | We built LLM-to-SQL before this at RunReveal, and while it's
         | useful and gets queries mostly correct 80% of the time, 20% of
         | the time it's way off or requires nontrivial manual
         | intervention.
         | 
         | We're still fairly bullish on the LLM-to-SQL front though, but
         | in the meantime PQL is a good bridge.
        
           | munk-a wrote:
           | As a company that's invested into this. Would you mind
           | talking as to why you don't want to use raw SQL - are there
           | particular deficiencies you've found in it?
        
             | jodrellblank wrote:
             | Here is a long blogpost "against SQL" which lists many
             | deficiencies of it: https://www.scattered-
             | thoughts.net/writing/against-sql/
             | 
             | In short, it has a longer spec than famously-complex C++
             | while making a much less expressive language out of it.
        
             | ejcx wrote:
             | We do use raw sql, but we're a security business which
             | tends to have heavy reliance on other languages that have a
             | similar syntax to pql
        
             | __mharrison__ wrote:
             | See tools like pandas and Polars. These database libraries
             | are abstractions that give you a spray of SQL
             | functionality. I prefer using these libraries because it
             | feels much more intuitive (and works with the Python/arrow
             | ecosystem). (I'm also biased since I make a portion of my
             | living off of pandas training material.)
        
         | dhosek wrote:
         | SQL is part of a generation of languages which attempted to
         | have (within strict syntactical constraints), a natural
         | language-sounding approach to begin with (another one that
         | comes to mind is AppleScript).
        
         | munk-a wrote:
         | Does this really appeal to developers? SQL is an extremely
         | expressive language and while there are things in it I would
         | change they are mostly minor points. Writing a query in english
         | where the actual SQL it translates to is dependent on the
         | specific version of the LLM seems dangerous - and forcing the
         | query expression to be in a language that not everyone speaks
         | either as a first language or at all seems to make code less
         | accessible.
         | 
         | When properly styled with good indentation and syntax habits
         | SQL is extremely readable.
        
           | alfalfasprout wrote:
           | It's good for the case where you want a somewhat complex
           | analytical query generated on the spot.
           | 
           | But this is only for initial generation. After that, you
           | should be using pure SQL.
        
         | rgmerk wrote:
         | A thought experiment:
         | 
         | Rather than an LLM, you can send your request for an SQL query
         | directly to Donald D. Chamberlin, one of the original designers
         | of SQL. Furthermore, he gets an ERD for your database.
         | 
         | What odds you get back a query that gives you correct answers?
        
         | importantbrian wrote:
         | We might eventually get a good LLM to SQL tool, but my
         | experience with them is that they make slick sales demos, but
         | are worse than useless in the real world.
         | 
         | You have to know SQL to use them. They produce a lot of code
         | that looks correct and produces correct-looking results with
         | subtle errors. So you can't just hand it to someone who doesn't
         | know SQL and let them query the database, but that's the use
         | case where something like this would be valuable. You have to
         | be experienced with SQL and know all the peccadillos of the DB
         | you're working with to check the query and output for
         | correctness.
         | 
         | For someone like me who is experienced with SQL, I can write
         | simple queries just as fast as I can figure out how to prompt
         | the LLM to get what I want. Where a tool like this would be
         | really helpful is if it could help me write more complex
         | queries more quickly. However, it is non-trivial to get the LLM
         | to generate complex queries that take into account all the
         | idiosyncrasies of your specific data model. So again it ends up
         | being much faster for me to just write the query myself and not
         | involve the LLM.
         | 
         | Where I think LLMs go wrong with SQL is that to write good SQL
         | you have to have a deep knowledge of the underlying data model,
         | and the LLMs aren't good at that yet.
        
         | mritchie712 wrote:
         | we (https://www.definite.app/) do this.
         | 
         | The SQL generation works well out of the box and works better
         | as you update the semantic layer. The semantic layer includes
         | things like joins and measures (e.g. aggregate functions) that
         | you'd want standard definitions for. For example, you don't
         | want an LLM creating a definition for MRR on the fly. All the
         | semantic definitions are plain SQL.
         | 
         | quick demo:
         | https://www.loom.com/share/a0d3c0e273004d7982b2aed24628ef40?...
        
       | sgammon wrote:
       | I wish people would stop naming things Pql, Pkl, or Pickle. There
       | are already languages and softwares out there with this name that
       | are well adopted, beloved, and specific.
       | 
       | Apple's Pkl project is, in my opinion, the one that should get to
       | keep this name. It's the most interesting of all of them and most
       | likely to get adopted/used. This Pql and Python Pickle are both
       | sort of exotic use cases. Anyway, please realize there are
       | already two popular languages named this.
        
         | munk-a wrote:
         | Tcl wants its namespace back - it predates all these.
         | Additionally there is PECL (again, ancient) but the popularity
         | of that has been waning for quite some time with composer being
         | the main package management system for PHP now.
        
           | sgammon wrote:
           | Tcl and PECL are ancient and don't have the community
           | described above. Neither have serious name collisions that I
           | know of. Syntax highlighting is going to be hard when you
           | name things this way. Documentation is going to be hard to
           | find.
           | 
           | I'm just saying: why do this to your users? It's a name. It
           | is one of those things you have complete and total creative
           | control over.
        
         | kylecazar wrote:
         | Maybe it's meant to be pronounced 'pea-qul'
        
       | smurda wrote:
       | This is cool. Splunk Search Processing Language (SPL) is a real
       | vendor lock-in feature. Once the team has invested time to get
       | ramped up on SPL, and it gets integrated in your workflows,
       | ripping out Splunk has an even higher switching cost.
        
       | helloericsf wrote:
       | Wow, so many query languages, right? Do we really need another
       | one? What's the story behind that decision? Cheers.
        
         | brettv2 wrote:
         | This is answered on their blog:
         | 
         | https://blog.runreveal.com/introducing-pql/
        
           | helloericsf wrote:
           | Cheers, mate! The blog cleared up a chunk of my question and
           | the chat here gave me a better grasp of why it's over PRQL.
        
       | vincnetas wrote:
       | for anyone using anything more than basic SQL functionality so
       | far this looks very limiting. No window functions, no agregate
       | filtering, no data type specific functions (ranges).
        
       | sigmonsays wrote:
       | why not just use sql?
        
         | brikym wrote:
         | Did you even visit the link? pql is much more concise and
         | doesn't need a bunch of nested scopes. Better autocompletion
         | because the data source is always present before defining the
         | output columns.
        
       | pknerd wrote:
       | Sorry I might be dumb but why do we need this?
        
         | falserum wrote:
         | Because SQL is a nightmare. (Standard is in thousands of pages;
         | nobody fully implements it; not actually a single language as
         | usually each db has deviations and extensions; nonmodular,
         | composability is hard - an afterthought)
         | 
         | And the worst part: nothing better exist; single'ish bad
         | language is better than dozens of new shortlived ones that have
         | quirks in various other places.
         | 
         | But somebody needs to be idealist and keep trying.
        
           | RoyTyrell wrote:
           | That is quite a hyperbolic statement and I have to disagree
           | with you. I've used SQL on a variety of databases over 15
           | years, and yes while some like MySQL have poorly implemented
           | anything other than basic SQL/RDBMS features, most are very
           | similar in feature sets. There are vendors-specific
           | additional features like how Oracle supports hierarchical
           | queries with CONNECT, but you don't have to use them.
           | 
           | - CTEs are very close if not the same across Oracle,
           | PostgreSQL, DB2, Hive, Snowflake, and MS SQL Server - I
           | believe even Sybase too but it's been a while. - Joins work
           | all largely the same even though a couple of those support
           | additional join types, especially when you want to join on
           | functions that return data sets. - Window functions are
           | supported by every major DB with similar or the same syntax
           | too. Any differences take 5sec to lookup in documentation.
           | 
           | My only complaint is loading data is highly vendor specific.
        
           | throwanem wrote:
           | Most of these problems are pretty trivial to solve by simply
           | reading the manual for the database engine you happen to be
           | using, which typically will document deviations from the
           | standard in serviceable detail.
           | 
           | I'd be more sympathetic to concerns about switching engines
           | if I had, at any point in a career now heading for its 25th
           | year, ever seen that occur.
           | 
           | Given the frequency with which ORMs are used in greenfield to
           | defend against this notional problem, and the many problems
           | that using an ORM _always_ inflicts, this may well be the
           | costliest form of premature optimization I 've ever seen. It
           | certainly can't be outside the top three.
        
         | pknerd wrote:
         | many other DSLs transformed to SQL. And now we have LLMs that
         | can do it much better way.
        
       | Alifatisk wrote:
       | Reminds me of how ActiveRecord works in Rails
        
       | memset wrote:
       | This is really great! Maybe I'll incorporate this into my own
       | software (scratchdata/scratchdb)
       | 
       | Question: it looks like you wrote the parser by hand. How did you
       | decide that that was the right approach? I myself am new to
       | parsers and am working on implementing the PostgREST syntax in go
       | using PEG to translate to Clickhouse, which is to say, a similar
       | mission as this project. Would love to learn how you approached
       | this problem!
        
       | brikym wrote:
       | It looks a lot like Kusto query language. Here is a kusto query:
       | StormEvents         | where StartTime between
       | (datetime(2007-01-01) .. datetime(2007-12-31))          and
       | DamageCrops > 0         | summarize EventCount = count() by
       | bin(StartTime, 7d)
       | 
       | edit... yes it indeed was inspired by Kusto as they mention on
       | the github Readme https://github.com/runreveal/pql
        
         | nerdponx wrote:
         | Is that what AWS CloudWatch Insights uses?
        
           | theragra wrote:
           | Azure appinsights logs
        
       | beoberha wrote:
       | Wow, so many haters in here...
       | 
       | This is awesome! I work at Microsoft and use Kusto daily. I would
       | love to see a pipelined query language become mainstream and
       | replace SQL for the simple querying that most people have to do.
        
         | loic-sharma wrote:
         | Agreed! Azure's best product isn't CosmosDB, AKS, or their AI
         | cognitive services - it's Kusto! Sadly no one knows about it.
         | 
         | If you're interested, you can try it here:
         | https://learn.microsoft.com/en-us/azure/data-explorer/kusto/...
        
       | nitrix wrote:
       | Pipelining is cool, though this could've easily just been a
       | library with nice chaining and combinators in your language of
       | choice (seems to be Go here).
        
       | mcphage wrote:
       | Their very first example has issues:
       | 
       | > users > | where like(email, 'gmail') > | count
       | 
       | becomes
       | 
       | > WITH > "__subquery0" AS ( > SELECT > * > FROM > "users" > WHERE
       | > like ("email", 'gmail') > ) > SELECT > COUNT(*) AS "count()" >
       | FROM > "__subquery0";
       | 
       | Fetching everything from the users table can be a ton slower than
       | just running a count on that table, if the table is indexed on
       | email. I had to deal with that very problem this week.
        
         | kccqzy wrote:
         | Should not be a problem on modern Postgres.
        
         | paulddraper wrote:
         | That is the same query plan for any contemporary query planner.
         | 
         | (Just like any C compiler will produce the same output for `x
         | += 2` and `x += 1 + 1`.)
         | 
         | ---
         | 
         | A notable exception was PostgreSQL prior to version 12, which
         | treated CTEs as an optimization fences.
        
           | throwanem wrote:
           | I'd be hesitant to assume the generated CTEs are always going
           | to be amenable to optimization. The examples on the linked
           | page are pretty trivial queries - I wonder what happens when
           | that ceases to be the case, as seems very likely with a tool
           | that apparently doesn't do a great deal to promote
           | understanding what goes on under the hood.
        
       | dkga wrote:
       | The R {dbplyr} package is also a very good way in practice to
       | pipe SQL.
        
         | civilized wrote:
         | Also AFAICT the only one that is fully integrated with a
         | general purpose programming language with an excellent IDE.
        
       | mavam wrote:
       | We're developing TQL (Tenzir Query Language, "tea-quel") that is
       | very similar to PQL: https://docs.tenzir.com/pipelines
       | 
       | Also a pipeline language, PRQL-inspired, but differing in that
       | (i) TQL supports multiple data types between operators, both
       | unstructured blocks of bytes and structured data frames as Arrow
       | record batches, (ii) TQL is multi-schema, i.e., a single pipeline
       | can have different "tables", as if you're processing semi-
       | structured JSON, and (iii) TQL has support for batch and stream
       | processing, with a light-weight indexed storage layer on top of
       | Parquet/Feather files for historical workloads and a streaming
       | executor. We're in the middle of getting TQL v2 [@] out of the
       | door with support for expressions and more advanced control flow,
       | e.g., match-case statements. There's a blog post [#] about the
       | core design of the engine as well.
       | 
       | While it's a general-purpose ETL tool, we're targeting primary
       | operational security use case where people today use Splunk,
       | Sentinel/ADX, Elastic, etc. So some operators are very
       | security'ish, like Sigma, YARA, or Velociraptor.
       | 
       | Comparison:                   users         | where eventTime >
       | minus(now(), toIntervalDay(1))         | project user_id,
       | user_email
       | 
       | vs TQL:                   export         where eventTime > now()
       | - 1d         select user_id, user_email
       | 
       | [@]
       | https://github.com/tenzir/tenzir/blob/64ef997d736e9416e859bf...
       | 
       | [#] https://docs.tenzir.com/blog/five-design-principles-for-
       | buil...
        
       | ben_jones wrote:
       | I'm glad this exists but would caution extensibility as the most
       | important thing for devs to consider when picking there "ORM"
       | stack especially in terse Golang.
       | 
       | For that I use squirrel which uses the builder pattern to compose
       | sql strings. Keeping it as strings and interfaces allow it to be
       | very easily extended, for example I was able to customize it to
       | speak SOQL (salesforce). plenty of downide though.
        
       ___________________________________________________________________
       (page generated 2024-02-28 23:00 UTC)