[HN Gopher] PRQL: Pipelined Relational Query Language
___________________________________________________________________
PRQL: Pipelined Relational Query Language
Author : animal_spirits
Score : 247 points
Date : 2023-07-25 18:13 UTC (4 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| getmeinrn wrote:
| >declarative
|
| An admirable ideal, but declarative languages always seem to
| devolve towards some frankenstein imperative/declarative hybrid.
| We need to stop going down this path and embrace Pulumi's
| pattern: use existing general purpose imperative languages to
| generate a declarative structure. Instead, people try to take
| their not-mature declarative language and fit a weird general
| purpose language inside it.
|
| EDIT>> I'm not suggesting that SQL needs to be declarative, only
| that if a problem space would benefit from declarative
| structures, generate them imperatively instead.
| codethief wrote:
| I've silently been a big fan of this approach for years but
| hadn't heard about it anywhere (most likely because I'm not
| familiar with Pulumi).
|
| Could you provide a reference that elaborates on "Pulumi's
| pattern"?
| getmeinrn wrote:
| This is the closest example I could find
| https://www.pulumi.com/docs/concepts/how-pulumi-works/
|
| Pulumi serves as the strongest contender to Terraform when
| doing IaC (infrastructure as code). Terraform attempts to be
| a declarative markup language (HCL) but it has a lot of weird
| imperative quirks due to (understandably) trying to support
| common complex use cases. In the end they have a clunky
| custom language that tries to do what general programming
| languages have done well forever. Pulumi doesn't re-invent
| the wheel, and lets programming languages do what they do
| best. Pulumi only really cares that the programming language
| generates a declarative spec that can be used to compare with
| your infrastructure. It's the best of both worlds.
| codethief wrote:
| Thanks so much!
|
| > Pulumi only really cares that the programming language
| generates a declarative spec that can be used to compare
| with your infrastructure. It's the best of both worlds.
|
| Fully agree! I've called this approach "imperatively
| generating declarations".
|
| Things might get complicated, though, if you try to nest
| this approach. E.g. if you imperatively generate the
| structure of your deployment pipeline, which in turn,
| contains imperative code to generate your infrastructure
| spec. :) Or something like that[0].
|
| But that's probably just the nature of the problem.
|
| [0]: I can't come up with a fully realistic scenario right
| now because it's late. Maybe tomorrow!
| 0cf8612b2e1e wrote:
| PRQL seems the most realistic evolution out of SQL. Changing
| the programming paradigm will never convince the SQL true
| believers.
| getmeinrn wrote:
| I don't really know what you're saying, can you say it
| another way? "Most realistic evolution"... why is that
| needed? If the problem is different database engines
| implementing the SQL spec differently, that's not something
| that can be papered over with another abstraction without a
| lot of wrinkles.
| 0cf8612b2e1e wrote:
| SQL does not need to be replaced, in the same way that
| COBOl never needed to be replaced. It can get the job done,
| but we have superior tools today. Better abstractions
| enable more productivity.
|
| That there are N flavors of SQL is annoying, but there are
| foundational design choices in the language which we are
| stuck with today. PRQL is quite readable to those with SQL
| experience and feels like a plausible next language in the
| space without reinventing paradigms.
| getmeinrn wrote:
| To go back to my original post, my main beef is with
| turning an imperative language into a declarative one. If
| you've seen enough of these types of languages degrade,
| you start to see a pattern.
| sdesol wrote:
| This is from the GitHub page
|
| > PRQL is being actively developed by a growing community
|
| You can see this at https://devboard.gitsense.com/PRQL which
| shows a good mix of new and not new contributors. Not sure why
| there was a spike in the last couple of days though. Maybe the
| maintainers that in this thread can comment on this.
|
| Full Disclosure: This is my tool
| snthpy wrote:
| Great tool! Thanks for sharing that.
|
| Regarding the activity, there's been a lot of preparation for
| the 0.9 release. As to the new contributors, I don't know the
| reasons for that but all contributions are very welcome. It's a
| very welcoming community and a great way to make an impact!
|
| Full Disclosure: PRQL contributor
| sdesol wrote:
| Thanks for the insights. You might also want to add a "good-
| first-issue" topic an/or labels to your project to signal how
| open you guys are to new contributors. In the near future,
| I'm going to start surfacing projects with these
| labels/topics.
| heywhatupboys wrote:
| This is just tidyr ?
| aerzen wrote:
| In a way, yes.
| snthpy wrote:
| Kind of. dplyr, Pandas, LINQ, Kusto, etc... are big
| inspirations. The fact that these are so popular and have
| reinvented the same workflows with slightly different syntaxes
| to me is a sign that they capture something fundamental about
| how humans like to think about data transformations.
|
| PRQL is indeed very close to dplyr. In my (biased) opinion,
| PRQL is actually a bit cleaner than dplyr because it is its own
| language and doesn't have to work as DSL inside R. The same
| goes for comparisons with Pandas and Polars having to work as
| DSLs inside Python.
|
| Compare from mtcars filter cyl > 6
| select {cyl, mpg} sort {-mpg}
|
| with mtcars %>% filter(cyl > 6) %>%
| select(cyl, mpg) %>% arrange(dplyr::desc(mpg))
|
| Incidentally, I produced the dplyr code from the PRQL with
| ```R install.packages("prqlr", repos =
| "https://eitsupi.r-universe.dev", lib="~/.local/R_libs/")
| library(prqlr, lib.loc="~/.local/R_libs/")
| library("tidyquery") " from mtcars
| filter cyl > 6 select {cyl, mpg} sort {-mpg}
| " |> prql_to_sql() |> tidyquery::show_dplyr() ```
|
| (Disclaimer: I'm a PRQL contributor.)
| chubot wrote:
| Hm, I just realized there are two similar projects with very
| similar names: this one, and
|
| https://github.com/erezsh/Preql
| nonethewiser wrote:
| There seems to be very little support for these SQL challengers
| despite many people saying SQL sucks and they want a redo. Why is
| that?
|
| Some guesses:
|
| - There is some popular non-sql query language that has gained
| lots of momentum that I just dont know about.
|
| - People are more effective with SQL because they know it so no
| new and especially existing databases will switch. This is
| definitely true to some extent, but if this is the whole reason
| then I guess SQL is pretty good since people rely on it to be
| effective.
|
| - There is support for it but a few key decision makers are to
| blame (big data or the MYSQL lobbyists)
|
| - People actually dont dislike SQL that much and/or its not that
| bad and hackernews is not representative
|
| - These query languages look nice but have technical or
| performance issues compared to SQL
| rcme wrote:
| I think the main reason mainline SQLs continue to be popular is
| that, unlike general purpose programming languages, the
| implementation of the query runner is the most important
| aspect, not then language spec. Take Python for instance. The
| official implementation objectively sucks, yet it remains
| extremely popular because the language implementation is only a
| part of a language's success. That won't fly for a db engine.
| samwillis wrote:
| Previous Show HN: https://news.ycombinator.com/item?id=31897430
|
| _Show HN: PRQL 0.2 - a better SQL - 378 points by maximilianroos
| on June 27, 2022, 161 comments_
|
| Original post when it was conceived:
| https://news.ycombinator.com/item?id=30060784
|
| _PRQL - A proposal for a better SQL - 650 points by
| maximilianroos on Jan 24, 2022, 295 comments_
| t8sr wrote:
| If the main complaint people have about SQL is that you can't
| swap SELECT, FROM and WHERE, then that's pretty good for a
| language designed in the 70s.
|
| This, by contrast, looks like it has a bunch of random line noise
| for syntax. Why on earth should I like this:
|
| `join side:left p=positions (p.id==employees.employee_id)`
|
| better than this:
|
| `LEFT JOIN positions AS p ON p.id = employees.employee_id` ?
| jdmichal wrote:
| I've had two real gripes with SQL. The rest of it has been, as
| you said, pretty good.
|
| Complaint 1: Not being able to use selected columns later in
| the same select. SELECT
| gnarly_calculation AS some_value, some_value * 2 AS
| some_value_doubled
|
| Instead: SELECT subquery.*,
| some_value * 2 AS some_value_doubled FROM (
| gnarly_calculation AS some_value ) AS subquery
|
| Complaint 2: Not being able to specify all columns _except_.
| This combines with the above, where I have to pull some
| intermediate calculations forward from a subquery, but I don 't
| need them in the final output. So I have to then enumerate all
| the output columns that I actually want, instead of being able
| to say something like `* EXCEPT some_value`.
| snthpy wrote:
| PRQL fixes this.
|
| (Disclaimer: I'm a PRQL contributor.)
| zX41ZdbW wrote:
| Not only PQRL fixes this. It has been allowed in
| ClickHouse's SQL since its inception.
| snthpy wrote:
| That's awesome! ClickHouse is a great system by all
| accounts and I've been meaning to try out ClickHouse
| Local.
|
| I'm more familiar with DuckDB and they've also been doing
| some great innovation on the SQL front. I don't know
| offhand if they can also do the forward referencing thing
| but they allow putting the FROM first and having GROUP BY
| ALL etc..
|
| It's great to see all this innovation happening in the
| SQL and Query Language space more generally at the
| moment.
| zX41ZdbW wrote:
| It is strange to hear about innovation in DuckDB - I see
| that they are gradually re-implementing the stuff already
| existing in ClickHouse. Sometimes they do a better job at
| promoting it.
| snthpy wrote:
| That's probably true but the big differentiator was that
| DuckDB can run in your python process so there's very low
| fiction to adopt it.
|
| My impression of ClickHouse was that it was more like
| postgresql in that regard, i.e. OLAP : OLTP as ClickHouse
| : Postgres as DuckDB : SQLite.
|
| clickhouse-local may have closed the gap on that though.
| Can you embed it in Python as a library?
| zX41ZdbW wrote:
| Both complaints are resolved by ClickHouse.
|
| This video also covers many other advantages of ClickHouse's
| SQL dialect: https://www.youtube.com/watch?v=zhrOYQpgvkk
|
| Some may find your first complaint questionable... But I
| specifically designed ClickHouse SQL to allow aliases to be
| used and referenced in every part of SQL query.
| crote wrote:
| The main issue with SQL is that you are stuck in a very strict
| way of writing things, which does not clearly match to how I
| think. The top-down way of writing PRQL where each step is
| simply a transformation of the previous one makes way more
| sense to me. SQL is something I'd need a reference manual for,
| PRQL is simply writing down what I want the query to do.
|
| I do agree that PRQL's join syntax is extremely bad, though.
| They should've stuck to explicit "left join"-like keywords, and
| the alias & join column shorthand could be done better.
| kbenson wrote:
| Because now the type of join is an argument on the "join"
| operation, and join is the first word of that statement making
| it more obvious what the operation actually is? I also prefer
| foo(bar) over "bar to foo()" as well, which seems like the
| equivalent in a more general purpose function call in a
| language example.
| lofatdairy wrote:
| Maybe it's my familiarity with R and the tidyverse paradigm,
| but I think in general this paradigm and syntax is pretty
| readable. The example you chose is perhaps a case where
| brackets and commas go a long way, but that aside it's really
| not too bad in context of the rest of the language. `:` seems
| to consistently be treated as keyword arguments, and `=` for
| aliasing.
| breischl wrote:
| Having spent more time in MongoDB aggregations than I'd like to
| lately, I really wish they'd support this. So much more sensible
| than the madness they've got going on.
| snthpy wrote:
| This sounds interesting. Could you provide a link to the
| MongoDB aggregations that your are referring to because they're
| not something I'm familiar with?
|
| (Disclaimer: I'm a PRQL contributor.)
| zX41ZdbW wrote:
| Here are examples of MongoDB aggregations: https://github.com
| /ClickHouse/ClickBench/blob/main/mongodb/q...
|
| They are painful to write compared to SQL queries.
|
| Although the commercial version of MongoDB has support for
| SQL, it's not available for general MongoDB users.
| snthpy wrote:
| Thanks!
|
| Those do look awful and more what I remember from the brief
| period when I used MongoDB in around 2013 or so.
|
| I take it GP comment was just trolling then?
| des1nderlase wrote:
| So, in principle, how is this different approach than DBT?
| andrewl-hn wrote:
| For me the examples on the website https://prql-lang.org/ are the
| biggest selling point for PRQL, in particular the SQL it
| generates. It looks clean, straightforward, something I would've
| written myself.
|
| In general, I like this slightly more careful take on modern
| database development. 10-15 years people would start a brand new
| database like Mongo, or Riak, or Influx, or whatever, and would
| try to convince application developers to select it for new
| projects. But recently we started getting more conservative
| options like EdgeDB, TimescaleDB, or even PRQL which all expect
| us to run Postgres with some addons and / or query preprocessors.
| Tech like this is so much easier to adopt!
|
| I'm really liking what Edge folks are doing with schemas and
| migrations, but I do find PRQL syntax much more intuitive. My
| application code is littered with data transformation pipelines
| already: all these map / filter chains in TYpeScript, iterators
| in Rust, enumerables in Ruby, streams in Java, LINQ in .net, Rx
| in dozens of languages etc. etc. So the concept is very, very
| familiar. Getting nice SQL queries out of PRQL that I can store,
| inspect later, see the query plans, add indices where necessary,
| is just great. It's such a good tool!
| OJFord wrote:
| To me it seems quite nice, but really just trivially different
| from SQL - like if Ruby was 'friendlier syntax that transpiles
| to Python', meh? You'd use whichever you happened to learn
| first and not bother with the other. (That's often true even
| though it's more than that of course.)
|
| The examples arbitrarily make SQL look more verbose:
| SELECT id, first_name, age
| FROM employees ORDER BY age
| LIMIT 10
|
| Yes! Of course I'd rather: from employees
| select {id, first_name, age} sort age take 10
|
| ..but wait, actually the SQL could've been:
| select id, first_name, age from employees order
| by age limit 10
|
| and it's more verbose by a character or two... (no braces, but
| 'order by' vs 'sort')
| jtsuken wrote:
| Being able to pick the source (i.e. the FROM clause) first is
| useful in itself, as you then get the benefit of sensible
| auto-complete suggestions.
| andrewl-hn wrote:
| I use CTEs, window functions, and groupings all the time when
| I write reporting queries. These things tend to be much more
| verbose in raw SQL, and ORMs / Query Builders either do not
| support some of these features at all or do very poor job
| (like, force me to put raw SQL substrings in my code), or
| force to write DSLs that are even more verbose than raw SQL.
| Look at corresponding PRQL samples, and you may see an
| appeal.
|
| Also, I agree, no one should write SQL like this - screaming
| keywords, superficial newlines, etc. I don't think this style
| made sense ever, even in 1970s.
| pjot wrote:
| At least we can all agree on leading commas though!
| slotrans wrote:
| death before trailing commas
| Pxtl wrote:
| While there's some stuff in C#/LINQ/EF that's more verbose
| (left joins are often a nightmare) or not-supported, I'll
| always say that I prefer writing queries in EF than in SQL,
| at least when dealing with SQL features that are supported
| by EF (which is a lot of them, it's a very expressive
| dialect).
|
| But EF lets you start with FROM, lets you do whichever
| keywords you need in whichever order (instead of WHERE ->
| GROUP BY -> HAVING and you've got to CTE or Subquery if you
| want another GROUP BY). It also lets you access the members
| of a group because the objects are still treated as a graph
| instead of being pulverized into a flat table like SQL
| does. It also makes your FKs into first-class navigational
| properties of the table.
|
| Like, if I have an addressID and I want to get its country
| code?
|
| In MS SQL that's SELECT CountryCode
| FROM Country INNER JOIN ProvState ON
| ProvState.CountryID = Country.ID INNER JOIN Address
| ON Address.ProvStateID = ProvState.ID WHERE
| Address.ID = @AddressIDParam
|
| In EF that's db.Addresses
| .Where(a => a.ID == addressIDParam) .Select(a =>
| a.ProvState.Country.CountryCode)
|
| EF has a hell of a lot of flaws, but linguistically I love
| it. Yes there's a lot of aliasing boilerplate in EF, but
| the ability to walk the foreign keys and the fact that you
| can put the select _after_ the table name pays off so very
| well.
|
| Also there's a dialect of LINQ that looks more like SQL but
| it's kind of weird and I don't love it so I prefer to use
| the lambda syntax above.
|
| In that dialect, it's from a in
| db.Addresses where a == addressIDParam
| select a.ProvState.Country.CountryCode
|
| which is even more terse and SQL-y although I find it a
| weird linguistic mash-up.
| pragmatic wrote:
| The downside though is you have to grab the sql it's
| generating somehow to try to optimize it, figure out what
| crazy cross apply madness it's going or to figure out why
| it's blowing sql servers recursion limit.
|
| I prefer to avoid linq syntax now. It's a false economy.
| hn_throwaway_99 wrote:
| I agree that I think the difficulty with the uptake will be
| "it's really just less annoying SQL", and it's hard to
| overtake a technology that's so ubiquitous when your fixes
| are really "nice to haves" vs something truly
| transformational.
|
| That said, it's not the succinctness that's an improvement,
| it's that the pipelined nature of PRQL really maps much
| better to how people should think about queries, and also how
| the server executes them! Something as trivially simple as
| putting the from clause first means I can now get much better
| typeahead/autocomplete support from dev tools. Heck, I
| already do this now in a slightly more annoying manner: I
| write "select * from some_table ..." first, then I go and
| type out the actual columns I want because once the from
| clause is correct my IDE will autocomplete/show errors for
| possible columns.
| t8sr wrote:
| I don't get that - to me the examples are much less readable
| than SQL and I don't understand why I should want to use this.
| Like, yes, you can reorder the query sections, which seems to
| be everyone's complaint about SQL, but then you also have
| multiple types of brackets, colons and other syntax for no
| reason, all while not really accomplishing anything SQL doesn't
| already do.
|
| What's the attraction?
| andrewl-hn wrote:
| Oh, I'm a big fan of raw SQL, too. There's very little out
| there that can bring me same joy as a well-written 30-50
| lines-long sql query! Like, I can spend an hour or two on it,
| but when it runs and produces the results in the desired
| form, it's sooo rewarding! And realizing that producing
| similar results in my main language (be it Rust or
| TypeScript, or whatever) would take me 3-4 days instead (and
| it would run much longer) makes me appreciate SQL programming
| more and more. In terms of "usefulness per minute spend"
| nothing comes close to SQL!
|
| However, the syntax can be a bit clunky from time to time.
| I'm very fortunate that I use Postgres only in the past 5-7
| years, so CTEs, and things like Timescale or JSONB, or PlV8 /
| PlRust for stored procedures are often on the menu. Yet,
| simple things still require repeating fragments of code from
| time to time, and complex grouping or window functions code
| often looks too convoluted.
|
| And as I wrote in my GP comment: I (and many other
| developers) already use pipelines for data manipulation in
| code - beyond databases. Say, we got data from an API
| endpoint, or we mixing data from several data sources. SQL is
| familiar because I see it often, PRQL is familiar because I
| use similar pipelining concepts often in my no-SQL code.
| Would I use PRQL for super simple queries? Probably not.
| Would I be upset if one of my coworkers used it for simple
| queries? No, why would I? Would I try implementing complex
| reporting query using PRQL? Yeah, I would. Partially because
| I suspect I would get to my data in desired form quicker with
| pipelines, and partially because I know: if I get stuck I
| would convert my intermediary query to SQL and pick up from
| there.
|
| For me PRQL looks better then most DSLs for ORMs or Query
| Builders in most languages. Adding a new tool to a project
| may be annoying, but depending on a project I'd rather pay
| this price once if it makes my less SQL-savvy team mates more
| comfortable. "Yet another syntax" is obviously the main point
| against, but from the application developer perspective ORMs
| / Query Builders often force one to learn "yet another
| syntax" anyway. "Don't use ORM / Builder" is an often voiced
| opinion and yet in practice we work with them all the time,
| too.
|
| So, I view PRQL not as an alternative to SQL but as a third
| database communication dialect in addition to SQL and my ORM
| / Query Builder DSLs.
| the_mitsuhiko wrote:
| I don't use PRQL but I absolutely get the appeal but
| specifically on the readability part, some things that are
| easy in PRQL are just awful in SQL.
|
| From the website for instance this is a nightmare to do in
| SQL: from employees group role (sort
| join_date take 1)
| snthpy wrote:
| Unfortunately the linebreaks were lost and, as shown that
| isn't a valid PRQL query. It would have to be either
| from employees group role ( sort
| join_date take 1 )
|
| or from employees | group role (sort
| join_date | take 1)
|
| In English: Take the 1st employee
| by (earliest) join_date for each role
| from the set of employees
| zX41ZdbW wrote:
| ClickHouse:
|
| SELECT * FROM employees ORDER BY join_date LIMIT 1 BY
| role
| zX41ZdbW wrote:
| What does it mean? Is it the same as "LIMIT BY" in
| ClickHouse?
|
| https://clickhouse.com/docs/en/sql-
| reference/statements/sele...
| the_mitsuhiko wrote:
| It picks the first result grouped by role and sorted by
| join date. I believe this can be expressed with limit by
| in ClickHouse.
| snthpy wrote:
| If you're happy with SQL then there isn't much point.
|
| For the folks building and supporting PRQL, SQL just has a
| few too many warts and the popularity of tools like Pandas,
| dplyr, Polars, LINQ, ... shows that for analytical work we
| often like to work with our data differently. Other
| frameworks and languages feel that we should throw out
| Relational Algebra as well but we feel that's like throwing
| the baby out with the bathwater.
|
| PRQL's core tenets are that Relational Algebra is
| foundational and a great abstraction but what's needed is a
| modern, ergonomic language that allows us to work with data
| the way most of us conceive of it - as pipelines of
| transformations.
|
| Personally I feel much more productive in PRQL. I can start
| with a dataset and append one transformation at a time,
| writing from top to bottom as I go along without having to
| jump back and forth between the SELECT and the WHERE and the
| GROUP BY clause etc... .
|
| Also, if I want to take out a step, I can just comment out
| that line and the rest of the pipeline usually still works.
| This might seem like a minor thing but in my experience it's
| those kind of ergonomics that make the difference in actual
| day to day work rather than fancy theoretical features you
| only use once in a blue moon. It's therefore worth noting
| that this was an intentional design decision. You try and
| take out some steps from your SQL query and see how well the
| rest of your query holds up.
| snthpy wrote:
| Disclaimer: I'm a PRQL contributor.
| aidos wrote:
| For what it's worth, it looks really readable to me. I have
| decades of sql experience at this point so consider myself
| pretty proficient but I can see the appeal of having a
| terser syntax for transformations. I especially like the
| "it just makes sql" approach.
|
| Stepping through the second example on that page I know how
| I could do the same in sql, and I also know that it would
| be harder for most people to follow.
|
| Question about that example though - why is there an
| aggregate without an assignment to a variable (average
| gross_salary) like the other examples?
| snthpy wrote:
| I couldn't find that example right now but there is a
| similar line in the main example on the prql-lang.org
| homepage: aggregate {
| average total, ... }
|
| I can't definitively say why it is there, other than
| perhaps just to show that you can specify aggregations
| without having to give them an alias. The column name
| won't be pretty but if you're just interactively trying
| something out and want to see the results then you
| probably won't care.
|
| Does that help?
| aidos wrote:
| Gotcha. The thing that's not immediately clear from the
| syntax is which columns I'm getting out as a final result
| of my query. I guess the last select you ran + any
| derived ones since then?
| snthpy wrote:
| That's right. It can be a bit hard to keep track of but
| the rules are quite simple. You start with all the
| columns from a `from`, then any `derive` just adds
| columns. A `select` restricts the columns to those
| selected and a `group + aggregate` replaces the columns
| with those from the `group` + those from the `aggregate`.
|
| I once tabled a proposal whether each query should be
| forced to end with a `select` to make the final column
| list explicit. This could be generated by the compiler.
| It was felt that that wasn't necessary though and would
| also be somewhat arbitrary since you also need to know
| what columns are available at each previous step of the
| pipeline if you want to be able to make modifications. As
| the tooling improves, you could perhaps be shown the
| current list of columns as you hover over each line?
| aidos wrote:
| There's nothing stopping people from using their own
| convention of selecting early to grab the stuff they need
| and selecting again right at the end to be explicit about
| what they want. Like select *, it's better to just be
| explicit.
| spullara wrote:
| it seems like any SQL parser could let you put FROM first and
| solve a lot of the annoyances of SQL
| mulmen wrote:
| The stiff syntax is a _feature_. The infamous 1000 line
| sales report query is already a bear to maintain. If you
| get a clever developer reordering syntax it will only be
| worse.
| strokirk wrote:
| It's certainly _not_ a feature. A lot of important SQL
| usage is ad-hoc queries, and they are more annoying to
| type than they should be.
| dlisboa wrote:
| Maybe you didn't see the more complex examples, but some of
| the SQL queries it abstracts are very unintuitive to write by
| hand. PRQL turns it into something more readable and closer
| to the expected mental model.
|
| For people who can write error-free and complex portable SQL
| queries by heart this may not be really interesting. From
| experience, however, that is not a skill most developers
| have.
| hn_throwaway_99 wrote:
| The attraction is that, especially for more complicated
| queries with complex joins, subqueries, aggregates, etc.,
| that the structure of PRQL much more closely matches the
| "English", mental-model of the query. I can just read PRQL
| linearly, from top to bottom, and at each point I know what
| is being "pipelined" into the next phase of the query.
|
| With SQL I have to look all over the place. And it's not just
| that FROM should come before SELECT, it's that if I'm doing,
| say, an aggregation with a group by, normal SQL doesn't
| really have anything to make me think about the ungrouped
| rows, and then merge them together to get the aggregate
| values. With PRQL I can just go top to bottom, and for me
| it's much easier to reason about (i.e. first get all the
| rows, then group by some specific columns, then take
| aggregates, etc.)
|
| And I say this as someone who spends about half my days in
| SQL at present.
| pragmatic wrote:
| The downside is, debugging what actual sql this monstrosity
| spits out.
| hn_throwaway_99 wrote:
| Two points:
|
| 1. The SQL that I've seen output by PRQL isn't monstrous,
| at least no more than it would be coding by hand.
|
| 2. I do agree that I don't think PRQL will have much
| uptake until it is baked in to DB systems, at least as a
| plugin. One reason (among many) that I hate ORM systems
| is that with native SQL it's very easy for me to just
| capture a SQL statement from my DB logs and then search
| my code for an offending statement. But given how PRQL is
| transpiled, I don't see any reason it couldn't be
| essentially part of the query planner, so that query logs
| etc. are reported in PRQL and not SQL. E.g. the query
| planner already takes text (SQL) -> Query Plan, it seems
| rather straightforward to go text (PRQL) -> text (SQL) ->
| Query Plan, with the SQL intermediate just being an
| implementation detail I don't have to care about as a DB
| user.
| slt2021 wrote:
| I think main difference is how PRQL translates into query
| execution plan, with SQL you need to read entirety of query
| to get rough understanding of order of operations (especially
| if you join two-three tables and have some nested and lateral
| queries).
|
| with PRQL I see that the order will be explicit and set up by
| developer, so any footguns will be evident.
|
| things like predicate push down, optimization fence, variable
| rewrite, etc are not needed to be relied upon, because
| pipeline of query is more explicit in PRQL
|
| also since it is new lang, it can be naturally extended into
| ETL type data pipelines
|
| also because PRQL can be translated into query execution plan
| - it can be converted into non-SQL languages like MongoDB or
| pandas / spark / etc, eliminating SQL altogether for
| distributed nosql engines
| aidos wrote:
| Is that true? Doesn't this just effectively translate into
| sql to be executed on the db as usual if you're running on
| an rdbms?
|
| Looking at the examples and the sql generated in the
| playground you're just going to have the query planner
| doing its thing to decide how to execute eventually anyway.
| slt2021 wrote:
| It transpiles into sql currently, but my understanding it
| is not long term vision.
|
| Language arch allows in the future to replace sql for
| querying data
| 0cf8612b2e1e wrote:
| The attraction is something that was designed after decades
| of usage and PL research. Consistency of syntax is a big one
| for me. A favorite example of mine: SELECT
| substring('PostgreSQL' from 8 for 3); SELECT
| substring('PostgreSQL', 8, 3); -- PostgreSQL-only syntax
| SELECT trim(both from 'yxSQLxx', 'xyz'); SELECT
| extract(day from timestamp '2001-02-16 20:38:40');
|
| Taken from: https://www.edgedb.com/blog/we-can-do-better-
| than-sql
|
| Maybe if SQL would give me that monumental ask of trailing
| commas, perhaps I would hate it less.
| soulbadguy wrote:
| Naive question :
|
| Isn't something like F# computation expression or C# LINQ a
| simpler and more generic way of acheiving the same effect?
| dtheodor wrote:
| Reminds me of Microsoft's KQL https://learn.microsoft.com/en-
| us/azure/data-explorer/kusto/...
| clatan wrote:
| The sooner SQL is phased out in favor of something more akin to
| Hadley Wickham's dplyr the better. Don't particularly like the
| syntax of this but it's the right direction.
| zX41ZdbW wrote:
| The ergonomics could be better:
| https://pastila.nl/?01359244/9ac65f960385a02b3193778b4c6af10...
|
| Significant newlines. Non-obvious way to specify database/schema.
| snthpy wrote:
| There are two symbols for separating transforms in a pipeline -
| \n and | so you can use | instead of \n if you prefer.
|
| So for your query from the linked gist, the following should
| work: from system.numbers | select number
|
| Otherwise, if you want to be really strict:
| prql target:sql.clickhouse from `system.numbers` |
| select {number}
|
| You can see here for more details: https://prql-
| lang.org/book/reference/syntax/keywords.html
| zX41ZdbW wrote:
| We have recently merged PRQL support into ClickHouse:
| https://github.com/ClickHouse/ClickHouse/pull/50686
|
| It's currently more like an experiment - I'm not sure if it will
| be usable or useful. There are some concerns about Rust, although
| minor:
| https://github.com/ClickHouse/ClickHouse/issues/52053#issuec...
| snthpy wrote:
| This is awesome! Thank you!
|
| Would that mean it would also be able to use it in clickhouse-
| local?
| zX41ZdbW wrote:
| Yes: clickhouse-local --dialect prql
|
| or: $ clickhouse-local :) SET
| dialect = 'prql'
| smartmic wrote:
| This must not be missing here then: "I don't want to learn your
| garbage query language" [1]
|
| [1] https://erikbern.com/2018/08/30/i-dont-want-to-learn-your-
| ga...
| default-kramer wrote:
| The best rebuttal: https://www.scattered-
| thoughts.net/writing/against-sql/
| maximilianroos wrote:
| Big fan of this post! We link to it from PRQL website.
|
| Our goal for PRQL is a great language to integrate with & build
| on. So we don't have N languages for N databases. Because PRQL
| will always be open-source, and won't ever have a commercial
| product, we think that's much more feasible than a DB-specific
| or product-specific language.
|
| (PRQL dev here)
| netcraft wrote:
| I dont know the right words, but creating a plugin for DBT for
| this would give it a lot of traction I bet.
| snthpy wrote:
| Totally agree. This was a high priority goal for us and
| @maximilianroos did a lot of work putting something together.
| In the end, I believe it was really fighting against how dbt is
| set up though as it is so tightly interwoven with Jinja2.
| Perhaps with the introduction of the Python dbt models, things
| might have changed now and it's worth taking another look.
|
| https://github.com/PRQL/dbt-prql
|
| (Disclaimer: I'm a PRQL contributor.)
| LinXitoW wrote:
| This feels like the perfect level of SQL abstraction. SQL is
| great in theory, but gets really hard too read once you throw in
| joins with subqueries.
| mulmen wrote:
| Yes poorly written code is hard to read. That flaw is not
| unique to SQL.
| smartmic wrote:
| Why not break these subqueries into their own common table
| expressions and prepend them to your join statements?
| mulmen wrote:
| Or their own tables or views.
| aidos wrote:
| Adding tables doesn't sound like a solution at all and
| views come with operational overheads that you need to be
| aware of.
|
| CTEs should be the first thing you reach for when trying to
| clarify the intent of the way you're chopping up your data.
| mulmen wrote:
| Depends on the situation. Creating tables is a space/CPU
| tradeoff. Views can be managed the same way as the rest
| of your data pipeline.
|
| CREATE TEMPORARY VIEW and CREATE TEMPORARY TABLE are both
| valid SQL.
|
| > CTEs should be the first thing you reach for when
| trying to clarify the intent of the way you're chopping
| up your data.
|
| Sure but they aren't a silver bullet. Hence the "or" in
| my comment.
|
| I use Redshift a lot and temporary tables are useful
| there for defining the sort and distribution keys when
| those are not favorable to the current query. Think
| joining several demographic tables into a single customer
| dimension, distributing it on the customer key then
| filtering the product_id distributed sales and re-
| distributing it on customer before joining. You can't do
| that with a CTE.
|
| Views are handy when you have ugly CASE statements or
| other derived logic and need a convenience column that
| does something like string concatenation. I have a whole
| repository of them I deploy on top of the default system
| views for simplifying admin tasks.
|
| If you are struggling to keep your SQL clean it's because
| you are doing too much at once. Take a step back and re-
| evaluate your data model. If your physical model is bad
| then no number of CTEs will save you.
| aidos wrote:
| CTEs are basically temporary views and tables.
|
| I'd argue that the extra tables with duplicate data
| aren't a space / cpu trade-off as much as they are a data
| management trade-off. There are cases for duplicating
| data in the system but I'd leave that to indexes and
| generated columns unless you really need to materialise
| it in a different structure.
| mulmen wrote:
| > CTEs are basically temporary views and tables.
|
| Not in any system I have ever used. CTEs aren't
| materialized so they're not tables.
|
| Tables (or materialized views if you prefer) are
| literally space/CPU tradeoffs. I have a data pipeline
| that combines website logs into a table before I join it
| with product data. If you tried to do both in one query
| then the DB falls over.
|
| You seem to be coming at this from a OLTP perspective but
| in OLAP data "duplication" is the name of the game. But
| what naively looks like wasteful duplication is actually
| creating coherent analytical models from disparate data
| sources. This leads to much simpler analytical queries.
| aidos wrote:
| I agree with your further example cases (they were added
| after my comment).
|
| I was thinking more of the original post where someone
| was complaining about readability when you have joins and
| subqueries - which is the usecase for CTEs.
| mulmen wrote:
| Yeah sorry I edit a lot. I'll bump up my delay.
|
| Tables, views, and CTEs are all tools that can be used to
| make SQL more readable. They are all valid alternatives
| to subqueries in JOINs which is the only thing I would
| say you should "never" do.
|
| I tend to use all of them. I create tables with an
| optimized (typically star) schema for my purposes. My
| date dimensions are almost always views on top of a list
| of dates. I also use views to create a ~"feature store"
| that pre-joins the underlying star schema. I can then
| write "simple" analytical queries that utilize CTEs.
| Those CTEs tend to not have joins and they tend to only
| apply relevant filters so the final select is clear and
| concise.
| nonethewiser wrote:
| This is tangential but a new query language is inevitably based
| in the idea that SQL is deficient in some manner (hard, not
| ergonomic, whatever). More interestingly, it also implies that
| the countless alternatives aren't good enough either.
|
| Is there an existing query language that anyone will argue is
| better than SQL? I have limited exposure on this, but if SQL is
| really not that good then I'd expect there to be a better one at
| this point. All the new entrants kinda suggest that maybe SQL
| isn't so bad after all. Even many noSQL db's use an SQL-like
| language (see PartiQL). I'm not that familiar with alternatives
| though.
| animal_spirits wrote:
| I counter the argument that just because it is still around it
| is the best solution. Many things stick around just because of
| inertia, for example, the QWERTY keyboard.
| halayli wrote:
| You're comparing apples to oranges. Typing is much harder to
| relearn due to muscle memory. New well designed programming
| languages become quite popular quickly and go viral when they
| solve the right problems for developers.
|
| SQL is a very sound querying language originally based on
| relational algebra and tuple relational calculus. Many
| attempts have been made in the past to come up with better
| syntax but they don't go far which might have to do with the
| fact that authors realize later that SQL is just a thin layer
| masquerading the mathematical concepts required to retrieve
| relational data and make certain assumptions without breaking
| set theory axioms.
|
| One common mistake I see is that developers attempting to
| create an SQL replacement often approach it in the same
| mindset as creating yet another general purpose programming
| language.
|
| Edit: The above is not in reference to PRQL. It's just my
| anecdotal experience working on several SQL engines, building
| ones from scratch and working in this domain for 15 yrs.
| nonethewiser wrote:
| Then state the better solution.
|
| Besides, I'm not saying it's the best because it's most
| common, or even that its the best at all. It just makes it
| difficult to understand the argument that SQL sucks when its
| so widely used and no one can agree on an alternative.
| Including in new databases which could opt for these "new and
| improved" variants.
|
| I grant that there is a first-mover advantage and it's not
| easy to shift paradigms. That's a big factor. That's why I am
| honestly - not rhetorically - asking which query language is
| clearly better than SQL? And why do people keep making new
| challengers instead of backing something that's already
| better than SQL? There seems to be very little support for
| these query languages despite many people saying SQL sucks
| and they want a redo.
| fijiaarone wrote:
| Big keyboard is keeping DVORAK down! In a fair world we'd all
| be typing NewQL faster with AI from our flying electric cars
| in a communist utopia.
| 0cf8612b2e1e wrote:
| I think of it in the same way as JavaScript. Clearly, it has
| issues, but if you want to work on the web, it is what you have
| available. If you want to speak to Oracle/MySQL/SQL
| Server/Postgres database you are writing SQL. New entrants,
| could make a new language, but now they are fighting a battle
| on two fronts: the novel technology +query language.
| mhink wrote:
| > I think of it in the same way as JavaScript. Clearly, it
| has issues, but if you want to work on the web, it is what
| you have available.
|
| And, interestingly, the approach they're taking here is
| similar to how folks have dealt with JS: introduce a
| transpiled language whose paradigms are close enough to the
| host language to feel familiar. Reminds me of CoffeeScript,
| actually (although if we're being honest, I couldn't stand
| CoffeeScript.)
| 0cf8612b2e1e wrote:
| I really want this to take off and become a first party supported
| language for Postgres. Yes, yes, relational algebra is beautiful
| and all that jazz. SQL is a garbage, first pass design that
| should have been replaced decades ago. Even Codd has complaints
| about it. It is amazing what he invented, but we have learned a
| lot about PL design since then.
| aerzen wrote:
| People really like to associate relational algebra with SQL,
| probably because they learned them one alongside another. But
| SQL is really terrible relational language - it breaks a few
| core concepts in different places, like relations being
| unordered sets, that you can ORDER BY. This bubbles up as
| relations losing ordering after being wrapped into a subquery,
| which is really unexpected.
|
| PRQL has a data model very similar to the relational one. The
| only big difference is that relations are ordered - they are
| defined as arrays of tuples. So let's hope that PRQL gets to be
| known as "the relational language Mk II"
| burcs wrote:
| Yeah, I have never liked that you choose what you are querying
| before you select the source. I think the formatting here is so
| much more intuitive.
| snthpy wrote:
| Not quite what you're asking for but DuckDB has both PRQL [1]
| and Postgres [2] extensions, so you could probably query your
| Postgres database with PRQL from there.
|
| There's also a DBeaver plugin [3] which we still need to
| document better and simplify the usage of but you could
| potentially also use that to query Postgres with PRQL.
|
| Finally there is pyprql [4] with which you could query Postgres
| from a Jupyter notebook.
|
| [1]: https://github.com/ywelsch/duckdb-prql
|
| [2]: https://duckdb.org/docs/extensions/postgres_scanner.html
|
| [3]: https://github.com/PRQL/prql/issues/1643
|
| [4]: https://github.com/PRQL/pyprql
|
| (Disclaimer: I'm a PRQL contributor.)
| vkazanov wrote:
| Relational algebra is still great, it is the sql that made an
| unbelievable mess out of this beautiful idea.
| lovasoa wrote:
| I am currently building a SQL-only web framework [1], and I was
| contacted by the prql folks to include the language by default in
| sqlpage. I have to say, I have mixed feelings about it: on one
| hand, it indeed feels more natural to query data with a clean
| pipeline of operators. On the other hand, it makes me think of
| the famous xkcd comic about standards [2]: there were too many
| data querying languages, so we made a new one.
|
| [1] SQLPage: https://sql.ophir.dev
|
| [2] https://xkcd.com/927/
| RGamma wrote:
| Sweet, it looks like the language M should have been (or should
| become)!
|
| Granted, M would massively benefit from better tooling too.
| snthpy wrote:
| Ah, nice to see a mention of M. I don't see it mentioned a lot
| on HN and get the feeling most folks here probably don't even
| really know about it because it's so embedded in PowerQuery and
| PowerBI. It's actually not a bad language, it's just quite
| verbose in terms of the keywords.
|
| I actually draw on my experience with M a lot in terms of the
| input that I give to PRQL because for data munging M is quite a
| good language.
|
| Until we did the [] to {} syntax swap in 0.9, PRQL actually
| started to look a lot like M because M is the only other
| language I know that has [] for tuples.
|
| (Disclaimer: PRQL contributor here)
| Taikonerd wrote:
| The limitation of PRQL is that it only does SELECTs, by design.
| If you want to insert/update/delete data, you're back to SQL.
|
| That means that your team's data scientist might give you a query
| written in PRQL, but if you want to actually incorporate it into
| the data pipeline, you'll need to translate it into SQL.
|
| I wish that PRQL would support at least a limited ability to
| insert -- for example, maybe just the case of inserting into a
| new temp table. No update or ON CONFLICT logic to worry about. It
| could look like this: from tracks
| filter artist == "Bob Marley" save bob_marley_tmp
| snthpy wrote:
| This is well timed as PRQL 0.9 was just released a few hours ago.
| Have a look at the release notes here:
|
| https://github.com/PRQL/prql/releases/tag/0.9.0
|
| There is a rather large breaking change in the syntax from `[]`
| to `{}` for tuples. This is because initially it seemed like
| these were lists but over time we've realised that they are
| actually tuples. As per the release notes, freeing up `[]` clears
| the way to start supporting arrays soon.
|
| Disclaimer: I'm a PRQL contributor.
| yevpats wrote:
| Looks interesting but I wonder if this abstraction doesn't leak
| and it doesn't just become a more limited version of vanilla
| python + vanilla SQL.
| aerzen wrote:
| Vanilla python? Where do you get this impression?
|
| The leak to SQL is trough s-strings, which are chunks of SQL
| that gets inlined into the resulting SQL. The long-term plan is
| to remove them completely, but they are needed as an escape-
| hatch for now.
| yevpats wrote:
| I see that PRQL is compiled to SQL. How do you deal with
| different type of SQLs? i.e Snowflake/Postgres/Clickhouse?
| Isn't it a potentially impossible abstraction or at least
| there will be places where you will need to use the native
| support sql language (unless of course databases will adopt
| that in the first place which might be great)?
| sbuttgereit wrote:
| This.
|
| If you are layering an abstraction on top of SQL to avoid the
| SQL, you fail that goal once something goes wrong.
|
| The reason you fail the goal is that you end up having to have
| skills to troubleshoot the SQL anyway. Wrong here can be: bad
| result, bad performance, or even just bad syntax. Unless you're
| ready to accept being helpless and rely on the knowledge of
| others... you'll want to look at what's actually hitting the
| database to know if it's a database problem, a PRQL problem, or
| problem in your own code and to do that we'll you better be
| well versed in SQL, your database, and how your DB vendor have
| implemented their flavor of SQL. So to use this DSL truly
| competently, you need to know everything that you'd need to
| know to write plain SQL... and then PRQL and how it might
| translate into the target SQL of your application. And taking
| PRQL on its own it doesn't really seem any simpler than SQL
| from what I can see... what you get in increased compositional
| simplicity you have to pay back in understanding its own
| quirks.
|
| To be fair this approach can be worth it, but I don't think for
| a stand alone DSL like PRQL. For example, in .NET land there's
| C# and Linq where you get a query DSL... but you also get a
| more seamless integration into C# and its paradigms. Same could
| be said of Elixir and the Ecto query DSL; all the other things
| I above remain true, but the DSL brings the value of clean
| integration in to the code base of your app.
|
| Needless to say, I understand the why they'd try something like
| this... but I'm skeptical that the marketing here seems to
| oversell what you end up really getting.
| bob1029 wrote:
| Forwarding this to some of my team members right now.
|
| We use SQL for a lot of things, almost all of them implementation
| of customized business rules. When attempting to develop a
| business rule that needs to manifest as a single SQL query, your
| brain is forced to work in a very particular crystalline way.
| Many business rules are defined in procedural terms at
| requirements time, so having a way to implement them in the same
| could be beneficial.
| evantbyrne wrote:
| Has anyone benchmarked it or the bindings? SQL tends to be
| embedded directly in software programs, so building would likely
| happen on the fly in real-world cases.
| karmakaze wrote:
| How are joins handled? A relational db is largely about the
| relations, otherwise this is more of a document query language.
| athenot wrote:
| I had the same question. From the docs, it looks pretty
| elegant: from employees join
| side:left positions (employees.id==positions.employee_id)
|
| which translates to SELECT
| employees.*, positions.* FROM
| employees LEFT JOIN positions ON employees.id =
| positions.employee_id
| qorrect wrote:
| I like it too. You can also alias them easily
| from e=employees join p=positions
| (e.id==p.employee_id)
| trollied wrote:
| positions p
|
| is a SQL alias. Was that too difficult?
| kbenson wrote:
| It's less about it being difficult, and more about it
| being easy, obvious to read/write, and consistent within
| the _new_ design. Was that not obvious?
| slotrans wrote:
| Hot takes: SQL is good. Great, in fact.
|
| Not just in what it's capable of doing, but in _form_. It has
| warts, yeah, but over the years I 've been writing it I've
| realized most of them are there for a reason. To me it's more
| than useful, it's _beautiful_. It 's the absolute last thing I
| want a replacement for.
| cowthulhu wrote:
| This is insanely cool. It'll be interesting to see if this gets
| decent adoption, and whether the query language ends up being
| easier to maintain than sql.
| kot-behemoth wrote:
| Just a heads up that Prefect integration link on the landing page
| doesn't work - 404. This is the one I was interested in!
| maximilianroos wrote:
| Thanks! We just did a big refactoring so we do have some links
| failing in our tests. Just PR-ed a fix to this one:
| https://github.com/PRQL/prql/pull/3075
|
| (PRQL dev here)
| snthpy wrote:
| Sorry that that was your first experience. I have opened an
| issue for this for you:
| https://github.com/PRQL/prql/issues/3074
|
| The correct link is the following: https://prql-
| lang.org/book/project/integrations/prefect.html
|
| TBH, that one is not much of an integration yet. It essentially
| boils down do. $ pip install prql-python
|
| and then import prql_python
| PostgresExecute.run(..., query=prql_python.compile(prql))
|
| I don't personally work with Prefect, so if you have any ideas
| about what you would like to see here, please comment on the
| issue or on Discord. We're also very open to Pull Requests and
| they usually get merged fast ;-)
|
| Disclaimer: I'm a PRQL contributor.
| sakras wrote:
| It's been really cool seeing PRQL come to life! I'm not involved
| with it, but I still remember the very first HN post about a year
| ago where it was just a proposal. And look at it now! It's really
| gaining steam, I'm really excited about it!
| maxloh wrote:
| How does this compare to EdgeQL?
| jakubmazanec wrote:
| This seems like just a compiler from a better language (PRQL)
| to SQL, while EdgeDB is actually a database that comes with not
| just a better language (EdgeQL), but also other great features
| (I love their TypeScript query builder or how incredibly
| frictionless the migrations are).
| glogla wrote:
| Last time this showed up on HN, I complained about it not having
| CASE WHEN equivalent, which I've seen heavily used in ETL
| usecases. Now I see they added it!
|
| Pretty cool and responsive. I'll go take it up for a spin.
| rzmmm wrote:
| I couldn't find a mention if this supports SQL transactions?
___________________________________________________________________
(page generated 2023-07-25 23:00 UTC)