[HN Gopher] Show HN: PRQL 0.2 - a better SQL
___________________________________________________________________
Show HN: PRQL 0.2 - a better SQL
Hi everyone -- thanks for your interest in PRQL -- let us know any
questions or feedback! We're excited to be releasing 0.2[1], the
first version of PRQL you can use in your own projects. It wouldn't
exist without the feedback we got from HackerNews when we
originally posted the proposal. [1]:
https://github.com/prql/prql/releases/tag/0.2.0
Author : maximilianroos
Score : 239 points
Date : 2022-06-27 17:03 UTC (5 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| ewuhic wrote:
| Previous discussion: PRQL - A proposal for a better SQL
| https://news.ycombinator.com/item?id=30060784
| nawgz wrote:
| Is it true that this is somehow an analogue to how JS development
| is really ultimately targeting browser-compatible-JS in the end,
| even though we use the latest ECMAScript features & TypeScript in
| development? I.e. is it expected someone writes PRQL and then
| transpiles before executing against the database? Is there a REPL
| one can use against a local Postgres or something?
| maximilianroos wrote:
| > Is there a REPL one can use against a local Postgres or
| something?
|
| Somewhat -- you can use it in Jupyter now[1]; e.g.:
| %%prql from p = products.csv group categoryID (
| aggregate [average unitPrice] )
|
| This doesn't yet have the benefits we'd get from e.g.
| autocomplete, so there's much more to do there.
|
| There's also a cool TUI in PyPrql[2]
|
| [1]: https://pyprql.readthedocs.io/en/latest/magic_readme.html
|
| [2]: https://pyprql.readthedocs.io/en/latest/readme.html
| notimportant0 wrote:
| I mostly work using T-SQL but I like PRQL.
|
| Are you able to add in the examples the following:
| 1) Use of delimiters for names that include space, etc. I don't
| know if PRQL uses double quote or square brackets. 2)
| Use of two/three/four-naming convention to refer to servers,
| databases, tables and columns.
| exabrial wrote:
| I don't really find SQL that difficult, other than NULL !=NULL, I
| wish <=> could be switched to the default!
| bornfreddy wrote:
| Looks awesome! I don't think it adds much to SQL when the queries
| are simple, but when you have this looong and complex query I can
| totally see the appeal.
| gigatexal wrote:
| I'm a raw-SQL-no-ORM snob and I really like this. I'd like to see
| it become more mainstream.
| dvirsky wrote:
| Nice. A few years ago I designed a query language to do
| aggregations in RediSearch, and it's quite similar in its
| structure.
| https://redis.io/docs/stack/search/reference/aggregations/
| dmeijboom wrote:
| I recently started implementing the Postgres protocol in Rust
| (https://github.com/dmeijboom/postgres-conn). So I guess I'll be
| experimenting with creating a Postgres proxy which translates
| PRQL on-the-fly.
| maximilianroos wrote:
| That looks really exciting! Please keep us in touch with your
| efforts and let us know if there's any way we can be helpful.
| jeroen79 wrote:
| SQL doesn't need fixing or improving if you ask me, its well
| supported and just works.
| vander_elst wrote:
| Congrats for the milestone!! The syntax looks more intuitive than
| SQL. Great to see viable alternatives to SQL!
| ssalka wrote:
| I just noticed the play on words "prequel" vs "sequel". Nice.
| slotrans wrote:
| Hot takes: SQL is great, actually. This thing isn't better.
| breakfastduck wrote:
| As much as I love SQL it can often be a pain and involve lots
| of nested subqueries to do 'simple' things. I like this way
| this abstracts it.
|
| Would I use this instead of proper SQL in a data warehouse /
| large app? Maybe not.
|
| Would I use it to manually query DBs when I need some ad hoc
| info? For sure.
| notimportant0 wrote:
| If this becomes a full-fledge DQL that can be used in a proc
| or function in a running Postgres instance, I would use it in
| production.
| mellosouls wrote:
| No, but it's not as foreign a paradigm or language to
| (presumably?) its main target audience - developers.
| [deleted]
| emmelaich wrote:
| Doesn't the pipelining mean that it's not declarative? At least
| not in that part.
| OJFord wrote:
| Somehow I didn't see it coming -
|
| > pronounced "Prequel".
|
| - and I burst out laughing. Very good.
| michelpp wrote:
| It seems like an obvious next step for Postgres support would be
| to make PRQL a stored Procedure Language.
|
| create function foo() returns bar as language prql $$<prql code
| here>$$;
| ranjanprj wrote:
| Great project, was looking something like this
|
| but wish this was somehow encoded as JSON, so you could easily
| build pipeline UI for complex SQL Generation.
| cogman10 wrote:
| Awesome to see the progress here. Looks like the language has
| significantly matured since last it popped up on HN.
| oarabbus_ wrote:
| Is it correct there's no CASE WHEN and instead you have to define
| a function using a ternary operator? CASE WHENs may be verbose
| but when you have a dozen of them they're more readable and the
| waterfall nature is far preferable to a giant block of ternary
| clauses.
| qolop wrote:
| Why should I use this instead of SQL?
| snthpy wrote:
| That's a really good question! (and one we should probably
| answer explicitly in the [FAQ](https://prql-lang.org/faq/)
| rather than just implicitly)
|
| The README states that "PRQL is a modern language for
| transforming data -- a simple, powerful, pipelined SQL
| replacement. Like SQL, it's readable, explicit and declarative.
| Unlike SQL, it forms a logical pipeline of transformations, and
| supports abstractions such as variables and functions. It can
| be used with any database that uses SQL, since it transpiles to
| SQL."
|
| What that means to me is that PRQL more naturally maps onto how
| I think about and work with data.
|
| Say I have some dataset, `employees`, and I want to answer some
| questions about it like, for US employees, what is the maximum
| and minimum salary and how many employees are there:
| from employees filter country == "USA"
| # Each line transforms the previous result. aggregate [
| # `aggregate` reduces column to a value. max salary,
| min salary, count,
| # Closing commas are allowed :) ]
|
| Moreover, after each line you have a valid pipeline which you
| can transform further by adding more steps/lines to your
| pipeline. This matches more closely how people construct data
| pipelines in R using dplyr/tidyverse and in Python using
| Pandas.
|
| If you find that it doesn't map well onto how you think about
| data pipelines then please let us know as we're constantly
| looking for more real world examples to help us iterate on the
| language!
| hui-zheng wrote:
| One benefit of SQL is that the Database Engine will do the
| hard work of optimizing the query plan.
|
| Do you think the SQL complied by PRQL could be as effective
| and optimized by database engine as the direct-written SQL?
| snthpy wrote:
| As you said, let the Database Engine do the hard work of
| optimizing the query plan for you.
|
| I currently have no reason to believe that the PRQL
| generated SQL would be any worse than hand written SQL.
| That said, I don't think we've currently looked at any ways
| of passing hints to the query planner. We're always open to
| suggestions!
|
| In the worst case, you have full access to the generated
| SQL, and for absolutely crucial queries you can hand modify
| that SQL. At least PRQL might have saved you the trouble of
| writing a cumbersome window function or something like that
| (see for example the example of picking the top row by some
| GROUP BY expression).
| Cilvic wrote:
| This reminds me of KUSTO I'm not sure how it compares to SQL
| in general. But it was really fun to work with for querying
| Azure application insigts
| ithrow wrote:
| To avoid working with SQL strings.
| hui-zheng wrote:
| SQL/jinja like dbt could also avoid working with SQL strings.
| what would be the better advantage?
| snthpy wrote:
| There is already an integration for dbt:
| https://github.com/prql/dbt-prql
|
| For example {% prql %} from
| source = {{ source('salesforce', 'in_process') }}
| derive expected_sales = probability * value join {{
| ref('team', 'team_sales') }} [name] group name (
| aggregate (sum expected_sales) ) {% endprql
| %}
|
| would appear to dbt as SELECT
| name, SUM(source.probability * source.value) AS
| expected_sales FROM {{
| source('salesforce', 'in_process') }} AS source
| JOIN {{ ref('team', 'team_sales') }} USING(name)
| GROUP BY name
|
| dbt is definitely a use case we are very aware of and I am
| personally very keen on (since I use that in my $dayjob).
| With some of the ideas in
| https://github.com/prql/prql/issues/381 , I think PRQL
| could really shine in this area!
|
| With your contribution we can get there faster!
| tstack wrote:
| I work on a TUI logfile viewer that uses SQLite as a backend
| for doing analysis on the log messages (https://lnav.org).
| However, writing SQL interactively is painful since you can't
| really provide good auto-complete or preview, which is
| something I try to provide for most other operations.
|
| The PRQL pipeline syntax would make for a much better
| experience for lnav since you're able to progressively refine a
| query without having to jump around. (You've probably noticed
| that many log services, like Sumologic, already provide a
| pipeline-style syntax instead of something SQL-like.) The nice
| thing is that you can simply keep typing to get the results you
| want and get a preview at each stage. For example, entering
| "from" and then pressing <TAB> would make it clear to the
| program that table-names should be suggested. The program could
| then show the first few lines of the table. Typing "from
| syslog_log | filter " and then pressing <TAB> would make it
| clear that columns from the syslog_log table should be
| suggested (along with some other expression stuff). And, then,
| the preview of the filtered output could be shown.
|
| In the current implementation, pressing <TAB> just suggests
| every possible thing in the universe, whether it's appropriate
| or not. This leaves the poor with not much help after they've
| typed "SELECT". I find myself having to lookup docs/source to
| figure out column names or whatever and I wrote the darn thing.
| Ultimately, I think the analysis functionality just doesn't get
| used because interactively writing SQL is so user-hostile. So,
| I'm looking forward to seeing this succeed so that I can
| integrate it and still be able to use SQLite in the backend.
| davidw wrote:
| A good example might be a groupwise maximum. Those always tend to
| be a bit of a PITA in SQL if you're not writing them regularly.
| Be interesting to see what it transpiles to, as well.
| aerzen wrote:
| If you only want maximum of one column, the PRQL is quite
| simple: from my_table group column_a
| ( aggregate (max column_b) )
|
| If you want the row with the maximum value it gets interesting:
| from my_table group column_a ( sort
| [-column_b] take 1 )
|
| You can read more about group here: https://prql-
| lang.org/book/transforms/group.html
| beaugunderson wrote:
| Opened an issue as I couldn't get this to work against
| sqlite:
|
| https://github.com/prql/prql/issues/695
| maximilianroos wrote:
| Thanks a lot for testing and opening an issue! This is now
| fixed and released [1]. Let us know if you still face any
| problems.
|
| (We need better tests against real DBs, which is very much
| on our roadmap)
|
| [1]: https://github.com/prql/prql/pull/698
| beaugunderson wrote:
| Thank you for the amazingly quick fix!
| rackjack wrote:
| Obligatory dismissive comment:
|
| > 0.2
|
| No it ain't (in production).
|
| Anyway, this looks great. I LOVE the fact that you've provided a
| book too. Consider me a fan!
| maximilianroos wrote:
| We're definitely not ready for production! Sorry if that was
| implied.
|
| But we are ready for people to start using it in their
| development work. Lmk if there's a better way of describing
| that.
| pgt wrote:
| For those interested in database query languages, it is worth
| knowing about Datalog, the query language behind Datomic, XTDB
| and Datahike: http://www.learndatalogtoday.org/
|
| E.g. a parameterised aggregate query that retrieves the name and
| average rating of a film starring cast members whose names match
| the input names: [:find ?name (avg ?rating)
| :in $ [?name ...] [[?title ?rating]] :where [?p
| :person/name ?name] [?m :movie/cast ?p] [?m
| :movie/title ?title]]
|
| To reveal the answer, click on tab labelled "3" and then "I give
| up!": http://www.learndatalogtoday.org/chapter/7
| paol wrote:
| Here's one suggestion: SQL tediously requires specifying the
| equality condition on joins, when 90% of the time you just want
| to join on the fk defined between the tables.
| from a join b
|
| should implicitly join on the FK if no condition is given.
|
| It would require knowledge of the schema. I don't know if this is
| possible in PRQL, or if the transpilation to SQL has to be
| stateless.
| CuriousSkeptic wrote:
| SQL has that actually
|
| select * from a natural join b
|
| (not based on fk constraints though, it will join on all
| attributes with the same name in the relations)
| smallnamespace wrote:
| If you're willing to sacrifice economics elsewhere, repeating
| the table's name in the id column is one workaround:
| from a join b on b.a_id = a.a_id
|
| You can even use NATURAL JOIN if you can guarantee that the
| only fkey/pkey names will overlap between tables.
|
| An unreasonable way to achieve that is to put the table name in
| every column. A more palatable way is to write some clever
| functions in your schema to scan the information table look for
| column name clashes (you essentially write a tiny "linter"
| inside your schema).
| hyperman1 wrote:
| If you have identical field names, you can do in sql:
| Select * from a join b using (a_id)
|
| Don't do this in Oracle though, pain follows when you try to
| touch an a_id column.
| snthpy wrote:
| You can also do this in PRQL: from a
| join b [a_id]
|
| is the equivalent query.
| aerzen wrote:
| Hello another contributor here!
|
| Compilation does have to be stateless (for performance
| reasons), but we are planning to add some kind of schema
| definitions which could also specify foreign keys.
|
| So joins without conditions would be possible, we'll look into
| it!
|
| What do you think should happen if there are multiple foreign
| keys connecting the two tables? Should this also work for many-
| to-many relations with an intermediate table?
| JohnDeHope wrote:
| "What do you think should happen if there are multiple
| foreign keys connecting the two tables? Should this also work
| for many-to-many relations with an intermediate table?"
|
| If it's not ambiguous, then let me do it. If I rely on
| ambiguity then throw an exception. In the case of multiple
| foreign keys, throw an exception, as there's no way to know
| which one I mean. It'd be nice if I could disambiguate the
| situation though. Normal SQL allows the `on` clause.
| from TableA inner join TableB on <expression>
|
| What if I could specify a foreign key constraint just as
| easily... from TableA inner join TableB
| by ConstraintC
|
| Where ConstraintC is the name of a foreign key constraint
| between Table A and Table B. It'd be nice to specify the
| constraint without having to specify the column name details.
|
| The same goes for the many to many relationship with an
| intermediate table. It could look something like this...
| from TableA inner join TableB through TableC
|
| I wouldn't introduce TableC into the scope of the statement.
| It's not in the FROM clause. It's used in the query but is
| not available for selecting from. If you want to bring in
| columns from it, join on it the usual way.
|
| As applications grow, and initially simple lookup table
| semantics get more nuanced, it might be nice to be able to
| constrain the join on the lookup table like this...
| from TableA inner join TableB through TableC where
| <expression>
|
| That way if my TableC has some extra columns, such as
| effective dates, or deleted flags, or that sort of thing,
| then I can filter out some of the joins that might usually
| happen.
| kbenson wrote:
| Unambiguous things can become ambiguous at later points. As
| soon as you add a second relation between the tables, what
| once was unambiguous now is, and because of something which
| may be entirely unrelated to the specifics of the original
| query.
|
| This is where many conveniences that use implicit data run
| into problems. A small convenience now for the possibility
| of accidentally breaking because of mostly unrelated
| changes later is a poor trade off for anyone that wants to
| have stable and consistent software.
|
| This is likely one of those cases where you're better off
| with tooling to help make writing the correct unambiguous
| code easier (or automated away) than introducing a feature
| which leads to less stable systems in some cases.
|
| Edit: Along the lines of what you note at the end, I would
| rather see joins able to use named relations as defined in
| the schema. Of there's a relation from table movie to table
| actor specifically names roles in the schema, I would
| rather be able to join movie on roles and have actors
| joined correctly using that relation, and aliases to roles
| which I could then use. Then you're using features that are
| designed and stable and not implicit and subject to
| changing how or whether they function based on semi-
| unrelated changes.
|
| That might look like: "from movie relate roles" which is
| equivalent to "from movie join actor roles on movie.id =
| roles.movie_id", but because actor.movie_id has a
| constraint in the schema named roles which restricts it to
| a movie.id already.
| ximeng wrote:
| One way to avoid constraint name collisions is to include
| the base table and foreign table names and keys in the
| constraint name separated by underscores, at which point
| you don't save much by using the constraint in a join.
| aidos wrote:
| This is something you might end up regretting later.
|
| It's annoying adding another foreign key later and then having
| previously working queries fail at runtime due to an ambiguous
| join condition.
| paulhodge wrote:
| Agree about not implicitly finding the join key. But as long
| as we're brainstorming imaginary features, then maybe as part
| of the schema, we could somehow declare the default join key
| to use, for any given two tables. In most cases it's pretty
| obvious what the best join key would be.
| go_prodev wrote:
| I agree with you that it's a pain writing join conditions with
| many fields...
|
| But I think that's a shortcoming of the client tool, rather
| than the language.
|
| If SQL tools auto completed the join conditions as best as they
| could it would probably be a great help.
| skybrian wrote:
| I see that the JavaScript package is at [1] and it's implemented
| by compiling the Rust code to WASM. That should eventually make
| it pretty easy to run it.
|
| It has a typescript definition file, but it looks like it's
| autogenerated and a bit clunky. You get back a CompileResult and
| have to call free() explicitly, it seems? That doesn't seem very
| idiomatic for JavaScript.
|
| Also, the links to the documentation and examples in the README
| are broken.
|
| [1] https://www.npmjs.com/package/prql-js
| maximilianroos wrote:
| Thanks a lot -- issue added:
| https://github.com/prql/prql/issues/708
| aerzen wrote:
| That's true - the package is auto-generated using
| [wasmpak](https://github.com/rustwasm/wasm-pack), that's why
| TypeScript definitions are clunky. I did the initial prql-js
| release and I'm actually not sure about the free() issue you
| are talking about.
|
| We are currently working on compiling it for both Node.js and
| the browser target, and would be happy to see some advice if
| you are familiar with WASM!
| skybrian wrote:
| I didn't actually try it out and I'm not all that familiar
| with WASM. Here is the typescript I see (stripped of
| boilerplate comments):
|
| export function compile(s: string): CompileResult;
|
| export class CompileResult { free(): void;
| readonly error: CompileError | undefined; readonly
| sql: string | undefined;
|
| }
|
| What is the purpose of the free() method?
| digisign wrote:
| Thanks, I've frequently wanted a query language that was designed
| after the 70s. The ideas are sound, but a modernized syntax with
| variables to reuse subqueries would be lovely. This looks like
| it.
|
| I noticed one issue though... please don't copy the prefix of
| f-strings! That only exists because Python boxed itself in and it
| was literally the only ascii syntax left that could be used for
| string interpolation. It's mildly ugly but the best that could be
| done given those requirements. Not so here.
|
| The way shells do it with single quotes producing literal strings
| and double quotes available for interpolation has not been topped
| imho. Triple quotes are a nice extension as well, not sure if
| that made it in.
| oarabbus_ wrote:
| > a modernized syntax with variables to reuse subqueries would
| be lovely.
|
| CTEs provide this functionality already, don't they?
| digisign wrote:
| When I've needed them I've needed them for multiple
| statements, once is not enough. Currently have to use plpgsql
| for this, which is half awesome, half abomination. :-D A
| single simple language sounds easier to learn.
| aerzen wrote:
| Interesting suggestion. We added f-strings because we already
| had s-strings (pass trough to SQL) and r-strings (for raw
| multi-line text).
|
| And would you rather see "My {name}" or "My ${name}"? I
| personally dislike the $ prefix for all variables and
| interpolations...
| psychoslave wrote:
| Languages like Perl, Ruby and more offer plethora of
| additional ways to encode interpolated strings. I especially
| love the squiggly heredoc for multi line quotations
|
| https://infinum.com/blog/multiline-strings-ruby-2-3-0-the-
| sq...
| Kinrany wrote:
| Choose backticks as the quote style for interpolation to
| attract Markdown fans and confuse the hell out of MySQL users
| :D
| digisign wrote:
| The first one, the $ is redundant if braces required. Multi-
| line could be triple quoted. SQL, not sure, maybe sql:'' ?
| ComputerGuru wrote:
| From a mathematical point-of-view are there any
| transforms/operations (note: not end results, but actual
| operations) that this can do that SQL can't or vice-versa?
| scottlamb wrote:
| I assume it can't do anything SQL can't, because they write "It
| can be used with any database that uses SQL, since it
| transpiles to SQL." Not sure about the reverse.
|
| I'm used to SQL syntax, but this has definite appeal. As a
| small example, I like that it starts with the "from" clause, so
| autocomplete is more viable.
| ComputerGuru wrote:
| Thanks, I missed that it transpiles to SQL.
| BeefWellington wrote:
| Transpiling to SQL doesn't mean all the underlying SQL
| features are being exposed to you.
| scottlamb wrote:
| Yes, that's what I meant by "not sure about the reverse".
| aerzen wrote:
| As said, currently PRQL transpiles to SQL, so all expressions
| in PRQL are can be expressed in SQL. But not all SQL expression
| can be translated back into PRQL - some intentionally and some
| are just not yet implemented (UNION - i.e. vertical concat).
|
| But we also have plans for doing things that some SQL databases
| may not support, such as pivot (rows to columns).
| coremoff wrote:
| hopefully you'll forgive my pedantry - "union all" is
| vertical concat - "union" without the "all" gives you the
| distinct list
| jhgb wrote:
| One of the reasons why SQL is crap: there should be no
| distinction between the two in relational algebra. A set of
| {A, B, C, B, C} is the same as {A, B, C}.
| ttfkam wrote:
| Detecting duplicates has a cost you can't just hand-wave
| away. UNION ALL tells the engine not to worry about it
| and just output as it sees it, usually going faster.
| Depends on your data needs.
| Beltiras wrote:
| This can be transpiled into SQL which makes it then trivial
| that it can do everything SQL can do. SQL is Turing complete so
| it can do anything PRQL can do.
|
| EDIT: I'm sorry, I didn't realize that even if something
| transpiles from one language to another it does not guarantee
| that one language can generate all strings of another language.
| But taking a look at the abstractions PRQL offers I would be
| very surprised to find it not capable of it.
| ComputerGuru wrote:
| The second part of your statement is fine, but the first part
| is just a complete fallacy.
|
| I can transpile a pure language exposing only `if`, `while`,
| and `for` with no standard library and no interop to C - that
| definitely does not make it "trivial" that it can do
| everything SQL can do.
| Beltiras wrote:
| I realized this after posting and edited the post. Thanks.
| ComputerGuru wrote:
| No problem. (I didn't downvote.)
| MasterIdiot wrote:
| I've seen similar solutions being built internally in multiple
| companies, none with a syntax as well thought out as this.
| Amazing work!
| airtnp wrote:
| This sounds like LINQ, or SparkSQL. Instead of a full new
| language, it makes feel better to create libraries in languages
| that supports embedded DSL easily.
| Flimm wrote:
| Looks fantastic.
|
| There are a lot of rough edges when building a string
| representing an SQL query in the programming language that you're
| using. You have to be careful to avoid SQL injections, for
| starters. Do the bindings for PRQL innovate at this level?
| elforce002 wrote:
| This is really a SQL renaissance.
| roG0d wrote:
| I've been contributing to this project on a few little things due
| to my little knowledge level. But I felt like home with such a
| good company of people!.
|
| I not mature enough to fully appreciate the technical potential
| of the project, but the good ambient, the kindness and the growth
| potential is for sure worthwhile. I truly encourage everyone to
| contribute!
| cogman10 wrote:
| In particular, I think this is looking pretty good and I'd want
| to see even more complicated examples. For example, What do
| window functions end up looking like? [1]
|
| What about crazy operations like calculating percentile_cont?
| [2]
|
| Or just in general, how would "implementation specific" queries
| end up looking?
|
| [1] https://www.postgresql.org/docs/current/tutorial-
| window.html
|
| [2] https://docs.microsoft.com/en-
| us/sql/t-sql/functions/percent...
| maximilianroos wrote:
| Great questions!
|
| Window functions are here [1]. (We should add these to the
| homepage too)
|
| Implementation specific queries can be handled by the Dialect
| parameter [2], though there's still lots of work to do to
| build that out.
|
| [1]: https://prql-lang.org/book/transforms/window.html
|
| [2]: https://prql-
| lang.org/book/queries/dialect_and_version.html
| maximilianroos wrote:
| Window example is now on the homepage, thanks for the
| question: https://github.com/prql/prql/pull/692
| jhgb wrote:
| I wonder, why would you go for a "pipeline" of relational
| operations, when it's strictly weaker than allowing for a tree of
| operations? The way the examples seem to be written, a stack
| machine would subsume the existing syntax, since you first
| specify an operand (like "from employees") and then you specify
| and operation (like "filter country == "USA"), where in a stack
| machine an operation such as "from X" would put the relation X
| onto the top of stack, whereas an operation such as "filter"
| would then replace the top of stack with a transformed relation.
| This could be extended by for example "join on ..." being simply
| an operation consuming _two_ relations from the top of stack and
| putting one result back, joining two pipelines into one.
| jitl wrote:
| I'm surprised that none of the examples on Github or the website
| deals with join. I eventually found some in the "book" here:
| https://prql-lang.org/book/transforms/join.html
| from employees join side:left positions [id==employee_id]
|
| turns into SELECT employees.*,
| positions.* FROM employees LEFT JOIN
| positions ON id = employee_id
|
| I would love to see joins worked into the main learning examples.
| Without join, the examples lack a bit of the "relation" part; we
| could just as easily be compiling a DSL to a chain of
| `array.filter`, `array.reduce`, `array.map` calls. Joins are what
| makes relational modeling interesting!
|
| I would love to see Datalog/SPARQL-style implicit joins to make
| graph traversals like "which users have edited documents I own?"
| less verbose.
| psychoslave wrote:
| I always found that side:left/right should also be expressible
| as rapport:antecedent/consequent as in propositional logic,
| rather than limiting these relationships to the geometric
| representation of Venn diagram.
|
| And maybe a shorter alternative might be tie:arm/leg.
| aerzen wrote:
| I'm not sure if this is a joke, be we actually had a serious
| an idea to replace side:left/right with nulls_left:true and
| nulls_right:true
|
| This part of the join operation should be an after thought -
| just a flag after the central argument of the transform which
| should be the condition you join over.
| maximilianroos wrote:
| Great point, we'll add that.
|
| I don't think we do joins that much better than SQL does. We're
| thinking whether there's potential there, maybe through
| understanding foreign keys -- but we're being conservative
| about introducing change without value.
| igorkraw wrote:
| I looked at the book after this and have to say, I'd heavily
| recommend spending the next dew months _just_ improving joins
| (and complex joins especially). Like GP says, relational
| modelling is the interesting bit about SQL and I don 't feel
| exaggerative in saying the only reason I use SQL are joins,
| and so the only reason I'd introduce the complexity of your
| project into my stack would be if it makes handling joins,
| views and other aspects of relational modeling and slicing
| nicer - one example could be many to many relationships, or
| the gradient between graph/document based and normalised
| table based modeling
| maximilianroos wrote:
| Added: https://github.com/prql/prql/pull/697
| [deleted]
| [deleted]
| p33p wrote:
| This looks great. I've thought about something similar to this
| for quite a while now. Column autocomplete is key for me from a
| quality of life perspective and to make it truly usable.
|
| I'd absolutely love to see the next level of this pipeline be
| continued where something like Observable Plot or ggplot2 like
| functionality where you can take your pipeline data analysis and
| directly plot it to visualize it.
| 1st1 wrote:
| I also recommend looking at EdgeQL --
| https://www.edgedb.com/showcase/edgeql -- a new query language
| aimed to eliminate some of the SQL quirks.
|
| (I'm a co-founder)
| maximilianroos wrote:
| I'm a huge fan of EdgeDB!
|
| Possibly our focus is a bit different -- I see EdgeDB as
| primarily focused on transactional queries, whereas PRQL is
| very focused on analytical queries. PRQL doesn't do quite as
| much -- e.g. we don't model the relationships between entities,
| which is less functional but more compatible.
|
| Feel free to reach out on Twitter if you think there's some way
| of us collaborating, or if you have any feedback or guidance
| for us.
| hardwaregeek wrote:
| I've thought about building a better query language too. I'd love
| the ability to model sum types in databases, something like:
| enum SchoolType { College { degrees:
| Vec<Degree> }, HighSchool }
|
| It's such a common pattern and yet it's so annoying to model in a
| normal relational database. I wouldn't be surprised if the rise
| of NoSQL is tied to the inability of relational databases to
| model basic patterns like this.
|
| Part of me has wondered if a language is the solution. Maybe just
| a better query builder with support for sum types is necessary.
| But I suppose there's something useful about having a consistent
| model based around a language, even if people aren't writing the
| language directly.
| jhgb wrote:
| I'm not sure that relational databases are "unable" to model
| something like this, since I recall from years ago Date
| describing how to do something like this. Don't remember the
| details, but you might want to look into Date's writings.
| BeefWellington wrote:
| What is annoying about implementing something like this in a
| relational database?
| AtlasBarfed wrote:
| Auto-completion sucks in a lot of sql statements because the
| table provides all the hints that good autocompletion would
| need to provide good suggestions.
|
| That sounds like a nitpick, but man is it useful when you
| need it.
|
| Notice how the first thing in PRQL is the table declaration.
|
| The fact that UPDATE and INSERT have different syntaxes for
| basically specifying the same mutation operation is pretty
| dumb.
| andydd wrote:
| Have you tried Datagrip?
| vore wrote:
| It's not straightforward to do polymorphic joins: one common
| pattern is to have child tables for each case of the union,
| but there's no integrity constraint such that each parent
| must only have one child, e.g. CREATE TABLE
| schools (id SERIAL PRIMARY KEY); CREATE TABLE colleges
| (id INTEGER NOT NULL REFERENCES schools (id)); CREATE
| TABLE high_schools (id INTEGER NOT NULL REFERENCES schools
| (id));
|
| How can you ensure that a school is either a college or
| high_school but not both?
|
| Another alternative is to make one big table with check
| constraints but that's also hairy in its own right:
| CREATE TYPE school_type AS ENUM ('college', 'high_school');
| CREATE TABLE schools ( id SERIAL PRIMARY KEY,
| type school_type, /* college columns */, /*
| high school columns */, CHECK (type = 'college' AND
| /* college column constraints */), CHECK (type =
| 'high_school' AND /* high school column constraints */)
| );
|
| The other thing in the grandparent's comment that's a
| constant pain in SQL is representing an ordered list: how do
| you insert items into the middle of the list? Depending on
| your database, it can also be painful to renumber the other
| items.
| roller wrote:
| A combined approach works if want to encode the exclusive
| constraint: CREATE TYPE school_type AS
| ENUM ('college', 'high_school'); CREATE TABLE
| schools ( id SERIAL PRIMARY KEY, type
| school_type, unique (id, type) );
| CREATE TABLE colleges ( id INTEGER NOT NULL,
| type school_type default 'college', check
| (type='college'), foreign key (id, type)
| references school(id, type) );
|
| Ya, the syntax is annoying and repetitive. It would be nice
| if foreign key could be a literal to remove the extra
| column altogether. e.g.: foreign key (id,
| 'college') references school(id, type)
| vore wrote:
| Good point, I hadn't thought of that. Thanks!
| go_prodev wrote:
| > How can you ensure that a school is either a college or
| high_school but not both?
|
| Do you have any real world scenarios where you've faced
| this problem?
|
| In your example, you wouldn't model it like that. A school
| just needs an attribute that identifies the type of school
| (high school or college), and other attributes that would
| be common to both.
|
| I'm sure there's lots of examples but it's late and I'm
| struggling to think of one that a good normalized data
| model couldn't handle.
| rq1 wrote:
| You add a xor non null check on the foreign keys?
| andyferris wrote:
| I totally agree with this. Interesting point about NoSQL!
|
| I'm not sure if it's just the query language though - the
| definition language needs to make creating columns that are sum
| types trivial. For one-to-many data this might be a slight
| generalization of foreign key (compound of table tag + foreign
| key for that table). This can work for one-to-one data too, but
| can be a bit annoying having lots of tables compared to doing
| adding a couple nullable columns (plus there's also data
| locality differences). I suppose a wrapper language that covers
| both DDL and DML could work.
| srcreigh wrote:
| You'd have 1 table per sum type which requires extra data. Then
| polymorphic foreign key (aka a pair of fields school_type,
| school_id).
|
| (No foreign key constraints, but those are falling out of use
| in some cases due to inability to online migrate mysql schemas
| anyways.)
| ivank wrote:
| You can retain foreign key constraints by having one column
| per type of reference. It is also possible to ensure that
| exactly one column of several is NOT NULL, so that the
| columns can always be mapped to an enum in application code.
| Also, in PostgreSQL, the storage for the extra NULLs uses
| just one bit per column in a bitmap. CREATE
| TABLE dirents ( parent bigint NOT NULL,
| child_dir bigint, child_file bigint,
| child_symlink bigint, basename text
| NOT NULL, -- Ensure exactly one type of
| child is set CHECK (num_nonnulls(child_dir,
| child_file, child_symlink) = 1),
| CONSTRAINT dirents_child_dir_fkey FOREIGN KEY (child_dir)
| REFERENCES dirs (id), CONSTRAINT
| dirents_child_file_fkey FOREIGN KEY (child_file)
| REFERENCES files (id), CONSTRAINT
| dirents_child_symlink_fkey FOREIGN KEY (child_symlink)
| REFERENCES symlinks (id), PRIMARY KEY
| (parent, basename) );
___________________________________________________________________
(page generated 2022-06-27 23:00 UTC)