[HN Gopher] PRQL - A proposal for a better SQL
___________________________________________________________________
PRQL - A proposal for a better SQL
Author : maximilianroos
Score : 324 points
Date : 2022-01-24 17:24 UTC (5 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| eatonphil wrote:
| I'm really excited about languages that build on or are compiled
| to SQL, in the long-term (because I think it will take a very
| long time to build adoption).
|
| The ones that particularly excite me are shorthands for SQL, even
| though their heavy use of symbols may be a detriment. One
| particular use case is in easily defining static authorization
| policy-queries that are backed by database data plus and have
| request variables injected during evaluation.
|
| I am not very excited by datalog/prolog-based languages because I
| think logic languages are too unnatural to ever go mainstream.
| But I'd be excited to be wrong or for logic languages to become
| more friendly.
|
| Here are some others I'm watching. *
| https://github.com/mrumkovskis/tresql *
| https://www.htsql.org/doc/overview.html *
| https://github.com/cytosm/cytosm
| larodi wrote:
| in a way SQL is Prolog and all reasoning for improvement of SQL
| should start from Prolog, because is where SQL started from.
| the expressive power of both languages is theoretically the
| same, even though SQL is much more comprehensible. but then
| again - certain complex task turn SQL in difficult-to-
| comprehend series of nested declarative operations on algebraic
| sets.
| skissane wrote:
| How does this compare to QUEL?
|
| Or Tutorial D?
| ggrothendieck wrote:
| The link refers to dplyr not being able to use databases but
| actually there is a database backend for it in package dbplyr.
| See https://dbplyr.tidyverse.org/
| divan wrote:
| Obligatory reading https://www.scattered-
| thoughts.net/writing/against-sql/
| lxe wrote:
| Nice. Why OCaml though? I think using a more conventional
| language to construct queries could yield more adoption. It also
| seems that ORMs kinda of exist to tackle a similar issue, at
| least in part.
| HellsMaddy wrote:
| I don't see anything dealing with OCaml here, other than it
| being listed as an inspiration.
| tpoacher wrote:
| meanwhile I'm still stuck trying to make prolog work xD
| blintz wrote:
| This is a nice idea, especially given all the work people have
| done recently to make in-language querying nicer (Spark comes to
| mind).
|
| My only gripe is the 'auto-generated' column names for
| aggregates. This seems like a recipe for disaster - what if there
| is already (as there almost certainly will be) named
| "sum_gross_cost"? The behavior also just seems rather unexpected
| and implicit. My suggestion would be simple syntax that lets you
| optionally give a name to a particular aggregate column:
| ... filter gross_cost > 0 aggregate by:[title,
| country] [ average salary, sum
| gross_salary, average gross_cost, let
| sum_gc = sum gross_cost, count, ]
| sort sum_gc
|
| While it might seem a little uglier, it seems much more
| sustainable in the long run. If this is really too gross, I'd
| advocate some token other than underscore that is reserved for
| aggregation variables; perhaps `sum@gross_cost` or
| `sum#gross_cost`.
| maximilianroos wrote:
| Definitely -- giving the option of naming them is great.
|
| I'm not sure whether we should force naming? When I'm writing a
| query often I'm fine with something auto-generated when
| starting out.
| JimmyRuska wrote:
| Wow, between this and Malloy , there's a lot of great ideas.
|
| It'd be great if ideas from sparql/datalog/datomic were also
| picked up. It's easier to write recursive queries.
| jacktheturtle wrote:
| Do people still write SQL?
| didip wrote:
| Of course, and it is vastly better than using an ORM.
| [deleted]
| mrweasel wrote:
| You kinda have to. Assuming that you're using an ORM, you still
| need to understand how it translate to SQL, and help it do the
| translation correctly.
|
| Personally I've seen developer use the Django ORM, and create
| application with terrible performance. Tweaking the queries,
| you can help guide the ORM to generate better SQL, which in
| turn will affect your performance greatly.
|
| We're currently facing a problem with a custom who have an
| application with terrible performance/scaling issues. The
| entire thing is very database heavy, but interaction is done
| solely via Hibernate. I have nothing against Hibernate, it's a
| fine ORM, but you need to understand it well enough that you
| can guide it towards better queries (Which sometimes involve
| actually writing SQL). At some point you need to decide if your
| time isn't better spend learning SQL directly, as that via
| always provide you with better access to the functionality
| provided by the database.
| onlyrealcuzzo wrote:
| It's regularly listed in the top 5 most commonly used
| programming languages in the world [=
| MrBuddyCasino wrote:
| Do people still use ORMs?
| edgyquant wrote:
| Of course. Large companies like Pepsi have teams of analysts
| that only write SQL. I applied for a programming job there a
| long time ago and didn't follow up when they explained in the
| interview that's the only language they used.
| badhombres wrote:
| Absolutely. If I use a SQL db for my applications (I'm a
| software dev for context), I generally write raw SQL vs using
| an ORM. I find the long term issues of an ORM to not be worth
| investing and understanding SQL.
|
| I'm also not having to learn a new library, in addition to the
| standard DB connection libraries, ~if~ when I switch a language
| or platform for some project.
| drpotato wrote:
| Yes, quite a lot I would imagine. I use it extensively at work
| and similarly sql heavy software companies in the past. That
| being said, I've also worked at places where they've avoided it
| like the plague - largely because few people were competent at
| it - and were moving away from relational DBs due to scale.
| roveo wrote:
| In the world of data analytics/BI are many people whose main
| job is writing SQL.
| extrapickles wrote:
| I prefer to write SQL as most alternatives ether have runtime
| surprises or require more roundtrips to the database. I mostly
| work on line-of-business software, so if I was doing simple
| CRUD apps I might have a different opinion.
| slaymaker1907 wrote:
| This seems very similar to Kusto/KQL.
| cryptonector wrote:
| The most important dialect of SQL we should get is SQL-with-no-
| literal-values so we can force the use of query parameters, and
| then not have SQLi.
| BeefWellington wrote:
| This is another in a series of these kinds of proposals that look
| excellent on first glance for perhaps the 75% case but start
| getting syntactically messy when I want to customize the
| resultset returned.
|
| On the surface, they're always neat but when you start to dig
| into how you'd implement something in an RDBMS, it begins to fall
| apart.
|
| Let's look at the example syntax: from
| employees filter country = "USA"
| # Each line transforms the previous result. gross_salary
| = salary + payroll_tax # This _adds_ a column /
| variable. gross_cost = gross_salary + healthcare_cost
| # Variable can use other variables. filter gross_cost > 0
| aggregate split:[title, country] [ # Split are the
| columns to group by. average salary,
| # These are the calcs to run on the groups. sum
| salary, average gross_salary, sum
| gross_salary, average gross_cost, sum
| gross_cost, count, ] sort
| sum_gross_cost # Uses the auto-
| generated column name. filter count > 200 take 20
|
| Where in here is it clearly stated which fields are returned? In
| the original SQL it's right up front but here it's buried into
| the "aggregate" function, and I'm not clear that this isn't an
| oversight.
|
| Another example that speaks to the "how do I implement this" side
| of the equation: from employees filter
| country = "USA" # Each line transforms
| the previous result. gross_salary = salary + payroll_tax
| # This _adds_ a column / variable. gross_cost =
| gross_salary + healthcare_cost # Variable can use other
| variables. filter gross_cost > 0
|
| Does this mean that the database must scan all records of the
| employee table in order to return the result before moving to the
| next step in the query? Must I index all fields? If not, how does
| a query planner prepare for this scenario?
|
| The major tradeoff you make in most ORMs is exactly this: You
| lose out on being able to be explicit about how many queries are
| sent to the DB (and in many cases how efficient those queries
| are). Now this would become a language feature? What do I gain
| for that loss?
|
| I'm not saying that SQL Syntax is perfect; far from it. I'm not
| seeing how this is an improvement.
|
| I think if you want traction though, a proof of concept using an
| existing RDBMS would go a long way into providing evidence that
| this will work and is sufficiently thought out to deal with even
| the basics of what existing SQL databases have to. Query planning
| is hard, especially if you want it to be fast.
| drittich wrote:
| I don't think it's falling apart at all. Personally, I would
| require that what columns get returned be explicit (optionally
| with a * type syntax that you have to enable - the defaults
| should be safe and * has its risks). For one thing, you don't
| necessarily want to return all the columns you have aggregated.
| E.g., you may be running the equivalent of a HAVING clause on
| an aggregate column, so don't need the value returned.
|
| "Each line transforms the previous result." - I assume this is
| referring to the order that transpilation happens, so you can
| read it top to bottom and understand the flow easily.
|
| One thing I would like to see is how a recursive CTE might
| look.
| tomtheelder wrote:
| > Where in here is it clearly stated which fields are returned?
| In the original SQL it's right up front but here it's buried
| into the "aggregate" function, and I'm not clear that this
| isn't an oversight.
|
| It's in the aggregate portion, like you said. Other example
| queries have a select portion. Why does it matter that it's not
| in the leading position like SQL?
|
| > Does this mean that the database must scan all records of the
| employee table in order to return the result before moving to
| the next step in the query? Must I index all fields? If not,
| how does a query planner prepare for this scenario?
|
| No, they are just describing how the statement is supposed to
| be interpreted by a human. I think you can basically just
| shuffle all the filter statements to the end and keep it
| logically equivalent.
|
| This is a proposal for a "transpiles to SQL" language. So long
| as that transpliation is predictable, you cannot run into the
| sort of issues you are describing.
| BeefWellington wrote:
| > It's in the aggregate portion, like you said. Other example
| queries have a select portion. Why does it matter that it's
| not in the leading position like SQL?
|
| I don't mind it not being in the leading position. The author
| provided a _very simple_ query and in that case it 's not
| immediately apparent what fields to expect the resultset to
| contain when returned to the consumer.
|
| This is a troubleshooting issue more than anything else. IMO
| placing the "selected fields" into the very centre of the
| query is distracting and obfuscates what is happening.
|
| > This is a proposal for a "transpiles to SQL" language. So
| long as that transpliation is predictable, you cannot run
| into the sort of issues you are describing.
|
| I think a good test of whether any transpiled language works
| well is to look at whether it could work on its own as a
| language. See: Typescript.
| tfehring wrote:
| Very cool! A couple questions/suggestions off the top of my head:
|
| 1. Did you consider using a keyword like `let` for column
| declarations, e.g. `let gross_salary = salary + payroll_tax`
| instead of just `gross_salary = salary + payroll_tax`? It's nice
| to be able to scan for keywords along the left side of the
| window, even if it's a bit more verbose.
|
| 2. How does it handle the pattern where you create two moderately
| complex CTEs or subqueries (maybe aggregated to different levels
| of granularity) and then join them to each other? I always found
| that pattern awkward to deal with in dplyr - you have to either
| assign one of the "subquery" results to a separate dataframe or
| parenthesize that logic in the middle of a bigger pipeline. Maybe
| table-returning functions would be a clean way to handle this?
| maximilianroos wrote:
| > 2. How does it handle the pattern where you create two
| moderately complex CTEs or subqueries (maybe aggregated to
| different levels of granularity) and then join them to each
| other? I always found that pattern awkward to deal with in
| dplyr - you have to either assign one of the "subquery" results
| to a separate dataframe or parenthesize that logic in the
| middle of a bigger pipeline. Maybe table-returning functions
| would be a clean way to handle this?
|
| I don't have an example on the Readme, but I was thinking of
| something like (toy example): table
| newest_employees = ( from employees sort tenure
| take 50 ) from newest_employees join
| salary [id] select [name, salary]
|
| Or were you thinking something more sophisticated? I'm keen to
| get difficult examples!
|
| Edit: formatting
| twic wrote:
| My gut reaction is that if we have "from first" then maybe we
| should have to "to last": from employees
| sort tenure take 50 as newest_employees
| from newest_employees join salary [id] select
| [name, salary]
| mcdonje wrote:
| When you add in the ability to reference different tables
| like that to the piping syntax, it starts to remind me of the
| M query language: https://docs.microsoft.com/en-
| us/powerquery-m/quick-tour-of-...
|
| There, each variable can be referenced by downstream steps.
| Generally, the prior step is referenced. Without table
| variables, your language implicitly pipes the most recent
| one. With table references, you can explicitly pipe any prior
| one. That way, you can reference multiple prior steps for a
| join step.
|
| I haven't thought through that fully, so there may be gotchas
| in compiling such an approach down to SQL, but you can
| already do something similar in SQL CTEs anyway, so it should
| probably work.
| maximilianroos wrote:
| Thanks!
|
| > Did you consider using a keyword like `let` for column
| declarations
|
| Yeah, the current design for that is not nice. Good point re
| the keyword scanning. I actually listed `let` as an option in
| the notes section. Kusto uses `extend`; dplyr uses `mutate`;
| pandas uses `assign`.
|
| I opened an issue here: https://github.com/max-
| sixty/prql/issues/2
| maximilianroos wrote:
| I've added the `let` keyword given a few people commented on
| this.
| maximilianroos wrote:
| I wrote this over the holidays, because I find SQL wonderfully
| elegant in its function, but really frustrating in its form.
|
| Let me know any feedback -- as you can see it's still at the
| proposal stage. If it gains some traction I'll write an
| implementation.
| roberto wrote:
| This looks great! Clear docs and rationale, and the syntax is
| well thought. I'm definitely following this.
| xi wrote:
| Maybe you'd like to check FunSQL.jl, my library for
| compositional construction of SQL queries. It also follows
| algebraic approach and covers many analytical features of SQL
| including aggregates/window functions, recursive queries and
| correlated subqueries/lateral joins. One thing where it differs
| from dlpyr and similar packages is how it separates aggregation
| from grouping (by modeling GROUP BY with a _universal_
| aggregate function).
| maximilianroos wrote:
| This is awesome! I'll add a link to it on PRQL.
|
| I guess the biggest difference between FunSQL (and similarly
| dbplyr) and PRQL is that the former needs a Julia (or R)
| runtime to run.
|
| I really respect the library and keen to see how it develops.
| clarkevans wrote:
| Writing an alternative syntax is straight forward. Perhaps
| prototype PRQL using xi's excellent FunSQL backend? This
| way it's working out of the gate. Once syntax+semantics are
| pinned, writing another backend in the language of your
| choice would then be easier. Getting the backend correct is
| non-trivial work, and xi has done this already. Besides, we
| need a sandbox syntax anyway, so it might be fun to
| collaborate.
| glogla wrote:
| This is cool! I love it.
|
| Few notes:
|
| 1) SQL also allows you to define windowing reusably, like this
| select sum(blah) over window_abc,
| avg(blah) over window_abc from table_xyz window
| window_abc as (partition by x order by y)
|
| so that second example could be written somewhat less
| repetitively but it wouldn't change the whole point.
|
| 2) Sadly my main pain point with SQL for ETL is not possible to
| solve with a transpiler - SQL has exactly one target so doing
| things like "I want these records to go to a table A and those
| records go to table B" is not possible with one query.
|
| 3) It would be cool to see how this does typically annoying and
| repetitive cases from analytics / data warehousing world. I'm
| thinking like SCD1/2 implementation. But I don't even know if
| mutation is there yet.
|
| 4) I would recommend investing in one canonical formatter, like
| Go has. So that there isn't infinite number of ways the same
| query could be formatted for people to argue over preference.
|
| EDIT:
|
| 5) Since this seems to be focused on analytics (by the choice of
| queries and Snowflake in examples), I want to highlight that
| someone suggested to use TPC-H (or TPC-DS) queries as a
| benchmark. It does sound like a good idea.
| adamrezich wrote:
| I like this a lot, and I eagerly anticipate an early version of a
| transpiler to play with!
| hyperpallium2 wrote:
| \tangent What are today's data transformation needs (differ from
| Codd's inspiration?), how does relational algebra serve them, and
| design an "SQL" around that. There's got to be a 10x leapfrog in
| benefit for _some_ niche in there, and that 's the gateway to
| adoption.
| tristanz wrote:
| To get 10x I think you need to wed it to solving broader
| workflow challenges, like dbt does today.
| nassimsoftware wrote:
| It would be definitely interesting to have a TypeScript of some
| sort but for SQL. So a more practical and prettier syntaxe like
| what I'm seeing here that compiles to SQL queries.
| eatonphil wrote:
| TypeScript is more verbose than JavaScript. While I love to use
| TypeScript I don't think I'd categorize it as prettier than
| JavaScript. And practical... well if you mean it is more
| maintainable then yes but if you mean faster to write then no.
|
| I don't want a more verbose SQL I want a less verbose SQL!
| nassimsoftware wrote:
| I meant more the aspect that with TypeScript people would
| prefer to write in it and then compile to JavaScript because
| there is a benefit. With a PRQL with an SQL compilation
| target we would reap the benefit of a more practical and
| better syntax. In both cases they bring benefits but not in
| the same way.
| dirslashls wrote:
| Go to https://sqlframes.com/demo and in the code editor enter
| the following and execute (this example is taken from the first
| example on PRQL github page). It generates SQL, but it also
| computes and displays the results within the browser (though
| the data set below gives no results).
|
| const employees = SQL.values([{ title: 'Developer', country:
| 'USA', salary: 120, payroll_tax: 20, healthcare_cost: 6 }]);
| employees.schemaName = 'employees'; const { groupBy, where: {
| gt, eq, and }, agg: { count, sum, avg } } = SQL; return employe
| es.pdf(SQL.script('[salary]+[payroll_tax]').as('gross_salary'),
| SQL.script('[gross_salary]+[healthcare_cost]').as('gross_cost')
| ) .fdf(and(gt('gross_cost',0),eq('country','USA')))
| .gdf(groupBy('title','country')
| ,avg('salary').as('average_salary')
| ,sum('salary').as('sum_salary')
| ,avg('gross_salary').as('average_gross_salary')
| ,sum('gross_salary').as('sum_gross_salary')
| ,avg('gross_cost').as('average_gross_cost')
| ,sum('gross_cost').as('sum_gross_cost') ,count().as('count'))
| .having(gt('count',200)) .orderBy('sum_gross_cost');
| gibsonf1 wrote:
| SPARQL. Representing human information in relational tables goes
| against how people actually think and use information. We humans
| think in tremendous numbers of nested hierarchies, and recursive
| hierarchy traversal is a nightmare in relational databases. A
| graph is the structure for data that works best, is most
| efficient, and actually reflects how things are connected in our
| brains.
| mindcrime wrote:
| I'm a big fan of SPARQL, but the one thing that would concern
| me about trying to use it outside of the SemWeb context is
| simply that it assumes data is stored in <S,P,O> triples.
| Legacy databases by and large are not, so you need an adapter
| to bridge the representations. And while I know some exist, I
| haven't really used them and am not sure about the performance
| impact.
| mst wrote:
| You can get quite far mapping the triple concept to (PK,
| column, value) or (PK, FK, related-row) and transpiling from
| there.
|
| (I played around with this some years back, not to the point
| where anything came out of it worthy of publishing, but
| enough to be pleasantly surprised how far 'quite far' turned
| out to be in practice)
| fuzzieozzie wrote:
| The time of SQL alternatives
| https://news.ycombinator.com/item?id=30053860
| maximilianroos wrote:
| Yup, I posted that yesterday too. I think Malloy is really
| interesting -- compile to SQL but give more integrations to the
| DB, like schema-during-development. It has a proper team, led
| by Lloyd Tabb.
| 999900000999 wrote:
| >Compatible -- PRQL transpiles to SQL, so it can be used with any
| database that uses SQL. Where possible PRQL can unify syntax
| across databases. PRQL should allow for a gradual onramp -- it
| should be practical to mix SQL into a PRQL query where PRQL
| doesn't yet have an implementation.
|
| Awesome.
|
| I hate SQL so much, I know for personal projects this is gold. I
| imagine actually using it at work might draw some questions
| though
| hyperpallium2 wrote:
| nice name
| peoplefromibiza wrote:
| This is the kind of things that a well crafted DSL can solve, if
| the language you use supports macros.
|
| PRQL looks very similar to Ecto, the Elixir Query DSL
|
| https://hexdocs.pm/ecto/Ecto.Query.html
| magicalhippo wrote:
| Now this is actually nice, unlike the other suggestion posted
| today[1].
|
| Maybe I'm just too used to non-standard extensions of our
| database but the SQL example could, at least for our db, be
| rewritten as SELECT TOP 20 title,
| country, AVG(salary) AS average_salary,
| SUM(salary) AS sum_salary, AVG(gross_salary) AS
| average_gross_salary, SUM(gross_salary) AS
| sum_gross_salary, AVG(gross_cost) AS
| average_gross_cost, SUM(gross_cost) AS
| sum_gross_cost, COUNT(*) as count FROM (
| SELECT title, country,
| salary, (salary + payroll_tax) AS gross_salary,
| (salary + payroll_tax + healthcare_cost) AS gross_cost
| FROM employees WHERE country = 'USA' ) emp
| WHERE gross_cost > 0 GROUP BY title, country
| ORDER BY sum_gross_cost HAVING count > 200
|
| This cuts down the repetition a lot, and can also help the
| optimizer in certain cases. Could do another nesting to get rid
| of the HAVING if needed.
|
| Still, think the PRQL looks very nice, especially with a "let"
| keyword as mentioned in another thread here.
|
| [1]: https://news.ycombinator.com/item?id=30053860
| ako wrote:
| With a CTE it would read a bit more like prql:
| with usa_employees as ( SELECT title,
| country, salary, (salary +
| payroll_tax) AS gross_salary,
| (salary + payroll_tax + healthcare_cost) AS gross_cost
| FROM employees WHERE country = 'USA' AND
| (salary + payroll_tax + healthcare_cost) > 0 )
| select title, country, AVG(salary)
| AS average_salary, SUM(salary) AS
| sum_salary, AVG(gross_salary) AS
| average_gross_salary, SUM(gross_salary) AS
| sum_gross_salary, AVG(gross_cost) AS
| average_gross_cost, SUM(gross_cost) AS
| sum_gross_cost, COUNT(*) as emp_count from
| usa_employees group by title, country having
| count(*) > 200 order by sum_gross_cost limit 3
|
| Readability is pretty similar to prql. It would really help in
| SQL if you could refer to column aliases so you don't have to
| repeat the expression.
| magicalhippo wrote:
| > With a CTE
|
| The DB we use supports those, I just learned about them too
| late so keep forgetting they exist :(
|
| > It would really help in SQL if you could refer to column
| aliases so you don't have to repeat the expression.
|
| The DB we use supports that, so in your CTE you could write
| AND gross_cost > 0
|
| We do that all the time, which will be a pain now that we're
| migrating to a different DB server which doesn't.
| jsyolo wrote:
| what expressions are being repeated here?
| oblio wrote:
| > (salary + payroll_tax) AS gross_salary,
|
| > (salary + payroll_tax + healthcare_cost) AS gross_cost
|
| > AND (salary + payroll_tax + healthcare_cost) > 0
|
| And his is a simple example.
| mmsimanga wrote:
| Sybase IQ allows you to use the column alias anywhere else in
| the query.
| gmfawcett wrote:
| Not all database systems can optimize queries well over CTE
| boundaries. I believe this is still true for PostgreSQL (no
| longer true, see below -- it was true a few years ago). So
| there's a potential performance hit for (the otherwise
| excellent advice of) writing with CTE's.
| Rovanion wrote:
| IRC tells me this has been fixed now.
| gmfawcett wrote:
| Awesome news! thank you for sharing this. I found this
| post which confirms IRC and suggests it was an
| improvement in PG 12:
|
| https://paquier.xyz/postgresql-2/postgres-12-with-
| materializ...
|
| Today is a great day to have been wrong on the Internet.
| :)
| correct-me-plz wrote:
| Snowflake lets you refer to column aliases, and it's great!
|
| There's the slight issue of shadowing of table column names,
| which they resolve by preferring columns to aliases if both
| are named the same. So sometimes my aliases end up prefixed
| with underscores, but that's not a big deal.
| dvasdekis wrote:
| Column aliases would have saved me hundreds of hours over the
| course of my career. Sorely missing from standard SQL, and
| would make the need for PRQL less acute.
| iblaine wrote:
| The syntax seems similar to Apache Pig. Both are declarative and
| primarily built to be a procedural form of SQL.
| pmontra wrote:
| First, kudos because it takes courage to take on SQL in this way.
|
| Second, this kind of reversed SQL (filter-first, select-last) is
| much easier to reason about than the original and keep in mind
| that I prefer to code complex queries in SQL than to build or
| translate them in the ORM of the project I'm working on.
|
| Maybe a transpiler is an inevitable first step but I think that
| any SQL replacement should be itself the target of ORMs and run
| directly in the database CLI tools (psql / mysql ...) or IDEs
| (pgAdmin, MySQLAdmin, ...). What's the long term plan of the
| project?
| dragonwriter wrote:
| > Second, this kind of reversed SQL (filter-first, select-last)
| is much easier to reason about than the original
|
| Given that SQL clauses tend to be unambiguously terminated by
| the start of the next clause or the end of the statement, it
| surprises me that no engine has gone to accepting otherwise
| standard(-ish, as much as real DB vendor dialects are) SQL but
| without a mandated order of clauses.
|
| And then combine that with dev tools that allow easy
| rearrangement of clauses, perhaps based on configured
| preferences so that you don't even see the original if its not
| your preferred order, so that "Bob likes old-school SELECT FROM
| WHERE GROUP BY and Alice likes FROM WHERE GROUP BY SELECT"
| isn't a problem.
| maximilianroos wrote:
| Thanks!
|
| I agree that integrating with the DB would allow much more from
| a lang. But PRQL is a bet that languages which _start_ there
| (e.g Kusto) get lost because it requires changing DB, which is
| really hard. I worry EdgeDB may hit this issue too (but I 'm
| really hoping it works, and they have an excellent team).
|
| As I think you're suggesting -- you could imagine a language
| starting out as a transpiler, and then over time DBs working
| with it directly, cutting out some of the impediment mismatch.
|
| Malloy [1] is another point in space -- it targets existing DBs
| through SQL queries but can also ask for schemas etc while
| developing.
|
| [1] https://github.com/looker-open-source/malloy
| _dain_ wrote:
| Does this let you write functions that take tables as parameters
| and return tables? It always seemed weird to me that SQL doesn't
| let you do this.
| akdor1154 wrote:
| Looks really nice, i've been scribbling away in a little notebook
| all the things i would do in "akdor's dream sql", and what you
| have here hits pretty much exactly.
|
| Wondering about generic use of `let` - you have let for col
| defns, but `func` for functions and a TODO for tables/CTEs -
| could/should `let` do the lot? (Like another commenter posted,
| this is how MS's M language, used in PowerQuery in PowerBI and
| Excel works). Could enable an escape from point-free for entire
| queries if taken to extreme generality, not sure if that's a good
| thing, maybe it could be?
|
| Bikeshedding: even with some OCaml/F# experience, i find `f x y`
| harder to read than `f(x, y)`.
| maximilianroos wrote:
| Thanks!
|
| At the moment `let` is used to add a column as part of an
| existing pipeline. [1]
|
| `func` is the start of new expressions / pipelines. And I just
| added a proposal for `table = `, which would be the same.
|
| Does that make sense? Very open to more feedback...
|
| [1] I just added `let` based on feedback here, it's better than
| it was, but not perfect, as it can be confused for a new
| pipeline given its use in other langs.
| nextaccountic wrote:
| I also saw a proposal for a better syntax for SQL, called BQL,
| that was a strict superset but allowed for better modularity
|
| http://intelligiblebabble.com/a-better-query-language-bql-la...
|
| It had this github repo https://github.com/lelandrichardson/BQL
| but never went anywhere
|
| I hope PRQL has a better fate! unfortunately, by deviating from
| SQL lexical conventions (using :, using [], etc) we lose the
| ability to copy-paste from sql code elswhere.
|
| I want a better SQL, but I also want some compatibility. Like
| typescript is for javascript.
| spullara wrote:
| SQL could get a lot better by just adopting the ordering of
| operations like they did with LINQ:
|
| https://docs.microsoft.com/en-us/dotnet/csharp/programming-g...
| tester756 wrote:
| Funnily enough LINQ Query syntax is really uncommon and
| everybody uses method syntax
|
| var list = new List<int>{1,2,3}
|
| var extracted = list
|
| ....................Where(x => x > 1)
|
| ....................Select(x => $"my number: {x})
|
| ....................ToList();
| aloisdg wrote:
| Because the original felt odd in C#.
| louthy wrote:
| > everybody
|
| Not everybody
| betimsl wrote:
| We barely learned SQL for all these years and the guy wants to
| change it now. Thank you but no thank you.
|
| JK. Cool concept and hopefully it catches.
| eximius wrote:
| > A line-break doesn't created a pipeline in a few cases: ... >
| When the following line is a new statement, by starting with a
| keyword such as func.
|
| This feels like it could cause compatibility issues in the
| future.
| jcdreads wrote:
| I like that everyone is trying to make something like SQL that
| reads more naturally to them. More alternatives is good! SQL is a
| widely accepted standard, and has strictly defined and super
| broadly accepted semantics.
|
| As someone who has written quite a few half-baked-for-general-use
| but fit-for-purpose SQL generator utilities over the years, I'll
| suggest that if you intend for a novel syntax to be a general SQL
| replacement then being isomorphic to SQL would massively increase
| usefulness and uptake:
|
| 1. novel syntax to SQL; check! Now novel syntax works with all
| the databases!
|
| 2. any valid SQL to novel syntax; a bit harder, but I'd start by
| using a SQL parser like https://github.com/pganalyze/libpg_query
| and translating the resulting AST into the novel syntax.
|
| 3. novel syntax to SQL back to novel syntax is idempotent; a nice
| side effect is a validator/formatter for "novel syntax"
|
| 4. SQL to novel syntax back to SQL is idempotent; a nice side
| effect is a validator/formatter for SQL, which would be awesome.
| (See also https://go.dev/blog/gofmt, which is where I learned
| this "round trip as formatter" trick.)
|
| I don't mean for this to sound negative, and I know that 2, 3,
| and 4 are kind of hard. Thank you for building prql!
| Nican wrote:
| Also worth looking at KQL: https://docs.microsoft.com/en-
| us/azure/data-explorer/kusto/q...
| leokennis wrote:
| When you said "KQL" I thought you meant "Kibana Query Language"
| but I guess every letter of the alphabet followed by QL is
| already taken twice over...
| CarVac wrote:
| I was reading the criticism and suggestions and wondering what
| they were talking about, it seemed to read perfectly fine...
|
| Then I saw that the project had already been changed in response!
| Impressive.
| erezsh wrote:
| It seems people here are really interested in alternatives to
| SQL. So perhaps you'd also like to have a look at
| https://github.com/erezsh/Preql
|
| (Same name, same goal, different approach, and already working)
| dgudkov wrote:
| >PRQL is intended to be a modern, simple, declarative language
| for transforming data
|
| It's not declarative. It's functional.
|
| I believe that the approach that is followed by PRQL is more
| practical than SQL. We've implemented a similar approach in our
| visual ETL tool for non-technical people (https://easymorph.com)
| and it works wonderfully. Other cool things you can do with this
| approach (and can't with SQL):
|
| * Modify existing columns without re-selecting the whole dataset
|
| * Loops (iterations)
|
| * Conditional IF/THEN/ELSE branching as a workflow statement
|
| * Exceptions & error handling
| beagle3 wrote:
| shakti / K / kdb+ implements "real SQL", which is concise but
| readable, and could give you a few ideas. Here's a copy-paste
| from https://shakti.sh/ under document/sql.d (cannot deep link,
| unfortunately). The most most magical aspects are automatic joins
| - both left joins and "foreign key chase" joins. The fk-chase
| joins, in particular, should be part of _every_ query language,
| and can possibly be added in a backward compatible way to
| existing SQL implementations.
|
| example: TPC-H National Market Share Query 8
| http://www.qdpma.com/tpch/TPCH100_Query_plans.html what market
| share does supplier.nation BRAZIL have by order.year for
| order.customer.nation.region AMERICA and part.type STEEL?
|
| real: select revenue avg supplier.nation=`BRAZIL by order.year
| from t where order.customer.nation.region=`AMERICA,
| part.type=`STEEL
|
| ansi: select o_year,sum(case when nation = 'BRAZIL' then revenue
| else 0 end) / sum(revenue) as mkt_share from ( select
| extract(year from o_orderdate) as o_year, revenue, n2.n_name as
| nation from t,part,supplier,orders,customer,nation n1,nation
| n2,region where p_partkey = l_partkey and s_suppkey = l_suppkey
| and l_orderkey = o_orderkey and o_custkey = c_custkey and
| c_nationkey = n1.n_nationkey and n1.n_regionkey = r_regionkey and
| r_name = 'AMERICA' and s_nationkey = n2.n_nationkey and
| o_orderdate between date '1995-01-01' and date '1996-12-31' and
| p_type = 'STEEL') as all_nations group by o_year order by o_year;
| maxwelljoslyn wrote:
| Thanks for the tip. That automatic "foreign key chasing" looks
| phenomenal. Byebye, much of that big tedious chunk in the
| middle of your 2nd example... Wish I had that for more of the
| SQL I write.
| lijogdfljk wrote:
| I wonder how this compares in practice to EdgeQL,
| https://www.edgedb.com/showcase/edgeql
|
| Offhand i thought PRQL seemed easier to reason about, but
| something about EdgeQL seems better to me.. though i can't
| describe it.
| maximilianroos wrote:
| I see EdgeQL as an excellent replacement for SQL in OLTP
| settings -- it has great language integration and a unified
| relational & typing approach. (Please correct me if this is
| mistaken though).
|
| I wrote the PRQL proposal for analytical / OLAP queries, where
| the pipeline of transformations are more important, and
| relations and typing are relatively less important.
| mongol wrote:
| I think it would be worthwhile to develop a shorthand of the same
| thing, suitable for use on the command prompt. Something using
| symbols as synonyms for keywords. Less eligble but more useful in
| a future when shell tools understand this syntax.
| nurettin wrote:
| To be fair, lateral joins (cross/outer apply in mssql) can help
| with name aliasing and table functions give sql some reusability.
| I think the main pain points for sql are pivots and window
| functions.
|
| A lot of the time you just want to transpose your result, but you
| have to choose an aggregate and handle null cases to force pivot
| to work the way you want it.
|
| And a lot of the time you want to aggregate a window but keep the
| ids of the row so you avoid the having keyword altogether and go
| for row_number and dense_rank to get your aggregate results.
|
| If I were to write a query language, I would discard group by and
| having and make it easier to apply transpose and window
| functions.
| bchammer wrote:
| This looks like a turing-complete language if you add function.
| One benefit or current SQL is that it can (and is) often
| rewritten to be executed more efficiently. This requires a
| simpler and declarative model - far from a programming language.
|
| If you want simple 'forward declaration' take a look at the SQL
| WITH clause.
| hardwaregeek wrote:
| I've always wondered why there aren't query languages that
| embrace algebraic data types and pattern matching. Seems like an
| obvious fit to me. There's many times where you'd want to model a
| table that has either this scheme or that schema.
| jlokier wrote:
| They can work well. In the project I'm working on the database
| uses algebraic datatype keys (i.e. tags and tag-dependent
| columns) to make the database faster and smaller than an
| equivalent relational schema, but the database is used via API
| rather than via a query language.
| jayd16 wrote:
| Check out C# and LINQ. They pull it off with anonymous types.
| haolez wrote:
| This actually looks like an improvement (and I like SQL). This
| feels closer to non-programmers, contrary to some other SQL
| "competitors" like that query language from InfluxDB.
| thanatos519 wrote:
| It definitely scans better than 'Flux' from InfluxDB2.
|
| One thing I like about Flux is the ability to split streams and
| return multiple distinct aggregations. Very handy in Grafana
| dashboards!
| 29athrowaway wrote:
| SQL is like BASIC for data.
|
| It is a language created to read like natural language, to
| facilitate its adoption.
|
| Not only developers use SQL, less technical users also use it,
| especially in the finance industry.
|
| Now, do you use BASIC professional software development? While
| some do, most don't. So why do we use SQL then?
|
| An imperative language that compiles to SQL can be attractive,
| but unless the compilers come with good compiler warnings, we may
| end up created bloated database code.
| ajkjk wrote:
| Just wanna say, I absolutely love this.
| ajkjk wrote:
| One piece of feedback:
|
| "sort sum_gross_cost # Uses the auto-generated column name."
| ... seems like a huge landmine. Languages really should not
| have any implicit way of constructing identifiers (among other
| reasons it is not easily greppable).
|
| You might consider using a syntax like `sum:gross_cost` which
| can function as a sort parameter and an aggregation, but is
| actually recognizable as an object instead of having an
| implicit transformation going on in the background. Like this:
| ... filter gross_cost > 0 aggregate by:[title,
| country] [ sum:gross_cost, ] sort
| sum:gross_cost ...
| topspin wrote:
| > # Uses the auto-generated column name
|
| The fact that someone felt the need to add that comment hints
| at a design mistake. Synthesizing symbols is weird,
| unnecessary and is probably a violation of the principle of
| least surprise.
|
| Otherwise I think PRQL has some value. Nice work. I strongly
| suspect that if SQL looked more like this there would be a
| lot more people willing to use the query language directly
| and perhaps fewer that are compelled to bury it under
| shifting layers of fragile abstractions.
| politician wrote:
| I prefer "from first". When I write a SQL query, 99 times out of
| 100, I'll start with "select * from", then fill in the query,
| then go back and select the columns. After a few basic joins,
| "easy" column names have probably grown aliases or have been
| subsumed entirely. For me, "select * from" is automatic.
| johnthescott wrote:
| using COMMON TABLE EXPRESSIONS (cte) can greatly improve
| readability of complex sql queries. adding "flow" just feels like
| a variation on sequential programming.
|
| i am against "improving" sql. instead, i thing a whole rethink of
| the engineering behind relational engines needs to occur. for
| example, why can't a relational database support both SQL and
| other languages simultaneously, instead of being so black-boxish?
| thelittlenag wrote:
| I worked on a language at one of the big banks that looked very
| similar. The goal was to have a better-than-SQL language on top
| of Spark and I think we succeeded at that very well.
| Unfortunately, politics killed the language and platform.
| deepstack wrote:
| Wouldn't it be better to just use something like prolog, there is
| already and extension for postgres?
| onphonenow wrote:
| Any chance of starting with FROM perhaps to help with
| autocomplete?
|
| From Table or FROMT TABLEA JOIN TABLE B ...
|
| The autocomplete would then be able to introspect much better in
| the tooling side.
| ithrow wrote:
| Queries start with FROM, it's called a README for a reason...
| Ngunyan wrote:
| ridaj wrote:
| Very cool
|
| Design goal question - is the goal to have a language that
| _blindly_ compiles to SQL, or will compilation require data
| schema knowledge?
|
| Suggestion: where possible stick to the well established SQL
| keywords (prefer "group by" to "aggregate by")
| maximilianroos wrote:
| Initially the transpiling would be context-free of the schema.
|
| It would really nice to have context when _writing_ the code,
| so we could do things like autocomplete (IIUC this is what
| Malloy does already).
|
| Are there features you can think of that would be helpful if we
| had the schema context during transpiling?
| ridaj wrote:
| I wasn't thinking it'd be useful, but more that it's good to
| remain as free as possible from any assumptions about the
| schema of the data. I often work with tables that have
| unusually large schemas (> 100MB) and have seen some
| products' performance severely degrade as a function of
| schema complexity.
|
| (But otherwise +1 to schema awareness during authoring.)
| kovek wrote:
| Could there be a tool that would translate PRQL to SQL? One could
| then write in ~/my_scripts/closest_points.prql and then run a
| command to get the sql equivalent and use that in exiting SQL
| tools that do not currently access PRQL (like Postgres).
| ako wrote:
| From the PRQL page: "PRQL transpiles to SQL, so it can be used
| with any database that uses SQL."
| oblio wrote:
| Man, I really hope something like this takes off. SQL is so
| entrenched but it's really showing its age and its pitiful
| standardization.
| oblio wrote:
| To the author, if he's reading this. I wish you luck! To get this
| adopted, try to make it modular so maybe it can be made a core
| module of PostgreSQL, MariaDB, MySQL. If you somehow get into
| those 3 I wouldn't be surprised if Oracle, SQL Server and DB2
| integrate it themselves just to keep up.
|
| Ah, Presto could be another popular target.
|
| It's a huge effort but something which could have colossal
| payback.
| twic wrote:
| This language transpiles to SQL, so it can be implemented
| entirely client-side. No need for modules for the database
| engines, just wrappers round the clients.
| nikkinana wrote:
| jeroen79 wrote:
| to be honest i find this makes it just more complex and less
| readable then normal sql
| LittlePeter wrote:
| Same here. I find the SQL query in the README more readable.
| hackeredje wrote:
| Is this not Linq ?
| https://stackoverflow.com/questions/tagged/linq ?
|
| And then dump the queries via
| https://stackoverflow.com/questions/1412863/how-do-i-view-th...
| or https://www.linqpad.net/ ?
| aloisdg wrote:
| Indeed. It looks a lot like dotnet's Linq.
| ianbicking wrote:
| I like it, it's readable, unlike some SQL alternatives I've seen
| it doesn't make me feel like I'm dumb and don't understand what a
| query even is.
|
| I can't decide if it would be better or worse if it stuck more
| closely to SQL keywords. You use "from" and "select", but not
| "where", "order by", "group by". There's some danger of it being
| in an uncanny valley of SQLish, but I'm pretty sure I'd prefer
| just using those terms verbatim (including the space in "order
| by"... that style is less common in modern languages but it's not
| really that much harder to parse).
|
| I'd like to see more examples of joins and composing SQL. Does
| this language make it easier to make more general SQL queries?
| Can I take two queries and squash them together in a reliable
| way? I feel like I end up with a lot of theme and variation in my
| queries, often involving optional filters.
|
| I might even like a notion of encapsulation that could help this
| query language when it's embedded in other languages. Like if I
| could say, in the language itself, that a query has certain
| unbound variables (and not just ? or other placeholders). This
| language seems like it would be better for generating than SQL,
| and sometimes generation is just necessary (like in any
| application that supports data exploration), but for most common
| cases I'd hope to avoid that. Defining inputs and then making
| whole filter sections or other statements conditional on those
| inputs would help here.
| maximilianroos wrote:
| Thanks!
|
| I just fleshed out composing CTEs, which is a small step
| towards the broader goal of making composition easier:
| https://github.com/max-sixty/prql/commit/dc68fcaaceef26cc078...
|
| Let me know if you have a good case of the sort of composition
| you find difficult in SQL (either here or in an issue). Thank
| you!
| cogman10 wrote:
| Yup, I like a lot of things about the way this looks. In
| particular, I like how friendly this looks to be for things
| like auto complete (pretty annoying to need to practically type
| the entire sql query only to go back and fix up the columns in
| order to get autocomplete to work).
|
| Specific things I'd like to see.
|
| How do you handle column ambiguity. In the examples, they show
| a join of positions to employee on employee_id == id. But what
| happens when you have 2 columns with the same name that you are
| joining on? (like employee_id to employee_id in some mapping
| table).
|
| Subqueries are pretty important in what I do, so what do those
| look like (perhaps covered by the "thinking about CTEs
| section").
|
| How about opportunities for optimization hints? In T-SQL you
| can hint at which index the optimizer should prefer to a
| specific query.
|
| Common SQL patterns would also be interesting. Like, how would
| you do keyset pagination?
|
| Edit: Also, I'd like a discussion about null. SQL null handling
| rules are terrible. I understand them, I work with them, but at
| the same time, they are so different from other languages
| concept of "null" that they are easy to trip over.
| galkk wrote:
| I'm quite opposed to the idea "from should be first".
|
| I want to understand what exactly the query returns, not the
| implementation detail of the source of this data (that can later
| be changed).
|
| Literally first example from page - I have no idea what is being
| returned: from employees filter
| country = "USA" # Each line transforms
| the previous result. let gross_salary = salary +
| payroll_tax # This _adds_ a column / variable.
| let gross_cost = gross_salary + benefits_cost # Variables can
| use other variables. filter gross_cost > 0
| aggregate by:[title, country] [ # `by` are the
| columns to group by. average salary,
| # These are the calcs to run on the groups. sum
| salary, average gross_salary, sum
| gross_salary, average gross_cost, sum
| gross_cost, count, ] sort
| sum_gross_cost # Uses the auto-
| generated column name. filter count > 200 take
| 20
|
| of course, similar things are happening to SQL too, with CTEs
| becoming more widespread and "real" list of the columns hidden
| somewhere inside, but it's still parseable
| quocanh wrote:
| I agree that the columns of the results should be more obvious.
| But I am a proponent of "from should be first". I have never
| written a SQL query without thinking about the contents of a
| table or its relations. If it was my way, I would describe
| where the data I'm pulling from, then describe any
| filters/joins, then describe the columns that I'm interested in
| (last).
| phailhaus wrote:
| SELECT id, name, author
|
| Quick, what is this query about? What's ironic is that I think
| you have it backwards: the columns are the implementation
| detail, not the table. The table is the context: you can't
| change that without having to change everything else. But
| columns are the last step, the selection after the filters,
| joins, etc. They can be changed at any time without affecting
| the logic.
| taeric wrote:
| This is... An odd choice. I'd assume I'm not without context
| looking at a query to know why I would want those columns.
|
| And the auto complete story is backwards. Often I know what
| columns I want, but I'm not clear what table I need to get
| them from. Such that, if you make a smarter suggest in the
| from to only include tables that have the columns, I'd be
| much happier.
| sanderjd wrote:
| Just throwing in another point of anecdata onto this pile:
| "Often I know what columns I want, but I'm not clear what
| table I need to get them from" does not make sense to me. I
| don't relate at all to their being a global namespace of
| columns, rather than a namespace of tables, each with its
| own columns specific to its context.
| taeric wrote:
| I challenge this. I accept that there are ambiguities,
| but I assert that you can go really fast by just telling
| someone to fetch a few columns by name.
|
| I further assert that if your database is filled with
| "Id" and "name" columns, instead of "department_name" and
| similar, you are probably as likely to mess up a join as
| any benefit you get from the name being short. (And
| really, what advantage is there in short names nowadays?)
|
| That all said. I worded my take too strongly. My point
| should have been that auto suggest should not be confined
| in either direction.
| samatman wrote:
| I'm also completely unfamiliar with the PRQL syntax, outside of
| right now.
|
| Reading the comment however, it would seem that `let` adds
| columns which are implicitly returned in the order they are
| defined.
|
| I do see benefits in this, and can imagine pitfalls. Hard to
| judge without kicking the tires.
|
| Update: It's quite possible we saw different syntax!
|
| https://news.ycombinator.com/item?id=30063266
|
| Without the `let` I would imagine having trouble reading it as
| well, I'm not sure if that would go away with familiarity but
| my instinct is that it's a useful addition.
| inglor wrote:
| The big advantage of "from first" like we have in Kusto KQL (a
| database we use at Microsoft) is that it provides much better
| autocomplete (if I write the `from` it can easily autocomplete
| the projection).
|
| If you want an interesting example of how a query language
| built for developer experience and autocompletions looks
| definitely check it out!.
| majkinetor wrote:
| Too bad we can't use Kusto with anything except Azure.
| _jal wrote:
| Designing languages around autocomplete is like designing
| toilets for better toilet paper dispensers.
|
| The language should be right for human understanding, not
| automated mad-lib generation.
| Xelbair wrote:
| And one of use cases is writing queries which it helps
| immensely. Best of both worlds would allow both orders.
| Just automatically transform the query to the usual form
| after it's execution.
| phailhaus wrote:
| Building for autocomplete _is_ building for human
| understanding. If it is impossible for a computer to
| determine the context of your query, why would a human do
| much better?
| [deleted]
| _jal wrote:
| They are not fully-aligned goals, and autocomplete should
| not be given equal consideration on par with human
| clarity.
|
| If you want nice autocomplete too, that's fine, but if
| there is a tradeoff, human understanding is the primary
| concern.
| tester756 wrote:
| I don't understand why do you think about it this way
|
| C#'s LINQ (really powerful tool similar to SQL) works the
| same way
|
| look:
|
| var list = new List<int>{1,2,3}
|
| var extracted = list
|
| .............................Where(x => x > 1)
|
| .............................Select(x => $"my number:
| {x})
|
| .............................ToList();
|
| or
|
| var extarcted =
|
| ........................from x in list
|
| ........................where x > 1
|
| ........................select $"my number: {x};
| inglor wrote:
| You would think that but having used both I find writing
| Kusto/KQL much smoother, neater and faster and if I have to
| choose between writing a query in either one I'd pick KQL.
|
| I understand this is just an opinion but it's an opinion
| held by everyone in my org who writes both.
|
| Theoretical correctness loses to pragmatism a lot and I'd
| read the KQL every day. Look at the examples at
| https://docs.microsoft.com/en-us/azure/data-
| explorer/kusto/q... - look at the examples at
| https://docs.microsoft.com/en-us/azure/data-
| explorer/write-q... and tell me they're not more readable
| than comparable SQL?
|
| (I can see the result type both by hovering on the query
| but also by just looking at the end of it - and in SQL most
| of the SELECTed items in complex queries are from
| subqueries anyway - at least in my use case)
| keithnz wrote:
| now if MS made a KQL -> TSQL or support it natively in
| SQL Server, that would be great :)
| adamrezich wrote:
| yes please!!
| jiggawatts wrote:
| I've been frustrated by toilets where I have to contort my
| body to reach the dispenser. Similarly, I've had dispensers
| intrude on the space where my legs would normally be and
| make it awkward to even just sit on the toilet.
|
| Toilets are absolutely designed to make the dispenser
| placement convenient. You just don't think about it because
| 95% of toilets get it right, so it just doesn't bother you
| that much that it _can_ be wrong.
|
| In SQL, some decisions are right about 10% of the time and
| are annoying and awkward the other 90%.
|
| That's why the order matters. Because everything else got
| it right.
| oldsecondhand wrote:
| There's a person behind the IDE. If you help the IDE, you
| help the person using the IDE.
| emteycz wrote:
| I'd agree if there was any way whatsoever of fixing this
| issue, but there simply isn't. The editor can't even begin
| to guess what you might want until you write your FROM.
| taeric wrote:
| Maybe in gigantic systems with more tables than makes
| sense. Realistically, all of the columns available in a
| database can be fit in memory with ease.
|
| Then, the ide could basically fill my from out for me,
| based on what I'm asking for. Can even suggest what join
| I will need, if I list columns from multiple tables.
| [deleted]
| tester756 wrote:
| >Maybe in gigantic systems with more tables than makes
| sense. Realistically, all of the columns available in a
| database can be fit in memory with ease.
|
| Every table has more than one column
|
| So there's always more columns to remember than tables
| and generally tables are pretty easy like user invoices
| blabla
|
| I worked with systems that had like 500 tables and some
| of them with 20-50 columns
|
| you really want good intellisense in such a environment
| taeric wrote:
| 500 times 50 is still not a big number. And you could do
| decent statistical suggestions on the current columns in.
|
| Good intelligent suggestions is, of course, helpful. And
| I agree that suggesting one of 500 is easier than the
| other. That said, neither is hard for a computer. And
| even asking friends what table I want will often be done
| with starting with the actual columns I want.
| tester756 wrote:
| On the other hand I don't see why it couldnt work both
| ways
|
| if you start query with
|
| SELECT MiddleName then you could receive auto complete
| thats adds "FROM Users" and moves your cusor after
| MiddleName.
|
| if you start query with
|
| FROM Users SELECT _ and know intellisense drops list of
| columns
| taeric wrote:
| Agreed. I'm really just arguing that it doesn't have to
| be from first.
| lemmsjid wrote:
| That's interesting because it also explains why I was going
| to say I do like having from first. When trying to reason
| about a query, I mentally go through the following:
|
| 1. What tables are being pulled from? This speaks to the
| potential domain of the query. 2. What data is being selected
| (I can now know what is or isn't being pulled from the
| aforementioned tables...) 3. What operations, aggregations,
| groupings, etc. are being performed to work on the pulle data
|
| Of course from vs select ordering is completely arguable, but
| my thinking process seems to follow that of the auto complete
| --in other words that my cognitive load of looking at the
| select statement is lessened when I know from what the
| columns are being selected.
|
| It also follows (at least to me) the mental process of
| writing the query. First look at the tables, then decide what
| columns, then decide what functions to apply.
| taeric wrote:
| I said it in a sibling, but I feel this is somewhat missed.
| Auto complete that simply lists the tables is easier if from
| is first. But... Auto complete that helps me know what tables
| can give me my requested columns works the other direction.
| KerryJones wrote:
| This feels like a English-language thing. In english we tend to
| put our adjectives first, it feels natural, "Where is my red,
| round ball?", rather than some other languages (like German)
| where you put the subject first. Equivalent of "Where is my
| ball, red & round?"
|
| While it inherently feels unnatural I do agree with the others
| here that the context is actually easier to understand once
| over the initial uncomfort.
| mcsoft wrote:
| Both CTEs and this idea address the same problem: poor
| readability of complex SQL queries. Compared to CTEs, the
| author takes the idea to split the complex query into parts to
| the next level.
|
| To your point - a solid IDE will show you what's being
| processed at each line (or returned, if the cursor is on the
| last line) - in an autocomplete window or a side panel.
| andreygrehov wrote:
| I like it. Was it inspired by CloudWatch Logs Query Language? [0]
| Looks somewhat similar.
|
| [0] -
| https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/CWL...
| goodlinks wrote:
| maybe i am jaded (too many hours reading and writing sql as it
| is) but the structure of SQL is very closely linked to how i
| write a query:
|
| what data do i want to see: most important
|
| where doest is come from: next most important thing
|
| What filters and restrictions am i going to put on it: least
| important
| btown wrote:
| For any who want this kind of pipelining way-of-writing-SQL that
| has the benefit of existing in live production databases today, I
| highly, highly recommend looking into Postgres' and Snowflake's
| LATERAL JOIN keyword.
|
| https://heap.io/blog/postgresqls-powerful-new-join-type-late...
|
| https://stackoverflow.com/questions/28550679/what-is-the-dif...
|
| https://docs.snowflake.com/en/sql-reference/constructs/join-...
|
| The TL;DR is that they allow you to reuse annotated and
| aggregated columns in an incredibly elegant way. Compared to OP's
| proposal, you still do need to start the query with what columns
| you want to come out at the end, and normal SQL weirdnesses still
| apply - but it's far, far, easier when writing massive analytics
| queries to see the flow of variables from one stage to another.
| snidane wrote:
| You don't need additional SQL repetitive cruft.
| from population select country, rollup(city), count(*)
| sort
|
| Can represent this repetitive SQL query: select
| country, city, count(*) from population group
| by country, rollup(city) order by country, city
|
| Information in group by is often redundant. You can tell which
| columns are measures vs dimensions by examining the 'aggregate'
| function - rollup or no function vs sum, count, avg. Order by can
| have a default to sort by all columns instead of naming them one
| by one.
| rchrch wrote:
| Awesome! Would love to see an implementation. I worked on
| something similar over the Summer. It's just relational algebra
| with pipes for composition. If you are interested, we could get
| an antlr grammar going and plug it into this basic execution
| engine to get a feel for the language.
|
| - https://github.com/RCHowell/Sift -
| https://github.com/RCHowell/Sift/blob/main/src/main/kotlin/c...
| maximilianroos wrote:
| Yes this looks really cool, and similar! Feel free to hit me up
| on Twitter https://twitter.com/max_sixty
| cies wrote:
| I love quality language proposals like this. I'm not so much in
| data processing/bigdata, but have had to interact with SQL a lot.
|
| This syntax is lovely! It's more intuitively readable (and SQL is
| not that bad in that regard).
|
| My feedback:
|
| 1. Lower case, underscored everything makes the terms a bit hard
| to differentiate. Maybe set some classes of symbols in CamelCase,
| or add a !@#$%& prefixes to them to make it more readable.
|
| 2. I dont like to use another language (SQL or PRSQL for that
| matter) to db interaction, I like to write the queries in the
| language that I'm using to develop in. There are ORMs in this
| design space, but I'm a little fed up with them. In Java there's
| jOOQ. Other less-OO-more-functional ORMs exist in Rust and
| Haskell land. These often have a code generation step, a library
| is generated that guarantees some type safety for a give schema
| version. Some are more SQL-like, some provide a different API.
| PRQL is much more diverted from SQL than these, and for good
| reasons. Maybe several languages could easily have libs like this
| building on top of JPQL?
|
| 3. You solution is a bit like GraphQL in some regards; where
| there is a tool needed to convert the query to SQL. Tools like
| this exist, like Hasura and the likes. Hasura does a lot more. To
| me GraphQL has the huge advantage of serving a schema so that
| clients can be generated. I can interact with GraphQL in a type
| safe fashion from by generating a client in, say, Elm. The
| generated client lib does not allow my to write syntax errors in
| my queries and ensures all type conversions are sound. Maybe PRQL
| can also be a language like GraphQL in that regard, and provide a
| schema too.
|
| 4. JPQL. It's close to SQL. It improves to SQL, but I never found
| it enough of an improvement to justify the cost. I think your
| proposal is better. But still I think JPQL deserves a mention as
| maybe one of the most widespread compile-to-SQL languages.
| bachmeier wrote:
| My thought is that joins are the tough part of the SQL learning
| curve, but I don't see much in here that reduces the complexity
| of joins.
| omarhaneef wrote:
| There was this professor of language who would say "Do you
| think the question ('are carpets furniture?') tells you
| something about the ambiguity of the word carpet, or do you
| think it tells you something about the ambiguity in the world?"
|
| Similarly, I think joins are "tough" not because of the way SQL
| expresses them but because the logical possibilities of merging
| data from multiple tables are varied.
| bob1029 wrote:
| There is no such thing as a domain-agnostic SQL database that
| holds up under this kind of semantic scrutiny. I don't think
| that there ever _could_ be.
|
| If you are rolling a SQL schema for a home improvement
| contractor, it is extraordinarily unlikely that their
| specific business would expect any scenarios in which carpets
| are sometimes known as furniture.
|
| Having a bounded context to operate within is what makes SQL
| magical for me. When people don't understand the business or
| simply the game around how you _talk_ about the business,
| things start getting messy wrt joins.
| omarhaneef wrote:
| The carpet discussion was simply to say that you can't take
| out all the complexity of a language if the domain it is
| meant to describe is complex. The language has a limit to
| how simple it can be.
|
| I was not proposing a SQL database of carpets, or
| furniture, as a thought experiment.
| bachmeier wrote:
| Then make some cases easier and fall back to the SQL we
| already have for the rest?
| Supermancho wrote:
| SQL has effectively failed, as a standard, despite it's
| ubiquity. It's literally being aged out, which makes for
| opportunities for PRQL, etc to fill pragmatic gaps.
|
| eg the lack of default column aliasing from joins
| SELECT A.id AS A__id, A.name AS
| A__name, B.id AS B__id, B.name AS
| B__name FROM A LEFT JOIN B
| ON A.other_id = B.other_id
|
| When you could have: SELECT
| A.*, B.* FORMAT (TABLE__)
| FROM A LEFT JOIN B ON A.other_id =
| B.other_id
| cogman10 wrote:
| IMO, this appears not to be something that solves the SQL
| learning curve but rather the usability of a query language
| with tooling.
|
| I don't think there is much that could be done to address left,
| right, inner, outer join semantics. It's just something you
| have to learn if you want to do a lot of SQL (though, you are
| likely only ever going to use left and inner joins).
| munk-a wrote:
| I like the flow direction compared to standard SQL. SQL is
| supposed to read like a sentence I suppose but I have many times
| looked at it and really wanted things to be in a more logical
| order.
|
| My main suggestion would be to be a bit less terse and introduce
| a bit more firm formatting. I'm not a huge fan of the term
| "split" and feel like jazzing that up to "split over" or even
| just reviving "group by" would improve readability. Additionally
| the aliasing could use work, I'd suggest reversing the assignment
| to be something closer to `use salary + payroll_tax as gross
| salary`. In terms of firm formatting, unless I'm missing
| something there isn't any reason to allow a filter statement
| before any aliases - so you can force two fixed positions for
| filter clauses which would make it always legal to reference
| aliases in filters.
|
| On the brief topic of form vs. flexibility. SQL is a thing that,
| when complex, is written by many people over the course of its
| lifetime - removing the ability to make bad decisions is better
| than enabling the ability to write simple things even simpler -
| those silly do nothing queries like "SELECT * FROM customer WHERE
| deleted='f'` are written once[1] in a moments time and never
| inspected again. The complex queries are what you want to
| optimize for.
|
| 1. If they even are - with ORMs available a lot of those dead
| simple queries just end up being done through an ORM.
| hn_throwaway_99 wrote:
| > On the brief topic of form vs. flexibility. SQL is a thing
| that, when complex, is written by many people over the course
| of its lifetime - removing the ability to make bad decisions is
| better than enabling the ability to write simple things even
| simpler
|
| Hallelujah! But, to your footnote, this is a major reason why I
| despise ORMs. In my mind they make writing simple code slightly
| easier, but they make complicated SQL statements, especially
| when you get some weird issue under load and you're trying to
| debug why your DB is falling over, a _ton_ more difficult and
| you spend so much time just battling your ORM.
| jnsie wrote:
| I like the flow direction specifically for
| intellisense/autocomplete. I'm sure it would be easier to
| provide hints when the table name is known immediately.
| m1sta_ wrote:
| I'd love for the next release of SQL to have optional
| alternative ordering of clauses
| maximilianroos wrote:
| This is great feedback, and I agree with you re de-prioritizing
| terseness.
|
| _And_ I agree with you on both the assignments and `split`
| being a bit awkward. Kusto just uses `by`, WDYT?
| tomtheelder wrote:
| Not the original commenter, but just using `by` makes total
| sense to me.
| maximilianroos wrote:
| I've made this change [1]. Thank you!
|
| [1] https://github.com/max-
| sixty/prql/commit/dde7fcfc13daaadbdce...
| Serow225 wrote:
| <3
| munk-a wrote:
| By actually sounds great to me to, yea. In this case it's
| short but it's extremely communicative!
| loic-sharma wrote:
| Yes Kusto's `by` is excellent!
| dragonwriter wrote:
| > It's not declarative. It's functional.
|
| Functional, logical, and relational paradigms are the most
| commonly cited examples given of declarative programming.
___________________________________________________________________
(page generated 2022-01-24 23:02 UTC)