hngopher.com

       [HN Gopher] Show HN: PRQL in PostgreSQL
       ___________________________________________________________________
        
       Show HN: PRQL in PostgreSQL
        
       This extension let's you write PRQL functions in PostgreSQL.  When
       I first saw PRQL on Hacker News a few months ago, I was immediately
       captivated by the idea, yet equally disappointed that there was no
       integration for PostgreSQL. Having previous experience with writing
       PostgreSQL extensions in C, I thought this was a great opportunity
       to try out the pgrx framework and decided to integrate PRQL with
       PostgreSQL myself.  The maintainers of both PRQL and pgrx were very
       nice to work with. Thanks guys.
        
       Author : kaspermarstal
       Score  : 222 points
       Date   : 2024-02-19 11:27 UTC (11 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | andy_ppp wrote:
       | Very interesting, it looks a lot like the Elixir package Ecto
       | that has a DSL for writing SQL queries. Obviously there are some
       | differences here and I wonder if the compiler can do further
       | optimisations than Ecto can but interesting to see they align
       | quite a bit.
        
       | lkuty wrote:
       | I don't understand the need for libraries that abstract away SQL
       | when you could just write SQL directly and have full access to
       | the power of the language which is quite rich (recursive CTE,
       | windowing, ... aka Modern SQL). You could also use stored
       | procedures/functions for more complex stuff and e.g. JSON (or
       | native types) to transfer data between the database and the
       | application. Why limit ourselves with a sub-optimal language be
       | it PRQL, Ecto, other ORMs, ... ?
        
         | murkt wrote:
         | PRQL is not an ORM in any kind of way, it's the less quirky
         | SQL, basically.
        
         | davedx wrote:
         | SQL is pretty great, but for some things (I thought the example
         | in the readme was decent) it can be quite cumbersome. Stored
         | procedures are very imperative (and hard to debug, depending on
         | the platform IME). This seems like a more functional approach
         | to stored procedures.
         | 
         | I don't think it's suggested that this replaces SQL. Use the
         | right tool (and abstraction) for the job?
        
           | lkuty wrote:
           | Maybe. I already think in terms of transformations
           | (relational algebra and its closure property) when I write
           | SQL and use a lot of CTEs. But I guess the functional way
           | might help people see what's going on.
        
           | CuriouslyC wrote:
           | I think the debuggability is the #1 issue with database-as-a-
           | platform. Using notices to debug functions is such a poor
           | workflow that even though I'm bullish on putting stuff in PG
           | I avoid writing complex code might need to be debugged as an
           | integrated unit.
        
         | hosteur wrote:
         | SQL is incompatible with many types of autocompletion. For
         | example columns in a select statement are not known here you
         | write FROM. This alone justifies PRQL in my opinion.
        
           | CuriouslyC wrote:
           | There are tools in most languages to deal with this. For
           | example, take https://jawj.github.io/zapatos/. It introspects
           | your database schema to generate types, and gives you
           | autocompletion inside tagged template strings.
        
           | nwatson wrote:
           | Jetbrains tools (Datagrip, IntelliJ IDEA, PyCharm) deal well
           | with this if you leave a placeholder column before FROM, and
           | write the FROM part with proper joins. You can go back to the
           | columns and autocomplete goes just fine. You can amend our
           | extended any part and the autocomplete logic adapts well.
        
           | da_chicken wrote:
           | It's not a real problem in a practical sense. Yes, you have
           | to write the FROM clause before autocomplete happens even
           | though the SELECT must appear before it. Fortunately,
           | however, text editiors used as IDEs allow for out-of-sequence
           | code editing. You can just _enter the FROM_ and move back to
           | complete the SELECT.
           | 
           | It's like complaining that you have to know the variable name
           | you're going to assign something to before you start writing
           | the expression that will set the value.                 int y
           | = x * 2;
           | 
           | The idea is to evaluate the expression and store it, but the
           | expression doesn't actually read that way left-to-right.
           | Wouldn't it make more sense for it to be:                 x *
           | 2 assign to new int y;
           | 
           | Technically, that's written more in execution order. In
           | practice it just isn't that big of a deal. It only trips up
           | beginners.
        
         | snthpy wrote:
         | This comes up every time PRQL makes it onto HackerNews and is a
         | fair question:
         | 
         | Short answer: DX
         | 
         | Slightly longer answer: Developer productivity and experience,
         | especially for EDA and interactively writing complex analytical
         | queries.
         | 
         | Most people that have tried PRQL just find it more convenient
         | to write their analytical queries in it. PRQL compiles to SQL
         | so it can't express anything you can't already do in SQL, but
         | you can probably express yourself much faster in PRQL.
         | 
         | Just try the following query in the online PRQL Playground
         | (https://prql-lang.org/playground/) to find the longest track
         | per album:
         | 
         | ```prql
         | 
         | from tracks
         | 
         | group album_id (                 sort {-milliseconds}
         | take 1            )
         | 
         | ```
         | 
         | How long would it take you to write the SQL for that?
         | 
         | Disclaimer: I'm a PRQL contributor.
        
           | hans_castorp wrote:
           | > How long would it take you to write the SQL for that?
           | select distinct on (album_id) *         from tracks
           | order by album_id, milliseconds desc;
        
             | dewey wrote:
             | With the additional benefit that resources about SQL are
             | everywhere and every question has been answered already
             | multiple times. On top of that you are most likely to
             | encounter SQL at a job than PRQL.
        
             | hobs wrote:
             | You'd also need a LIMIT or a TOP, and you might need a
             | subquery for that depending on your SQL of choice.
        
               | hans_castorp wrote:
               | The question was "the longest track per album", so I'd
               | expect one row per album as the result (which is what my
               | query does). But adding a LIMIT would not do that.
               | 
               | It might be that the question was "the longest track
               | across all albums" which indeed would require a LIMIT
        
             | da_chicken wrote:
             | Eh, `DISTICT ON` is a custom PostgreSQL extension.
             | 
             | A standard* method would be:                   SELECT *
             | FROM tracks         QUALIFY row_number() over (partition by
             | album_id order by milliseconds desc) = 1;
             | 
             | But the QUALIFY clause is so new that it doesn't work on
             | most RDBMSs. If you're on MS SQL Server, you're still
             | using:                   SELECT *         FROM (
             | SELECT *             ,row_number() over (partition by
             | album_id order by milliseconds desc) rn           FROM
             | tracks           ) x         WHERE x.rn = 1;
             | 
             | That said, I still don't think PRQL is particularly
             | amazing. I can't tell if it's merely syntactic sugar for
             | SQL, or if it's actually meant to control query execution.
             | If it's the former, it's likely to frustrate developers
             | because it's actually just _another_ layer of abstraction.
             | If it 's the latter, then it requires the developer to not
             | only understand the data model well enough to be able to
             | write SQL queries, they need to be able to understand the
             | RDBMS impementation details well enough to be able to write
             | queries that best take advantage of the current database's
             | indexes, statistics, and configuration. Even something as
             | simple as sorting before filtering or projecting can be a
             | significant performance issue. Nevermind the fact that
             | relational algebra done in the wrong order can be non-
             | deterministic or not equivalent transformations, so even if
             | the query processor is smart enough to do rewrites whatever
             | the developer enters might be logically different
             | unintentionally.
             | 
             | Ultimately I think it's a tool that lets the developer
             | thinking about the problem in the way they prefer, rather
             | than thinking about the problem in the way that best suits
             | the problem at hand. Like insisting on writing
             | documentation in LaTeX instead of Word or Markdown.
             | 
             | *: I believed this was in SQL 2023, but double checking it
             | looks like it did not have make the final standard. I would
             | be surprised if it didn't make it in the future, however.
        
               | 6gvONxR4sf7o wrote:
               | Damn. Reading your comment, i was about to be really glad
               | that this pain would be a thing of the past before too
               | long. Too bad it didn't make the standard :(
        
               | da_chicken wrote:
               | Well, even when it's part of the standard it will take
               | about 6 years before your vendor chooses to implment it.
               | 
               | And even then, it'll be another 6 years before your
               | application vendor finally upgrades to it.
               | 
               | And even then, it'll be another 6 years before the
               | database feature is allowed to be enabled.
               | 
               | And even then, your reporting software won't support it.
        
               | reactivenz wrote:
               | Snowflake has had it for years, and even then it is
               | trivial to write as a nested SELECT.
               | 
               | The bigger problem, is the "over use of tools to boost
               | developers" the real problem, is putting poor developers
               | into the pipeline. complex analytics is SQL is just
               | simple, and lovely. The fact other struggle is not going
               | to be helped by pretending the "code" looks more C like,
               | they need to learn to think like a performant machine,
               | and then be productive.
        
               | hans_castorp wrote:
               | The PRQL extension discussed here is for Postgres, so my
               | solution was Postgres specific.
               | 
               | Since when is QUALIFY part of the SQL standard? So far I
               | have only seen it as a proprietary feature in Terradata.
        
           | lkuty wrote:
           | Sorry to be that guy :-) The `introduction.prql` example on
           | the playground gives an idea of the better readability of
           | PRQL vs SQL and your small example an idea of the speed you
           | may gain from writing PRQL vs SQL. It is interesting. Indeed,
           | me writing the SQL would have taken more time than you
           | writing the PRQL.
        
           | GordonS wrote:
           | > How long would it take you to write the SQL for that?
           | 
           | I don't want to appear rude, but unless I'm missing
           | something, this is a pretty simple SQL query, of the kind
           | anyone with mimimal SQL experience could write off the top of
           | their head in seconds.
           | 
           | I like the idea of PRQL, but I think a better example is
           | needed to sell it.
        
             | quesera wrote:
             | I think the logic here is: SQL is hard and most people
             | don't know it well. So PRQL is perhaps easier to learn.
             | 
             | The same logic is applied to TypeScript vs JavaScript. Or C
             | vs assembly. Or Nano vs vim. Etc etc.
             | 
             | It's a quandary though. SQL is clearly difficult for most
             | people to get their heads around. It does require a
             | different way of thinking about data, and you can get by
             | with a minimal SQL knowledge for a long time, especially
             | nowadays with ORMs.
             | 
             | But like so many other things, making the investment is
             | worthwhile and pays off in small and large ways, forever
             | (so far).
        
               | lowercased wrote:
               | > making the investment is worthwhile and pays off in
               | small and large ways
               | 
               | And... 'making the investment' takes time, and means that
               | time is not able to be invested someplace else.
               | 
               | If the majority of your job is writing SQL or similar
               | (data access, etc) then sure - yes, learn more of those
               | tools. Some folks have a wider range of responsibilities
               | that means you have to decide what to make more time
               | investments in, and saying 'yes' to something is
               | necessarily saying 'no' to other things.
        
               | setr wrote:
               | The hard part of SQL is thinking relationally. This
               | obviously doesn't do anything to affect that -- if they
               | actually thought this, they'd be making the same mistake
               | as SQL itself "the reason business users don't program is
               | because it's not english enough"
               | 
               | The problem this is resolving, if it successfully
               | resolves anything at all, is that the SQL language is a
               | mess of random keywords, inconsistent syntax requirements
               | and generates some of the worst error messages known to
               | man. It's an attempt at making SQL a consistent, simple
               | language -- ideally exposing the data model more directly
               | and with less noise
        
               | ttfkam wrote:
               | I prefer "thinking in sets". Either way, folks try to map
               | objects and structs to their databases (I blame ORMs
               | personally) when the analogy just isn't so. It's like
               | translating French to English word by word and wondering
               | why folks have trouble understanding you.
               | 
               | As for SQL being too much like English, making the syntax
               | closer to a general purpose functional programming
               | language isn't necessarily an improvement in my opinion.
        
             | audunw wrote:
             | I have a moderate amount of SQL experince, but I could not
             | write that query at the top of my head. Maybe you
             | misunderstand what the PRQL query is doing?
             | 
             | Here's the SQL it generates:
             | 
             | WITH table_0 AS ( SELECT _, ROW_NUMBER() OVER ( PARTITION
             | BY album_id ORDER BY milliseconds DESC ) AS _expr_0 FROM
             | tracks ) SELECT_ FROM table_0 WHERE _expr_0  <= 1
             | 
             | If I understand PRQL correctly, it finds the longest song
             | for each album? A simple concept, but not a simple MySQL
             | query.
        
               | tyre wrote:
               | SELECT DISTINCT ON (album_id) * FROM tracks GROUP BY
               | album_id ORDER BY milliseconds DESC;
               | 
               | For those unfamiliar with Postgres, DISTINCT ON takes
               | only one row for each of the groups based on the supplied
               | columns. So in this case, it will return only one row per
               | album_id.
               | 
               | Without an ORDER BY, DISTINCT ON chooses a random row
               | (not actually, but you can't rely on it.) Since we ORDER
               | BY milliseconds DESC, the first row of each group will be
               | the longest one.
        
               | nightpool wrote:
               | that works for top-1 but breaks completely if you want to
               | extend it to top-2
        
               | ttfkam wrote:
               | The example was asking for top 1. Yes, you are correct
               | that you'd have to switch up to a window function to
               | handle top-n+1.
               | 
               | Do you believe these cases are common enough to warrant
               | discarding the tools and training available to SQL? Are
               | you also certain that PRQL doesn't have corner cases
               | where SQL is more concise and/or easier to understand?
        
               | 6gvONxR4sf7o wrote:
               | It's a pretty common pattern (take the top N by group
               | with some order), so I'd expect that it's a familiar
               | pattern to a lot of people who regularly do analytical
               | queries in SQL. It's clearer with formatting/naming:
               | WITH ranked_tracks AS (             SELECT *,
               | ROW_NUMBER() OVER (                      PARTITION BY
               | album_id                     ORDER BY milliseconds DESC
               | ) AS row_num             FROM tracks         )
               | SELECT *         FROM ranked_tracks         WHERE
               | row_num = 1
               | 
               | But it is still super ugly for such a common need. If I
               | were to add syntax to make this kind of thing easier, I'd
               | just go for a syntax that made something like this valid:
               | SELECT *         FROM tracks         WHERE
               | ROW_NUMBER() OVER (                  PARTITION BY
               | album_id                 ORDER BY milliseconds DESC
               | ) = 1
               | 
               | Which apparently the QUALIFY statement does in a few
               | dialects.
        
             | bvrmn wrote:
             | I'm not a PRQL fun. But TBH I can't write this SQL from my
             | head. I have thousand lines of written SQL.
        
             | cmpb wrote:
             | This "find the latest row for each <column>" query is kind
             | of a poster-child for seemingly-simple but actually
             | difficult to get right/performant sql.
             | 
             | E.g. see:
             | https://stackoverflow.com/questions/1313120/retrieving-
             | the-l...
        
           | CuriouslyC wrote:
           | This seems like a micro-optimization to me. That's faster to
           | write, but you pay for that with extra tooling and an
           | abstraction layer that you now have to train or hire for, and
           | I'm not sure that's a good payoff.
           | 
           | This idea seems like it'd be better as an editor plugin that
           | lets you write shorthand and have it automatically expanded
           | into correct SQL rather than as a build time thing.
        
             | sph wrote:
             | You can apply this logic to choosing a high level language
             | vs writing assembly code.
             | 
             | Yes the pipeline is more complex, there are more tools and
             | more syntax to track, but the benefits are pretty clear (or
             | we'd all be writing UI code in hand written asm)
             | 
             | While leaky abstractions are a huge problem, the thing
             | about abstractions is that, if they are any good, the
             | benefits and improvements to productivity outweigh the
             | negatives. You just have to figure out if the gains PRQL
             | could give you are worth the effort.
             | 
             | I rarely write SQL, so it's not worth it for me. But if
             | PRQL were an actual query engine, not just a translation
             | layer, and some database offered a native PRQL interface, I
             | would immediately switch to it rather than to keep twisting
             | my brain with SQL and it's inane syntax and rules.
             | 
             | (I was a full time DBA in one previous life, so I should be
             | more comfortable with SQL than most.)
        
               | CuriouslyC wrote:
               | I don't think the analogy of with js/etc and assembly is
               | quite fair, the difference in readability between this
               | and sql isn't on that order, and if it were I'd be a lot
               | more bullish on this project :)
        
           | gigatexal wrote:
           | select id from tracks qualify row_number() over partition by
           | (album_id order by milliseconds desc) =1
           | 
           | That should work
           | 
           | Look I'm a diehard SQL just use it guy but open to
           | improvements. But I'm loathe to use abstractions for things
           | when the underlying thing is so expressive.
           | 
           | Autocomplete of fields in a good editor, schema help, etc go
           | a long way to making SQL being written raw very nice.
        
             | orlp wrote:
             | QUALIFY is not part of the SQL standard.
        
               | gigatexal wrote:
               | Then a subselect with the window function and an outer
               | where clause where the window function column = 1
        
               | da_chicken wrote:
               | I'm pretty sure QUALIFY was added in SQL 2023. Maybe it
               | was only discussed and didn't make it.
        
               | orlp wrote:
               | I don't know, and I don't feel like paying $200 per
               | chapter to find out.
        
               | riku_iki wrote:
               | more importantly you can run prql on Postgres today, but
               | can't use qualify yet
        
           | halayli wrote:
           | They can ask chatgpt to generate the SQL query and use the
           | SQL output that everyone is familiar with rather than use an
           | abstraction that is prone to versioning and behavioral
           | changes and will consume everyone else's time to go learn a
           | new language and become profficient in it.
        
             | oulipo wrote:
             | if they need ChatGPT to generate the query (costly and
             | possibly error-prone), then it means that not everyone is
             | familiar with the output...
        
           | quesera wrote:
           | Reformatted for readability (indent code blocks with two
           | spaces):                 from tracks       group album_id (
           | sort {-milliseconds}         take 1       )
           | 
           | Editorializing:
           | 
           | Data query specification is all about getting the details
           | right. This does not look simpler than the corresponding SQL
           | to me though. All components must be present -- scope, group,
           | limit, order.
           | 
           | SQL, for all its faults, is generally succinct at
           | incorporating the required details. The PRQL sample here is
           | succinct as well, but to me at least, not differentiating.
        
           | abirch wrote:
           | This would have been awesome for me 2 years ago.
           | 
           | Currently much of my complicated SQL is generated by a LLM.
        
             | buremba wrote:
             | Hmm, I would think that LLM helps adoption for the semantic
             | layers such as PRQL, Malloy, and dbt since it's possible to
             | generate/validate/iterate 5 lines of PRQL compared to 25
             | lines of SQL but considering none of them widely adopted
             | yet, you might indeed be correct in a way that LLM makes it
             | harder for the new tools to gain adoption by helping you to
             | suffer less from the verboseness of SQL.
        
               | staticautomatic wrote:
               | It's a tough call. I run a small analytics team and am
               | starting to train some analysts to code. Just the other
               | day I basically told one of my reports to focus on
               | learning Python and let ChatGPT teach him SQL by example
               | because I think it'll be easier to grok the explanations.
               | Now I'm looking at PRQL and Malloy and asking myself if
               | it's really a path I should send them down, and I'm not
               | sure it's a good idea.
        
               | buremba wrote:
               | I just tried ChatGPT to generate some Malloy snippets and
               | compared to SQL, it's very basic. It's probably not a
               | huge lift to teach it the library by scanning the docs
               | but still the reasoning with SQL is much sophisticated
               | given that there are tons of training data.
        
           | beeboobaa wrote:
           | Won't this just lead to developers wrecking performance
           | because they don't understand what's happening?
        
           | elbear wrote:
           | I have a feeling this language will be more familiar to
           | programmers who think functionally. I say this because it
           | seems to consist of transformations applied to data and of
           | `derive`, which defines new variables later used in other
           | transformations.
        
           | OJFord wrote:
           | In the real world, probably something like:
           | select           album_id            ,first_value(id order by
           | milliseconds desc) as longest_track_id         from tracks
           | group by album_id
           | 
           | I agree the PRQL's pretty nice here, but I think such a
           | generalised example (chances you actually want to `select
           | *`?) overstates the advantage.
        
             | da_chicken wrote:
             | This query would error: "Field `id` is neither in the GROUP
             | BY nor in an aggregate function." Since `id` is a key in
             | `tracks` and `album_id` is a foreign key and not unique in
             | `tracks`, the RDBMS wouldn't be able to use any implicit
             | GROUP BY determinism, either.
             | 
             | You could do this:                   select distinct
             | album_id            ,first_value(id order by milliseconds
             | desc) as longest_track_id         from tracks
             | 
             | But this is unlikely to perform as well as the row_number()
             | method because it will cause the RDBMS to generate a record
             | for every track and then waste time sorting the
             | intermediate results to find the unique records in the
             | output.
        
             | riku_iki wrote:
             | your query will return just album_id and id, not other
             | track fields
        
         | baq wrote:
         | > Why limit ourselves with a sub-optimal language be it PRQL?
         | 
         | Actually why limit yourself with SQL...?
         | 
         | PRQL is a language compiled into SQL and makes certain hard-to-
         | do things in SQL easy purely because it allows to streamline
         | operations which SQL needs CTE joins or whatever hoop jumping
         | to solve.
         | 
         | My favorite example which sounds easy but isn't - select the
         | row which is MAX(...).
        
           | ttfkam wrote:
           | > Actually why limit yourself with SQL...?
           | 
           | Because it's everywhere, has extensive documentation and
           | tutorials, all database tools support it, all relational
           | engines support it, some non-relational engines support it,
           | all programming languages have library support for it, it can
           | be accessed through command line tools as well as graphical
           | interfaces, etc.
           | 
           | You think an industry is going to give up 50 years of
           | infrastructure because some (typically junior) devs think the
           | syntax is "kinda icky"?
           | 
           | > My favorite example which sound easy but isn't - select the
           | row which is MAX(...).
           | 
           | If you look earlier in the comments you will see queries that
           | have "DISTINCT ON" in them. It solves the problem that sounds
           | easy, but actually is pretty easy if you know SQL.
        
             | jeroen79 wrote:
             | I agree, sql is well documented and an industry standard,
             | no need to make it more complex by adding preprocessors
             | that do nothing but change the syntax.
             | 
             | Just bite in and learn proper sql.
        
               | ttfkam wrote:
               | I would _love_ relational database engines to adopt the
               | same syntax when they support a feature. There is no
               | compelling business case for them to do so though.
               | Browsers converged largely in response to IE 's
               | dominance. If Firefox, Safari, and Opera had each gone
               | their own way, no single one of them could ever get
               | enough marketshare to get developers to care. Together
               | they made a block significant enough that Microsoft was
               | forced to join in or die. (Actually did effectively die
               | once Edge was moved to the Blink engine. Now it's a
               | satellite of a mostly standards-compliant effort.)
               | 
               | But databases aren't browsers. Devs just have a single
               | target usually: whatever database engine the company
               | decided to use.
               | 
               | Still it would be nice to have greater overlap to reduce
               | the niggling details between them that only seem to exist
               | today due to inertia rather than technical necessity.
        
         | vendiddy wrote:
         | The thing I hate most about SQL is lack of composability.
         | 
         | In most languages it's easy to pull out functions.
         | 
         | In SQL you end up with a giant hard to comprehend mess.
         | 
         | I think the underlying relational concepts in SQL are sound but
         | I'd love to see ideas like PRQL that aim make SQL easier to
         | write and maintain.
         | 
         | Stored procedures and functions are nice but don't allow the
         | basic idea of breaking a large query apart into smaller logical
         | components.
        
           | CuriouslyC wrote:
           | Your statement about breaking large queries apart is wrong.
           | You can write queries with CTES to improve readability, and
           | extract CTEs into functions that can be selected as queries
           | get too large and unwieldly. SQL is just as composable as any
           | other language.
           | 
           | The thing that's lacking right now is the tooling for
           | managing/testing/deploying database code. There are solutions
           | out there and the supabase folks have been working to make
           | things better but database first development still has some
           | hurdles in terms of DX.
        
             | jpgvm wrote:
             | One issue with functions though is that they can change
             | performance in unpredictable ways. For example a colleague
             | of mine recently altered a function I had written that was
             | used in multiple hot-path queries. The change he made
             | accidentally caused the function to no longer be inline-
             | able on PostgreSQL. Once the function couldn't be inlined
             | then the PostgreSQL planner wasn't able to select the
             | appropriate indices and the performance of several of the
             | queries exploded by about 100x.
             | 
             | So while it's true it can be composed etc the current state
             | of the art planners struggle except under very
             | simple/constrained scenarios.
        
               | CuriouslyC wrote:
               | If a function is marked stable it should not impact the
               | query plan at all, since stable functions are essentially
               | in-lined before planning. If logic is unstable a view is
               | probably going to be a better abstraction than a
               | function.
        
               | jpgvm wrote:
               | Well I thought so but even with the functions marked
               | IMMUTABLE, which is even more stringent than STABLE the
               | in-lining was not successful, this was apparent in the
               | query plan.
               | 
               | This might be a special case however as the function
               | called another function internally (also IMMUTABLE) which
               | was essentially memoized using an expression index. This
               | is the index that was no-longer hit when inlining failed.
               | 
               | If you think this is bug I think I can create a minimal
               | reproduction.
        
               | CuriouslyC wrote:
               | That does sound like a bug, the planner should be
               | inlining all of that. I would mention it on the postgres
               | mailing list so a committer with more experience in how
               | the planner marshals all that stuff together can weigh
               | in.
        
               | jpgvm wrote:
               | I guess I got nerd-sniped. I dug a bit deeper into what
               | was happening and the issue is the new version of the
               | function called to_char() which turns out isn't IMMUTABLE
               | which broke the inlining!
        
               | zeroimpl wrote:
               | An immutable function cannot query a table because the
               | table itself isn't immutable. If your stable/immutable
               | flags don't match reality, the function can't be inlined.
        
               | jpgvm wrote:
               | Details in sibling but I dug a bit deeper, new version
               | used to_char, turns out that is STABLE and not IMMUTABLE
               | so because the volatility didn't match the whole way down
               | anymore it broke inlining.
               | 
               | I'm guessing switching the function that calls to_char to
               | STABLE will fix the problem.
        
               | zeroimpl wrote:
               | Makes sense. You mentioned index so I thought maybe you
               | were were querying a table.
               | 
               | Would be nice if postgresql could tell you when the flags
               | don't match. I think anytime you deal with timestamps you
               | can have problems since the expression may depend on the
               | session's time zone.
        
               | da_chicken wrote:
               | I'm not sure how adding another layer of abstraction
               | helps with that.
               | 
               | The problem you encoutered is that the rewritten function
               | was either no longer table-valued, or else it was no
               | longer deterministic (which is what that big list of
               | rules for inlining really means). But that problem
               | doesn't go away by adding a layer of abstraction. The
               | need to understand relational determinism doesn't
               | disappear. The need to understand SARGability doesn't go
               | away. You can't really abstract the problem away.
        
             | vendiddy wrote:
             | Do you have any links to a basic example on using CTEs and
             | functions to keep SQL maintainable?
             | 
             | I've used CTEs, but I had not tried breaking up an SQL
             | query into functions. Didn't know that was possible!
             | 
             | For whatever reason, I feel like I end up with a giant blob
             | of SQL when writing SQL and it's incredibly frustrating.
        
               | CuriouslyC wrote:
               | You can just use chatgpt to rewrite sql with ctes, and
               | extract functions. It's quite good at it, particularly
               | gpt4.
               | 
               | That being said, CTEs are a really good way to write
               | complex queries. They let you tag bits of query with
               | meaningful names, and each thing you tag is accessible to
               | every CTE after it so you can build up an almost
               | imperative data flow by just doing select transforms one
               | after another. That way you're building hard queries from
               | the bottom up rather than the top down.
        
               | ttfkam wrote:
               | Agreed. It's so nice to be able to query intermediates to
               | see what their output looks like and just check each step
               | of the CTE during debugging. It was such an improvement
               | over most subselects!
        
             | Izkata wrote:
             | > You can write queries with CTES to improve readability,
             | and extract CTEs into functions that can be selected as
             | queries get too large and unwieldly.
             | 
             | Personally I'd go for breaking them into views. IIRC as of
             | around postgres 11-13 they're no longer a barrier for the
             | query planner.
        
               | CuriouslyC wrote:
               | You are right, views are a good choice in a lot of
               | instances. Functions give you more behavioral
               | flexibility, easy multiple version support (in postgres)
               | and fewer issues with ddl dependencies, but views are
               | semantically clearer, easier to work with and give good
               | usage flexibility.
        
           | ako wrote:
           | Composition is available in sql, but works a bit different
           | than in a procedural language. In sql you express sets of
           | data, composition consists of defining subsets that you
           | compose into more complex sets. Views and CTEs are the tools
           | for composition in SQL.
        
         | haspok wrote:
         | One notable reason is being database-agnostic. Like ORM's, if
         | you can generate SQL you can generate database-specific SQL as
         | well.
         | 
         | SQL is also quite verbose in places (JOINs are the most trivial
         | example), and lack a decent amount of abstraction (CTEs are
         | relatively low level).
         | 
         | Updating a large set of FK'd tables can be a nightmare (this is
         | what ORMs shine at).
         | 
         | Finally, some modern additions are quite unreadable, Postgres'
         | JSON syntax, for example.
         | 
         | I'm not saying that PRQL solves any of the above, but these are
         | all legit problems with "plain" SQL.
        
           | CuriouslyC wrote:
           | Database agnosticism is so 2010. There's very little reason
           | to choose a DB other than postgres, and if you have a reason
           | to choose a specific niche db you're not probably not going
           | to be migrating away from it any time soon.
           | 
           | CTEs are a first step in structuring queries to make them
           | decomposable. You can extract CTEs to functions and mark them
           | stable and it's logically equivalent to the original query.
        
             | haspok wrote:
             | > There's very little reason to choose a DB other than
             | postgres
             | 
             | Sure, if you are a startup, or write your own code. But for
             | most people the choice of database(s) is a given, and they
             | are not in a position to challenge that. At the end of the
             | day, Oracle has to make a living, too...
        
               | ttfkam wrote:
               | From parent:
               | 
               | > you're not probably not going to be migrating away from
               | it any time soon.
               | 
               | Oracle has its own optimizations and foot-guns that
               | extend well beyond what you can represent in a database-
               | agnostic API. And once you're on that DB, you can write
               | DB-agnostic and have performance be relatively horrible
               | or require a careful rewrite of your schema and stored
               | procedures when you migrate. There is not door number
               | three.
               | 
               | Writing a common layer for any and all relational
               | databases is like using a Java UI library for all
               | operating systems. Sure, it will work, but it will have
               | obvious shortfalls, be immediately recognizable as such
               | to anyone familiar with the underlying platform, be
               | inconsistent with other apps on that platform, and leave
               | any opportunities for efficiency and performance on the
               | floor.
               | 
               | Say you want a pivot table. In Oracle and MS SQL, it's
               | built in. In Postgres, it's possible but noticeably more
               | annoying. In MySQL, it's simply not possible. How would
               | you represent this in a database-agnostic way? And yet
               | performing in the app layer is very much slower/less
               | efficient.
               | 
               | Did you know Oracle supports parallel DML for enhanced
               | performance and lower multi-query latency? You have to
               | intentionally use though, and neither Postgres nor MySQL
               | support it at all.
               | 
               | What about global temporary tables? Those especially
               | aren't found in Postgres or MySQL and are not easily
               | swapped into the app layer without a massive performance
               | penalty.
               | 
               | Per-user namespaces are yet another Oracle-ism that just
               | doesn't translate to other DB engines, but you definitely
               | should know about.
               | 
               | If you're making a living from Oracle, earn your pay.
               | Make the most of what you've got.
        
             | riku_iki wrote:
             | > There's very little reason to choose a DB other than
             | postgres
             | 
             | Postgres is single server OLTP DB with complicated failover
             | story, it is strong enough reason to consider some other
             | contenders e.g. CocroachDB, SpannerDB for distributed OLTP
             | or OLAP specialized ClickHouse, BigQuery, DuckDB.
        
               | ttfkam wrote:
               | Aren't most of those wire-compatible with Postgres?
        
               | riku_iki wrote:
               | query language likely not fully supported
        
               | ttfkam wrote:
               | Yes, all engines are unique. Still wire compatible and
               | still SQL though.
        
               | riku_iki wrote:
               | It's obvious, but what is your point, why are you saying
               | this?
        
             | gaganyaan wrote:
             | I like PostgreSQL quite a bit and pushed for using it by
             | default at my company, but SQL pops up in a surprising
             | number of places. You can use it in PowerBI, Snowflake, and
             | pandas just to name a few. You don't always control which
             | DB you're talking to, and it would be nice if the SQL
             | interface that gets presented in those scenarios got an
             | upgrade. PRQL looks like a really promising option for
             | that.
        
           | ako wrote:
           | How are joins verbose? It's pretty straight to the point:
           | combine these 2 tables on these columns... what do you want
           | to remove to make it less verbose?
        
         | andix wrote:
         | Because SQL can be cumbersome to write. It's often repetitive,
         | requires nesting, aliases, and a specific order of statements.
        
           | ttfkam wrote:
           | 1. Notice language is complicated for some tasks
           | 
           | 2. Propose newer, simpler language to take care of these
           | 
           | 3. Newer, simpler language lacks features of original
           | language
           | 
           | 4. Newer language adds features, making it more complicated
           | 
           | 5. GOTO 1
        
             | andix wrote:
             | That's why we stopped innovation of programming languages
             | at C89? Why use sql at all if you can also do it in C?
        
               | ttfkam wrote:
               | Do you think SQL stopped at SQL-92?
               | 
               | Arrays, JSON, CTEs, window functions, booleans, MERGE,
               | temporal tables, regular expressions, foreign tables (aka
               | SQL/MED), etc.
               | 
               | SQL hasn't been sitting still, (though ORMs seem to lead
               | folks to believe it is).
               | 
               | Do you currently write K&R C or modern C (C11 or C17)?
               | 
               | Don't get me wrong, I don't think SQL is perfect. Far
               | from it. But PRQL isn't fixing the defects in SQL I care
               | about. For example in DDL, NOT NULL should be the default
               | rather than nullable. When I declare a column as a
               | foreign key, I shouldn't have to specify the type again
               | when the system already knows what the referenced type
               | is. Then again, PRQL is for querying, not data
               | definition, so it doesn't actually solve my biggest issue
               | at all.
               | 
               | SQL could perhaps be more terse. I agree with a lot of
               | folks that FROM should have been first and SELECT near
               | last. That's pretty uselessly subjective. The argument
               | that it's not composable falls flat for me though. Views,
               | CTEs, foreign tables, set-returning functions, etc. are
               | all forms of composability within SQL. When you think in
               | terms of sets, they all fit quite well together. If
               | you're not thinking in sets, it doesn't belong in the
               | database in my opinion.
               | 
               | The underlying engines themselves have been innovating
               | like gangbusters. Using the same wire protocol, folks can
               | connect to a standard Postgres database, a massive
               | CockroachDB cluster, Supabase, and all points in between
               | without changing a client library. The same is true for
               | MySQL, MariaDB, and PlanetScale.
               | 
               | Time series DB? Just use SQL. Analytics? SQL. Document-
               | oriented data? There's even standard JSON query syntax
               | within SQL.
               | 
               | It works. It doesn't generate huge amounts of CVEs like C
               | has. (DB libraries have the SQL injection attacks, not
               | SQL itself.) And it scales fairly seamlessly from Google
               | Spanner all the way down to embedded SQLite.
               | 
               | But folks assert it's irreparably broken and needs urgent
               | replacement. Having trouble buying it. Perhaps I just
               | haven't seen the right replacement yet. That may indeed
               | be the case. I just don't see PRQL being that
               | replacement. It feels a lot more like a lateral move to
               | me at best, and that's just too disruptive compared to
               | potential benefit.
        
         | adrian_b wrote:
         | In my opinion SQL is the sub-optimal language.
         | 
         | Whenever I am writing SQL I am not thinking in SQL, but I am
         | thinking in what I consider to be the mathematical sound way,
         | which I translate into SQL while writing. I consider thinking
         | in SQL a much greater mental handicap than having to translate
         | mentally into it.
         | 
         | I would prefer to write directly in what I would consider as a
         | good query language and have it translated automatically into
         | SQL, for compatibility with what is, for unfortunate historical
         | reasons, the standard.
         | 
         | I have not attempted previously to do or use something like
         | this, but work like that discussed here seems like a step in
         | the right direction.
        
           | ttfkam wrote:
           | Can you expand on this? I have always seen SQL as a DSL for
           | set theory. How are you seeing things "mathematically" that
           | do not comport with set manipulation in SQL?
        
       | zoky wrote:
       | Can someone explain to me why "Show HN" and "Ask HN" posts are
       | always, _always_ in hard-to-read light gray? Is there some cabal
       | that automatically downvotes such posts? Is there something
       | intrinsic to _The Algorithm_ that penalizes such posts? Or do I
       | just have really bad luck and always manage to click through on
       | posts that have gained traction despite having gotten enough
       | downvotes to put them at risk of sudden termination?
       | 
       | This is one of the great HN mysteries to me, and if anyone can
       | shed some light on it, it would be much appreciated.
        
         | murermader wrote:
         | I think I read here on HN some time ago that it is
         | intentionally hard to read to discourage posts with text. I
         | think the reasoning was that posting links to external blogs /
         | websites is usually higher quality than someone creating a
         | quick post on HN.
         | 
         | Could be wrong though, just writing this from memory.
        
           | throwaway143829 wrote:
           | HN has some really non-obvious UX, between this and the
           | mysterious green usernames I still don't understand. And the
           | fact that only some users can downvote. Or the weird logic
           | behind which words cannot appear in titles.
        
             | quesera wrote:
             | Green usernames are new accounts.
             | 
             | Only users with 500 karma can downvote.
             | 
             | Both are mechanisms to dull the potency of new users until
             | they have a chance to learn how HN is expected to work.
             | 
             | This might sound gatekeeping, and it literally is, but
             | consider than HN signup is open and takes 15 seconds with
             | no verification. HN likes the way HN works and these
             | provide simple rate-limits on destructive or oblivious
             | change.
        
               | Terretta wrote:
               | > _this might sound gatekeeping, and it literally is_
               | 
               | That term presumes unkept gates are preferable.
               | 
               | I grew up farming. One keeps one's gates or both wildlife
               | and livestock run amuck.
        
               | Izkata wrote:
               | > Green usernames are new accounts.
               | 
               | It made sense to me almost immediately, so since people
               | aren't making the connection: https://www.oxfordlearnersd
               | ictionaries.com/us/definition/eng...
               | 
               | >5 (informal) (of a person) young and without experience
               | 
               | > _The new trainees are still very green._
        
             | saganus wrote:
             | Green usernames are for new users.
        
             | PartiallyTyped wrote:
             | A comment has a green username if at the time of commenting
             | the author was new. The colour remains. If you look at your
             | first comments, you will see them green as well.
             | 
             | YCombinator founders have their own colours which are only
             | visible to each other.
             | 
             | There are plenty of undocumented features like this one.
        
         | pxeger1 wrote:
         | They aren't downvoted; it's just a styling thing. I believe
         | dang gave some reasoning for deliberately making the post text
         | less prominent, but I can't remember it.
        
         | quesera wrote:
         | They are not penalized, the grey text applies to all text
         | posts.
         | 
         | It's confusing, but greying out is used on HN for a single
         | purpose: to discourage reading (and therefore writing).
         | 
         | Low-quality comments are greyed by downvotes from other users
         | and moderators.
         | 
         | Text posts (including Ask and Show HN) are greyed
         | automatically.
         | 
         | Ask/Show posters are encouraged to post a comment on their own
         | story, and to let that comment rise or fall according to its
         | up/down votes.
         | 
         | Meta-meta: Your comment might be downvoted for being meta to
         | the post. It looks like it has already been "detached" from the
         | comment tree so that it appears at the bottom instead of
         | responding to up/downvotes. This is actually protective of your
         | karma and this conversation. Your question is valid, but it's
         | not germane to the post. If your comment was allowed to float
         | to its normal location, it would be downvoted by others who
         | considered it off-topic.
        
       | billfruit wrote:
       | Just out of curiosity, did PRQL evolve from some theoretical
       | innovation or did arise bottom up from practical utilities?
       | 
       | Is it a new tool with great new powers or is it just syntactic
       | sugar?
        
         | xmonkee wrote:
         | it compiles to SQL so no extra powers. But it does make some
         | common patterns more succinct. They have good documentation on
         | the website.
         | 
         | Personally, I was very excited about using it to write some
         | complex queries in my application that does some fancy
         | backtesting with sliding windows etc, but I reverted back to
         | SQL pretty quickly because I found myself first thinking in SQL
         | and translating back to PRQL :/
        
         | ttfkam wrote:
         | Yep, just syntactic sugar.
        
       | philipodonnell wrote:
       | I often wonder if NL-SQL tasks would benefit from an intermediate
       | query language that is more compatible with the next-logical
       | token approach that is used to generate the code. Obviously there
       | is less of this in the training set, but if it transpires in a
       | testable way, you could generate training data yourself from
       | known good sql queries? Are there any languages that have been
       | designed specifically for this?
        
       | dvdkon wrote:
       | Damn, now my bachelor's thesis will be less unique :)
       | 
       | I'm working on a new language that compiles directly to Postgres'
       | post-analysis structs. It's working out pretty well so far, but
       | my chosen "universal set" (aggregation/array/subquery/... as one
       | thing) semantics are sometimes a pain to encode.
        
         | dangoldin wrote:
         | Not to change your direction but something I've been toying
         | around is being able to support Algebraic types when defining
         | tables. That way you can offload a lot of the error checking to
         | the database engine's type system and keep application code
         | simpler.
        
           | dvdkon wrote:
           | I'd like to do something like that too, if/when I ever get to
           | replacing the DDL. In Postgres you could create custom types
           | for tagged unions, but it might be better to translate table-
           | level unions to a set of constraints, for performance and
           | flexibility (you can't create referential integrity
           | constraints using expressions IIRC).
        
           | andyferris wrote:
           | Sounds wonderful. I actually think this is the highest value
           | thing anyone could contribute to Postgres (assuming it could
           | handle foreign key constraints inside the sum types).
        
         | buremba wrote:
         | Sounds interesting! What's the benefit of compiling directly to
         | Postgres's internal structs over compiling to SQL?
        
           | dvdkon wrote:
           | There's little direct benefit, since the internal structs
           | pretty closely model SQL. But having the language compiler a
           | part of the Postgres process does help. It gives you easy
           | access to the database's structure, so you know the type of
           | every identifier, what columns tables have, what functions
           | are available, etc. You can then do your own (better) error
           | reporting and, more importantly, move away from SQL's
           | semantics.
           | 
           | For example, I want to have universal broadcasting of
           | operators on subquery results, array values, and aggregated
           | columns. To do this, I need to know which of these the
           | operand expressions represent, which is slow or impossible
           | with transpilation.
        
       | fforflo wrote:
       | Nice work. A few months back, I experimented with having a DSL
       | like PRQL in Postgres, but back then, I found the language a bit
       | cumbersome; however, it was great as an idea. IMHO, the best
       | "data transformation" language is jq and awk is second.
       | 
       | PRQL and EdgeQL (EdgeDB) are the most interesting ones to watch
       | how they evolve, though.
       | 
       | I've also written a PG extension to make jq available in Postgres
       | [0]
       | 
       | I believe Postgres, in general, will flourish as a host for DSL
       | languages [1].
       | 
       | 0: https://github.com/Florents-Tselai/pgJQ 1:
       | https://tselai.com/pgjq-dsl-database.html
        
         | canadiantim wrote:
         | Would love to see EdgeQL become adopted beyond EdgeDB. I don't
         | like the vendor lock-in with EdgeDB, but I think they're doing
         | great work
        
           | fforflo wrote:
           | Yes, they are. Beyond the core database offering, I'd also
           | like to underline the quality of their software engineering
           | work. There are not many Python-powered databases out there;
           | their codebase has some real gems from the setup.py to their
           | core compiler and Postgres-based storage layer.
        
           | 1st1 wrote:
           | > Would love to see EdgeQL become adopted beyond EdgeDB
           | 
           | We'll soon be announcing some interesting developments on
           | that front, stay tuned :)
        
           | ttfkam wrote:
           | "A jaw-dropping amount of effort has been spent attempting to
           | bridge the gap between the relational paradigm of SQL and the
           | object-oriented nature of modern programming languages.
           | EdgeDB sidesteps this problem by modeling data in an object-
           | relational way."
           | 
           | All the best to the team. I however truly hope this isn't the
           | direction the industry moves toward. I thought we learned our
           | lesson from MongoDB. I still believe data is best modeled in
           | sets, not objects.
           | 
           | The solution isn't for databases to become more like object
           | stores but for general purpose programming languages to be
           | more amenable to seamless access of set-oriented data.
           | 
           | More stuff like this:
           | 
           | https://github.com/porsager/postgres
           | 
           | https://github.com/launchbadge/sqlx
        
       | fkyoureadthedoc wrote:
       | PRQL looks interesting enough to at least give it an honest try.
       | 
       | > PRQL allows for powerful autocomplete, type-checking, and
       | helpful error messages (in progress)
       | 
       | Without some kind of autocomplete though I'm a lot less motivated
       | to do so.
        
       | grepknfss wrote:
       | You should make the first instance of "PQRL" in your readme a
       | link to that project.
        
         | kaspermarstal wrote:
         | Good suggestion, thanks
        
       | dang wrote:
       | Related:
       | 
       |  _PRQL as a DuckDB Extension_ -
       | https://news.ycombinator.com/item?id=39130736 - Jan 2024 (47
       | comments)
       | 
       |  _PRQL: Pipelined Relational Query Language_ -
       | https://news.ycombinator.com/item?id=36866861 - July 2023 (209
       | comments)
       | 
       |  _Calculate the Digits of Pi with DuckDB and PRQL_ -
       | https://news.ycombinator.com/item?id=35153824 - March 2023 (1
       | comment)
       | 
       |  _One Year of PRQL - a modern language for relational data_ -
       | https://news.ycombinator.com/item?id=34690560 - Feb 2023 (1
       | comment)
       | 
       |  _PRQL: a simple, powerful, pipelined SQL replacement_ -
       | https://news.ycombinator.com/item?id=34181319 - Dec 2022 (215
       | comments)
       | 
       |  _Show HN: PRQL 0.2 - a better SQL_ -
       | https://news.ycombinator.com/item?id=31897430 - June 2022 (159
       | comments)
       | 
       |  _PRQL - A proposal for a better SQL_ -
       | https://news.ycombinator.com/item?id=30060784 - Jan 2022 (292
       | comments)
        
       | mikpanko wrote:
       | There are very interesting improvements to SQL, which are much
       | more ergonomic, extend functionality, and provide higher-level
       | abstractions. Also backward compatible. PRQL and Malloy
       | immediately come to mind but there are more. Anybody has good
       | explanations why they struggle to get wide adoption?
        
         | chobanilover wrote:
         | When it comes to data stack tooling, organizations aren't
         | always optimizing for a better way to do things as much as
         | minimizing the worst possible scenario.
         | 
         | New syntax is nice, but it means that analysts and engineers
         | need to learn something new and are more likely to make
         | mistakes that could bubble up to production. There's always an
         | argument to be made why shiny new tool XYZ is better, but
         | unless it's 100X better, organizations are reluctant to switch
         | from something like vanilla PostgreSQL that they know works
         | 100% of the time.
        
           | chaps wrote:
           | Normally I would agree, except for the fact that this system
           | works by converting PRQL to SQL. So it's not 100% throwing
           | out the baby with the bath water, since there there are still
           | means for newer engineers to learn SQL through this tooling!
        
       | smabie wrote:
       | Can this be used inside Grafana?
        
       | landingunless wrote:
       | Nice to see extensions like this one developed in Rust/pgrx.
       | Reminds me of https://github.com/tcdi/plrust
        
         | kaspermarstal wrote:
         | Yes, the PL/Rust code base was a very useful when developing
         | this extension
        
       | brikym wrote:
       | It looks a lot like Microsoft's Kusto query language which is a
       | pleasure to use. Piping is better than nesting and from-first is
       | the way to go as it's necessary for autocomplete.
        
       | dataspun wrote:
       | Why is this PRQL extension for Postgres limited to Mac and Linux?
       | What dependencies on Windows are the obstacles, and is there an
       | expected solution in the near-term?
        
         | kaspermarstal wrote:
         | This extension has been developed on top of pgrx and depends on
         | the platforms that pgrx supports. From the pgrx readme:
         | 
         | > Windows is not supported. It could be, but will require a bit
         | of work with cargo-pgrx and figuring out how to compile pgrx's
         | "cshim" static library.
        
       | mt_ wrote:
       | Is PRQL faster than the standard query model?
        
         | Sammi wrote:
         | PRQL compiles to SQL. Why would it be faster?
         | 
         | The intention is rather for it to be simpler, as it uses a
         | linear direction of data handling. SQL jumps back and forth
         | with its order of operations and can be confusing in this way.
         | 
         | PRQL also has a more modern syntax that reuses more universal
         | concepts with fewer keywords to learn. In contrast to SQL which
         | has a unique keyword, syntax, and behavior for everything.
        
       ___________________________________________________________________
       (page generated 2024-02-19 23:00 UTC)