[HN Gopher] SQL pipe syntax available in public preview in BigQuery
       ___________________________________________________________________
        
       SQL pipe syntax available in public preview in BigQuery
        
       Author : marcyb5st
       Score  : 183 points
       Date   : 2025-02-10 10:38 UTC (3 days ago)
        
 (HTM) web link (cloud.google.com)
 (TXT) w3m dump (cloud.google.com)
        
       | whalesalad wrote:
       | Who invented pipe syntax? Elixir? Or is there another runtime or
       | language that made this prevalent beforehand?
        
         | ellisv wrote:
         | The |> syntax? Perhaps F#. Pipes have been around since at
         | least Unix v3.
        
           | _huayra_ wrote:
           | Many such things from F# come from Haskell (given the
           | influence from MSR UK, specifically Simon Peyton Jones before
           | he left a few years ago), so likely Haskell or earlier imo
           | (at least in terms of "complex" "pipe" operations that have
           | higher-level operations than "bytes in, bytes out" of Unix).
        
             | ubertaco wrote:
             | Worth noting that F# started out life as an implementation
             | of OCaml for the .NET runtime [1], so most likely the pipe
             | syntax was taken from there, although the pipeline-of-
             | functions construction is much older than that [2]
             | 
              | [1] https://softwareengineering.stackexchange.com/questions/2099...
              | 
              | [2] https://retrocomputing.stackexchange.com/questions/17335/wha...
        
               | grndn wrote:
               | OCaml took the '|>' pipe symbol from F#. And F# was the
               | language that made the '|>' pipe symbol popular in
               | mainstream programming (as opposed to the unix '|' pipe
               | symbol), afaik. According to Don Syme, it was used in F#
               | in 2003 (see "Early History of F#", section 9.1, [1]
               | which references [2]).
               | 
               | Here's his full comment:
               | 
               | /quote
               | 
               | Despite being heavily associated with F#, the use of the
               | pipeline symbol in ML dialects actually originates from
               | Tobias Nipkow, in May 1994 (with obvious semiotic
               | inspiration from UNIX pipes) [archives 1994; Syme 2011].
               | 
               | ... I promised to dig into my old mail folders to uncover
               | the true story behind |> in Isabelle/ML, which also
               | turned out popular in F#...
               | 
               | In the attachment you find the original mail thread of
               | the three of us [ Larry Paulson; Tobias Nipkow; Marius
               | Wenzel], coming up with this now indispensable piece of
               | ML art in April/May 1994. The mail exchange starts as a
               | response of Larry to my changes.
               | 
               | ...Tobias ...came up with the actual name |> in the
               | end...
               | 
               | /endquote
               | 
               | Haskell has had "$" or "backwards pipe" for ages, but
               | that is just another way of doing function application
               | and it does not feel the same as (and is not used the
               | same way as) the unix-style piping paradigm.
               | 
               | [1] https://fsharp.org/history/hopl-final/hopl-fsharp.pdf
               | 
                | [2] https://web.archive.org/web/20190217164203/https://blogs.msd...
        
         | xnx wrote:
         | Originally? Douglas McIlroy for Unix in 1973:
         | https://en.wikipedia.org/wiki/Pipeline_(Unix)
        
           | jmount wrote:
           | I agree. And then functional languages pick up the pipe, as
           | it is roughly "poor man's method chaining" (so the object
           | oriented languages don't need it).
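The "poor man's method chaining" point can be made concrete in a few lines of Python (a sketch; `pipe` is a hypothetical helper written for illustration, not a standard library function):

```python
from functools import reduce

# Sketch of the idea: a pipe operator is just left-to-right function
# application, which OO languages already get from method chaining.
def pipe(value, *fns):
    """Thread `value` through each function in turn, like x |> f |> g."""
    return reduce(lambda acc, f: f(acc), fns, value)

# pipe-style: [3, 1, 2] |> sorted |> take-2 |> sum
result = pipe([3, 1, 2], sorted, lambda xs: xs[:2], sum)

# a method-chaining equivalent in an OO collection API would read
# something like: data.sorted().take(2).sum()
print(result)
```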
        
         | nothrabannosir wrote:
         | Shell would at least qualify as prior art, I'd assume
        
         | grndn wrote:
         | Elixir took the pipe symbol from F# (See Jose's comment at
         | [1]). For more details see my other comment below.
         | 
          | [1] https://elixirforum.com/t/which-language-first-introduced-th...
        
       | pjmlp wrote:
        | Ever since using Kusto Query Language, I have looked forward to
        | SQL getting something like this. Maybe there is hope if enough
        | DBs have it as an extension.
        
         | DarkCrusader2 wrote:
          | I heavily use KQL on a daily basis and want nothing more than
          | to use it with Postgres and SQLite as well.
        
       | Taikonerd wrote:
       | This is so exciting. I hope Postgres and MySQL get some level of
       | support for it, too.
        
       | notpushkin wrote:
        | PRQL is a similar idea, and it compiles to SQL:
        | https://prql-lang.org/
        | 
        |     from invoices
        |     filter invoice_date >= @1970-01-16
        |     derive {
        |       transaction_fees = 0.8,
        |       income = total - transaction_fees
        |     }
        |     filter income > 1
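For context, a PRQL pipeline like the one above compiles to nested standard SQL. A rough sketch of the compiled shape, run with sqlite3 against made-up toy rows (the exact SQL PRQL emits may differ):

```python
import sqlite3

# Toy data standing in for the "invoices" table (hypothetical rows,
# just to illustrate the compiled-SQL shape).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoices (invoice_date TEXT, total REAL)")
conn.executemany(
    "INSERT INTO invoices VALUES (?, ?)",
    [("1970-01-10", 5.0), ("1970-02-01", 3.0), ("1970-03-01", 1.5)],
)

# Each pipeline step becomes a subquery or projection layer; the
# derived column is inlined where it is referenced.
rows = conn.execute("""
    SELECT invoice_date, transaction_fees, income
    FROM (
        SELECT invoice_date,
               0.8 AS transaction_fees,
               total - 0.8 AS income
        FROM invoices
        WHERE invoice_date >= '1970-01-16'
    )
    WHERE income > 1
""").fetchall()
print(rows)
```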
        
         | mritchie712 wrote:
          | there's a DuckDB extension[0] for it too
         | 
         | 0 -
         | https://duckdb.org/community_extensions/extensions/prql.html
        
         | throwaway127482 wrote:
         | I think I like this new pipe syntax a lot better than PRQL.
         | Feels like it has the potential to become officially supported
         | by DBs like MySQL since it's a relatively small change to
         | existing SQL syntax.
        
         | gorkaerana wrote:
         | I'm tempted to give it a try just for the "For HackerNews
         | enthusiasts" section of their landing page, which states "The
         | PRQL compiler is written in Rust" and "We talk about
         | "orthogonal language features" a lot". Love me some banter in
         | technical documentation :D
        
         | benrutter wrote:
          | I love the design behind PRQL, but was a little disappointed
          | when I tried using it with an MSSQL server and found there
          | was no decent way to filter for `LIKE '%something%'`.
          | 
          | (PRQL uses regex for stuff like this, which sounds great but
          | then isn't supported on some databases like MSSQL.)
        
           | default-kramer wrote:
            | I'm very surprised to learn that PRQL does not natively
            | support `like`, but you can add it yourself:
            | https://github.com/PRQL/prql/issues/1123#issuecomment-135385...
        
             | promiseofbeans wrote:
             | Huh, it's super cool they have a built-in way of extending
             | to support missing engine features!
        
         | e1g wrote:
          | "Malloy" is one in the same spirit, but probably more
          | comprehensive: https://github.com/malloydata/malloy
        
         | snthpy wrote:
         | SQL pipe syntax is a great step in the right direction but what
         | I think sets PRQL apart (in my very biased opinion,
         | contributor) is the ability to define functions.
         | 
          | Here's a simple example:
          | 
          |     # define the "take_smallest" function
          |     let take_smallest = func n col tbl<relation> -> (
          |       from tbl
          |       sort col
          |       take n
          |     )
          | 
          |     # find smallest 3 tracks by milliseconds
          |     from tracks
          |     take_smallest 3 milliseconds
          | 
          | You can try this now in the online playground:
          | https://prql-lang.org/playground/
          | 
          | That's simple enough, and there's not that much gained there.
          | But say you now want to find the 3 smallest tracks per album
          | by bytes?
          | 
          | That's really simple in PRQL: you can just reuse the
          | "take_smallest" function and pass a different column name as
          | an argument:
          | 
          |     from tracks
          |     group album_id (
          |       take_smallest 3 bytes
          |     )
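As a hedged illustration of what a grouped take-smallest corresponds to in plain SQL, one common formulation uses `ROW_NUMBER()` partitioned by the group key (toy data run on sqlite3; this is not necessarily the SQL PRQL actually generates):

```python
import sqlite3

# Toy "tracks" table (hypothetical rows; column names follow the
# PRQL example above). Requires SQLite >= 3.25 for window functions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tracks (album_id INTEGER, name TEXT, bytes INTEGER)")
conn.executemany("INSERT INTO tracks VALUES (?, ?, ?)", [
    (1, "a", 300), (1, "b", 100), (1, "c", 200), (1, "d", 400),
    (2, "e", 50), (2, "f", 150),
])

# 3 smallest tracks per album by bytes: number rows within each album
# ordered by size, then keep the first three per album.
rows = conn.execute("""
    SELECT album_id, name, bytes
    FROM (
        SELECT album_id, name, bytes,
               ROW_NUMBER() OVER (PARTITION BY album_id ORDER BY bytes) AS rn
        FROM tracks
    )
    WHERE rn <= 3
    ORDER BY album_id, bytes
""").fetchall()
print(rows)
```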
        
           | snthpy wrote:
            | Here's a more complex example for sessionizing user events:
            | 
            |     # specify the target SQL dialect
            |     prql target:sql.duckdb
            | 
            |     # define the "sessionize" function
            |     let sessionize = func user_col date_col max_gap:365 tbl<relation> -> (
            |       from tbl
            |       group user_col (
            |         window rows:-1..0 (
            |           sort date_col
            |           derive prev_date=(lag 1 (date_col|as date))
            |         )
            |       )
            |       derive {
            |         date_diff = (date_col|as date) - prev_date,
            |         is_new_session = case [date_diff > max_gap || prev_date==null => 1, true => 0],
            |       }
            |       window rows:..0 (
            |         group user_col (
            |           sort {date_col}
            |           derive user_session_id = (sum is_new_session)
            |         )
            |         sort {user_col, date_col}
            |         derive global_session_id = (sum is_new_session)
            |       )
            |       select !{prev_date, date_diff, is_new_session}
            |     )
            | 
            |     # main query
            |     from invoices
            |     select {customer_id, invoice_date}
            |     sessionize customer_id invoice_date max_gap:365
            |     sort {customer_id, invoice_date}
            | 
            | You can also try that in the playground:
            | https://prql-lang.org/playground/
        
         | ashu1461 wrote:
          | In general, with languages like PRQL or the pipe syntax,
          | conciseness is clearly a big advantage, but do they help with
          | query optimisation in any regard?
        
         | digikata wrote:
          | IIRC the SQL pipe Google paper references PRQL. I think both
          | make sense in their own context. SQL pipe is a smaller leap
          | for users familiar with SQL, and PRQL takes on more change. I
          | do wonder if the smaller leap might cause more deja-vu
          | displacement initially than a bigger shift like PRQL. I don't
          | know the answer other than to look at users of both over time
          | and see how they do with the different syntaxes.
        
       | Taikonerd wrote:
       | Relatedly, PRQL is a lovely pipe-oriented syntax for SQL DBs:
       | https://prql-lang.org/
       | 
        | Their syntax is a lot cleaner, because it's a _new_ language --
        | it's not backwards-compatible with SQL. But then again, they
        | don't have the resources of Google behind them.
        
         | systems wrote:
          | If it's not backward compatible with SQL, it's not pipe
          | syntax for SQL; it's just a new query language.
          | 
          | Most DBMSs have proprietary extensions to SQL, and it's
          | messy. Real success would be adding the new syntax to the
          | standard and having it implemented across multiple DBMSs.
        
           | Taikonerd wrote:
           | Well, OK. PRQL is "for SQL DBs" in the sense that you can
           | translate it to SQL on the client side, and then execute it
           | on regular Postgres, MySQL, etc.
        
             | RyanHamilton wrote:
             | That's exactly what qstudio offers:
             | https://www.timestored.com/qstudio/prql-ide a desktop
             | client that translates prql to run against many databases.
        
       | epistasis wrote:
       | This is beautiful.
       | 
        | I first encountered this style of data manipulation in R's
        | tidyverse. Some say C# (or F#?) had similar pipes and
        | influences, but I haven't seen specifics.
       | 
       | Some other data libraries like Polars have sort of similar parse
       | trees, but they also have lots of extra cruft because of
       | limitations to the underlying language. Python's functional
       | calling semantics are extremely powerful, but not quite as
       | powerful as R's.
        
         | jddj wrote:
         | The C# and F# bit is LINQ (language integrated query) and it's
         | a lovely piece of engineering.
        
           | ziml77 wrote:
           | The LINQ syntax was an interesting experiment but the value
           | of that syntax always felt minimal to me. Like it looked out
           | of place in the middle of my C# code, so instead I just use
           | the extension methods.
        
             | neonsunset wrote:
             | It grows on you after you start using F# too
        
       | nicoritschel wrote:
        | It's also in DuckDB: https://github.com/ywelsch/duckdb-psql
        
         | conradev wrote:
         | and SQLite is waiting on Postgres:
         | https://news.ycombinator.com/item?id=41385005
        
       | jmull wrote:
       | I like it, but I'm not sure it's worth the trouble of further
       | expanding an ever-ballooning extended family of SQL syntaxes.
       | 
       | Of course, SQL has numerous issues, both in an absolute sense and
       | relative to what we've come to expect. And the obvious course of
       | action in each individual case is to add syntax to support the
       | missing feature.
       | 
       | But as you keep adding syntax, SQL gets more and more complex,
       | both in the individual variants and due to the complexities of
       | multiple variants with different syntax support at different
       | times.
       | 
        | A transpiling approach makes more sense to me. I'd like the SQL
        | implementors to focus on source maps and other things to better
        | support plugging in external, alternate syntaxes (is a
        | standardized mechanism too much to ask for?).
       | 
       | Then individual projects/people/efforts can choose the SQL syntax
       | variant that works for them, and it can be one that evolves
       | separate from the host db.
        
         | _dark_matter_ wrote:
         | There are a lot of transpilers, and they don't get much
         | adoption because the people implementing them are not the
         | people implementing the database.
         | 
         | The database and the query language are tightly coupled. The
         | latter determines the kinds of optimizations that the former
         | can do, and a transpiled language means that optimization info
         | needs to be passed through in weird and inefficient ways; it
         | also means that obvious optimizations aren't possible, because
         | the people building the language are not building the db.
         | 
          | If you've ever tried using query hints, or deeply inspected a
          | distributed query plan, you'll know exactly what I mean.
        
         | 9rx wrote:
         | _> A transpiling approach makes more sense to me._
         | 
         | That's the solution we've been using, oft referred to as ORM,
         | to work around the lack of this syntax. But then you had to
         | live in a world where you had an almost SQL, but not really,
         | language that doesn't always quite map perfectly to SQL and all
         | the pain to go along with that.
         | 
         | Now instead of that song and dance you can compose queries with
         | plain SQL and simple string concatenation. That is a huge win,
         | even if troubling that it has taken so long (and still not
         | widely supported).
        
           | jmull wrote:
            | Generally, ORMs try to present a higher-level abstraction
            | of your data and data model than the engine does natively.
           | 
           | That makes it the ORM's responsibility to implement the
           | abstraction in a good way. Problems come in when the "good"
           | way depends on factors the ORM doesn't consider or expose
           | well.
           | 
           | I actually mean transpiler in a mainly syntactic sense, and
           | would want it to avoid higher-level abstractions (after all,
           | as you point out, ORMs have that ground covered).
        
             | 9rx wrote:
              | While that is closer to ORM in the classical sense, the
              | modern use - and why I said 'oft referred to' - is more
              | accurately a query builder, but it gets called an ORM
              | regardless. The reason people reach for this type of tool
              | is that there hasn't been a good solution to this in SQL
              | proper, necessitating another language. Until now.
        
         | harrall wrote:
          | Plain transpiling won't work because when you write a SQL
          | query, you have to know which engine is running it, because
          | they all do worse at some things than others.
          | 
          | For example, the same result could be fetched using a
          | correlated subquery, a common table expression, a temporary
          | table, a view, or (very hackily) aggregation, and the right
          | choice depends on whether you're using SQL Server, Postgres,
          | MySQL, or SQLite, because they don't all do everything fast.
          | ...and you need to know the version.
          | 
          | This might be solvable if the language were a brand new,
          | higher-level one, though, and the compiler knew the
          | intricacies of each engine.
        
           | michaelmior wrote:
           | It seems like this is solved by just having the transpiler
           | aware of the target system.
        
           | gav wrote:
            | One example is LookML, which is used to build semantic data
            | models in an analytic layer:
            | https://cloud.google.com/looker/docs/what-is-lookml
        
             | anon84873628 wrote:
             | LookML does seem to have invested a lot in compilation to
             | different SQL backends, generally using the best syntax for
             | each.
             | 
             | Unfortunately the frontend is so tightly tied to the Looker
             | BI stuff, and it can't really express arbitrary queries
             | without going through lots of hassle.
             | 
              | It's unclear what Google is doing with Looker. It would
              | be interesting to imagine what LookML would be like as an
              | independent SQL tool chain.
        
               | amcaskill wrote:
                | The creator of LookML is actually working on that with
                | an OSS project called Malloy.
               | 
               | https://www.malloydata.dev/
        
               | leoh wrote:
               | >LookML does seem to have invested a lot in compilation
               | to different SQL backends, generally using the best
               | syntax for each.
               | 
               | To some degree, yes. Yet far and away, users of Looker
               | use engines like RedShift, BigQuery, and Snowflake
               | because they're extremely effective at the types of
               | queries that Looker sends at them -- not because Looker
               | spends a huge number of hours optimizing for each engine
               | (that's not to say none is done); these dbs are great at
               | analytical queries.
               | 
               | Looker in its earlier days (early/mid 2010s) took a bet
               | on analytical database engines getting better as opposed
               | to other technologies; for example, Tableau had its own
               | datastore and largely did not "push queries down to the
               | database" for execution. In the end, BigQuery was
               | radically faster than SparkSQL and was compelling for
               | customers, for example; it was not that Looker spent a
               | ton of time optimizing BigQuery as opposed to SparkSQL.
               | 
               | Source: I was an early engineer at Looker
        
           | crazygringo wrote:
           | It's also not just that -- it depends on the data.
           | 
           | A database can change how it decides to execute a query based
           | on whether a table has 100 rows or 10,000,000 rows.
        
             | harrall wrote:
              | Yeah, that's true. Also, you, as the query writer, know
              | how many rows the table has and will have, and sometimes
              | you decide what to do based on that information too.
        
           | jmull wrote:
           | Yes, any such transpiler would certainly need to be engine
           | aware (for multiple reasons). Right off the bat, the output
           | needs to be in a syntax the engine accepts, and there are all
           | kinds of differences from engine to engine, version to
           | version.
           | 
           | And, as you say, if the language contains higher-level
           | abstractions, it might need to work quite differently
           | depending on the target engine.
           | 
           | My thought is to avoid those higher-level abstractions as
           | much as possible and just expose the capabilities of the
           | engine as directly as possible, albeit with a different
           | syntax. In my experience, developers who are willing and able
           | to write SQL are fine with targeting a specific engine and
            | optimizing to that. (Those that aren't get someone else to
            | do it, or live with the sad consequences of dealing with an
            | ORM.)
           | 
           | To summarize:
           | 
           | Normal Approach: you pick an engine, and get the syntax that
           | comes with it. You need to know what the engine does well and
           | doesn't do well. You write SQL accordingly, using the syntax
           | the engine accepts.
           | 
           | Transpiler Approach: you pick an engine, and independently
           | choose a syntax. You still need to know what the engine does
           | well and doesn't do well. You still write SQL accordingly,
           | but using the syntax of the language you chose.
        
             | rednafi wrote:
             | I don't know if the transpiler approach really helps when
             | you still have to be aware of the underlying database
             | engine. I'm fine with choosing something like PostgreSQL
             | and writing code optimized for its specific SQL dialect.
             | 
             | But I'm not sure I want to deal with a transpiler that
             | still suffers from a leaky abstraction. It starts to feel a
             | lot like using an ORM. ORMs promise database-agnostic
             | syntax across different engines, but that promise falls
             | apart when you need to optimize a query and suddenly
             | realize you must account for engine-specific features--like
             | PostgreSQL's BRIN indexes.
        
               | jmull wrote:
                | To me, abstracting away the engine is not a goal.
                | (Generally, it's pointless -- the various engines work
                | in different ways and offer different features. While
                | there's a bunch of stuff that is widely common, that's
                | also the stuff you don't really need to abstract.)
               | 
               | The point of transpiling is to allow you to pick a syntax
               | _independent_ of the engine. You still have to pick the
               | engine and will deal with its peculiarities.
               | 
               | Today's post is about pipe syntax, which is syntax.
               | Yesterday's was about trailing commas, which is syntax. I
               | think there's an appetite for this.
        
           | jamesblonde wrote:
           | We transpile from Python to SQL using Calcite. Calcite works
           | well as an engine that knows enough SQL dialects to be
           | practical.
        
             | leoh wrote:
             | +1 for Calcite. Any more details on what you're doing? Are
             | you using a python library on top of calcite? Something
             | else?
        
           | leoh wrote:
           | This is nonsense and the person who was advocating for
           | implementation-aware transpilation is completely wrong.
           | 
           | It is the role of a database engine implementation to service
           | useful queries. Sometimes those queries are very complex; and
           | yes, pipe syntax allows users to more easily write queries
           | that can challenge database performance.
           | 
           | Yet this totally misses the point. Technologies like LookML
           | have long allowed for extremely useful yet complicated
           | queries to be executed by a database engine (often against
           | one that is exceptional at complex analytical queries, like
           | BigQuery) with high performance.
           | 
           | We should never handicap a SQL user's ability to do what they
           | want in the most ergonomic way. Transpilation largely allows
           | for this -- and I am 100% certain that implementations that
           | allow for pipe syntax will effectively merely transform an
           | AST with pipes to an existing kind of AST (this is, of
           | course, effectively transpilation).
           | 
           | It is the job of the database engine, not the one querying
           | (within reason) to performantly execute a query -- and ditto
           | for a pipe->non-pipe SQL transpiler. If you disagree with
           | this, you are ignoring the fact that BigQuery, Snowflake,
           | Redshift, and other optimized analytical databases are
           | collectively worth billions, largely for their ability to
           | performantly service crazy yet useful queries.
        
       | beoberha wrote:
       | Somehow Microsoft (my employer) has totally failed in
       | evangelizing Kusto to the masses. KQL is by far the best query
       | language you'll ever use.
       | 
       | This syntax from Google is nice but it's still just SQL.
        
         | summarity wrote:
          | Plus one for Kusto. I've never had this much success teaching
          | people data analysis. It's good for simple, obvious queries
          | and for powerful composed ones. Though I work at GitHub, so
          | I'm still too close to the mothership for an unbiased
          | opinion.
         | 
         | There's some Apache energy around KQL though and a few open
         | source parsers so there's hope yet.
        
         | default-kramer wrote:
         | I remain unimpressed by KQL. Comparing SQL to KQL is
         | approximately like comparing Java to C#. Yeah it's better in
         | many ways, but at the end of the day it doesn't make a huge
         | difference. I want to go from Java to Lisp.
         | 
         | Consider how many of these complaints are equally valid against
         | KQL: https://www.scattered-thoughts.net/writing/against-sql/
        
       | schultzer wrote:
        | When I first started working in Elixir I loved Ecto and the
        | pipe syntax, but now I want a SQL sigil that won't care about
        | where I put my WHERE or FROM clauses.
        | 
        | Why are SQL parsers even concerned with that? Isn't that why we
        | have RD parsers?
        
       | mwexler wrote:
       | Though not a pipe syntax, the Malloy language has some similar
       | analytic sugar in its syntax.
       | 
       | Malloy is from Lloyd Tabb, a co-founder of Looker.
       | 
       | https://www.malloydata.dev/
        
       | mccanne wrote:
       | Really cool though typing ">" after "|" is a pain
       | https://github.com/brimdata/super/blob/main/docs/language/pi...
        
       | willvarfar wrote:
       | When the pipe syntax was first published last year the SQLite
       | team did a quick hack to try it out.
       | 
       | https://sqlite.org/forum/forumpost/5f218012b6e1a9db
       | 
       | (Note that the fiddle linked no longer supports the syntax)
       | 
       | It is very interesting that they found it completely unnecessary
       | to actually require the pipe characters. The grammar works when
       | the pipe characters are optional.
       | 
       | And, imo, looks a lot better!
        
         | specialist wrote:
         | Good. I vote for omitting the pipe lexemes. I hate them.
         | 
         | Of the alternatives linked (elsethread), PSQL is the least bad.
         | I'm ambivalent about the curly brackets; will have to ponder.
         | 
         | Malloy would be improved by omitting the superfluous (JSON-
         | esque) semicolons.
        
       | ajfriend wrote:
        | I have a project that's still very much at the experimental
        | stage, where I try to get something similar to this pipe syntax
        | by allowing users to chain "SQL snippets" together. That is,
        | you can use standalone statements like `where col1 > 10`
        | because the `select * from ...` is implied.
        | https://ajfriend.github.io/duckboat/
        | 
        |     import duckboat as uck
        | 
        |     csv = 'https://raw.githubusercontent.com/allisonhorst/palmerpenguins/main/inst/extdata/penguins.csv'
        | 
        |     uck.Table(csv).do(
        |         "where sex = 'female' ",
        |         'where year > 2008',
        |         'select *, cast(body_mass_g as double) as grams',
        |         'select species, island, avg(grams) as avg_grams group by 1,2',
        |         'select * replace (round(avg_grams, 1) as avg_grams)',
        |         'order by avg_grams',
        |     )
       | 
       | I still can't tell if it's too goofy, or if I really like it. :)
       | 
       | I write a lot of SQL anyway, so this approach is nice in that I
       | find I almost never need to look up function syntax like I would
       | with Pandas, since it is just using DuckDB SQL under the hood,
       | but removing the need to write `select * from ...` repeatedly.
        | And when you're ready to exit the data exploration phase, it's
        | easy to gradually translate things back to "real SQL".
       | 
       | The whole project is pretty small, essentially just a light
       | wrapper around DuckDB to do this expression chaining and lazy
       | evaluation.
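The snippet-chaining idea can be sketched in a few lines: wrap the running query as a subquery and append the next snippet. This is a toy reimplementation over sqlite3, not the duckboat API, and it only handles trailing clauses like `where`/`order by` (duckboat also rewrites `select ...` snippets):

```python
import sqlite3

def chain(base, *snippets):
    # Each snippet is applied by wrapping the previous query as a
    # subquery, so "where ..." and "order by ..." fragments compose.
    query = base
    for s in snippets:
        query = f"SELECT * FROM ({query}) {s}"
    return query

# Toy stand-in for the penguins CSV (hypothetical rows).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE penguins (species TEXT, year INTEGER, body_mass_g REAL)")
conn.executemany("INSERT INTO penguins VALUES (?, ?, ?)", [
    ("Adelie", 2009, 3700.0),
    ("Gentoo", 2009, 5000.0),
    ("Adelie", 2007, 3800.0),
])

sql = chain(
    "SELECT * FROM penguins",
    "WHERE year > 2008",
    "ORDER BY body_mass_g",
)
rows = conn.execute(sql).fetchall()
print(rows)
```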
        
       | Xmd5a wrote:
        | https://github.com/seancorfield/honeysql#vanilla-sql-clause-...
        | 
        |     (-> (select :a :b :c)
        |         (from :foo)
        |         (where [:= :foo.a "baz"]))
        |     => {:select [:a :b :c] :from [:foo] :where [:= :foo.a "baz"]}
        | 
        | Effort: zero (0). That's what "simple made easy" is about.
        
       | jakozaur wrote:
        | SQL with pipe syntax has also been implemented at Databricks
        | since Jan 30, 2025:
        | https://docs.databricks.com/en/sql/language-manual/sql-ref-s...
        | 
        | Still, the best is yet to come. Previously, SQL extensions were
        | a pain. There was no good place for them, and table-valued
        | functions were a mess.
        | 
        | Now it would be possible to have higher-order functions such as
        | enrichment, predictions, grouping or other data contracts.
        | Example:
        | 
        |     FROM orders
        |     |> WHERE order_date >= '2024-01-01'
        |     |> AGGREGATE SUM(order_amount) AS total_spent GROUP BY customer_id
        |     |> WHERE total_spent > 1000
        |     |> INNER JOIN customers USING(customer_id)
        |     |> CALL ENRICH.APOLLO(EMAIL > customers.email)
        |     |> AGGREGATE COUNT(*) high_value_customer GROUP BY company.country
       | 
       | This could run as one SQL call that determines the distinct
       | e-mail domains, then prepares an enrichment dataset, and later
       | executes the final SQL with the JOIN.
       | 
       | Iterative SQL with pipes may also work better with GenAI.
        
         | memhole wrote:
         | Thanks for sharing. I wasn't aware they pushed that out. The
         | ordering of this makes so much more sense. My only real
         | concern is that CTEs are so commonplace and part of ANSI SQL
         | that you'll see people trade a standard for this. But I also
         | gripe that Snowflake uses // for comments and people get
         | confused when their code doesn't work on another DB. Oracle's
         | join syntax is another example.
        
       | lordofgibbons wrote:
       | I love it. Why did it take us so long to make SQL easier to
       | reason about?
       | 
       | Hopefully, it gets adopted in Postgresql too.
        
       | data-ottawa wrote:
       | I've been using this for a bit more than a week now, and I'd
       | say it's a great feature for iteration and interactive queries.
       | Here's my review:
       | 
       | Doing data exploration, analysis, and cleaning, this is way more
       | productive than just SQL. As an example, fusing the aggregate
       | functions with the group by keys creates a much faster workflow.
       | I like that it unifies WHERE/HAVING/QUALIFY, and the
       | set/extend/drop functions help (though Snowflake still wins for
       | being able to declare a column and use it in the same select).
       | Ultimately this gives me a one liner for situations where I'm
       | creating hard to name intermediate CTEs, and that's awesome!
       | 
       | Iterating on array columns or parsing non-trivial JSON is much
       | better with this syntax too.
       | 
       | This is a shift back toward the data frame APIs of a few years
       | ago, but we're still missing the typing helpers and support
       | that data frame APIs could provide. It would be nice to have a
       | system like this that plugs into language servers, can list
       | fields at each step on mouse over/cursor hover, and offers
       | editor completions (the BQ console does an okay job).
       | 
       | This syntax is great for DBT macros. You can just drop in entire
       | transforms without worrying about polluting the query namespace
       | or working around existing columns on your reference tables.
       | 
       | There's a dark side to this syntax: the imperative coding style
       | comes with a tradeoff that the reader needs to track internal
       | state through many steps. It's the same reason why SELECT * is
       | often not recommended.
       | 
       | As a best practice I like to throw a `|> SELECT X,Y,Z` at the end
       | of these pipe blocks to reinforce to the reader what the output
       | looks like.
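As an illustration of that convention, a sketch with invented table and column names: the trailing explicit projection pins down the output schema after a chain of stateful steps.

```sql
FROM orders
|> WHERE order_date >= '2024-01-01'
|> AGGREGATE SUM(order_amount) AS total_spent GROUP BY customer_id
-- final explicit projection so the reader sees the output shape
|> SELECT customer_id, total_spent
```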
       | 
       | I should mention that it's not as portable, but frankly none of
       | the major DBs are fully portable anyway, and other DBs also now
       | support this syntax.
       | 
       | tl;dr: I like this feature, but use it sparingly. Avoid overuse
       | in model files, definitely lean on it in analyses.
        
         | eb0la wrote:
         | Just a question: if you're adding stuff at the end of the
         | query... do the previous steps get cached?
         | 
         | I mean, if you type first:
         | 
         |     FROM customers
         |     |> WHERE signup_date >= '2024-01-01T00:00:00Z'
         | 
         | And then you add (and execute):
         | 
         |     |> COUNT() as tally GROUP BY CHANNEL
         | 
         | Does the first expression get cached by BQ?
        
       | perdomon wrote:
       | This is great. It's a more linear workflow that mimics how we
       | filter data in our minds. How long until it's available in
       | MySQL? (Never, probably.)
        
       | dangoodmanUT wrote:
       | IMO pipe syntax is so much better. Not necessarily this version,
       | but in general
        
       | gigatexal wrote:
       | I think the syntax is awesome. Too bad I'm using snowflake at the
       | $DAYJOB
        
       | tqi wrote:
       | If this helps people write queries more easily, I think that is
       | unequivocally a great thing, and I have no reservations.
       | 
       | Personally, I continue to prefer CTEs because they allow me to
       | write queries in a more modular way, and in conjunction with
       | descriptive names they help me keep less context in my head
       | when working out the logic. When I look at a pipe syntax query,
       | it almost feels like the difference between reading a long
       | paragraph vs bullet points?
        
         | jeffbee wrote:
         | "CTE is a progressive neurodegenerative disease that develops
         | after repeated head injuries, such as concussions"
        
           | throw-qqqqq wrote:
           | First hit on Google for "sql cte" gives me: CTE (Common Table
           | Expression) and this page: https://www.geeksforgeeks.org/cte-
           | in-sql/
        
           | scarmig wrote:
           | For anyone who doesn't know, CTE = common table expression,
           | which is a part of a query representing a temporary result
           | set that can be referenced from other parts of a query.
        
         | setr wrote:
         | I don't think this impacts your usage of CTEs; it's a
         | reorganization of the SELECT statement. It presumably enables
         | the use of multiple operators and reordering of the SELECT
         | clauses (e.g. you can have multiple WHERE clauses and it'll
         | do the right thing) and unifies the clauses (WHERE/HAVING/
         | QUALIFY is now simply WHERE for all 3 cases).
         | 
         | You'd still be using CTEs on top of this to deal with
         | deduplicating multiple SELECTs/subselects.
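For example, the three filter keywords collapse into one in pipe syntax; a sketch with made-up table and column names, where the same WHERE keyword works before aggregation, after aggregation, and over a window function:

```sql
FROM sales
|> WHERE region = 'EU'                         -- classic WHERE
|> AGGREGATE SUM(amount) AS total GROUP BY store_id
|> WHERE total > 1000                          -- would be HAVING
|> EXTEND RANK() OVER (ORDER BY total DESC) AS rnk
|> WHERE rnk <= 10                             -- would be QUALIFY
```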
        
           | tqi wrote:
           | Interesting - the examples I've seen seem to have multiple
           | layers of FROM > WHERE > AGGREGATE > WHERE > JOIN...[1],
           | where the goal seems to be to remove the need to have nested
           | CTEs?
           | 
           | [1] https://news.ycombinator.com/item?id=43037330
        
             | setr wrote:
             | I mean that pattern would be a single select statement,
             | just using the psychotic inconsistencies of normal SQL:
             | 
             |     SELECT
             |     FROM
             |     WHERE
             |     GROUP BY
             |     HAVING
             |     JOIN (
             |         SELECT
             |         FROM
             |         GROUP BY
             |     ) ON
             | 
             | Only reason to use a CTE here would be that JOIN-SELECT is
             | ugly, so it competes in that sense, but the usual usage of
             | deduplication-queries-without-making-views isn't resolved
             | by the new syntax. So you'd still be using CTEs in either
             | syntax.
             | 
             | If you imagine it in normal code, the pipe syntax is just a
             | function call chain, and CTEs are variables. You can mix
             | and match as needed for clarity. And you can define your
             | CTE using the function-call chain
        
               | tqi wrote:
               | So for me, the example query in pipe syntax (up to the
               | enrich):
               | 
               |     FROM orders
               |     |> WHERE order_date >= '2024-01-01'
               |     |> AGGREGATE SUM(order_amount) AS total_spent GROUP BY customer_id
               |     |> WHERE total_spent > 1000
               |     |> INNER JOIN customers USING(customer_id)
               |     |> CALL ENRICH.APOLLO(EMAIL > customers.email)
               |     |> AGGREGATE COUNT(*) high_value_customer GROUP BY company.country
               | 
               | is easier to parse (for me) as something like:
               | 
               |     WITH customer_order_totals AS (
               |       SELECT customer_id, SUM(order_amount) AS total_spent
               |       FROM orders
               |       GROUP BY 1
               |     )
               |     , enriched_customers AS (
               |       SELECT cus.customer_id,
               |              ENRICH.APOLLO(EMAIL > cus.email) enrichment
               |       FROM customer cus
               |     )
               |     SELECT ec.country, COUNT(*) high_value_customer
               |     FROM customer_order_totals cot
               |     JOIN enriched_customers ec
               |       ON ec.customer_id = cot.customer_id
               |     WHERE cot.total_spent > 1000
               |     GROUP BY ec.country
               | 
               | largely because I can understand the components in
               | isolation first, before bringing them together. Yes,
               | it's more verbose, but I am of the opinion that the
               | brevity comes at the expense of having to keep more
               | information in my working memory.
        
       | tdfirth wrote:
       | It should have always worked this way. Without this feature you
       | take the algebra out of relational algebra. That's the root of
       | most of the composition issues in SQL.
       | 
       | Sadly it's a few decades too late, and this just fragments the
       | "ecosystem" further.
        
       | ralmidani wrote:
       | The first time I looked at Elixir, I thought it was hideous (I
       | was really into Python and Ruby, and still admire them while
       | liking Elixir much more). But I kept hearing about the virtues of
       | Elixir (and of the BEAM) and coming back. Finally, it clicked,
       | and I'm now in my 3rd year of doing Elixir full-time. I hope to
       | never again use a language that doesn't have pipes.
       | 
       | The moral of the story? Let's give this new SQL syntax a chance.
        
       | fforflo wrote:
       | I'm all for such an approach, but I also like Firebolt's approach
       | with Lambda functions focusing on arrays. [0]
       | 
       | [0] https://www.firebolt.io/blog/sql-thinking-in-lambdas
        
       | ejcx wrote:
       | We made pql.dev, which works with the different SQL syntaxes by
       | translating Kusto-like queries to SQL (using CTEs). It's worked
       | really well thus far, and I wish someone would make a standard
       | pipelined query language that gets supported across most
       | databases.
       | 
       | I know PRQL exists, but its syntax is pretty cumbersome and not
       | something I enjoyed writing, though I do understand why folks
       | would gravitate towards it.
        
       | xiphias2 wrote:
       | One great feature of the SQL pipe syntax is that it works much
       | better with programming languages (for generating or modifying
       | existing SQL queries).
       | 
       | It would be great to have it standardized fast and implemented
       | everywhere.
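An illustrative sketch of why generated pipe-syntax queries compose more easily: extending a query is plain string appending, whereas standard SQL usually has to be re-parsed and re-nested. The table, stages, and helper here are made up.

```python
def add_stage(query: str, stage: str) -> str:
    """Append one pipe stage to an existing pipe-syntax query string."""
    return f"{query}\n|> {stage}"

# Start from a base query and extend it step by step, no parsing needed.
q = "FROM orders\n|> WHERE order_date >= '2024-01-01'"
q = add_stage(q, "AGGREGATE COUNT(*) AS n GROUP BY customer_id")
q = add_stage(q, "WHERE n > 10")
# q now contains three pipe stages and ends with the post-aggregate filter
```

With clause-ordered SQL, the second filter would force the generator to wrap the earlier text in a subquery or CTE; here each stage is purely additive.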
        
       | zendist wrote:
       | This reminds me a bit of MSFT's Kusto language. Such an
       | immensely useful way to slice and dice large amounts of
       | structured data.
        
       | dnst wrote:
       | It seems unfortunate that InfluxDB had the pipe operator with
       | Flux in version 2 but dropped it in the new version due to low
       | adoption. Now the approach seems to be getting more popular.
        
       | code_runner wrote:
       | I appreciate that somebody somewhere may appreciate and enjoy
       | this, but I am not that person. I love SQL. I have always loved
       | SQL. I know why others don't, but I do, and it's beautiful.
        
       | eb0la wrote:
       | First thing I thought is "this is a SELECT * in disguise",
       | which is something you SHOULD never do in BigQuery. If you can
       | combine it with CTEs, it seems good because it adds legibility
       | to complex queries. Also looks easier for the planner. Not for
       | the optimizer, though.
        
       ___________________________________________________________________
       (page generated 2025-02-13 23:00 UTC)