[HN Gopher] SQL pipe syntax available in public preview in BigQuery
___________________________________________________________________
SQL pipe syntax available in public preview in BigQuery
Author : marcyb5st
Score : 183 points
Date : 2025-02-10 10:38 UTC (3 days ago)
(HTM) web link (cloud.google.com)
(TXT) w3m dump (cloud.google.com)
| whalesalad wrote:
| Who invented pipe syntax? Elixir? Or is there another runtime or
| language that made this prevalent beforehand?
| ellisv wrote:
| The |> syntax? Perhaps F#. Pipes have been around since at
| least Unix v3.
| _huayra_ wrote:
| Many such things from F# come from Haskell (given the
| influence from MSR UK, specifically Simon Peyton Jones before
| he left a few years ago), so likely Haskell or earlier imo
| (at least in terms of "complex" "pipe" operations that have
| higher-level operations than "bytes in, bytes out" of Unix).
| ubertaco wrote:
| Worth noting that F# started out life as an implementation
| of OCaml for the .NET runtime [1], so most likely the pipe
| syntax was taken from there, although the pipeline-of-
| functions construction is much older than that [2]
|
| [1] https://softwareengineering.stackexchange.com/questions
| /2099...
|
| [2] https://retrocomputing.stackexchange.com/questions/1733
| 5/wha...
| grndn wrote:
| OCaml took the '|>' pipe symbol from F#. And F# was the
| language that made the '|>' pipe symbol popular in
| mainstream programming (as opposed to the unix '|' pipe
| symbol), afaik. According to Don Syme, it was used in F#
| in 2003 (see "Early History of F#", section 9.1, [1]
| which references [2]).
|
| Here's his full comment:
|
| /quote
|
| Despite being heavily associated with F#, the use of the
| pipeline symbol in ML dialects actually originates from
| Tobias Nipkow, in May 1994 (with obvious semiotic
| inspiration from UNIX pipes) [archives 1994; Syme 2011].
|
| ... I promised to dig into my old mail folders to uncover
| the true story behind |> in Isabelle/ML, which also
| turned out popular in F#...
|
| In the attachment you find the original mail thread of
| the three of us [ Larry Paulson; Tobias Nipkow; Marius
| Wenzel], coming up with this now indispensable piece of
| ML art in April/May 1994. The mail exchange starts as a
| response of Larry to my changes.
|
| ...Tobias ...came up with the actual name |> in the
| end...
|
| /endquote
|
| Haskell has had "$" or "backwards pipe" for ages, but
| that is just another way of doing function application
| and it does not feel the same as (and is not used the
| same way as) the unix-style piping paradigm.
|
| [1] https://fsharp.org/history/hopl-final/hopl-fsharp.pdf
|
| [2] https://web.archive.org/web/20190217164203/https://bl
| ogs.msd...
| xnx wrote:
| Originally? Douglas McIlroy for Unix in 1973:
| https://en.wikipedia.org/wiki/Pipeline_(Unix)
| jmount wrote:
| I agree. And then functional languages picked up the pipe, as
| it is roughly a "poor man's method chaining" (so the object-
| oriented languages don't need it).
| nothrabannosir wrote:
| Shell would at least qualify as prior art, I'd assume
| grndn wrote:
| Elixir took the pipe symbol from F# (See Jose's comment at
| [1]). For more details see my other comment below.
|
| [1] https://elixirforum.com/t/which-language-first-introduced-
| th...
| pjmlp wrote:
| Ever since using Kusto Query Language, I've looked forward to
| SQL getting something like this. Maybe there is hope if enough
| DBs have it as an extension.
| DarkCrusader2 wrote:
| I heavily use KQL on a daily basis and want nothing more than
| to use it for Postgres and SQLite as well.
| Taikonerd wrote:
| This is so exciting. I hope Postgres and MySQL get some level of
| support for it, too.
| notpushkin wrote:
| PRQL is a similar idea, and it compiles to SQL:
| https://prql-lang.org/
|
|     from invoices
|     filter invoice_date >= @1970-01-16
|     derive {
|       transaction_fees = 0.8,
|       income = total - transaction_fees
|     }
|     filter income > 1
| mritchie712 wrote:
| there's a DuckDB extension[0] for it too
|
| 0 -
| https://duckdb.org/community_extensions/extensions/prql.html
| throwaway127482 wrote:
| I think I like this new pipe syntax a lot better than PRQL.
| Feels like it has the potential to become officially supported
| by DBs like MySQL since it's a relatively small change to
| existing SQL syntax.
| gorkaerana wrote:
| I'm tempted to give it a try just for the "For HackerNews
| enthusiasts" section of their landing page, which states "The
| PRQL compiler is written in Rust" and "We talk about
| "orthogonal language features" a lot". Love me some banter in
| technical documentation :D
| benrutter wrote:
| I love the design behind PRQL, but was a little disappointed
| when I tried using it with an MSSQL server and found there was
| no decent way to filter for `LIKE '%something%'`.
|
| (PRQL uses regex for stuff like this, which sounds great but
| then isn't supported on some databases like MSSQL)
| default-kramer wrote:
| I'm very surprised to learn that PRQL does not natively
| support `like`, but you can add it yourself: https://github.c
| om/PRQL/prql/issues/1123#issuecomment-135385...
| promiseofbeans wrote:
| Huh, it's super cool they have a built-in way of extending
| to support missing engine features!
| e1g wrote:
| "Malloy" is one one in the same spirit, but probably more
| comprehensive https://github.com/malloydata/malloy
| snthpy wrote:
| SQL pipe syntax is a great step in the right direction but what
| I think sets PRQL apart (in my very biased opinion,
| contributor) is the ability to define functions.
|
| Here's a simple example:
|
|     # define the "take_smallest" function
|     let take_smallest = func n col tbl<relation> -> (
|       from tbl
|       sort col
|       take n
|     )
|
|     # find smallest 3 tracks by milliseconds
|     from tracks
|     take_smallest 3 milliseconds
|
| You can try this now in the online playground:
| https://prql-lang.org/playground/
|
| That's simple enough, and there's not that much gained there,
| but say you now want to find the 3 smallest tracks per album
| by bytes?
|
| That's really simple in PRQL: you can just reuse the
| "take_smallest" function and pass a different column name as
| an argument:
|
|     from tracks
|     group album_id (
|       take_smallest 3 bytes
|     )
| snthpy wrote:
| Here's a more complex example for sessionizing user events:
|
|     # specify the target SQL dialect
|     prql target:sql.duckdb
|
|     # define the "sessionize" function
|     let sessionize = func user_col date_col max_gap:365 tbl<relation> -> (
|       from tbl
|       group user_col (
|         window rows:-1..0 (
|           sort date_col
|           derive prev_date=(lag 1 (date_col|as date))
|         )
|       )
|       derive {
|         date_diff = (date_col|as date) - prev_date,
|         is_new_session = case [date_diff > max_gap || prev_date==null => 1, true => 0],
|       }
|       window rows:..0 (
|         group user_col (
|           sort {date_col}
|           derive user_session_id = (sum is_new_session)
|         )
|         sort {user_col, date_col}
|         derive global_session_id = (sum is_new_session)
|       )
|       select !{prev_date, date_diff, is_new_session}
|     )
|
|     # main query
|     from invoices
|     select {customer_id, invoice_date}
|     sessionize customer_id invoice_date max_gap:365
|     sort {customer_id, invoice_date}
|
| You can also try that in the playground:
| https://prql-lang.org/playground/
| ashu1461 wrote:
| In general, for languages like PRQL / pipe syntax, conciseness
| is clearly a big advantage, but do they help with query
| optimisation in any regard?
| digikata wrote:
| IIRC the SQL pipe Google paper references PRQL. I think both
| make sense in their own context. SQL pipe is a smaller leap for
| users familiar with SQL, and PRQL takes on more change. I do
| wonder if the smaller leap might cause more deja-vu
| displacement initially than a bigger shift like PRQL. I don't
| know the answer other than to look at users of both over time
| and see how they do with the different syntaxes.
| Taikonerd wrote:
| Relatedly, PRQL is a lovely pipe-oriented syntax for SQL DBs:
| https://prql-lang.org/
|
| Their syntax is a lot cleaner, because it's a _new_ language --
| it's not backwards-compatible with SQL. But then again, they
| don't have the resources of Google behind them.
| systems wrote:
| If it's not backward compatible with SQL, it's not pipe syntax
| for SQL.
|
| It's just a new query language.
|
| Most DBMSs have proprietary extensions to SQL, and it's messy.
| Real success would be adding the new syntax to the standard
| and having it implemented across multiple DBMSs.
| Taikonerd wrote:
| Well, OK. PRQL is "for SQL DBs" in the sense that you can
| translate it to SQL on the client side, and then execute it
| on regular Postgres, MySQL, etc.
| RyanHamilton wrote:
| That's exactly what qstudio offers:
| https://www.timestored.com/qstudio/prql-ide a desktop
| client that translates prql to run against many databases.
| epistasis wrote:
| This is beautiful.
|
| I first encountered this style of data manipulation in R's
| tidyverse. Some say C# (or F#?) had similar pipes and
| influences, but I haven't seen specifics.
|
| Some other data libraries like Polars have sort of similar parse
| trees, but they also have lots of extra cruft because of
| limitations to the underlying language. Python's functional
| calling semantics are extremely powerful, but not quite as
| powerful as R's.
| jddj wrote:
| The C# and F# bit is LINQ (language integrated query) and it's
| a lovely piece of engineering.
| ziml77 wrote:
| The LINQ syntax was an interesting experiment but the value
| of that syntax always felt minimal to me. Like it looked out
| of place in the middle of my C# code, so instead I just use
| the extension methods.
| neonsunset wrote:
| It grows on you after you start using F# too
| nicoritschel wrote:
| It's also in DuckDB: https://github.com/ywelsch/duckdb-psql
| conradev wrote:
| and SQLite is waiting on Postgres:
| https://news.ycombinator.com/item?id=41385005
| jmull wrote:
| I like it, but I'm not sure it's worth the trouble of further
| expanding an ever-ballooning extended family of SQL syntaxes.
|
| Of course, SQL has numerous issues, both in an absolute sense and
| relative to what we've come to expect. And the obvious course of
| action in each individual case is to add syntax to support the
| missing feature.
|
| But as you keep adding syntax, SQL gets more and more complex,
| both in the individual variants and due to the complexities of
| multiple variants with different syntax support at different
| times.
|
| A transpiling approach makes more sense to me. I'd like the sql
| implementors to focus on source maps and other things to better
| support plugging in external, alternate syntaxes (is a
| standardized mechanism too much to ask for?).
|
| Then individual projects/people/efforts can choose the SQL syntax
| variant that works for them, and it can be one that evolves
| separate from the host db.
| _dark_matter_ wrote:
| There are a lot of transpilers, and they don't get much
| adoption because the people implementing them are not the
| people implementing the database.
|
| The database and the query language are tightly coupled. The
| latter determines the kinds of optimizations that the former
| can do, and a transpiled language means that optimization info
| needs to be passed through in weird and inefficient ways; it
| also means that obvious optimizations aren't possible, because
| the people building the language are not building the db.
|
| If you've ever tried using query hints, or deeply inspected a
| distributed query plan, you'll know exactly what I mean.
| 9rx wrote:
| _> A transpiling approach makes more sense to me._
|
| That's the solution we've been using, oft referred to as ORM,
| to work around the lack of this syntax. But then you had to
| live with an almost-SQL-but-not-really language that doesn't
| always map cleanly to SQL, and all the pain that goes along
| with that.
|
| Now instead of that song and dance you can compose queries with
| plain SQL and simple string concatenation. That is a huge win,
| even if troubling that it has taken so long (and still not
| widely supported).
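The composition-by-concatenation point above can be sketched in BigQuery-style pipe syntax (the table and column names here are invented for illustration):

```sql
-- A complete query, valid on its own:
FROM orders
|> WHERE order_date >= '2024-01-01'

-- Extending it is plain string concatenation: append another
-- operator, with no need to parse or splice the existing text.
|> AGGREGATE SUM(order_amount) AS total_spent GROUP BY customer_id
|> WHERE total_spent > 1000
```

Because each `|>` step consumes the whole query before it, a program can build queries by appending lines, which classic SELECT syntax makes awkward.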
| jmull wrote:
| Generally, ORMs try to present a higher-level abstraction of
| your data and data model than the engine does natively.
|
| That makes it the ORM's responsibility to implement the
| abstraction in a good way. Problems come in when the "good"
| way depends on factors the ORM doesn't consider or expose
| well.
|
| I actually mean transpiler in a mainly syntactic sense, and
| would want it to avoid higher-level abstractions (after all,
| as you point out, ORMs have that ground covered).
| 9rx wrote:
| While that is closer to ORM in the classical sense, the
| modern use - and why I used 'oft referred to' - is more
| accurately a query builder, but it gets called ORM regardless.
| The reason people reach for this type of tool is because
| there hasn't been a good solution to this with SQL proper,
| necessitating another language. Until now.
| harrall wrote:
| Plain transpiling won't work because when you write a SQL
| query, you have to know what engine is running it because they
| all do worse at some things than others.
|
| For example, the same result could be fetched using a
| correlated subquery, a common table expression, a temporary
| table, a view, or (very hackily) aggregation, and the right
| choice would depend on whether you were using SQL Server,
| Postgres, MySQL, or SQLite, because they aren't all fast at
| the same things. And you need to know the version.
|
| This might be solvable if the language were a brand new,
| higher-level one, though, and the compiler knew the
| intricacies of each engine.
| michaelmior wrote:
| It seems like this is solved by just having the transpiler
| aware of the target system.
| gav wrote:
| One example is LookML, which is used to build semantic data
| models in an analytic layer:
| https://cloud.google.com/looker/docs/what-is-lookml
| anon84873628 wrote:
| LookML does seem to have invested a lot in compilation to
| different SQL backends, generally using the best syntax for
| each.
|
| Unfortunately the frontend is so tightly tied to the Looker
| BI stuff, and it can't really express arbitrary queries
| without going through lots of hassle.
|
| It's unclear what Google is doing with Looker. It would be
| interesting to imagine what LookML would be like as an
| independent SQL toolchain.
| amcaskill wrote:
| The creator of LookML is actually working on that with an
| OSS project called Malloy.
|
| https://www.malloydata.dev/
| leoh wrote:
| >LookML does seem to have invested a lot in compilation
| to different SQL backends, generally using the best
| syntax for each.
|
| To some degree, yes. Yet far and away, users of Looker
| use engines like RedShift, BigQuery, and Snowflake
| because they're extremely effective at the types of
| queries that Looker sends at them -- not because Looker
| spends a huge number of hours optimizing for each engine
| (that's not to say none is done); these dbs are great at
| analytical queries.
|
| Looker in its earlier days (early/mid 2010s) took a bet
| on analytical database engines getting better as opposed
| to other technologies; for example, Tableau had its own
| datastore and largely did not "push queries down to the
| database" for execution. In the end, BigQuery was
| radically faster than SparkSQL and was compelling for
| customers, for example; it was not that Looker spent a
| ton of time optimizing BigQuery as opposed to SparkSQL.
|
| Source: I was an early engineer at Looker
| crazygringo wrote:
| It's also not just that -- it depends on the data.
|
| A database can change how it decides to execute a query based
| on whether a table has 100 rows or 10,000,000 rows.
| harrall wrote:
| Yeah, that's true. Also, you, as the query writer, know how
| many rows the table has and will have, and sometimes you
| decide what to do based on that information too.
| jmull wrote:
| Yes, any such transpiler would certainly need to be engine
| aware (for multiple reasons). Right off the bat, the output
| needs to be in a syntax the engine accepts, and there are all
| kinds of differences from engine to engine, version to
| version.
|
| And, as you say, if the language contains higher-level
| abstractions, it might need to work quite differently
| depending on the target engine.
|
| My thought is to avoid those higher-level abstractions as
| much as possible and just expose the capabilities of the
| engine as directly as possible, albeit with a different
| syntax. In my experience, developers who are willing and able
| to write SQL are fine with targeting a specific engine and
| optimizing to that. (Those that aren't get someone else to do
| it, or live with the sad consequences of dealing with an ORM.)
|
| To summarize:
|
| Normal Approach: you pick an engine, and get the syntax that
| comes with it. You need to know what the engine does well and
| doesn't do well. You write SQL accordingly, using the syntax
| the engine accepts.
|
| Transpiler Approach: you pick an engine, and independently
| choose a syntax. You still need to know what the engine does
| well and doesn't do well. You still write SQL accordingly,
| but using the syntax of the language you chose.
| rednafi wrote:
| I don't know if the transpiler approach really helps when
| you still have to be aware of the underlying database
| engine. I'm fine with choosing something like PostgreSQL
| and writing code optimized for its specific SQL dialect.
|
| But I'm not sure I want to deal with a transpiler that
| still suffers from a leaky abstraction. It starts to feel a
| lot like using an ORM. ORMs promise database-agnostic
| syntax across different engines, but that promise falls
| apart when you need to optimize a query and suddenly
| realize you must account for engine-specific features--like
| PostgreSQL's BRIN indexes.
| jmull wrote:
| To me, abstracting away the engine is not a goal.
| (Generally, it's pointless -- the various engines work in
| different ways and offer different features. While there's a
| bunch of stuff that's widely common, that's also the stuff
| you don't really need to abstract.)
|
| The point of transpiling is to allow you to pick a syntax
| _independent_ of the engine. You still have to pick the
| engine and will deal with its peculiarities.
|
| Today's post is about pipe syntax, which is syntax.
| Yesterday's was about trailing commas, which is syntax. I
| think there's an appetite for this.
| jamesblonde wrote:
| We transpile from Python to SQL using Calcite. Calcite works
| well as an engine that knows enough SQL dialects to be
| practical.
| leoh wrote:
| +1 for Calcite. Any more details on what you're doing? Are
| you using a python library on top of calcite? Something
| else?
| leoh wrote:
| This is nonsense and the person who was advocating for
| implementation-aware transpilation is completely wrong.
|
| It is the role of a database engine implementation to service
| useful queries. Sometimes those queries are very complex; and
| yes, pipe syntax allows users to more easily write queries
| that can challenge database performance.
|
| Yet this totally misses the point. Technologies like LookML
| have long allowed for extremely useful yet complicated
| queries to be executed by a database engine (often against
| one that is exceptional at complex analytical queries, like
| BigQuery) with high performance.
|
| We should never handicap a SQL user's ability to do what they
| want in the most ergonomic way. Transpilation largely allows
| for this -- and I am 100% certain that implementations that
| allow for pipe syntax will effectively merely transform an
| AST with pipes to an existing kind of AST (this is, of
| course, effectively transpilation).
|
| It is the job of the database engine, not the one querying
| (within reason) to performantly execute a query -- and ditto
| for a pipe->non-pipe SQL transpiler. If you disagree with
| this, you are ignoring the fact that BigQuery, Snowflake,
| Redshift, and other optimized analytical databases are
| collectively worth billions, largely for their ability to
| performantly service crazy yet useful queries.
| beoberha wrote:
| Somehow Microsoft (my employer) has totally failed in
| evangelizing Kusto to the masses. KQL is by far the best query
| language you'll ever use.
|
| This syntax from Google is nice but it's still just SQL.
| summarity wrote:
| Plus one for Kusto. I've never had this much success teaching
| people data analysis. It's good for both simple, obvious
| queries and powerful composed ones. Though I work at GitHub,
| so I'm still too close to the mothership for an unbiased
| opinion.
|
| There's some Apache energy around KQL though and a few open
| source parsers so there's hope yet.
| default-kramer wrote:
| I remain unimpressed by KQL. Comparing SQL to KQL is
| approximately like comparing Java to C#. Yeah it's better in
| many ways, but at the end of the day it doesn't make a huge
| difference. I want to go from Java to Lisp.
|
| Consider how many of these complaints are equally valid against
| KQL: https://www.scattered-thoughts.net/writing/against-sql/
| schultzer wrote:
| When I first started working in Elixir I loved Ecto and the
| pipe syntax, but now I want a SQL sigil that won't care about
| where I put my WHERE or FROM clauses.
|
| Why are SQL parsers even concerned with that? Isn't that why
| we have recursive-descent parsers?
| mwexler wrote:
| Though not a pipe syntax, the Malloy language has some similar
| analytic sugar in its syntax.
|
| Malloy is from Lloyd Tabb, a co-founder of Looker.
|
| https://www.malloydata.dev/
| mccanne wrote:
| Really cool, though typing ">" after "|" is a pain
| https://github.com/brimdata/super/blob/main/docs/language/pi...
| willvarfar wrote:
| When the pipe syntax was first published last year the SQLite
| team did a quick hack to try it out.
|
| https://sqlite.org/forum/forumpost/5f218012b6e1a9db
|
| (Note that the fiddle linked no longer supports the syntax)
|
| It is very interesting that they found it completely unnecessary
| to actually require the pipe characters. The grammar works when
| the pipe characters are optional.
|
| And, imo, looks a lot better!
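Based on the linked forum post, the optional-pipe variant might look something like this (a sketch only; the exact grammar of the SQLite experiment isn't documented here, and the names are invented):

```sql
-- With explicit pipe tokens:
FROM users
|> WHERE active = 1
|> AGGREGATE COUNT(*) AS n GROUP BY country
|> SELECT country, n

-- The SQLite experiment reportedly parsed the same query
-- with the |> tokens simply omitted:
FROM users
WHERE active = 1
AGGREGATE COUNT(*) AS n GROUP BY country
SELECT country, n
```

Since each clause keyword already marks the start of a step, the pipe token carries no information the parser needs; it only serves as a visual cue.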
| specialist wrote:
| Good. I vote for omitting the pipe lexemes. I hate them.
|
| Of the alternatives linked (elsethread), PSQL is the least bad.
| I'm ambivalent about the curly brackets; will have to ponder.
|
| Malloy would be improved by omitting the superfluous (JSON-
| esque) semicolons.
| ajfriend wrote:
| I have a project that's still very much at the experimental
| stage, where I try to get something similar to this pipe
| syntax by allowing users to chain "SQL snippets" together.
| That is, you can use standalone statements like
| `where col1 > 10` because the `select * from ...` is implied.
| https://ajfriend.github.io/duckboat/
|
|     import duckboat as uck
|
|     csv = 'https://raw.githubusercontent.com/allisonhorst/palmerpenguins/main/inst/extdata/penguins.csv'
|
|     uck.Table(csv).do(
|         "where sex = 'female' ",
|         'where year > 2008',
|         'select *, cast(body_mass_g as double) as grams',
|         'select species, island, avg(grams) as avg_grams group by 1,2',
|         'select * replace (round(avg_grams, 1) as avg_grams)',
|         'order by avg_grams',
|     )
|
| I still can't tell if it's too goofy, or if I really like it. :)
|
| I write a lot of SQL anyway, so this approach is nice in that I
| find I almost never need to look up function syntax like I would
| with Pandas, since it is just using DuckDB SQL under the hood,
| but removing the need to write `select * from ...` repeatedly.
| And when you're ready to exit the data exploration phase, it's
| easy to gradually translate things back to "real SQL".
|
| The whole project is pretty small, essentially just a light
| wrapper around DuckDB to do this expression chaining and lazy
| evaluation.
| Xmd5a wrote:
| https://github.com/seancorfield/honeysql#vanilla-sql-clause-...
|
|     (-> (select :a :b :c)
|         (from :foo)
|         (where [:= :foo.a "baz"]))
|     => {:select [:a :b :c] :from [:foo] :where [:= :foo.a "baz"]}
|
| Effort: zero (0). That's what "simple made easy" is about.
| jakozaur wrote:
| SQL pipe syntax has also been implemented at Databricks since
| Jan 30, 2025: https://docs.databricks.com/en/sql/language-
| manual/sql-ref-s...
|
| Still, the best is yet to come. Previously, SQL extensions
| were a pain. There was no good place for them, and
| table-valued functions were a mess.
|
| Now, it would be possible to have higher-order functions such
| as enrichment, predictions, grouping or other data contracts.
| Example:
|
|     FROM orders
|     |> WHERE order_date >= '2024-01-01'
|     |> AGGREGATE SUM(order_amount) AS total_spent GROUP BY customer_id
|     |> WHERE total_spent > 1000
|     |> INNER JOIN customers USING(customer_id)
|     |> CALL ENRICH.APOLLO(EMAIL > customers.email)
|     |> AGGREGATE COUNT(*) high_value_customer GROUP BY company.country
|
| This might run as one SQL statement that determines the
| distinct e-mail domains, then prepares an enrichment dataset,
| and later executes the final SQL with the JOIN.
|
| Iterative SQL with pipes may also work better with GenAI.
| memhole wrote:
| Thanks for sharing. I wasn't aware they pushed that out. The
| ordering of this makes so much more sense. My only real
| concern is that CTEs are so commonplace and part of ANSI SQL
| that you'll see people trade a standard for this. But then, I
| also gripe that Snowflake uses // for comments and people get
| confused when their code doesn't work on another DB. Oracle's
| join syntax is another example.
| lordofgibbons wrote:
| I love it. Why did it take us so long to make SQL easier to
| reason about?
|
| Hopefully, it gets adopted in PostgreSQL too.
| data-ottawa wrote:
| I've been using this for a bit more than a week now, and I
| would say this is a great feature for iteration and
| interactive queries. Here's my review:
|
| Doing data exploration, analysis, and cleaning, this is way more
| productive than just SQL. As an example, fusing the aggregate
| functions with the group by keys creates a much faster workflow.
| I like that it unifies WHERE/HAVING/QUALIFY, and the
| set/extend/drop functions help (though Snowflake still wins for
| being able to declare a column and use it in the same select).
| Ultimately this gives me a one liner for situations where I'm
| creating hard to name intermediate CTEs, and that's awesome!
|
| Iterating on array columns or parsing non-trivial JSON is much
| better with this syntax too.
|
| This is a shift back to the data frame API of a few years ago,
| but we're still missing typing helpers and support that data
| frame APIs could provide. It would be nice to have a system like
| this which plugs into language servers, or can list fields at
| each step with a mouse over/cursor hover, and getting editor
| completions (the BQ console does an okay job).
|
| This syntax is great for DBT macros. You can just drop in entire
| transforms without worrying about polluting the query namespace
| or working around existing columns on your reference tables.
|
| There's a dark side to this syntax. The imperative coding
| style comes with a tradeoff: the reader needs to track
| internal state through many steps. It's the same reason why
| SELECT * is often not recommended.
|
| As a best practice I like to throw a `|> SELECT X,Y,Z` at the end
| of these pipe blocks to reinforce to the reader what the output
| looks like.
|
| I should mention that it's not as portable, but frankly all the
| major DBs aren't portable, and other DBs also now support this
| syntax.
|
| tl;dr: I like this feature, but use it sparingly. Avoid overuse
| in model files, definitely lean on it in analyses.
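The habits described above (a single unified WHERE, plus a closing SELECT to pin down the output) can be sketched like this; the table and column names are made up:

```sql
FROM events
|> WHERE user_id IS NOT NULL      -- plain row filter
|> AGGREGATE COUNT(*) AS n GROUP BY user_id
|> WHERE n > 10                   -- post-aggregation filter (HAVING in classic SQL)
|> SELECT user_id, n              -- explicit final shape for the reader
```

The trailing `|> SELECT` costs one line but spares the reader from replaying every step to figure out which columns survive to the end.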
| eb0la wrote:
| Just a question: if you're adding stuff at the end of the
| query, do the previous steps get cached?
|
| I mean, if you type first:
|
|     FROM customers
|     |> WHERE signup_date >= '2024-01-01T00:00:00Z'
|
| And then you add (and execute):
|
|     |> COUNT() as tally GROUP BY CHANNEL
|
| Does the first expression get cached by BQ?
| perdomon wrote:
| This is great. It's a more linear workflow that mimics how we
| filter data in our minds. How long until it's available in
| MySQL? (Never, probably.)
| dangoodmanUT wrote:
| IMO pipe syntax is so much better. Not necessarily this version,
| but in general
| gigatexal wrote:
| I think the syntax is awesome. Too bad I'm using snowflake at the
| $DAYJOB
| tqi wrote:
| If this helps people write queries more easily, I think that is
| unequivocally a great thing and have no reservations.
|
| Personally, I continue to prefer CTEs because it allows me to
| write queries in a more modular way, and in conjunction with
| descriptive names helps me to keep less context in my head when
| working out the logic. When I look at a pipe syntax query, it
| almost feels like the difference between reading a long paragraph
| vs bullet points?
| jeffbee wrote:
| "CTE is a progressive neurodegenerative disease that develops
| after repeated head injuries, such as concussions"
| throw-qqqqq wrote:
| First hit on Google for "sql cte" gives me: CTE (Common Table
| Expression) and this page: https://www.geeksforgeeks.org/cte-
| in-sql/
| scarmig wrote:
| For anyone who doesn't know, CTE = common table expression,
| which is a part of a query representing a temporary result
| set that can be referenced from other parts of a query.
| setr wrote:
| I don't think this impacts your usage of CTEs; it's a
| reorganization of the SELECT statement, and presumably enables
| multiple operators and reordering of the SELECT clauses (e.g.
| you can have multiple WHERE clauses and it'll do the right
| thing) and unifies the clauses (WHERE/HAVING/QUALIFY is now
| simply WHERE for all 3 cases).
|
| You'd still be using CTEs on top of this to deal with
| deduplicating multiple SELECT/SUBSELECTs
| tqi wrote:
| Interesting - the examples I've seen seem to have multiple
| layers of FROM > WHERE > AGGREGATE > WHERE > JOIN...[1],
| where the goal seems to be to remove the need to have nested
| CTEs?
|
| [1] https://news.ycombinator.com/item?id=43037330
| setr wrote:
| I mean, that pattern would be a single SELECT statement, just
| using the psychotic inconsistencies of normal SQL:
|
|     SELECT
|     FROM
|     WHERE
|     GROUP BY
|     HAVING
|     JOIN (
|         SELECT
|         FROM
|         GROUP BY
|     ) ON
|
| Only reason to use a CTE here would be that JOIN-SELECT is
| ugly, so it competes in that sense, but the usual usage of
| deduplication-queries-without-making-views isn't resolved
| by the new syntax. So you'd still be using CTEs in either
| syntax.
|
| If you imagine it in normal code, the pipe syntax is just a
| function call chain, and CTEs are variables. You can mix
| and match as needed for clarity. And you can define your
| CTE using the function-call chain
| tqi wrote:
| So for me, the example query in pipe syntax (up to the enrich
| call):
|
|     FROM orders
|     |> WHERE order_date >= '2024-01-01'
|     |> AGGREGATE SUM(order_amount) AS total_spent GROUP BY customer_id
|     |> WHERE total_spent > 1000
|     |> INNER JOIN customers USING(customer_id)
|     |> CALL ENRICH.APOLLO(EMAIL > customers.email)
|     |> AGGREGATE COUNT(*) high_value_customer GROUP BY company.country
|
| is easier to parse (for me) as something like:
|
|     WITH customer_order_totals AS (
|         SELECT customer_id, SUM(order_amount) AS total_spent
|         FROM orders
|         GROUP BY 1
|     )
|     ,enriched_customers AS (
|         SELECT cus.customer_id,
|                ENRICH.APOLLO(EMAIL > cus.email) enrichment
|         FROM customer cus
|     )
|     SELECT ec.country, COUNT(*) high_value_customer
|     FROM customer_order_totals cot
|     JOIN enriched_customers ec
|       ON ec.customer_id = cot.customer_id
|     WHERE cot.total_spent > 1000
|     GROUP BY ec.country
|
| largely because I can understand the components in
| isolation first, before bringing together. yes, it's more
| verbose, but I am of the opinion that the brevity comes
| at the expense of having to keep more information in my
| working memory.
| tdfirth wrote:
| It should have always worked this way. Without this feature you
| take the algebra out of relational algebra. That's the root of
| most of the composition issues in SQL.
|
| Sadly it's a few decades too late though, and sadly this just
| fragments the "ecosystem" further.
| ralmidani wrote:
| The first time I looked at Elixir, I thought it was hideous (I
| was really into Python and Ruby, and still admire them while
| liking Elixir much more). But I kept hearing about the virtues of
| Elixir (and of the BEAM) and coming back. Finally, it clicked,
| and I'm now in my 3rd year of doing Elixir full-time. I hope to
| never again use a language that doesn't have pipes.
|
| The moral of the story? Let's give this new SQL syntax a chance.
| fforflo wrote:
| I'm all for such an approach, but I also like Firebolt's take,
| with lambda functions focused on arrays. [0]
|
| [0] https://www.firebolt.io/blog/sql-thinking-in-lambdas
| ejcx wrote:
| We made pql.dev, which works with the different SQL syntaxes
| by translating Kusto-like queries to SQL (using CTEs). It's
| worked really well thus far, and I wish someone would make a
| standard pipelined query language that gets supported across
| most databases.
|
| I know prql exists, but the syntax is pretty cumbersome and not
| something I enjoyed writing, but I do understand why folks would
| gravitate towards it
| xiphias2 wrote:
| One great feature of the SQL pipe syntax is that it works much
| better together with programming languages (generating, or
| modifying existing SQL queries).
|
| It would be great to have it standardized fast and implemented
| everywhere.
| zendist wrote:
| This reminds me a bit of MSFTs Kusto language. Such an immensely
| useful way to slice and dice large amounts of structured data.
| dnst wrote:
| It seems unfortunate that InfluxDB had the pipe operator with
| Flux in version 2 and dropped it in the new version due to low
| adoption. Now the idea seems to be becoming more popular.
| code_runner wrote:
| I appreciate that somebody somewhere may appreciate and enjoy
| this, but I am not that person. I love SQL. I have always
| loved SQL. I know why others don't, but I do, and it's
| beautiful.
| eb0la wrote:
| First thing I thought is "this is a SELECT * in disguise",
| which is something that you SHOULD never do in BigQuery. If
| you can combine it with CTEs, it seems good, because it adds
| legibility for complex queries. Also looks easier for the
| planner. Not for the optimizer, though.
___________________________________________________________________
(page generated 2025-02-13 23:00 UTC)