hngopher.com

       [HN Gopher] Ask HN: Does anyone else think SQL needs help?
       ___________________________________________________________________
        
       Ask HN: Does anyone else think SQL needs help?
        
       I have been writing a decent amount of SQL for three years now, and
       while the paradigm is extremely powerful, the language lends itself
       to unmaintainable spaghetti code quickly.  Fundamentally, SQL lacks
       meaningful abstraction, and there's no sane way to package very
       common operations.  Say you want to find elements in a row that
       correspond to some maximum value when grouped by date. Today, you'd
       need to write something like this EVERY SINGLE TIME:  ``` SELECT
       sd1.sale_person_id, sd1.sale_person_name, sd1.no_products_sold,
       sd1.commission_percentage, sd1.sales_department FROM
       sales_department_details sd1 INNER JOIN (SELECT sale_person_id,
       MAX(no_products_sold) AS no_products_sold FROM
       sales_department_details GROUP BY sale_person_id) sd2 ON
       sd1.sale_person_id = sd2.sale_person_id AND sd1.no_products_sold =
       sd2.no_products_sold; ```  Wouldn't something like this be nicer?
       ``` SELECT sale_person_id, max(no_products_sold) as max_sold,
       link(sale_person_name, max_sold), FROM sales_department_details ```
       Frankly, it seems like some sort of macro system is needed. Perhaps
       the SQL compiles into the above?
        
       Author : fny
       Score  : 44 points
       Date   : 2022-09-11 14:31 UTC (8 hours ago)
        
       | spullara wrote:
       | I translated this into "I don't really know SQL but I think I
       | do".
        
       | croes wrote:
       | Wouldn't
       | 
       |  _SELECT * FROM
       | 
       | (SELECT DENSE_RANK() OVER ( PARTITION BY sd1.sale_person_id ORDER
       | BY sd1.no_products_sold DESC) r ,sd1.sale_person_id
       | ,sd1.sale_person_name ,sd1.no_products_sold
       | ,sd1.commission_percentage ,sd1.sales_department
       | 
       | FROM sales_department_details sd1) sd
       | 
       | Where sd.r = 1_
       | 
       | do the same without the inner join?
        
         | rjh29 wrote:
         | Yes, you actually know SQL, unlike other people who don't even
         | know window functions exist.
        
       | sph wrote:
       | The problem with SQL is that it's all about transforming data,
       | and that would better be modelled as a pipeline.
       | 
       | https://prql-lang.org/
        
       | CodesInChaos wrote:
       | You could take a look at EdgeDB. It's built on postgres, but uses
       | its own query language and data model. Its data model has build
       | in support for links and polymorphism. The query language EdgeQL
       | makes it easy to follow links and output embedded arrays.
        
         | sequence7 wrote:
         | Agreed, EdgeDB[0] looks to be trying to solve a lot of SQL's
         | perceived flaws with EdgeQL[1]. It's not the first attempt and
         | I suspect it won't be the last but composability is explicitly
         | one of the challenges it addresses.
         | 
         | [0] https://www.edgedb.com/ [1]
         | https://www.edgedb.com/showcase/edgeql
        
       | carabiner wrote:
       | There needs to be a Pandas to SQL translator.
        
       | TekMol wrote:
       | Put your example on https://www.db-fiddle.com/ or
       | http://sqlfiddle.com/ or some other sql fiddle then we can all
       | experiment with it and come up with leaner solutions.
       | 
       | Just asking for mooar build in stuff just leads to bloated
       | software.
        
       | SnowHill9902 wrote:
       | That's because you are 1) writing naive SQL and 2) forcing naive
       | SQL on an arbitrary data structure. You may use VIEWs, CTEs,
       | GROUP BY CUBE/ROLLUP for 1) and rethink your structures for 2).
        
       | alrlroipsp wrote:
       | > Frankly, it seems like some sort of macro system is needed.
       | 
       | Check out stored procedures and views.
       | 
       | Here's a great summary: https://stackoverflow.com/a/5195020
        
       | michael_j_x wrote:
       | Feel you. I've long moved to jupyter notebooks as my de-facto
       | database tool. I use python functions that generate the SQL,
       | which I then execute in jupyter and load the results into pandas.
        
         | mechanical_bear wrote:
         | I'm raising my eyebrow here as far I can. Care to share what
         | sort of use case this is for? Seems...elaborate.
        
       | bawolff wrote:
       | It sounds like you just reinvented VIEWs?
        
         | e12e wrote:
         | VIEWs and generated columns should be able to help with these
         | problems?
         | 
         | https://www.postgresql.org/docs/14/ddl-generated-columns.htm...
         | 
         | > A generated column is a special column that is always
         | computed from other columns. Thus, it is for columns what a
         | view is for tables.
        
           | RedShift1 wrote:
           | Unfortunately with Postgres, generated columns are always
           | stored on disk, so you lose some storage space.
        
             | bawolff wrote:
             | Finally, something for mariadb fans to hold on to in the
             | flame war.
        
               | srcreigh wrote:
               | My reason for liking MariaDB is that MySQL uses a B-Tree
               | for primary storage. In Postgres if you want a table with
               | large keys, you have to store them twice once in the
               | index once in primary storage (hash table). In MySQL
               | there's no hash table you basically just store the rows
               | in an index. Could be as much as 40% reduction in
               | storage... 400GB instead of 1TB.
               | 
               | Postgres does have index compression as of a recent
               | version, but I'm not sure how much that would improve
               | random keys.
        
               | RedShift1 wrote:
               | Haha I guess :-D. Though I have a workaround for
               | complicated generated columns that I don't wish to store:
               | using a function. So my select would look like:
               | SELECT a, b, something_complicated(table) FROM table.
               | 
               | Of course you have to remember to do that but hey it's
               | something.
        
       | PaulHoule wrote:
       | In the semantic web, RDF can put every table in the universe in a
       | unique namespace and OWL can bind together multiple tables and do
       | inference at a high level, then you can query it with SPARQL
       | which is quite similar to SQL at heart.
       | 
       | The Department of Defense was asked by Congress to answer "Where
       | does the money go?" and tried to use OWL and SPARQL queries
       | across all the relational databases they owned but they couldn't
       | get it to work.
       | 
       | I can't help but think something along those lines could present
       | as a 'low code' data tool.
       | 
       | You can accomplish something similar with views, foreign tables,
       | stored procedures, user defined functions, and other mechanisms
       | which are common in databases like PostgreSQL, Microsoft SQL,
       | etc.
       | 
       | I find PostgreSQL pretty handy because it supports embedded data
       | structures such as lists in columns, supports queries over JSON
       | documents in columns, etc.
       | 
       | My favorite DB for side projects is ArangoDB which uses
       | collections of JSON objects like tables and the obvious algebra
       | over them which lets you query and update like the best
       | relational dbs but you don't have the programmability of views,
       | stored procedures, etc.
        
         | shubb wrote:
         | I read this like "DoD tried to do quite a complicated semantic
         | layer over tables project but failed because the RDF cluster of
         | technologies is horrible. It would have been easier to code
         | from scratch with SQL and python".
         | 
         | I think you are pretty familiar with those technologies. Do you
         | think they have any value?
        
       | mamcx wrote:
       | Check my project:
       | 
       | https://tablam.org
       | 
       | ---
       | 
       | Exist 2 major ways to solve this: You do a transpiler (like most
       | ORM are, actually) or you go deeper and fix from the base.
       | 
       | The second could yield much better results. It will sound weird
       | at first, but not exist anything that block the idea of "a SQL"
       | language work for make the full app, with UI and all that.
       | 
       | SQL as-is is just too limited, in some unfortunate ways even for
       | the use-case of have a restricted language usable for ad-hoc
       | queries, but that is purely incidental.
       | 
       | The relational model is much more expressive than array,
       | functional models (because well, it can supersede/integrate both)
       | and with some extra adjustment you can get something like
       | python/ML that could become super-productive.
        
       | RedShift1 wrote:
       | If you make the joining keys have the same name, you can use
       | USING(col_foo) instead of table1.col_foo = table2.col_foo. It's
       | one of the reasons I always use tablename_id as primary key name
       | and foreign key name. Doesn't always work (for example 2 foreign
       | keys linking to the same table but represent something different
       | like created_by and modified_by).
        
       | cultofmetatron wrote:
       | Elixir's Ecto library basically lets you write sql in elixir
       | while abstracting away all the stuff op talking about.
       | 
       | you can simply assign a subquery to a variable and reuse it.
       | 
       | PS: a lot of you are saying use views. thats dangerous. In most
       | cases, thats a nice way to end up with ballooning autovacume
       | processes as your databasse gets bigger. Materialized views are
       | fine as long as your use case doesn't need realtime accuracy.
        
       | pgt wrote:
       | Learn Datalog Today: http://www.learndatalogtoday.org/
        
       | forinti wrote:
       | Frankly, the biggest problem with SQL is that not enough people
       | know it well (and a lot of people think they know it well).
        
         | Rodeoclash wrote:
         | Exactly. Even knowing a small bit of SQL (i.e. enough to know
         | the full scope of what SQL can give you so you grok the limits
         | of your ability) is a super power compared to using ORMs.
         | 
         | The right views, applied in the right places are like a super
         | power compared to trying to do the same thing in application
         | code.
        
         | jppope wrote:
         | Roughly would you consider your statement equal to "SQL is
         | designed for experts?" just curious
        
           | forinti wrote:
           | Not at all. I think it looks deceptively simple so
           | programmers don't waste too much time learning it properly.
           | But you can take up a good book (such as Joe Celko's) and set
           | up a good foundation in a short time.
        
             | aae42 wrote:
             | he seems to have done a whole series, which would you
             | recommend?
        
               | forinti wrote:
               | "SQL for Smarties" is a good book to learn the language
               | and "Puzzles and Answers" has a good collection of
               | problems/solutions for typical situations you might find.
        
         | randomdata wrote:
         | SQL is one of, if not the most, common programming language out
         | there. I agree it isn't known well, but it not lending itself
         | to being known well, and tricking people into thinking they
         | know it well, is a flaw of the language. This is a much bigger
         | problem. A good language would not lead people down that path.
         | It not being known well certainly is not for lack of exposure.
         | 
         | They used to say the problem with C (especially with respect to
         | memory issues) was that people didn't know it well, which was
         | also true to some extent, but then we realized that we could
         | improve language design to not leave people in that situation
         | in the first place. Something like Rust would have never
         | materialized if people simply took the attitude that you just
         | need to hunker down and know C harder.
         | 
         | SQL is ripe for a similar evolution, but such evolution is
         | difficult when the only reason we settled on SQL in the first
         | place was to be compatible with Oracle and Oracle has no reason
         | to change as long as the money keeps flowing in. The
         | competition was going down some other interesting paths for a
         | while. Even Postgres originally used QUEL, which is much closer
         | to Codd's original vision, instead of SQL but eventually had to
         | accept that Oracle won. But does Oracle need to win forever?
        
       | tacosbane wrote:
       | I think every dialect has an idiomatic way to do what you're
       | asking. e.g., in Snowflake it's `select * from
       | sales_department_details qualify no_products_sold =
       | max(no_products_sold) over (partition by sales_person_id)`, PSQL
       | `select distinct on (sales_person_id) * from
       | sales_department_details order by sales_person_id,
       | no_products_sold desc`, ...
        
         | hantusk wrote:
         | Postgres can do the"function over (partition by col_a order by
         | col_b) as well.
         | 
         | Agree, SQL can already do what the OP proposes in almost the
         | exact same syntax.
         | 
         | and together with CTEs, intermediate temp tables or views, you
         | can totally avoid the spaghetti and break down the queries in
         | manageable chunks.
        
           | tacosbane wrote:
           | postgres has window functions but it does not have the
           | `qualify` clause.
        
       | sshine wrote:
       | Pine [1] and jq [2] are combinator query languages.
       | 
       | Every expression represents a filter of some kind.
       | 
       | In SQL (and in relational algebra), every expression represents a
       | dataset.
       | 
       | That means composing SQL expressions is composing data, not
       | composing operations on data.
       | 
       | Since there is a somewhat low, backwards-compatible barrier to
       | entry to build a combinator language on top of SQL (Pine is an
       | example), whether or not trading the increased learning curve of
       | using combinators for better composability is a good idea can
       | stand the test of time.
       | 
       | [1]: https://github.com/pine-lang/pine [2]:
       | https://stedolan.github.io/jq/
        
       | AtlasBarfed wrote:
       | 1) templating constructs or env vars? The programming languages
       | can provide that too, but... prepared statements and variable-
       | based substitution was a huge improvement to SQL code from
       | programming languages (at least in java and ruby). Why wasn't
       | that part of the SQL standard, why did it have to evolve
       | organically? If it was in the specification, so many SQL
       | injection vulns would have been avoided.
       | 
       | 2) SQL statement syntax should start with the table specification
       | so you can get autocompletion help when doing the columns and
       | other expressions. The INSERT statement has it right, while
       | SELECTs suck because the columns are specified before the WHERE
       | clause, so there is no context. At least provide alternate
       | syntaxes like SELECT FROM <table> COLUMNS <cols> WHERE
       | <whereclause>, it would be minimal for the DB vendors to support.
       | 
       | 3) SQL standards have been too nice to vendors, especially
       | Oracle. The lack of portable SQL between databases has hamstrung
       | the industry.
       | 
       | 4) Stored procedure standards should have been formalized, as
       | should standard apis. JDBC/ODBC were ok, but why are programming
       | languages doing major enhancements/de facto standards setting?
        
       | PaulHoule wrote:
       | The one that drives me nuts is                  select
       | this,count(*) from that group by this
       | 
       | which is the most common data exploration query of them all which
       | makes you type 'this' twice.
        
         | zasdffaa wrote:
         | What would you prefer to see instead, bearing in mind that
         | anything you will suggest is likely to bsave minimal typing but
         | introduce new complexity into sql which everyone agrees is
         | already creaking like a constipated elephant.
        
           | PaulHoule wrote:
           | something like a function or macro                  CREATE
           | MACRO count_values(this, that) WITH DEFINITION
           | SELECT this,COUNT(*) FROM that GROUP BY this
        
         | thomasdziedzic wrote:
         | select this, count(*) from that group by 1
        
           | dominotw wrote:
           | I actually find this less readable. i have to go back and
           | scan the select to see what column this index corresponds to.
           | Totally breaks the flow of reading.
        
           | eatonphil wrote:
           | Tested this out on Postgres and it works:
           | SELECT         name, count(1)       FROM          (VALUES
           | ('Kev',12), ('Meg', 14), ('Kev', 14))        people(name,
           | age)       GROUP BY         1
           | 
           | Produces: [[Kev, 2], [Meg, 1]].
           | 
           | But I don't get it. Could someone explain how this works?
        
             | lgsilver wrote:
             | Most SQL dialects accept a positional or a direct reference
             | to the column name. "1" represents the first column created
             | by the select statement, "name".
        
               | eatonphil wrote:
               | Ah I see. Is that how `count(1)` also works? Or is that
               | different.
        
               | xapata wrote:
               | Different. SQL's syntax is funky.
        
           | zasdffaa wrote:
           | Please never allow this into production. I believe it is
           | deprecated, it certainly is fragile in general.
        
       | hotdamnson wrote:
       | This is a very poorly written SQL.. Try some modern syntax:
       | select DISTINCT sale_person_id, sale_person_name,
       | max(no_products_sold) over (partition by sale_person_id) AS
       | max_products_sold, commission_percentage, sales_department from
       | no_products_sold;
        
       | benjiweber wrote:
       | create view max_products_sold as select sale_person_id,
       | max(no_products_sold) AS no_products_sold from
       | sales_department_details group by sale_person_id;
       | 
       | select sale_person_id, sale_person_name, no_products_sold,
       | commission_percentage, sales_department from
       | sales_department_details natural join max_products_sold;
        
       | trollied wrote:
       | >Frankly, it seems like some sort of macro system is needed
       | 
       | Views, materialized views, CTEs.
        
         | fny wrote:
         | No, these are all highly coupled to the underlying data
         | structure. Even stored procedures tend to know a bit too much.
         | 
         | Say I have a table with the following properties:
         | 
         | - It has a column that corresponds to an id
         | 
         | - It has a column that corresponds to a category
         | 
         | - It has a column that corresponds to a value
         | 
         | - It has a column that corresponds to a timestamp
         | 
         | - It has a bunch of other columns
         | 
         | I want to have a general transformation that allows me to:
         | 
         | Select an aggregate min, max over a period of time for a given
         | category along with the other columns linked to min/max.
         | 
         | The stored procedure ends up being a mashup of string
         | concatenation. Meanwhile, it just feels something is missing to
         | make the language expressive enough.
        
           | zasdffaa wrote:
           | > No, these are all highly coupled to the underlying data
           | structure
           | 
           | Difficult to be sure what you mean, but I think you're wrong.
           | You can select from any table source (edit: result set, if
           | you prefer that term).
           | 
           | > Select an aggregate min, max over a period of time for a
           | given category along with the other columns linked to
           | min/max.                  select min(category) as mincat,
           | max(category) as maxcat,           min(value) as minval, --
           | etc        where ts between <start> and <end>
           | 
           | I am not sure what you're asking for if the above is not it.
           | Can you give an example please?
        
           | dmux wrote:
           | >Select an aggregate min, max over a period of time for a
           | given category along with the other columns linked to
           | min/max.
           | 
           | The min/max aggregations over a period of time per category
           | is pretty straight forward (aggregating rows), but what do
           | you mean by "along with the other columns linked to min/max"?
           | How would other columns come into play after aggregating
           | rows?
        
       | PaulHoule wrote:
       | I am using JooQ now at work to write SQL statements in a Java
       | DSL. Java's type system mostly works as intended and is helpful,
       | particularly in conjunction with an IDE. Many kinds of
       | metaprogramming are possible with JooQ.
        
       | throwyawayyyy wrote:
       | The story of the past 10 years at my FAANG has been one of making
       | our key-value databases look as much like relational databases as
       | possible, complete with SQL query engines (at least two of them,
       | because what would a FAANG be without multiple slightly
       | incompatible ways of doing things?). Honestly it's been a huge
       | boon. Should we have abandoned SQL entirely, and come up with a
       | new language? Maybe. But given that the primary initial
       | motivation for this work was to allow analysts to creation
       | millions of reports without bothering eng, it had to be a
       | language that analysts understand. That's SQL.
        
       | carlineng wrote:
       | I've written about this topic a fair amount; first examining some
       | of the criticisms of SQL that have been present since the
       | language's inception [1], then looking at a new project called
       | Malloy that I'm quite excited about, and think has a lot of
       | potential to address some of the problems of SQL as it relates to
       | data analysis [2].
       | 
       | [1]: https://carlineng.com/?postid=sql-critique#blog [2]:
       | https://carlineng.com/?postid=malloy-intro#blog
        
       | rawgabbit wrote:
       | What you described is already solved by window functions.
        
       | x-shadowban wrote:
       | Really would like "select * except ..." and a non-dynamic-sql way
       | to remap column names
        
         | spullara wrote:
         | Except in ad hoc queries you shouldn't be using * at all so it
         | isn't really an interesting problem to solve.
        
       | anon84873628 wrote:
       | You sound like the exact target for Malloy, which was posted to
       | HN recently:
       | 
       | https://github.com/looker-open-source/malloy
        
       | thangalin wrote:
       | From https://bitbucket.org/djarvis/rxm/src/master/ :
       | root               > people,           # "root" keyword starts
       | the document         person             > person,           #
       | maps table context to a node         .age               > @age,
       | # @ maps a column to an attribute node          .first_name
       | > name/first,       # maps a column to a node         .last_name
       | > .../last,         # ... reuses the previous node's path
       | account.person_id +> person.person_id, # +> performs an INNER
       | JOIN         account            > account,          # context is
       | now "account" node         .id                > @id,
       | # account id attribute         ^,
       | # pop stack to previous table context         address
       | > address,          # switch context to "address" node
       | .*,                                    # glob remaining columns
       | ;                                      # Denotes optional WHERE
       | clause
       | 
       | The query produces:                   <people>           <person
       | age="42">             <name>               <first>Charles</first>
       | <last>Goldfarb</last>             </name>             <account
       | id="123"/>             <address>               <id>456</id>
       | <street>123 Query Lane</street>               <city>San
       | Francisco</city>             </address>           </person>
       | </people>
       | 
       | It wouldn't take much to make that a JSON document.
        
       | george_ciobanu wrote:
       | Check out https://human.software, a visual language replacement
       | (no code)
        
       | greggyb wrote:
       | There are a number of query languages to address this type of
       | reuse and composability for analytical query workloads.
       | 
       | - MDX: created by Microsoft to provide a dimensional query
       | language. The language is incredibly powerful, but depends on
       | your understanding of dimensional modeling (go read Kimball is
       | the best starting point for learning MDX). There are several
       | tools, both commercial and open source, which implement an MDX
       | interface to data.
       | 
       | - DAX: Microsoft's attempt to make MDX more approachable. A
       | relational, functional language built on top of an in-memory
       | columnstore relational storage engine, and used in several
       | Microsoft products (Power BI, Analysis Services Tabular mode).
       | 
       | - Qlik has its own expression language whose name I am not sure
       | of.
       | 
       | - Looker has LookML
       | 
       | There are a lot of BI tools out there. Not all have an expression
       | language to support reusable measures/calculations, but many do.
       | You may want to look into one.
        
         | fny wrote:
         | Thanks! MDX and DAX look powerful but also unreasonably
         | complicated. LookML seems even more constrained. I'll keep
         | digging.
        
           | greggyb wrote:
           | Such languages offload more work onto the data model than SQL
           | does.[0] For example, in the Tabular model (where you would
           | use DAX), relationships are defined in the model (and
           | supported by an in-memory data structure that you can
           | shortcut-think of as a pre-computed hash-join) and implicit
           | in (most) expressions you write.
           | 
           | If you think of SQL as a general-purpose query language,
           | these BI-centric languages are DSLs for analytical queries.
           | They are most useful for workloads where you have many
           | queries that all use the same data model and are reasonably
           | served by a pre-defined set of calculations/metrics.
           | 
           | [0] Do not mistake me, good data modeling is important for
           | SQL as well.
        
       | SQL2219 wrote:
       | With dynamic sql you can write your own. I know you don't have
       | time, just throwing it out there.
        
       | dasil003 wrote:
       | Agreed that SQL lends itself to spaghetti. However after working
       | with it for 25 years both as a developer but also collaborating
       | with analysts and data scientists, I have to say I appreciate its
       | basic declarative nature and remarkable portability as it has
       | expanded into distributed databases.
       | 
       | For me the value of being able to walk up to an unfamiliar
       | database of widely varying tech and start getting value out of it
       | immediately is the killer app. Macros or other DB-specific
       | extensions would be useful at times, but I'm not sure how much
       | they would enable solving the messiness problem in a generic way
       | that reshapes the paradigm more broadly. My instinct is the
       | messiness is a consequence of the direct utility and trying to
       | make it nicer from a programmerly aesthetics point of view might
       | be hard to sell. It's not like SQL doesn't have affordances for
       | aesthetics already (eg. WITH clauses).
        
         | systemvoltage wrote:
         | I think if your SQL becomes spaghetti, it is time to fetch the
         | data out of the database and then process it. SQL databases
         | aren't a data-science studio that lends itself for extreme
         | processing capabilities just because it is fast. Furthermore,
         | it is the last thing that scales typically. Relational
         | processing is what its good at. Doing things like
         | bayesian/statistical analysis on the data in SQL is asking for
         | trouble.
        
         | fny wrote:
         | SQL is masquerades as declarative. It's relational algebra in
         | disguise. This results in...
         | 
         | - Queries that are difficult to interpret/verify.
         | 
         | - Non trivial amount of effort to go from question to query: I
         | often feel like I have to jump through hoops to obtain
         | something trivial. I have to think, and I don't want to think.
        
           | dasil003 wrote:
           | I agree that SQL is needlessly baroque. Not sure whether I
           | agree that there are too many hoops for trivial things
           | though. Leaving SQL ergonomics aside for the moment, isn't
           | relational algebra something which _does_ require one to
           | think carefully to avoid hastily producing plausible but
           | incorrect queries?
        
       | pdntspa wrote:
       | I can't believe that at time of this writing, only a few other
       | comments are mentioning VIEWs! Like that is this dude's exact
       | complaint!
       | 
       | smh...
        
         | kiernanmcgowan wrote:
         | Views, especially Materialized Views if you're on Postgres, are
         | about as close as you can come to a "one neat trick that solves
         | all your problems" in modern software development imo.
        
           | remus wrote:
           | I'd love to use materialised views more often but my
           | experience is that they're full of annoying limitations,
           | typically around functions that feel like they should be
           | immutable but aren't.
        
         | viraptor wrote:
         | There are situations where you can't use the views. For example
         | I'm writing software which integrates with $vendor app. Not
         | only do I have restricted select access, I have no influence on
         | the schema at all. By definition, I have to send a complete
         | query every time. And there are many similar scenarios.
        
           | spullara wrote:
           | Not SQL's fault. They probably wouldn't let you use macros
           | either.
        
         | EMM_386 wrote:
         | They did mention they've only been using it 3 years.
         | 
         | It's been an ANSI standard for 36 years.
        
         | default-kramer wrote:
         | Let's say you have 14 derived facts (eg aggregations,
         | computations involving joined tables, etc) about a SalesPerson.
         | Do create 14 single-column views? Or have one view with 14
         | columns? Either way the composability is terrible. If you go
         | with 14 views, the syntax is awful and not very resistant to
         | refactoring. If you go with one view, you will eventually end
         | up paying a performance cost for computations you actually
         | aren't using in certain places (barring incredible and maybe
         | impossible advancements in query optimization).
        
           | pdntspa wrote:
           | You can SELECT just the columns you want with one view, and
           | IIRC that will exclude other columns from being retrieved
           | unless they are the source of a derived value.
        
           | zasdffaa wrote:
           | As others have said, any decent optimiser will prune out
           | values that aren't SELECT'ed. I don't think it's that complex
           | either - find the unused columns, remove them from subviews,
           | recurse all the way down. Try it on MSSQL/Postgres and you
           | can see it being done.
        
           | wswope wrote:
           | > If you go with one view, you will eventually end up paying
           | a performance cost for computations you actually aren't using
           | in certain places (barring incredible and maybe impossible
           | advancements in query optimization).
           | 
           | This is outdated; most modern DBs treat views as they would
           | CTEs. If you don't reference a defined field downstream, it
           | doesn't need to get calculated.
        
             | default-kramer wrote:
             | It's more about the fact that a join can change the shape
             | of the result set. Even if the columns being projected
             | aren't surfaced, the DB still has to process the joins to
             | make sure the result set has the correct shape. For
             | example, a human might know that a join will always be
             | one:one or one:zero-or-one, but the DB has no choice but to
             | make sure. Perhaps using subqueries instead of joins would
             | work, but that gets ugly too.
             | 
             | (Or maybe my knowledge is outdated and the optimizers have
             | gotten way better than they were 3-4 years ago.)
        
           | fny wrote:
           | The bigger problem is views are tightly coupled to the
           | underlying data.
        
             | saltcured wrote:
             | Right, there's a big difference between views and the kind
             | of generic abstraction that I think the original poster
             | seeks. You do need something like macros or generated SQL
             | to build a library of algorithms you can apply to data.
             | 
             | Imagine wanting something as simple as
             | "find_percentile_value(table, column, percentile)". There
             | is no portable and standard way to write this parametric
             | abstraction and then reuse it on any column source you need
             | to interrogate later.
        
         | irrational wrote:
         | Oh good, it's not just me. As soon as I read his post I said to
         | myself, "It sounds like he hasn't heard of views." But the
         | first comment didn't mention views and I was so confused that I
         | started wondering if I misunderstood the post.
        
       | simonw wrote:
       | I find CTEs (the WITH statement) greatly improved my ability to
       | run more complex queries because they offer an abstraction I can
       | use to name and compose queries together.
       | 
       | SQL views are useful for that kind of thing too.
        
         | warmwaffles wrote:
         | The only issue I see with CTE's and have encountered, is poor
         | query performance. The query planner needs to improve in this
         | regard. At one point I noticed some of my CTE with queries
         | being written to a temp file on disk and used to join against,
         | when an inlined query did not do this producing the same
         | results.
        
           | srcreigh wrote:
           | In Postgres you can use AS NIT MATERIALIZED to avoid the temp
           | buffers. A recent PG version also makes this default, but
           | only when the CTE is only queried once later.
           | 
           | This can definitely cause issues. I've seen queries with CTEs
           | that basically process unindexed 100 MB data due to querying
           | the materialized rows.
           | 
           | It would be cool to be able generate indexes with the CTEs
           | though
        
         | mixmastamyk wrote:
         | How to use a CTE with a group of several following statements?
         | I think this is where you would use a view instead, but would
         | appreciate confirmation.
        
       | scubbo wrote:
       | Absolutely.
       | 
       | https://www.scattered-thoughts.net/writing/against-sql (found
       | from HN a week or so ago)
        
       | db48x wrote:
       | People build composable and reusable abstractions on top of sql
       | all the time.
        
         | quantified wrote:
         | Examples would help. It's an assertion people frequently make.
         | I have struggled with homegrown not-particularly-good ones. One
         | problem is finding a good set of abstractions for the things I
         | really need to abstract on. So there are probably different
         | abstractions to note.
         | 
         | In the same vein that different languages provide higher-level
         | abstractions for machine code or JVM bytecode.
        
           | db48x wrote:
           | There is a database library called Diesel that I use which
           | has good support for composition. The easiest example to
           | point to is on their own webpage:
           | https://diesel.rs/guides/composing-applications.html
        
           | PaulHoule wrote:
           | Stored procedures, user-defined functions, views...
        
           | sequence7 wrote:
           | EdgeDB & EdgeQL would be one good example.
           | 
           | https://www.edgedb.com/
           | https://www.edgedb.com/showcase/edgeql
        
           | simonw wrote:
           | The Django ORM is one of my favourite abstractions on top of
           | SQL.
        
             | rglullis wrote:
             | Nice try, Django core developer... :p
        
               | db48x wrote:
               | Now now, we all have some biases.
        
       | richbhanover wrote:
       | What do you think of PRQL ("prequel")?
       | 
       | https://github.com/prql/prql
        
         | vassilevsky wrote:
         | Cool name ;)
        
         | euroderf wrote:
         | The idea that you can visualize your pipeline and then "put it
         | to paper" is pretty nice.
        
       | kristov wrote:
       | I think it would be neat if columns of a resultset could contain
       | other resultsets (relvars?) as values. This would be the natural
       | outcome of a join operation, and give greater flexibility on how
       | you want to reduce those sub-relvars down. It also makes
       | recursive queries more natural, yielding a "tree" of results. You
       | could even build a reducer for a recursive query that
       | concatenated rows to a string to serialize into whatever: json,
       | XML, etc.
        
         | deergomoo wrote:
         | This is essentially how EdgeDB's "EdgeQL" language [0] works.
         | Pretty much everything it deals with is a set. It maps much
         | nicer onto many types of application code.
         | 
         | [0] https://www.edgedb.com/
        
       | SakeOfBrevity wrote:
       | Check out dbt (not dbt-cloud), great open-source SQL tool, macro
       | system is included in the ever-growing set of features. It helps
       | managing SQLs a lot.
        
       | default-kramer wrote:
       | Yes, I can't believe that the awesome power of relational DBs is
       | still hindered by the major flaws of SQL. Here is my attempt at
       | something better: https://docs.racket-
       | lang.org/plisqin/Read_Me_First.html The feature I am most pleased
       | with is that joins are values (or expressions, if you prefer)
       | which can be then refactored using normal techniques.
        
       ___________________________________________________________________
       (page generated 2022-09-11 23:01 UTC)