[HN Gopher] SQL is syntactic sugar for relational algebra
       ___________________________________________________________________
        
       SQL is syntactic sugar for relational algebra
        
       Author : dmarto
       Score  : 193 points
       Date   : 2024-03-19 07:54 UTC (4 days ago)
        
 (HTM) web link (www.scattered-thoughts.net)
 (TXT) w3m dump (www.scattered-thoughts.net)
        
       | roenxi wrote:
       | I'm glad that the article concluded "No" to it's own headline.
       | Calling SQL "syntactic sugar" is an insult to sugar. The "helpful
       | diagram explaining how the scoping rules work" alone should make
       | people blanch. The language is a syntactic disaster that we've
       | been saddled with out of habit and inertia.
        
         | solumunus wrote:
         | I agree that SQL syntax is terrible, but for some potentially
         | masochistic reason I still like writing it. It could certainly
         | be done better though and I don't understand how there hasn't
         | been more innovation in this space.
        
           | IshKebab wrote:
           | There's PRQL which is really nice. But it has pretty big
           | downsides: it only supports querying data, not modifying it;
           | and since it's not database-specific it doesn't support all
           | the features of whatever database you're using. E.g. it can't
           | access complex data types like structs/arrays in DuckDB.
        
           | refset wrote:
           | > I don't understand how there hasn't been more innovation in
           | this space
           | 
           | I think it's simply that most businesses and investors don't
           | register SQL as having any real problems, and especially now
           | with a resurgent interest in SQL the idea of attempting
           | anything novel feels too risky.
           | 
           | Shameless plug of one recent attempt to offer something
           | different: XTQL https://docs.xtdb.com/intro/what-is-xtql.html
        
             | account-5 wrote:
             | Having been forced to write queries as JSON in elastic I
             | definitely prefer writing SQL. I would actively avoid
             | utilizing any query language where I have to count
             | brackets.
        
               | refset wrote:
               | Handcrafting JSON is undoubtedly always a pain, but the
               | idea with XTQL is rather that it can be easily generated
               | from any regular programming language.
               | 
               | > I would actively avoid utilizing any query language
               | where I have to count brackets
               | 
               | That's really an editor/tooling problem, solvable in many
               | ways, but I guess a Python-like/Parinfer approach would
               | be your preference? (where whitespace/indentation is
               | significant)
        
               | account-5 wrote:
               | True enough, tooling is key. I'm not a developer but I
               | think SQL is a decent language that's easy to pick up.
               | But I also like xpath, regex, and jq syntax too, so maybe
               | I'm the weird one.
        
           | bradleyjg wrote:
           | PG's blub essay isn't especially charitable. There's a good
           | reason to want to stick with what you know and are
           | comfortable with---climbing endless learning curves means not
           | getting things done and it isn't simple to know ex ante what
           | is worth the time
        
             | refset wrote:
             | Which is fine when everything is handcrafted and human-
             | scale, but as soon as you start going down the path of
             | machine generated SQL (both the generation and analysis
             | thereof) the tax is non-trivial.
        
           | masklinn wrote:
           | It's not bad enough that it's worth actually trying to
           | overturn, and the inertia from everything speaking SQL means
           | there's a lot less pushback if you also do that.
           | 
           | SQL actually killed alternatives back in the 80s (and 90s,
           | Postgres used to use a dialect of Ingres's QUEL, the "SQL"
           | bit of PostgreSQL specifically denotes the switch over from
           | QUEL to SQL).
        
         | bugbuddy wrote:
         | Call a masochist but I love me good old SQL. I find most
         | problems with systems I have worked with are the result of
         | managements putting people who never paid attention in database
         | classes in charge. Or maybe nobody really cares about doing a
         | good job. In any case, my professors would have given all the
         | legacy SQL databases that I have come across F-.
         | 
         | The worse part is nobody dares changing anything because it
         | would require application code changes. Don't blame SQL. Blame
         | incompetent people doing SQL badly.
        
           | b33j0r wrote:
           | I didn't "get" SQL for the first 6-8 years of professional
           | full-stack engineering. I was chasing, along with many in my
           | cohort, the ORM abstraction dragon we thought we needed to
           | work on j2ee, django, or sqlalchemy teams quickly.
           | 
           | It's perplexing to me now why SQL was perplexing. Even
           | knowing about relational theory and enough DBA to pass
           | interviews, it somehow always seemed like we should have been
           | able to treat databases like our OOP's.
           | 
           | To think of the hours I wasted. Maybe I heard in a meeting
           | "it's the join table" and another colleague said "just use
           | the association table," and I was trying the n^2 debugging
           | approach to solve it with model classes.
           | 
           | The best advice I got on this made sense later: "I wouldn't
           | even try to manage that [dataset] myself. Just figure out how
           | to tell a database to do it--that's its only job."
        
             | bitwize wrote:
             | Understanding relational algebra helps you write better SQL
             | the way understanding lambda calculus helps you write
             | better JavaScript. The language is warty and but a shadow
             | of the pristine purity underneath, but it gets the job done
             | and the theory guides your reasoning on how to use the
             | language.
        
               | lkuty wrote:
               | In particular, I find the closure property of SQL very
               | helpful and always think in terms of queries gettings
               | tables and producing tables (derived most of the time).
               | https://en.wikipedia.org/wiki/Closure_(mathematics)
        
             | AdrianB1 wrote:
             | I really "got" SQL about 15 years after I started using it.
             | The explanation is very simple: until that moment, SQL was
             | a minor component for me in a big pie. Then one day I got
             | into the position to troubleshoot some apps with >1 TB
             | database behind them and I figured that SQL is more than
             | simple CRUD operations. For most full stack developers that
             | I know, SQL is still that minor component in the back of
             | their mind and they are usually right.
        
           | eek2121 wrote:
           | Agreed, though disagree about the database class thing. I got
           | a C in my class. I was a DBA at the time and definitely knew
           | my shit. That was kind of the issue. He was teaching by the
           | book with no regard to the database engine used. I was using
           | SQL Server and MySQL daily.
           | 
           | There was a fundamental disconnect. I approached a class
           | project differently not only because I was a DBA, but I am
           | neurodivergent. Because my design didn't align with his
           | expectations, I got a C.
           | 
           | I had a 4.0 GPA prior to that, but that made me realize I
           | wasn't a fit for college.
           | 
           | Good thing as well because I am debt free today.
        
             | AdrianB1 wrote:
             | I failed COBOL in college in a similar situation: I was
             | professionally writing COBOL, my project was too divergent
             | from what the teacher expected. I took that exam 3 times to
             | barely pass. Good times, good times.
        
               | doubloon wrote:
               | i took compilers and prof wanted us to write lexer code
               | by hand for a specific language. instead i wrote a
               | generic lexer generator and fed in the language
               | definition to generate the lexer code, got a bad grade
               | for not following instructions. it seems kind of funny
               | now but at the time i was rather cross.
        
             | 20after4 wrote:
             | I also got a C in my cs relational databases class. I also
             | got more out of that class than just about any other in CS.
             | I'm still kind of mad about it though. We had a group
             | project assignment and our group had far-and-away the best
             | project, of which I did 90% of the work. My bad grade came
             | down to just one thing: the class required us to use Oracle
             | and I refused on moral grounds.
        
           | elcritch wrote:
           | In my experience few CS students seem to take a relational
           | databases course. It was optional in my program.
           | Unfortunately, that means many of those databases were
           | created by folks with no formal grounding in databases. Some
           | folks pick it up on their own, but likely not as thoroughly
           | as those who've taken a good course on the topic.
        
           | default-kramer wrote:
           | Yes, people who don't understand SQL databases can easily
           | create "F-" situations. The problem is that people who are
           | great at SQL databases can't do any better than a C+ because
           | that's as good as SQL gets.
        
           | randomdata wrote:
           | _> Blame incompetent people doing SQL badly._
           | 
           | Everything you spoke to is around people doing relational
           | algebra badly. That has little to do with SQL. The same
           | mistakes would lead to the same problems if the database
           | used, say, QUEL instead.
           | 
           | Perhaps it's not SQL that you love at all?
        
         | sherburt3 wrote:
         | Strongly disagree, complete newbies can get up and running with
         | SQL very quickly and it's expressive enough that experienced
         | people can do things that no ORM would ever hope of being able
         | to do. One criticism I have though is that whoever thought
         | "SELECT" should come before "FROM" should be fired.
        
           | alexvoda wrote:
           | And yet I agree with both. SQL is at the same time easy to
           | get started, incredibly powerful and yet also a complete
           | disaster. Just like JavaScript but with standards you have to
           | pay to read and implementers take their sweet time to
           | implement. ORMs (with the possible exception of JOOQ, whose
           | open source version is unfortunately also very limited) are
           | hopelessly limited and have been a drag on the development of
           | the field ever since.
        
             | swasheck wrote:
             | i think "disaster" is hyperbole. my brain thinks in concert
             | with sql so it feels natural, though i will concede that
             | the parsing/syntax order differences was an odd choice.
             | 
             | programmers and language users all have a mental
             | translation layers that they use to go from
             | conceptual->concrete using the syntax and idioms of their
             | language of choice.
             | 
             | javascript doesn't make sense in a lot of ways. calling it
             | a disaster is hyperbole. go doesn't make sense in a lot of
             | ways. it's not a disaster. unfamiliarity or disagreement
             | with choices does not necessarily mean it's a disaster
             | 
             | the only language with which i'm familiar that is a
             | complete, unmitigated disaster is the english language, but
             | i suppose that's outside of the scope of this conversation
        
               | runlaszlorun wrote:
               | > the only language with which i'm familiar that is a
               | complete, unmitigated disaster is the english language
               | 
               | Out of curiosity, are you a native speaker of english or
               | did you learn it as a second language?
               | 
               | I'm a native speaker and feel sorry for anyone who has to
               | learn all its irregularities, etc. But the few times I've
               | mentioned this to anyone who had to learn as a second
               | language, they'd correct me and tell it was easy. Which
               | baffled me.
               | 
               | Obviously as a native speaker, I'm in no place to judge
               | it.
        
           | Mutjake wrote:
           | I sort of feel the same...but on the other hand if you
           | consider "delete from" exists also, it's not completely
           | unsensible to consider you first tell what operation you're
           | about to perform to the data. Would be nice to start with the
           | source entity name for sure. Dunno what "select 1" would look
           | like, I guess the from foo would be optional.
           | 
           | Random saturday ramblings, sorry about that :-D
        
             | williamdclt wrote:
             | DELETE FROM is even worse. Accidentally/mindlessly press
             | cmd+enter before you wrote the WHERE? Poof, data gone. Make
             | it FROM ... DELETE!
             | 
             | I also wish we needed to explicitly take locks. In
             | PostgreSQL at least (I think other dialects/engines too),
             | figuring out what operation will take what lock and
             | potentially mean downtime is left as an exercise to the
             | reader. Force me to write WITH EXCLUSIVE TABLE LOCK when I
             | add a non nullable column with a default!
        
               | Mutjake wrote:
               | Well, to be fair "from foo delete" would do the same I
               | suppose :-D Unless there'd be an explicit end to the
               | statement to designate you really want to delete
               | everything. Which might not be a bad idea. Or make "where
               | ..." mandatory and bring in "delete_all" or "delete
               | everything from foo" as a syntactic guardrail. This is
               | equally implementable, whichever the order of "delete"
               | and "from" would be.
        
               | Izkata wrote:
               | The MySQL client has an --i-am-a-dummy flag that won't
               | let you run a DELETE (or UPDATE, I think) without a
               | WHERE.
        
               | ndriscoll wrote:
               | If you're going to run commands that modify data directly
               | on the cli, do it in a transaction so you can roll back.
               | Also, start with `--` to make it a comment. Once you have
               | the correct query and someone's checked your work, go
               | back to the beginning of the line and remove the `--` so
               | you can run it. It's also a good idea to write a select *
               | or count first to check that things make sense, and then
               | start your transaction, go up and modify your select into
               | a delete/update, check affected rows looks good again,
               | maybe do another select or two to check that data looks
               | like you expect, commit.
        
               | Izkata wrote:
               | Fun fact: if you still have a MySQL (MyISAM) database,
               | the transaction commands return success but don't
               | actually function, so you can't roll back.
        
               | antisthenes wrote:
               | If you don't SELECT first before running the DELETE
               | query, you shouldn't be anywhere near an IT job, let
               | alone a production database.
        
           | rileymat2 wrote:
           | I like it starting with the desired output columns, often
           | when I work with a db quite a bit and have common naming
           | conventions that's all I need to know.
           | 
           | Even writing queries I think in terms of outputs then build
           | the query from there.
           | 
           | I am blanking about the advantages of starting with from,
           | people clearly share your view, but I have no idea why.
           | 
           | Maybe it comes from my C++ background where return type comes
           | first, then function name, then inputs.
        
             | arzig wrote:
             | Ability to tab complete columns in an interactive settings.
             | 
             | If I'm in psql I can tab complete columns in the where
             | clause but not in the select because I haven't actually
             | given any information about what I'm selecting from yet.
        
               | rileymat2 wrote:
               | This feels like a tradeoff we make where a convenience
               | while writing once harms long term readability?
               | 
               | But that is a valid usecase!
        
             | duped wrote:
             | Imagine if methods on your C++ classes had to be called
             | like method(object) instead of object.method(). While it's
             | a meaningless syntactic difference, the latter makes
             | autocomplete in an IDE possible.
        
               | rileymat2 wrote:
               | Sure, in that very small example, if you don't care
               | internal about visibility.
               | 
               | But method(object1, object2, object3, object4) is
               | different than object.method().
        
           | exabrial wrote:
           | Completely agree with everything you said
           | 
           | Reminds me of reading music actually... many first time
           | noob's, mostly out of frustration, realize how sub optimal
           | staff notation is and seek to write something better. In the
           | end it's 'Good enough' and accept it's flaws because no
           | rewrite has proven to be a 10x better solution.
        
             | arrowsmith wrote:
             | ... unless you're playing guitar, in which case tab
             | notation works way better ;)
        
               | dizhn wrote:
               | More like Tab + already knowing the song.:)
        
             | iamwil wrote:
             | SQL seems to be the QWERTY of the data query world. It's
             | far too entrenched.
        
           | rileymat2 wrote:
           | It just occurred to me, do you have wide denormalized table
           | queries where tables represent entities more than relations?
           | 
           | I think if so that might explain the difference in query
           | structure.
        
             | sherburt3 wrote:
             | Yeah I maintain several BI reports and the codebase is
             | about 20 years old, so multiple teams have taken creative
             | liberties with the naming conventions for tables. Without
             | intellisense I'd probably have gone insane by now.
        
         | pkkm wrote:
         | Agreed, I wish that SQL was more concise and organized more
         | left-to-right, like shell pipelines or the F# pipe operator.
         | But what are the alternatives? The only one I know is PRQL, and
         | that doesn't seem to be getting popular (correct me if I'm
         | wrong).
        
           | AdrianB1 wrote:
           | It is organized, especially if you write it correctly (split
           | lines and indent). Minified, it is a nightmare, but with a
           | proper layout it is a pleasure to read it. I just did a code
           | review for a 20,000 LoC piece and it was quite a breeze after
           | I improved the layout a bit.
        
             | pkkm wrote:
             | "Disaster" may be an exaggeration, but I think the syntax
             | could be better, proper layout or not. In particular, I
             | find that complex SQL queries often need to be read in a
             | "spiral" order due to SQL's attempt to resemble natural
             | language. I find PRQL's consistent left-to-right/top-to-
             | bottom reading order a clear improvement.
        
           | __mharrison__ wrote:
           | Pandas and now Polars can replace a lot of SQL. I find that
           | many of my students find these easier to build up and debug.
        
           | listenallyall wrote:
           | I can understand that the statement order can be awkward,
           | such as choosing what to select before the source.
           | 
           | But help me with "more concise". SQL has no boilerplate, its
           | keywords are brief, you express the exact logic of the result
           | set you're seeking. There are tons of SQL overlays or
           | replacements embedded in programming languages or
           | BI/reporting tools, and they are universally, more difficult
           | to work with than straight SQL.
        
             | temporarely wrote:
             | There is a paper at the end of this link trail and that
             | paper discusses everything that is wrong with SQL including
             | concision.
             | 
             | https://news.ycombinator.com/item?id=39798528
             | 
             | et voila:
             | https://www.cidrdb.org/cidr2024/papers/p48-neumann.pdf
             | 
             | see their comments on WITHIN GROUP and redundant
             | declaration of identifiers. As for boiler plate, SQL's turf
             | war over plain English words as reserved names creates this
             | interesting situation of boiler plate identifiers in SQL to
             | disambiguate from reserved keywords.
        
           | anon84873628 wrote:
           | You may like: https://www.malloydata.dev/
        
         | Scene_Cast2 wrote:
         | My take is that SQL is a bit like C - very nice to know, but
         | move off of it to a more "managed" environment for safety and
         | sanity.
        
           | williamdclt wrote:
           | Definitely disagree on this one. ORMs are a leaky
           | abstraction, they hide the wall you're going to bump into
           | anyway, and they bring their own gotchas on top
        
             | Scene_Cast2 wrote:
             | I worked with two frameworks so far - SQLalchemy core and a
             | much bigger "orchestration" one (dagster / airflow
             | equivalent). SQLalchemy got me type checking, column name
             | checking, and DB migrations versioning with Alembic.
             | 
             | The bigger framework got me more static analysis and
             | dependency graphs on top of that. Those features saved me
             | lots of time and headaches (type checking, dependency
             | graphs, etc). Mind you, the frameworks I worked with still
             | allowed for raw SQL for those edge cases that still pop up.
        
         | fudged71 wrote:
         | https://en.wikipedia.org/wiki/Betteridge%27s_law_of_headline...
        
         | jameshart wrote:
         | Syntactic sugar, in the sense that it is a syntax that is bad
         | for you.
        
       | KingOfCoders wrote:
       | Everything is just syntactic sugar for something else. I'm
       | syntactic sugar for the hydrogen atoms in my body.
        
         | WWWWH wrote:
         | And on some cases sugar sugar not syntactic sugar.
        
         | lkey wrote:
         | Your comment is both needlessly dismissive, and worse,
         | incorrect. You cannot get a human being, much less the person
         | that is you, by recursively applying a constant set of rewrite
         | rules on unstructured hydrogen atoms.
         | 
         | If you think to counter my assertion with "The standard model
         | of particle physics and the big bang already did that, I'm here
         | after all", then spare us both the trouble and don't reply. The
         | particular arrangement of all known matter _and energy_ in the
         | universe at t=0 is not a repeatable initial condition.
         | 
         | Some rewriting systems are in fact Turing complete[1], and
         | that's an interesting digression. However, it's far afield from
         | the article's discussion of untangling the syntactic mess that
         | is the SQL standard and bringing it closer in line with the
         | standard expression of its _semantics_.
         | 
         | [1]:
         | https://www.sciencedirect.com/science/article/pii/0304397592...
        
       | halayli wrote:
       | Relational algebra IR is implemented in MonetDB and discussed in
       | their paper. Definitely worth reading.
       | 
       | Not trying to be picky but pure relational algebra doesn't map to
       | SQL and IMO it's not a good idea to attempt to do that due to the
       | fact that relational algebra treats tuples as mathematical sets
       | (ordering/uniqueness matters) while SQL does not(and has to deal
       | with nullability).
        
       | infogulch wrote:
       | SQL is pretty good all things considered.
       | 
       | But I've always looked out for languages that can represent
       | relational algebra concepts more directly. Maybe CozoaDB is
       | close, though still immature. Any recommendations?
        
         | markisus wrote:
         | I've been using Pandas which exposes a python slicing syntax
         | for manipulating relational data. It also has a builtin join()
         | function.
         | 
         | "select id, date from orders" is orders["id", "date"].
         | 
         | It's meant for in-memory datasets but the syntax could be
         | extended to work for other backends. I'm not sure if anyone is
         | working on that.
        
           | __mharrison__ wrote:
           | Ibis takes the notion of a dataframe and abstracts it from
           | SQL backends.
        
         | samatman wrote:
         | I highly recommend the Third Manifesto. I could link this under
         | most posts in this thread but I'll limit myself to two.
         | 
         | https://www.dcs.warwick.ac.uk/~hugh/TTM/DTATRM.pdf
         | 
         | The only problem there is that you might want to _use_ a D
         | language, and well. You can 't. There was a product called
         | Dataphor which one can find some writeups on but, baffling
         | though I find this, there are no robust open-source relational
         | databases which use a D language.
        
       | triska wrote:
       | Codd's seminal paper, _A Relational Model of Data for Large
       | Shared Data Banks_ , states that a language based on applied
       | _predicate calculus_ "would provide a yard-stick of linguistic
       | power for all other proposed data languages". Quoting from
       | https://www.seas.upenn.edu/~zives/03f/cis550/codd.pdf:
       | 
       | " _1.5 Some linguistic aspects
       | 
       | The adoption of a relational model of data, as described above,
       | permits the development of a universal data sub-language based on
       | an applied predicate calculus. A first-order predicate calculus
       | suffices if the collection of relations is in normal form. Such a
       | language would provide a yard-stick of linguistic power for all
       | other proposed data languages, and would itself be a strong
       | candidate for embedding (with appropriate syntactic modification)
       | in a variety of host languages (programming, command- or problem-
       | oriented)._"
       | 
       | Languages based on predicate calculus indeed seem extremely
       | suitable for reasoning about relational data. Datalog is a well-
       | known example. It is more directly based on predicate logic, and
       | much simpler than SQL.
        
       | refset wrote:
       | > Lest you think is just one weird corner of the sql spec, I
       | found this helpful diagram explaining how the scoping rules work
       | (from Neumann and Leis, 2023)
       | 
       | It's an excellent diagram, it really conveys the dissonance.
       | Incidentally I interviewed Viktor Leis on a podcast last week
       | about the paper where it's from: https://juxt.pro/blog/sane-
       | query-languages-podcast/
       | 
       | A lot of people seem to believe that LLMs or other ML methods can
       | overcome the complexity challenges of generating SQL accurately,
       | but I'm yet to be convinced that a database-powered AI revolution
       | can happen without somehow bypassing SQL.
        
       | joking wrote:
       | A few tweaks here and there and it would be nice enough for me.
       | Most of them are actually implemented by some engines but are not
       | part of the standard. Just changing the order of the from and
       | select clauses so autocomplete can know what fields can you use
       | would be a nice enough change.
        
       | samsquire wrote:
       | Thanks for this interesting post.
       | 
       | Intuitively, relational algebra compresses enumeration over data
       | in time that a CPU executing billions of cycles a second can
       | feasibly and efficiently traverse and execute against many
       | collections of millions or billions of records in human
       | perceivable time thanks to indexes.
       | 
       | I've been trying to think of systems communicating with eachother
       | as parts of a relational model in the sense we can model system
       | behaviour as a series of events and a join is a communication
       | between components.
       | 
       | I would love to talk about this with people.
        
         | refset wrote:
         | > a join is a communication between components
         | 
         | Makes me wonder just how far people have pushed Foreign Data
         | Wrappers in practice.
        
         | seanhunter wrote:
         | Some years back I spent a weekend and built a "good enough to
         | prove to myself" version of this in Haskell that implemented
         | the main relational operations of projection, selection and
         | cartesian product.
         | 
         | The basic idea was based on the "stream fusion" papers. So the
         | relation was a stream in the stream fusion sense so it was
         | pretty trivial to implement the normal relational operators in
         | that paradigm. Changing this type of system to work on "events"
         | as input would be pretty trivial.
         | 
         | The one thing I never managed to get to work was the actual
         | "fusion" compiler hint. I kept trying variants of what he did
         | in the paper but ghc just refused to optimise my stream/unsteam
         | unstream/stream pairs away because it had already done some
         | rewriting to them. I couldn't figure out how to apply the
         | optimisation early enough to be effective.
         | 
         | [1] Which are a fantastic read if you're into CS whatever you
         | think of my idea
         | https://www.cs.tufts.edu/~nr/cs257/archive/duncan-coutts/str...
         | and
         | https://www.researchgate.net/publication/220802863_Rewriting...
        
           | samsquire wrote:
           | Thank you for sharing your thoughts and ideas Sean,
           | appreciated.
           | 
           | I am unfamiliar with the source material and I have recorded
           | the paper you linked to my reading list.
           | 
           | From a description of "stream fusion", it reminds me of
           | Clojure's "transducers".
           | 
           | It also reminds me of Kafka's Table/Stream duality.
           | 
           | Term rewriting is something is really interesting to me.
           | 
           | Communication, protocols and Communicating sequential
           | processes, session types are all ideas I am thinking about
           | and trying to understand.
        
         | ndriscoll wrote:
         | When working on a system that was overly split into
         | microservices, I once wrote a leftJoin stream operation for
         | Akka, partly to demonstrate how stupid it was that we had split
         | our database apart and now needed to implement a database in
         | our application using http calls, with 1000x worse performance.
        
       | bkanuka wrote:
       | As someone who learned mathematics first and programming later, I
       | think it took me about 10 years of working in data-intensive
       | programming before I could write really "good" SQL from scratch.
       | 
       | I completely attribute this to SQL being difficult or "backwards"
       | to parse. I mean backwards in the way that in SQL you start with
       | what you want first (the SELECT) rather than what you have and
       | widdling it down. Also in SQL (as the author states) you often
       | need to read and understand the structure of the database before
       | you can be 100% sure what the query is doing. SQL is very
       | difficult to parse into a consistent symbolic language.
       | 
       | The turning point for me was to just accept SQL for what it is.
       | It feels overly flexible in some areas (and then comparatively
       | ridgid in other areas), but instead of fighting against this or
       | trying to understand it as a consistent, precise language , I
       | instead just go "oh SQL - you are not like the other programming
       | languages I use but you can do some pretty neat stuff so we can
       | be on good terms".
       | 
       | Writing good SQL involves understanding the database,
       | understanding exactly the end result you want, and only then
       | constructing the subqueries or building blocks you need to get to
       | your result. (then followed by some trial and error of course)
        
         | mrits wrote:
         | I always thought writing SQL from scratch was the easy part.
         | The hard part for me was coming back to my query a few weeks
         | later
        
           | arrowsmith wrote:
           | This is true for most programming languages.
        
           | Winsaucerer wrote:
           | That's why I try (but sometimes forget) to extensively
           | comment my queries that have any kind of complexity :)
        
         | mdcurran wrote:
         | This doesn't totally solve the issue of SELECT'ing first then
         | filtering, but for complex queries I've found CTEs very useful
         | (whenever the database/SQL dialect supports it).
        
           | icedchai wrote:
           | What I usually do is start with "select *", get the joins and
           | where clause down, then refine the select.
        
         | thaumasiotes wrote:
         | > widdling it down
         | 
         | Whittling. It means to carve something out of wood, with a
         | metaphorical extension, as here, to gradually making something
         | smaller by shaving bits of it away.
        
           | Strang wrote:
           | Important distinction. "Widdling" is urination.
        
         | whatever1 wrote:
         | The trial and error is the worst part.
         | 
         | In traditional languages, you can print iteration by iteration
         | the intermediate result and understand if there is something
         | wrong.
         | 
         | In SQL you sample output, and you keep changing the query until
         | you think you get it right. And then 2 years later someone else
         | finds that the query was wrong all this time.
        
           | fifilura wrote:
           | > you can print iteration by iteration the intermediate
           | result
           | 
           | You would not be able to do that with a multi-threaded/multi-
           | process application.
           | 
           | And this is the reason why e.g. Trino/Presto is so powerful
           | together with SQL.
           | 
           | Instead of telling the computer how to go by to get your
           | result, you tell it what result you want and let it do it in
           | the best way.
           | 
           | The most up-front way of telling a computer "how" is a for-
           | loop. And SQL does not have it. It may seem limiting, but
           | avoiding explicit for loops gives the freedom to the
           | computer. If it sees it fit to distribute that calculation
           | over 200 distributed CPUs it can do that. With an imperative
           | language you need to tell the computer exactly how it should
           | distribute it. And from there it gets really hairy.
        
             | whatever1 wrote:
             | In development I don't need it to be multi-threaded. 1
             | thread is fine, as long as I can explain, step-by-step, how
             | the calculations produced the output.
        
               | fifilura wrote:
               | If you don't need threads in development OR production,
               | you might as well do SELECT * from users and do the join
               | in your imperative code.
               | 
               | If you need threads in production I think you will end up
               | getting rid of your for loops anyway (or possibly, if you
               | really want to, end up in a mutex/semaphore quagmire).
               | 
               | I must say, though, that there are other benefits with a
               | declarative approach than just avoiding threading issues.
               | But I guess it takes some getting used to.
               | 
               | I would say that the same "I cant step through my code"
               | argument also goes for functional style code.
        
               | Ma8ee wrote:
               | > If you don't need threads in development OR production,
               | you might as well do SELECT * from users and do the join
               | in your imperative code.
               | 
               | Except that it most likely will be orders of magnitude
               | slower. Most databases are very good at what they are
               | doing.
        
               | fifilura wrote:
               | Yes. Kind of my point to. But the OP missed the
               | possibility to step through the code.
        
           | jalk wrote:
           | Common Table Expressions (CTE) do help a little, as you can
           | query each "table" and inspect the output. Debugging a giant
           | query with deeply nested sub queries is very painful indeed
        
             | Ma8ee wrote:
             | So do table variables and temp tables.
        
           | r00fus wrote:
           | I mean, I never build a query from front to back. Usually I
           | build it FROM -> JOIN -> WHERE -> SELECT.
        
             | mycall wrote:
             | Start off with SELECT * then once the joins are working,
             | filter * down to the essentials.
        
           | remus wrote:
           | > The trial and error is the worst part.
           | 
           | I don't know about anyone else, but I do this kinda naturally
           | when writing SQL queries. Usually start with a base table,
           | query the first 100 rows to see what the data looks like,
           | start joining on other tables to get info I need, querying as
           | I go to check join conditions, perhaps build out some CTEs if
           | I need to do some more complex work, query those to check the
           | format of the data ... And so on.
           | 
           | It doesn't feel that different to any other programming in
           | that sense. Querying is printing.
        
           | Ma8ee wrote:
           | Trial and error is usually a bad idea in all kinds of
           | programming.
        
         | 392 wrote:
         | You may like PRQL, which gives a more composable-atoms based
         | approach. I find it far easier than SQL.
        
         | Winsaucerer wrote:
         | I feel like a foreigner in another land when I read your
         | comment and others like it. For as long as I can remember using
         | SQL, I can't remember ever finding it more difficult or
         | backwards than anything else I use.
         | 
         | That difference might go some way towards explaining why I
         | prefer a much more database heavy/thick approach to writing
         | apps than my peers.
        
           | htag wrote:
           | I learned SQL before I learned set theory. While learning set
           | theory I remember thinking "oh this notation is just SQL
           | backwards." Afterwards I began to find SQL much harder
           | because I realized there are so many ways to mathematically
           | ask for the same data, but SQL servers will computationally
           | arrive at the end differently and with very different
           | performance. This is a minor deal if you're just doing small
           | transactions on the database, because if you are dealing with
           | pages of 100 objects it's trivial to hit good-enough
           | performance benchmarks, even with a few joins.
           | 
           | I was first introduced to the issue of needing hyper
           | optimized SQL in ETL type tasks, dealing with very large
           | relational databases. The company switched to non-relational
           | database shortly after I left, and it was the first time I
           | professional witness someone make the switch and agreed that
           | it was obviously required for them. We were dealing with very
           | large batch operations every night, and our fortune 500
           | customers expected to have the newest data and to be able to
           | do Business Intelligence operations on the data every
           | morning. After acquiring bigger and bigger customers, and
           | collecting longer and longer histories of data, our DBA team
           | had exhausted every trick to get maximum performance from
           | SQL. I was writing BI sql scripts against this large pool of
           | SQL data to white-glove some high value customers, and
           | constantly had to ask people for help optimizing the sql. I
           | did this for a year at the beginning of my career, before
           | deciding to move cities for better opportunities.
           | 
           | Lately, I've began seeing the requirements of high
           | performance SQL again with the wave of microservice
           | architectures. The internal dependency chain, even of what
           | would have been a mid size monolith project a decade ago, can
           | be huge. If your upstream sets a KBI of a response time, it's
           | likely you'll get asked to reduce your response time if your
           | microservice takes up more than a few percentage points of
           | the total end to end time. Often, if you are using relational
           | SQL with an ORM you can find performance increases in your
           | slowest queries by hand writing the SQL. Many ORMs have a
           | really good library for generating sql queries they expose to
           | users, but almost all ORMs will allow you to write a direct
           | sql query or call a stored procedure. The trick to getting
           | performance gains is to capture the SQL your ORM is
           | generating and show it to the best sql expert that will agree
           | to help you. If they can write better SQL than the ORM
           | generated than incorporate it into your app and have the SQL
           | expert and a security expert on the PR. You might also need
           | to do a SQL migration to modify indexes.
           | 
           | So in summary, I think your experiences with SQL depends
           | heavily on your mathematical background and your professional
           | experience. It's important to look at SQL as computational
           | steps to reach your required data and not simply as a way to
           | describe the data you would like the SQL server to give you.
        
             | fifilura wrote:
             | Was this before BigQuery/Presto/Trino? To me it seems like
             | those technologies would have been a good fit.
             | 
             | They don't really work with indexes but instead regular
             | files stored in partitions (where date is typically one of
             | them).
             | 
             | This means that they only have to worry about the data
             | (e.g. dates) that you are actually querying. And they scale
             | up to the number of CPUs that particular calculation needs.
             | They rarely choke on big query sizes. And big tables are
             | not really an issue as long as you query only the
             | partitions you need.
        
               | htag wrote:
               | Those technologies were brand new at the time, the
               | discussions about the problem started in 2013. The
               | company (I had zero input) choose a more established
               | vendor with an older product. Given the time and
               | institutional customers that were trusting us with their
               | data, I suspect any cloud based offerings were a
               | nonstarter, and open source felt like a liability.
               | 
               | Of course with 20/20 hindsight that decision is easy to
               | criticize. I suspect their primary concerns were to
               | minimize risk and costs while meeting our customer's
               | requirements. Even today, making a brand new Google
               | product or Facebook backed open source project a hard
               | dependency would be too much risk for an established
               | business.
        
           | naasking wrote:
           | > I feel like a foreigner in another land when I read your
           | comment and others like it. For as long as I can remember
           | using SQL, I can't remember ever finding it more difficult or
           | backwards than anything else I use
           | 
           | Learn linq or query/list comprehensions and then you'll
           | easily see why SQL is backwards.
        
             | Izkata wrote:
             | I've been using Django almost as long as I've been using
             | SQL and I prefer the SQL ordering more: it matches the rest
             | of the code, making it faster/easier to read. As a crude
             | example:                 SELECT results FROM source WHERE
             | criteria            results = source(criteria)
             | 
             | It's rare to see someone want to change assignments in code
             | to be like:                 source(criteria) -> results
             | 
             | Where I see it as the same thing: the SELECT columns, like
             | the variable assignment, are the interaction point with the
             | following lines of code.
             | 
             | And yes, CTE ordering does annoy me because of this.
             | Putting it in the middle is pretty much the worst order.
        
               | naasking wrote:
               | > Where I see it as the same thing: the SELECT columns,
               | like the variable assignment, are the interaction point
               | with the following lines of code.
               | 
               | Indeed, which is why source(criteria) -> results makes
               | more sense: the results definition is right next to the
               | code that's going to be using that definition. If you put
               | the results definition first as with SQL, then you have
               | to scroll up to find the context (although perhaps
               | Python's indentation sensitivity is the tripping point in
               | this case). Not even mentioning the fact that the SQL way
               | completely destroys any chance of code completion.
               | 
               | I'm going to boldly state that the SQL way is literally
               | _objectively_ wrong, in that there is no world in which
               | SQL 's choice is superior for general querying.
        
               | Izkata wrote:
               | Then why are you advocating for it?
               | 
               | > or query/list comprehensions
               | 
               | List comprehensions _are_ column first.
        
           | sodapopcan wrote:
           | I agree. I never even thought about "select what you want
           | first" as a problem until someone else pointed out.
           | 
           | Programmers seem far too sensitive about wanting everything
           | to work one way. SQL is a very powerful DSL. It has its
           | quirks but nothing that ever enraged me. I don't really care
           | that it doesn't work like some other stuff I use, I just
           | accept that I'm learning the language of a particular domain.
           | This doesn't mean that I don't think there is always room for
           | improvement. Of course I think FROM first would be a little
           | nicer, but so much nicer that I think its worth changing a
           | whole battle-tested standard? Not at all. The pain is so
           | minimal I don't even feel it.
        
             | refset wrote:
             | The actual ISO standard falls well short of being
             | useful/sufficient to anyone who isn't an incumbent player.
             | It's effectively a moat and therefore a direct impediment
             | to competition from teams who have novel technical ideas
             | but don't have access to significant capital - building a
             | SQL implementation is a long, expensive journey. This is
             | why many startups resort to building Postgres extensions,
             | or using Calcite or DataFusion.
             | 
             | If SQL weren't so (needlessly) complex we would see much
             | more competition across the database space.
        
             | mycall wrote:
             | LINQ runs with FROM being first. Definitely trivial
             | difference but a bit easier.
        
         | dmead wrote:
         | Saying what you want first rather than what you have is
         | evidence of the von Neumann bottleneck or it was a sign of the
         | times when SQL was being developed on 1970s machine.
         | 
         | Either way, point taken that it is not like a proof.
        
           | ako wrote:
           | Covey's: "start with the end in mind" is not a bad advise
           | when building something complex. With procedural languages
           | you do the same, you first write the signature, parameters
           | expected to go in and out, and then you start writing the way
           | to achieve this.
        
       | zvmaz wrote:
       | I tried to study C. J. Date's books to understand relational
       | theory... suffice it to say that I got nothing from his books,
       | except a deep irritation partly due to his absolute pedantry...
       | 
       | I finally learned SQL with a gentle introduction by Alan
       | Beaulieu. I stumbled upon another book that's about the theory:
       | Applied Mathematics for Database Professionals, by Lex deHaan,
       | and Toon Koppelaars. Maybe these authors will benevolently teach
       | me relational theory.
       | 
       | But please avoid C. J. Date's books. And don't be him when
       | writing a book or trying to explain something to another human
       | being.
        
       | exabrial wrote:
       | I'd much rather deal with the peculiarities of SQL than any of
       | the attempted replacements (ones I've seen in my minted
       | experience). Elastic for instance, other json based languages,
       | are absolutely terrible. We lost something we when stopped
       | writing ANSI standards.
       | 
       | We've even stayed on InfluxDB og versions _because of _ the SQL
       | like syntax, and also their improved languages are a nuclear
       | disaster area.
       | 
       | SQL, despite its flaws (null != null) is pretty good enough!
        
       | keid wrote:
       | See C.J. Date's "An Introduction to Database Systems,"
       | https://www.amazon.com/Introduction-Database-Systems-8th/dp/...
       | This is not news.
        
       | scythmic_waves wrote:
       | This is a great write up. There appear to be a few camps forming
       | in the comments and I'm in camp "SQL is confusing and attempts to
       | explain it in terms of relational algebra have felt inadequate to
       | me".
       | 
       | It also gives me some good follow up material to read. I'm
       | particularly interested in that one link that forms subqueries
       | and lateral joins in terms of a new "dependent join" operator.
        
         | zer00eyz wrote:
         | Go read: Database Design for Mere Mortals.
         | 
         | ERD's are your friend. Learn how to generate one, and how to
         | read it.
         | 
         | The relations (not relational, not algebra) are IN the design
         | they are IN the ERD (as a tool to visualize). Even if your not
         | visual thinker the ERD might help you find a path between two
         | distant tables.
         | 
         | Needing a subquery is rare. It happens but a lot of subqueries
         | would be better off as joins. The moment you grasp the design
         | of something you're less likely to want to sub query.
         | 
         | Explain is your friend. Reading an explain plan is going to
         | give you some good insight into what is going on UNDER the
         | hood. Not only will it help you tune slow queries but it is
         | more insight into how large queries decompose.
         | 
         | Lastly, there is nothing worse than having to query a badly
         | designed DB. If you do a shit job on the first part everything
         | else is going to be painful.
        
       | achr2 wrote:
       | You should look at LINQ in C#/.net . The SQL-like syntax always
       | has a function-first equivalent, that gets across this point
       | fairly eloquently.
        
       | qazxcvbnm wrote:
       | As someone who has implemented a composable SQL generator from
       | user-defined algebras of (arbitrary SQL) queries using relational
       | algebra, I understand the shortcomings of SQL when viewed from an
       | angle of a _relational_ _query_ language.
       | 
       | However, SQL is a language with many facets (DML, DDL, DCL) other
       | than 'relational' querying. Putting on a less mathematical and
       | more engineering mindset, SQL ingratiates me by its interface to
       | incredibly powerful primitives difficult to find anywhere else.
       | (I've primarily worked with Postgres SQL)
       | 
       | Consider the humble function; in SQL
       | https://www.postgresql.org/docs/current/sql-createfunction.h...,
       | one can declare the function as `stable` or `immutable` to let
       | the runtime optimise repeated calls; as `parallel` to let the
       | runtime consider parallelisation, as `cost ...` and `rows ...` to
       | aid optimiser cost estimation. Imagine if one could do that in C
       | or Javascript!
       | 
       | Another facet which regularly puts me in awe is the transaction
       | isolation primitives and locking primitives offered by SQL.
       | 
       | I understand that as a database language, SQL necessarily has
       | these within its specialised niche, but it seems to me these
       | aspects of SQL as an interface to a language runtime would be
       | equally useful in the everyday program; in all these areas of
       | functionality, SQL is so much more advanced than nearly every
       | other general purpose programming language.
        
       | barfbagginus wrote:
       | Can we call it syntactic ashtray? Because it feels like I'm
       | sucking on 1970s ashtray when I see or use it.
       | 
       | Those who have read their Spivak 2017 will know that databases
       | are just Co-presheaves of Ologs over the Kliesli Category of the
       | Power-Set Monad, the Identity Monad, or the Giry Monad. I would
       | like a QL that acts like it!
        
         | breezeTrowel wrote:
         | I know some of these words.
        
           | lkey wrote:
           | Snark of grandparent aside: https://arxiv.org/abs/1102.1889
           | if you want to read more.
        
       | xbar wrote:
       | Discussions of what is/is not syntactic sugar are unapproachable
       | for me because I cannot get past the abuse of sugar's essential
       | functions in the tortured metaphor.
        
       | jameshart wrote:
       | Not totally convinced by the ORDER BY obstacles that the author
       | raises..
       | table('test').project('a').orderBy('b')
       | 
       | > That's an error, because we can't order by a column that we
       | just projected away. Right?
       | 
       | assumes that 'projection' completely eliminates part of the
       | underlying relation, but why does that have to be the case?
       | 
       | If a relation includes 'selected fields' and 'hidden fields', and
       | project just 'hides' the fields it doesn't project, while orderBy
       | can operate on either projected or hidden fields, this ends up
       | being perfectly sound.
       | 
       | Even the more complex example which is translated as follows:
       | translate('select a+1 as c from test order by b,c')        =>
       | table('test').project('a','b').addColumn('a+1',
       | as='c').orderBy('b','c').project('a')
       | 
       | would work fine as:                  table('test')
       | // selected: [a, b, ...], hidden: []           .addColumn('a+1',
       | as='c')  // selected: [a, b, c, ...], hidden: []
       | .project('c')              // selected: [c], hidden: [a, b, ...]
       | .orderBy('b','c')          // selected: [c], hidden: [a, b, ...]
       | 
       | (not sure why there's a .project('a') on the end of their
       | version)
       | 
       | Which is a reasonably local, algebraic transformation.
        
       | samatman wrote:
       | If you're interested in what it would take to put relational
       | databases back on the rigorous footing of relational algebras,
       | the Third Manifesto is a good place to start.
       | 
       | https://www.dcs.warwick.ac.uk/~hugh/TTM/DTATRM.pdf
       | 
       | I find it somewhat sad that an implementation of a database with
       | a proper D language hasn't broken out and become a ubiquitous
       | tool for the profession. There were some proprietary versions
       | shortly after the manifesto's publication, but it never caught
       | on.
        
       | aoeusnth1 wrote:
       | I find that most people who object to SQL do not use TVFs. If you
       | don't have any tools to easily break down the steps of the work,
       | of course SQL will feel like an opaque Write-only language. With
       | TVFs you can easily iteratively add more complex steps to your
       | query while checking your work while you build.
        
       | r00fus wrote:
       | That diagram separating the syntactic vs. semantic layers of a
       | SQL statement (from Neumann & Leis paper) is brilliant.
        
       | chubot wrote:
       | An analogy I like is - _Are Perl-style regexes (used in Python,
       | Ruby, Java, .NET, etc.) syntactic sugar for regular languages?_
       | 
       | The answer is no, because Perl added all sorts of imperative
       | doodads to regexes, which can't be easily represented and
       | executed in the automata-based paradigm. Trying to do this is
       | like a "research paper generator" (and not in a bad way), e.g.
       | 
       |  _Derivative Based Nonbacktracking Real-World Regex Matching with
       | Backtracking Semantics_ -
       | https://dl.acm.org/doi/abs/10.1145/3591262 (2023)
       | 
       | This is until Go and Rust, which used automata-based regexes from
       | the beginning. I don't think users have lost much.
       | 
       | Purely automata-based engines are kind of pleasant to write,
       | because almost everything is in the compiler, and not in the
       | runtime, e.g. https://github.com/andychu/rsc-
       | regexp/blob/master/py/README....
       | 
       | That is, features like ? + * really are syntactic sugar for
       | repetition. There's also a lot of syntax sugar around character
       | classes like [^a], and the runtime is very small.
       | 
       | ---
       | 
       | Likewise, SQL seems to have so many non-relational doodads in its
       | language design, which cause problems for implementers. In this
       | case, I think there's an incentive problem with SQL: It benefits
       | vendors if their dialect is harder to re-implement. Although
       | certainly they've added many useful features too in 4-5 decades!
       | 
       | To me a language design issue is we never really "learned" to
       | compose languages with different paradigms:
       | 
       | - the set-based paradigms like relational algebra and regular
       | languages, with
       | 
       | - Turing-machine like code. (and also I/O!)
       | 
       | We never learned polyglot programming, so each language becomes
       | its own source of "reckless growth" - its own parochial
       | backwater.
       | 
       | Both regexes and SQL should be able to "escape" to normal code,
       | and that would greatly simplify them. This can be done both by
       | language implementers and by application programmers, i.e.
       | "factoring" across languages. It's not always obvious how to do
       | this, but it certainly it can be done more than we do it today.
       | 
       | ---
       | 
       | I'd argue the same phenomenon - lack of language composition -
       | leads to programming languages within YAML. Github Actions is
       | nominally some kind of "declarative" scheduler specification, or
       | graph (job -> job dependencies), but that's not enough for many
       | problems.
       | 
       | So it also has a bunch of doodads for escaping that model (to the
       | extent it has a model).
       | 
       | Shell, Awk, and Make also grew many doodads
       | (https://www.oilshell.org/blog/2016/11/14.html), which are not
       | very well designed. They used to be declarative languages, but no
       | longer are.
       | 
       | Although there is some distinction between "formerly set-based
       | languages" like SQL and regex, and other "declarative" non-
       | Turing-complete languages. But I think the language composition
       | problem is approximately the same. Part of it is syntax, but a
       | lot of it is semantics.
       | 
       | (copy of lobste.rs comment)
        
       ___________________________________________________________________
       (page generated 2024-03-23 23:01 UTC)