[HN Gopher] SQL is syntactic sugar for relational algebra
___________________________________________________________________
SQL is syntactic sugar for relational algebra
Author : dmarto
Score : 193 points
Date : 2024-03-19 07:54 UTC (4 days ago)
(HTM) web link (www.scattered-thoughts.net)
(TXT) w3m dump (www.scattered-thoughts.net)
| roenxi wrote:
| I'm glad that the article concluded "No" to it's own headline.
| Calling SQL "syntactic sugar" is an insult to sugar. The "helpful
| diagram explaining how the scoping rules work" alone should make
| people blanch. The language is a syntactic disaster that we've
| been saddled with out of habit and inertia.
| solumunus wrote:
| I agree that SQL syntax is terrible, but for some potentially
| masochistic reason I still like writing it. It could certainly
| be done better though and I don't understand how there hasn't
| been more innovation in this space.
| IshKebab wrote:
| There's PRQL which is really nice. But it has pretty big
| downsides: it only supports querying data, not modifying it;
| and since it's not database-specific it doesn't support all
| the features of whatever database you're using. E.g. it can't
| access complex data types like structs/arrays in DuckDB.
| refset wrote:
| > I don't understand how there hasn't been more innovation in
| this space
|
| I think it's simply that most businesses and investors don't
| register SQL as having any real problems, and especially now
| with a resurgent interest in SQL the idea of attempting
| anything novel feels too risky.
|
| Shameless plug of one recent attempt to offer something
| different: XTQL https://docs.xtdb.com/intro/what-is-xtql.html
| account-5 wrote:
| Having been forced to write queries as JSON in elastic I
| definitely prefer writing SQL. I would actively avoid
| utilizing any query language where I have to count
| brackets.
| refset wrote:
| Handcrafting JSON is undoubtedly always a pain, but the
| idea with XTQL is rather that it can be easily generated
| from any regular programming language.
|
| > I would actively avoid utilizing any query language
| where I have to count brackets
|
| That's really an editor/tooling problem, solvable in many
| ways, but I guess a Python-like/Parinfer approach would
| be your preference? (where whitespace/indentation is
| significant)
| account-5 wrote:
| True enough, tooling is key. I'm not a developer but I
| think SQL is a decent language that's easy to pick up.
| But I also like xpath, regex, and jq syntax too, so maybe
| I'm the weird one.
| bradleyjg wrote:
| PG's blub essay isn't especially charitable. There's a good
| reason to want to stick with what you know and are
| comfortable with---climbing endless learning curves means not
| getting things done and it isn't simple to know ex ante what
| is worth the time
| refset wrote:
| Which is fine when everything is handcrafted and human-
| scale, but as soon as you start going down the path of
| machine generated SQL (both the generation and analysis
| thereof) the tax is non-trivial.
| masklinn wrote:
| It's not bad enough that it's worth actually trying to
| overturn, and the inertia from everything speaking SQL means
| there's a lot less pushback if you also do that.
|
| SQL actually killed alternatives back in the 80s (and 90s,
| Postgres used to use a dialect of Ingres's QUEL, the "SQL"
| bit of PostgreSQL specifically denotes the switch over from
| QUEL to SQL).
| bugbuddy wrote:
| Call a masochist but I love me good old SQL. I find most
| problems with systems I have worked with are the result of
| managements putting people who never paid attention in database
| classes in charge. Or maybe nobody really cares about doing a
| good job. In any case, my professors would have given all the
| legacy SQL databases that I have come across F-.
|
| The worse part is nobody dares changing anything because it
| would require application code changes. Don't blame SQL. Blame
| incompetent people doing SQL badly.
| b33j0r wrote:
| I didn't "get" SQL for the first 6-8 years of professional
| full-stack engineering. I was chasing, along with many in my
| cohort, the ORM abstraction dragon we thought we needed to
| work on j2ee, django, or sqlalchemy teams quickly.
|
| It's perplexing to me now why SQL was perplexing. Even
| knowing about relational theory and enough DBA to pass
| interviews, it somehow always seemed like we should have been
| able to treat databases like our OOP's.
|
| To think of the hours I wasted. Maybe I heard in a meeting
| "it's the join table" and another colleague said "just use
| the association table," and I was trying the n^2 debugging
| approach to solve it with model classes.
|
| The best advice I got on this made sense later: "I wouldn't
| even try to manage that [dataset] myself. Just figure out how
| to tell a database to do it--that's its only job."
| bitwize wrote:
| Understanding relational algebra helps you write better SQL
| the way understanding lambda calculus helps you write
| better JavaScript. The language is warty and but a shadow
| of the pristine purity underneath, but it gets the job done
| and the theory guides your reasoning on how to use the
| language.
| lkuty wrote:
| In particular, I find the closure property of SQL very
| helpful and always think in terms of queries gettings
| tables and producing tables (derived most of the time).
| https://en.wikipedia.org/wiki/Closure_(mathematics)
| AdrianB1 wrote:
| I really "got" SQL about 15 years after I started using it.
| The explanation is very simple: until that moment, SQL was
| a minor component for me in a big pie. Then one day I got
| into the position to troubleshoot some apps with >1 TB
| database behind them and I figured that SQL is more than
| simple CRUD operations. For most full stack developers that
| I know, SQL is still that minor component in the back of
| their mind and they are usually right.
| eek2121 wrote:
| Agreed, though disagree about the database class thing. I got
| a C in my class. I was a DBA at the time and definitely knew
| my shit. That was kind of the issue. He was teaching by the
| book with no regard to the database engine used. I was using
| SQL Server and MySQL daily.
|
| There was a fundamental disconnect. I approached a class
| project differently not only because I was a DBA, but I am
| neurodivergent. Because my design didn't align with his
| expectations, I got a C.
|
| I had a 4.0 GPA prior to that, but that made me realize I
| wasn't a fit for college.
|
| Good thing as well because I am debt free today.
| AdrianB1 wrote:
| I failed COBOL in college in a similar situation: I was
| professionally writing COBOL, my project was too divergent
| from what the teacher expected. I took that exam 3 times to
| barely pass. Good times, good times.
| doubloon wrote:
| i took compilers and prof wanted us to write lexer code
| by hand for a specific language. instead i wrote a
| generic lexer generator and fed in the language
| definition to generate the lexer code, got a bad grade
| for not following instructions. it seems kind of funny
| now but at the time i was rather cross.
| 20after4 wrote:
| I also got a C in my cs relational databases class. I also
| got more out of that class than just about any other in CS.
| I'm still kind of mad about it though. We had a group
| project assignment and our group had far-and-away the best
| project, of which I did 90% of the work. My bad grade came
| down to just one thing: the class required us to use Oracle
| and I refused on moral grounds.
| elcritch wrote:
| In my experience few CS students seem to take a relational
| databases course. It was optional in my program.
| Unfortunately, that means many of those databases were
| created by folks with no formal grounding in databases. Some
| folks pick it up on their own, but likely not as thoroughly
| as those who've taken a good course on the topic.
| default-kramer wrote:
| Yes, people who don't understand SQL databases can easily
| create "F-" situations. The problem is that people who are
| great at SQL databases can't do any better than a C+ because
| that's as good as SQL gets.
| randomdata wrote:
| _> Blame incompetent people doing SQL badly._
|
| Everything you spoke to is around people doing relational
| algebra badly. That has little to do with SQL. The same
| mistakes would lead to the same problems if the database
| used, say, QUEL instead.
|
| Perhaps it's not SQL that you love at all?
| sherburt3 wrote:
| Strongly disagree, complete newbies can get up and running with
| SQL very quickly and it's expressive enough that experienced
| people can do things that no ORM would ever hope of being able
| to do. One criticism I have though is that whoever thought
| "SELECT" should come before "FROM" should be fired.
| alexvoda wrote:
| And yet I agree with both. SQL is at the same time easy to
| get started, incredibly powerful and yet also a complete
| disaster. Just like JavaScript but with standards you have to
| pay to read and implementers take their sweet time to
| implement. ORMs (with the possible exception of JOOQ, whose
| open source version is unfortunately also very limited) are
| hopelessly limited and have been a drag on the development of
| the field ever since.
| swasheck wrote:
| i think "disaster" is hyperbole. my brain thinks in concert
| with sql so it feels natural, though i will concede that
| the parsing/syntax order differences was an odd choice.
|
| programmers and language users all have a mental
| translation layers that they use to go from
| conceptual->concrete using the syntax and idioms of their
| language of choice.
|
| javascript doesn't make sense in a lot of ways. calling it
| a disaster is hyperbole. go doesn't make sense in a lot of
| ways. it's not a disaster. unfamiliarity or disagreement
| with choices does not necessarily mean it's a disaster
|
| the only language with which i'm familiar that is a
| complete, unmitigated disaster is the english language, but
| i suppose that's outside of the scope of this conversation
| runlaszlorun wrote:
| > the only language with which i'm familiar that is a
| complete, unmitigated disaster is the english language
|
| Out of curiosity, are you a native speaker of english or
| did you learn it as a second language?
|
| I'm a native speaker and feel sorry for anyone who has to
| learn all its irregularities, etc. But the few times I've
| mentioned this to anyone who had to learn as a second
| language, they'd correct me and tell it was easy. Which
| baffled me.
|
| Obviously as a native speaker, I'm in no place to judge
| it.
| Mutjake wrote:
| I sort of feel the same...but on the other hand if you
| consider "delete from" exists also, it's not completely
| unsensible to consider you first tell what operation you're
| about to perform to the data. Would be nice to start with the
| source entity name for sure. Dunno what "select 1" would look
| like, I guess the from foo would be optional.
|
| Random saturday ramblings, sorry about that :-D
| williamdclt wrote:
| DELETE FROM is even worse. Accidentally/mindlessly press
| cmd+enter before you wrote the WHERE? Poof, data gone. Make
| it FROM ... DELETE!
|
| I also wish we needed to explicitly take locks. In
| PostgreSQL at least (I think other dialects/engines too),
| figuring out what operation will take what lock and
| potentially mean downtime is left as an exercise to the
| reader. Force me to write WITH EXCLUSIVE TABLE LOCK when I
| add a non nullable column with a default!
| Mutjake wrote:
| Well, to be fair "from foo delete" would do the same I
| suppose :-D Unless there'd be an explicit end to the
| statement to designate you really want to delete
| everything. Which might not be a bad idea. Or make "where
| ..." mandatory and bring in "delete_all" or "delete
| everything from foo" as a syntactic guardrail. This is
| equally implementable, whichever the order of "delete"
| and "from" would be.
| Izkata wrote:
| The MySQL client has an --i-am-a-dummy flag that won't
| let you run a DELETE (or UPDATE, I think) without a
| WHERE.
| ndriscoll wrote:
| If you're going to run commands that modify data directly
| on the cli, do it in a transaction so you can roll back.
| Also, start with `--` to make it a comment. Once you have
| the correct query and someone's checked your work, go
| back to the beginning of the line and remove the `--` so
| you can run it. It's also a good idea to write a select *
| or count first to check that things make sense, and then
| start your transaction, go up and modify your select into
| a delete/update, check affected rows looks good again,
| maybe do another select or two to check that data looks
| like you expect, commit.
| Izkata wrote:
| Fun fact: if you still have a MySQL (MyISAM) database,
| the transaction commands return success but don't
| actually function, so you can't roll back.
| antisthenes wrote:
| If you don't SELECT first before running the DELETE
| query, you shouldn't be anywhere near an IT job, let
| alone a production database.
| rileymat2 wrote:
| I like it starting with the desired output columns, often
| when I work with a db quite a bit and have common naming
| conventions that's all I need to know.
|
| Even writing queries I think in terms of outputs then build
| the query from there.
|
| I am blanking about the advantages of starting with from,
| people clearly share your view, but I have no idea why.
|
| Maybe it comes from my C++ background where return type comes
| first, then function name, then inputs.
| arzig wrote:
| Ability to tab complete columns in an interactive settings.
|
| If I'm in psql I can tab complete columns in the where
| clause but not in the select because I haven't actually
| given any information about what I'm selecting from yet.
| rileymat2 wrote:
| This feels like a tradeoff we make where a convenience
| while writing once harms long term readability?
|
| But that is a valid usecase!
| duped wrote:
| Imagine if methods on your C++ classes had to be called
| like method(object) instead of object.method(). While it's
| a meaningless syntactic difference, the latter makes
| autocomplete in an IDE possible.
| rileymat2 wrote:
| Sure, in that very small example, if you don't care
| internal about visibility.
|
| But method(object1, object2, object3, object4) is
| different than object.method().
| exabrial wrote:
| Completely agree with everything you said
|
| Reminds me of reading music actually... many first time
| noob's, mostly out of frustration, realize how sub optimal
| staff notation is and seek to write something better. In the
| end it's 'Good enough' and accept it's flaws because no
| rewrite has proven to be a 10x better solution.
| arrowsmith wrote:
| ... unless you're playing guitar, in which case tab
| notation works way better ;)
| dizhn wrote:
| More like Tab + already knowing the song.:)
| iamwil wrote:
| SQL seems to be the QWERTY of the data query world. It's
| far too entrenched.
| rileymat2 wrote:
| It just occurred to me, do you have wide denormalized table
| queries where tables represent entities more than relations?
|
| I think if so that might explain the difference in query
| structure.
| sherburt3 wrote:
| Yeah I maintain several BI reports and the codebase is
| about 20 years old, so multiple teams have taken creative
| liberties with the naming conventions for tables. Without
| intellisense I'd probably have gone insane by now.
| pkkm wrote:
| Agreed, I wish that SQL was more concise and organized more
| left-to-right, like shell pipelines or the F# pipe operator.
| But what are the alternatives? The only one I know is PRQL, and
| that doesn't seem to be getting popular (correct me if I'm
| wrong).
| AdrianB1 wrote:
| It is organized, especially if you write it correctly (split
| lines and indent). Minified, it is a nightmare, but with a
| proper layout it is a pleasure to read it. I just did a code
| review for a 20,000 LoC piece and it was quite a breeze after
| I improved the layout a bit.
| pkkm wrote:
| "Disaster" may be an exaggeration, but I think the syntax
| could be better, proper layout or not. In particular, I
| find that complex SQL queries often need to be read in a
| "spiral" order due to SQL's attempt to resemble natural
| language. I find PRQL's consistent left-to-right/top-to-
| bottom reading order a clear improvement.
| __mharrison__ wrote:
| Pandas and now Polars can replace a lot of SQL. I find that
| many of my students find these easier to build up and debug.
| listenallyall wrote:
| I can understand that the statement order can be awkward,
| such as choosing what to select before the source.
|
| But help me with "more concise". SQL has no boilerplate, its
| keywords are brief, you express the exact logic of the result
| set you're seeking. There are tons of SQL overlays or
| replacements embedded in programming languages or
| BI/reporting tools, and they are universally, more difficult
| to work with than straight SQL.
| temporarely wrote:
| There is a paper at the end of this link trail and that
| paper discusses everything that is wrong with SQL including
| concision.
|
| https://news.ycombinator.com/item?id=39798528
|
| et voila:
| https://www.cidrdb.org/cidr2024/papers/p48-neumann.pdf
|
| see their comments on WITHIN GROUP and redundant
| declaration of identifiers. As for boiler plate, SQL's turf
| war over plain English words as reserved names creates this
| interesting situation of boiler plate identifiers in SQL to
| disambiguate from reserved keywords.
| anon84873628 wrote:
| You may like: https://www.malloydata.dev/
| Scene_Cast2 wrote:
| My take is that SQL is a bit like C - very nice to know, but
| move off of it to a more "managed" environment for safety and
| sanity.
| williamdclt wrote:
| Definitely disagree on this one. ORMs are a leaky
| abstraction, they hide the wall you're going to bump into
| anyway, and they bring their own gotchas on top
| Scene_Cast2 wrote:
| I worked with two frameworks so far - SQLalchemy core and a
| much bigger "orchestration" one (dagster / airflow
| equivalent). SQLalchemy got me type checking, column name
| checking, and DB migrations versioning with Alembic.
|
| The bigger framework got me more static analysis and
| dependency graphs on top of that. Those features saved me
| lots of time and headaches (type checking, dependency
| graphs, etc). Mind you, the frameworks I worked with still
| allowed for raw SQL for those edge cases that still pop up.
| fudged71 wrote:
| https://en.wikipedia.org/wiki/Betteridge%27s_law_of_headline...
| jameshart wrote:
| Syntactic sugar, in the sense that it is a syntax that is bad
| for you.
| KingOfCoders wrote:
| Everything is just syntactic sugar for something else. I'm
| syntactic sugar for the hydrogen atoms in my body.
| WWWWH wrote:
| And on some cases sugar sugar not syntactic sugar.
| lkey wrote:
| Your comment is both needlessly dismissive, and worse,
| incorrect. You cannot get a human being, much less the person
| that is you, by recursively applying a constant set of rewrite
| rules on unstructured hydrogen atoms.
|
| If you think to counter my assertion with "The standard model
| of particle physics and the big bang already did that, I'm here
| after all", then spare us both the trouble and don't reply. The
| particular arrangement of all known matter _and energy_ in the
| universe at t=0 is not a repeatable initial condition.
|
| Some rewriting systems are in fact Turing complete[1], and
| that's an interesting digression. However, it's far afield from
| the article's discussion of untangling the syntactic mess that
| is the SQL standard and bringing it closer in line with the
| standard expression of its _semantics_.
|
| [1]:
| https://www.sciencedirect.com/science/article/pii/0304397592...
| halayli wrote:
| Relational algebra IR is implemented in MonetDB and discussed in
| their paper. Definitely worth reading.
|
| Not trying to be picky but pure relational algebra doesn't map to
| SQL and IMO it's not a good idea to attempt to do that due to the
| fact that relational algebra treats tuples as mathematical sets
| (ordering/uniqueness matters) while SQL does not(and has to deal
| with nullability).
| infogulch wrote:
| SQL is pretty good all things considered.
|
| But I've always looked out for languages that can represent
| relational algebra concepts more directly. Maybe CozoaDB is
| close, though still immature. Any recommendations?
| markisus wrote:
| I've been using Pandas which exposes a python slicing syntax
| for manipulating relational data. It also has a builtin join()
| function.
|
| "select id, date from orders" is orders["id", "date"].
|
| It's meant for in-memory datasets but the syntax could be
| extended to work for other backends. I'm not sure if anyone is
| working on that.
| __mharrison__ wrote:
| Ibis takes the notion of a dataframe and abstracts it from
| SQL backends.
| samatman wrote:
| I highly recommend the Third Manifesto. I could link this under
| most posts in this thread but I'll limit myself to two.
|
| https://www.dcs.warwick.ac.uk/~hugh/TTM/DTATRM.pdf
|
| The only problem there is that you might want to _use_ a D
| language, and well. You can 't. There was a product called
| Dataphor which one can find some writeups on but, baffling
| though I find this, there are no robust open-source relational
| databases which use a D language.
| triska wrote:
| Codd's seminal paper, _A Relational Model of Data for Large
| Shared Data Banks_ , states that a language based on applied
| _predicate calculus_ "would provide a yard-stick of linguistic
| power for all other proposed data languages". Quoting from
| https://www.seas.upenn.edu/~zives/03f/cis550/codd.pdf:
|
| " _1.5 Some linguistic aspects
|
| The adoption of a relational model of data, as described above,
| permits the development of a universal data sub-language based on
| an applied predicate calculus. A first-order predicate calculus
| suffices if the collection of relations is in normal form. Such a
| language would provide a yard-stick of linguistic power for all
| other proposed data languages, and would itself be a strong
| candidate for embedding (with appropriate syntactic modification)
| in a variety of host languages (programming, command- or problem-
| oriented)._"
|
| Languages based on predicate calculus indeed seem extremely
| suitable for reasoning about relational data. Datalog is a well-
| known example. It is more directly based on predicate logic, and
| much simpler than SQL.
| refset wrote:
| > Lest you think is just one weird corner of the sql spec, I
| found this helpful diagram explaining how the scoping rules work
| (from Neumann and Leis, 2023)
|
| It's an excellent diagram, it really conveys the dissonance.
| Incidentally I interviewed Viktor Leis on a podcast last week
| about the paper where it's from: https://juxt.pro/blog/sane-
| query-languages-podcast/
|
| A lot of people seem to believe that LLMs or other ML methods can
| overcome the complexity challenges of generating SQL accurately,
| but I'm yet to be convinced that a database-powered AI revolution
| can happen without somehow bypassing SQL.
| joking wrote:
| A few tweaks here and there and it would be nice enough for me.
| Most of them are actually implemented by some engines but are not
| part of the standard. Just changing the order of the from and
| select clauses so autocomplete can know what fields can you use
| would be a nice enough change.
| samsquire wrote:
| Thanks for this interesting post.
|
| Intuitively, relational algebra compresses enumeration over data
| in time that a CPU executing billions of cycles a second can
| feasibly and efficiently traverse and execute against many
| collections of millions or billions of records in human
| perceivable time thanks to indexes.
|
| I've been trying to think of systems communicating with eachother
| as parts of a relational model in the sense we can model system
| behaviour as a series of events and a join is a communication
| between components.
|
| I would love to talk about this with people.
| refset wrote:
| > a join is a communication between components
|
| Makes me wonder just how far people have pushed Foreign Data
| Wrappers in practice.
| seanhunter wrote:
| Some years back I spent a weekend and built a "good enough to
| prove to myself" version of this in Haskell that implemented
| the main relational operations of projection, selection and
| cartesian product.
|
| The basic idea was based on the "stream fusion" papers. So the
| relation was a stream in the stream fusion sense so it was
| pretty trivial to implement the normal relational operators in
| that paradigm. Changing this type of system to work on "events"
| as input would be pretty trivial.
|
| The one thing I never managed to get to work was the actual
| "fusion" compiler hint. I kept trying variants of what he did
| in the paper but ghc just refused to optimise my stream/unsteam
| unstream/stream pairs away because it had already done some
| rewriting to them. I couldn't figure out how to apply the
| optimisation early enough to be effective.
|
| [1] Which are a fantastic read if you're into CS whatever you
| think of my idea
| https://www.cs.tufts.edu/~nr/cs257/archive/duncan-coutts/str...
| and
| https://www.researchgate.net/publication/220802863_Rewriting...
| samsquire wrote:
| Thank you for sharing your thoughts and ideas Sean,
| appreciated.
|
| I am unfamiliar with the source material and I have recorded
| the paper you linked to my reading list.
|
| From a description of "stream fusion", it reminds me of
| Clojure's "transducers".
|
| It also reminds me of Kafka's Table/Stream duality.
|
| Term rewriting is something is really interesting to me.
|
| Communication, protocols and Communicating sequential
| processes, session types are all ideas I am thinking about
| and trying to understand.
| ndriscoll wrote:
| When working on a system that was overly split into
| microservices, I once wrote a leftJoin stream operation for
| Akka, partly to demonstrate how stupid it was that we had split
| our database apart and now needed to implement a database in
| our application using http calls, with 1000x worse performance.
| bkanuka wrote:
| As someone who learned mathematics first and programming later, I
| think it took me about 10 years of working in data-intensive
| programming before I could write really "good" SQL from scratch.
|
| I completely attribute this to SQL being difficult or "backwards"
| to parse. I mean backwards in the way that in SQL you start with
| what you want first (the SELECT) rather than what you have and
| widdling it down. Also in SQL (as the author states) you often
| need to read and understand the structure of the database before
| you can be 100% sure what the query is doing. SQL is very
| difficult to parse into a consistent symbolic language.
|
| The turning point for me was to just accept SQL for what it is.
| It feels overly flexible in some areas (and then comparatively
| ridgid in other areas), but instead of fighting against this or
| trying to understand it as a consistent, precise language , I
| instead just go "oh SQL - you are not like the other programming
| languages I use but you can do some pretty neat stuff so we can
| be on good terms".
|
| Writing good SQL involves understanding the database,
| understanding exactly the end result you want, and only then
| constructing the subqueries or building blocks you need to get to
| your result. (then followed by some trial and error of course)
| mrits wrote:
| I always thought writing SQL from scratch was the easy part.
| The hard part for me was coming back to my query a few weeks
| later
| arrowsmith wrote:
| This is true for most programming languages.
| Winsaucerer wrote:
| That's why I try (but sometimes forget) to extensively
| comment my queries that have any kind of complexity :)
| mdcurran wrote:
| This doesn't totally solve the issue of SELECT'ing first then
| filtering, but for complex queries I've found CTEs very useful
| (whenever the database/SQL dialect supports it).
| icedchai wrote:
| What I usually do is start with "select *", get the joins and
| where clause down, then refine the select.
| thaumasiotes wrote:
| > widdling it down
|
| Whittling. It means to carve something out of wood, with a
| metaphorical extension, as here, to gradually making something
| smaller by shaving bits of it away.
| Strang wrote:
| Important distinction. "Widdling" is urination.
| whatever1 wrote:
| The trial and error is the worst part.
|
| In traditional languages, you can print iteration by iteration
| the intermediate result and understand if there is something
| wrong.
|
| In SQL you sample output, and you keep changing the query until
| you think you get it right. And then 2 years later someone else
| finds that the query was wrong all this time.
| fifilura wrote:
| > you can print iteration by iteration the intermediate
| result
|
| You would not be able to do that with a multi-threaded/multi-
| process application.
|
| And this is the reason why e.g. Trino/Presto is so powerful
| together with SQL.
|
| Instead of telling the computer how to go by to get your
| result, you tell it what result you want and let it do it in
| the best way.
|
| The most up-front way of telling a computer "how" is a for-
| loop. And SQL does not have it. It may seem limiting, but
| avoiding explicit for loops gives the freedom to the
| computer. If it sees it fit to distribute that calculation
| over 200 distributed CPUs it can do that. With an imperative
| language you need to tell the computer exactly how it should
| distribute it. And from there it gets really hairy.
| whatever1 wrote:
| In development I don't need it to be multi-threaded. 1
| thread is fine, as long as I can explain, step-by-step, how
| the calculations produced the output.
| fifilura wrote:
| If you don't need threads in development OR production,
| you might as well do SELECT * from users and do the join
| in your imperative code.
|
| If you need threads in production I think you will end up
| getting rid of your for loops anyway (or possibly, if you
| really want to, end up in a mutex/semaphore quagmire).
|
| I must say, though, that there are other benefits with a
| declarative approach than just avoiding threading issues.
| But I guess it takes some getting used to.
|
| I would say that the same "I cant step through my code"
| argument also goes for functional style code.
| Ma8ee wrote:
| > If you don't need threads in development OR production,
| you might as well do SELECT * from users and do the join
| in your imperative code.
|
| Except that it most likely will be orders of magnitude
| slower. Most databases are very good at what they are
| doing.
| fifilura wrote:
| Yes. Kind of my point to. But the OP missed the
| possibility to step through the code.
| jalk wrote:
| Common Table Expressions (CTE) do help a little, as you can
| query each "table" and inspect the output. Debugging a giant
| query with deeply nested sub queries is very painful indeed
| Ma8ee wrote:
| So do table variables and temp tables.
| r00fus wrote:
| I mean, I never build a query from front to back. Usually I
| build it FROM -> JOIN -> WHERE -> SELECT.
| mycall wrote:
| Start off with SELECT * then once the joins are working,
| filter * down to the essentials.
| remus wrote:
| > The trial and error is the worst part.
|
| I don't know about anyone else, but I do this kinda naturally
| when writing SQL queries. Usually start with a base table,
| query the first 100 rows to see what the data looks like,
| start joining on other tables to get info I need, querying as
| I go to check join conditions, perhaps build out some CTEs if
| I need to do some more complex work, query those to check the
| format of the data ... And so on.
|
| It doesn't feel that different to any other programming in
| that sense. Querying is printing.
| Ma8ee wrote:
| Trial and error is usually a bad idea in all kinds of
| programming.
| 392 wrote:
| You may like PRQL, which gives a more composable-atoms based
| approach. I find it far easier than SQL.
| Winsaucerer wrote:
| I feel like a foreigner in another land when I read your
| comment and others like it. For as long as I can remember using
| SQL, I can't remember ever finding it more difficult or
| backwards than anything else I use.
|
| That difference might go some way towards explaining why I
| prefer a much more database heavy/thick approach to writing
| apps than my peers.
| htag wrote:
| I learned SQL before I learned set theory. While learning set
| theory I remember thinking "oh this notation is just SQL
| backwards." Afterwards I began to find SQL much harder
| because I realized there are so many ways to mathematically
| ask for the same data, but SQL servers will computationally
| arrive at the end differently and with very different
| performance. This is a minor deal if you're just doing small
| transactions on the database, because if you are dealing with
| pages of 100 objects it's trivial to hit good-enough
| performance benchmarks, even with a few joins.
|
| I was first introduced to the issue of needing hyper
| optimized SQL in ETL type tasks, dealing with very large
| relational databases. The company switched to non-relational
| database shortly after I left, and it was the first time I
| professional witness someone make the switch and agreed that
| it was obviously required for them. We were dealing with very
| large batch operations every night, and our fortune 500
| customers expected to have the newest data and to be able to
| do Business Intelligence operations on the data every
| morning. After acquiring bigger and bigger customers, and
| collecting longer and longer histories of data, our DBA team
| had exhausted every trick to get maximum performance from
| SQL. I was writing BI sql scripts against this large pool of
| SQL data to white-glove some high value customers, and
| constantly had to ask people for help optimizing the sql. I
| did this for a year at the beginning of my career, before
| deciding to move cities for better opportunities.
|
| Lately, I've began seeing the requirements of high
| performance SQL again with the wave of microservice
| architectures. The internal dependency chain, even of what
| would have been a mid size monolith project a decade ago, can
| be huge. If your upstream sets a KBI of a response time, it's
| likely you'll get asked to reduce your response time if your
| microservice takes up more than a few percentage points of
| the total end to end time. Often, if you are using relational
| SQL with an ORM you can find performance increases in your
| slowest queries by hand writing the SQL. Many ORMs have a
| really good library for generating sql queries they expose to
| users, but almost all ORMs will allow you to write a direct
| sql query or call a stored procedure. The trick to getting
| performance gains is to capture the SQL your ORM is
| generating and show it to the best sql expert that will agree
| to help you. If they can write better SQL than the ORM
| generated than incorporate it into your app and have the SQL
| expert and a security expert on the PR. You might also need
| to do a SQL migration to modify indexes.
|
| So in summary, I think your experiences with SQL depends
| heavily on your mathematical background and your professional
| experience. It's important to look at SQL as computational
| steps to reach your required data and not simply as a way to
| describe the data you would like the SQL server to give you.
| fifilura wrote:
| Was this before BigQuery/Presto/Trino? To me it seems like
| those technologies would have been a good fit.
|
| They don't really work with indexes but instead regular
| files stored in partitions (where date is typically one of
| them).
|
| This means that they only have to worry about the data
| (e.g. dates) that you are actually querying. And they scale
| up to the number of CPUs that particular calculation needs.
| They rarely choke on big query sizes. And big tables are
| not really an issue as long as you query only the
| partitions you need.
| htag wrote:
| Those technologies were brand new at the time, the
| discussions about the problem started in 2013. The
| company (I had zero input) choose a more established
| vendor with an older product. Given the time and
| institutional customers that were trusting us with their
| data, I suspect any cloud based offerings were a
| nonstarter, and open source felt like a liability.
|
| Of course with 20/20 hindsight that decision is easy to
| criticize. I suspect their primary concerns were to
| minimize risk and costs while meeting our customer's
| requirements. Even today, making a brand new Google
| product or Facebook backed open source project a hard
| dependency would be too much risk for an established
| business.
| naasking wrote:
| > I feel like a foreigner in another land when I read your
| comment and others like it. For as long as I can remember
| using SQL, I can't remember ever finding it more difficult or
| backwards than anything else I use
|
| Learn linq or query/list comprehensions and then you'll
| easily see why SQL is backwards.
| Izkata wrote:
| I've been using Django almost as long as I've been using
| SQL and I prefer the SQL ordering more: it matches the rest
| of the code, making it faster/easier to read. As a crude
| example: SELECT results FROM source WHERE
| criteria results = source(criteria)
|
| It's rare to see someone want to change assignments in code
| to be like: source(criteria) -> results
|
| Where I see it as the same thing: the SELECT columns, like
| the variable assignment, are the interaction point with the
| following lines of code.
|
| And yes, CTE ordering does annoy me because of this.
| Putting it in the middle is pretty much the worst order.
| naasking wrote:
| > Where I see it as the same thing: the SELECT columns,
| like the variable assignment, are the interaction point
| with the following lines of code.
|
| Indeed, which is why source(criteria) -> results makes
| more sense: the results definition is right next to the
| code that's going to be using that definition. If you put
| the results definition first as with SQL, then you have
| to scroll up to find the context (although perhaps
| Python's indentation sensitivity is the tripping point in
| this case). Not even mentioning the fact that the SQL way
| completely destroys any chance of code completion.
|
| I'm going to boldly state that the SQL way is literally
| _objectively_ wrong, in that there is no world in which
| SQL 's choice is superior for general querying.
| Izkata wrote:
| Then why are you advocating for it?
|
| > or query/list comprehensions
|
| List comprehensions _are_ column first.
| sodapopcan wrote:
| I agree. I never even thought about "select what you want
| first" as a problem until someone else pointed out.
|
| Programmers seem far too sensitive about wanting everything
| to work one way. SQL is a very powerful DSL. It has its
| quirks but nothing that ever enraged me. I don't really care
| that it doesn't work like some other stuff I use, I just
| accept that I'm learning the language of a particular domain.
| This doesn't mean that I don't think there is always room for
| improvement. Of course I think FROM first would be a little
| nicer, but so much nicer that I think its worth changing a
| whole battle-tested standard? Not at all. The pain is so
| minimal I don't even feel it.
| refset wrote:
| The actual ISO standard falls well short of being
| useful/sufficient to anyone who isn't an incumbent player.
| It's effectively a moat and therefore a direct impediment
| to competition from teams who have novel technical ideas
| but don't have access to significant capital - building a
| SQL implementation is a long, expensive journey. This is
| why many startups resort to building Postgres extensions,
| or using Calcite or DataFusion.
|
| If SQL weren't so (needlessly) complex we would see much
| more competition across the database space.
| mycall wrote:
| LINQ runs with FROM being first. Definitely trivial
| difference but a bit easier.
| dmead wrote:
| Saying what you want first rather than what you have is
| evidence of the von Neumann bottleneck or it was a sign of the
| times when SQL was being developed on 1970s machine.
|
| Either way, point taken that it is not like a proof.
| ako wrote:
| Covey's: "start with the end in mind" is not a bad advise
| when building something complex. With procedural languages
| you do the same, you first write the signature, parameters
| expected to go in and out, and then you start writing the way
| to achieve this.
| zvmaz wrote:
| I tried to study C. J. Date's books to understand relational
| theory... suffice it to say that I got nothing from his books,
| except a deep irritation partly due to his absolute pedantry...
|
| I finally learned SQL with a gentle introduction by Alan
| Beaulieu. I stumbled upon another book that's about the theory:
| Applied Mathematics for Database Professionals, by Lex deHaan,
| and Toon Koppelaars. Maybe these authors will benevolently teach
| me relational theory.
|
| But please avoid C. J. Date's books. And don't be him when
| writing a book or trying to explain something to another human
| being.
| exabrial wrote:
| I'd much rather deal with the peculiarities of SQL than any of
| the attempted replacements (ones I've seen in my minted
| experience). Elastic for instance, other json based languages,
| are absolutely terrible. We lost something we when stopped
| writing ANSI standards.
|
| We've even stayed on InfluxDB og versions _because of _ the SQL
| like syntax, and also their improved languages are a nuclear
| disaster area.
|
| SQL, despite its flaws (null != null) is pretty good enough!
| keid wrote:
| See C.J. Date's "An Introduction to Database Systems,"
| https://www.amazon.com/Introduction-Database-Systems-8th/dp/...
| This is not news.
| scythmic_waves wrote:
| This is a great write up. There appear to be a few camps forming
| in the comments and I'm in camp "SQL is confusing and attempts to
| explain it in terms of relational algebra have felt inadequate to
| me".
|
| It also gives me some good follow up material to read. I'm
| particularly interested in that one link that forms subqueries
| and lateral joins in terms of a new "dependent join" operator.
| zer00eyz wrote:
| Go read: Database Design for Mere Mortals.
|
| ERD's are your friend. Learn how to generate one, and how to
| read it.
|
| The relations (not relational, not algebra) are IN the design
| they are IN the ERD (as a tool to visualize). Even if your not
| visual thinker the ERD might help you find a path between two
| distant tables.
|
| Needing a subquery is rare. It happens but a lot of subqueries
| would be better off as joins. The moment you grasp the design
| of something you're less likely to want to sub query.
|
| Explain is your friend. Reading an explain plan is going to
| give you some good insight into what is going on UNDER the
| hood. Not only will it help you tune slow queries but it is
| more insight into how large queries decompose.
|
| Lastly, there is nothing worse than having to query a badly
| designed DB. If you do a shit job on the first part everything
| else is going to be painful.
| achr2 wrote:
| You should look at LINQ in C#/.net . The SQL-like syntax always
| has a function-first equivalent, that gets across this point
| fairly eloquently.
| qazxcvbnm wrote:
| As someone who has implemented a composable SQL generator from
| user-defined algebras of (arbitrary SQL) queries using relational
| algebra, I understand the shortcomings of SQL when viewed from an
| angle of a _relational_ _query_ language.
|
| However, SQL is a language with many facets (DML, DDL, DCL) other
| than 'relational' querying. Putting on a less mathematical and
| more engineering mindset, SQL ingratiates me by its interface to
| incredibly powerful primitives difficult to find anywhere else.
| (I've primarily worked with Postgres SQL)
|
| Consider the humble function; in SQL
| https://www.postgresql.org/docs/current/sql-createfunction.h...,
| one can declare the function as `stable` or `immutable` to let
| the runtime optimise repeated calls; as `parallel` to let the
| runtime consider parallelisation, as `cost ...` and `rows ...` to
| aid optimiser cost estimation. Imagine if one could do that in C
| or Javascript!
|
| Another facet which regularly puts me in awe is the transaction
| isolation primitives and locking primitives offered by SQL.
|
| I understand that as a database language, SQL necessarily has
| these within its specialised niche, but it seems to me these
| aspects of SQL as an interface to a language runtime would be
| equally useful in the everyday program; in all these areas of
| functionality, SQL is so much more advanced than nearly every
| other general purpose programming language.
| barfbagginus wrote:
| Can we call it syntactic ashtray? Because it feels like I'm
| sucking on 1970s ashtray when I see or use it.
|
| Those who have read their Spivak 2017 will know that databases
| are just Co-presheaves of Ologs over the Kliesli Category of the
| Power-Set Monad, the Identity Monad, or the Giry Monad. I would
| like a QL that acts like it!
| breezeTrowel wrote:
| I know some of these words.
| lkey wrote:
| Snark of grandparent aside: https://arxiv.org/abs/1102.1889
| if you want to read more.
| xbar wrote:
| Discussions of what is/is not syntactic sugar are unapproachable
| for me because I cannot get past the abuse of sugar's essential
| functions in the tortured metaphor.
| jameshart wrote:
| Not totally convinced by the ORDER BY obstacles that the author
| raises..
| table('test').project('a').orderBy('b')
|
| > That's an error, because we can't order by a column that we
| just projected away. Right?
|
| assumes that 'projection' completely eliminates part of the
| underlying relation, but why does that have to be the case?
|
| If a relation includes 'selected fields' and 'hidden fields', and
| project just 'hides' the fields it doesn't project, while orderBy
| can operate on either projected or hidden fields, this ends up
| being perfectly sound.
|
| Even the more complex example which is translated as follows:
| translate('select a+1 as c from test order by b,c') =>
| table('test').project('a','b').addColumn('a+1',
| as='c').orderBy('b','c').project('a')
|
| would work fine as: table('test')
| // selected: [a, b, ...], hidden: [] .addColumn('a+1',
| as='c') // selected: [a, b, c, ...], hidden: []
| .project('c') // selected: [c], hidden: [a, b, ...]
| .orderBy('b','c') // selected: [c], hidden: [a, b, ...]
|
| (not sure why there's a .project('a') on the end of their
| version)
|
| Which is a reasonably local, algebraic transformation.
| samatman wrote:
| If you're interested in what it would take to put relational
| databases back on the rigorous footing of relational algebras,
| the Third Manifesto is a good place to start.
|
| https://www.dcs.warwick.ac.uk/~hugh/TTM/DTATRM.pdf
|
| I find it somewhat sad that an implementation of a database with
| a proper D language hasn't broken out and become a ubiquitous
| tool for the profession. There were some proprietary versions
| shortly after the manifesto's publication, but it never caught
| on.
| aoeusnth1 wrote:
| I find that most people who object to SQL do not use TVFs. If you
| don't have any tools to easily break down the steps of the work,
| of course SQL will feel like an opaque Write-only language. With
| TVFs you can easily iteratively add more complex steps to your
| query while checking your work while you build.
| r00fus wrote:
| That diagram separating the syntactic vs. semantic layers of a
| SQL statement (from Neumann & Leis paper) is brilliant.
| chubot wrote:
| An analogy I like is - _Are Perl-style regexes (used in Python,
| Ruby, Java, .NET, etc.) syntactic sugar for regular languages?_
|
| The answer is no, because Perl added all sorts of imperative
| doodads to regexes, which can't be easily represented and
| executed in the automata-based paradigm. Trying to do this is
| like a "research paper generator" (and not in a bad way), e.g.
|
| _Derivative Based Nonbacktracking Real-World Regex Matching with
| Backtracking Semantics_ -
| https://dl.acm.org/doi/abs/10.1145/3591262 (2023)
|
| This is until Go and Rust, which used automata-based regexes from
| the beginning. I don't think users have lost much.
|
| Purely automata-based engines are kind of pleasant to write,
| because almost everything is in the compiler, and not in the
| runtime, e.g. https://github.com/andychu/rsc-
| regexp/blob/master/py/README....
|
| That is, features like ? + * really are syntactic sugar for
| repetition. There's also a lot of syntax sugar around character
| classes like [^a], and the runtime is very small.
|
| ---
|
| Likewise, SQL seems to have so many non-relational doodads in its
| language design, which cause problems for implementers. In this
| case, I think there's an incentive problem with SQL: It benefits
| vendors if their dialect is harder to re-implement. Although
| certainly they've added many useful features too in 4-5 decades!
|
| To me a language design issue is we never really "learned" to
| compose languages with different paradigms:
|
| - the set-based paradigms like relational algebra and regular
| languages, with
|
| - Turing-machine like code. (and also I/O!)
|
| We never learned polyglot programming, so each language becomes
| its own source of "reckless growth" - its own parochial
| backwater.
|
| Both regexes and SQL should be able to "escape" to normal code,
| and that would greatly simplify them. This can be done both by
| language implementers and by application programmers, i.e.
| "factoring" across languages. It's not always obvious how to do
| this, but it certainly it can be done more than we do it today.
|
| ---
|
| I'd argue the same phenomenon - lack of language composition -
| leads to programming languages within YAML. Github Actions is
| nominally some kind of "declarative" scheduler specification, or
| graph (job -> job dependencies), but that's not enough for many
| problems.
|
| So it also has a bunch of doodads for escaping that model (to the
| extent it has a model).
|
| Shell, Awk, and Make also grew many doodads
| (https://www.oilshell.org/blog/2016/11/14.html), which are not
| very well designed. They used to be declarative languages, but no
| longer are.
|
| Although there is some distinction between "formerly set-based
| languages" like SQL and regex, and other "declarative" non-
| Turing-complete languages. But I think the language composition
| problem is approximately the same. Part of it is syntax, but a
| lot of it is semantics.
|
| (copy of lobste.rs comment)
___________________________________________________________________
(page generated 2024-03-23 23:01 UTC)