hngopher.com

       [HN Gopher] SQL language proposal: JOIN FOREIGN
       ___________________________________________________________________
        
       SQL language proposal: JOIN FOREIGN
        
       Author : JoelJacobson
       Score  : 142 points
       Date   : 2021-12-30 17:54 UTC (5 hours ago)
        
 (HTM) web link (gist.github.com)
 (TXT) w3m dump (gist.github.com)
        
       | jeff-davis wrote:
       | I'm not sure this is a good idea. It means that a constraint on
       | the data (a FK is a constraint) affects the semantics of the
       | query. In SQL, those two concepts are seperate.
        
       | unbanned wrote:
       | >This improved in SQL-92:
       | 
       | Did it?
        
         | JoelJacobson wrote:
         | Yes, I think so, at least according to:
         | https://learnsql.com/blog/history-of-sql-standards/
         | 
         | What makes you think otherwise?
        
       | antender wrote:
       | I, personally, feel like this is a rather pointless addition to
       | an already somewhat bloated language. It it isn't saving that
       | much typing (we have NATURAL JOIN's already, nobody is using
       | them), is kind of inconsistent with other language principles (we
       | should be able to join anything to anything if we need to, see
       | SQL-89) and limits JOIN flexibility (you don't have to join with
       | "=" operator, JOIN's supports different types of conditions and
       | are equivalent to WHERE clause)
        
       | thom wrote:
       | Of all the myriad indignities of SQL, this isn't near the top of
       | my list. I also don't like making the names of objects like
       | foreign keys and indexes first class concerns in your queries,
       | that's a whole new layer of cognitive overhead.
        
         | degenerate wrote:
         | Agree. I'd much prefer they worked on the way MySQL determines
         | what indexes to use on a multi-table join, so there could be
         | less emphasis on the need to ANALYZE TABLE and FORCE INDEX when
         | the DB is seemingly being dumb, but there's not enough
         | information in EXPLAIN to tell you where it's actually getting
         | tripped up. Troubleshooting full table scans on large sets of
         | data is a nightmare.
        
           | Tostino wrote:
           | Slightly different group of people working on MySQL vs this
           | proposal.
        
       | torgard wrote:
       | I dig it!
       | 
       | I prefer defining tables like this:                   CREATE
       | TABLE category (             id int GENERATED ALWAYS AS IDENTITY,
       | name text         );                  CREATE TABLE post (
       | id int GENERATED ALWAYS AS IDENTITY,             category_id int
       | REFERENCES category (id)                     ON DELETE CASCADE
       | );
       | 
       | That is, category.id rather than category.category_id. But the
       | USING clause doesn't work with that style, as far as I
       | understand.
       | 
       | This would make my queries nicer.
        
         | JoelJacobson wrote:
         | Much cleaner, I agree, that's a big win. Will add that to the
         | list of benefits.
         | 
         | Will also add a "Drawbacks / Remaining issues" section to the
         | Gist, from all the valuable comments so far in this thread,
         | thank you all, positive as well as negative comments, all very
         | helpful.
        
         | thom wrote:
         | I've been quite happy having fully prefixed column names for a
         | long time now. Makes joins easier, big views clearer, and
         | random exports more readable out of the box. Also in my case
         | it's also easier to line up unique column names with things
         | like Clojure specs but I accept that's a niche concern.
        
       | unbanned wrote:
       | Foreign keys exist for data integrity... something which is
       | rather opaque to query onterface... So why do you believe that
       | has place in a query?
        
         | JoelJacobson wrote:
         | Foreign keys both define relationships between tables, and also
         | enforce referential integrity. The discussion is about how we
         | could potentially mine various additional value from foreign
         | keys as an information resource, rather than just mostly being
         | about referential integrity.
        
       | hn_throwaway_99 wrote:
       | This is my "Thanks, I hate it" response.
       | 
       | Reason being if you use the example they gave:
       | 
       | SELECT f.title, f.did, d.name, f.date_prod, f.kind FROM films f
       | JOIN FOREIGN f.films_did_fkey d
       | 
       | You need to _implicitly_ know the table that films_did_fkey
       | points to, because  'd' is just a table alias. I can't think of
       | anywhere else in the SQL standard where you can introduce a table
       | alias without explicitly referencing the table. In my opinion
       | making code essentially unreadable unless you have other
       | background information is an antipattern.
        
         | JoelJacobson wrote:
         | I think your concern is addressed by the idea further down in
         | proposal; giving the foreign keys the same names as the
         | referenced tables. This example was provided:
         | 
         | SELECT f.title, f.did, d.name, f.date_prod, f.kind FROM films f
         | JOIN FOREIGN f.distributors d
        
           | hn_throwaway_99 wrote:
           | No, my concern is not that you _can_ also specify the table
           | name that explicitly states the name of both tables, it 's
           | that you have proposed a syntax where it is _possible_ to
           | leave off the explicit table name.
           | 
           | If you got rid of the syntax that uses the underlying "magic"
           | of needing to know which table the foreign key points to, I'd
           | be more amenable.
        
             | JoelJacobson wrote:
             | There are two asymmetrical cases; one of them needs two
             | pieces of information, the other three.
             | 
             | Having to always specify both tables would syntax wise be
             | redundant, but I agree it's worth considering it might be a
             | good thing to improve readability, which would help
             | especially in the case where the foreign key isn't/cannot
             | be given the same name as the referenced table.
        
             | mrweasel wrote:
             | Yeah, my concern is people in the real world. If you can
             | "magically" join on a foreign key and leave the table name
             | out, or still call the thing anything you want, some smart
             | ass will abuse the live crap out of it and I'll somehow be
             | left to deal with it.
             | 
             | I like things to be explicit. Tell me what you're joining
             | and how you want to join it. What is the use case for this?
             | Other than saving some typing, and let's face it, sometimes
             | a little extra typing now, will save you a lot of trouble
             | later. The proposal claims: "The idea is to improve the SQL
             | language", but is it really better?
        
               | tapas73 wrote:
               | First, very valid concern regarding table name.
               | 
               | > Other than saving some typing,
               | 
               | I would like to point out the corectness aspect of the
               | proposal.
               | 
               | Today foreign keys are enforced during inserts, updates,
               | deletes. (you just can't violate fk. this is so good) But
               | you can violate it in selects. (e.g. mistype column name,
               | forget to include a column)
               | 
               | This proposal (or its adjustment) would allow to use fk
               | also during select. It's like static typing for join
               | conditions.
        
           | tqi wrote:
           | 1. Just because you can doesn't mean people will. 2. That
           | doesn't work if you need to have multiple keys pointing to
           | the same table (ie owner_id and secondary_owner_id pointing
           | to a users table)
        
         | mritchie712 wrote:
         | yeah, this saves like 10 characters of text in exchange for
         | massive confusion.
        
           | JoelJacobson wrote:
           | In this simple example yes, but there can be multiple columns
           | in a join.
           | 
           | The syntax proposal i O(1) syntax wise compared to O(2n) for
           | a JOIN ON where n is the number of columns.
        
             | hn_throwaway_99 wrote:
             | But that "simple example" is like 95% plus of all my joins,
             | where I'm joining two tables on a foreign key.
        
               | JoelJacobson wrote:
               | Your data model must be a lot simpler than mine, congrats
               | ;)
        
               | unbanned wrote:
        
         | mjevans wrote:
         | In this example D is the target table, FK's natural
         | relationship could be aliased (specified multiple times under
         | different names).
        
         | ridaj wrote:
         | Another thing, this makes things harder to reverse.
         | 
         | Some engines have join-order-dependent performance, so there
         | are instances where you would want to write
         | 
         | ``` FROM orders LEFT JOIN customers USING (customer_id) ```
         | 
         | and others where you'd want to write
         | 
         | ``` FROM customers RIGHT JOIN orders USING (customer_id) ```
         | 
         | Swapping join order with current syntax is relatively easy
         | since references in the same FROM clause are interchangeable.
         | But in this proposal, the reference to the joined table isn't
         | written out, so it would be pretty complicated to reverse join
         | order.
        
         | jmull wrote:
         | Agreed. Maybe something like this instead:
         | SELECT f.title, f.did, d.name, f.date_prod, f.kind
         | FROM films f           JOIN distributors d ON FOREIGN
         | f.films_did_fkey
         | 
         | and                   SELECT f.title, f.did, d.name,
         | f.date_prod, f.kind           FROM distributors d
         | JOIN films f ON FOREIGN films_did_fkey
        
           | silon42 wrote:
           | +1 ... needed at least as an option where FK is not declared
        
           | magicalhippo wrote:
           | I had a similar suggestion the other day[1], except I chose
           | what to me sounded more English-sounding:
           | JOIN films f USING FOREIGN KEY
           | 
           | Here the explicit foreign key, films_did_fkey in this case,
           | could be specified after FOREIGN KEY. This would be similar
           | in syntax to when you force an index for a select statement,
           | at least in the DB we use.
           | 
           | [1]: https://news.ycombinator.com/item?id=29687565
        
             | zeroimpl wrote:
             | Yeah I think something like this is the way to go. The
             | table name needs to be listed in a consistent manner like
             | all other joins, and I don't think the syntax should be
             | different depending on which order the tables are listed
             | in.
             | 
             | With the above proposal, it seems the foreign key name
             | could be left out in the common case of there being only
             | one fkey between the two tables too.
        
               | magicalhippo wrote:
               | > With the above proposal, it seems the foreign key name
               | could be left out in the common case of there being only
               | one fkey between the two tables too.
               | 
               | Yes, this was indeed my intention. The common case with
               | only a single matching foreign key constraint would not
               | require being explicit, but one could be if needed in a
               | natural way.                   JOIN films f USING FOREIGN
               | KEY films_did_fkey
        
       | JoelJacobson wrote:
       | This is the same idea as already posted in this thread
       | https://news.ycombinator.com/item?id=29687134, but with a more
       | in-depth explanation that couldn't fit in the comment field, and
       | with some syntax improvements, such as using "FOREIGN", which is
       | a reserved keyword, instead of the previously suggested "->"
       | notation.
       | 
       | Thanks for all the valuable comments on last proposal. Excited to
       | hear what you think about this update.
        
       | nicoburns wrote:
       | I wish the SQL standard would add some basic syntax sugar, such
       | as support for trailing commas and allowing use of aliases in
       | more places.
        
       | [deleted]
        
       | btilly wrote:
       | Please, no. A common problem in data warehousing is that a design
       | with lots of foreign keys becomes slow to load. A standard
       | solution is to move the checks for referential integrity
       | elsewhere, then drop the foreign key constraint. This massively
       | improves load performance.
       | 
       | This syntax change means that this solution can't be used because
       | you have no idea what random queries out there might rely on the
       | specific existence of a foreign key constraint for the definition
       | of the query. Thereby meaning that if a foreign key constraint
       | becomes a performance problem, we're stuck with it rather than
       | having a solution.
       | 
       | Features have consequences. And I don't like the consequences of
       | making business rules that are now explicit in the query, be
       | instead implicit in the table design.
        
         | tapas73 wrote:
         | Lets have disabled FK then.
         | 
         | postgre, seems to not have it, but the proposal could include
         | also "disabled FK" part.
         | 
         | teradata, oracle, sql server already have option for FK
         | "disable/no check".
         | 
         | https://docs.teradata.com/r/eWpPpcMoLGQcZEoyt5AjEg/df1PvVh6e...
         | https://docs.microsoft.com/en-us/sql/relational-databases/ta...
         | https://docs.oracle.com/cd/B28359_01/server.111/b28310/gener...
        
         | zzzeek wrote:
         | this was my thought too. but now you need an ALTER CONSTRAINT
         | NON ENFORCING or something like that so that the "constraints"
         | can be present declaratively but don't actually get used.
         | 
         | which then looks a whole lot like you're just introducing
         | macros into SQL where you have some symbolic keywords that
         | expand out into pre-fabricated ON clauses.
        
         | JoelJacobson wrote:
         | Good point, but addressable:
         | 
         | Simply decouple the relationship definition and referential
         | integrity check, allowing a user to drop the referential
         | integrity check if desired, but keeping the relationship
         | definition.
         | 
         | I cannot see why you would not want to at least always store
         | the information a certain table/column(s) references some other
         | table/column(s) in the data model. Enforcing referential
         | integrity is probably good in general too, but I agree you
         | might need to disable it for some FKs, in some databases, like
         | PostgreSQL before they got FOR KEY SHARE locks.
        
           | hn_throwaway_99 wrote:
           | At this point you should realize your proposal is a non-
           | starter, and I didn't even realize btilly's objection
           | originally.
           | 
           | I can't think of any other feature in SQL where the rules of
           | the query are actually dependent on something _not explicit
           | to the query itself_. Even USING and NATURAL are just
           | syntactic sugar that depends on the _structure_ of the table,
           | not on any underlying constraints.
           | 
           | So, what you have proposed, "allowing a user to drop the
           | referential integrity check if desired, but keeping the
           | relationship definition" would be a _massive_ change to tons
           | of SQL tools out there as it 's a huge new feature, for some
           | minor syntactic sugar. Ain't gonna happen.
        
             | JoelJacobson wrote:
             | > Even USING and NATURAL are just syntactic sugar that
             | depends on the structure of the table, not on any
             | underlying constraints.
             | 
             | Similar to how JOIN FOREIGN would depend on the _structure_
             | of the data model, defined by tables, foreign keys, etc.
             | 
             | > So, what you have proposed, "allowing a user to drop the
             | referential integrity check if desired, but keeping the
             | relationship definition" would be a massive change to tons
             | of SQL tools out there as it's a huge new feature, for some
             | minor syntactic sugar. Ain't gonna happen.
             | 
             | Why would it be a problem from the tools perspective if the
             | foreign key wasn't actually enforced if the DBA insists on
             | temporarily disabling the enforcement of the FK? If the
             | tool would e.g. be used to insert a row, and the DB would
             | accept it, even though it would violate the FK, what do you
             | suggest would be the problem from the tools perspective?
             | 
             | This is also not a new idea. It's already implemented in
             | MSSQL, see WITH NOCHECK.
        
               | [deleted]
        
               | hn_throwaway_99 wrote:
               | > Similar to how JOIN FOREIGN would depend on the
               | structure of the data model, defined by tables, _foreign
               | keys_ , etc.
               | 
               | The point that everyone is making is there are _not_
               | currently any SQL statements that depend on structural
               | information as defined in the foreign key relationships
               | when calculating the structure of the data. Furthermore,
               | there are already tons of tooling and processes that
               | depend on this fact, that your proposal would break, for
               | a teeny bit of less typing.
               | 
               | Beating a dead horse at this point.
        
               | tqi wrote:
               | The person you are responding to was more direct/harsh
               | than was necessary, but I think it would be good to step
               | back for a moment and reflect on the feedback from this
               | community. You said "If someone can convince me this is a
               | bad idea, that would help me forget about all of this, so
               | I would greatly appreciate your thoughts, no matter how
               | negative or positive." I think there are enough valid
               | objections here to at least consider the idea that this
               | is not a clear improvement?
        
               | JoelJacobson wrote:
               | When I wrote that comment, the idea had even more flaws
               | than currently, at that time I suggested using "WITH" and
               | a new "->" operator. Thanks to new ideas coming from
               | other users in the PostgreSQL and Hacker News community,
               | those problems have been solved, and we now have less
               | remaining problems with the proposal. I'm really grateful
               | for all the help.
               | 
               | Like I said in another reply, I will put together a
               | "Drawbacks / Remaining issues" section and update the
               | Gist, based on all replies. Perhaps the end result will
               | be Status Quo, but at least then we have documented the
               | reasons why this idea is a dead end. However, thanks to
               | all the improvements just during the last couple of days,
               | I feel really optimistic and motivated, so I think there
               | is a great chance we can solve the remaining issues
               | together if we try.
               | 
               | To comment on the response from the direct/hash person:
               | 
               | The point made by the user, "not currently any SQL
               | statements that depend on structural information as
               | defined in the foreign key relationships", is true, but I
               | don't see why that's an argument by itself against the
               | idea?
               | 
               | I find the other argument, claiming there would be a
               | problem with tooling and processes, much more interesting
               | and I'm eager to fully understand it. I asked a question
               | in hope to do so, "what do you suggest would be the
               | problem from the tools perspective", but has so far not
               | received any reply.
        
         | seadan83 wrote:
         | This, the result of a query should not change depending on a
         | constraint.
        
           | zmmmmm wrote:
           | By that do you mean, should not vary in the data returned or
           | should not break?
           | 
           | Personally I agree that changing a constraint shouldn't alter
           | the data returned. But I'm happy enough if it breaks in a
           | clear and verifiable manner. There are plenty of other
           | situations where adding a constraint will cause existing SQL
           | (if not queries) to break so its not really that much of a
           | change.
        
         | zmmmmm wrote:
         | Data warehouses have all sorts of different design approaches
         | based on their different requirements, wouldn't this just be
         | another one of them? I actually think that having a way to
         | alter the constraint to non-enforcing sounds like it'd be
         | better for your situation anyway because you can then have your
         | load performance but utilise the constraints with background /
         | delayed process that checks the integrity because the
         | information is still there in a standard form.
         | 
         | Which sort of leads to .... I don't agree with your
         | characterisation of foreign key constraints as business rules.
         | They are genuine information about the structure of the data.
        
         | cogman10 wrote:
         | Absolutely agree.
         | 
         | Further, whenever someone is adjusting table performance index
         | tweaks is almost always the first thing to tackle.
         | 
         | Adding foreign keys into the query is just as bad as adding
         | indexes into the query (which, you can do in T-SQL, but
         | generally shouldn't). Indexes can be dropped, changed, or added
         | and you SHOULD be relying on the SQL optimizer to use the most
         | appropriate index.
         | 
         | This feature appears to only save a bit of typing in the best
         | of scenarios. In the worst, an update/drop of a foreign key
         | will end up breaking a bunch of queries, which is insane.
        
           | Tostino wrote:
           | While I wish there were a way to easily refactor queries when
           | making these changes, this is not different than it works for
           | renaming/dropping a column..."In the worst, an re-name/drop
           | of a column will end up breaking a bunch of queries, which is
           | insane."
        
       | tapas73 wrote:
       | I will write a letter to santa asking to get this into sql.
       | During 10 years in data warehouse I constantly wondered why SQL
       | does not have something like that.
       | 
       | Reducing verboseness is nice, but the main perk is the
       | _correctness_.
       | 
       | Oh.. if I got a cent every time I found a bug in colleagues sql,
       | because of join accidentally multiplying/doubling rows... :-)
        
       | thanatos519 wrote:
       | I dunno. I preferred the SQL-89 syntax.
        
         | samtheprogram wrote:
         | Same here. I didn't even realize that was an option. As a
         | relative beginner to SQL, is there a real reason the JOIN
         | syntax was needed at all beyond special inner/outer joins?
        
           | Tostino wrote:
           | Sanity for your coworkers who may have to touch your SQL.
        
       | tester756 wrote:
       | While we are at it
       | 
       | let's write SQL queries starting from FROM.
       | 
       | `FROM users SELECT *`
       | 
       | It'd allow tooling to provide IntelliSense better.
        
         | miohtama wrote:
         | While this might be a joke, the world has been ripe for a
         | replacement for SQL... for the last 20 years.
        
           | hdjrudni wrote:
           | I started writing a new SQL parser that just transforms the
           | query...this was one of the features. The other being a
           | syntax for group-wise max queries which are just crazy dumb
           | to write efficiently in MySQL.
        
         | vosper wrote:
         | I agree, though I'll give SQL a pass for this because it's old.
         | 
         | But how the Javascript world ever thought that `import {
         | function } from 'library'` was better than `from 'library'
         | import { function }` I'll never know. Python got this right
         | long before anyone was even thinking about adding imports to
         | JS!
        
           | hdjrudni wrote:
           | Ya..that one is particularly sad because it was recent. Don't
           | know why they felt the need to botch that.
        
           | goscript wrote:
           | I agree but not with current JavaScript modules. i would
           | rather work with this:                   import { functionA }
           | from 'library';         import { functionB } from
           | '../utils/core/abc';         import { functionC } from './a';
           | 
           | over:                   from 'library' import { functionA };
           | from '../utils/core/abc' import { functionB };         from
           | './a' import { functionC };
        
       | jimbobmcgee wrote:
       | Is anyone actually likely to implement these based on some random
       | gist on the internet, or are you just screaming into the void?
       | 
       | Because, if you have that kind of clout, I've got an INSERT/SET
       | syntax I'd like to put your way...
        
         | JoelJacobson wrote:
         | I'm a small PostgreSQL contributor since 2010 myself. I
         | probably can't write the whole patch myself, but if there is
         | enough interest, and if we can work out the details and address
         | the problems raised in this thread and elsewhere, I'm pretty
         | confident we can do it. If we ever get there, the next step
         | would be a reference implementation, probably in PostgreSQL, or
         | to discuss a proposal in the SQL committee.
        
           | jimbobmcgee wrote:
           | I'm often interested into what goes into changes to
           | committee-driven standards.
           | 
           | To an outsider, proposing a change seems to require one to be
           | part of a shady cabal of Big-5 employees, skilled in the art
           | of hiding subtle, privacy-invading features into inscrutable,
           | plain-text RFCs.
           | 
           | That or subjecting yourself to 30K+ what-abouters who deform
           | your suggestion into something unrecognisable.
           | 
           | It's refreshing to see a straightforward, well-formatted
           | proposal (even if I do slightly prefer the `FROM table1 x
           | JOIN table2 y ON x.fk` syntax suggested in other comments).
        
             | JoelJacobson wrote:
             | > To an outsider, proposing a change seems to require one
             | to be part of a shady cabal of Big-5 employees, skilled in
             | the art of hiding subtle, privacy-invading features into
             | inscrutable, plain-text RFCs.
             | 
             | I thought so too. Initially I just tried to get in contact
             | with someone at the Swedish Institute for Standards (SIS),
             | to see if it would be possible to send a proposal to
             | someone in the SQL committee, which I thought was nearly
             | impossible to become a member of. But as it turns out, SIS
             | explained I could actually join the Swedish working group,
             | and participate directly there, I just had to send in an
             | application and get the approval from my employer, since
             | there is a cost involved and you have to be a member via a
             | company. Turns out ISO is a very open and democratic
             | organization, just like Hacker News! :)
             | 
             | I think this proposal could take years until it land, if it
             | ever does, in some form, if concerns can be addressed, but
             | SQL is here to stay for a while, so that doesn't scare me.
        
               | Tostino wrote:
               | I was following this proposal on the -hackers thread,
               | glad to see it getting traction here. I find it really
               | interesting to read that, it'll be even more interesting
               | to see how it plays out longer term. I'd love to see some
               | more progress made with the SQL language / syntax, and
               | having someone in the community actively engaging with
               | the committees seems like a great way forward.
        
             | JoelJacobson wrote:
             | > It's refreshing to see a straightforward, well-formatted
             | proposal (even if I do slightly prefer the `FROM table1 x
             | JOIN table2 y ON x.fk` syntax suggested in other comments).
             | 
             | Thank you! /me feeling happy
        
       | tqi wrote:
       | Personally, I think SQLs verbose syntax is a good thing, and
       | increases clarity/interpretability. It reduces cognitive load to
       | have things explicitly listed out. Not having the table and
       | column names in the query itself makes it much harder to read and
       | understand queries without prior knowledge of the data model.
        
         | bendbro wrote:
         | I agree with your point about explicit columns being easier to
         | read, but I still sometimes prefer implicit columns. An
         | example: whenever you have numerous subqueries, all using the
         | same columns, implicit columns are easier to read since there
         | is less text on the screen. It's also less error prone to
         | change just one line rather than numerous lines.
         | 
         | This feeds into my view of metaprogramming-like situations.
         | Whenever the code-time-view of a program differs significantly
         | from some runtime-state-view of the program, I think there
         | should be a code-time way to view and perhaps edit both the
         | code-time-view and some kind of runtime-state-view. A
         | programmer shouldn't have to waste time digging through
         | numerous files to evaluate what implementation slots into some
         | dependency injected class, or find out what structure ends up
         | in a python method parameter, or what a preprocessor directive
         | ultimately produces. I know IDEs can handle some of these
         | things, but I think better tools can be produced.
         | 
         | More concisely, instead of approaching code as the single and
         | unchanging view of the program, perhaps it would help to
         | approach code as something more dynamic. I have no concrete
         | ideas as to how this would work.
        
         | rcoveson wrote:
         | FWIW we do have `NATURAL JOIN` already, which is a lot worse
         | than this proposal.
        
           | benjiweber wrote:
           | NATURAL JOIN is great because it's like a relational AND. How
           | else will you join with Table Dee and Table Dum?
           | https://twitter.com/benjiweber/status/1476629550608101384
           | 
           | It's just risky if you don't design your schema with
           | relational algebra in mind.
        
             | cpill wrote:
             | I'm 20 years I have never seen anyone use relational
             | algebra. schemas have to be self documenting albeit only in
             | the real world.
        
               | salawat wrote:
               | ...I do... All the time. I'm cleaning up a mess of a
               | namespace collision as we speak. I was the only one who
               | saw it coming. I've spent the better part of a month
               | getting people to believe it's an issue.
        
               | gavinray wrote:
               | SQL _is_ relational algebra, the core of relational
               | algebra is Selection, Projection, Cross-products, Unions,
               | and Set Differences.
               | 
               | SQL is a language that implements Relational
               | Algebra/Relational Calculus
        
             | rcoveson wrote:
             | That is a really interesting case. Correct me if I'm wrong,
             | but I don't think this is a special case where you have to
             | use `NATURAL JOIN` over `JOIN ON`. The problem is just that
             | the Postgres SQL grammar (maybe most/all SQL grammars)
             | require ON or USING in a non-NATURAL join, so you have to
             | specify the (empty) equijoin predicate list yourself, i.e.
             | the identity value: `JOIN ON TRUE`.
             | 
             | EDIT: I had another thought about this.
             | 
             | I think people not designing with the relational algebra in
             | mind is the heart of the issue, specifically w.r.t. column
             | names. We know that namespaces are a hard problem, and a
             | consequence of that problem is that `NATURAL JOIN` as
             | specified in the relational algebra seems risky, or overly
             | magick-y. It makes what might be an unfortunate coincidence
             | (name collision) into something algebraically impactful.
             | 
             | A foreign key join gets around the problem by keeping names
             | and namespaces out of it. It's really doing exactly what
             | `NATURAL JOIN` is supposed to do, but only in the subset of
             | cases where name collisions are meaningful, not
             | coincidental.
        
         | mpolichette wrote:
         | I'm with you on this one. I prefer it to be in your face simple
         | to understand over the terse-ness.
         | 
         | In the gist example, I actually prefer the SQL-92 approach
         | where we are joining given an explicit comparison condition.
         | Every other implementation seems to be trying to hide details,
         | for what gain? Less typing?
         | 
         | In order to use FOREIGN, you will need to know not just what
         | columns a table has, but also their configuration. Which would
         | also require that you have properly configured your tables.
         | While this shouldn't be a hard ask, it does add additional
         | dependency and makes use of this "tool" slightly less
         | "portable" between systems.
         | 
         | I have unfortunately seen cases where people will only have
         | foreign keys un-enforced by their table config. As a dev, if
         | you're introduced to a new DB, you wont know immediately if you
         | can use this, and if things are configured wrong, you need to
         | make a pretty significant change to be able to use it.
         | 
         | I don't see a lot of harm from adding this syntax however as
         | people are free to not use it and it relies on an existing
         | strict convention.
        
           | JoelJacobson wrote:
           | > I don't see a lot of harm from adding this syntax however
           | as people are free to not use it and it relies on an existing
           | strict convention.
           | 
           | This is a good argument I will add to the list.
           | 
           | Also interesting to read about un-enforced foreign keys. I
           | haven't used MSSQL myself, the DB in which I heard it's
           | possible, I've only been using PostgreSQL for the last 20
           | years, and before that MySQL.
           | 
           | I think the problems you describe is an argument against a
           | WITH NOCHECK feature, since it could be misused. Maybe it's
           | necessary in some databases still, but at least in
           | PostgreSQL, the FOR KEY SHARE lock solved all the issues with
           | concurrent updates we had at Trustly. The FOR KEY SHARE was a
           | _huge_ patch [1] written mainly by Alvaro Herrera. Thanks to
           | it, Trustly has never since had any performance problems with
           | foreign keys, and they have AFAIK not needed to drop any
           | foreign keys up until today due to locking /performance
           | problems.
           | 
           | [1] https://www.commandprompt.com/blog/fixing_foreign_key_dea
           | dlo...
        
         | tapas73 wrote:
         | I would argue that conceptually it's not too different from
         | needing to know what tables/columns exist in datamodel.
         | 
         | I get that it is not common now to care about FKs when writing
         | selects. But it could be. Tooling can be improved to help here.
         | (show fks, autocomplete)
         | 
         | Btw. Everybody seems to concentrate on conciseness, but keep in
         | mind that this helps also with query correctness.
        
       | hnthrowaway0315 wrote:
       | What I really need is something like:
       | 
       | SELECT -col1, -col14 FROM table LIMIT 50;
       | 
       | Where the minus sign means I don't want these two columns. I
       | still don't see a way to do it easily (for Vertica and in
       | Datagrip).
        
         | stingraycharles wrote:
         | Simple, effective, and really useful in a lot of situations. I
         | like it!
        
         | bobbyi wrote:
         | If you have specific columns that you frequently want to ignore
         | (which, I think, is the common case of this), you could define
         | a view that selects all the columns except those and do your
         | queries against that view.
        
           | Eremotherium wrote:
           | Or create a function that returns a view of a table minus
           | named columns in situ if that's really a common case for you.
           | Just as an idea.
        
         | piaste wrote:
         | Similarly, I'd love some form of                   GROUP BY
         | every column except for <these>
         | 
         | It feels silly when you are SELECTing a ton of columns, then
         | you add a JOIN to a many-to-one relationship which you want to
         | aggregate. Now you need to either make it a subquery (and hope
         | the optimizer doesn't screw up) or duplicate all your SELECT
         | _expression_ (not even the identifiers) into the GROUP BY.
        
           | Svip wrote:
           | You know, you can just write GROUP BY x, y, ...
           | SELECT t.i+1, count(*) FROM table t GROUP BY 1
           | 
           | 1 in this context means the first selected item (i.e. t.i+1).
           | I know this works in PostgreSQL.
        
             | edoceo wrote:
             | I use this. It gets messy after like 10 columns. I have a
             | trick in my ORM that copies the non-aggrigate columns into
             | the statement. Maybe your ORM has it?
        
           | baskethead wrote:
           | Queries like this won't break but will silently fail upstream
           | by missing data if column names change or are deleted. The
           | explicit nature of SQL ensures that it will break positively
           | which is a better type of failure in my opinion.
        
             | [deleted]
        
             | Groxx wrote:
             | whether that's desirable or not depends on the query and
             | what kinds of changes you make. though I could see it being
             | error-prone in the most common cases.
        
           | dragonwriter wrote:
           | What I really want is a special phrase MINIMAL GROUPING such
           | that GROUP BY MINIMAL GROUPING includes exactly the items
           | that it would be an error _not_ to include in the GROUP BY.
        
           | icambron wrote:
           | I wish it just had a "group by all the stuff I selected
           | without aggregation" shortcut
        
             | xyzzy_plugh wrote:
             | Isn't this just SELECT DISTINCT ...?
        
               | orthoxerox wrote:
               | No. What icambron and I would like is something that can
               | get rid of this group by:                   select
               | division_name, branch_name, dealer_id, dealer_name,
               | quarter, month, sum(total_paid)         from divsions,
               | branches, dealers, transactions         where .....
               | --buncha joins         group by division_name,
               | branch_name, dealer_id, dealer_name, quarter, month
               | 
               | If I didn't want to group the result by division_name,
               | branch_name, dealer_id, dealer_name, quarter and month,
               | why would I put them in the select clause?
        
               | benjiweber wrote:
               | "group by 1,2,3,4,5,6" is at least a little more consise.
        
               | Sesse__ wrote:
               | What if you do SELECT a + b, SUM(x) FROM t1? Should the
               | implicit GROUP BY be on (a,b) or (a+b)?
        
             | coderzach wrote:
             | yeah, this is the right answer. Are there any examples
             | where this isn't what you want? I guess GROUPING SETS and
             | ROLLUP, but those could be special.
        
           | oweiler wrote:
           | Can't you use an alias in a group by?
        
             | Sesse__ wrote:
             | Not by the SQL standard, since projection (which creates
             | aliases) happens after aggregation. There are many
             | databases that allow it as an extension, though.
        
         | blondin wrote:
         | i would rather it written as SELECT * BUT or SELECT * EXCEPT or
         | even SELECT ALL BUT.
         | 
         | SQL has always been that language that is easy to read. even
         | when you don't understand what the queries are doing. adding a
         | cryptic syntax like "-column" would make it less readable.
        
           | colinmhayes wrote:
           | EXCEPT is already a sql keyword
        
           | recursive wrote:
           | The given syntax is obviously a no-go. It's already unary
           | minus.
        
           | polygotdomain wrote:
           | What about SELECT * WITHOUT {columns_you_do_not_want} FROM...
           | 
           | EXCEPT is already a keyword and has is used for set-based
           | operations, so I don't think it's good to overload it.
        
             | edgyquant wrote:
             | I like this fwiw
        
         | c06n wrote:
         | R's data.table has that.
         | 
         | dt[1:50, -c('col1', 'col14')]
        
         | jdunstan wrote:
         | That's very close to SELECT *, which has it's own dangers. I
         | agree that it would be nice for exploration and testing, but
         | probably should not be used in production.
        
           | hnthrowaway0315 wrote:
           | The problem I found out is that SELECT * LIMIT 10 is
           | guaranteed to be needed so it's actually much more important
           | than whatever is in production (you have to explore a lot
           | before writing the production scripts). My idea is that
           | exploration should be as easy as possible.
           | 
           | Things such as SELECT * EXCEPT col1, col2 are really a PIA to
           | write and can build up frustration level really quickly.
           | Certain IDEs such as Datagrip ease the process by providing
           | "macros" but they are not enough.
           | 
           | Another thing is to generate useful boilerplates such as
           | SELECT col1 FROM table GROUP BY col1 ORDER BY col1 to explore
           | all unique values of col1.
        
           | ipaddr wrote:
           | What dangers lay in select *...? Too much data?
        
             | wiredfool wrote:
             | Minimal if you refer to columns by name. Risky if you rely
             | on column ordering. Can be too much data and a performance
             | issue if you have large columns that you're not using.
             | (Though, it's not much different if you're using an orm
             | that loads the whole object anyway), like Django or
             | sqlalchemy.
             | 
             | I've done it for many years (using named columns/
             | dictionaries as the result set) and its never been an
             | issue.
        
         | nvartolomei wrote:
         | Some have support for this.
         | 
         | - BQ https://cloud.google.com/bigquery/docs/reference/standard-
         | sq...
         | 
         | - CH https://clickhouse.com/docs/en/sql-
         | reference/statements/sele...
        
           | hnthrowaway0315 wrote:
           | Thanks, these really look neat. Unfortunately we only use
           | Vertica :/
        
         | dolmen wrote:
         | > SELECT -col1, -col14 FROM table LIMIT 50;
         | 
         | This is already valid SQL. Example: SELECT -col1, -col4 FROM
         | (SELECT 1 AS col1, 2 AS col4) AS tbl;
         | 
         | Do you think seriously that a new meaning could ever be
         | attached to that syntax?
        
       | bawolff wrote:
       | Seems like minor syntactic sugar. I'm not opposed but really only
       | seems mildly useful at best.
        
       | pella wrote:
       | related: pgsql-hackers thread and context
       | 
       | https://www.postgresql.org/message-id/flat/1aec0dd0-dc27-40e...
        
       | mi_lk wrote:
       | tangent question - what're some effective ways to learn how to
       | write SQL?
        
         | jdunstan wrote:
         | w3schools and the online documentation for any database are
         | good places to start (Postgres and SQL Server have pretty good
         | documentation).
         | 
         | There are also easy-to-google SQL Puzzles, if you're looking
         | for something more advanced.
         | 
         | I would also recommend learning about query plans and how to
         | read them, they're invaluable for query optimization.
        
         | orthoxerox wrote:
         | Basic SQL has lots of tutorials. Graduate from them to Markus
         | Winand's use-the-index-luke.com and modern-sql.com
        
       | Guest42 wrote:
       | Is this not something that should be written at the app level
       | rather than dml?
        
         | Tostino wrote:
         | Foreign keys in general? I am not sure I understand what you
         | mean...
        
           | Guest42 wrote:
           | Not something I've ever thought about but I think that it
           | could be done via macro,
           | 
           | I have written tsql functions in c# and imagine that other
           | dialects have analogous functionality.
           | 
           | It seems as though the point is to cut down on sql code, it'd
           | also be possible to query the foreign keys and create a data
           | structure that could feed a function for joins.
           | 
           | I haven't thought out the specifics but think this type of
           | approach would be more practical than changing the sql
           | standard.
        
             | Tostino wrote:
             | At that point you are just building a feature into an "ORM"
             | of some sort. The entire point is to allow queries to be
             | written in a more concise way for this simple equality join
             | use case which is (total ballpark figure) ~half of the
             | joins in my system.
        
       | Ayesh wrote:
       | It looked very useful and intuitive at first, but on further
       | thought, I think the only time that you'd truly benefit from it
       | if you SELECT *. For other SELECT queries with explicit field
       | name list, you'd need to know the table the key constraint links
       | to anyway.
       | 
       | SQL Views serve perfectly well for queries encouraged by the
       | schema itself, and i think they are more sophisticated and
       | practical way.
        
         | JoelJacobson wrote:
         | There is similar comment in the thread, suggesting both tables
         | should always be specified.
         | 
         | Not sure what I prefer yet. The idea is the foreign key name is
         | usually the same as the referenced table, so should be an
         | infrequent problem.
        
           | zmmmmm wrote:
           | > The idea is the foreign key name is usually the same as the
           | referenced table
           | 
           | I think this is where you are hitting some of your turbulence
           | here because lots of ORMs / schema management tools actually
           | generate completely cryptic fk names (sometimes based on the
           | hash of the columns or similar). Personally I think weighing
           | in legacy baggage like that too highly is a bad thing as it
           | creates enormous inertia.
        
             | JoelJacobson wrote:
             | Thanks for explaining, I have no experience of ORMs, always
             | written my queries manually. This was insightful. Thanks
             | also for the word "inertia", good one, will add to my
             | vocabulary.
        
       | [deleted]
        
       | Smotko wrote:
       | It might be just me, but I feel like remembering the foreign key
       | name is more difficult than remembering the columns that you need
       | in the ON clause. Especially since you can usually find the
       | column names by just seeing the data in the table (select * from
       | x) wheres seeing the foreign key names is much harder (show
       | create table x?).
       | 
       | Also, if you use an ORM it will usually generate foreign key
       | names that are almost impossible to remember.
        
         | exabrial wrote:
         | I think this is an operator problem. You're using the wrong
         | tool for the job.
         | 
         | TablePlus, SequelAce, the official MySQL client all support
         | cntrl-space autocompletion. I wish we used Postgres, but I
         | imagine the landscape is the same. The big box databases like
         | Oracle, DB2 undoubtedly having this tooling as well.
         | 
         | That being said, here is our fk naming convention: `fk-
         | asset_types->asset_categories` which pretty states what's going
         | on and is easy to remember.
        
           | dolmen wrote:
           | SQL is not only written in an SQL client. SQL is also written
           | (and read from) embedded/mixed in an other programming
           | language were tooling is not always available.
           | 
           | Having to know the names of foreign keys (in addition to the
           | column names of the 2 tables) is adding more cognitive load.
           | I don't think that is an improvement.
        
         | JoelJacobson wrote:
         | It would indeed be difficult to remember, but the proposal also
         | suggest changing the default naming convention for foreign
         | keys, to give them the same name as the referenced table.
         | 
         | If using an ORM, I would guess this proposal isn't useful,
         | since then you wouldn't hand-write queries anyway, right?
         | Except when you want to override the queries generated by the
         | ORM? (I'm not an ORM user myself.)
        
           | to11mtm wrote:
           | Speaking as someone who has used ORMs in the past and
           | contributes to a LINQ Micro ORM...
           | 
           | It might make tooling 'easier', but since backwards
           | compatibility has to be considered the actual value add is
           | questionable IMO.
           | 
           | Most ORMs/MicroORMs will have tooling that sniffs out the DB
           | Schema including foreign keys, and if you are using those
           | bits (i.e. 'not hand written') most will do the right thing
           | today. I suppose you could include some extra syntax for
           | whatever DSL you're providing users....
           | 
           | IDK. Speaking as someone who is very comfortable in SQL, This
           | feels more like syntactic sugar than anything else.
        
         | rtpg wrote:
         | In a universe where foreign key index names are important we
         | would specify better names.
         | 
         | I think stuff like "documents_by_user" as foreign key names and
         | explicit index usage would improve peoples awareness of how
         | indices get used and would generally be a positive
        
       | matteote wrote:
       | This reminds me of KEY JOIN in Sybase which, unlike this syntax,
       | does not require to specify the foreign key. I find Sybase's
       | syntax to be too implicit, and challenging to port to other
       | dialects; JOIN FOREIGN with explicit PK reference looks much
       | better.
       | 
       | I wonder how it is expected to work with non-table references
       | (views, CTEs, subqueries), especially when the columns involved
       | in the foreign key (on either side) are not returned explicitly
       | by the referenced object.
        
         | JoelJacobson wrote:
         | > I wonder how it is expected to work with non-table references
         | (views, CTEs, subqueries)
         | 
         | It's not, it's only for the special but common case of joining
         | two tables based on a foreign key.
        
       | halayli wrote:
       | It's a great proposition. However, I would suggest having it as a
       | prefix rather than postfix of the JOIN expression to reduce
       | implementation side effects and avoid ambiguity between keyword
       | 'FOREIGN' and a table called 'FOREIGN'.
       | 
       | example: FOREIGN LEFT JOIN
        
       | Upitor wrote:
       | In my opinion this proposal seems only to consider simple cases,
       | but there are many not-so-simple relationsship types:
       | 
       | Consider a 'sales' table which includes columns [time] and
       | [sold_by_employee_id], and a periodized 'employee' table which
       | includes columns [employee_id], [valid_from] and [valid_to]
       | columns. There is a perfectly valid relationsship between the two
       | tables, but you cant join them using only equal-statements (you
       | need a between-statement as well)
        
         | JoelJacobson wrote:
         | Nice example! The join you describe would remain as a JOIN ON.
         | 
         | This is per design. Quote from the proposal:
         | 
         | "The idea is to improve the SQL language, specifically the join
         | syntax, for the special but common case when joining on foreign
         | key columns." ... "If the common simple joins (when joining on
         | foreign key columns) would be written in a different syntax,
         | the remaining joins would visually stand out and we could focus
         | on making sure we understand them when reading a large SQL
         | query."
         | 
         | So, the special non-equal based join condition you describe,
         | would become more visible, and stand out, allowing readers to
         | pay more attention to it.
         | 
         | The hypothesis is most joins are made on foreign key columns,
         | so if we can improve such cases, a lot can be won.
        
           | DevKoala wrote:
           | But how could you accurately tell if some queries join on the
           | foreign key, but were written by someone without knowledge of
           | the new specification?
        
           | Upitor wrote:
           | I see. I should have read the article more carefully :-)
        
         | Upitor wrote:
         | Also, consider a 'sales' table with multiple references to a
         | 'calendar' table: [shipped_date], [order_date], [received_date]
        
           | JoelJacobson wrote:
           | Good example too, but this one can with benefit be written
           | using the JOIN FOREIGN syntax, you just need to give the
           | foreign keys suitable names such as e.g. "shipped_date",
           | "order_date", "received_date". Or, to remind you of which is
           | the referenced table, perhaps you want to include it in the
           | names, and the names would be "shipped_date_calendar",
           | "order_date_calendar", "received_date_calendar", but
           | personally I would prefer just "shipped_date" or perhaps even
           | just "shipped".
        
       | zoomablemind wrote:
       | I wonder if primary objective of this proposal is to increase
       | convenience and reduce the amount of typing that users need to
       | do, when using SQL inline?
       | 
       | If indeed that's the goal, then it targets a rather specific
       | subset of users dealing with explorative/ad-hoc analysis on a
       | database. Once such analysis is done, the queries would usually
       | need to be formalized for robustness and to avoid ambiguities.
       | 
       | Obviously, the whole train of queries would derail, should the FK
       | (which is just an index) be dropped for one reason or the other.
       | 
       | The existing JOIN features are explicit at least on the level of
       | specified table structure. I believe, any constraint details in
       | such context will be, well, ...foreign.
       | 
       | Perhaps, a simple solution to verbosity problem may be to use an
       | "intelligent" SQL client, which supports some form of
       | autocomplete and which may as well internally use as many
       | schema/data details as available.
       | 
       | In anycase, thanks for making the proposal. I was not aware of
       | JOIN ... USING syntax. I often wanted some convenient way of
       | specifying homonymous join columns, as some schemes are
       | consistent in such namings. So typing JOIN on col1, col2... would
       | translate into equality joins between the listed tables. However,
       | again, there is ambiguity here...
        
       | abhchand wrote:
       | This feels like adding syntactic sugar in some sense. That is, a
       | layer of convenience to accomplish something that can already be
       | accomplished today.
       | 
       | It also breaks the following:
       | 
       | * It's better to be explicit than infer * Keep specifications
       | simple
        
       | slt2021 wrote:
       | not a good idea, constaints are one thing and joins are another
       | thing. plus I can join on conditions other than =.
       | 
       | the problem that author is trying to solve can be easily solved
       | by a view:
       | 
       | 1. Declare a view with all necessary JOINs once
       | 
       | 2. select from view only what you need, aggregate what you want
       | 
       | 3. Optimizer will throw out unnecessary stuff and optimize query
       | while all JOIN logic will be declared only once and will be
       | hidden inside the view
       | 
       | plus each DBMS has its own flavor of SQL and will have its own
       | query optimizer nuances when dealing with joins, especially
       | nested via CTEs/views/lateral queries,etc.
        
         | Sesse__ wrote:
         | Your #3 is very optimistic. You can't just remove a join (be it
         | inner, left, or really any other type) just because you didn't
         | refer to any fields from one of the sides, and it's even harder
         | when aggregation is in the mix. There are cases where you can
         | without influencing the result, but they are special-case
         | optimizations and not universally supported across databases.
        
       | gigatexal wrote:
       | I like this a lot. It makes thing cleaner and more like explicit.
        
         | clintonb wrote:
         | How is this more explicit than specifying join columns today?
        
       | lolive wrote:
       | Coming from the graph database / semantic web area, I would
       | propose foreign key/ primary key relationships in the DB to be
       | detailed _and named_ in a schema description, and then queries
       | reference those relationships by name to define the needed joins.
        
         | criticaltinker wrote:
         | Yup totally agree - OP is an interesting proposal and has
         | provoked a lot of quality commentary regarding the tradeoffs of
         | clarity vs conciseness, implicit vs explicit, FK performance,
         | etc.
         | 
         | But IMO you've raised _the_ important long term consideration -
         | do graph based schemas and query languages obviate the need to
         | model foreign keys explicitly? If this JOIN FOREIGN proposal is
         | an incremental step forward, what's the next big leap?
        
       ___________________________________________________________________
       (page generated 2021-12-30 23:00 UTC)