[HN Gopher] SQL language proposal: JOIN FOREIGN
___________________________________________________________________
SQL language proposal: JOIN FOREIGN
Author : JoelJacobson
Score : 142 points
Date : 2021-12-30 17:54 UTC (5 hours ago)
(HTM) web link (gist.github.com)
(TXT) w3m dump (gist.github.com)
| jeff-davis wrote:
| I'm not sure this is a good idea. It means that a constraint on
| the data (a FK is a constraint) affects the semantics of the
| query. In SQL, those two concepts are seperate.
| unbanned wrote:
| >This improved in SQL-92:
|
| Did it?
| JoelJacobson wrote:
| Yes, I think so, at least according to:
| https://learnsql.com/blog/history-of-sql-standards/
|
| What makes you think otherwise?
| antender wrote:
| I, personally, feel like this is a rather pointless addition to
| an already somewhat bloated language. It it isn't saving that
| much typing (we have NATURAL JOIN's already, nobody is using
| them), is kind of inconsistent with other language principles (we
| should be able to join anything to anything if we need to, see
| SQL-89) and limits JOIN flexibility (you don't have to join with
| "=" operator, JOIN's supports different types of conditions and
| are equivalent to WHERE clause)
| thom wrote:
| Of all the myriad indignities of SQL, this isn't near the top of
| my list. I also don't like making the names of objects like
| foreign keys and indexes first class concerns in your queries,
| that's a whole new layer of cognitive overhead.
| degenerate wrote:
| Agree. I'd much prefer they worked on the way MySQL determines
| what indexes to use on a multi-table join, so there could be
| less emphasis on the need to ANALYZE TABLE and FORCE INDEX when
| the DB is seemingly being dumb, but there's not enough
| information in EXPLAIN to tell you where it's actually getting
| tripped up. Troubleshooting full table scans on large sets of
| data is a nightmare.
| Tostino wrote:
| Slightly different group of people working on MySQL vs this
| proposal.
| torgard wrote:
| I dig it!
|
| I prefer defining tables like this: CREATE
| TABLE category ( id int GENERATED ALWAYS AS IDENTITY,
| name text ); CREATE TABLE post (
| id int GENERATED ALWAYS AS IDENTITY, category_id int
| REFERENCES category (id) ON DELETE CASCADE
| );
|
| That is, category.id rather than category.category_id. But the
| USING clause doesn't work with that style, as far as I
| understand.
|
| This would make my queries nicer.
| JoelJacobson wrote:
| Much cleaner, I agree, that's a big win. Will add that to the
| list of benefits.
|
| Will also add a "Drawbacks / Remaining issues" section to the
| Gist, from all the valuable comments so far in this thread,
| thank you all, positive as well as negative comments, all very
| helpful.
| thom wrote:
| I've been quite happy having fully prefixed column names for a
| long time now. Makes joins easier, big views clearer, and
| random exports more readable out of the box. Also in my case
| it's also easier to line up unique column names with things
| like Clojure specs but I accept that's a niche concern.
| unbanned wrote:
| Foreign keys exist for data integrity... something which is
| rather opaque to query onterface... So why do you believe that
| has place in a query?
| JoelJacobson wrote:
| Foreign keys both define relationships between tables, and also
| enforce referential integrity. The discussion is about how we
| could potentially mine various additional value from foreign
| keys as an information resource, rather than just mostly being
| about referential integrity.
| hn_throwaway_99 wrote:
| This is my "Thanks, I hate it" response.
|
| Reason being if you use the example they gave:
|
| SELECT f.title, f.did, d.name, f.date_prod, f.kind FROM films f
| JOIN FOREIGN f.films_did_fkey d
|
| You need to _implicitly_ know the table that films_did_fkey
| points to, because 'd' is just a table alias. I can't think of
| anywhere else in the SQL standard where you can introduce a table
| alias without explicitly referencing the table. In my opinion
| making code essentially unreadable unless you have other
| background information is an antipattern.
| JoelJacobson wrote:
| I think your concern is addressed by the idea further down in
| proposal; giving the foreign keys the same names as the
| referenced tables. This example was provided:
|
| SELECT f.title, f.did, d.name, f.date_prod, f.kind FROM films f
| JOIN FOREIGN f.distributors d
| hn_throwaway_99 wrote:
| No, my concern is not that you _can_ also specify the table
| name that explicitly states the name of both tables, it 's
| that you have proposed a syntax where it is _possible_ to
| leave off the explicit table name.
|
| If you got rid of the syntax that uses the underlying "magic"
| of needing to know which table the foreign key points to, I'd
| be more amenable.
| JoelJacobson wrote:
| There are two asymmetrical cases; one of them needs two
| pieces of information, the other three.
|
| Having to always specify both tables would syntax wise be
| redundant, but I agree it's worth considering it might be a
| good thing to improve readability, which would help
| especially in the case where the foreign key isn't/cannot
| be given the same name as the referenced table.
| mrweasel wrote:
| Yeah, my concern is people in the real world. If you can
| "magically" join on a foreign key and leave the table name
| out, or still call the thing anything you want, some smart
| ass will abuse the live crap out of it and I'll somehow be
| left to deal with it.
|
| I like things to be explicit. Tell me what you're joining
| and how you want to join it. What is the use case for this?
| Other than saving some typing, and let's face it, sometimes
| a little extra typing now, will save you a lot of trouble
| later. The proposal claims: "The idea is to improve the SQL
| language", but is it really better?
| tapas73 wrote:
| First, very valid concern regarding table name.
|
| > Other than saving some typing,
|
| I would like to point out the corectness aspect of the
| proposal.
|
| Today foreign keys are enforced during inserts, updates,
| deletes. (you just can't violate fk. this is so good) But
| you can violate it in selects. (e.g. mistype column name,
| forget to include a column)
|
| This proposal (or its adjustment) would allow to use fk
| also during select. It's like static typing for join
| conditions.
| tqi wrote:
| 1. Just because you can doesn't mean people will. 2. That
| doesn't work if you need to have multiple keys pointing to
| the same table (ie owner_id and secondary_owner_id pointing
| to a users table)
| mritchie712 wrote:
| yeah, this saves like 10 characters of text in exchange for
| massive confusion.
| JoelJacobson wrote:
| In this simple example yes, but there can be multiple columns
| in a join.
|
| The syntax proposal i O(1) syntax wise compared to O(2n) for
| a JOIN ON where n is the number of columns.
| hn_throwaway_99 wrote:
| But that "simple example" is like 95% plus of all my joins,
| where I'm joining two tables on a foreign key.
| JoelJacobson wrote:
| Your data model must be a lot simpler than mine, congrats
| ;)
| unbanned wrote:
| mjevans wrote:
| In this example D is the target table, FK's natural
| relationship could be aliased (specified multiple times under
| different names).
| ridaj wrote:
| Another thing, this makes things harder to reverse.
|
| Some engines have join-order-dependent performance, so there
| are instances where you would want to write
|
| ``` FROM orders LEFT JOIN customers USING (customer_id) ```
|
| and others where you'd want to write
|
| ``` FROM customers RIGHT JOIN orders USING (customer_id) ```
|
| Swapping join order with current syntax is relatively easy
| since references in the same FROM clause are interchangeable.
| But in this proposal, the reference to the joined table isn't
| written out, so it would be pretty complicated to reverse join
| order.
| jmull wrote:
| Agreed. Maybe something like this instead:
| SELECT f.title, f.did, d.name, f.date_prod, f.kind
| FROM films f JOIN distributors d ON FOREIGN
| f.films_did_fkey
|
| and SELECT f.title, f.did, d.name,
| f.date_prod, f.kind FROM distributors d
| JOIN films f ON FOREIGN films_did_fkey
| silon42 wrote:
| +1 ... needed at least as an option where FK is not declared
| magicalhippo wrote:
| I had a similar suggestion the other day[1], except I chose
| what to me sounded more English-sounding:
| JOIN films f USING FOREIGN KEY
|
| Here the explicit foreign key, films_did_fkey in this case,
| could be specified after FOREIGN KEY. This would be similar
| in syntax to when you force an index for a select statement,
| at least in the DB we use.
|
| [1]: https://news.ycombinator.com/item?id=29687565
| zeroimpl wrote:
| Yeah I think something like this is the way to go. The
| table name needs to be listed in a consistent manner like
| all other joins, and I don't think the syntax should be
| different depending on which order the tables are listed
| in.
|
| With the above proposal, it seems the foreign key name
| could be left out in the common case of there being only
| one fkey between the two tables too.
| magicalhippo wrote:
| > With the above proposal, it seems the foreign key name
| could be left out in the common case of there being only
| one fkey between the two tables too.
|
| Yes, this was indeed my intention. The common case with
| only a single matching foreign key constraint would not
| require being explicit, but one could be if needed in a
| natural way. JOIN films f USING FOREIGN
| KEY films_did_fkey
| JoelJacobson wrote:
| This is the same idea as already posted in this thread
| https://news.ycombinator.com/item?id=29687134, but with a more
| in-depth explanation that couldn't fit in the comment field, and
| with some syntax improvements, such as using "FOREIGN", which is
| a reserved keyword, instead of the previously suggested "->"
| notation.
|
| Thanks for all the valuable comments on last proposal. Excited to
| hear what you think about this update.
| nicoburns wrote:
| I wish the SQL standard would add some basic syntax sugar, such
| as support for trailing commas and allowing use of aliases in
| more places.
| [deleted]
| btilly wrote:
| Please, no. A common problem in data warehousing is that a design
| with lots of foreign keys becomes slow to load. A standard
| solution is to move the checks for referential integrity
| elsewhere, then drop the foreign key constraint. This massively
| improves load performance.
|
| This syntax change means that this solution can't be used because
| you have no idea what random queries out there might rely on the
| specific existence of a foreign key constraint for the definition
| of the query. Thereby meaning that if a foreign key constraint
| becomes a performance problem, we're stuck with it rather than
| having a solution.
|
| Features have consequences. And I don't like the consequences of
| making business rules that are now explicit in the query, be
| instead implicit in the table design.
| tapas73 wrote:
| Lets have disabled FK then.
|
| postgre, seems to not have it, but the proposal could include
| also "disabled FK" part.
|
| teradata, oracle, sql server already have option for FK
| "disable/no check".
|
| https://docs.teradata.com/r/eWpPpcMoLGQcZEoyt5AjEg/df1PvVh6e...
| https://docs.microsoft.com/en-us/sql/relational-databases/ta...
| https://docs.oracle.com/cd/B28359_01/server.111/b28310/gener...
| zzzeek wrote:
| this was my thought too. but now you need an ALTER CONSTRAINT
| NON ENFORCING or something like that so that the "constraints"
| can be present declaratively but don't actually get used.
|
| which then looks a whole lot like you're just introducing
| macros into SQL where you have some symbolic keywords that
| expand out into pre-fabricated ON clauses.
| JoelJacobson wrote:
| Good point, but addressable:
|
| Simply decouple the relationship definition and referential
| integrity check, allowing a user to drop the referential
| integrity check if desired, but keeping the relationship
| definition.
|
| I cannot see why you would not want to at least always store
| the information a certain table/column(s) references some other
| table/column(s) in the data model. Enforcing referential
| integrity is probably good in general too, but I agree you
| might need to disable it for some FKs, in some databases, like
| PostgreSQL before they got FOR KEY SHARE locks.
| hn_throwaway_99 wrote:
| At this point you should realize your proposal is a non-
| starter, and I didn't even realize btilly's objection
| originally.
|
| I can't think of any other feature in SQL where the rules of
| the query are actually dependent on something _not explicit
| to the query itself_. Even USING and NATURAL are just
| syntactic sugar that depends on the _structure_ of the table,
| not on any underlying constraints.
|
| So, what you have proposed, "allowing a user to drop the
| referential integrity check if desired, but keeping the
| relationship definition" would be a _massive_ change to tons
| of SQL tools out there as it 's a huge new feature, for some
| minor syntactic sugar. Ain't gonna happen.
| JoelJacobson wrote:
| > Even USING and NATURAL are just syntactic sugar that
| depends on the structure of the table, not on any
| underlying constraints.
|
| Similar to how JOIN FOREIGN would depend on the _structure_
| of the data model, defined by tables, foreign keys, etc.
|
| > So, what you have proposed, "allowing a user to drop the
| referential integrity check if desired, but keeping the
| relationship definition" would be a massive change to tons
| of SQL tools out there as it's a huge new feature, for some
| minor syntactic sugar. Ain't gonna happen.
|
| Why would it be a problem from the tools perspective if the
| foreign key wasn't actually enforced if the DBA insists on
| temporarily disabling the enforcement of the FK? If the
| tool would e.g. be used to insert a row, and the DB would
| accept it, even though it would violate the FK, what do you
| suggest would be the problem from the tools perspective?
|
| This is also not a new idea. It's already implemented in
| MSSQL, see WITH NOCHECK.
| [deleted]
| hn_throwaway_99 wrote:
| > Similar to how JOIN FOREIGN would depend on the
| structure of the data model, defined by tables, _foreign
| keys_ , etc.
|
| The point that everyone is making is there are _not_
| currently any SQL statements that depend on structural
| information as defined in the foreign key relationships
| when calculating the structure of the data. Furthermore,
| there are already tons of tooling and processes that
| depend on this fact, that your proposal would break, for
| a teeny bit of less typing.
|
| Beating a dead horse at this point.
| tqi wrote:
| The person you are responding to was more direct/harsh
| than was necessary, but I think it would be good to step
| back for a moment and reflect on the feedback from this
| community. You said "If someone can convince me this is a
| bad idea, that would help me forget about all of this, so
| I would greatly appreciate your thoughts, no matter how
| negative or positive." I think there are enough valid
| objections here to at least consider the idea that this
| is not a clear improvement?
| JoelJacobson wrote:
| When I wrote that comment, the idea had even more flaws
| than currently, at that time I suggested using "WITH" and
| a new "->" operator. Thanks to new ideas coming from
| other users in the PostgreSQL and Hacker News community,
| those problems have been solved, and we now have less
| remaining problems with the proposal. I'm really grateful
| for all the help.
|
| Like I said in another reply, I will put together a
| "Drawbacks / Remaining issues" section and update the
| Gist, based on all replies. Perhaps the end result will
| be Status Quo, but at least then we have documented the
| reasons why this idea is a dead end. However, thanks to
| all the improvements just during the last couple of days,
| I feel really optimistic and motivated, so I think there
| is a great chance we can solve the remaining issues
| together if we try.
|
| To comment on the response from the direct/hash person:
|
| The point made by the user, "not currently any SQL
| statements that depend on structural information as
| defined in the foreign key relationships", is true, but I
| don't see why that's an argument by itself against the
| idea?
|
| I find the other argument, claiming there would be a
| problem with tooling and processes, much more interesting
| and I'm eager to fully understand it. I asked a question
| in hope to do so, "what do you suggest would be the
| problem from the tools perspective", but has so far not
| received any reply.
| seadan83 wrote:
| This, the result of a query should not change depending on a
| constraint.
| zmmmmm wrote:
| By that do you mean, should not vary in the data returned or
| should not break?
|
| Personally I agree that changing a constraint shouldn't alter
| the data returned. But I'm happy enough if it breaks in a
| clear and verifiable manner. There are plenty of other
| situations where adding a constraint will cause existing SQL
| (if not queries) to break so its not really that much of a
| change.
| zmmmmm wrote:
| Data warehouses have all sorts of different design approaches
| based on their different requirements, wouldn't this just be
| another one of them? I actually think that having a way to
| alter the constraint to non-enforcing sounds like it'd be
| better for your situation anyway because you can then have your
| load performance but utilise the constraints with background /
| delayed process that checks the integrity because the
| information is still there in a standard form.
|
| Which sort of leads to .... I don't agree with your
| characterisation of foreign key constraints as business rules.
| They are genuine information about the structure of the data.
| cogman10 wrote:
| Absolutely agree.
|
| Further, whenever someone is adjusting table performance index
| tweaks is almost always the first thing to tackle.
|
| Adding foreign keys into the query is just as bad as adding
| indexes into the query (which, you can do in T-SQL, but
| generally shouldn't). Indexes can be dropped, changed, or added
| and you SHOULD be relying on the SQL optimizer to use the most
| appropriate index.
|
| This feature appears to only save a bit of typing in the best
| of scenarios. In the worst, an update/drop of a foreign key
| will end up breaking a bunch of queries, which is insane.
| Tostino wrote:
| While I wish there were a way to easily refactor queries when
| making these changes, this is not different than it works for
| renaming/dropping a column..."In the worst, an re-name/drop
| of a column will end up breaking a bunch of queries, which is
| insane."
| tapas73 wrote:
| I will write a letter to santa asking to get this into sql.
| During 10 years in data warehouse I constantly wondered why SQL
| does not have something like that.
|
| Reducing verboseness is nice, but the main perk is the
| _correctness_.
|
| Oh.. if I got a cent every time I found a bug in colleagues sql,
| because of join accidentally multiplying/doubling rows... :-)
| thanatos519 wrote:
| I dunno. I preferred the SQL-89 syntax.
| samtheprogram wrote:
| Same here. I didn't even realize that was an option. As a
| relative beginner to SQL, is there a real reason the JOIN
| syntax was needed at all beyond special inner/outer joins?
| Tostino wrote:
| Sanity for your coworkers who may have to touch your SQL.
| tester756 wrote:
| While we are at it
|
| let's write SQL queries starting from FROM.
|
| `FROM users SELECT *`
|
| It'd allow tooling to provide IntelliSense better.
| miohtama wrote:
| While this might be a joke, the world has been ripe for a
| replacement for SQL... for the last 20 years.
| hdjrudni wrote:
| I started writing a new SQL parser that just transforms the
| query...this was one of the features. The other being a
| syntax for group-wise max queries which are just crazy dumb
| to write efficiently in MySQL.
| vosper wrote:
| I agree, though I'll give SQL a pass for this because it's old.
|
| But how the Javascript world ever thought that `import {
| function } from 'library'` was better than `from 'library'
| import { function }` I'll never know. Python got this right
| long before anyone was even thinking about adding imports to
| JS!
| hdjrudni wrote:
| Ya..that one is particularly sad because it was recent. Don't
| know why they felt the need to botch that.
| goscript wrote:
| I agree but not with current JavaScript modules. i would
| rather work with this: import { functionA }
| from 'library'; import { functionB } from
| '../utils/core/abc'; import { functionC } from './a';
|
| over: from 'library' import { functionA };
| from '../utils/core/abc' import { functionB }; from
| './a' import { functionC };
| jimbobmcgee wrote:
| Is anyone actually likely to implement these based on some random
| gist on the internet, or are you just screaming into the void?
|
| Because, if you have that kind of clout, I've got an INSERT/SET
| syntax I'd like to put your way...
| JoelJacobson wrote:
| I'm a small PostgreSQL contributor since 2010 myself. I
| probably can't write the whole patch myself, but if there is
| enough interest, and if we can work out the details and address
| the problems raised in this thread and elsewhere, I'm pretty
| confident we can do it. If we ever get there, the next step
| would be a reference implementation, probably in PostgreSQL, or
| to discuss a proposal in the SQL committee.
| jimbobmcgee wrote:
| I'm often interested into what goes into changes to
| committee-driven standards.
|
| To an outsider, proposing a change seems to require one to be
| part of a shady cabal of Big-5 employees, skilled in the art
| of hiding subtle, privacy-invading features into inscrutable,
| plain-text RFCs.
|
| That or subjecting yourself to 30K+ what-abouters who deform
| your suggestion into something unrecognisable.
|
| It's refreshing to see a straightforward, well-formatted
| proposal (even if I do slightly prefer the `FROM table1 x
| JOIN table2 y ON x.fk` syntax suggested in other comments).
| JoelJacobson wrote:
| > To an outsider, proposing a change seems to require one
| to be part of a shady cabal of Big-5 employees, skilled in
| the art of hiding subtle, privacy-invading features into
| inscrutable, plain-text RFCs.
|
| I thought so too. Initially I just tried to get in contact
| with someone at the Swedish Institute for Standards (SIS),
| to see if it would be possible to send a proposal to
| someone in the SQL committee, which I thought was nearly
| impossible to become a member of. But as it turns out, SIS
| explained I could actually join the Swedish working group,
| and participate directly there, I just had to send in an
| application and get the approval from my employer, since
| there is a cost involved and you have to be a member via a
| company. Turns out ISO is a very open and democratic
| organization, just like Hacker News! :)
|
| I think this proposal could take years until it land, if it
| ever does, in some form, if concerns can be addressed, but
| SQL is here to stay for a while, so that doesn't scare me.
| Tostino wrote:
| I was following this proposal on the -hackers thread,
| glad to see it getting traction here. I find it really
| interesting to read that, it'll be even more interesting
| to see how it plays out longer term. I'd love to see some
| more progress made with the SQL language / syntax, and
| having someone in the community actively engaging with
| the committees seems like a great way forward.
| JoelJacobson wrote:
| > It's refreshing to see a straightforward, well-formatted
| proposal (even if I do slightly prefer the `FROM table1 x
| JOIN table2 y ON x.fk` syntax suggested in other comments).
|
| Thank you! /me feeling happy
| tqi wrote:
| Personally, I think SQLs verbose syntax is a good thing, and
| increases clarity/interpretability. It reduces cognitive load to
| have things explicitly listed out. Not having the table and
| column names in the query itself makes it much harder to read and
| understand queries without prior knowledge of the data model.
| bendbro wrote:
| I agree with your point about explicit columns being easier to
| read, but I still sometimes prefer implicit columns. An
| example: whenever you have numerous subqueries, all using the
| same columns, implicit columns are easier to read since there
| is less text on the screen. It's also less error prone to
| change just one line rather than numerous lines.
|
| This feeds into my view of metaprogramming-like situations.
| Whenever the code-time-view of a program differs significantly
| from some runtime-state-view of the program, I think there
| should be a code-time way to view and perhaps edit both the
| code-time-view and some kind of runtime-state-view. A
| programmer shouldn't have to waste time digging through
| numerous files to evaluate what implementation slots into some
| dependency injected class, or find out what structure ends up
| in a python method parameter, or what a preprocessor directive
| ultimately produces. I know IDEs can handle some of these
| things, but I think better tools can be produced.
|
| More concisely, instead of approaching code as the single and
| unchanging view of the program, perhaps it would help to
| approach code as something more dynamic. I have no concrete
| ideas as to how this would work.
| rcoveson wrote:
| FWIW we do have `NATURAL JOIN` already, which is a lot worse
| than this proposal.
| benjiweber wrote:
| NATURAL JOIN is great because it's like a relational AND. How
| else will you join with Table Dee and Table Dum?
| https://twitter.com/benjiweber/status/1476629550608101384
|
| It's just risky if you don't design your schema with
| relational algebra in mind.
| cpill wrote:
| I'm 20 years I have never seen anyone use relational
| algebra. schemas have to be self documenting albeit only in
| the real world.
| salawat wrote:
| ...I do... All the time. I'm cleaning up a mess of a
| namespace collision as we speak. I was the only one who
| saw it coming. I've spent the better part of a month
| getting people to believe it's an issue.
| gavinray wrote:
| SQL _is_ relational algebra, the core of relational
| algebra is Selection, Projection, Cross-products, Unions,
| and Set Differences.
|
| SQL is a language that implements Relational
| Algebra/Relational Calculus
| rcoveson wrote:
| That is a really interesting case. Correct me if I'm wrong,
| but I don't think this is a special case where you have to
| use `NATURAL JOIN` over `JOIN ON`. The problem is just that
| the Postgres SQL grammar (maybe most/all SQL grammars)
| require ON or USING in a non-NATURAL join, so you have to
| specify the (empty) equijoin predicate list yourself, i.e.
| the identity value: `JOIN ON TRUE`.
|
| EDIT: I had another thought about this.
|
| I think people not designing with the relational algebra in
| mind is the heart of the issue, specifically w.r.t. column
| names. We know that namespaces are a hard problem, and a
| consequence of that problem is that `NATURAL JOIN` as
| specified in the relational algebra seems risky, or overly
| magick-y. It makes what might be an unfortunate coincidence
| (name collision) into something algebraically impactful.
|
| A foreign key join gets around the problem by keeping names
| and namespaces out of it. It's really doing exactly what
| `NATURAL JOIN` is supposed to do, but only in the subset of
| cases where name collisions are meaningful, not
| coincidental.
| mpolichette wrote:
| I'm with you on this one. I prefer it to be in your face simple
| to understand over the terse-ness.
|
| In the gist example, I actually prefer the SQL-92 approach
| where we are joining given an explicit comparison condition.
| Every other implementation seems to be trying to hide details,
| for what gain? Less typing?
|
| In order to use FOREIGN, you will need to know not just what
| columns a table has, but also their configuration. Which would
| also require that you have properly configured your tables.
| While this shouldn't be a hard ask, it does add additional
| dependency and makes use of this "tool" slightly less
| "portable" between systems.
|
| I have unfortunately seen cases where people will only have
| foreign keys un-enforced by their table config. As a dev, if
| you're introduced to a new DB, you wont know immediately if you
| can use this, and if things are configured wrong, you need to
| make a pretty significant change to be able to use it.
|
| I don't see a lot of harm from adding this syntax however as
| people are free to not use it and it relies on an existing
| strict convention.
| JoelJacobson wrote:
| > I don't see a lot of harm from adding this syntax however
| as people are free to not use it and it relies on an existing
| strict convention.
|
| This is a good argument I will add to the list.
|
| Also interesting to read about un-enforced foreign keys. I
| haven't used MSSQL myself, the DB in which I heard it's
| possible, I've only been using PostgreSQL for the last 20
| years, and before that MySQL.
|
| I think the problems you describe is an argument against a
| WITH NOCHECK feature, since it could be misused. Maybe it's
| necessary in some databases still, but at least in
| PostgreSQL, the FOR KEY SHARE lock solved all the issues with
| concurrent updates we had at Trustly. The FOR KEY SHARE was a
| _huge_ patch [1] written mainly by Alvaro Herrera. Thanks to
| it, Trustly has never since had any performance problems with
| foreign keys, and they have AFAIK not needed to drop any
| foreign keys up until today due to locking /performance
| problems.
|
| [1] https://www.commandprompt.com/blog/fixing_foreign_key_dea
| dlo...
| tapas73 wrote:
| I would argue that conceptually it's not too different from
| needing to know what tables/columns exist in datamodel.
|
| I get that it is not common now to care about FKs when writing
| selects. But it could be. Tooling can be improved to help here.
| (show fks, autocomplete)
|
| Btw. Everybody seems to concentrate on conciseness, but keep in
| mind that this helps also with query correctness.
| hnthrowaway0315 wrote:
| What I really need is something like:
|
| SELECT -col1, -col14 FROM table LIMIT 50;
|
| Where the minus sign means I don't want these two columns. I
| still don't see a way to do it easily (for Vertica and in
| Datagrip).
| stingraycharles wrote:
| Simple, effective, and really useful in a lot of situations. I
| like it!
| bobbyi wrote:
| If you have specific columns that you frequently want to ignore
| (which, I think, is the common case of this), you could define
| a view that selects all the columns except those and do your
| queries against that view.
| Eremotherium wrote:
| Or create a function that returns a view of a table minus
| named columns in situ if that's really a common case for you.
| Just as an idea.
| piaste wrote:
| Similarly, I'd love some form of GROUP BY
| every column except for <these>
|
| It feels silly when you are SELECTing a ton of columns, then
| you add a JOIN to a many-to-one relationship which you want to
| aggregate. Now you need to either make it a subquery (and hope
| the optimizer doesn't screw up) or duplicate all your SELECT
| _expression_ (not even the identifiers) into the GROUP BY.
| Svip wrote:
| You know, you can just write GROUP BY x, y, ...
| SELECT t.i+1, count(*) FROM table t GROUP BY 1
|
| 1 in this context means the first selected item (i.e. t.i+1).
| I know this works in PostgreSQL.
| edoceo wrote:
| I use this. It gets messy after like 10 columns. I have a
| trick in my ORM that copies the non-aggrigate columns into
| the statement. Maybe your ORM has it?
| baskethead wrote:
| Queries like this won't break but will silently fail upstream
| by missing data if column names change or are deleted. The
| explicit nature of SQL ensures that it will break positively
| which is a better type of failure in my opinion.
| [deleted]
| Groxx wrote:
| whether that's desirable or not depends on the query and
| what kinds of changes you make. though I could see it being
| error-prone in the most common cases.
| dragonwriter wrote:
| What I really want is a special phrase MINIMAL GROUPING such
| that GROUP BY MINIMAL GROUPING includes exactly the items
| that it would be an error _not_ to include in the GROUP BY.
| icambron wrote:
| I wish it just had a "group by all the stuff I selected
| without aggregation" shortcut
| xyzzy_plugh wrote:
| Isn't this just SELECT DISTINCT ...?
| orthoxerox wrote:
| No. What icambron and I would like is something that can
| get rid of this group by: select
| division_name, branch_name, dealer_id, dealer_name,
| quarter, month, sum(total_paid) from divsions,
| branches, dealers, transactions where .....
| --buncha joins group by division_name,
| branch_name, dealer_id, dealer_name, quarter, month
|
| If I didn't want to group the result by division_name,
| branch_name, dealer_id, dealer_name, quarter and month,
| why would I put them in the select clause?
| benjiweber wrote:
| "group by 1,2,3,4,5,6" is at least a little more consise.
| Sesse__ wrote:
| What if you do SELECT a + b, SUM(x) FROM t1? Should the
| implicit GROUP BY be on (a,b) or (a+b)?
| coderzach wrote:
| yeah, this is the right answer. Are there any examples
| where this isn't what you want? I guess GROUPING SETS and
| ROLLUP, but those could be special.
| oweiler wrote:
| Can't you use an alias in a group by?
| Sesse__ wrote:
| Not by the SQL standard, since projection (which creates
| aliases) happens after aggregation. There are many
| databases that allow it as an extension, though.
| blondin wrote:
| i would rather it written as SELECT * BUT or SELECT * EXCEPT or
| even SELECT ALL BUT.
|
| SQL has always been that language that is easy to read. even
| when you don't understand what the queries are doing. adding a
| cryptic syntax like "-column" would make it less readable.
| colinmhayes wrote:
| EXCEPT is already a sql keyword
| recursive wrote:
| The given syntax is obviously a no-go. It's already unary
| minus.
| polygotdomain wrote:
| What about SELECT * WITHOUT {columns_you_do_not_want} FROM...
|
| EXCEPT is already a keyword and has is used for set-based
| operations, so I don't think it's good to overload it.
| edgyquant wrote:
| I like this fwiw
| c06n wrote:
| R's data.table has that.
|
| dt[1:50, -c('col1', 'col14')]
| jdunstan wrote:
| That's very close to SELECT *, which has it's own dangers. I
| agree that it would be nice for exploration and testing, but
| probably should not be used in production.
| hnthrowaway0315 wrote:
| The problem I found out is that SELECT * LIMIT 10 is
| guaranteed to be needed so it's actually much more important
| than whatever is in production (you have to explore a lot
| before writing the production scripts). My idea is that
| exploration should be as easy as possible.
|
| Things such as SELECT * EXCEPT col1, col2 are really a PIA to
| write and can build up frustration level really quickly.
| Certain IDEs such as Datagrip ease the process by providing
| "macros" but they are not enough.
|
| Another thing is to generate useful boilerplates such as
| SELECT col1 FROM table GROUP BY col1 ORDER BY col1 to explore
| all unique values of col1.
| ipaddr wrote:
| What dangers lay in select *...? Too much data?
| wiredfool wrote:
| Minimal if you refer to columns by name. Risky if you rely
| on column ordering. Can be too much data and a performance
| issue if you have large columns that you're not using.
| (Though, it's not much different if you're using an orm
| that loads the whole object anyway), like Django or
| sqlalchemy.
|
| I've done it for many years (using named columns/
| dictionaries as the result set) and its never been an
| issue.
| nvartolomei wrote:
| Some have support for this.
|
| - BQ https://cloud.google.com/bigquery/docs/reference/standard-
| sq...
|
| - CH https://clickhouse.com/docs/en/sql-
| reference/statements/sele...
| hnthrowaway0315 wrote:
| Thanks, these really look neat. Unfortunately we only use
| Vertica :/
| dolmen wrote:
| > SELECT -col1, -col14 FROM table LIMIT 50;
|
| This is already valid SQL. Example: SELECT -col1, -col4 FROM
| (SELECT 1 AS col1, 2 AS col4) AS tbl;
|
| Do you think seriously that a new meaning could ever be
| attached to that syntax?
| bawolff wrote:
| Seems like minor syntactic sugar. I'm not opposed but really only
| seems mildly useful at best.
| pella wrote:
| related: pgsql-hackers thread and context
|
| https://www.postgresql.org/message-id/flat/1aec0dd0-dc27-40e...
| mi_lk wrote:
| tangent question - what're some effective ways to learn how to
| write SQL?
| jdunstan wrote:
| w3schools and the online documentation for any database are
| good places to start (Postgres and SQL Server have pretty good
| documentation).
|
| There are also easy-to-google SQL Puzzles, if you're looking
| for something more advanced.
|
| I would also recommend learning about query plans and how to
| read them, they're invaluable for query optimization.
| orthoxerox wrote:
| Basic SQL has lots of tutorials. Graduate from them to Markus
| Winand's use-the-index-luke.com and modern-sql.com
| Guest42 wrote:
| Is this not something that should be written at the app level
| rather than dml?
| Tostino wrote:
| Foreign keys in general? I am not sure I understand what you
| mean...
| Guest42 wrote:
| Not something I've ever thought about but I think that it
| could be done via macro,
|
| I have written tsql functions in c# and imagine that other
| dialects have analogous functionality.
|
| It seems as though the point is to cut down on sql code, it'd
| also be possible to query the foreign keys and create a data
| structure that could feed a function for joins.
|
| I haven't thought out the specifics but think this type of
| approach would be more practical than changing the sql
| standard.
| Tostino wrote:
| At that point you are just building a feature into an "ORM"
| of some sort. The entire point is to allow queries to be
| written in a more concise way for this simple equality join
| use case which is (total ballpark figure) ~half of the
| joins in my system.
| Ayesh wrote:
| It looked very useful and intuitive at first, but on further
| thought, I think the only time that you'd truly benefit from it
| if you SELECT *. For other SELECT queries with explicit field
| name list, you'd need to know the table the key constraint links
| to anyway.
|
| SQL Views serve perfectly well for queries encouraged by the
| schema itself, and i think they are more sophisticated and
| practical way.
| JoelJacobson wrote:
| There is similar comment in the thread, suggesting both tables
| should always be specified.
|
| Not sure what I prefer yet. The idea is the foreign key name is
| usually the same as the referenced table, so should be an
| infrequent problem.
| zmmmmm wrote:
| > The idea is the foreign key name is usually the same as the
| referenced table
|
| I think this is where you are hitting some of your turbulence
| here because lots of ORMs / schema management tools actually
| generate completely cryptic fk names (sometimes based on the
| hash of the columns or similar). Personally I think weighing
| in legacy baggage like that too highly is a bad thing as it
| creates enormous inertia.
| JoelJacobson wrote:
| Thanks for explaining, I have no experience of ORMs, always
| written my queries manually. This was insightful. Thanks
| also for the word "inertia", good one, will add to my
| vocabulary.
| [deleted]
| Smotko wrote:
| It might be just me, but I feel like remembering the foreign key
| name is more difficult than remembering the columns that you need
| in the ON clause. Especially since you can usually find the
| column names by just seeing the data in the table (select * from
| x) wheres seeing the foreign key names is much harder (show
| create table x?).
|
| Also, if you use an ORM it will usually generate foreign key
| names that are almost impossible to remember.
| exabrial wrote:
| I think this is an operator problem. You're using the wrong
| tool for the job.
|
| TablePlus, SequelAce, the official MySQL client all support
| cntrl-space autocompletion. I wish we used Postgres, but I
| imagine the landscape is the same. The big box databases like
| Oracle, DB2 undoubtedly having this tooling as well.
|
| That being said, here is our fk naming convention: `fk-
| asset_types->asset_categories` which pretty states what's going
| on and is easy to remember.
| dolmen wrote:
| SQL is not only written in an SQL client. SQL is also written
| (and read from) embedded/mixed in an other programming
| language were tooling is not always available.
|
| Having to know the names of foreign keys (in addition to the
| column names of the 2 tables) is adding more cognitive load.
| I don't think that is an improvement.
| JoelJacobson wrote:
| It would indeed be difficult to remember, but the proposal also
| suggest changing the default naming convention for foreign
| keys, to give them the same name as the referenced table.
|
| If using an ORM, I would guess this proposal isn't useful,
| since then you wouldn't hand-write queries anyway, right?
| Except when you want to override the queries generated by the
| ORM? (I'm not an ORM user myself.)
| to11mtm wrote:
| Speaking as someone who has used ORMs in the past and
| contributes to a LINQ Micro ORM...
|
| It might make tooling 'easier', but since backwards
| compatibility has to be considered the actual value add is
| questionable IMO.
|
| Most ORMs/MicroORMs will have tooling that sniffs out the DB
| Schema including foreign keys, and if you are using those
| bits (i.e. 'not hand written') most will do the right thing
| today. I suppose you could include some extra syntax for
| whatever DSL you're providing users....
|
| IDK. Speaking as someone who is very comfortable in SQL, This
| feels more like syntactic sugar than anything else.
| rtpg wrote:
| In a universe where foreign key index names are important we
| would specify better names.
|
| I think stuff like "documents_by_user" as foreign key names and
| explicit index usage would improve peoples awareness of how
| indices get used and would generally be a positive
| matteote wrote:
| This reminds me of KEY JOIN in Sybase which, unlike this syntax,
| does not require to specify the foreign key. I find Sybase's
| syntax to be too implicit, and challenging to port to other
| dialects; JOIN FOREIGN with explicit PK reference looks much
| better.
|
| I wonder how it is expected to work with non-table references
| (views, CTEs, subqueries), especially when the columns involved
| in the foreign key (on either side) are not returned explicitly
| by the referenced object.
| JoelJacobson wrote:
| > I wonder how it is expected to work with non-table references
| (views, CTEs, subqueries)
|
| It's not, it's only for the special but common case of joining
| two tables based on a foreign key.
| halayli wrote:
| It's a great proposition. However, I would suggest having it as a
| prefix rather than postfix of the JOIN expression to reduce
| implementation side effects and avoid ambiguity between keyword
| 'FOREIGN' and a table called 'FOREIGN'.
|
| example: FOREIGN LEFT JOIN
| Upitor wrote:
| In my opinion this proposal seems only to consider simple cases,
| but there are many not-so-simple relationsship types:
|
| Consider a 'sales' table which includes columns [time] and
| [sold_by_employee_id], and a periodized 'employee' table which
| includes columns [employee_id], [valid_from] and [valid_to]
| columns. There is a perfectly valid relationsship between the two
| tables, but you cant join them using only equal-statements (you
| need a between-statement as well)
| JoelJacobson wrote:
| Nice example! The join you describe would remain as a JOIN ON.
|
| This is per design. Quote from the proposal:
|
| "The idea is to improve the SQL language, specifically the join
| syntax, for the special but common case when joining on foreign
| key columns." ... "If the common simple joins (when joining on
| foreign key columns) would be written in a different syntax,
| the remaining joins would visually stand out and we could focus
| on making sure we understand them when reading a large SQL
| query."
|
| So, the special non-equal based join condition you describe,
| would become more visible, and stand out, allowing readers to
| pay more attention to it.
|
| The hypothesis is most joins are made on foreign key columns,
| so if we can improve such cases, a lot can be won.
| DevKoala wrote:
| But how could you accurately tell if some queries join on the
| foreign key, but were written by someone without knowledge of
| the new specification?
| Upitor wrote:
| I see. I should have read the article more carefully :-)
| Upitor wrote:
| Also, consider a 'sales' table with multiple references to a
| 'calendar' table: [shipped_date], [order_date], [received_date]
| JoelJacobson wrote:
| Good example too, but this one can with benefit be written
| using the JOIN FOREIGN syntax, you just need to give the
| foreign keys suitable names such as e.g. "shipped_date",
| "order_date", "received_date". Or, to remind you of which is
| the referenced table, perhaps you want to include it in the
| names, and the names would be "shipped_date_calendar",
| "order_date_calendar", "received_date_calendar", but
| personally I would prefer just "shipped_date" or perhaps even
| just "shipped".
| zoomablemind wrote:
| I wonder if primary objective of this proposal is to increase
| convenience and reduce the amount of typing that users need to
| do, when using SQL inline?
|
| If indeed that's the goal, then it targets a rather specific
| subset of users dealing with explorative/ad-hoc analysis on a
| database. Once such analysis is done, the queries would usually
| need to be formalized for robustness and to avoid ambiguities.
|
| Obviously, the whole train of queries would derail, should the FK
| (which is just an index) be dropped for one reason or the other.
|
| The existing JOIN features are explicit at least on the level of
| specified table structure. I believe, any constraint details in
| such context will be, well, ...foreign.
|
| Perhaps, a simple solution to verbosity problem may be to use an
| "intelligent" SQL client, which supports some form of
| autocomplete and which may as well internally use as many
| schema/data details as available.
|
| In anycase, thanks for making the proposal. I was not aware of
| JOIN ... USING syntax. I often wanted some convenient way of
| specifying homonymous join columns, as some schemes are
| consistent in such namings. So typing JOIN on col1, col2... would
| translate into equality joins between the listed tables. However,
| again, there is ambiguity here...
| abhchand wrote:
| This feels like adding syntactic sugar in some sense. That is, a
| layer of convenience to accomplish something that can already be
| accomplished today.
|
| It also breaks the following:
|
| * It's better to be explicit than infer * Keep specifications
| simple
| slt2021 wrote:
| not a good idea, constaints are one thing and joins are another
| thing. plus I can join on conditions other than =.
|
| the problem that author is trying to solve can be easily solved
| by a view:
|
| 1. Declare a view with all necessary JOINs once
|
| 2. select from view only what you need, aggregate what you want
|
| 3. Optimizer will throw out unnecessary stuff and optimize query
| while all JOIN logic will be declared only once and will be
| hidden inside the view
|
| plus each DBMS has its own flavor of SQL and will have its own
| query optimizer nuances when dealing with joins, especially
| nested via CTEs/views/lateral queries,etc.
| Sesse__ wrote:
| Your #3 is very optimistic. You can't just remove a join (be it
| inner, left, or really any other type) just because you didn't
| refer to any fields from one of the sides, and it's even harder
| when aggregation is in the mix. There are cases where you can
| without influencing the result, but they are special-case
| optimizations and not universally supported across databases.
| gigatexal wrote:
| I like this a lot. It makes thing cleaner and more like explicit.
| clintonb wrote:
| How is this more explicit than specifying join columns today?
| lolive wrote:
| Coming from the graph database / semantic web area, I would
| propose foreign key/ primary key relationships in the DB to be
| detailed _and named_ in a schema description, and then queries
| reference those relationships by name to define the needed joins.
| criticaltinker wrote:
| Yup totally agree - OP is an interesting proposal and has
| provoked a lot of quality commentary regarding the tradeoffs of
| clarity vs conciseness, implicit vs explicit, FK performance,
| etc.
|
| But IMO you've raised _the_ important long term consideration -
| do graph based schemas and query languages obviate the need to
| model foreign keys explicitly? If this JOIN FOREIGN proposal is
| an incremental step forward, what's the next big leap?
___________________________________________________________________
(page generated 2021-12-30 23:00 UTC)