[HN Gopher] SQL Join Flavors
___________________________________________________________________
SQL Join Flavors
Author : nalgeon
Score : 33 points
Date : 2023-09-20 12:09 UTC (1 days ago)
(HTM) web link (antonz.org)
(TXT) w3m dump (antonz.org)
| hatthew wrote:
| This is a nice overview, but am I the only one who found the
| graphic more confusing than helpful? I can decipher its meaning
| by using my knowledge of joins, but it in no way improves my
| understanding of joins.
| aidos wrote:
| I've done a lot of interviewing and one thing I've noticed is
| that many people are pretty confused about how joins work.
|
| They think the foreign keys put some sort of link on a parent
| table to let it access rows of a child table as if they were an
| array.
|
| There's also a massive misunderstanding around ordering where a
| lot of people think that by reordering the joins you can control
| the order in which the db is going to execute the query.
| paulddraper wrote:
| Maybe someday SQL will let us do ANTI JOIN
|
| A man can dream
| aidos wrote:
| I always enjoy how the planner explicitly calls it an anti join
| but for those unfamiliar with the tricks the sql looks pretty
| weird on first glance.
|
| Traditionally I've always done it as left join null but I saw a
| case a while back where the not exists version performed better
| (Postgres). I feel like I must have missed something because I
| thought they optimised to the same thing (maybe there were
| other criteria in the subquery that allowed it to use a better
| index).
| Supply5411 wrote:
| Having a deeper understanding of the different JOIN flavors is
| critical. For example, I don't think many people realize how
| deceptively dangerous OUTER joins are: SELECT
| user.user_id FROM users RIGHT JOIN purchases
| ON purchases.user_id = user.user_id AND user.user_id=123
|
| By leaving the user_id=123 constraint in the JOIN instead of
| putting it in the WHERE, you've just exposed everyone's purchase
| data to the user. Easy to miss too if your tests don't fully
| create fixtures for multiple users. This becomes even more
| dangerous as we're entering the world of LLMs generating SQL[1]
|
| 1. https://docs.heimdallm.ai/en/main/attack-
| surface/sql.html#ou...
| bob1029 wrote:
| > This becomes even more dangerous as we're entering the world
| of LLMs generating SQL[1]
|
| I am beginning to push back against the idea of LLMs authoring
| SQL. Determinism is really important for _most_ realistic use
| cases. SQL is one of those "all or nothing" experiences.
| Pretty much the last thing you want to have a
| temperature/spiciness slider over.
|
| Granted, if your use case is to simply memorize a handful of
| canonical queries and to provide a UX-friendly way for business
| administrators to retrieve these queries, then have at it.
|
| But, if you are hoping to ask the LLM to write novel,
| sophisticated, _domain-specific_ queries involving joins across
| 10+ tables, aggregations, recursion, windowing, etc., you are
| _going to have a very bad time_. In my experience, this is what
| everyone is dreaming about doing, and I think it is an
| unproductive fantasy at this point. Fine tuning will not give
| you what you seek, but don 't let me stop you from trying. We
| burned over 200 hours on it with nothing to show for our time.
|
| For me, LLMs are far more interesting for determining the
| user's intentions than emitting a final query. Classification
| seems like the actual superpower here. Classification can feed
| a very powerful deterministic query building engine that will
| actually give you what you want and you can prove it will work.
| It just takes a lot more tedious work than most humans are
| willing to endure, so here we are talking about various
| shortcuts.
| Supply5411 wrote:
| >We burned over 200 hours on it with nothing to show for our
| time.
|
| Can you talk more about this? I'm working in this space now,
| but not from the "make an LLM give you the right query"
| perspective, but from the "make the LLM-produced query be
| safe" perspective[1]
|
| 1. https://docs.heimdallm.ai/en/main/blog/posts/safe-sql-
| execut...
| totalhack wrote:
| How about an LLM talking to a semantic layer that then writes
| the SQL? I added experimental support for that in a project
| of mine, but using a semantic layer makes most querying so
| simple that natural language based approaches just seem kinda
| silly unless you have interface constraints (like on a mobile
| device).
|
| https://github.com/totalhack/zillion
| chasil wrote:
| SQLite doesn't support right joins, which is a virtue in this
| case?
|
| The article really ought to filter everything through SQLite,
| for how pervasive it is.
| bewaretheirs wrote:
| Not any more. SQLite received support for RIGHT and FULL
| OUTER joins in 3.39.0, released over a year ago.
| jumpman500 wrote:
| I mean yes writing the wrong query will give you the wrong
| results. But there's nothing inherently risky with a
| left/right/full join.
|
| I could make the same argument about inner joins giving the
| risk of dropping records you intended to keep (very common in
| analytics). Just have to know what you're doing and what data
| you got.
| Supply5411 wrote:
| Outer joins are inherently risky because they make it very
| easy to put conditions in a place that looks correct (read:
| passes review) but isn't, and the results are disastrous
| (information leak).
| djoletina wrote:
| I disagree, as the previous commenter said it's not the
| tool, it's the user and if you don't know what you're doing
| you're prone to doing damage. Non trivial amounts of auth
| are made by "junior" engineers. I find that a lot more
| disastrous than a potential outer join + where accident
| *that also requires* the application code to leak the data
| to the outside world to be damaging *and* the underlying
| data to be sensitive.
| Supply5411 wrote:
| So tools carry the same level of risk if the only way
| they can do harm is user error, even if that user error
| is easier to miss by design? I guess I just fundamentally
| disagree because I think risk analysis is holistic. Is
| the tool fundamentally correct (free of bugs) and is it
| easier to misuse.
|
| I'm going to look at an outer join FAR more critically,
| regardless of who wrote it, because it's easy to mess up
| the conditions.
___________________________________________________________________
(page generated 2023-09-21 23:01 UTC)