[HN Gopher] SQL Join Flavors
       ___________________________________________________________________
        
       SQL Join Flavors
        
       Author : nalgeon
       Score  : 33 points
       Date   : 2023-09-20 12:09 UTC (1 days ago)
        
 (HTM) web link (antonz.org)
 (TXT) w3m dump (antonz.org)
        
       | hatthew wrote:
       | This is a nice overview, but am I the only one who found the
       | graphic more confusing than helpful? I can decipher its meaning
       | by using my knowledge of joins, but it in no way improves my
       | understanding of joins.
        
       | aidos wrote:
       | I've done a lot of interviewing and one thing I've noticed is
       | that many people are pretty confused about how joins work.
       | 
       | They think the foreign keys put some sort of link on a parent
       | table to let it access rows of a child table as if they were an
       | array.
       | 
       | There's also a massive misunderstanding around ordering where a
       | lot of people think that by reordering the joins you can control
       | the order in which the db is going to execute the query.
        
       | paulddraper wrote:
       | Maybe someday SQL will let us do ANTI JOIN
       | 
       | A man can dream
        
         | aidos wrote:
         | I always enjoy how the planner explicitly calls it an anti join
         | but for those unfamiliar with the tricks the sql looks pretty
         | weird on first glance.
         | 
         | Traditionally I've always done it as left join null but I saw a
         | case a while back where the not exists version performed better
         | (Postgres). I feel like I must have missed something because I
         | thought they optimised to the same thing (maybe there were
         | other criteria in the subquery that allowed it to use a better
         | index).
        
       | Supply5411 wrote:
       | Having a deeper understanding of the different JOIN flavors is
       | critical. For example, I don't think many people realize how
       | deceptively dangerous OUTER joins are:                 SELECT
       | user.user_id       FROM users       RIGHT JOIN purchases
       | ON purchases.user_id = user.user_id         AND user.user_id=123
       | 
       | By leaving the user_id=123 constraint in the JOIN instead of
       | putting it in the WHERE, you've just exposed everyone's purchase
       | data to the user. Easy to miss too if your tests don't fully
       | create fixtures for multiple users. This becomes even more
       | dangerous as we're entering the world of LLMs generating SQL[1]
       | 
       | 1. https://docs.heimdallm.ai/en/main/attack-
       | surface/sql.html#ou...
        
         | bob1029 wrote:
         | > This becomes even more dangerous as we're entering the world
         | of LLMs generating SQL[1]
         | 
         | I am beginning to push back against the idea of LLMs authoring
         | SQL. Determinism is really important for _most_ realistic use
         | cases. SQL is one of those  "all or nothing" experiences.
         | Pretty much the last thing you want to have a
         | temperature/spiciness slider over.
         | 
         | Granted, if your use case is to simply memorize a handful of
         | canonical queries and to provide a UX-friendly way for business
         | administrators to retrieve these queries, then have at it.
         | 
         | But, if you are hoping to ask the LLM to write novel,
         | sophisticated, _domain-specific_ queries involving joins across
         | 10+ tables, aggregations, recursion, windowing, etc., you are
         | _going to have a very bad time_. In my experience, this is what
         | everyone is dreaming about doing, and I think it is an
         | unproductive fantasy at this point. Fine tuning will not give
         | you what you seek, but don 't let me stop you from trying. We
         | burned over 200 hours on it with nothing to show for our time.
         | 
         | For me, LLMs are far more interesting for determining the
         | user's intentions than emitting a final query. Classification
         | seems like the actual superpower here. Classification can feed
         | a very powerful deterministic query building engine that will
         | actually give you what you want and you can prove it will work.
         | It just takes a lot more tedious work than most humans are
         | willing to endure, so here we are talking about various
         | shortcuts.
        
           | Supply5411 wrote:
           | >We burned over 200 hours on it with nothing to show for our
           | time.
           | 
           | Can you talk more about this? I'm working in this space now,
           | but not from the "make an LLM give you the right query"
           | perspective, but from the "make the LLM-produced query be
           | safe" perspective[1]
           | 
           | 1. https://docs.heimdallm.ai/en/main/blog/posts/safe-sql-
           | execut...
        
           | totalhack wrote:
           | How about an LLM talking to a semantic layer that then writes
           | the SQL? I added experimental support for that in a project
           | of mine, but using a semantic layer makes most querying so
           | simple that natural language based approaches just seem kinda
           | silly unless you have interface constraints (like on a mobile
           | device).
           | 
           | https://github.com/totalhack/zillion
        
         | chasil wrote:
         | SQLite doesn't support right joins, which is a virtue in this
         | case?
         | 
         | The article really ought to filter everything through SQLite,
         | for how pervasive it is.
        
           | bewaretheirs wrote:
           | Not any more. SQLite received support for RIGHT and FULL
           | OUTER joins in 3.39.0, released over a year ago.
        
         | jumpman500 wrote:
         | I mean yes writing the wrong query will give you the wrong
         | results. But there's nothing inherently risky with a
         | left/right/full join.
         | 
         | I could make the same argument about inner joins giving the
         | risk of dropping records you intended to keep (very common in
         | analytics). Just have to know what you're doing and what data
         | you got.
        
           | Supply5411 wrote:
           | Outer joins are inherently risky because they make it very
           | easy to put conditions in a place that looks correct (read:
           | passes review) but isn't, and the results are disastrous
           | (information leak).
        
             | djoletina wrote:
             | I disagree, as the previous commenter said it's not the
             | tool, it's the user and if you don't know what you're doing
             | you're prone to doing damage. Non trivial amounts of auth
             | are made by "junior" engineers. I find that a lot more
             | disastrous than a potential outer join + where accident
             | *that also requires* the application code to leak the data
             | to the outside world to be damaging *and* the underlying
             | data to be sensitive.
        
               | Supply5411 wrote:
               | So tools carry the same level of risk if the only way
               | they can do harm is user error, even if that user error
               | is easier to miss by design? I guess I just fundamentally
               | disagree because I think risk analysis is holistic. Is
               | the tool fundamentally correct (free of bugs) and is it
               | easier to misuse.
               | 
               | I'm going to look at an outer join FAR more critically,
               | regardless of who wrote it, because it's easy to mess up
               | the conditions.
        
       ___________________________________________________________________
       (page generated 2023-09-21 23:01 UTC)