[HN Gopher] Avoiding duplicate objects in Django querysets
       ___________________________________________________________________
        
       Avoiding duplicate objects in Django querysets
        
       Author : johnnymetz
       Score  : 27 points
       Date   : 2026-01-23 16:43 UTC (4 days ago)
        
 (HTM) web link (johnnymetz.com)
 (TXT) w3m dump (johnnymetz.com)
        
       | augusteo wrote:
       | Nice writeup. The Exists subquery approach is definitely the
       | cleanest.
       | 
       | One thing worth mentioning: if you're hitting this problem
       | frequently, it might be worth reconsidering the query patterns
       | themselves. We had a similar issue at work where we kept adding
       | `.distinct()` everywhere, and eventually realized we were doing
       | the filtering wrong upstream.
       | 
       | The PostgreSQL-specific `distinct(*fields)` with the ORDER BY
       | restriction is one of those things that trips people up. The
       | error message isn't great either. "SELECT DISTINCT ON expressions
       | must match initial ORDER BY expressions" is technically correct
       | but doesn't explain why or what to do about it.
       | 
       | Good call recommending Exists as the default approach. It's more
       | explicit about intent too.
        
       | ducdetronquito wrote:
       | Good read, TIL!
       | 
       | That being said, I use Django daily for 10 years but I don't
       | understand the ORM besides basic CRUD. Even a simple group by
       | looks weird.
       | 
       | Writing plain SQL feels easier and more maintainable in the long
       | run.
        
         | storystarling wrote:
         | I've mostly switched to raw SQL for these kinds of queries too.
         | It seems like the maintenance burden is actually lower when you
         | can see the explicit query plan, rather than trying to reverse
         | engineer what the ORM is doing with the joins.
        
       | jiaaro wrote:
       | Exists is a useful tool that you should certainly know how to
       | use. Whether or not it's faster than distinct depends on the rest
       | of the query. I've optimized queries where distinct is faster
       | than exists. It's been some time, but I think it boils down to
       | the relative sizes of the tables and how many of the exists
       | queries actually find something (and how often they find more
       | than one something).
       | 
       | Also, some databases (like clickhouse) allow for `any` joins
       | which avoid producing duplicate rows. For example:
       | select author.*         from author         inner any join book
       | on (             book.author_id = author.id              and
       | book.title like 'Book%'         )
        
       | tecoholic wrote:
       | Nice write up showcasing Exists. I would say, if ORM abstraction
       | "distinct()" is a performance issue, then it's probably time to
       | switch to SQL. I find it simpler to either use the ORM or the SQL
       | than to bend ORM into SQL.
        
         | sgarland wrote:
         | The ORM isn't the performance issue here, it's the DB. DISTINCT
         | is a form of GROUP BY, and so it brings with it the various
         | limitations imposed by the RDBMS. For example, look at what
         | MySQL requires to use an index to perform a GROUP BY.
         | 
         | 0: https://dev.mysql.com/doc/refman/8.4/en/group-by-
         | optimizatio...
        
           | tecoholic wrote:
           | Ah! I see. Thank you.
        
       ___________________________________________________________________
       (page generated 2026-01-28 07:01 UTC)