[HN Gopher] Avoiding duplicate objects in Django querysets
___________________________________________________________________
Avoiding duplicate objects in Django querysets
Author : johnnymetz
Score : 27 points
Date : 2026-01-23 16:43 UTC (4 days ago)
(HTM) web link (johnnymetz.com)
(TXT) w3m dump (johnnymetz.com)
| augusteo wrote:
| Nice writeup. The Exists subquery approach is definitely the
| cleanest.
|
| One thing worth mentioning: if you're hitting this problem
| frequently, it might be worth reconsidering the query patterns
| themselves. We had a similar issue at work where we kept adding
| `.distinct()` everywhere, and eventually realized we were doing
| the filtering wrong upstream.
|
| The PostgreSQL-specific `distinct(*fields)` with the ORDER BY
| restriction is one of those things that trips people up. The
| error message isn't great either. "SELECT DISTINCT ON expressions
| must match initial ORDER BY expressions" is technically correct
| but doesn't explain why or what to do about it.
|
| Good call recommending Exists as the default approach. It's more
| explicit about intent too.
| ducdetronquito wrote:
| Good read, TIL!
|
| That being said, I use Django daily for 10 years but I don't
| understand the ORM besides basic CRUD. Even a simple group by
| looks weird.
|
| Writing plain SQL feels easier and more maintainable in the long
| run.
| storystarling wrote:
| I've mostly switched to raw SQL for these kinds of queries too.
| It seems like the maintenance burden is actually lower when you
| can see the explicit query plan, rather than trying to reverse
| engineer what the ORM is doing with the joins.
| jiaaro wrote:
| Exists is a useful tool that you should certainly know how to
| use. Whether or not it's faster than distinct depends on the rest
| of the query. I've optimized queries where distinct is faster
| than exists. It's been some time, but I think it boils down to
| the relative sizes of the tables and how many of the exists
| queries actually find something (and how often they find more
| than one something).
|
| Also, some databases (like clickhouse) allow for `any` joins
| which avoid producing duplicate rows. For example:
| select author.* from author inner any join book
| on ( book.author_id = author.id and
| book.title like 'Book%' )
| tecoholic wrote:
| Nice write up showcasing Exists. I would say, if ORM abstraction
| "distinct()" is a performance issue, then it's probably time to
| switch to SQL. I find it simpler to either use the ORM or the SQL
| than to bend ORM into SQL.
| sgarland wrote:
| The ORM isn't the performance issue here, it's the DB. DISTINCT
| is a form of GROUP BY, and so it brings with it the various
| limitations imposed by the RDBMS. For example, look at what
| MySQL requires to use an index to perform a GROUP BY.
|
| 0: https://dev.mysql.com/doc/refman/8.4/en/group-by-
| optimizatio...
| tecoholic wrote:
| Ah! I see. Thank you.
___________________________________________________________________
(page generated 2026-01-28 07:01 UTC)