[HN Gopher] Fast Counting with PostgreSQL and Haskell
___________________________________________________________________
Fast Counting with PostgreSQL and Haskell
Author : EntICOnc
Score : 23 points
Date : 2021-12-26 09:44 UTC (13 hours ago)
(HTM) web link (jezenthomas.com)
(TXT) w3m dump (jezenthomas.com)
| kjeetgill wrote:
| > Some clever abuse of the query planner output quickly gives us
| an estimate.
|
 | tl;dr: They estimate counts by parsing the planner's row
 | estimate out of the query plan, but fall back to an exact count
 | if the estimate is under a threshold.
|
 | Clever! But it makes me nervous. I was expecting it to involve
 | HyperLogLog (HLL) or Count-Min Sketch or something.
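 |
 | A minimal sketch of that pattern in PL/pgSQL (this mirrors the
 | well-known count_estimate function from the PostgreSQL wiki,
 | not necessarily the author's exact Haskell code; the table name
 | and threshold below are made up):
 |
 |     CREATE OR REPLACE FUNCTION count_estimate(query text)
 |     RETURNS integer AS $$
 |     DECLARE
 |       rec  record;
 |       rows integer;
 |     BEGIN
 |       -- EXPLAIN plans the query without executing it; scrape
 |       -- the planner's row estimate out of the plan text.
 |       FOR rec IN EXECUTE 'EXPLAIN ' || query LOOP
 |         rows := substring(rec."QUERY PLAN"
 |                           FROM ' rows=([[:digit:]]+)');
 |         EXIT WHEN rows IS NOT NULL;
 |       END LOOP;
 |       RETURN rows;
 |     END;
 |     $$ LANGUAGE plpgsql VOLATILE STRICT;
 |
 |     -- Only pay for an exact count when the estimate says it's
 |     -- cheap; 5000 is an arbitrary threshold:
 |     SELECT CASE
 |       WHEN count_estimate('SELECT 1 FROM items') < 5000
 |         THEN (SELECT count(*) FROM items)
 |       ELSE count_estimate('SELECT 1 FROM items')
 |     END;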
| KarlKemp wrote:
 | This is the ugliest hack I've seen since I wrote a PHP script
 | that would output a Perl script that would create an Excel file.
|
 | I have nothing but respect for the author, but if this is
 | needed, maybe Postgres needs to invest some work here?
| Zababa wrote:
 | How is that an ugly hack? That looks like a reasonable solution
 | to a reasonable problem. Counting millions of things with user-
 | defined filters is a genuinely hard problem.
| Rezwoodly wrote:
| Why can't he just use an index?
| WJW wrote:
 | If I understand correctly, it's because they want to support
 | arbitrary queries, and a single index would help certain
 | queries but not others.
| DylanSp wrote:
| Given that the users can apply arbitrary filters, what exactly
| would you index?
| Rezwoodly wrote:
 | Surely you could use compound indexing; depending on the size
 | of the set of fields to be indexed, it should be achievable.
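 |
 | For illustration, here's roughly what that buys you
 | (hypothetical schema). A compound index helps filters that
 | constrain its leading column, but with arbitrary user-defined
 | filters the combinations quickly outrun any fixed set of
 | indexes:
 |
 |     CREATE INDEX items_status_created_idx
 |       ON items (status, created_at);
 |
 |     -- Can use the index (leading column is constrained):
 |     SELECT count(*) FROM items
 |     WHERE status = 'active'
 |       AND created_at > now() - interval '30 days';
 |
 |     -- Leading column absent; the planner will likely fall
 |     -- back to a scan:
 |     SELECT count(*) FROM items
 |     WHERE created_at > now() - interval '30 days';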
| amichal wrote:
 | Indexes and compound indexes tend to become slow themselves
 | when you end up indexing a significant fraction of the
 | columns, or when your result sets are a significant portion
 | of the total row count. They cost extra time and space to
 | update on write. If your database has concurrent writes going
 | on, Postgres still needs to verify on read which row versions
 | are visible to the transaction. The additional pressure on
 | memory and disk means you have a bigger working set. The
 | query planner has a combinatorial number of indexes to choose
 | from, so the planning stage slows down. And you have to spend
 | even more time on ANALYZE to keep the estimates (which power
 | this technique) accurate. At tens of millions of rows with
 | more than a few low-cardinality columns in the filter, it's
 | sometimes faster to just get as much of the table as you can
 | into RAM and accept the scan.
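 |
 | Keeping those estimates honest is itself routine work;
 | something like the following, with made-up table and column
 | names:
 |
 |     -- Refresh the planner's statistics for the table:
 |     ANALYZE items;
 |
 |     -- Sample a heavily filtered, skewed column more
 |     -- aggressively, then re-analyze:
 |     ALTER TABLE items ALTER COLUMN status SET STATISTICS 1000;
 |     ANALYZE items;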
___________________________________________________________________
(page generated 2021-12-26 23:01 UTC)