[HN Gopher] Solving duplicate data with performant deduplication
___________________________________________________________________
Solving duplicate data with performant deduplication
Author : goodroot
Score : 9 points
Date : 2023-11-20 18:08 UTC (1 day ago)
(HTM) web link (questdb.io)
(TXT) w3m dump (questdb.io)
| goodroot wrote:
| Hey! Thanks for upvoting.
|
| Happy to answer any questions about deduplication. One thing
| that's not included in the write-up is that we also address out-
| of-order indexing alongside deduplication.
| CommanderHux wrote:
| The dataset link seems to be dead. Do you have a mirror?
| whalesalad wrote:
| Can anyone comment on QuestDB vs ClickHouse vs TimescaleDB?
| Real-world experience around ergonomics, ops, etc.
|
| Currently using BigQuery for a lot of this (ingesting ~5-10TB
| monthly) but would like to begin exploring in-house tooling.
|
| On the flip side, we still use PSQL/RDS a lot and I enjoy it for
| the low operations burden - but we're doing some time series
| stuff with it now that is starting to fall over. TimescaleDB is
| nice because it _is_ Postgres, but afaik it cannot work inside RDS.
| ClickHouse is next on my list for a test deployment, but QuestDB
| looks pretty neat too.
| gigatexal wrote:
| What about Iceberg tables and a lake approach on GCS and then
| picking a querying engine?
___________________________________________________________________
(page generated 2023-11-21 23:00 UTC)