[HN Gopher] Solving duplicate data with performant deduplication
       ___________________________________________________________________
        
       Solving duplicate data with performant deduplication
        
       Author : goodroot
       Score  : 9 points
       Date   : 2023-11-20 18:08 UTC (1 day ago)
        
 (HTM) web link (questdb.io)
 (TXT) w3m dump (questdb.io)
        
       | goodroot wrote:
       | Hey! Thanks for upvoting.
       | 
       | Happy to answer any questions about deduplication. One thing
       | that's not included in the write-up is that we also address out-
       | of-order indexing alongside deduplication.
        
         | CommanderHux wrote:
         | The dataset link seems to be dead. Do you have a mirror?
        
       | whalesalad wrote:
       | Can anyone comment on QuestDB vs ClickHouse vs TimescaleDB? Real-
       | world experience around ergonomics, ops, etc.
       | 
       | Currently using BigQuery for a lot of this (ingesting ~5-10TB
       | monthly) but would like to begin exploring in-house tooling.
       | 
       | On the flip side, we still use PSQL/RDS a lot and I enjoy it for
       | the low operations burden - but we're doing some time series
       | stuff with it now that is starting to fall over. TimescaleDB is
       | nice because it _is_ Postgres, but afaik it cannot run on RDS.
       | ClickHouse is next on my list for a test deployment, but QuestDB
       | looks pretty neat too.
        
         | gigatexal wrote:
         | What about Iceberg tables and a data lake approach on GCS,
         | then picking a query engine?
        
       ___________________________________________________________________
       (page generated 2023-11-21 23:00 UTC)