[HN Gopher] Nulls: Revisiting null representation in modern colu...
       ___________________________________________________________________
        
       Nulls: Revisiting null representation in modern columnar formats
        
       Author : tosh
       Score  : 53 points
       Date   : 2024-10-30 20:26 UTC (7 days ago)
        
 (HTM) web link (dl.acm.org)
 (TXT) w3m dump (dl.acm.org)
        
       | zX41ZdbW wrote:
       | How did it go through peer review without a comparison with
       | ClickHouse?
       | 
       | > Our analysis shows that the Compact layout performs better when
       | Null ratio is high and the Placeholder layout is better when the
       | Null ratio is low or the data is serial-correlated.
       | 
       | ClickHouse uses a placeholder value plus a separate stream of
       | NULL masks. It also has the Sparse column format, which the
       | paper calls Compact (although the Sparse format is currently
       | used to encode default values more efficiently, not NULL
       | values).
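
For readers unfamiliar with the two layouts discussed above, here is a minimal sketch in Python. It is not ClickHouse's, Parquet's, or the paper's actual on-disk format. The function names and the `PLACEHOLDER` filler value are hypothetical and chosen for illustration only. Both layouts keep a null mask. The Placeholder layout also keeps one slot per row, while the Compact layout stores only the non-null values.

```python
from typing import List, Optional, Tuple

# Assumed filler written into null slots in the Placeholder layout.
PLACEHOLDER = 0

Column = List[Optional[int]]


def encode_placeholder(values: Column) -> Tuple[List[int], List[bool]]:
    """Placeholder layout: one data slot per row, plus a null mask."""
    mask = [v is None for v in values]
    data = [PLACEHOLDER if v is None else v for v in values]
    return data, mask


def encode_compact(values: Column) -> Tuple[List[int], List[bool]]:
    """Compact layout: only non-null values are stored, plus a null mask."""
    mask = [v is None for v in values]
    data = [v for v in values if v is not None]
    return data, mask


def decode_placeholder(data: List[int], mask: List[bool]) -> Column:
    """Row i is read directly from slot i; the mask says whether it is null."""
    return [None if is_null else v for v, is_null in zip(data, mask)]


def decode_compact(data: List[int], mask: List[bool]) -> Column:
    """Non-null rows consume stored values in order; null rows consume none."""
    stored = iter(data)
    return [None if is_null else next(stored) for is_null in mask]


column: Column = [7, None, None, 3]
placeholder_data, placeholder_mask = encode_placeholder(column)
compact_data, compact_mask = encode_compact(column)
```

In this sketch the Placeholder layout allows positional access (row i lives in slot i) at the cost of storing a filler for every null. The Compact layout stores fewer values when nulls are frequent but needs the mask to map a row to its stored position.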
        
         | jnordwick wrote:
         | kdb+ isn't in there either, and that is more important than CH
         | I think. KDB is boring: it just uses a placeholder. Both may
         | have been left out because they do it in a boring fashion.
        
           | mhuffman wrote:
           | > and that is more important than CH I think.
           | 
           | If measured by $$$$$, ClickHouse certainly has more
           | installations.
        
             | jnordwick wrote:
             | CH definitely has more installations. Not sure which
             | pulls in more revenue. KDB installations will run you
             | $250,000/yr on the low end for just the software license.
             | Not sure how that compares.
        
         | xyzzy_plugh wrote:
         | Is there documentation for the ClickHouse native binary format
         | somewhere? Parquet and ORC are standalone formats. This is a
         | strange comparison to demand.
         | 
         | The paper is addressing the abstract techniques and is not a
         | benchmark of various implementations. It seems to me that
         | ClickHouse's design is already represented.
         | 
         | You're the CTO of ClickHouse. How's your relationship with
         | Pavlo and McKinney?
        
       | mwexler wrote:
       | While the other authors are from Tsinghua University, the two
       | more recognizable names are Wes McKinney, of Pandas and Apache
       | Arrow fame, and Andy Pavlo at CMU, who has done some fun work on
       | columnar stores and database optimization.
       | 
       | Always fun to see the mix of authors globally linking up.
        
       ___________________________________________________________________
       (page generated 2024-11-06 23:02 UTC)