[HN Gopher] Nulls: Revisiting null representation in modern colu...
___________________________________________________________________
Nulls: Revisiting null representation in modern columnar formats
Author : tosh
Score : 53 points
Date : 2024-10-30 20:26 UTC (7 days ago)
(HTM) web link (dl.acm.org)
(TXT) w3m dump (dl.acm.org)
| zX41ZdbW wrote:
| How did it go through peer review without a comparison with
| ClickHouse?
|
| > Our analysis shows that the Compact layout performs better when
| Null ratio is high and the Placeholder layout is better when the
| Null ratio is low or the data is serial-correlated.
|
| ClickHouse uses a placeholder value plus a separate stream of
| NULL masks. It also has the Sparse column format, which the paper
| calls Compact (although the Sparse format is currently used to
| encode default values more efficiently, not NULL values).
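
To make the two layouts named in the comment above concrete, here is a minimal sketch in Python. It assumes a column is a plain list with `None` for NULL and that the null mask is a list of booleans. The function names and the default placeholder of `0` are hypothetical, chosen for illustration; this is not ClickHouse's or the paper's actual encoding.

```python
from typing import List, Optional, Tuple

Column = List[Optional[int]]


def encode_placeholder(values: Column, placeholder: int = 0) -> Tuple[List[int], List[bool]]:
    """Placeholder layout: one data slot per row, NULL rows hold a dummy value."""
    mask = [v is None for v in values]  # True marks a NULL row
    data = [placeholder if v is None else v for v in values]
    return data, mask


def encode_compact(values: Column) -> Tuple[List[int], List[bool]]:
    """Compact layout: only non-NULL values are stored; the mask gives positions."""
    mask = [v is None for v in values]
    data = [v for v in values if v is not None]
    return data, mask


def decode_placeholder(data: List[int], mask: List[bool]) -> Column:
    # Row i maps directly to data[i], so random access is trivial.
    return [None if is_null else v for v, is_null in zip(data, mask)]


def decode_compact(data: List[int], mask: List[bool]) -> Column:
    # Each non-NULL row consumes the next stored value, in order.
    remaining = iter(data)
    return [None if is_null else next(remaining) for is_null in mask]


column: Column = [7, None, None, 3, None]
placeholder_data, placeholder_mask = encode_placeholder(column)
compact_data, compact_mask = encode_compact(column)
```

With this column, the placeholder layout stores five data slots and the compact layout stores two, which is the space saving the quoted sentence attributes to high null ratios. The trade-off is that the compact layout loses the direct row-to-slot mapping.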
| jnordwick wrote:
| kdb+ isn't in there either, and that is more important than CH
| I think. KDB is boring: it just uses a placeholder. They may have
| been left out because both do it in a boring fashion.
| mhuffman wrote:
| >and that is more important than CH I think.
|
| If measured by $$$$$, ClickHouse certainly has more
| installations.
| jnordwick wrote:
| CH definitely has more installations. Not sure which pulls in
| more revenue. KDB installations will run you $250,000/yr on
| the low end for just the software license. Not sure how that
| compares.
| xyzzy_plugh wrote:
| Is there documentation for the ClickHouse native binary format
| somewhere? Parquet and ORC are standalone formats. This is a
| strange comparison to demand.
|
| The paper is addressing the abstract techniques and is not a
| benchmark of various implementations. It seems to me that
| ClickHouse's design is already represented.
|
| You're the CTO of ClickHouse. How's your relationship with
| Pavlo and McKinney?
| mwexler wrote:
| While the other authors are from Tsinghua University, the two
| more recognizable names are Wes McKinney of Pandas and Apache
| Arrow fame and Andy Pavlo at CMU, who has done some fun work on
| columnar stores and database optimization.
|
| Always fun to see the mix of authors globally linking up.
___________________________________________________________________
(page generated 2024-11-06 23:02 UTC)