[HN Gopher] Demystifying the use of Parquet for time series
___________________________________________________________________
Demystifying the use of Parquet for time series
Author : francoismassot
Score : 39 points
Date : 2024-01-15 12:13 UTC (2 days ago)
(HTM) web link (blog.senx.io)
(TXT) w3m dump (blog.senx.io)
| stargrazer wrote:
| This reminds me of HDF5: even though the data is written/appended
| in row format, there is an API to chunk the data, organize it into
| columns, compress based on column regularities, and write to
| storage.
|
| On reading, the reverse happens.
|
| This becomes the compute/space conundrum: space is reduced with
| column-based regularity, but time is increased due to the extra
| overhead of columnar compression.
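The trade-off above can be sketched with the Python standard library alone. This is not HDF5's API: the sample readings, the delta encoding of timestamps, and the names `rows`, `row_bytes`, `col_bytes`, and `decode_columns` are all assumptions made for illustration.

```python
import struct
import zlib

# Hypothetical readings: (timestamp in ms, value), one sample per second.
rows = [(1_700_000_000_000 + i * 1000, 20.0 + (i % 10) * 0.5) for i in range(1000)]

# Row layout: timestamp and value interleaved, record after record.
row_bytes = b"".join(struct.pack("<qd", ts, v) for ts, v in rows)

# Column layout: all timestamps together (delta-encoded), then all values.
timestamps = [ts for ts, _ in rows]
deltas = [timestamps[0]] + [b - a for a, b in zip(timestamps, timestamps[1:])]
col_bytes = struct.pack(f"<{len(deltas)}q", *deltas) + struct.pack(
    f"<{len(rows)}d", *(v for _, v in rows)
)

row_compressed = zlib.compress(row_bytes)
col_compressed = zlib.compress(col_bytes)


def decode_columns(blob, n):
    """Reverse the write path: decompress, undo the deltas, rebuild rows."""
    raw = zlib.decompress(blob)
    deltas = struct.unpack_from(f"<{n}q", raw, 0)
    values = struct.unpack_from(f"<{n}d", raw, 8 * n)
    ts, out = 0, []
    for d, v in zip(deltas, values):
        ts += d
        out.append((ts, v))
    return out


# Compressed sizes in bytes: row layout first, column layout second.
print(len(row_compressed), len(col_compressed))
```

On this regular sample data the column layout compresses to fewer bytes, while `decode_columns` shows the extra work the read path takes on to get rows back.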
| speedgoose wrote:
| This seems to be an ad for a proprietary time series data format
| named HFiles that is locked behind a "contact us".
|
| Thanks but I will stay with Parquet for now.
| MrPowers wrote:
| Delta Lake solves a lot of the Parquet limitations mentioned in
| this post. Disclosure: I work on the Delta Lake project.
|
| Parquet files store metadata about row groups in the file footer.
| Delta Lake adds file-level metadata in the transaction log. So
| Delta Lake can perform file-level skipping before even opening
| any of the Parquet files to get the row-group metadata.
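File-level skipping of this kind can be sketched roughly as follows. This is not Delta Lake's actual transaction log format or API: the `FileEntry` class, the file names, and the `files_to_scan` helper are hypothetical, and the sketch assumes each log entry records a minimum and maximum timestamp per file.

```python
from dataclasses import dataclass


@dataclass
class FileEntry:
    """Hypothetical log entry: one data file plus its timestamp range."""
    path: str
    min_ts: int
    max_ts: int


# Hypothetical table-level log, consulted before any data file is opened.
log = [
    FileEntry("part-000.parquet", 0, 999),
    FileEntry("part-001.parquet", 1000, 1999),
    FileEntry("part-002.parquet", 2000, 2999),
]


def files_to_scan(entries, lo, hi):
    """Keep only files whose [min_ts, max_ts] range overlaps [lo, hi]."""
    return [e.path for e in entries if e.max_ts >= lo and e.min_ts <= hi]


print(files_to_scan(log, 1500, 2100))
# → ['part-001.parquet', 'part-002.parquet']
```

Only the files that survive this filter would then need their footers read for row-group statistics.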
|
| Delta Lake allows you to rearrange your data to improve file-
| skipping. You can Z Order by timestamp for time-series analyses.
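For readers unfamiliar with the term, the general idea of a Z-order (Morton) curve is to interleave the bits of several column values so that rows close in every dimension end up close in the sort order. The sketch below illustrates that idea only; Delta Lake's own implementation may differ, and `z_value` and the sample `points` are hypothetical.

```python
def z_value(x, y, bits=16):
    """Interleave the low `bits` bits of x and y (x in the even positions)."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


# Hypothetical rows: (timestamp bucket, sensor id).
points = [(3, 0), (0, 3), (1, 1), (2, 2), (0, 0)]
ordered = sorted(points, key=lambda p: z_value(*p))
print(ordered)
# → [(0, 0), (1, 1), (3, 0), (0, 3), (2, 2)]
```

Writing rows to files in this order tends to narrow each file's min/max range on both columns, which is what makes range-based file skipping effective.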
|
| Delta Lake also allows for schema evolution, so you can evolve
| the schema of your table over time.
|
| This company may have a cool file format, but is it closed
| source? It seems like enterprises don't want to be locked into
| closed formats anymore.
| Malcolmlisk wrote:
| Wow! I've been reading about Delta Lake for a while and I'm
| interested in the company. Is there a chance to send in a CV for
| remote work? (I am from Spain.)
|
| Schema evolution is something that came up in a water cooler
| conversation on my team the other day.
| adammarples wrote:
| Can you Z Order in Delta Lake? I thought that was one of the
| features Databricks had kept to themselves.
___________________________________________________________________
(page generated 2024-01-17 23:01 UTC)