[HN Gopher] Demystifying the use of Parquet for time series
       ___________________________________________________________________
        
       Demystifying the use of Parquet for time series
        
       Author : francoismassot
       Score  : 39 points
       Date   : 2024-01-15 12:13 UTC (2 days ago)
        
 (HTM) web link (blog.senx.io)
 (TXT) w3m dump (blog.senx.io)
        
       | stargrazer wrote:
       | This reminds me of HDF5: even though the data is
       | written/appended in row format, there is an API to chunk the
       | data, organize it into columns, compress based on column
       | regularities, and write to storage.
       | 
       | On reading, the reverse happens.
       | 
       | This becomes the compute/space conundrum: space is reduced with
       | column-based regularity, but time is increased due to the extra
       | overhead of columnar compression.
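
A minimal sketch of the row-versus-column trade-off described above,
using only the Python standard library rather than HDF5 itself. The
sample readings, field layout, and function names are assumptions made
for illustration. It prints both compressed sizes so the space claim can
be checked on your own data rather than taken on faith.

```python
import struct
import zlib

# Hypothetical sample: (timestamp, sensor_id, value) readings, one per second.
rows = [(1_700_000_000 + i, 7, 20.0 + (i % 10) * 0.5) for i in range(10_000)]


def pack_rows(rows):
    """Row layout: the fields of each record are stored next to each other."""
    return b"".join(struct.pack("<qid", ts, sid, val) for ts, sid, val in rows)


def pack_columns(rows):
    """Column layout: all timestamps, then all sensor ids, then all values."""
    n = len(rows)
    ts, sid, val = zip(*rows)
    return (
        struct.pack(f"<{n}q", *ts)
        + struct.pack(f"<{n}i", *sid)
        + struct.pack(f"<{n}d", *val)
    )


def unpack_columns(buf, n):
    """Reverse of pack_columns: rebuild the row tuples on read."""
    ts = struct.unpack_from(f"<{n}q", buf, 0)
    sid = struct.unpack_from(f"<{n}i", buf, 8 * n)   # ids start after n int64s
    val = struct.unpack_from(f"<{n}d", buf, 12 * n)  # values start after the ids
    return list(zip(ts, sid, val))


row_bytes = pack_rows(rows)
col_bytes = pack_columns(rows)

# Same raw size either way; only the arrangement differs before compression.
print("raw bytes:               ", len(row_bytes), len(col_bytes))
print("row layout compressed:   ", len(zlib.compress(row_bytes)))
print("column layout compressed:", len(zlib.compress(col_bytes)))
```

The extra pass in `pack_columns` and `unpack_columns` is the time cost
the comment refers to: every write and read has to transpose the data.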
        
       | speedgoose wrote:
       | This seems to be an ad for a proprietary time series data format
       | named HFiles that is locked behind a "contact us".
       | 
       | Thanks but I will stay with Parquet for now.
        
       | MrPowers wrote:
       | Delta Lake solves a lot of the Parquet limitations mentioned in
       | this post. Disclosure: I work on the Delta Lake project.
       | 
       | Parquet files store metadata about row groups in the file footer.
       | Delta Lake adds file-level metadata in the transaction log. So
       | Delta Lake can perform file-level skipping before even opening
       | any of the Parquet files to get the row-group metadata.
       | 
       | Delta Lake allows you to rearrange your data to improve file-
       | skipping. You can Z Order by timestamp for time-series analyses.
       | 
       | Delta Lake also allows for schema evolution, so you can evolve
       | the schema of your table over time.
       | 
       | This company may have a cool file format, but is it closed
       | source? It seems like enterprises don't want to be locked into
       | closed formats anymore.
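
The file-level skipping described above can be sketched in a few lines.
This is a toy model, not Delta Lake's API: the `FileStats` class, the
file names, and the timestamp ranges are all hypothetical. The idea is
only that per-file min/max values kept outside the data files let a
reader rule files out without opening them.

```python
from dataclasses import dataclass


@dataclass
class FileStats:
    """Per-file statistics as a log might record them (hypothetical shape)."""
    path: str
    min_ts: int
    max_ts: int


def files_to_read(log, query_start, query_end):
    """Return paths whose [min_ts, max_ts] range overlaps the query range."""
    return [
        f.path
        for f in log
        if f.max_ts >= query_start and f.min_ts <= query_end
    ]


# Hypothetical log entries: three files covering disjoint timestamp ranges.
log = [
    FileStats("part-0.parquet", 0, 99),
    FileStats("part-1.parquet", 100, 199),
    FileStats("part-2.parquet", 200, 299),
]

# Only part-1 can contain timestamps between 150 and 180.
print(files_to_read(log, 150, 180))
```

Skipping works best when each file covers a narrow range, which is why
the ordering of rows across files matters as much as the statistics.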
        
         | Malcolmlisk wrote:
         | Wow! I've been reading about Delta Lake for a while and I'm
         | interested in the company. Is there a chance to drop a CV for
         | remote work? (I am from Spain.)
         | 
         | Schema evolution is something that came up in a water
         | cooler conversation on my team the other day.
        
         | adammarples wrote:
         | Can you Z-order in Delta Lake? I thought that was one of the
         | features Databricks had kept to themselves.
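
For readers unfamiliar with the term, Z-ordering sorts rows by a key
built from the interleaved bits of several columns, so rows that are
close in every column tend to land in the same files. The sketch below
shows the textbook Morton interleaving for two integer columns. It is
an illustration only; how any particular engine computes its Z-order
keys is not something this thread confirms.

```python
def interleave_bits(x, y, bits=16):
    """Morton (Z-order) key: bit i of x goes to position 2i, bit i of y to 2i+1."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)
        key |= ((y >> i) & 1) << (2 * i + 1)
    return key


# Hypothetical 4x4 grid of (timestamp_bucket, sensor_id) pairs.
points = [(x, y) for x in range(4) for y in range(4)]
ordered = sorted(points, key=lambda p: interleave_bits(*p))

# The first four points in Z-order form the 2x2 block nearest the origin.
print(ordered[:4])
```

Sorting by this key before writing files keeps each file's min/max range
narrow in both columns at once, instead of in only the leading sort column.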
        
       ___________________________________________________________________
       (page generated 2024-01-17 23:01 UTC)