[HN Gopher] Frozen DuckLakes for Multi-User, Serverless Data Access
___________________________________________________________________
Author : g0xA52A2A
Score : 47 points
Date : 2025-10-25 10:57 UTC (5 days ago)
(HTM) web link (ducklake.select)
| gopalv wrote:
| The useful part is that DuckDB is so easy to use as a client with
| an embedded server, because DuckDB is a great client (and a
| library).
|
| Similar to how git can serve a repo from a plain HTTP server
| with no git installed on it (`git update-server-info`).
|
| The frozen part is what Iceberg promised in the beginning: a move
| away from Hive's mutable metastore.
|
| Point it at a manifest file plus Parquet/ORC files, and all you
| need to query it is S3 API calls (there is no metadata/table
| server; the server is the client).
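A minimal sketch of the "server is the client" layout, assuming a hypothetical in-memory dict in place of S3 and CSV in place of Parquet/ORC so it runs with the Python standard library alone. The helper names (`put_object`, `get_object`, `publish`, `query_sum`) and the `manifest.json` shape are made up for illustration and are not DuckLake's catalog format; the point is only that publishing writes immutable objects and querying is nothing but object reads plus client-side work.

```python
import csv
import io
import json

# Hypothetical stand-in for an object store such as S3: key -> immutable bytes.
OBJECT_STORE: dict[str, bytes] = {}


def put_object(key: str, data: bytes) -> None:
    OBJECT_STORE[key] = data


def get_object(key: str) -> bytes:
    # Stand-in for a plain GET request; no query logic runs on the "server".
    return OBJECT_STORE[key]


def publish(partitions: dict[str, list[list[str]]], header: list[str]) -> None:
    """Write immutable data files first, then a manifest that lists them."""
    files = []
    for name, rows in partitions.items():
        buf = io.StringIO()
        writer = csv.writer(buf)
        writer.writerow(header)
        writer.writerows(rows)
        key = f"data/{name}.csv"
        put_object(key, buf.getvalue().encode())
        files.append(key)
    manifest = {"columns": header, "files": files}
    put_object("manifest.json", json.dumps(manifest).encode())


def query_sum(column: str) -> int:
    """Client-side 'planning': read the manifest, then fetch and scan each file."""
    manifest = json.loads(get_object("manifest.json"))
    total = 0
    for key in manifest["files"]:
        reader = csv.DictReader(io.StringIO(get_object(key).decode()))
        total += sum(int(row[column]) for row in reader)
    return total


publish({"part-0": [["a", "1"], ["b", "2"]], "part-1": [["c", "3"]]}, header=["name", "n"])
print(query_sum("n"))  # 6
```

Because nothing is ever rewritten in place, any plain object store or static HTTP host can serve this, which is the same property the `git update-server-info` comparison relies on.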
|
| > Creating and publishing a Frozen DuckLake with about 11 billion
| rows, stored in 4,030 S3-based Parquet files took about 22
| minutes on my MacBook
|
| It's hard to pin down how much of that is CPU and how much is
| I/O from S3, but doing something like HLL (HyperLogLog) over all
| the columns and rows is pretty heavy on the CPU.
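For scale, the quoted figures work out to roughly 8.3 million rows per second (about 11 billion rows over about 1,320 seconds) and about 2.7 million rows per Parquet file. Whether DuckLake computes HyperLogLog sketches at all while publishing is an assumption here; the following is only a generic HyperLogLog in Python, assuming a BLAKE2b-derived 64-bit hash and 2^10 registers, to show where the CPU goes: every value in every column is hashed once and folded into a register array.

```python
import hashlib
import math


class HyperLogLog:
    """Minimal HyperLogLog distinct-count sketch (no bias-correction tables)."""

    def __init__(self, p: int = 10) -> None:
        self.p = p                      # index bits
        self.m = 1 << p                 # number of registers
        self.registers = [0] * self.m
        self.alpha = 0.7213 / (1 + 1.079 / self.m)  # standard constant for m >= 128

    def add(self, value: object) -> None:
        # One hash per value: this is the per-row, per-column CPU cost.
        digest = hashlib.blake2b(repr(value).encode(), digest_size=8).digest()
        h = int.from_bytes(digest, "big")
        idx = h >> (64 - self.p)                    # top p bits choose a register
        rest = h & ((1 << (64 - self.p)) - 1)       # remaining 64 - p bits
        rank = (64 - self.p) - rest.bit_length() + 1  # position of leftmost 1-bit
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def estimate(self) -> float:
        raw = self.alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            return self.m * math.log(self.m / zeros)  # small-range (linear counting) correction
        return raw


# Hypothetical 3-column table; one sketch per column, fed row by row.
rows = [(i, i % 100, f"user-{i % 5000}") for i in range(20_000)]
sketches = [HyperLogLog() for _ in range(3)]
for row in rows:
    for sketch, value in zip(sketches, row):
        sketch.add(value)
print([round(s.estimate()) for s in sketches])  # approximate; true counts are 20000, 100, 5000
```

Each sketch is only 1,024 small integers, so memory stays tiny no matter how many rows go in; the cost is all in hashing, which scales with rows times columns.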
| ryanschneider wrote:
| Even cooler, let's say you need to "update" a subset of your
| Parquet files after they are written. Once your Parquet files are
| in a DuckLake, you can "virtually" update them (the files
| themselves aren't touched; new ones are created instead).
| Something like:
|
| - create your frozen ducklake
|
| - run whatever "normal" mutation query you want to run (DELETE,
| UPDATE, MERGE INTO)
|
| - use `ducklake_rewrite_data_files` to write new files with the
| mutations applied, then optionally run
| `ducklake_merge_adjacent_files` to compact the files as well
| (though this might cause all files to change).
|
| - call `ducklake_list_files` to get the new set of active files.
|
| - update your upstream "source of truth" with this new list,
| optionally deleting any files no longer referenced.
|
| The net result should be that any files "touched" by your updates
| get new, updated versions alongside them, while unchanged files
| are returned as-is by `ducklake_list_files`.
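The steps above can be modelled in a few lines of Python. This is a toy, stdlib-only sketch: `Lake`, `delete_where`, `rewrite_data_files`, `list_files`, `sync_source_of_truth`, and the `part-N.parquet` naming are hypothetical stand-ins that mirror the shape of the workflow, not DuckLake's real functions or their signatures. Files are immutable tuples of rows, a DELETE only records row positions, and the rewrite step writes replacement files only for files that have deletes.

```python
from dataclasses import dataclass, field


@dataclass
class Lake:
    """Toy lake: immutable data files plus the list of currently active ones."""

    files: dict[str, tuple] = field(default_factory=dict)   # name -> rows, never mutated
    deletes: dict[str, set] = field(default_factory=dict)   # name -> deleted row positions
    active: list[str] = field(default_factory=list)         # files in the current snapshot
    next_id: int = 0

    def _write(self, rows) -> str:
        # Every write creates a brand-new file; existing files are never overwritten.
        name = f"part-{self.next_id}.parquet"
        self.next_id += 1
        self.files[name] = tuple(rows)
        return name

    def add_file(self, rows) -> str:
        name = self._write(rows)
        self.active.append(name)
        return name

    def delete_where(self, predicate) -> None:
        # Like a DELETE: record row positions only; the data files stay untouched.
        for name in self.active:
            hits = {i for i, row in enumerate(self.files[name]) if predicate(row)}
            if hits:
                self.deletes.setdefault(name, set()).update(hits)

    def rewrite_data_files(self) -> None:
        # Replace each file that has deletes with a new file holding the survivors.
        new_active = []
        for name in self.active:
            dead = self.deletes.pop(name, None)
            if not dead:
                new_active.append(name)  # untouched file keeps its name
                continue
            kept = [row for i, row in enumerate(self.files[name]) if i not in dead]
            if kept:  # a file whose rows were all deleted simply disappears
                new_active.append(self._write(kept))
        self.active = new_active

    def list_files(self) -> list[str]:
        return list(self.active)


def sync_source_of_truth(old: list[str], new: list[str]) -> tuple[list[str], set[str]]:
    """Return the new file list plus the files that are no longer referenced."""
    return list(new), set(old) - set(new)


lake = Lake()
lake.add_file([(1, "a"), (2, "b")])
lake.add_file([(3, "c")])
before = lake.list_files()
lake.delete_where(lambda row: row[0] == 2)
lake.rewrite_data_files()
after = lake.list_files()
manifest, orphaned = sync_source_of_truth(before, after)
print(after)     # ['part-2.parquet', 'part-1.parquet']
print(orphaned)  # {'part-0.parquet'}
```

Untouched files keep their names, so a downstream consumer only has to fetch what changed. Merging adjacent files into larger ones would replace several names at once, which is why the compaction step may change every file.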
| mjhay wrote:
| I'm glad to see continued innovation in lake metaphors in the DE
| space. This does look good though, especially in keeping with
| DuckDB's emphasis on simplicity.
___________________________________________________________________
(page generated 2025-10-30 23:01 UTC)