[HN Gopher] Data-at-Rest Encryption in DuckDB
___________________________________________________________________
Data-at-Rest Encryption in DuckDB
Author : chmaynard
Score : 215 points
Date : 2025-11-20 19:26 UTC (1 days ago)
(HTM) web link (duckdb.org)
(TXT) w3m dump (duckdb.org)
| kianN wrote:
| I'm just continually amazed by the DuckDB team. We had built out
| a naive solution with OpenSSL to encrypt DuckDB files, but that
| led to a 2x runtime cost for first-time queries and used up a
| lot of RAM because we were encrypting/decrypting the entire file
| all at once. It seems like because DuckDB is encrypting at the
| page level and leveraging modern processors' native AES
| operations, they are able to perform reads/writes at practically
| no cost.
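|
| For intuition, a minimal sketch of what page-level AES-GCM
| encryption looks like (purely illustrative, not DuckDB's actual
| implementation; the nonce layout and library choice here are
| assumptions):
|
|   # pip install cryptography
|   from cryptography.hazmat.primitives.ciphers.aead import AESGCM
|   import os
|
|   def encrypt_page(key: bytes, page: bytes) -> bytes:
|       # Each page gets its own random nonce, so pages can be
|       # encrypted and decrypted independently, without loading
|       # or rewriting the rest of the file.
|       nonce = os.urandom(12)
|       return nonce + AESGCM(key).encrypt(nonce, page, None)
|
|   def decrypt_page(key: bytes, blob: bytes) -> bytes:
|       nonce, ciphertext = blob[:12], blob[12:]
|       return AESGCM(key).decrypt(nonce, ciphertext, None)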
| PunchyHamster wrote:
| Why not just LUKS? It's kernel-level, leverages hardware
| acceleration, and is transparent to anything you run on top of
| it.
|
| DB encryption is useful if you have multiple things that need
| separate ACLs and encryption keys, but if it's one app and one
| DB there is no need for it.
| letmetweakit wrote:
| I believe it's also to protect against the occasional
| "lost" DB file.
| beala wrote:
| From the article:
|
| > This allows for some interesting new deployment models for
| DuckDB, for example, we could now put an encrypted DuckDB
| database file on a Content Delivery Network (CDN). A fleet of
| DuckDB instances could attach to this file read-only using
| the decryption key. This elegantly allows efficient
| distribution of private background data in a similar way like
| encrypted Parquet files, but of course with many more
| features like multi-table storage. When using DuckDB with
| encrypted storage, we can also simplify threat modeling when
| - for example - using DuckDB on cloud providers. While in the
| past access to DuckDB storage would have been enough to leak
| data, we can now relax paranoia regarding storage a little,
| especially since temporary files and WAL are also encrypted.
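|
| The attach-from-a-CDN flow would look roughly like this from
| Python (the URL, table name, and key are placeholders, and the
| exact ATTACH options should be checked against the DuckDB docs
| for your version):
|
|   import duckdb
|
|   con = duckdb.connect()
|   # httpfs lets DuckDB read a remote database file over HTTPS.
|   con.execute("INSTALL httpfs;")
|   con.execute("LOAD httpfs;")
|   con.execute("""
|       ATTACH 'https://cdn.example.com/private.duckdb' AS cdn
|       (READ_ONLY, ENCRYPTION_KEY 'my_secret_key');
|   """)
|   print(con.execute("SELECT count(*) FROM cdn.events").fetchone())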
| kianN wrote:
| We are in the separate ACL/encryption key bucket. We provide
| a Bayesian data analytics platform/API for other companies.
| Each company can have hundreds to thousands of datasets
| ("indices"), each of which has a separate encryption key, and
| those keys are themselves stored encrypted with an
| organizational-level key that is rotated daily.
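|
| Roughly, the key hierarchy amounts to envelope encryption (a toy
| sketch, not our actual code; AES-GCM key wrapping is just one
| possible construction):
|
|   from cryptography.hazmat.primitives.ciphers.aead import AESGCM
|   import os
|
|   def wrap_key(org_key: bytes, dataset_key: bytes) -> bytes:
|       # Only the wrapped (encrypted) dataset key is stored at
|       # rest, so rotating the org key means re-wrapping a small
|       # set of keys rather than re-encrypting the datasets.
|       nonce = os.urandom(12)
|       sealed = AESGCM(org_key).encrypt(nonce, dataset_key, None)
|       return nonce + sealed
|
|   def unwrap_key(org_key: bytes, wrapped: bytes) -> bytes:
|       nonce, sealed = wrapped[:12], wrapped[12:]
|       return AESGCM(org_key).decrypt(nonce, sealed, None)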
| notorious_pgb wrote:
| With respect, none of this sounds like "amazing" work on
| DuckDB's part. It's not bad work, either! It's competent work.
|
| Comparing it to a naive approach (encrypting an entire database
| file in a single shot and loading it all into memory at once)
| is always going to make competent work seem "amazing".
|
| I say this not to shit on DuckDB (I see no reason to shit on
| them); rather, I think it's important that we as professionals
| have realistic standards that we expect _ourselves_ to hit.
| Work we view as "amazing" is work we allow ourselves not to be
| able to replicate. But this is not in that category, and
| therefore, you should hold yourself to the same standard.
| kianN wrote:
| I'm more amazed that they released this as part of their
| open-source offering (not clear from my above comment).
| Encryption is a standard lever for open-source projects to
| monetize.
|
| I run a small company and needed to budget a solid chunk of
| time for next year to dig into improving this component of our
| system. I respect your perspective around holding high
| standards, but I do think it's worth getting excited about and
| celebrating reliable, performant software that demonstrates
| consistent competence.
| vjerancrnjak wrote:
| It's just pipelining. Encryption is free compared to reads or
| writes to storage.
| glenjamin wrote:
| Other than MotherDuck, is anyone aware of any good models for
| running multi-user, cloud-based DuckDB?
|
| I.e., running it like a normal database, and getting to take
| advantage of all of its goodies.
| mritchie712 wrote:
| For pure duckdb, you can put an Arrow Flight server in front of
| duckdb[0] or use the httpserver extension[1].
|
| Where you store the .duckdb file will make a big difference in
| performance (e.g. S3 vs. Elastic File System).
|
| But I'd take a good look at DuckLake as a better multiplayer
| option. If you store `.parquet` files in blob storage, it will
| be slower than `.duckdb` on EFS, but if you have largish data,
| EFS gets expensive.
|
| We[2] use DuckLake in our product and we've found a few ways to
| mitigate the performance hit. For example, we write all data
| into DuckLake in blob storage, then create analytics tables and
| store them on faster storage (e.g. GCP Filestore). You can have
| multiple storage methods in the same DuckLake catalog, so this
| works nicely.
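|
| The rough shape of that setup (a sketch only; the catalog path,
| bucket, and option names are placeholders, so check the DuckLake
| docs for your version):
|
|   import duckdb
|
|   con = duckdb.connect()
|   con.execute("INSTALL ducklake;")
|   con.execute("LOAD ducklake;")
|   # Catalog metadata in one place, data files in blob storage.
|   con.execute("""
|       ATTACH 'ducklake:catalog.ducklake' AS lake
|       (DATA_PATH 's3://my-bucket/lake/');
|   """)
|   con.execute("""
|       CREATE TABLE lake.events AS
|       SELECT * FROM 'staging/*.parquet';
|   """)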
|
| 0 - https://www.definite.app/blog/duck-takes-flight
|
| 1 - https://github.com/Query-farm/httpserver
|
| 2 - https://www.definite.app/
| anentropic wrote:
| I wonder if anyone has experimented with "Mountpoint for S3"
| + DuckDB yet
|
| https://docs.aws.amazon.com/AmazonS3/latest/userguide/mountp.
| ..
| sigwinch wrote:
| The DuckDB httpfs extension reads S3-compatible storage.
| glenjamin wrote:
| That looks neat - but how do you handle failover/restarts?
| mritchie712 wrote:
| In which one? Restarts are no problem on DuckLake (ACID
| transactions in the catalog).
|
| For the others, I haven't tried handling it.
| derekhecksher wrote:
| https://github.com/gizmodata/gizmosql
| tempest_ wrote:
| Feels like I keep seeing "Duckdb in your postgres" posts here.
| Likely that is what you want.
| jedisct1 wrote:
| "Sqlite [...] encryption extension is a $2000 add-on".
|
| SQLite3 Multiple Ciphers has been around for ages and is free:
| https://utelle.github.io/SQLite3MultipleCiphers/
|
| And Turso Database supports encryption out of the box:
| https://docs.turso.tech/tursodb/encryption
| michaelsbradley wrote:
| There's also SQLCipher, it's been in development since 2009 and
| works quite well:
|
| https://github.com/sqlcipher/sqlcipher
| memset wrote:
| How do you use these in practice? Neither Python nor Go makes
| it easy to link a different variation of SQLite with one of
| these plugins compiled in. How do you make it work?
| ncruces wrote:
| I don't think SQLite3 Multiple Ciphers can be built into a
| runtime-loadable extension (and the Turso thing is just a
| copy of it).
|
| I'm confident that a scheme based on tweakable block ciphers
| (like Adiantum or AES-XTS) could be made into a decent
| runtime-loadable extension.
|
| I implemented such schemes for my Go driver, but Go code is
| not really ideal for building a runtime-loadable extension
| (it'd have to be ported to C/Rust/Zig).
|
| https://news.ycombinator.com/item?id=40208800
| jasonthorsness wrote:
| AES-GCM sensitivity to nonce reuse is a tricky implementation
| detail. Here they acknowledge it but then don't share their
| solution - and in fact the header contains 16 bytes for the nonce
| instead of the expected 12 bytes and they do not share what bytes
| are random. Did I miss something? Does anyone know?
| jedisct1 wrote:
| Static key, random 12-byte nonces, no per-session key for temp
| buffers.
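|
| For scale, the usual birthday-bound arithmetic for random 96-bit
| nonces under a single key (generic AES-GCM guidance, not a claim
| about DuckDB's exact scheme):
|
|   # Collision probability after q random 96-bit nonces is about
|   # q^2 / 2^97; NIST SP 800-38D caps random-IV GCM at 2^32
|   # encryptions per key to keep this below 2^-32.
|   q = 2.0 ** 32
|   print(q * q / (2 * 2.0 ** 96))  # ~1.2e-10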
| dismantle wrote:
| Curious how indexing on a key is handled. I'm not sure if the
| document already covers this (I don't remember coming across
| it), but I'm just a bit curious. Will the key being searched
| for be "encrypted" before a search, or will a decryption occur
| for each block during a search?
| biophysboy wrote:
| DuckDB has been more useful to me than all AI combined (and I
| like LLMs overall)
___________________________________________________________________
(page generated 2025-11-21 23:02 UTC)