[HN Gopher] Show HN: Roast my SQLite encryption at-rest
       ___________________________________________________________________
        
       Show HN: Roast my SQLite encryption at-rest
        
       SQLite encryption at-rest is a hot requested feature of both the
       _"default"_ CGo driver [1] and the transpiled alternative driver
       [2]. So, this is a feature I wanted to bring to my own Wasm based
       Go driver /bindings [3].  Open-source SQLite encryption extensions
       have had a troubled last few years. For whatever reason, in 2020
       the (undocumented) feature that made it easy to offer page-level
       encryption was removed [4]. Some solutions are stuck with SQLite
       3.31.1, but Ulrich Telle stepped up with a VFS approach [5].
       Still, their solution seemed harder than something I'd want to
       maintain, as it requires understanding the structure of what's
       being written to disk at the VFS layer. So, I looked at full disk
       encryption for something with less of an impedance mismatch.
       Specifically, I'm using the Adiantum tweakable and length-
       preserving encryption (with 4K blocks, matching the default SQLite
       page size), and encrypting whole files (rather than page content).
       I'm not a cryptographer, so I'd really appreciate some roasting
       before release.  There is nothing very Go specific about this
       (apart from the implementation) so if there are no obvious flaws,
       it may make sense to port it to C/Rust/etc and make it a loadable
       extension.  [1] https://github.com/mattn/go-sqlite3/pull/1109  [2]
       https://gitlab.com/cznic/sqlite/-/issues/105  [3]
       https://github.com/ncruces/go-sqlite3/issues/55  [4]
       https://github.com/sqlite/sqlite/commit/b48c0d59  [5]
       https://github.com/utelle/SQLite3MultipleCiphers
        
       Author : ncruces
       Score  : 92 points
       Date   : 2024-04-30 09:12 UTC (1 days ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | justinclift wrote:
       | How do you feel yours compares with (say) SQLCipher, which is a
       | very popular C based one, and keeps fairly close to the upstream
       | SQLite releases?
       | 
       | * https://www.zetetic.net/sqlcipher/
       | 
       | * https://github.com/sqlcipher/sqlcipher
       | 
       | Their most recent release (v4.5.7) is based upon SQLite v3.45.3,
       | which is the latest version of SQLite:
       | 
       | * https://github.com/sqlcipher/sqlcipher/releases/tag/v4.5.7
        
         | ncruces wrote:
         | So, I haven't studied their solution much, but ISTM what they
         | did was fork SQLite at (around) the time of the commit that
         | removed SQLITE_HAS_CODEC, and forward port the 4 years of
         | changes since then.
         | 
         | That's a bit untenable for me, since I'd rather keep as close
         | as possible to SQLite compiled with Clang, and use the
         | extension points already provided by SQLite (the VFS API).
         | 
         | Most SQLite encryption solutions (SQLite Encryption Extension,
         | SQLCipher, SQLite3 Multiple Ciphers, sqleet) encrypt page
         | content, and some need to reserve a few bytes of every page to
         | do so (for nounces, MACs). This was "easy" to do with
         | SQLITE_HAS_CODEC, but requires some "reverse engineering" to do
         | from the VFS layer. Some of this "reverse engineering" is
         | likely stable, because the "checksum VFS" [1] depends on it.
         | OTOH, extension points that are not part of the "public API"
         | have been summarily dropped in the past [2].
         | 
         | My scheme does not care about the SQLite file format(s) at all,
         | because instead of encrypting just page content, it encrypts
         | entire files. It uses 4K blocks, so setting page size to (at
         | least) 4K is advised, but not required. The only assumption it
         | makes is that SQLite is not sensitive to file sizes rounding up
         | to the next block (4K) size. An assumption that holds for
         | databases, journals and WALs.
         | 
         | The scheme does not try to authenticate blocks, so it doesn't
         | try to protect against forgery. Other solutions may include
         | MACs, but to offer random access they don't protect against
         | reverting a page to an older version of itself, so IMO, this is
         | of limited value.
         | 
         | Other schemes add a nounce to each page, which allows on disk
         | content to change, while the decrypted content stays the same.
         | I don't include a nounce, so if an adversary gets hold of
         | multiple encrypted backups of the same database he knows not
         | only which blocks couldn't possibly have changed, but also
         | which ones definitely did.
         | 
         | [1] https://www.sqlite.org/cksumvfs.html
         | 
         | [2] https://sqlite.org/forum/forumpost/db235b3070
        
       | nemo1618 wrote:
       | Hey, it's my Adiantum package! How's the performance? Revisiting
       | the API now, I see that Encrypt and Decrypt return newly-
       | allocated slices, which is unfortunate. Should I add Append
       | variants that don't force an allocation?
        
         | ncruces wrote:
         | Hey! That's amazing that you found this.
         | 
         | I haven't benchmarked much, but I think I measured a 15% hit on
         | speedtest1 [1] (with about 10% inside your library). Less if I
         | kept temporary files in memory. Other solutions claim less of a
         | performance hit. I'd have to measure to make sure.
         | 
         | There are some things I could do to improve performance. A
         | partial block update needs a read-update-write cycle; journals
         | and WALs do some of these. I could buffer consecutive writes in
         | memory unencrypted and only flush them on fsync. Didn't do that
         | as it requires some additional bookkeeping.
         | 
         | I don't think Encrypt and Decrypt allocate at all! They
         | encrypt/decrypt in-place; the returned slice just aliases the
         | input block. But thanks, it'd be pretty bad if they did.
         | 
         | [1] https://sqlite.org/src/file/test/speedtest1.c
        
           | nemo1618 wrote:
           | Ha, you're right, I even have benchmarks in the README
           | showing 0 allocs/op. Nice. (Sorry for doubting you, past-
           | self!)
           | 
           | In the past I've implemented a form of the write-buffering
           | you describe -- it was not fun. But it was a lot more
           | important in that context, because flushing a write meant
           | uploading many MB to remote servers. With 4 KB pages flushed
           | locally, I'd wager it's not worth it.
        
       | pyuser583 wrote:
       | SQLite has a paid version. One of the paid features is
       | encryption.
       | 
       | They're not going to make it easy.
        
         | ncruces wrote:
         | Of course, nor would I expect them to make it so.
         | 
         | In fact, I mention the SEE in my package's documentation. If
         | you have a license to the paid extension, it should be easy to
         | compile it and use it with my package.
         | 
         | It will be slow, however, because it will be running the
         | reference AES implementation in Wasm.
         | 
         | That said, if anyone is interested in sponsoring a SEE license,
         | I can look into doing the encryption Go (which uses assembly on
         | most platforms for those bits).
        
           | cipherboy wrote:
           | Which reference AES implementation? My memory is that the one
           | from the spec has terrible timing side channel attacks...
           | e.g. https://www.redhat.com/en/blog/its-all-question-time-
           | aes-tim... seems to corroborate my memory.
           | 
           | I seem to recall this was remotely exploitable, and
           | exploiting timing side channels has only gotten better since
           | 2014.
        
             | ncruces wrote:
             | I don't have a license, so can't know for sure.
             | 
             | But the only versions mentioned in [1] that should compile
             | out of the box into Wasm, are the ones that say they use _"
             | the Rijndaal reference implementation."_
             | 
             | I don't think compiling OpenSSL into Wasm is tenable. But
             | some wrappers around the Go AES implementation should work.
             | 
             | [1] https://www.sqlite.org/see/doc/release/www/readme.wiki
        
         | acheong08 wrote:
         | That was a complete surprise to me.
         | 
         | Here's the relevant link:
         | 
         | https://sqlite.org/prosupport.html
        
       | bityard wrote:
       | Have you considered targeting libSQL for this instead?
       | https://github.com/tursodatabase/libsql
       | 
       | SQLite is a great product and I use it a lot, but it does not
       | have a great history of outside contributions and collaboration.
       | If I were developing any kind of extension, I feel like I would
       | have a better shot at getting help and feedback from a community-
       | based project.
        
         | ncruces wrote:
         | My extension targets my Go SQLite bindings (the VFS is
         | implemented in Go) [1].
         | 
         | This extension, and the wrapper, target public SQLite APIs, so
         | if anyone wants to replace SQLite with libSQL, that should be
         | easy.
         | 
         | You simply need to include a little bit of glue C code that I
         | append to the amalgamation, and use some SQLite compile
         | options.
         | 
         | I explicitly support building and providing your own, custom,
         | Wasm SQLite "blob" (it just needs to include my glue code).
         | 
         | As for Adiantum encryption, as I said, reimplementing this in
         | C/C++/Rust to make it a loadable extension is perfectly viable,
         | and would be compatible with libSQL, again because this uses
         | only public SQLite APIs (that's the point, basically!)
         | 
         | But this is predicated on it being a secure scheme (that's the
         | feedback I was looking for this time around).
         | 
         | PS: I've got nothing against Turso, or libSQL. In fact I spent
         | the last year perusing their virtual WAL API. The problem is
         | that I found no documentation, nor any useful open source
         | implementations of it. If there any I'd be very interested. So,
         | thus far, I also don't have anything that drives towards
         | libSQL.
         | 
         | [1] https://github.com/ncruces/go-sqlite3
        
           | avinassh wrote:
           | > PS: I've got nothing against Turso, or libSQL. In fact I
           | spent the last year perusing their virtual WAL API. The
           | problem is that I found no documentation, nor any useful open
           | source implementations of it. If there any I'd be very
           | interested. So, thus far, I also don't have anything that
           | drives towards libSQL.
           | 
           | Hey, this is v and I am an engineer at Turso. We do have some
           | documentation and an example implementation of Virtual WAL
           | 
           | docs: https://github.com/tursodatabase/libsql/blob/ef44612/li
           | bsql-...
           | 
           | example: https://github.com/tursodatabase/libsql/blob/ef44612
           | /libsql-...
           | 
           | for an open source implementation, you may check how
           | Bottomless works. Bottomless is another project which does
           | back up like litestream and it internally implements a
           | Virtual WAL.
           | 
           | Bottomless -
           | https://github.com/tursodatabase/libsql/tree/main/bottomless
           | 
           | I am sure we can improve our docs, make it more discover-able
           | and easy to find. I am open to feedback and suggestions!
        
             | ncruces wrote:
             | Hi! Thanks for this.
             | 
             | I find that documentation completely insufficient to
             | implement a virtual WAL. The example (which I _had_ looked
             | into) it 's just a "do nothing" wrapper. I learn that there
             | are 25 methods I should implement, and that I can just
             | forward them to another implementation.
             | 
             | For my bindings, I implemented an entire SQLite VFS from
             | scratch for a handful of different platforms, including
             | file-locking and shared-memory primitives. However, and
             | having studied your virtual WAL code for about a month, I
             | _still_ have no idea where to even start. It just doesn 't
             | feel like an coherent API (more like an internal SQLite
             | API, simply exposed for outside use).
             | 
             | I'll look into Bottomless, but I'm not very hopeful at this
             | point. Maybe it's just me and Rust; if that is so, I'm
             | sorry.
        
         | avinassh wrote:
         | Turso / libsql supports encryption, integrated with
         | MultipleCiphers
         | 
         | Example and usage code is here - https://turso.tech/blog/fully-
         | open-source-encryption-for-sql...
        
           | ncruces wrote:
           | FTA: _" One project in particular was very suitable for us,
           | SQLite Multiple Ciphers. Since it is licensed under MIT, we
           | have just moved the code into libSQL."_
           | 
           | See, this is what I don't get about libSQL. Turso claims to
           | want to foster a community around it, but then I go to the
           | SQLite3MultipleCiphers GitHub and there's zero trace of
           | contacting the author about "just moving the code into
           | libSQL."
           | 
           | I hope they at least considered sponsoring development [1],
           | given that they're making this a "premium feature" of their
           | hosted offering.
           | 
           | [1]: https://www.paypal.com/donate/?hosted_button_id=3JC9PMMH
           | 8X7U...
        
             | utelle wrote:
             | The Turso people didn't contact me. Actually, it was the
             | other way around after I detected accidentally that they
             | started to use my project _SQLite3 Multiple Ciphers_ within
             | _libSQL_. In the end they didn't show interest in
             | discussing anything with me. And up to now they definitely
             | didn't consider to sponsor my project. However, I really
             | don't care.
        
       | 580515975 wrote:
       | A couple of thoughts:
       | 
       | First, great job on the readme! One way you could improve it is
       | by expanding on the "Caution" section. What's written is the
       | beginnings of a threat model, but it could be improved by being
       | more explicit about which attacks this system does/doesn't defend
       | against.
       | 
       | > The only security property that disk encryption (and this
       | package) provides is that all information such an adversary can
       | obtain is whether the data in a sector has (or has not) changed
       | over time.
       | 
       | I think the adversary learns a bit more than this. Randomized
       | encryption would provide the above probably, but the
       | _deterministic_ scheme that's used here will let the adversary
       | learn not only whether a sector changed, but whether its value
       | matches what it was at a previous point in time.
       | 
       | How does this translate into the security of the database,
       | itself? Seeing what blocks have changed might reveal information
       | about what data has changed. Let's consider a security game where
       | I (the adversary) get to submit sql queries, and then learn which
       | blocks on disk has changed. After this initial phase (where I can
       | learn where data is stored), I submit two different sql queries,
       | you pick one of them at random and run it, and then tell me which
       | blocks have changed. I win if I can guess which sql query you
       | picked.
       | 
       | Suppose I submit queries which each insert into a different
       | table. Because the tables are stored separately on-disk, it'll
       | probably be pretty easy for me to distinguish them. But okay,
       | that's still count-ish/size-ish, and maybe out of scope.
       | 
       | What if I submit two queries which each insert different values,
       | but into the same table. Further, let's say that this table has
       | an index. Based on which pages were written to, I can now learn
       | something about the _values_ that were inserted, because
       | different values will write into the index in different places.
       | 
       | Now, it's completely valid if the threat model says, "if you can
       | see more than two copies of the database file, then all is lost."
       | However, I think it'd be worth translating the current write-up
       | of the threat model into the implications for leaking the
       | database. For more examples of attacks based on seeing what
       | indices/sizes changed [1] and [2].
       | 
       | Is it valid to pad the sqlite file to a multiple of the block
       | size? Does sqlite ever call truncate on a non-block-aligned size
       | and expect any truncated bytes to be fully removed?
       | 
       | What are the atomicity requirements for a sqlite VFS? SQLite, in
       | general, is supposed to not get corrupted if power were to be
       | yanked mid-write. However, because this VFS writes one block at-
       | a-time, the computer dying mid-write could corrupt more bytes
       | around the write position than would normally be corrupted if the
       | the standard VFS was used. It's possible this is a non-issue, but
       | it's worth considering what contract sqlite has for VFSes.
       | 
       | [1]: https://en.wikipedia.org/wiki/CRIME [2]:
       | https://www.usenix.org/legacy/events/sec07/tech/full_papers/...
        
         | cryptonector wrote:
         | The threat model has to exclude:
         | 
         | - attacks on a running app that has the keys loaded, naturally
         | 
         | The threat model has to include at least:
         | 
         | - passive attacks against the DB itself, lacking access to the
         | keys
         | 
         | The threat model really should also include:
         | 
         | - active attacks against the DB lacking access to the keys
         | (e.g., replace blocks)
         | 
         | IMO ZFS does a pretty good job against these threats, for
         | example, so ZFS is a good yardstick for measuring things like
         | TFA.
         | 
         | However, the fact that a _running system_ must have access to
         | the _keys_ means that at-rest data encryption does not buy one
         | much protection against server compromise, especially when the
         | system must be running much /most/all of the time. So you
         | really also want to do the utmost to secure the
         | server/application.
        
           | ncruces wrote:
           | ZFS, AFAIK, can offer something in addition which is harder
           | for a VFS to offer, and which AFAICT no other SQLite
           | encryption offers: a kind of HMAC Merkel tree that
           | authenticates an entire database (at a point in time).
           | 
           | Alternatives, even those that use MACs only authenticate
           | pages/blocks. They still allow mix-and-match of pages/blocks
           | from previous backups.
           | 
           | I could, potentially, add optional/configurable nounces and
           | MACs at the VFS layer.
           | 
           | I've refrained from doing so because (1) it complicates the
           | implementation; (2) it can be added later, compatibly; (3) it
           | doesn't fix mix-and-match; (4) it will impact performance
           | further; and (5) it would be MAC-then-encrypt (against best
           | practice).
        
             | cryptonector wrote:
             | Yes, that's right. In fact, ZFS w/ encryption gives you two
             | Merkle hash trees, one using a hash function _and_ using a
             | MAC. SQLite3 could do this, but it would have to change its
             | database format fairly radically.
             | 
             | A SQLite3 VFS could, maybe, store additional metadata on
             | the side knowing the SQLite3 database file format, I
             | suppose. But if you _really_ want this it 's best to do it
             | in the database itself.
        
           | justinclift wrote:
           | > ... the fact that a running system must have access to the
           | keys means that at-rest data encryption does not buy one much
           | protection against server compromise, especially when the
           | system must be running much/most/all of the time.
           | 
           | A common approach to help mitigate this is by having the keys
           | be fetchable (eg via ssh) from a remote server.
           | 
           | Preferably hosted in another jurisdiction (country) in a data
           | centre owned by a different organisation (ie. not both in
           | AWS).
           | 
           | When the encrypted server gets grabbed, the staff should (!)
           | notice the problem and remove its ssh keys from the ssh
           | server holding the ZFS encryption keys.
           | 
           | ---
           | 
           | That being said, I'm not an encryption guy whereas some of
           | the people in this thread clearly are. So that's just my best
           | understanding. ;)
        
             | cryptonector wrote:
             | > When the encrypted server gets grabbed, the staff should
             | (!)
             | 
             | If the people doing the grabbing are LEO then they have
             | ways of taking running servers such that they keep running
             | or otherwise don't lose what's in RAM. And if it's LEO then
             | "the staff" should absolutely not do things that can be
             | construed as destroying evidence.
        
               | justinclift wrote:
               | Sure, _if_ its LEO. That 's not the threat model for most
               | organisations encrypting their data at rest though. :)
               | 
               | ---
               | 
               | > should absolutely not do things that can be construed
               | as destroying evidence.
               | 
               | It'd be a very long stretch to successfully argue
               | "removing access to the key" is destroying evidence. The
               | data would still be intact, and available, to anyone with
               | the key.
               | 
               | Just not to whoever physically grabbed the server. ;)
        
               | cryptonector wrote:
               | I would get legal advice on that, from a lawyer in the
               | relevant jurisdiction, before going with that.
        
               | justinclift wrote:
               | > ways of taking running servers such that they keep
               | running
               | 
               | That's an interesting point. Wonder how complete that
               | approach is, and if it maintains network connectivity
               | between the servers they're grabbing?
               | 
               | Some clustering solutions automatically reboot a server
               | if it loses network connectivity for a short period of
               | time (ie 1 min). That would really mess up the "preserve
               | stuff in ram" thing, if it's purely just designed to keep
               | a server running.
        
               | cryptonector wrote:
               | There's at least two ways. One is to keep the servers
               | powered even after they are unplugged from wall power
               | (they have special adaptors for portable PSUs). The other
               | is to cryogenically cool the RAM then cut the power, keep
               | the RAM cooled, and then read it later in a lab.
        
         | ncruces wrote:
         | First of all, thanks for the review. I'll try to respond to all
         | points.
         | 
         | Disk encryption, on which this is based, is usually
         | deterministic in nature.
         | 
         | So yes, an adversary 1 that can inspect multiple versions of a
         | database (e.g. backups) can learn exactly: which (blocks)
         | changed, which didn't change, which have been reverted; but
         | that is _all_ they should learn.
         | 
         | An adversary 2 that can modify files, can also mix-and-match
         | blocks between versions to produce a valid file with high
         | probability .
         | 
         | And an adversary 3 that can submit changes and see their effect
         | on the encrypted data can probably infer a lot about the
         | database.
         | 
         | I'll try to make these more explicit in the README. In
         | practical terms: adversary 1 is the one I thought I'd covered
         | reasonably well; adversary 2 means that backups should be
         | independently signed, and signatures verified before restoring
         | them; adversaries 2 and 3 mean that this is ineffective against
         | live attacks.
         | 
         | Security, though, is also about comparing options. Reading the
         | documentation for alternatives (even the expensive ones) I
         | don't see this kind of analysis. I see 2 advantages to the
         | alternatives that encrypt page data with a nounce and a MAC.
         | The nounce allows reverts to go unnoticed. No change, means a
         | block definitely didn't change. But ciphertext changing doesn't
         | necessarily mean plaintext changed. The MAC ensures blocks are
         | valid. But they still be reverted to previous versions of
         | themselves, mix-and-match is still possible. Do these two
         | properties make a huge difference? Is there anything else I'm
         | missing?
         | 
         | On your other points.
         | 
         | Yes it's always safe to round up file sizes to block size, for
         | databases, journals and WALs (I could detail why, but the
         | formats are documented). It may not be safe for all temporary
         | files (I'm assuming it is), but that can be fixed for those
         | files by remembering the file size in memory.
         | 
         | About atomicity, corruption, etc, the VFS is supposed to
         | declare its characteristics [1] to SQLite. Your concerns are
         | covered by SAFE_APPEND and POWERSAFE_OVERWRITE. See also [2].
         | As a wrapper VFS, I filter most of those characteristics from
         | the underlying VFS, forcing SQLite to assume the worst.
         | 
         | [1] https://www.sqlite.org/c3ref/c_iocap_atomic.html
         | 
         | [2] https://www.sqlite.org/psow.html
        
       | Hakeed1990 wrote:
       | Thanks for sharing this.
        
       | d-z-m wrote:
       | For your KDF, how did you select the parameters for Argon2id? How
       | often is the KDF invoked / what are your requirements for speed
       | in KDF'ing?
       | 
       | Nit, but your "pepper" is confusingly named IMO, because it is
       | hardcoded and not secret. "label" may make more sense.
       | 
       | However, one thing to keep in mind with the use of a static(and
       | public) Argon2 salt is that it allows an attacker to pre-compute
       | keys. If this package were to gain adoption, I think that may
       | become a bigger issue. I would reccommend randomly generating a
       | 128 bit salt, similar to how you're randomly generating the key
       | if one isn't provided.
        
         | ncruces wrote:
         | I used the parameters suggested in the documentation [1], which
         | follow the RFC.
         | 
         | The KDF is invoked every time a connection is opened _iff_ you
         | specify your key with the textkey= parameter. It is ill advised
         | to overuse this, especially if you don 't use connection
         | pooling, as it makes opening connections slow. You can bypass
         | the KDF by providing your own 256-bit key using either the
         | hexkey= or key= parameters (key= cannot embed NULLs).
         | 
         | I agree pepper confusing (because the pepper is supposedly
         | secret), but this is not a salt either, as a salt is supposed
         | to be public, but unique. Do you have better naming that you
         | can suggest?
         | 
         | Anyway, I forgot to do this, but the intention was for the
         | pepper to be changeable as a build parameter. Thanks for
         | reminding me!
         | 
         | [1] https://pkg.go.dev/golang.org/x/crypto/argon2#IDKey
        
       | slifin wrote:
       | Why do you need this in built into your database?
       | 
       | Surely you turn on disk encryption on your drive then if an
       | attacker yanks the drive and tries to read your database without
       | your password they fail?
        
         | dig1 wrote:
         | This is very useful, for example, when you ship your
         | application to the client(s) with SQLite as the main conf/data
         | storage. You don't have to think about whether their drives are
         | encrypted. Also, it assures the clients that all the data in
         | your application is encrypted by default.
        
         | laurent123456 wrote:
         | I don't know if that's his use case, but we had many users tell
         | us they share their account with other users so they want the
         | data to be encrypted even when logged in.
         | 
         | Of course if they share their computer, someone could install a
         | keylogger and wait for them to type their passwords, but I
         | guess that's an extra layer of security that may help a bit.
        
       ___________________________________________________________________
       (page generated 2024-05-01 23:02 UTC)