[HN Gopher] Show HN: Roast my SQLite encryption at-rest
       ___________________________________________________________________
        
       Show HN: Roast my SQLite encryption at-rest
        
       SQLite encryption at-rest is a hot requested feature of both the
       _"default"_ CGo driver [1] and the transpiled alternative driver
       [2]. So, this is a feature I wanted to bring to my own Wasm based
       Go driver /bindings [3].  Open-source SQLite encryption extensions
       have had a troubled last few years. For whatever reason, in 2020
       the (undocumented) feature that made it easy to offer page-level
       encryption was removed [4]. Some solutions are stuck with SQLite
       3.31.1, but Ulrich Telle stepped up with a VFS approach [5].
       Still, their solution seemed harder than something I'd want to
       maintain, as it requires understanding the structure of what's
       being written to disk at the VFS layer. So, I looked at full disk
       encryption for something with less of an impedance mismatch.
       Specifically, I'm using the Adiantum tweakable and length-
       preserving encryption (with 4K blocks, matching the default SQLite
       page size), and encrypting whole files (rather than page content).
       I'm not a cryptographer, so I'd really appreciate some roasting
       before release.  There is nothing very Go specific about this
       (apart from the implementation) so if there are no obvious flaws,
       it may make sense to port it to C/Rust/etc and make it a loadable
       extension.  [1] https://github.com/mattn/go-sqlite3/pull/1109  [2]
       https://gitlab.com/cznic/sqlite/-/issues/105  [3]
       https://github.com/ncruces/go-sqlite3/issues/55  [4]
       https://github.com/sqlite/sqlite/commit/b48c0d59  [5]
       https://github.com/utelle/SQLite3MultipleCiphers
        
       Author : ncruces
       Score  : 52 points
       Date   : 2024-04-30 09:12 UTC (13 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | justinclift wrote:
       | How do you feel yours compares with (say) SQLCipher, which is a
       | very popular C based one, and keeps fairly close to the upstream
       | SQLite releases?
       | 
       | * https://www.zetetic.net/sqlcipher/
       | 
       | * https://github.com/sqlcipher/sqlcipher
       | 
       | Their most recent release (v4.5.7) is based upon SQLite v3.45.3,
       | which is the latest version of SQLite:
       | 
       | * https://github.com/sqlcipher/sqlcipher/releases/tag/v4.5.7
        
         | ncruces wrote:
         | So, I haven't studied their solution much, but ISTM what they
         | did was fork SQLite at (around) the time of the commit that
         | removed SQLITE_HAS_CODEC, and forward port the 4 years of
         | changes since then.
         | 
         | That's a bit untenable for me, since I'd rather keep as close
         | as possible to SQLite compiled with Clang, and use the
         | extension points already provided by SQLite (the VFS API).
         | 
         | Most SQLite encryption solutions (SQLite Encryption Extension,
         | SQLCipher, SQLite3 Multiple Ciphers, sqleet) encrypt page
         | content, and some need to reserve a few bytes of every page to
         | do so (for nounces, MACs). This was "easy" to do with
         | SQLITE_HAS_CODEC, but requires some "reverse engineering" to do
         | from the VFS layer. Some of this "reverse engineering" is
         | likely stable, because the "checksum VFS" [1] depends on it.
         | OTOH, extension points that are not part of the "public API"
         | have been summarily dropped in the past [2].
         | 
         | My scheme does not care about the SQLite file format(s) at all,
         | because instead of encrypting just page content, it encrypts
         | entire files. It uses 4K blocks, so setting page size to (at
         | least) 4K is advised, but not required. The only assumption it
         | makes is that SQLite is not sensitive to file sizes rounding up
         | to the next block (4K) size. An assumption that holds for
         | databases, journals and WALs.
         | 
         | The scheme does not try to authenticate blocks, so it doesn't
         | try to protect against forgery. Other solutions may include
         | MACs, but to offer random access they don't protect against
         | reverting a page to an older version of itself, so IMO, this is
         | of limited value.
         | 
         | Other schemes add a nounce to each page, which allows on disk
         | content to change, while the decrypted content stays the same.
         | I don't include a nounce, so if an adversary gets hold of
         | multiple encrypted backups of the same database he knows not
         | only which blocks couldn't possibly have changed, but also
         | which ones definitely did.
         | 
         | [1] https://www.sqlite.org/cksumvfs.html
         | 
         | [2] https://sqlite.org/forum/forumpost/db235b3070
        
       | nemo1618 wrote:
       | Hey, it's my Adiantum package! How's the performance? Revisiting
       | the API now, I see that Encrypt and Decrypt return newly-
       | allocated slices, which is unfortunate. Should I add Append
       | variants that don't force an allocation?
        
         | ncruces wrote:
         | Hey! That's amazing that you found this.
         | 
         | I haven't benchmarked much, but I think I measured a 15% hit on
         | speedtest1 [1] (with about 10% inside your library). Less if I
         | kept temporary files in memory. Other solutions claim less of a
         | performance hit. I'd have to measure to make sure.
         | 
         | There are some things I could do to improve performance. A
         | partial block update needs a read-update-write cycle; journals
         | and WALs do some of these. I could buffer consecutive writes in
         | memory unencrypted and only flush them on fsync. Didn't do that
         | as it requires some additional bookkeeping.
         | 
         | I don't think Encrypt and Decrypt allocate at all! They
         | encrypt/decrypt in-place; the returned slice just aliases the
         | input block. But thanks, it'd be pretty bad if they did.
         | 
         | [1] https://sqlite.org/src/file/test/speedtest1.c
        
           | nemo1618 wrote:
           | Ha, you're right, I even have benchmarks in the README
           | showing 0 allocs/op. Nice. (Sorry for doubting you, past-
           | self!)
           | 
           | In the past I've implemented a form of the write-buffering
           | you describe -- it was not fun. But it was a lot more
           | important in that context, because flushing a write meant
           | uploading many MB to remote servers. With 4 KB pages flushed
           | locally, I'd wager it's not worth it.
        
       | pyuser583 wrote:
       | SQLite has a paid version. One of the paid features is
       | encryption.
       | 
       | They're not going to make it easy.
        
         | ncruces wrote:
         | Of course, nor would I expect them to make it so.
         | 
         | In fact, I mention the SEE in my package's documentation. If
         | you have a license to the paid extension, it should be easy to
         | compile it and use it with my package.
         | 
         | It will be slow, however, because it will be running the
         | reference AES implementation in Wasm.
         | 
         | That said, if anyone is interested in sponsoring a SEE license,
         | I can look into doing the encryption Go (which uses assembly on
         | most platforms for those bits).
        
           | cipherboy wrote:
           | Which reference AES implementation? My memory is that the one
           | from the spec has terrible timing side channel attacks...
           | e.g. https://www.redhat.com/en/blog/its-all-question-time-
           | aes-tim... seems to corroborate my memory.
           | 
           | I seem to recall this was remotely exploitable, and
           | exploiting timing side channels has only gotten better since
           | 2014.
        
             | ncruces wrote:
             | I don't have a license, so can't know for sure.
             | 
             | But the only versions mentioned in [1] that should compile
             | out of the box into Wasm, are the ones that say they use _"
             | the Rijndaal reference implementation."_
             | 
             | I don't think compiling OpenSSL into Wasm is tenable. But
             | some wrappers around the Go AES implementation should work.
             | 
             | [1] https://www.sqlite.org/see/doc/release/www/readme.wiki
        
         | acheong08 wrote:
         | That was a complete surprise to me.
         | 
         | Here's the relevant link:
         | 
         | https://sqlite.org/prosupport.html
        
       | bityard wrote:
       | Have you considered targeting libSQL for this instead?
       | https://github.com/tursodatabase/libsql
       | 
       | SQLite is a great product and I use it a lot, but it does not
       | have a great history of outside contributions and collaboration.
       | If I were developing any kind of extension, I feel like I would
       | have a better shot at getting help and feedback from a community-
       | based project.
        
         | ncruces wrote:
         | My extension targets my Go SQLite bindings (the VFS is
         | implemented in Go) [1].
         | 
         | This extension, and the wrapper, target public SQLite APIs, so
         | if anyone wants to replace SQLite with libSQL, that should be
         | easy.
         | 
         | You simply need to include a little bit of glue C code that I
         | append to the amalgamation, and use some SQLite compile
         | options.
         | 
         | I explicitly support building and providing your own, custom,
         | Wasm SQLite "blob" (it just needs to include my glue code).
         | 
         | As for Adiantum encryption, as I said, reimplementing this in
         | C/C++/Rust to make it a loadable extension is perfectly viable,
         | and would be compatible with libSQL, again because this uses
         | only public SQLite APIs (that's the point, basically!)
         | 
         | But this is predicated on it being a secure scheme (that's the
         | feedback I was looking for this time around).
         | 
         | PS: I've got nothing against Turso, or libSQL. In fact I spent
         | the last year perusing their virtual WAL API. The problem is
         | that I found no documentation, nor any useful open source
         | implementations of it. If there any I'd be very interested. So,
         | thus far, I also don't have anything that drives towards
         | libSQL.
         | 
         | [1] https://github.com/ncruces/go-sqlite3
        
       | 580515975 wrote:
       | A couple of thoughts:
       | 
       | First, great job on the readme! One way you could improve it is
       | by expanding on the "Caution" section. What's written is the
       | beginnings of a threat model, but it could be improved by being
       | more explicit about which attacks this system does/doesn't defend
       | against.
       | 
       | > The only security property that disk encryption (and this
       | package) provides is that all information such an adversary can
       | obtain is whether the data in a sector has (or has not) changed
       | over time.
       | 
       | I think the adversary learns a bit more than this. Randomized
       | encryption would provide the above probably, but the
       | _deterministic_ scheme that's used here will let the adversary
       | learn not only whether a sector changed, but whether its value
       | matches what it was at a previous point in time.
       | 
       | How does this translate into the security of the database,
       | itself? Seeing what blocks have changed might reveal information
       | about what data has changed. Let's consider a security game where
       | I (the adversary) get to submit sql queries, and then learn which
       | blocks on disk has changed. After this initial phase (where I can
       | learn where data is stored), I submit two different sql queries,
       | you pick one of them at random and run it, and then tell me which
       | blocks have changed. I win if I can guess which sql query you
       | picked.
       | 
       | Suppose I submit queries which each insert into a different
       | table. Because the tables are stored separately on-disk, it'll
       | probably be pretty easy for me to distinguish them. But okay,
       | that's still count-ish/size-ish, and maybe out of scope.
       | 
       | What if I submit two queries which each insert different values,
       | but into the same table. Further, let's say that this table has
       | an index. Based on which pages were written to, I can now learn
       | something about the _values_ that were inserted, because
       | different values will write into the index in different places.
       | 
       | Now, it's completely valid if the threat model says, "if you can
       | see more than two copies of the database file, then all is lost."
       | However, I think it'd be worth translating the current write-up
       | of the threat model into the implications for leaking the
       | database. For more examples of attacks based on seeing what
       | indices/sizes changed [1] and [2].
       | 
       | Is it valid to pad the sqlite file to a multiple of the block
       | size? Does sqlite ever call truncate on a non-block-aligned size
       | and expect any truncated bytes to be fully removed?
       | 
       | What are the atomicity requirements for a sqlite VFS? SQLite, in
       | general, is supposed to not get corrupted if power were to be
       | yanked mid-write. However, because this VFS writes one block at-
       | a-time, the computer dying mid-write could corrupt more bytes
       | around the write position than would normally be corrupted if the
       | the standard VFS was used. It's possible this is a non-issue, but
       | it's worth considering what contract sqlite has for VFSes.
       | 
       | [1]: https://en.wikipedia.org/wiki/CRIME [2]:
       | https://www.usenix.org/legacy/events/sec07/tech/full_papers/...
        
         | cryptonector wrote:
         | The threat model has to exclude:
         | 
         | - attacks on a running app that has the keys loaded, naturally
         | 
         | The threat model has to include at least:
         | 
         | - passive attacks against the DB itself, lacking access to the
         | keys
         | 
         | The threat model really should also include:
         | 
         | - active attacks against the DB lacking access to the keys
         | (e.g., replace blocks)
         | 
         | IMO ZFS does a pretty good job against these threats, for
         | example, so ZFS is a good yardstick for measuring things like
         | TFA.
         | 
         | However, the fact that a _running system_ must have access to
         | the _keys_ means that at-rest data encryption does not buy one
         | much protection against server compromise, especially when the
         | system must be running much /most/all of the time. So you
         | really also want to do the utmost to secure the
         | server/application.
        
           | ncruces wrote:
           | ZFS, AFAIK, can offer something in addition which is harder
           | for a VFS to offer, and which AFAICT no other SQLite
           | encryption offers: a kind of HMAC Merkel tree that
           | authenticates an entire database (at a point in time).
           | 
           | Alternatives, even those that use MACs only authenticate
           | pages/blocks. They still allow mix-and-match of pages/blocks
           | from previous backups.
           | 
           | I could, potentially, add optional/configurable nounces and
           | MACs at the VFS layer.
           | 
           | I've refrained from doing so because (1) it complicates the
           | implementation; (2) it can be added later, compatibly; (3) it
           | doesn't fix mix-and-match; (4) it will impact performance
           | further; and (5) it would be MAC-then-encrypt (against best
           | practice).
        
             | cryptonector wrote:
             | Yes, that's right. In fact, ZFS w/ encryption gives you two
             | Merkle hash trees, one using a hash function _and_ using a
             | MAC. SQLite3 could do this, but it would have to change its
             | database format fairly radically.
             | 
             | A SQLite3 VFS could, maybe, store additional metadata on
             | the side knowing the SQLite3 database file format, I
             | suppose. But if you _really_ want this it 's best to do it
             | in the database itself.
        
         | ncruces wrote:
         | First of all, thanks for the review. I'll try to respond to all
         | points.
         | 
         | Disk encryption, on which this is based, is usually
         | deterministic in nature.
         | 
         | So yes, an adversary 1 that can inspect multiple versions of a
         | database (e.g. backups) can learn exactly: which (blocks)
         | changed, which didn't change, which have been reverted; but
         | that is _all_ they should learn.
         | 
         | An adversary 2 that can modify files, can also mix-and-match
         | blocks between versions to produce a valid file with high
         | probability .
         | 
         | And an adversary 3 that can submit changes and see their effect
         | on the encrypted data can probably infer a lot about the
         | database.
         | 
         | I'll try to make these more explicit in the README. In
         | practical terms: adversary 1 is the one I thought I'd covered
         | reasonably well; adversary 2 means that backups should be
         | independently signed, and signatures verified before restoring
         | them; adversaries 2 and 3 mean that this is ineffective against
         | live attacks.
         | 
         | Security, though, is also about comparing options. Reading the
         | documentation for alternatives (even the expensive ones) I
         | don't see this kind of analysis. I see 2 advantages to the
         | alternatives that encrypt page data with a nounce and a MAC.
         | The nounce allows reverts to go unnoticed. No change, means a
         | block definitely didn't change. But ciphertext changing doesn't
         | necessarily mean plaintext changed. The MAC ensures blocks are
         | valid. But they still be reverted to previous versions of
         | themselves, mix-and-match is still possible. Do these two
         | properties make a huge difference? Is there anything else I'm
         | missing?
         | 
         | On your other points.
         | 
         | Yes it's always safe to round up file sizes to block size, for
         | databases, journals and WALs (I could detail why, but the
         | formats are documented). It may not be safe for all temporary
         | files (I'm assuming it is), but that can be fixed for those
         | files by remembering the file size in memory.
         | 
         | About atomicity, corruption, etc, the VFS is supposed to
         | declare its characteristics [1] to SQLite. Your concerns are
         | covered by SAFE_APPEND and POWERSAFE_OVERWRITE. See also [2].
         | As a wrapper VFS, I filter most of those characteristics from
         | the underlying VFS, forcing SQLite to assume the worst.
         | 
         | [1] https://www.sqlite.org/c3ref/c_iocap_atomic.html
         | 
         | [2] https://www.sqlite.org/psow.html
        
       ___________________________________________________________________
       (page generated 2024-04-30 23:01 UTC)