Design notes
============

There are three main abstractions in the design of dedup:

  - The chunker interface
  - The snapshot layer
  - The block layer

The block layer
---------------

To the outside world, the block layer is just an abstraction for
dealing with variable-length blocks.  All blocks are referenced by
their hash.

The block layer is arranged into a stack of layers.  From top to
bottom these are as follows:

  - The generic layer
  - The compression layer
  - The encryption layer
  - The storage layer

The generic layer is the one that client code interfaces with.  It is
the top-level entry point to the block layer.  The generic layer
calculates the hash of the block and passes it down to the compression
layer.

The compression layer prepends a compression descriptor to the
block and then compresses the block using snappy or lz4.  Compression
can be disabled, in which case a special descriptor is prepended
and the data is passed uncompressed to the encryption layer.

The encryption layer prepends an encryption descriptor to the
block and then encrypts and authenticates the block using XChaCha20
and Poly1305.  Encryption can be disabled, in which case the layer
acts as a bypass with a special type of encryption descriptor.  The
block is then passed to the storage layer.

The storage layer prepends a storage descriptor to the block and
appends the descriptor and the data to a single backing file.
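
The stack described above can be sketched roughly as follows.  This is
only an illustration under several assumptions: the one-byte
descriptors, the storage descriptor layout (hash plus length), the
choice of BLAKE2b as the block hash and all function names are
hypothetical, zlib stands in for snappy/lz4, and only the encryption
bypass mode is shown.

```python
import hashlib
import io
import zlib

# Hypothetical one-byte descriptors; the real on-disk formats are not
# specified in these notes.
COMP_NONE, COMP_ZLIB = b"\x00", b"\x01"
ENC_NONE = b"\x00"


def compress_layer(data: bytes, enabled: bool = True) -> bytes:
    """Prepend a compression descriptor; zlib is a stand-in for snappy/lz4."""
    if not enabled:
        return COMP_NONE + data
    return COMP_ZLIB + zlib.compress(data)


def encrypt_layer(data: bytes) -> bytes:
    """Bypass mode only: mark the block as unencrypted and pass it through."""
    return ENC_NONE + data


def storage_layer(backing, block_hash: bytes, data: bytes) -> int:
    """Append an assumed descriptor (hash + length) and the data.

    Returns the offset at which the record was written.
    """
    offset = backing.seek(0, io.SEEK_END)
    backing.write(block_hash + len(data).to_bytes(8, "little") + data)
    return offset


def put_block(backing, block: bytes, compress: bool = True) -> bytes:
    """Generic layer: hash the block, then push it down the stack."""
    block_hash = hashlib.blake2b(block, digest_size=32).digest()
    payload = encrypt_layer(compress_layer(block, compress))
    storage_layer(backing, block_hash, payload)
    return block_hash
```

Calling `put_block(io.BytesIO(), b"some data")` returns the hash that
client code would later use to refer to the block.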

The snapshot layer
------------------

The snapshot abstraction is currently very simplistic.  A snapshot is
a file under $repo/archive/<name>.  The contents of the file are the
block hashes of the data stored in the snapshot.
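
A minimal sketch of such a snapshot file, assuming the hashes are
fixed-size raw digests written back to back (the actual encoding is
not described here; `HASH_SIZE` and both function names are
hypothetical):

```python
import os

HASH_SIZE = 32  # assumed digest length in bytes


def write_snapshot(repo: str, name: str, hashes: list) -> str:
    """Write the block hashes, in order, to <repo>/archive/<name>."""
    archive = os.path.join(repo, "archive")
    os.makedirs(archive, exist_ok=True)
    path = os.path.join(archive, name)
    with open(path, "wb") as f:
        for h in hashes:
            f.write(h)
    return path


def read_snapshot(path: str) -> list:
    """Return the list of block hashes stored in a snapshot file."""
    with open(path, "rb") as f:
        data = f.read()
    return [data[i:i + HASH_SIZE] for i in range(0, len(data), HASH_SIZE)]
```

Restoring a snapshot then amounts to fetching each listed block from
the block layer in order and concatenating the results.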

The chunker interface
---------------------

The chunker produces variable-length blocks.  The minimum block size
is 512 KB, the maximum block size is 8 MB and the average block size
is 2 MB.  These parameters can be modified by editing config.h, but
tuning them properly can be tricky.

The buzhash[0] rolling hash algorithm is used to fingerprint the input
stream.
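
To illustrate how a buzhash-based chunker picks block boundaries, here
is a rough sketch.  It is not the dedup implementation: the 48-byte
window, the 32-bit hash width, the randomly generated table and the
mask-based cut test are all assumptions, and the names are
hypothetical.  A boundary is declared once the block has reached the
minimum size and the low bits of the rolling hash are zero, or when
the maximum size is hit.

```python
import random

MASK32 = 0xFFFFFFFF
WINDOW = 48  # assumed rolling window size in bytes


def rotl(x: int, n: int) -> int:
    """Rotate a 32-bit value left by n bits."""
    n %= 32
    return ((x << n) | (x >> (32 - n))) & MASK32


def make_table(seed: int = 0) -> list:
    """Build a 256-entry table of pseudo-random 32-bit values."""
    rng = random.Random(seed)
    return [rng.getrandbits(32) for _ in range(256)]


def chunk(data: bytes, table: list, min_size: int, avg_size: int, max_size: int):
    """Yield content-defined chunks; avg_size must be a power of two."""
    mask = avg_size - 1
    start, n = 0, len(data)
    while start < n:
        end = min(start + max_size, n)
        cut = end  # fall back to the maximum size (or end of input)
        h = 0
        for i in range(start, end):
            # Roll the incoming byte into the hash.
            h = rotl(h, 1) ^ table[data[i]]
            # Once the window is full, cancel out the byte leaving it.
            if i - start >= WINDOW:
                h ^= rotl(table[data[i - WINDOW]], WINDOW)
            if i + 1 - start >= min_size and (h & mask) == 0:
                cut = i + 1
                break
        yield data[start:cut]
        start = cut
```

For quick experiments it is convenient to scale the sizes down, for
example `chunk(data, make_table(), 512, 2048, 8192)`.  Because of the
minimum-size skip, the observed mean block size in this sketch ends up
somewhat above `avg_size`.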

When encryption is enabled, a random seed is generated and stored
encrypted in the repository state file.  The seed is XOR-ed with the
buzhash initial state table to mitigate length-fingerprinting
attacks.
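
The seeding step might look roughly like the sketch below.  The
32-bit seed width, the per-entry XOR and the function names are
assumptions made for illustration; the notes only say that the seed
is XOR-ed with the table.

```python
import random
import secrets


def new_seed() -> int:
    """Generate a random 32-bit seed (width is an assumption)."""
    return secrets.randbits(32)


def seed_table(table: list, seed: int) -> list:
    """XOR every table entry with the seed, keeping entries 32 bits wide."""
    return [(entry ^ seed) & 0xFFFFFFFF for entry in table]


# Stand-in for the fixed buzhash table shipped with the program.
base_table = [random.Random(0).getrandbits(32) for _ in range(256)]
seeded_table = seed_table(base_table, new_seed())
```

Since the seed is secret, an observer who knows the public table
cannot reproduce the cut points and so has a harder time predicting
block lengths for known content.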

[0] http://www.serve.net/buz/Notes.1st.year/HTML/C6/rand.012.html
