Explain the chunker a bit in the DESIGN document - dedup - deduplicating backup program
---
commit 08600b08eec99d0c6fce2749ade192cadd4a0ba5
parent af4f203b687f0d19bb16036c882fbf2dad994393
Author: sin <sin@2f30.org>
Date: Thu, 16 May 2019 16:43:35 +0300
Explain the chunker a bit in the DESIGN document
Diffstat:
M DESIGN | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)
---
diff --git a/DESIGN b/DESIGN
@@ -51,4 +51,12 @@ block hashes of the data stored in the snapshot.
The chunker interface
---------------------
-TBD
+The chunker issues variable length blocks. The minimum block size is
+512KB, the maximum block size is 8MB and the average block size is
+2MB. These configuration parameters can be modified by editing
+config.h but it can be tricky to tune it properly.
+
+The buzhash[0] rolling hash algorithm is used to fingerprint the input
+stream.
+
+[0] http://www.serve.net/buz/Notes.1st.year/HTML/C6/rand.012.html
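
The text added to DESIGN can be illustrated with a minimal sketch of a
buzhash-driven chunker. This is not the code from dedup. The names
BLKSIZE_MIN, BLKSIZE_MAX, BLKSIZE_AVG, WINSIZE, buz_init and next_block
are hypothetical, as are the window length, the way the substitution
table is filled, and the boundary test (low bits of the hash equal to
zero). Only the three sizes are taken from the commit. Because the
search for a boundary starts after the minimum size, the real mean
block length on random input will be somewhat above BLKSIZE_AVG.

```c
#include <stddef.h>
#include <stdint.h>

/* Sizes from the DESIGN text; the macro names are assumptions. */
#define BLKSIZE_MIN (512UL * 1024)
#define BLKSIZE_MAX (8UL * 1024 * 1024)
#define BLKSIZE_AVG (2UL * 1024 * 1024)	/* must be a power of two */
#define WINSIZE     31			/* rolling window, assumed */

static uint32_t buz_table[256];

/* Rotate a 32-bit word left by n bits. */
static uint32_t
rotl32(uint32_t x, unsigned n)
{
	n &= 31;
	return n ? (x << n) | (x >> (32 - n)) : x;
}

/* Fill the substitution table with a fixed xorshift32 sequence. */
static void
buz_init(void)
{
	uint32_t s = 0x9e3779b9u;
	int i;

	for (i = 0; i < 256; i++) {
		s ^= s << 13;
		s ^= s >> 17;
		s ^= s << 5;
		buz_table[i] = s;
	}
}

/*
 * Return the length of the next block at the start of buf.
 * A boundary is declared where the hash of the preceding WINSIZE
 * bytes has its low bits clear, but never before BLKSIZE_MIN and
 * never after BLKSIZE_MAX.
 */
static size_t
next_block(const unsigned char *buf, size_t len)
{
	uint32_t h = 0;
	size_t i;

	if (len <= BLKSIZE_MIN)
		return len;
	if (len > BLKSIZE_MAX)
		len = BLKSIZE_MAX;

	/* Prime the window with the bytes just below the minimum size. */
	for (i = BLKSIZE_MIN - WINSIZE; i < BLKSIZE_MIN; i++)
		h = rotl32(h, 1) ^ buz_table[buf[i]];

	for (i = BLKSIZE_MIN; i < len; i++) {
		if ((h & (BLKSIZE_AVG - 1)) == 0)
			return i;
		/* Slide the window: drop the oldest byte, add the new one. */
		h = rotl32(h, 1)
		    ^ rotl32(buz_table[buf[i - WINSIZE]], WINSIZE)
		    ^ buz_table[buf[i]];
	}
	return len;
}
```

A caller would run buz_init() once and then call next_block()
repeatedly, advancing by the returned length each time. Since the
boundary depends only on the last WINSIZE bytes, inserting data early
in a stream tends to shift block boundaries rather than change every
block after it, which is what makes variable length blocks useful for
deduplication.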