  Compression algorithm
  Markus Gutschke, gutschk@math.uni-muenster.de
  1 July 1997

  Some notes about the compression algorithm


  The compressor achieves an average compression rate of 60% of the
  original size, which is on par with "gzip"; it seems hard to do much
  better when compressing compiled binaries. This means that the break
  even point for using compressed images is reached once the
  uncompressed size approaches 1.5kB. We can stuff more than 12kB into
  an 8kB EPROM and more than 25kB into a 16kB EPROM. As there is only
  32kB of RAM for both the uncompressed image and its BSS area, 32kB
  EPROMs will hardly ever be required.
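
  As a rough sanity check of these figures (assuming the 60% average
  ratio holds for a given image):

       8192 bytes / 0.60 = ~13650 bytes of uncompressed image
      16384 bytes / 0.60 = ~27300 bytes of uncompressed image

  which is consistent with the "more than 12kB" and "more than 25kB"
  quoted above.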



  The compression algorithm uses a 4kB ring buffer for buffering the
  uncompressed data. Before compression starts, the ring buffer is
  filled with spaces (ASCII character 0x20). The algorithm tries to
  find repeated input sequences with a maximum length of 60 bytes. All
  256 different input bytes plus the 58 (60 minus a threshold of 2)
  possible repeat lengths form a set of 314 symbols, which are encoded
  with an adaptive Huffman code.
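
  In C, these parameters translate roughly into the following sketch
  (the names are illustrative only, this is not the actual etherboot
  source):

      #include <string.h>

      #define RING_SIZE  4096  /* 4kB ring buffer                      */
      #define MAX_MATCH  60    /* longest repeated sequence encoded    */
      #define THRESHOLD  2     /* shorter matches are sent as literals */
      #define N_SYMBOLS  (256 + MAX_MATCH - THRESHOLD)  /* 314 symbols */

      static unsigned char ring[RING_SIZE];

      /* Before compression starts, the buffer is filled with spaces. */
      static void init_ring(void)
      {
          memset(ring, ' ', sizeof ring);
      }

      /* Literal bytes use symbols 0..255; match lengths 3..60 map
       * onto symbols 256..313. */
      static int length_symbol(int len)
      {
          return 256 + len - (THRESHOLD + 1);
      }
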
  The algorithm starts out with a Huffman tree that assigns equal code
  lengths to each of the 314 symbols (slightly favoring the repeat
  symbols over the symbols for regular input characters), and the tree
  is updated whenever the frequency of any of the symbols changes.
  Frequency counts are kept in 16bit words until the total number of
  emitted codes reaches 2^15; then all frequency counts are halved,
  rounding to the bigger number.
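
  The halving step can be sketched as follows (assuming the counts
  live in an array "freq" of N_SYMBOLS 16bit words, as in the sketch
  above; rounding up guarantees that no count ever drops to zero):

      #define MAX_FREQ  0x8000  /* 2^15, the rescaling trigger */

      static unsigned short freq[N_SYMBOLS];

      /* Called once the cumulative count reaches MAX_FREQ.  A real
       * implementation also has to rebuild the Huffman tree after
       * rescaling. */
      static void rescale_counts(void)
      {
          int i;
          for (i = 0; i < N_SYMBOLS; i++)
              freq[i] = (freq[i] + 1) / 2;  /* round to the bigger number */
      }
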
  For unrepeated characters (symbols 0..255), the Huffman code is
  written directly to the output stream. For repeated sequences, the
  Huffman code denoting the length of the sequence is written out, and
  the index of the match in the ring buffer is computed. From this
  index, the algorithm computes the offset relative to the current
  index into the ring buffer. For typical input data, one would expect
  short to medium range offsets to be more frequent than extremely
  short ones or medium to long range ones. The 12bit (for a 4kB
  buffer) offset value is therefore statically Huffman encoded, using
  a precomputed Huffman tree that favors those offset values that are
  deemed to be more frequent. The Huffman encoded offset is written to
  the output stream, directly following the code that gives the length
  of the repeated sequence.
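
  The offset computation amounts to a wrap-around subtraction of two
  ring buffer indices; this sketch leaves out the exact bias and the
  precomputed code table, which are not specified here:

      /* Offset of the match relative to the current write position,
       * reduced modulo the 4kB buffer, so it always fits in 12 bits. */
      static unsigned match_offset(int cur, int match_pos)
      {
          return (unsigned)(cur - match_pos) & (RING_SIZE - 1);
      }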



  This algorithm, as implemented in the C example code, looks very
  good, and its operating parameters are already well optimized; this
  also explains why it achieves compression ratios comparable with
  "gzip". Depending on the input data, it sometimes does considerably
  better than "gzip -9", but this phenomenon does not appear to be
  typical. The algorithm has some flaws, such as the limited buffer
  sizes, the adaptive Huffman tree, which takes very long to adjust if
  the input characters experience a sudden change in distribution, and
  the static Huffman tree for encoding offsets into the buffer. The
  slow adaptation of the Huffman tree is partially counteracted by
  artificially limiting the frequency counts to 16bit precision and
  halving them, but since each emitted code covers at least one input
  byte, the halving cannot trigger before 32kB of uncompressed data
  has been processed. It therefore has no impact on our use for
  "etherboot", because the BOOT Prom does not support uncompressed
  data of more than 32kB (cf. doc/spec.doc).




  Nonetheless, these problems do not seem to affect compression of
  compiled programs very much. Mixing object code with English text
  would not work too well, though, and the algorithm should be reset
  in between. We might actually gain a small improvement if the text
  and data segments were compressed individually, but I have not
  experimented with this option yet.
