  Terminal Font Languages (How bit-maps are encoded as soft-font)
  David S. Lawyer
  v3.1g, July 1997

  This explains how to manually create soft-font.  Doing so is something
  like programming, except that you use a "font language".  BitFontEdit
  does this for you automatically, so you don't usually need to read
  this.  But you may need it for debugging purposes or if you modify or
  rewrite the source code.
  ______________________________________________________________________

  Table of Contents

  1. RASTER and VECTOR GRAPHICS

  2. BIT-MAPS

  3. HEADERS

  4. BASIC BYTE ENCODING SCHEMES

  5. SCANNING METHODS

  6. VOLTAGES, BIT/BYTE ORDER

  7. VT220 (SIXEL) SCANNING DETAILS

  8. WYSE TERMINAL SCANNING DETAILS

  9. CHARACTER MATRIX SIZES

  10. EXAMPLE OF A 7x12 PATTERN for "DUMB" TERMINALS

  ______________________________________________________________________

  (This document will hopefully in the future include X-Windows &
  printers)

  1.  RASTER and VECTOR GRAPHICS

  The displays on most terminals and monitors are a result of raster
  scanning where an electron beam travels horizontally across the
  screen, "painting" a line of hundreds of "dots" or pixels.  Then the
  beam goes down slightly lower (about one pixel width down) and paints
  the next line, etc.  Bit-mapped font software is for use in this type
  of display.  The pixels of a font character are mapped to a certain
  region of the screen where the character is to appear.  A scan line
  clear across the screen may pass thru 80 or so characters.  Thus to
  know what pixels (dots) in this scan line to turn on, the bit-maps (or
  pixel-layouts) of many characters need to be inspected.  This may be
  done by the video card in a PC or by the equivalent in the electronics
  of a "dumb" terminal.  For a graphics display, which displays pictures
  as well as text, this is often done by the main part of the computer
  (the CPU chip) with the CPU creating the screen image and the video
  card storing it in its memory.

  Another electronic scanning method which is rather rare is "vector
  graphics".  In this case the electron beam is "smart" and traces out
  patterns on the screen much as you would do with a pen, moving not
  just horizontally but in any direction.   By doing this there are no
  jags in drawing a diagonal line (such as happens in raster graphics).
  While vector graphics is more advanced than raster graphics, it's
  difficult to employ on color terminals/monitors since the color dots
  on the inside of a color picture tube form an inherent raster of fixed
  dots.  Vector graphics uses the concepts of lines rather than pixels
  so that bit-mapped font software is of no use for it.  However, some
  raster graphic terminals can emulate a vector graphic terminal by
  being able to display vector graphic code on a raster graphic screen.
  The result is not a true vector graphic display and slanted lines are
  likely to appear jagged.

  2.  BIT-MAPS

  A bit map is simply a matrix, the elements of which are either 0 or 1.
  Each element represents a pixel (a location (dot) on a sheet of paper
  or on a screen).  A pixel may be "on" or "off".  1 means the pixel is
  "on" while 0 means it is "off".  If the pixel is "on" the dot is
  normally "black" and if it's off the "dot" is not printed.  Substitute
  for "black" the color of ink you are using or the foreground color
  selected on a video monitor or terminal.  An "off" pixel will be of
  the background color, which is the color of the paper if you are
  printing.  A number which is either 0 or 1 may be represented by a
  bit.  Thus the name "bit-map".

  For graphics, a pixel may have other attributes (such as color) and
  this requires many bits.  But for most fonts a pixel requires only one
  bit.  However, we could represent a pixel by the ASCII character 0 or
  1, where each character uses up 8 bits in the computer.  Using 8 bits
  to represent one pixel takes 8 times the memory necessary.  Memory is
  becoming so cheap that such overuse of memory is, in some cases (if
  the character doesn't use too many pixels), of little significance.
  The coding rules are from the late 1980's but they should still work.

  3.  HEADERS

  In addition to the pixel map (bitmap), other information may accompany
  each character such as its width (for the case of proportional spaced
  characters).  Such information may constitute a character header
  (descriptor per HP printers).  In rare cases no such header is needed
  but even in this case some kind of separator mark may be inserted
  between the bitmaps of successive characters in order to separate
  them.

  In addition to a possible header for each character bitmap, there is
  usually a font header placed at the start of the soft-font.  It may
  tell the terminal or printer such things as:

  1. A name for the font.

  2. How many font characters are about to be sent.

  3. The ASCII character number of the first character to be sent.

  4. Which bank to store the font in.

  5. What to do with the font that was formerly stored there.

  In most cases there are both headers for each character and a single
  header for the entire soft-font.  In some cases, one of these is
  omitted.  If enough information is provided in the header for each
  character, then there is no need for a header for the entire font.
  For terminal font: VT220 has a font header but no character headers;
  Wyse has character headers but no font header; X-Windows has both.
  Nomenclature varies.  In one case the first couple of bytes of the
  header are themselves called a header but the full header may be
  called something else.

  These headers together with the encoded bit-maps of characters
  constitute the soft-font.  Soft-font is downloaded to the terminal or
  printer on the same wire on which character codes are sent for
  displaying or printing characters.  Such soft-font (or segments of
  soft-font) must
  be "escaped" or the like so that it will not be mistaken for
  characters to be printed or displayed.  This is usually done by an
  escape sequence which starts with the escape (ESC) character (Hex 1B).
  The code just after the ESC may tell the device that what follows is
  softfont code or may even give the number of bytes of soft-font to
  follow.  Unless the number of bytes of soft-font is specified in
  advance, some kind of an "end" character or escape sequence must be
  sent to mark the end of the soft-font.

  4.  BASIC BYTE ENCODING SCHEMES

  How does one represent a bit-map of a character in soft-font?  In
  order to understand (or write) a program to create softfont one must
  know this.  Unfortunately for the font programmer, there are many
  different ways to represent a bit-map.  One way would be to represent
  it within a rectangle (cell) on a page with a character such as * (or
  1) representing "on" pixels and a space (or 0) representing "off"
  pixels.  Since the * character is represented by an ASCII byte in the
  computer memory, one could simply put this character into the soft-
  font to represent an "on" pixel.  Likewise for the "off" pixel
  (space).  This is not very efficient in memory utilization (and disk
  storage utilization) since each pixel uses 8 bits.

  The most efficient way to represent a simple pixel is just by a single
  bit.  Most printer fonts do just this but fonts for ANSI/ASCII
  terminals don't do it this way.  Just how they do it will be explained
  shortly.

  It's sometimes desirable to be able to edit a soft-font file with an
  ordinary editor in order to change (or even create) the font header,
  and possibly for other purposes such as to check the format.
  Unfortunately, if the softfont is represented in the most efficient
  and simple way (a bit in the map is represented by a bit in the
  computer) then the soft-font file is a binary file containing the
  entire range of byte values.  Most editors and word processors can't
  handle this very well (if at all).  Thus there is a tradeoff between
  storage efficiency and editability.  How could one make a soft-font
  file easy to edit?

  The pixels of a bitmap may be grouped into bytes (usually of 8 pixels
  (or bits) each).  One way to represent each such byte (which is a
  decimal integer number between 0 and 255) is by a 3 digit ASCII number
  (such as 179).  This requires 3 bytes to represent one byte.  A better
  way is to represent the byte by two hexadecimal ASCII digits (such as
  B3 for 179).  This works since a byte ranges from 00 to FF.  This
  method is used for both Wyse terminal font and font for X-Windows.
  Although it's simple, it only utilizes 16 printable characters: 0, 1,
  2, ..., A, B, ..., F.
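
  As a small illustration of the decimal and hexadecimal text forms, the
  Python sketch below packs eight pixels into a byte and writes it both
  ways.  The function names are made up for this example (they are not
  part of BitFontEdit), and treating the first pixel as the high-order
  bit is an assumption made here for concreteness.

```python
def pixels_to_byte(pixels):
    """Pack 8 pixels (0 or 1) into one byte; first pixel = high-order bit."""
    if len(pixels) != 8:
        raise ValueError("expected exactly 8 pixels")
    value = 0
    for p in pixels:
        value = (value << 1) | (1 if p else 0)
    return value


def byte_to_decimal_text(value):
    """Three ASCII decimal digits per byte (3 bytes of text per byte)."""
    return "%03d" % value


def byte_to_hex_text(value):
    """Two ASCII hexadecimal digits per byte (2 bytes of text per byte)."""
    return "%02X" % value


b = pixels_to_byte([1, 0, 1, 1, 0, 0, 1, 1])   # 10110011 binary
print(b, byte_to_decimal_text(b), byte_to_hex_text(b))
```

  For the pixel row 10110011 this prints 179 179 B3, matching the B3
  example in the text.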

  DEC's VT220 terminals came out (in the mid 1980's) with a more
  efficient (but more complex) sixel method which was widely emulated
  (especially by Wyse).  They simply used 6-bit bytes (called "sixels")
  instead of 8-bit bytes.  (The word "sixel" actually means six pixels.)
  Since there are only 64 six-bit numbers they can be readily mapped to
  printable characters.  Rather than devise a new six-bit code mapping
  the numbers 0-63 to printable 8-bit characters, the ASCII code scheme
  is utilized.  Since the first 33 ASCII characters (the 32 control
  characters and the space) don't print as visible marks, one could
  simply add 33 to the 6-bit numbers to enable them to print.  Of
  course they now become 7-bit numbers (and occupy 8 bits in memory).
  However, adding 63 (Hex. 3F) will also work and this is exactly what
  the VT sixel encoding method does.  3F is the largest number one may
  add and still get printable ASCII characters.  Thus, roughly speaking,
  it uses the upper half (Hex 40 to 7F) of the lower ASCII range, except
  that since the ASCII character DEL (7F) doesn't print it actually goes
  from Hex 3F to 7E (subtract one).  Thus to convert a "sixel" to
  printable ASCII add Hex 40 and subtract one, or what is the same
  thing, add 3F.
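
  The add-3F rule can be sketched in a few lines of Python.  The names
  SIXEL_OFFSET, sixel_to_char and char_to_sixel are invented for this
  illustration.

```python
SIXEL_OFFSET = 0x3F   # 63, the value added to every 6-bit number


def sixel_to_char(sixel):
    """Map a 6-bit value (0-63) to its printable ASCII character."""
    if not 0 <= sixel <= 63:
        raise ValueError("a sixel must be in the range 0..63")
    return chr(sixel + SIXEL_OFFSET)


def char_to_sixel(ch):
    """Inverse mapping: printable character back to the 6-bit value."""
    value = ord(ch) - SIXEL_OFFSET
    if not 0 <= value <= 63:
        raise ValueError("character is outside the sixel range")
    return value


# The lowest and highest sixels land on hex 3F and hex 7E.
print(sixel_to_char(0), sixel_to_char(63))
```

  The two characters printed are ? (hex 3F) and ~ (hex 7E), the ends of
  the range described above.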

  We have mentioned 3 significant ways of encoding bytes:

  1. directly as 8-bit bytes (extended characters using all 256
     possibilities) resulting in  "binary" soft-font.

  2. as 6-bit bytes (sixels) with 3F added to them to convert them into
     7-bit printable characters, each using up an 8-bit byte for
     storage.

  3. as two-character hexadecimal words (such as B3) resulting in each
     8-bit byte being encoded as a sequence of 2 bytes.

  The efficiencies are: case 1: 100%, case 2: 75% (6/8), case 3: 50%.
  The less "efficient" cases have the advantage of being easy to edit
  with an ordinary editor or word processor since they are not binary
  files.

  5.  SCANNING METHODS

  Just knowing these three basic methods of encoding pixels as bytes
  doesn't give one much of a clue as to how to create soft-font since
  one must know how to scan a character matrix of pixels.  A character
  on most terminals, monitors or printers is just a bunch of pixels (or
  bits) in a character matrix.  How do we partition this matrix into
  bytes?  Which byte is to be sent (downloaded) first to a device (a
  printer or terminal)?  Which byte is to be sent next, etc?  The
  scanning method will provide the answers to these questions.

  There are many ways to scan a character matrix.  One may scan by
  rows, by columns, or by some combination of rows and columns.
  Scanning by rows would start with one row (say the top row), partition
  this row into bytes, and then read the resulting bytes.  Then it would
  repeat this for the next row, etc.  However (for the case of 8-bit
  bytes) the number of pixels in a row may not be an exact multiple of
  8.  Thus it is often necessary to zero fill the last partial byte to
  make it full length.  Should we zero-fill the low order bits or the
  high-order ones?  In scanning rows, should we scan from right to left
  or from left to right?  Should the first bit (pixel) we scan be
  considered high-order or low-order?
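
  To show how much these choices matter, here is a Python sketch that
  packs one row of pixels into bytes under different conventions.  The
  function pack_row and its first_is_high parameter are hypothetical
  names used only for this illustration.

```python
def pack_row(pixels, first_is_high=True, bits_per_byte=8):
    """Split one row of pixels into bytes, zero-filling the last byte.

    pixels: list of 0/1 values in the order they are scanned.
    first_is_high: if True the first pixel scanned becomes the
        high-order bit (so the zero-fill lands in the low-order bits);
        if False it becomes the low-order bit (fill in the high bits).
    """
    out = []
    for start in range(0, len(pixels), bits_per_byte):
        chunk = list(pixels[start:start + bits_per_byte])
        chunk += [0] * (bits_per_byte - len(chunk))      # zero-fill
        value = 0
        for i, p in enumerate(chunk):
            shift = bits_per_byte - 1 - i if first_is_high else i
            value |= (1 if p else 0) << shift
        out.append(value)
    return out


row = [1, 1, 0, 0, 0, 0, 0]                       # a 7-pixel row: **.....
print(pack_row(row))                              # left to right, first = high
print(pack_row(row, first_is_high=False))         # left to right, first = low
print(pack_row(list(reversed(row))))              # right to left, first = high
```

  The same row comes out as 192, 3 or 6 depending on the convention, so
  the terminal and the font program must agree on every one of these
  choices.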

  A method of scanning that is neither strictly by rows nor columns
  works for example as follows.  We start scanning the first row but
  read only the first byte.  Then we scan the next row but take only the
  first byte.  Then we do the same for the third row, etc.  Then after
  we have done this for all rows we go back to the first row and read
  the second byte and so on until all the second bytes on every row have
  been read.  Then we repeat for the third byte, etc., etc. until all
  the bytes in the character matrix have been read.

  This is something like drawing a character matrix on square-ruled
  graph paper and scanning with a toy car one byte (say 8 pixels or
  squares) wide.  The car starts from the top of the page and rolls down
  the left hand "strip" of the paper, running over 8 pixels at a time.
  Thus while the scanning is along rows until 8 bits are read, it may
  also be viewed as scanning down "wide columns" reading in 8 pixels (a
  byte) at a time.
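
  The "toy car" order can be sketched in Python as follows.  The name
  wide_column_scan is invented here, and the sketch assumes that the
  leftmost pixel of each byte is the high-order bit and that a partial
  byte at the right edge is zero-filled in its low-order bits.

```python
def wide_column_scan(matrix, bits_per_byte=8):
    """Return bytes in 'toy car' order: down the first byte-wide strip,
    then down the next strip to its right, and so on.

    matrix: list of equal-length rows, each a list of 0/1 pixels.
    """
    width = len(matrix[0]) if matrix else 0
    out = []
    for start in range(0, width, bits_per_byte):    # one strip per byte column
        for row in matrix:                          # roll down the strip
            chunk = list(row[start:start + bits_per_byte])
            chunk += [0] * (bits_per_byte - len(chunk))
            value = 0
            for p in chunk:
                value = (value << 1) | (1 if p else 0)
            out.append(value)
    return out


# A 3-row by 12-column matrix, so there are two strips (8 + 4 pixels wide).
matrix = [
    [1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1],
]
print(["%02X" % b for b in wide_column_scan(matrix)])
```

  The first three bytes (FF, 80, 01) come from the left strip read top
  to bottom, and the last three (00, 80, F0) from the right strip.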

  6.  VOLTAGES, BIT/BYTE ORDER

  What voltages are used?  For the conventional serial port a 1 bit is
  about -12 volts and a 0 bit is about +12 volts.  The exact voltages
  may vary.  A received voltage of between about 2 and 25 volts may be
  deemed to be a 0 bit.  A modem will convert these digital pulses into
  an analog phase-amplitude modulated signal.  The exact details,
  including the possible addition of start, stop, and parity bits by the
  serial port and the possible stripping of these bits by the modem, are
  beyond the scope of this document.  See the Linux Serial-HOWTO (After
  Sept. 1998) for more details.

  Another question is how to send a byte to a device.  It's often done
  over a serial line which means a single wire over which one bit is
  sent at a time.  Which bit of a byte is to be sent first?  The ANSI
  standard for ASCII characters is that the low-order (least
  significant) bit is sent first.  The hardware for a serial port should
  automatically do this in converting from the parallel bus of the
  computer to serial.  Since the serial port doesn't know the difference
  between an ASCII character and some other kind of byte (such as part
  of a binary code), it sends the low-order bit first for all kinds of
  bytes.  Internal modems normally do the same.

  While the low-order bit of each byte is always sent first, there is
  still another question as to how to send an integer which consists of
  2 or 4 bytes.  Intel based machines send the low-order byte first.
  This is called little-endian order which means little-end-first.
  Motorola, SPARC, and Power PC based machines do the opposite and are
  big-endian.  This should have no effect on a bit-map sent as a
  sequence of bytes (and not as integers).  However, if integers are part
  of the headers for a font (fonts for dumb terminals don't use this)
  then they need to be compatible with the machines (including printers)
  they are being used on.
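
  The byte-order issue can be seen with Python's standard struct
  module.  The value 0x1234 below is an arbitrary stand-in for a
  two-byte integer field in some font header; it is not taken from any
  real font format.

```python
import struct

value = 0x1234                      # arbitrary 16-bit integer for illustration

little = struct.pack("<H", value)   # low-order byte first (little-endian)
big = struct.pack(">H", value)      # high-order byte first (big-endian)
print(little.hex(), big.hex())

# A bit-map sent as a plain sequence of bytes has no such ambiguity:
# the bytes go out in the order they are stored.
bitmap = bytes([0x00, 0x10, 0x28])
print(bitmap.hex())
```

  The same integer is written as 3412 in one order and 1234 in the
  other, while the byte sequence 001028 is the same on any machine.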

  7.  VT220 (SIXEL) SCANNING DETAILS

  The VT220 terminal font is generated by the type of scanning
  previously described that is neither by rows nor by columns.  It may be
  thought of as scanning by a toy car (six pixels wide) which runs from
  left to right (just like one would read a page).   It reads in the top
  strip (a wide "row" 6 pixels in height) by moving from left to right
  across the character matrix.  Then it moves back to the left edge of
  the matrix, jumps down 6 pixels, and rescans from left to right again
  like a human reading 6 lines in one sweep.  If the character matrix is
  18 pixels in height, then it scans the entire matrix in 3 sweeps from
  left to right.  The topmost pixel of each strip is the low-order bit
  of its sixel (so in the first sweep the pixel at the top of the matrix
  is the low-order bit).  If the bottom (last) strip is less
  than 6 pixels high, then each byte in this strip is zero-filled with
  high order zeros to make full-sized 6-bit bytes (sixels).  These
  sixels are then converted to printable 7-bit bytes (stored as 8-bit
  bytes) using the scheme previously described.  Since this encoding
  only uses 64 ASCII printable characters, many other characters are
  left over to punctuate the results.

  Here is an example of soft-font code for a Russian character 12 pixels
  high by 7 pixels wide: wACCcQw/NCA@??N;  The first 7 characters,
  wACCcQw, represent the 7 six-bit bytes from the first scan (the top
  strip) of the character.  The first byte is "w" (hex. 77) which after
  subtracting 3F results in a "sixel" of hex. 38 = 111000.  This
  vertical sixel has 3 on-pixels in the lower 3 positions (the low order
  pixels, 000, are at the top).  Punctuation marks are "/" which
  separates scans of wide rows, and ";" which marks the end of a
  character.  Before sending a stream of such soft-font code to a VT220
  terminal, one must send a complicated header code of many bytes.  This
  is described in install_softfont <install_softfont.txt>.
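
  One way to check such code by hand is to decode it back into a dot
  pattern.  The Python sketch below does this for the example string;
  decode_vt220_char is a name invented for this illustration, and the
  height must be supplied by the caller since it is not stored in the
  character code itself.

```python
def decode_vt220_char(code, height):
    """Turn VT220 soft-font code for one character into rows of '*' and '.'.

    code: e.g. 'wACCcQw/NCA@??N;' ('/' separates strips, ';' ends it).
    height: pixel height of the character matrix.
    """
    strips = code.rstrip(";").split("/")
    width = max(len(s) for s in strips)
    rows = [["."] * width for _ in range(height)]
    for strip_no, strip in enumerate(strips):
        for col, ch in enumerate(strip):
            sixel = ord(ch) - 0x3F               # undo the add-3F step
            if not 0 <= sixel <= 63:
                raise ValueError("not a sixel character: %r" % ch)
            for bit in range(6):                 # bit 0 is the top pixel
                row = strip_no * 6 + bit
                if row < height and sixel & (1 << bit):
                    rows[row][col] = "*"
    return ["".join(r) for r in rows]


for line in decode_vt220_char("wACCcQw/NCA@??N;", 12):
    print(line)
```

  Run on the example, the first column comes out with on-pixels in rows
  3 to 9, and the overall shape resembles the Cyrillic letter "short I"
  (an I with a breve above it).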

  8.  WYSE TERMINAL SCANNING DETAILS

  The Wyse method just scans one row at a time starting with the first
  row, going from left to right, just like reading a page.  The pixel at
  the left margin is the high-order bit.  If the width is less than 8
  pixels the zero-fill is done on the low-order bits.  Here is a sample
  soft-font encoding for a character:
  ESCcA134003E42424242828282FE820000^Y.

  ESC is the escape control character.  ^Y is control-Y.  The encoding
  for the character starts with the 00 just after A134 and ends with the
  final 00 before the ^Y.  Each 00 means a zero byte for that row (all
  pixels off).  In contrast to the VT220 encoding, each character has
  its own header.  Here the header is ESCcA134 which means the character
  is ASCII 34 (hex.) put into bank 1.  ESCc says that a character
  encoding is to follow.  The ^Y is only needed on early model Wyse
  terminals but seems to do no harm in other cases.
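
  The sample sequence can be taken apart with a short Python sketch.
  The layout assumed here (ESC c A, one bank digit, two hex digits of
  character code, then two hex digits per row, then an optional ^Y) is
  inferred from this single example only and may not cover every Wyse
  model.  The name parse_wyse_char is invented for this illustration.

```python
ESC = "\x1b"      # escape control character
CTRL_Y = "\x19"   # control-Y


def parse_wyse_char(seq):
    """Split a Wyse character definition into (bank, char_code, rows).

    Assumed layout, guessed from one example:
    ESC c A <bank digit> <2 hex digits> <2 hex digits per row> [^Y]
    """
    if not seq.startswith(ESC + "cA"):
        raise ValueError("not a character definition of the assumed form")
    body = seq[3:]
    if body.endswith(CTRL_Y):
        body = body[:-1]
    bank = int(body[0])
    char_code = int(body[1:3], 16)
    hex_rows = body[3:]
    if len(hex_rows) % 2:
        raise ValueError("odd number of hex digits in the bitmap")
    rows = [int(hex_rows[i:i + 2], 16) for i in range(0, len(hex_rows), 2)]
    return bank, char_code, rows


bank, code, rows = parse_wyse_char(
    ESC + "cA134003E42424242828282FE820000" + CTRL_Y)
print(bank, hex(code))
for r in rows:
    # Leftmost pixel is the high-order bit, so a binary dump is the picture.
    print(format(r, "08b").replace("0", ".").replace("1", "*"))
```

  For the sample this reports bank 1, character code hex 34, and 13 row
  bytes beginning 00, 3E, 42.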

  The BDF format (for X-Windows) is allegedly the same as above (Wyse)
  but the headers are different and more complex.

  9.  CHARACTER MATRIX SIZES

  This is also sometimes called the character cell size.  There are
  usually two sizes for the same character and it can be confusing if
  you don't understand it.  The larger size is the advertised size which
  includes rows and columns that are almost always left blank to provide
  for spacing between characters.  Since one doesn't normally (if ever)
  use them, they are not included in the cells that are encoded nor in
  the cells that are used in a "pattern file" for my BitFontEdit
  software.  Sometimes one must include a row or column in the pattern
  file (and soft-font code) which is required to contain no on-pixels.

  10.  EXAMPLE OF A 7x12 PATTERN for "DUMB" TERMINALS

  Using the BitFontEdit program one creates a "pattern file" using any
  editor or word processor.  Here is an example of what the encoding of
  an "A" would look like in such a pattern file.  If you have read the
  rest of this document, it will be easy to follow this.  In
  BitFontEdit, the pattern of *'s and spaces is stored in an array of
  character matrices called band[][][].

  Note: 7 is the width in pixels, 12 is the height in pixels.  It might
  be displayed within a larger 10x13 cell on a terminal.  But in this
  program the 7x12 region is also called a "cell".  The pattern of *'s
  inside this cell does not usually go all the way to the top or bottom
  of the cell but it is still called a 7x12 pattern (or cell).  Note
  that in matrix algebra notation it is a 12x7 matrix.

  A Character pattern is shown below (in 2 different formats).  Such a
  pattern is also known as a Character-matrix or a dot-matrix.  If you
  think of the *'s as 1's and the background as 0's it is also a bitmap.
  In BitFontEdit, one format uses dots for the background.  The other
  format uses spaces for background with vertical bars separating
  characters.  Several such matrices in a row (on a "page") form a
  "band" of several Character matrices.  BitFontEdit will automatically
  determine which format you have used.

  In BitFontEdit the fill_band() function scans a band of several such
  Characters and puts the pixels (including background pixels but
  excluding separators such as | ) into the band[][][] array of
  characters.  For a certain Char_no, Char_matrix [row] [col] = band
  [Char_no] [row] [col].  The task is to scan Char_matrix to derive the
  softfont code for that character.  Below 11,6 means row 11 col. 6,
  etc.  The index origin is 0 but index origin=1 is used in BitFontEdit
  error messages.
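
  The source of fill_band() is not reproduced in this document, so the
  following Python sketch is only a rough analogue of the idea, not the
  BitFontEdit code.  The name parse_band is invented, format detection
  is done simply by looking for a | on the first line, and for the dot
  format the sketch assumes that characters in a band are separated by
  whitespace.

```python
def parse_band(lines):
    """Parse a band of patterns into band[char_no][row][col] (0 or 1).

    bar format:  |  *  | * * |   spaces are background, '|' separates
    dot format:  ..*.. .*.*.     dots are background; whitespace between
                                 characters is an assumption of this sketch
    """
    bar_format = "|" in lines[0]
    band = []
    for row_no, line in enumerate(lines):
        if bar_format:
            # Drop the outer bars, then split on the inner ones.
            cells = line.strip().strip("|").split("|")
        else:
            cells = line.split()
        if row_no == 0:
            band = [[] for _ in cells]
        elif len(cells) != len(band):
            raise ValueError("row %d has a different number of characters"
                             % (row_no + 1))      # 1-origin in messages
        for char_no, cell in enumerate(cells):
            band[char_no].append([1 if ch == "*" else 0 for ch in cell])
    return band


bars = ["| * |* *|",
        "|* *|** |"]
dots = [".*. *.*",
        "*.* **."]
print(parse_band(bars) == parse_band(dots))
```

  Both layouts yield the same array, so band[1][1] is the second row of
  the second character in either case.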

  For VT220 terminals:

  The character pattern shown below is encoded as two sequences of 7
  ASCII bytes, with a slash / separating the two sequences.  The six-bit
  bytes (sixels) are read from "half-columns" with low order pixels at
  the top.  Note: Add 3F to each byte before outputting to the soft-font
  file.  For taller cells, a column may be split up into 3 sixels.  The
  bottom sixels have their high-order pixels padded with 0's (if
  needed).

  |       | (0,0).......(0,6)   FOR VT220:
  |   *   |      ...*...  The 1st byte has pixels (5,0)-(0,0) = 110000
  |  * *  |      ..*.*..  The 2nd byte has pixels (5,1)-(0,1) = 101000
  | *   * |      .*...*.  The 3rd byte has pixels (5,2)-(0,2) = 100100
  |*     *|      *.....*  ......
  |*******|      *******  The 7th byte has pixels (5,6)-(0,6) = 110000
  |*     *|      *.....*  Add a / for separation
  |*     *|      *.....*  The 8th byte has pixels (11,0)-(6,0) = 001111
  |*     *|      *.....*  The 9th byte has pixels (11,1)-(6,1) = 000000
  |*     *|      *.....*  ..........
  |       |      .......  The 14th byte has pixels (11,6)-(6,6) = 001111
  |       |(11,0).......(11,6)    Add ; to end the character definition
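
  The VT220 steps for this "A" can be sketched in Python as follows.
  The names PATTERN_A and encode_vt220_char are made up for this
  example; the pattern is the dot-format picture shown above.

```python
PATTERN_A = [
    ".......",
    "...*...",
    "..*.*..",
    ".*...*.",
    "*.....*",
    "*******",
    "*.....*",
    "*.....*",
    "*.....*",
    "*.....*",
    ".......",
    ".......",
]


def encode_vt220_char(rows):
    """Encode rows of '*' and '.' as VT220 sixel code for one character."""
    height = len(rows)
    width = len(rows[0])
    strips = []
    for top in range(0, height, 6):          # one strip per 6 pixel rows
        chars = []
        for col in range(width):             # left to right across the strip
            sixel = 0
            for bit in range(6):             # top pixel = low-order bit
                row = top + bit
                # Rows past the bottom stay 0: high-order zero padding.
                if row < height and rows[row][col] == "*":
                    sixel |= 1 << bit
            chars.append(chr(sixel + 0x3F))  # make it printable
        strips.append("".join(chars))
    return "/".join(strips) + ";"


print(encode_vt220_char(PATTERN_A))
```

  For this pattern the result is ogcacgo/N?????N; where the leading "o"
  is hex 6F, that is 110000 plus 3F.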

  For WYSE terminals:                     (0,0) is the high-order bit.
  The 1st byte has pixels (0,0)-(0,6) = 00000000 = 00 hex
  The 2nd byte has pixels (1,0)-(1,6) = 00010000 = 10 hex
  The 3rd byte has pixels (2,0)-(2,6) = 00101000 = 28 hex
  .........
  The 12th byte has pixels (11,0)-(11,6) = 00000000 = 00 hex

  Note: Represent each byte as two Hex. digits as shown above.
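
  The Wyse rows for the same "A" can be produced with a sketch like the
  one below.  It generates only the bitmap part (the hex digits), not
  the ESC header, and the names PATTERN_A and encode_wyse_rows are made
  up for this example.

```python
PATTERN_A = [
    ".......",
    "...*...",
    "..*.*..",
    ".*...*.",
    "*.....*",
    "*******",
    "*.....*",
    "*.....*",
    "*.....*",
    "*.....*",
    ".......",
    ".......",
]


def encode_wyse_rows(rows):
    """Encode each row as two hex digits.

    The leftmost pixel is the high-order bit and the unused low-order
    bits are zero-filled.
    """
    out = []
    for row in rows:
        if len(row) > 8:
            raise ValueError("this sketch handles widths up to 8 pixels only")
        value = 0
        for col, ch in enumerate(row):
            if ch == "*":
                value |= 0x80 >> col
        out.append("%02X" % value)
    return "".join(out)


print(encode_wyse_rows(PATTERN_A))
```

  This gives 0010284482FE828282820000, one pair of hex digits per row,
  beginning with the 00, 10 and 28 worked out above.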

