                   Making Fake Composite Fonts


1. Overview

A typical Type 1 font for the Latin alphabet contains `pure'
characters, such as `A' or `acute', and composite characters, such as
`Aacute', which are composed of the `A' and `acute' characters using
the Type 1 `seac' operator.

Unfortunately, most fonts do not contain all the characters in the ISO
Latin-2 character set.  In particular, most of the Polish characters
(with the exception of `oacute' and `Oacute') are usually missing (a
notable exception is IBM Courier -- not Adobe Courier -- which
contains all the characters you will ever need).  However, the
components of those characters are present, so adding the missing
characters should not be difficult given the right tools.  At least
four methods could be used:

i) Do not change the original font, but do overstriking for
individual characters (e.g. typeset an `a', then move backwards and
typeset an `ogonek' for `aogonek').  This method is widely used, for
instance by TeX.

ii) Parse the .pfa or .pfb file containing the font program for a
Type 1 font, and generate a new .pfa containing the composite
characters.  This should not be too difficult (.pfa parsers are freely
available), but would require that users have the .pfa files
corresponding to the fonts they use, which is often not the case.
Furthermore, I believe that it would violate the licence of the
fonts.  I am not quite sure what to make of the following:

    Adobe Systems' Type 1 font programs are licensed for use on one or
    more devices (depending on the terms of particular licenses).
    These licenses would permit the use of a licensed program in a
    system that translates a Type 1 font program to some other format
    in the process of rendering, as long as a copy of the program
    (even in translated form) is not produced.
                         Adobe Type 1 Font Format p. 7, Adobe Systems Inc.

iii) Download the original font, and use PostScript to add the
supplementary characters to its CharStrings dictionary.  This would
not violate the licence agreement, as the modified font would exist
only in the printer.  However, many PostScript interpreters do not
allow tampering with Type 1 font dictionaries.  Copy protection, as is
well known, only ever bothers honest users.

iv) Create a new Type 3 font dictionary which draws characters by
using the characters in the original font.  This has the benefits of
working and being legal.  I expected it to be quite inefficient, but
found it to be reasonably fast -- faster than, for instance, using
Multiple Master fonts.

The `composite.ps' file contained in this distribution, and the
accompanying perl program `composite', follow scheme (iv).  The
following sections describe them in more detail, including information
on extending the code to create other composite characters (the full
ISO-Latin-2 and ISO-Latin-3 ranges being the final goal).

PLEASE NOTE: a Polish typographer would be appalled to see that we
consider the `ogonek' as a diacritical mark, and thus harm the
integrity of the two letters that all Poles love.  Indeed, in a proper
Polish font, the tail of `aogonek' would have a different shape than
that of `eogonek'.  Considering, however, how few fonts contain the
needed characters, we do not currently have the luxury of whining
about such aesthetic problems.


2. AFM files

Information about the fonts comes from `Adobe Font Metrics' (AFM)
files.  AFMs contain much more information than is usually realized;
in particular, the encoding vector can be derived from the AFM, and
AFMs contain the composite character information.  Therefore, a
program could generate the needed composite font -- encoding vector
and everything included -- from the AFMs alone.

In order to simplify the handling of AFM information, we use three
different AFMs on every run of the program: (i) an AFM with the
encoding vector to use, (ii) the AFM of the original font, and (iii)
an AFM with supplementary composite character information.  The latter
will usually have to be supplied by the user: it is needed because the
AFMs provided with Type 1 fonts usually contain composite information
only about the characters already in the font (in fact, the Adobe
documentation does not make it quite clear whether it is legal to
insert composite information about characters not already in the font
into an AFM).


3. What it does

The generated composite font contains (i) the characters that were in
the original font, and (ii) the characters which were not in the
original font but for which composite information was provided.  AFM
files for the generated fonts are produced as well, so the fonts can
be used with most applications; they can also be reencoded.
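The rule above can be sketched in Python as follows.  This is a
hypothetical helper, not code from the distribution: `original' is
the set of glyph names in the original AFM, and `composites' maps each
supplementary composite name to its (component, dx, dy) list.  A
composite is only added if it is missing from the original font and
all its components are present.

```python
def generated_charset(original, composites):
    """Return (all glyph names of the generated font, names newly added).

    original:   set of glyph names present in the original font
    composites: dict mapping composite name -> [(component, dx, dy), ...]
    """
    added = {
        name
        for name, parts in composites.items()
        if name not in original                              # not already there
        and all(comp in original for comp, _, _ in parts)    # buildable from it
    }
    return original | added, added
```

With original = {"c", "z", "caron", "zcaron"}, a supplementary
`ccaron' entry is added, `zcaron' is kept from the original font, and
an `aogonek' entry is skipped because neither `a' nor `ogonek' is
present.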

In order to maintain compatibility with PostScript Level 1, every
composed font must be based on a base encoding vector, and characters
can only be composed from components in that vector.  Thus, in order
to build `Nacute' from `N' and `acute', both `N' and `acute' must be
in the base vector.
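Why the base vector matters can be seen from what a Level 1 Type 3
`BuildChar' procedure can do: Level 1 has no way of showing a glyph by
name (`glyphshow' only appeared in Level 2), so it can only `show'
strings, i.e. character codes, in the base font.  The following Python
sketch is hypothetical (it is not how `composite.ps' is actually
written): it looks up each component's code in a 256-entry base vector
and emits a PostScript fragment that draws a two-part composite,
failing if a component is not in the vector.  It assumes that the base
font, reencoded with the base vector and scaled to character-space
units, is the current font.

```python
def compose_proc(parts, base_vector):
    """Emit a PostScript fragment drawing each (name, dx, dy) component.

    Level 1 `show' takes character codes, not glyph names, so every
    component must have a code in the base encoding vector.
    """
    ops = []
    for name, dx, dy in parts:
        if name not in base_vector:
            raise KeyError(f"{name!r} is not in the base encoding vector")
        code = base_vector.index(name)
        # Move to the component's offset and show it as a one-byte hex string.
        ops.append(f"{dx} {dy} moveto <{code:02X}> show")
    return " ".join(ops)
```

With `N' at code 0x4E, `acute' at 0xC2 and made-up offsets (200, 230),
compose_proc([("N", 0, 0), ("acute", 200, 230)], vector) yields
"0 0 moveto <4E> show 200 230 moveto <C2> show".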

However, as the base vector is only ever used internally, much liberty
can be taken when designing it.  In particular, it doesn't need to be
compatible with any other vector, and there is no reason to avoid the
control character range.  The base encoding vector that I use is
called `FunkyEncoding'; it is based on Latin-2, but contains all the
characters of `StandardEncoding' (albeit in strange places).
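The layout of `FunkyEncoding' itself is not reproduced here.  As a
sketch of one way to build such a vector, assuming you start from a
table of Latin-2 codes and a list of names that must also be present:
put the preferred names at their usual codes, then drop every missing
name into the next free slot, control range included.  The function
name and its arguments are hypothetical.

```python
def build_base_vector(preferred, required):
    """Build a 256-entry base encoding vector.

    preferred: dict code -> glyph name (e.g. a Latin-2 table)
    required:  glyph names that must appear somewhere (e.g. StandardEncoding)
    """
    vector = [".notdef"] * 256
    for code, name in preferred.items():
        vector[code] = name
    present = set(vector)
    # Unused codes, lowest first; the control range 0-31 is fair game.
    free = (c for c in range(256) if vector[c] == ".notdef")
    for name in required:
        if name in present:
            continue
        try:
            slot = next(free)
        except StopIteration:
            raise ValueError(f"no free code left for {name!r}") from None
        vector[slot] = name
        present.add(name)
    return vector
```

Since the base vector is never seen outside the composite font, this
greedy placement is good enough; nothing depends on where the extra
names end up.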

There is a limitation on the composite information that can be used.
In particular, the only composite entries used are of the form:

    CC <name> 2 ; PCC <base> 0 0 ; PCC <accent> <dx> <dy> ;

In other words, there must be exactly two characters to compose, the
first character must be set at the origin, and the width of the
composite character is taken to be that of the first character.  This
limitation is easy to lift, and I will generalize the code if you send
me AFMs that do not obey this convention (I have never seen any, and,
indeed, this is the format required by the Type 1 `seac' operator).
Entries that do not obey it are (hopefully) handled gracefully: you
should see warnings about CC entries being ignored.
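A minimal Python sketch of this check (hypothetical; it is not the
perl code): given the parsed components of one CC entry and a table of
the original font's widths, it returns the composite's width, which is
the width of the first component, or prints a warning and returns None
when the entry does not have the supported shape.

```python
import sys

def composite_width(name, parts, widths):
    """Return the advance width of composite `name`, or None if its CC
    entry is not of the form 'PCC base 0 0 ; PCC accent dx dy'."""
    if len(parts) != 2 or tuple(parts[0][1:]) != (0, 0):
        print(f"warning: ignoring CC entry for {name}", file=sys.stderr)
        return None
    base = parts[0][0]
    # The composite takes the width of its base (first) character.
    return widths[base]
```

For instance, with a made-up width of 444 for `c', the `ccaron' entry
of section 5 gets width 444, while an entry whose first component is
not at the origin, or one with three components, is ignored.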

PLEASE NOTE: due to what I believe to be a bug in Ghostscript, some
older versions of gs need to be run with the -dNOPLATFONTS flag in
order to use my composite fonts.  Even with versions of Ghostscript
that do not have this bug, you may want to use -dNOPLATFONTS anyway,
as the platform fonts tend to differ quite a bit from the real
PostScript fonts.


4. Usage

The `composite' program can be run to generate either an encoding
vector or a composite font.  It is a perl script, and only reformats
the data; the real magic is in the file `composite.ps'.

In order to generate an encoding vector from a suitable AFM,
`composite' is run thus:

    % ./composite -e latin2.afm -E latin2.enc

which will generate the file `latin2.enc' from the AFM `latin2.afm'
(this is the default).  Any AFM file is suitable as input, but most
AFMs do not contain all the possible characters of an encoding vector
(the missing ones will be replaced by `.notdef').
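For illustration, deriving an encoding vector from an AFM amounts to
reading the `C' (character metrics) lines: each gives a code, and its
`N' field gives the glyph name.  Codes that are not mentioned stay
`.notdef', and unencoded characters (code -1) are skipped.  A minimal
Python sketch, which only handles decimal `C' lines (not the
hexadecimal `CH' form):

```python
def encoding_from_afm(lines):
    """Return a 256-entry encoding vector built from AFM 'C' lines."""
    vector = [".notdef"] * 256
    for line in lines:
        if not line.startswith("C "):   # skips CC, CH, Comment, etc.
            continue
        fields = {}
        for item in line.split(";"):
            toks = item.split()
            if toks:
                fields[toks[0]] = toks[1:]
        code = int(fields["C"][0])
        # Code -1 marks an unencoded character.
        if 0 <= code < 256 and "N" in fields:
            vector[code] = fields["N"][0]
    return vector
```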

In order to generate a composite font program, more input must be
provided:

    % ./composite -i Times-Roman.afm -c Times-Roman-Comp.afm \
        -o Times-Roman-Ogonki.ps -n Times-Roman-Ogonki \
        -a Times-Roman-Ogonki.afm \
        -e funky.afm -t adobe.afm

where -i specifies the AFM of the original font, -c supplementary
composite character information, -o the composite font program to
generate, -n the name of the new font, -a the name of the AFM file to
generate, -e the base encoding AFM to use, and -t the target encoding AFM.

The file `makecomp' contains a shell script (run by the installation
program) which generates composite fonts for the Times family from the
supplementary AFMs `Times-*-Comp.afm', as well as for Helvetica (only
the upright version).  The Times AFMs were created by hand, from
scratch, in one night; I am sure that you can improve on them.  They
only contain characters for Polish, Slovenian and Croatian.  The
Helvetica AFM was written by Primoz Peterlin, and contains many more
characters.
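The shell script itself is not reproduced here.  As a hypothetical
sketch of the same idea, the following Python builds the `composite'
command line shown earlier in this section for each font of a family.
It assumes the `-Comp.afm' and `-Ogonki' naming used in that example
and the `funky.afm' and `adobe.afm' encoding files; the list of Times
fonts is an assumption too.

```python
import shlex

def composite_command(font, suffix="Ogonki",
                      base_afm="funky.afm", target_afm="adobe.afm"):
    """Build the `composite' invocation for one font, following the
    naming of the example command (hypothetical helper)."""
    new = f"{font}-{suffix}"
    args = ["./composite",
            "-i", f"{font}.afm",          # original font metrics
            "-c", f"{font}-Comp.afm",     # supplementary composite info
            "-o", f"{new}.ps",            # composite font program
            "-n", new,                    # name of the new font
            "-a", f"{new}.afm",           # AFM of the new font
            "-e", base_afm,               # base encoding AFM
            "-t", target_afm]             # target encoding AFM
    return shlex.join(args)

if __name__ == "__main__":
    for font in ["Times-Roman", "Times-Bold", "Times-Italic", "Times-BoldItalic"]:
        print(composite_command(font))
```

shlex.join (Python 3.8 or later) quotes an argument only if it
contains characters special to the shell, so these commands come out
exactly as you would type them.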


5. How to add new accented characters

Assume that you want to add the `ccaron' and `Ccaron' characters to
the Times-Roman font.  Start from similar characters already present
in the base Times-Roman font -- for example, `zcaron' and `Zcaron'.  Take
the corresponding `composite character' (`CC') line from the
Times-Roman.afm file:

    CC zcaron 2 ; PCC z 0 0 ; PCC caron 55 0 ;

and insert it into the Times-Roman-Comp.afm file, changing the names
of the characters thus:

    CC ccaron 2 ; PCC c 0 0 ; PCC caron 55 0 ;

This line can be used as a starting point.  Run ./instogonki,
typeset some text with the new character, and fine-tune the last set
of coordinates (55 0).  Proceed in the same way for `Ccaron',
starting from the `Zcaron' line.
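Since only the names change, this edit can also be done mechanically.
A minimal Python sketch (a hypothetical helper, not part of the
distribution) that copies a CC line while renaming characters, keeping
the offsets as a starting point for fine-tuning.  It assumes, as in
the lines above, that every `;' is surrounded by spaces.

```python
def derive_cc(line, renames):
    """Copy an AFM CC line, replacing character names via `renames`.

    Renaming whole whitespace-separated tokens never touches a name that
    merely contains another one (e.g. the 'z' inside 'zcaron').
    """
    return " ".join(renames.get(tok, tok) for tok in line.split())
```

For example, derive_cc("CC zcaron 2 ; PCC z 0 0 ; PCC caron 55 0 ;",
{"zcaron": "ccaron", "z": "c"}) gives the `ccaron' line shown above;
the `Zcaron' line would be renamed with {"Zcaron": "Ccaron", "Z": "C"}.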

If you create new composite font AFMs, please send me a copy so that I
can include them in the distribution.

                                   J. Chroboczek <jec@dcs.ed.ac.uk>

