Newsgroups: comp.std.internat
Path: utzoo!utgpu!watserv1!watdragon!rose.waterloo.edu!ccplumb
From: ccplumb@rose.waterloo.edu (Colin Plumb)
Subject: Re: Unicode vs ISO DIS 10646 (was universality of Latin-1)
Message-ID: <1991May5.064852.12971@watdragon.waterloo.edu>
Sender: news@watdragon.waterloo.edu (News Owner)
Organization: University of Waterloo
References: <ENAG.91May3200814@maud.ifi.uio.no> <1991May4.180549.29162@voa3.VOA.GOV> <124144@unix.cis.pitt.edu>
Date: Sun, 5 May 1991 06:48:52 GMT
Lines: 28

I'm not a great linguist (English, French, and German), but I also like
separate accents because it's so much easier to accommodate weird uses.
Mathematicians put funny accents over and under every letter in creation.
Ever played with rho-hat?  Linguists and phoneticians may do the same.
And it's such a bother enumerating all the legal possibilities.
There's a CCITT standard which I can't seem to locate right now that
uses non-spacing accents, and it seems like the right thing to me.
Yes, e-acute is conceptually one thing in French, but qu and ph are
pretty distinct entities in English, and I can't say for sure how
different o-umlaut is from oe in German.  Mc and Mac have been
special-cased in many places in English (the correct all-caps spelling
of McDonald's is McDONALD'S), with superscript c's being common.

It's nearly impossible to come up with a character standard that
only lets you do sensible things.  All I can suggest is, don't do
the senseless ones.  Treat accented characters as double-byte characters
(recognizable by the first byte) if the accents are inseparable, but
don't if they can be logically separated.

The CCITT standard also specifies a subset of the possible combinations
that are required to be displayable, but the usual cheap implementation
is probably accents plus some sort of character-height information,
while a higher-rent scheme uses some dedicated pairs, with fallback to the
former.  Separate accents make the low-cost scheme much easier,
without seriously hampering the higher-cost one.  Good typesetting systems
already handle ligatures and kerning as is.
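
The two schemes fit together something like the sketch below.  The
tables (PAIRS, TALL), the glyph names, and the render function are all
invented for illustration; a real implementation would get them from
its font, and I'm only guessing at what the height information would
look like.

```python
# Made-up tables: dedicated glyphs for a few common pairs, plus a crude
# notion of which base letters are tall enough to need a raised accent.
PAIRS = {("acute", "e"): "eacute", ("umlaut", "o"): "odieresis"}
TALL = set("bdfhklt") | set("ABCDEFGHIJKLMNOPQRSTUVWXYZ")


def render(accent, base):
    """Return a list of drawing steps for one accented character."""
    glyph = PAIRS.get((accent, base))
    if glyph is not None:
        # Higher-rent path: a dedicated glyph exists for this pair.
        return [("glyph", glyph)]
    # Cheap fallback: draw the base, then overstrike the accent at a
    # height chosen from the character-height information.
    height = "tall" if base in TALL else "short"
    return [("glyph", base), ("accent", accent, height)]
```

render("acute", "e") takes the dedicated glyph, while something odd
like render("hat", "rho") falls through to the overstrike path, so
nobody has to enumerate it in advance.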
-- 
	-Colin
