'\"macro stdmacro
.if n .pH g1a.colltbl @(#)colltbl	40.7
.\" Copyright 1989 AT&T
.if \nX=0 .ds x} colltbl 1M "System Administration Utilities" "\&"
.if \nX=1 .ds x} colltbl 1M "C Programming Language Utilities"
.if \nX=2 .ds x} colltbl 1M "" "\&"
.if \nX=3 .ds x} colltbl "" "" "\&"
.ds e' \o"e\(ga"
.TH \*(x}
.SH NAME
\f4colltbl\f1 \- create collation database
.SH SYNOPSIS
\f4colltbl \f1[ \f2file | \f4- \f1]\f1
.SH DESCRIPTION
The \f4colltbl\f1 command takes as input a specification
file, \f2file\f1, that describes the collating 
sequence for a particular language
and creates a database that can be read by \f4strxfrm\f1(3C)
and \f4strcoll\f1(3C).  \f4strxfrm\f1(3C) transforms its first
argument and places the result in its second argument. The
transformed string is such that it can be correctly ordered
with other transformed strings by using \f4strcmp\f1(3C),
\f4strncmp\f1(3C) or \f4memcmp\f1(3C).  \f4strcoll\f1(3C)
transforms its arguments and does a comparison.
.P
If no input file is supplied, \f2stdin\f1 is read.  
.P
The output file produced contains the database with collating sequence
information in a form usable by system commands and routines.
The name of this output file is the value you assign to the keyword
\f4codeset\f1
read in from
.IR file .
Before this file can be used,
it must be installed in the
\f4/usr/lib/locale\f2/locale\f1
directory with the name
\f4LC_COLLATE\f1
by someone who is super-user or a member of group \f4bin\f1.
\f2locale\f1 corresponds to
the language area whose collation sequence is described in \f2file\f1.
This file must be readable by user,
group, and other; no other permissions should be set.
To use the collating sequence information in this file,
set the
\f4LC_COLLATE\f1
environment variable appropriately
(see
\f4environ\fP(5)
or
\f4setlocale\fP(3C)).
.P
The \f4colltbl\f1 command can support languages whose collating
sequence can be completely described by the following cases:
.IP \(bu 4
Ordering of single characters within the codeset.
For example, in Swedish, \f4V\f1 is sorted after \f4U\f1, before \f4X\f1 and
with \f4W\f1 (\f4V\f1 and \f4W\f1 are considered identical as far
as sorting is concerned).
.IP \(bu 4
Ordering of "double characters" in the collation sequence.
For example, in Spanish, \f4ch\f1 and \f4ll\f1 are collated after \f4c\f1
and \f4l\f1, respectively.
.IP \(bu 4
Ordering of a single character as if it consists of two characters.
For example, in German, the "sharp s", \f4\(*b\f1, is sorted as \f4ss\f1.
This is a special instance of the next case below.
.IP \(bu 4
Substitution of one character string with another character string.
In the example above, the string \f4\(*b\f1 is replaced with \f4ss\f1
during sorting.
.IP \(bu 4
Ignoring certain characters in the codeset during collation.
For example, if \f4\-\f1 were ignored during collation, then the
strings \f4re\-locate\f1 and \f4relocate\f1 would be equal.
.IP \(bu 4
Secondary ordering between characters.  In the case where two
characters are sorted together in the collation sequence, (i.e.,
they have the same "primary" ordering),
there is sometimes a secondary
ordering that is used if two strings are identical except for
characters that have the same primary ordering.
For example, in French, the letters \f4e\f1 and \f4\*(e'\f1 have the same
primary ordering but \f4e\f1 comes before \f4\*(e'\f1 in the secondary
ordering.
Thus the word \f4lever\f1 would be ordered before \f4l\*(e'ver\f1,
but \f4l\*(e'ver\f1 would be sorted before \f4levitate\f1.
(Note that if \f4e\f1 came before \f4\*(e'\f1 in the primary ordering,
then \f4l\*(e'ver\f1 would be sorted after \f4levitate\f1.)
.PP
The specification file consists of three types of statements:
.IP "1." 4
\f4codeset	\f2filename\f1
.IP
\f2filename\f1 is the name of the output file to be created by
\f4colltbl\f1.
.IP "2." 4
\f4order is	\f2order_list\f1
.IP
\f2order_list\f1 is a list of symbols, separated by semicolons,
that defines the collating sequence.  The special symbol, \f(CB...\f1,
specifies symbols that are lexically sequential in a short-hand
form.
For example,
.RS
\f4     order is	a;b;c;d;...;x;y;z\f1
.RE
.IP
would specify the list of lower_case letters. Of course, this could
be further compressed to just 
\f4a;...;z\f1.
.IP
A symbol can be up to two bytes in length and can be represented
in any one of the following ways:
.RS
.IP \(bu 4
the symbol itself (e.g., \f4a\f1 for the lower-case letter \f4a\f1),
.IP \(bu 4
in octal representation (e.g., \f4\\141\f1 or \f40141\f1 for the letter
\f4a\f1), or
.IP \(bu 4
in hexadecimal representation (e.g., \f4\\x61\f1 or \f40x61\f1 
for the letter \f4a\f1).
.RE
.IP
Any combination of these may be used as well.
.IP
The backslash character, \f4\\\f1 , is used for continuation.
No characters are permitted after the backslash character.
.IP
Symbols enclosed in parenthesis are assigned the same primary ordering
but different secondary ordering.  Symbols enclosed in curly 
brackets are assigned only the same primary ordering.
For example,
.IP
.RS
.nf
\f4	order is	a;b;c;ch;d;(e;\*(e');f;...;z;\\
		     {1;...;9};A;...;Z\fP
.fi
.RE
.IP
In the above example, \f4e\f1 and \f4\*(e'\f1 are assigned the
same primary ordering and different secondary ordering, digits 1 
through 9 are assigned the same primary ordering and no secondary ordering.
Only primary ordering is assigned to the remaining symbols.
Notice how double letters can be specified in the collating
sequence (letter \f4ch\f1 comes between \f4c\f1 and \f4d\f1).
.IP
If a character is not included in the \f4order is\f1 statement
it is excluded from the ordering and will be ignored during
sorting.
.IP "3." 4
\f4substitute \f2string\f1 \f4with \f2repl\f1
.IP
The \f4substitute\f1 statement substitutes
the string \f2string\f1 with the string \f2repl\f1.
This can be used, for example, 
to provide rules to sort the
abbreviated month names numerically:
.bp
.RS
.IP
.nf
\f4substitute "Jan" with "01"
substitute "Feb" with "02"
	.
	.
	.
substitute "Dec" with "12"
.ft 1
.PP
.fi
.RE
.IP
A simpler use of the \f4substitute\f1 statement that was
mentioned above was to substitute a single character
with two characters, as with the substitution
of \f4\(*b\f1 with \f4ss\f1 in German.
.PP
The \f4substitute\f1 statement is optional.
The \f4order is\f1 and \f4codeset\f1 statements must
appear in the specification file.
.PP
Any lines in the specification file with a \f4#\f1 in the
first column are treated as comments and are ignored.
Empty lines are also ignored.
.SH EXAMPLE
The following example shows the collation specification 
required to support a hypothetical telephone book sorting
sequence. 
.PP
The sorting sequence is defined by the following rules:
.IP "a."
Upper and lower case letters must be sorted together, but upper
case letters have precedence over lower case letters.
.IP "b."
All special characters and punctuation should be ignored.
.IP "c."
Digits must be sorted as their alphabetic counterparts (e.g.,
\f40\f1 as \f4zero\f1, \f41\f1 as \f4one\f1).
.IP "d."
The \f4Ch\f1, \f4ch\f1, \f4CH\f1 combinations must be collated between
\f4C\f1 and \f4D\f1.
.IP "e."
\f4V\f1 and \f4W\f1, \f4v\f1 and \f4w\f1 must be collated together.
.PP
The input specification file to \f4colltbl\f1 will contain:
.IP
.nf
\f4     codeset	telephone

     order is	A;a;B;b;C;c;CH;Ch;ch;D;d;E;e;F;f;\\
			G;g;H;h:I;i;J;j;K;k;L;l;M;m;N;n;O;o;P;p;\\
			Q;q;R;r;S;s;T;t;U;u;{V;W};{v;w};X;x;Y;y;Z;z

     substitute "0" with "zero"
     substitute "1" with "one"
     substitute "2" with "two"
     substitute "3" with "three"
     substitute "4" with "four"
     substitute "5" with "five"
     substitute "6" with "six"
     substitute "7" with "seven"
     substitute "8" with "eight"
     substitute "9" with "nine"
.fi
.ft 1
.SH FILES
.TP 16
\f4/lib/locale/\f2locale\f4/LC_COLLATE\f1
\f4LC_COLLATE\f1 database for \f2locale\f1
.TP
\f4/usr/lib/locale/C/colltbl_C\fP
input file used to construct
\f4LC_COLLATE\fP
in the default locale.
.SH "SEE ALSO"
\f4memory\fP(3C),
\f4setlocale\fP(3C),
\f4strcoll\fP(3C),
\f4string\fP(3C),
\f4strxfrm\fP(3C),
\f4environ\fP(5)
in the \f2Programmer's Reference Manual\f1.
.\"	@(#)colltbl.1
.Ee
