'\"macro stdmacro
.if n .pH g1a.chrtbl @(#)chrtbl	40.16 of 1/17/90
.\" Copyright 1989 AT&T
.nr X
.if \nX=0 .ds x} chrtbl 1M "System Administration Utilities" "\&"
.if \nX=1 .ds x} chrtbl 1M "System Administration Utilities"
.if \nX=2 .ds x} chrtbl 1M "" "\&"
.if \nX=3 .ds x} chrtbl "" "" "\&"
.TH \*(x}
.SH NAME
\f4chrtbl\f1 \- generate character classification and conversion tables
.SH SYNOPSIS
\f4chrtbl\f1
[\f2file\f1]
.SH DESCRIPTION
The 
\f4chrtbl\fP
command creates two
tables containing information on character classification, upper/lower-case
conversion, character-set width, and numeric formatting.
One table is an array of
(257\(**2) + 7
bytes that is encoded so a table lookup
can be used to determine the character classification of a character,
convert a character [see \f4ctype\fP(3C)], and find the byte
and screen width of a character in one of the supplementary code sets.
The other table contains information about the format of
non-monetary numeric quantities: the first
byte specifies the decimal delimiter; the second byte specifies
the thousands delimiter; and the remaining bytes comprise a null terminated
string indicating the grouping (each element of the string is taken
as an integer that indicates the number of digits that comprise the
current group in a formatted non-monetary numeric quantity).
.PP
\f4chrtbl\fP
reads the user-defined character
classification and conversion information from \f2file\fP
and creates three output files
in the current directory.
To construct
\f2file\f1,
use the file supplied in
\f4/usr/lib/locale/C/chrtbl_C\fP
as a starting point.
You may add entries, but do not change the original values supplied with the system.
For example, for other locales you may wish to add eight-bit entries to the
.SM ASCII
definitions provided in this file.
.P
One output file,
\f4ctype.c\f1
(a C-language source file),
contains a (257*2)+7-byte array generated from processing
the information from
.IR file .
You should review
the content of
\f4ctype.c\f1
to verify that the array is set
up as you had planned.
(In addition, an application program could use
\f4ctype.c\f1.)
The first 257 bytes of the array in
\f4ctype.c\f1
are used for character classification.
The characters used for initializing these bytes of
the array represent character classifications that are defined in
\f4/usr/include/ctype.h\f1;
for example, 
\f4_L\f1
means a character is lower case and
\f4_S\(bv\|_B\f1
means the character is both a spacing character and a blank.
The second 257 bytes of the array are used for character conversion.
These bytes of the array are initialized
so that characters for which you do not provide conversion information
will be converted to themselves.
When you do provide conversion information,
the first value of the pair is stored where the second one would be stored normally,
and vice versa;
for example, if you provide
\f4<0x41 0x61>\f1,
then
\f40x61\f1
is stored where
\f40x41\f1
would be stored normally, and
\f40x61\f1
is stored where
\f40x41\f1
would be stored normally.
The last 7 bytes are used for character width information
for up to three supplementary code sets. 
.PP
The second output file (a data file)
contains the same information, but is structured for
efficient use by the character classification
and conversion routines (see
\f4ctype\fP(3C)).
The name of this output file is the value you assign to the keyword
\f4LC_CTYPE\f1
read in from
.IR file .
Before this file can be used by the character classification
and conversion routines,
it must be installed in the
\f4/usr/lib/locale\f2/locale\f1
directory with the name
\f4LC_CTYPE\f1
by someone who is super-user
or a member of group
\f4bin\f1.
This file must be readable by user,
group, and other; no other permissions should be set.
To use the character classification\p
.bp
and conversion tables in this file,
set the
\f4LC_CTYPE\f1
environment variable appropriately
(see
\f4environ\fP(5)
or
\f4setlocale\fP(3C)).
.PP
The third output file (a data file)
is created only if numeric formatting information is specified
in the input file.
The name of this output file is the value you assign to the keyword
\f4LC_NUMERIC\f1
read in from
.IR file .
Before this file can be used,
it must be installed in the
\f4/usr/lib/locale/\f2locale\f1
directory with the name
\f4LC_NUMERIC\f1
by someone who is super-user
or a member of group
\f4bin\f1.
This file must be readable by user,
group, and other; no other permissions should be set.
To use the numeric formatting information in this file,
set the
\f4LC_NUMERIC\f1
environment variable appropriately
(see
\f4environ\fP(5)
or
\f4setlocale\fP(3C)).
.PP
The name of the locale where you install the files
\f4LC_CTYPE\fP
and
\f4LC_NUMERIC\fP
should correspond to the conventions defined in
.IR file .
For example,
if French conventions were defined,
and the name for the French locale on your system is
\f4french\fP,
then you should install the files in
\f4/usr/lib/locale/french\fP.
.PP
If no input file is given, or if the argument "\-"
is encountered,
\f4chrtbl\fP
reads from standard input.
.PP
The syntax of \f2file\fP allows the
user to define the names of the data files created by
\f4chrtbl\fP,
the assignment of characters to
character classifications, the
relationship between upper and lower-case letters,
byte and screen widths for up to three supplementary code sets,
and three items of numeric formatting information: the decimal delimiter,
the thousands delimiter and the grouping.
The keywords recognized by \f4chrtbl\fP are:
.P
.TP 17
\f4LC_CTYPE\^\f1
name of the data file created by \f4chrtbl\fP
to contain character classification, conversion, and width information
.TP 17
\f4isupper\^\f1
character codes to be classified as upper-case letters
.TP 17
\f4islower\^\f1
character codes to be classified as lower-case letters
.TP 17
\f4isdigit\^\f1
character codes to be classified as numeric
.TP 17
\f4isspace\^\f1
character codes to be classified as spacing (delimiter) characters
.TP 17
\f4ispunct\^\f1
character codes to be classified as punctuation characters
.TP 17
\f4iscntrl\^\f1
character codes to be classified as control characters
.TP 17
\f4isblank\^\f1
character code for the blank (space) character
.TP 17
\f4isxdigit\^\f1
character codes to be classified as hexadecimal digits
.TP 17
\f4ul\^\f1
relationship between upper- and lower-case characters
.TP 17
\f4cswidth\^\f1
byte and screen width information (by default, each is one character wide)
.TP 17
\f4LC_NUMERIC\^\f1
name of the data file created by \f4chrtbl\fP
to contain numeric formatting information
.TP 17
\f4decimal_point\^\f1
decimal delimiter
.TP 17
\f4thousands_sep\^\f1
thousands delimiter
.TP 17
\f4grouping\^\f1
string in which each element is taken
as an integer that indicates the number of digits that comprise the
current group in a formatted non-monetary numeric quantity.
.PP
Any lines with the number sign (\f4#\fP)
in the first column are treated as comments and are
ignored.
Blank lines are also ignored.
.PP
Characters for \f4isupper\fP,
\f4islower\fP, \f4isdigit\fP, \f4isspace\fP, \f4ispunct\fP,
\f4iscntrl\fP, \f4isblank\fP, \f4isxdigit\fP, and \f4ul\fP
can be represented as a hexadecimal
or octal constant (for example, the letter \f4a\fP can
be represented as \f40x61\fP in hexadecimal or \f40141\fP in octal).
Hexadecimal and octal constants may be
separated by one or more space and/or tab characters.
.PP
The dash character (\f4\-\fP) may be used to
indicate a range of consecutive numbers. 
Zero or more space characters may be used for
separating the dash character from the numbers.
.PP
The backslash character (\f4\e\fP) is used
for line continuation.
Only a carriage return is permitted
after the backslash character.
.PP
The relationship between upper- and lower-case
letters
\f1(\f4ul\f1)
is expressed as ordered pairs of octal or hexadecimal constants:
<\f2upper-case_character lower-case_character\fP>.
These two constants may be separated by
one or more space characters.
Zero or more space characters may be used 
for separating the angle brackets (<\0>) from 
the numbers.
.PP
The following is the format of an input specification for
\f4cswidth\f1:
.nf
\f2n1:s1,n2:s2,n3:s3\f1
where,
.RS 5
\f2n1	\f1byte width for supplementary code set 1, required
\f2s1	\f1screen width for supplementary code set 1
\f2n2	\f1byte width for supplementary code set 2
\f2s2	\f1screen width for supplementary code set 2
\f2n3	\f1byte width for supplementary code set 3
\f2s3	\f1screen width for supplementary code set 3
.RE
.fi
.PP
\f4decimal_point\f1 and \f4thousands_sep\f1 are specified by a single
character that gives the delimiter. 
\f4grouping\f1 is specified by
a quoted string in which each member may be in octal or hex
representation.  
For example, \f4\\3\f1 or \f4\\x3\f1 could be used
to set the value of a member of the string to 3.
.ne 8
.SH EXAMPLE
The following is an example of an input 
file used to create the \s-1USA-ENGLISH\s0
code set definition table in a file named
\f4usa\f1 and the non-monetary numeric formatting information
in a file name \f4num-usa\f1.
.RS
.PD 0
.ft CW
.cs CW 18
.nf
LC_CTYPE  usa
isupper   0x41 - 0x5a
islower   0x61 - 0x7a
isdigit   0x30 - 0x39
isspace   0x20 0x9 - 0xd
ispunct   0x21 - 0x2f	0x3a - 0x40	\\
          0x5b - 0x60	0x7b - 0x7e
iscntrl   0x0 - 0x1f	0x7f
isblank   0x20
isxdigit  0x30 - 0x39	0x61 - 0x66	\\
          0x41 - 0x46
ul       <0x41 0x61> <0x42 0x62> <0x43 0x63>  \e
         <0x44 0x64> <0x45 0x65> <0x46 0x66>  \e
         <0x47 0x67> <0x48 0x68> <0x49 0x69>  \e
         <0x4a 0x6a> <0x4b 0x6b> <0x4c 0x6c>  \e
         <0x4d 0x6d> <0x4e 0x6e> <0x4f 0x6f>  \e
         <0x50 0x70> <0x51 0x71> <0x52 0x72>  \e
         <0x53 0x73> <0x54 0x74> <0x55 0x75>  \e
         <0x56 0x76> <0x57 0x77> <0x58 0x78>  \e
         <0x59 0x79> <0x5a 0x7a>
cswidth		1:1,0:0,0:0
LC_NUMERIC	num_usa
decimal_point		.
thousands_sep		,
grouping			"\\3"
.cs CW
.ft1
.fi
.RE
.PD
.SH FILES
.PD 0
.TP 16
\f4/usr/lib/locale/\f2locale\fP/LC_CTYPE\f1
data files containing character classification, conversion,
and character-set width information created by
\f4chrtbl\fP
.TP
\f4/usr/lib/locale/\f2locale\fP/LC_NUMERIC\f1
data files containing numeric formatting information created by
\f4chrtbl\fP
.TP
\f4/usr/include/ctype.h\f1
header file containing information used by character classification and conversion routines
.TP
\f4/usr/lib/locale/C/chrtbl_C\fP
input file used to construct
\f4LC_CTYPE\fP
and
\f4LC_NUMERIC\fP
in the default locale.
.PD
.SH "SEE ALSO"
\f4environ\fP(5).
.br
\f4ctype\fP(3C), \f4setlocale\fP(3C)
in the
.IR "Programmer's Reference Manual" .
.SH DIAGNOSTICS
The error messages produced by \f4chrtbl\fP
are intended to be self-explanatory. 
They indicate errors in the command line
or syntactic errors encountered
within the input file.
.SH "WARNING"
Changing the files in
\f4/usr/lib/locale/C\fP
will cause the system to behave unpredictably.
.Ee


