Newsgroups: comp.misc
Path: utzoo!utgpu!news-server.csri.toronto.edu!torsqnt!lethe!tvcent!comspec!scocan!larryp
From: larryp@sco.COM (Larry Philps)
Subject: Re: International Character Sets
Organization: SCO Canada, Inc.
Date: Thu, 02 May 1991 12:51:58 GMT
Message-ID: <1991May02.125158.20032@sco.COM>
References: <1991May1.131212.8983@cbnewsl.att.com>
Keywords: standards multibyte
Sender: news@sco.COM (News administration)

In <1991May1.131212.8983@cbnewsl.att.com> jssk@cbnewsl.att.com (jeffrey.s.skelton) writes:


> Could somebody please give me pointers to standards on international
> character sets?

Here are the ones I know of:

1) ASCII	- Nuf said

2) EBCDIC	- More than enough said.

3) IBM pc850	- A standard PC character set (code page 850).  Very similar
		  to ISO 8859/1 in repertoire, but with the characters at
		  different code positions.

4) HP Roman8	- HP's equivalent of the above.  Also very similar to
		  ISO 8859/1.

5) ISO 8859	- This is a set of nine 8-bit codesets that can handle most
		  alphabetic languages.  These are published final standards.

6) EUC		- This is the Extended Unix Codeset.  Characters can be
		  1, 2, 3 or 4 bytes in length, and can be intermixed.
		  This is actually reasonably popular, and is the base for
		  AT&T's MNLS product.  I have misplaced my reference, but
		  I think it is ISO Standard 10664.

7) SJIS		- JIS stands for Japanese Industrial Standard, and SJIS is called
		  Shift-JIS for some reason I have never figured out.
		  It uses 16-bit characters to encode Kanji, but also allows
		  single byte ASCII characters.

8) ISO 10646	- This is a proposed ISO standard for a 32-bit character
		  set.  In this character set, each "character" has a
		  prefix that specifies which "code set" the rest of the
		  character is an index into.  Clear?  For example one
		  prefix would indicate ISO 8859/1, then the rest of the
		  bits would be an index into that character set.

9) Unicode	- This is being developed by a consortium of companies
		  including IBM, Microsoft, Sun, and NeXT.  It is a 16-bit
		  character set that tries to handle all the characters
		  for many languages by mapping identical shapes to the
		  same position in Unicode, regardless of what the character's
		  name is in different languages.  In particular, the
		  Chinese, Korean and Japanese symbols have been distilled
		  down to about 18,000 unique characters (I think).  I
		  don't have a good reference for this one.
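
To make a couple of the items above more concrete, here is a rough Python
sketch.  It is only an illustration under stated assumptions, not anything
taken from the standards themselves.  The first part shows that the "very
similar" 8-bit codesets (ISO 8859/1, pc850, Roman8) put the same letter on
different byte values.  The second part walks a byte string in the Japanese
flavour of EUC and reports how many bytes each character occupies.  The
function names (encode_in, euc_jp_char_lengths) and the sample characters
are my own, and the lead-byte rules are the ones I believe apply to
Japanese EUC.  Other EUC flavours, including any four-byte forms, are not
handled, and bytes in the 0x80-0xA0 range other than the two single shifts
are simply rejected to keep the sketch short.

```python
# Illustrative sketch only; function names and sample characters are
# assumptions made for this example, not part of any standard.

def encode_in(codesets, ch):
    """Return {codeset name: encoded bytes} for one character."""
    return {name: ch.encode(name) for name in codesets}


def euc_jp_char_lengths(data):
    """Return the byte length of each character in an EUC-JP byte string.

    Lead-byte rules assumed here (Japanese flavour of EUC):
      0x00-0x7F  one byte (ASCII)
      0x8E       single shift 2: one more byte follows (2 total)
      0x8F       single shift 3: two more bytes follow (3 total)
      0xA1-0xFE  first byte of a two-byte character
    Any other lead byte is rejected for simplicity.
    """
    lengths = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:
            n = 1
        elif lead == 0x8E:
            n = 2
        elif lead == 0x8F:
            n = 3
        elif 0xA1 <= lead <= 0xFE:
            n = 2
        else:
            raise ValueError("unexpected lead byte 0x%02X at offset %d" % (lead, i))
        if i + n > len(data):
            raise ValueError("truncated character at offset %d" % i)
        lengths.append(n)
        i += n
    return lengths


# The same letter (e-acute, U+00E9) in three 8-bit codesets.
single_byte = encode_in(["latin_1", "cp850", "hp_roman8"], "\u00e9")

# ASCII 'a' followed by hiragana 'a' (U+3042), encoded as EUC-JP.
mixed = "a\u3042".encode("euc_jp")
mixed_lengths = euc_jp_char_lengths(mixed)

if __name__ == "__main__":
    for name, raw in single_byte.items():
        print(name, raw.hex())
    print(mixed.hex(), mixed_lengths)
```

The point of the second function is that a program cannot step through
EUC text one byte at a time; it has to look at each lead byte to know where
the next character starts.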

Have fun.  It's a brutal world out there.

---
Larry Philps,	 SCO Canada, Inc (Formerly: HCR Corporation)
Postman:  130 Bloor St. West, 10th floor, Toronto, Ontario.  M5S 1N5
InterNet: larryp@sco.COM  or larryp%scocan@uunet.uu.net
UUCP:	 {uunet,utcsri,sco}!scocan!larryp
Phone:	 (416) 922-1937
Fax:	 (416) 922-8397
