https://www.dampfkraft.com/ghost-characters.html
A Spectre is Haunting Unicode
This post is part of a collection on Code and Computers.
In 1978 Japan's Ministry of International Trade and Industry (the
predecessor of today's Ministry of Economy, Trade and Industry)
established the encoding that would later be known as JIS X 0208,
which still serves as an important reference for all Japanese
encodings. However,
after the JIS standard was released, people noticed something strange
- several of the included characters had no obvious source, and nobody
could tell what they meant, how they should be pronounced, or where
they came from. These are what came to be known as the ghost
characters (幽霊文字, yūrei moji).
[Image] Be careful what you write. (Via the NDL.)
For a long time the ghost characters remained an unexplained and
mostly forgotten curiosity, but in 1997 an investigation was launched
to discover where they had come from. While every character in the JIS
standard was supposed to have a record of its source, even when such
a record existed it wasn't very specific, typically just naming the
document the character was taken from.
You'd think that listing the source would make tracking down the
origins of the characters easy, but it's important to clarify what
counts as a "source" - one of the more common sources for the ghost
characters was the "Overview of National Administrative Districts"
(国土行政区画総覧, Kokudo Gyōsei Kukaku Sōran), a comprehensive list of place names in Japan. You
might, as I initially did, imagine this to be a kind of atlas, an
oversize book with at most a few hundred pages. It turns out the
latest edition is a seven-volume set with each volume having roughly
nine hundred pages. Imagine tracking down a single character without
a page reference.
Despite the difficulty, the investigation into the ghost characters
was successful in discovering their origins - mostly. By interviewing
the catalogers involved in the creation of the standard, the
investigators established that some characters were inadvertently
invented as mistakes in the cataloging process. For example, 妛 was
an error introduced while trying to record 山 over 女, a single
character with 山 stacked on top of 女. That character occurs in the
name of a particular place and was thus suitable for inclusion in the
JIS standard, but because it couldn't yet be printed as one character,
山 and 女 were printed separately, cut out, pasted onto a sheet of
paper, and then copied. When reading the copy, the line where the two
little pieces of paper met looked like a stroke and was added to the
character by mistake. The original character (𡚴) was not added to JIS
or Unicode until much later and doesn't display on most sites for me.
[Image] The core ghost characters: 妛 挧 暃 椦 槞 蟐 袮 閠 駲 墸 壥 彁
In the end only one character had neither a clear source nor any
historical precedent: 彁. The most likely explanation is that it was
created as a misreading of 彊, but no specific incident was uncovered.
Following the general adoption of the JIS standards these characters
all made their way into Unicode, which has its own separate set of
ghost characters introduced during CJK unification.
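Because they ended up in both JIS X 0208 and Unicode, the ghosts really are ordinary, addressable characters today. As a small illustration (not from the original research), here is a minimal Python sketch that prints each one's Unicode code point and name and checks whether it round-trips through Python's built-in `euc_jp` codec, which is built on JIS X 0208. The twelve characters are the commonly cited core set; the helper names `describe` and `round_trips` are hypothetical, made up for this example.

```python
import unicodedata

# The twelve commonly cited core ghost characters from JIS X 0208.
GHOSTS = "妛挧暃椦槞蟐袮閠駲墸壥彁"


def describe(ch: str) -> str:
    """Return the code point and Unicode character name of a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


def round_trips(ch: str, codec: str = "euc_jp") -> bool:
    """True if ch survives encoding to and decoding from a JIS-based codec."""
    try:
        return ch.encode(codec).decode(codec) == ch
    except UnicodeError:
        # Raised when the codec has no mapping for the character.
        return False


if __name__ == "__main__":
    for ch in GHOSTS:
        print(ch, describe(ch), round_trips(ch))
```

Swapping the codec for `shift_jis`, which is also based on JIS X 0208, should give the same result for these twelve.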
To sum up - in 1978 a series of small mistakes created some
characters out of nothing. The errors went undiscovered just long
enough to be set in stone, and now these ghosts are, at least in
potential, a part of every computer on the planet, lurking in the
dark corners of character tables.
At this rate they'll presumably be with humanity forever.
References / related links:
* 幽霊文字 (Ghost Characters) - 通信用語の基礎知識 (Basic Knowledge of
Communications Terms) - the most thorough online source, with
citations from the 1997 investigation.
* 大正十二年の幽霊文字 (A Ghost Character of Taishō 12) - ことばマガジン
(Kotoba Magazine): 朝日新聞デジタル (Asahi Shimbun Digital) - an
example of 彁 mistakenly used in a digitized Taishō-era newspaper due
to a faded printing of 彊.
* Nico Nico Douga's Wiki treats each of them as the name of a
youkai.
* Tian Shu or A Book from the Sky, a hand-printed book by Xu Bing using
only made-up Chinese characters.
2018-07-29T14:03:09+09:00
Howdy! I'm Paul O'Leary McCann, and Dampfkraft is my
home on the Internet. My home off the Internet is near Tokyo Tower.
You can follow me on Twitter or mail me if you'd like to get in
touch. Did you know I'm writing a book about Japanese NLP?
Kopyleft, All Rites Reversed. Do as you like.