https://www.johndcook.com/blog/2022/09/25/katakana-hiragana-unicode/

John D. Cook

Katakana, Hiragana, and Unicode

Posted on 25 September 2022 by John

I figured out something that I wasn't able to find by searching, so
I'm posting it here in case other people have the same question and
the same difficulty finding an answer.

I'm sure other people have written about this, but I couldn't find
it. Maybe lots of people have written about this in Japanese but not
many in English.

Japanese kana consists of two syllabaries, hiragana and katakana,
that are like phonetic alphabets. Each has 46 basic characters, and
each corresponds to a block of 96 Unicode characters. I had two
simple questions:

 1. How do the 46 characters map into the 96 characters?
 2. Do they map the same way for both hiragana and katakana?

Hiragana / katakana correspondence

I'll start with the second question because it's easier. Hiragana and
katakana are different ways of representing the same sounds, and they
correspond one to one. For example, the full name of U+3047 (ぇ) is

HIRAGANA LETTER SMALL E

and the full name of its katakana counterpart U+30A7 (ェ) is

KATAKANA LETTER SMALL E

In the range that holds the letters, the only difference as far as
Unicode goes is that katakana has three code points whose hiragana
counterpart is unused (offsets 0x00, 0x57, and 0x58 into each block),
but these are not part of the basic letters.

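The following Python code shows that, for offsets 0x00 through 0x58
into each block, the names of the characters are the same except for
the name of the system. This range covers every kana letter.

    from unicodedata import name

    # Offsets into the block that are unassigned in hiragana
    unused = [0x00, 0x57, 0x58]

    # Offsets past 0x58 hold marks and symbols that don't all pair up by name
    for i in range(0x59):
        if i in unused:
            continue
        h = name(chr(0x3040 + i))
        k = name(chr(0x30a0 + i))
        assert h == k.replace("KATAKANA", "HIRAGANA")
    print("done")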

Mapping 46 into 50 and 96

You'll see kana written in a grid with one side labeled with 5 vowels
and the other labeled with 10 consonants, called a gojūon (五十音).
That's 50 cells, and in fact gojūon literally means 50 sounds, so how
do we get 46? Five cells are empty because they are unused or archaic,
and one extra letter doesn't fit the grid structure: 50 - 5 + 1 = 46.

In the image below, the table on the left is for hiragana and the
table on the right is for katakana. HTML versions of the tables
are available here.

[gojuon]

Left out of each table is ん in hiragana and ン in katakana.

So how does each set of 46 characters map into its Unicode code block?

Unicode numbers the letters in increasing order if you traverse the
grid increasing vowels first, then consonants, and adding the
straggler at the end. The code points are not all adjacent, though:
the reason 46 letters expand into more code points is that each letter
can have one, two, or three variations. There are also various
miscellaneous other symbols in the Unicode block.
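
The traversal order described above can be checked with a short
sketch. It assumes the Unicode character names follow the pattern
"HIRAGANA LETTER" plus consonant plus vowel (so し is SI and つ is
TU); the names basic_kana, VOWELS, CONSONANTS, and EMPTY are made up
for this illustration.

```python
from unicodedata import lookup

VOWELS = "AIUEO"
CONSONANTS = ["", "K", "S", "T", "N", "H", "M", "Y", "R", "W"]
# Cells left empty in the grid. WI and WE do have code points,
# but they are archaic and not among the 46 basic letters.
EMPTY = {"YI", "YE", "WI", "WU", "WE"}

def basic_kana(system="HIRAGANA"):
    """Return the 46 basic letters in grid order, with N at the end."""
    syllables = [c + v for c in CONSONANTS for v in VOWELS
                 if c + v not in EMPTY]
    syllables.append("N")  # the straggler that doesn't fit the grid
    return [lookup(f"{system} LETTER {s}") for s in syllables]

hiragana = basic_kana("HIRAGANA")
katakana = basic_kana("KATAKANA")
code_points = [ord(ch) for ch in hiragana]

# Number of basic letters
print(len(hiragana))
# True if grid order agrees with code point order
print(code_points == sorted(code_points))
```

Looking characters up by name rather than hard-coding code points
keeps the sketch independent of the tables in this post.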

For example, there is a LETTER E as well as the LETTER SMALL E
mentioned above. Other variations seem to correspond to voiced and
unvoiced versions of a consonant, with a phonetic marker added to the
voiced version. For example, く (ku) is U+304F, HIRAGANA LETTER KU,
and ぐ (gu) is U+3050, HIRAGANA LETTER GU.
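
One way to see the relationship between the voiced and unvoiced forms
is through Unicode decomposition. The following is a small sketch
using Python's unicodedata module; the variable names are only for
illustration.

```python
import unicodedata

ku = "\u304f"  # HIRAGANA LETTER KU
gu = "\u3050"  # HIRAGANA LETTER GU

# Canonical decomposition of GU, given as space-separated hex code points
gu_parts = unicodedata.decomposition(gu)
print(gu_parts)

# NFD normalization splits GU into KU followed by a combining mark
gu_nfd = unicodedata.normalize("NFD", gu)
print([unicodedata.name(ch) for ch in gu_nfd])
```

The second code point in the decomposition is the combining voiced
sound mark, which lives near the end of the hiragana block.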

Here is how hiragana maps into Unicode. The code point for each cell
is U+3000 plus the hex value shown, so the first cell, 42, is U+3042.

         a  i  u  e  o
        42 44 46 48 4A
     k  4B 4D 4F 51 53
     s  55 57 59 5B 5D
     t  5F 61 64 66 68
     n  6A 6B 6C 6D 6E
     h  6F 72 75 78 7B
     m  7E 7F 80 81 82
     y  84    86    88
     r  89 8A 8B 8C 8D
     w  8F          92

The corresponding table for katakana is the previous table plus 0x60:

         a  i  u  e  o
        A2 A4 A6 A8 AA
     k  AB AD AF B1 B3
     s  B5 B7 B9 BB BD
     t  BF C1 C4 C6 C8
     n  CA CB CC CD CE
     h  CF D2 D5 D8 DB
     m  DE DF E0 E1 E2
     y  E4    E6    E8
     r  E9 EA EB EC ED
     w  EF          F2

In each case, the letter missing from the table is the next
consecutive value after the last in the table, i.e. ん is U+3093 and
ン is U+30F3.
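
Since the two tables differ by a constant 0x60, converting hiragana to
katakana amounts to shifting code points. Here is a minimal sketch;
the function name hiragana_to_katakana is hypothetical, and limiting
the shift to U+3041 through U+3096 is an assumption made so that
marks, punctuation, and non-kana text pass through unchanged.

```python
def hiragana_to_katakana(text):
    """Shift hiragana letters U+3041..U+3096 up by 0x60.

    Everything outside that range is returned unchanged.
    """
    return "".join(
        chr(ord(ch) + 0x60) if 0x3041 <= ord(ch) <= 0x3096 else ch
        for ch in text
    )

# Convert the word "hiragana" written in hiragana
print(hiragana_to_katakana("ひらがな"))
```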

