Tuesday 30 September 2003
Mojikyo fixes a bug (in me)
Reading the opening chapters of Piers Brendon’s The Dark Valley: A Panorama of the 1930s a couple of weeks ago prompted me to start work on an entry about George W. Bush’s aircraft carrier stunt and why it so greatly vexed me. But the writing hasn’t gone smoothly and, in dire need of distraction, I hit the jackpot: xiaolongnu, a Chinese language specialist who regularly comments at Languagehat (and occasionally here), had alerted Languagehat to the existence of the Mojikyō Institute, a Japanese organization that produces the Konjaku-Mojikyo, a dictionary of mainly Chinese characters, with a free font set of about 110,000 characters plus an input program.
The Konjaku-Mojikyo includes about 20,000 Chinese characters defined by Unicode (ISO 10646), and about 50,000 Chinese characters collected in Professor Morohashi’s 13-volume Daikanwajiten (Great Kanji Japanese Dictionary), “the most comprehensive and authoritative reference work on the subject of Chinese characters”. The Mojikyo contains a wealth of other characters including Oracle Bone inscriptions, Siddham (Sanskrit) characters, Japanese Kana, Chu Nom (the original characters used in medieval Vietnam), Shui Script (characters used by that Chinese ethnic minority), and Tangut (Xixia) Script.
The Mojikyō Character Map, to which xiaolongnu originally referred, is a freeware application developed from the profits of the commercial Konjaku-Mojikyo software published on CD-ROM by the Kinokuniya Bookstore (the commercial version allows more convenient searching and finding information about the characters).
Languagehat set the bait, confident that several of his readers would be interested “in all this great stuff”. As he later admitted, I was at the top of his list. Happily, I didn’t disappoint him. As I explained in my comment on his post, I could put the Mojikyo Character Map to immediate use:
Something that’s bugged me for ages is that Nagai Kafū’s Bokutō kidan (A
Strange Tale from East of the River) uses an obsolete kanji for the boku character. Amazon lists the book as “墨東綺譚” but the first character is a much simplified version of the original that appears
on the cover and title page of Kafū’s novel. Now it looks like I might be able
to find the correct boku character.
Late last night, needing a break from Bush’s aviation exploits, I convinced
myself that I should download the 34 files (totalling 52MB) needed for the installation.
We’ll call that the thin edge of the wedge. This morning I decided it wouldn’t
hurt to install the Mojikyo Character Map and quickly see
if I could find Kafū’s boku character.
Although extracting the 34 files was a little tedious, Jack Wiedrick’s instructions made the actual installation a snap. I use Extensis Suitcase to manage my Japanese fonts, so I simply activated the Mojikyo fonts with Suitcase and double-clicked on the Mojikyo Character Map application. I was in business:

These days, on the rare occasion that someone asks me why I continue to
study Japanese, I answer: “So I can read Nagai Kafū’s A Strange Tale from East of the
River in the original Japanese, rather than a translation.”
Kafū’s Strange Tale is, in the words of his English translator Edward Seidensticker, “in many ways scarcely a novel at all”. Its nominal subject is an aging writer (Oe Tadasu) who, while researching a novel he is writing, wanders “east of the (Sumida) river” from Asakusa to the lower-class Tamanoi district.
Trapped by a sudden storm, he meets a prostitute, Oyuki, who invites herself
under his umbrella and then him into her house. Oe embarks on an affair
with Oyuki, spending the hot summer evenings with her in Tamanoi; when the cold
weather returns he ends the affair.
The Strange Tale contains another story, the one Oe is struggling to write, about a retired teacher (Taneda Jumpei) who elopes with Osumi, a bar-girl who was once his maidservant. Part of the novella’s appeal lies in the skill with which Kafū plays one story off against the other; in Keiko I. McDonald’s words, “expand[ing] his ‘discourse time’ by telling two stories that interact and complement each other”.
I also admire Strange Tale because, as Seidensticker explains, “it belongs to the uniquely Japanese genre to which [Kafū’s] Quiet Rain also belongs, the leisurely, discursive ‘essay-novel’, its forebears the discursive essay and ‘poem story’ (utamonogatari) of the Heian Period, and the linked verse of the Muromachi Period and after”.
Put it down to my anal-retentive temperament, but it’s always irritated me that I couldn’t write Bokutō kidan correctly in Japanese because the Microsoft Japanese IME doesn’t support the first (boku) character. Worse still, I have three kanji dictionaries (Halpern’s New Japanese-English Character Dictionary, Spahn & Hadamitzky’s The Kanji Dictionary, and Haig & Nelson’s New Nelson Japanese-English Character Dictionary) and Kafū’s boku isn’t in any of them.
Why did Kafū use such an “obscure” character? Well, for one thing, such kanji were more commonly used in the first half of the 20th century, when Kafū was writing. His upbringing also played a part: his father and maternal grandfather were trained in the Chinese classics and Kafū himself entered the Chinese department of the School of Foreign Studies in 1897, though, as Seidensticker explains, “he scarcely went near the place and failed to graduate”. That particular character may have evoked a specific feeling or impression in his readers, or he may even have used it because in Kafū’s time the use of uncommon Chinese characters in one’s writing was a sign of erudition (an attitude that persists amongst some contemporary Japanese).
Not being able to represent the character correctly is not just a problem for me: as I explained before, Amazon in Japan uses a simplified version in its listing for the book.
[The primary meanings of the four characters in the Amazon title are, in order, “india ink”, “east”, “figured cloth; beautiful”, and “talk”. In his entry for the boku character that Amazon uses, Halpern includes a Chinese variant (mò), which looks like Kafū’s character minus the three-stroke radical on the left.]
But I found the correct character with the Mojikyo Character Map on my
second attempt. Although Halpern’s dictionary uses a different method (SKIP, based on geometrical
patterns), most kanji dictionaries require you to identify the radical (the primitive by which it is indexed), count the total
number of strokes in the character (or the number of strokes less those in the
radical), and finally locate the particular character within a list of characters
with that radical and stroke count. It sounds more difficult than it actually
is. Unless the dictionary doesn’t contain the character you’re looking for.
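For readers who think in code, the radical-and-stroke lookup described above can be sketched as a small index keyed on (radical number, total strokes). This is only an illustration, not how any of the dictionaries mentioned here are actually built: the function names and the four sample entries are my own assumptions, and a real dictionary would hold thousands of characters per radical.

```python
from collections import defaultdict

# Hypothetical, hand-made sample data: (character, radical number, total strokes).
# Radical 85 is "water", counted here in its three-stroke left-hand form;
# radical 75 is "tree".
SAMPLE_ENTRIES = [
    ("江", 85, 6),
    ("河", 85, 8),
    ("洛", 85, 9),
    ("東", 75, 8),
]


def build_index(entries):
    """Group characters by the pair (radical number, total stroke count)."""
    index = defaultdict(list)
    for char, radical, strokes in entries:
        index[(radical, strokes)].append(char)
    return index


def lookup(index, radical, total_strokes):
    """Return the candidates filed under this radical and stroke count.

    An empty list means the dictionary simply does not contain the character.
    """
    return index.get((radical, total_strokes), [])


INDEX = build_index(SAMPLE_ENTRIES)
print(lookup(INDEX, 85, 9))   # candidates with the water radical and 9 strokes
print(lookup(INDEX, 85, 18))  # nothing filed here in this tiny sample
```

The last step, scanning the returned candidates by eye, is the part that still has to be done by a human.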
Kafū’s boku character has the three-stroke radical sanzui, 氵 (#85), on the left and a total of 18 strokes. My first match (Mojikyo 050021; below, left) was close but, as I realized almost immediately, not quite correct: it contains only 17 strokes. Interestingly, this one is kind of “half-way” between the correct character and the simplified version that Amazon uses and that (not surprisingly) the Microsoft IME supports. I scanned through the grid of characters until I reached the 18-stroke section and there it was (Mojikyo 079131; below, right). Success!

But my elation rapidly turned to disappointment when I realized that I
couldn’t represent the correct character (Mojikyo 079131) via Unicode.
The Mojikyo Character Map provides a contextual menu that allows you to copy a character in a number of formats for pasting into other applications. Copying the Unicode tag for Mojikyo 079131 produces the Unicode (Decimal) tag &#28665;, which is also the Unicode tag for my first (incorrect) match, Mojikyo 050021. And &#28665; yields 濹.
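A decimal tag of this kind is an HTML numeric character reference, and it is easy to check which code point it names. The sketch below is a minimal illustration using only Python's standard library; it takes the number 28665, which is the value Brian quotes in the comments, and the helper names are my own. It shows nothing about which glyph a given font will draw for that code point, which is the real issue here.

```python
import html
import unicodedata


def decode_decimal_reference(tag):
    """Turn a numeric character reference such as '&#28665;' into a character."""
    return html.unescape(tag)


def describe(char):
    """Report the code point in hex and decimal, plus its Unicode name."""
    code_point = ord(char)
    name = unicodedata.name(char, "unnamed")
    return f"U+{code_point:04X} (decimal {code_point}): {name}"


boku = decode_decimal_reference("&#28665;")
print(describe(boku))
```

Because both Mojikyo 050021 and Mojikyo 079131 copy out as the same decimal tag, any check like this one will treat them as the same character.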
At first I thought that this might be because Kafū’s boku character (Mojikyo
079131) is included in Morohashi’s Daikanwajiten but is not part of the current Unicode standard. But Jack Wiedrick’s documentation
indicates that:
- Gold characters are included in the JIS standard;
- Cyan characters are included in the ISO10646 (=Unicode) specification; and
- White characters are not included in either standard.
And Kafū’s boku character (perhaps xiaolongnu can suggest an alternative name) is rendered in cyan, which means it is part of the Unicode standard. So perhaps, on my first use of the Mojikyo Character Map, I’ve discovered a bug. I’ve emailed the Mojikyo Institute and am waiting on their reply. But at least, in finding the character Kafū used, I’ve fixed what was bugging me.
Oops: Between my trackback tool and your xmlrpc server, that — (—) got lost ...
After reading this entry I tried to convert 濹 to Big5 and EUC-TW; both failed. Maybe iconv-2.2.5 is broken, or Unicode is the only viable option for representing older kanji.
As far as I know, the 17-stroke and 18-stroke versions differ only in shape; the inherent meaning should be the same, and such characters are given a single code point in Unicode. So there might still be a Unicode-mapped font with the 18-stroke version of boku.
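The conversion test described in this comment can be approximated without iconv. The following is a rough sketch that asks Python's built-in codecs whether a character is encodable; it is not a reproduction of the iconv run, the function name is hypothetical, and as far as I know Python's standard library ships no EUC-TW codec, so that encoding is left out of the list.

```python
def encodable_in(text, encodings):
    """Map each encoding name to True if `text` can be encoded in it."""
    results = {}
    for name in encodings:
        try:
            text.encode(name)
            results[name] = True
        except UnicodeEncodeError:
            # The character has no representation in this legacy encoding.
            results[name] = False
    return results


boku = chr(28665)  # the code point quoted as #28665 in this thread
for name, ok in encodable_in(boku, ["utf-8", "big5", "euc_jp", "shift_jis"]).items():
    print(f"{name}: {'ok' if ok else 'not representable'}")
```

A result of "not representable" in a legacy encoding would point to a gap in that character set rather than to a broken converter.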
Hi Jonathon,
The Unicode tag you list above (#28665) for boku displays the correct character on my system (Mac OS X).
After exploring the Japanese character map in OS X, I discovered that indeed the two variants of the character in question are identified by the same unicode number, but I'm able to select which of the two I'd like to use.
I posted a reduced screenshot of this selection process here:
http://userwww.sfsu.edu/~brianh/boku.gif
I hope I'll not derail this into a Mac vs. PC discussion as that certainly is not my intention. I would say that the Mojikyō Character Map isn't at fault here; my guess would be that your unicode font may only contain a certain subset of unicode characters.
Jonathon, the character in question is pronounced mo in Chinese -- I don't know the tone offhand, but I'll look it up in my Hanyu Dacidian (Chinese equivalent of the OED) when I get home tonight, though experience with this kind of thing suggests to me that it will turn out to be fourth tone like the character for "ink." Actually, the HD will also give me a history of usage, which I'm kind of curious about: looking at the character, it's surprising to me that it can be translated as "the river" in general rather than as the name of a specific river (like the character 洛, which means "the Luo River" as in the famous poem 洛神賦 "The Nymph of the Luo River"). That said, the two characters for "river" in modern Chinese, 江 and 河, originally meant the Yangtze and the Yellow River respectively, and only later were generalized to mean rivers in general.
I'm sorry I can't help you with the Unicode question -- I don't really understand the system myself (hmm, time to go back to read the post on languagehat's site) but Brian's explanation sounds right to me -- I've had problems before (using Twinbridge!) with characters that existed in the encoding system but not in the font I was using.
gaemon and brian, it appears that your suggestions -- that the two variants indeed share the same unicode number -- are borne out by the reply I received from the Mojikyo Institute. I say "appears" since the reply, which is quite technical, is written in Japanese and, rather than spend a couple of hours translating it, I think I'll enlist the help of a Japanese friend.
brian, you don't have to apologize for "derailing this into a Mac vs. PC discussion" -- I really appreciate your taking the time to make and post the screenshot. Regular readers will be aware that I've been hovering on the brink of buying a Macintosh for ages. Seeing that character map sent me scampering off to the Apple site (the Windows character map is pathetic by comparison).
xiaolongnu, since none of my Japanese kanji dictionaries list the "mo" character, I wasn't able to check whether one of its meanings can be "river" -- but the "bokutou" in Kafu's title clearly means "east of the river". I'll be interested to see what you discover about it.
Lh, you're correct in pointing out that the sanzui radical (#85) is a form of the four stroke character for water. Interestingly, Halpern's kanji dictionary lists it under the four stroke radicals but Spahn & Hadamitzky and Haig & Nelson (both are traditional radical-based dictionaries) each list it with the three stroke radicals, as do brian's Apple and my Windows character pads.
Fung-Lin, that's just great! Did you look at some of the other doll photo pages? I particularly liked the figurines of Edogawa Rampo (江戸川乱歩), the "father of the Japanese detective story":
http://inat.cool.ne.jp/rampo/english/
Edogawa's real name was Hirai Taro but he based his writing name on Edgar Allan Poe (if you say "Edogawa Rampo" quickly, you'll see why).
Kimiaki-san also does photographic oil prints in the Pictorialist-style, which are quite reasonably priced. Unfortunately, he doesn't have any Kafu images for sale.
xiaolongnu, you were right!
The KUN pronunciation for 墨 is "sumi" -- as in 墨絵 (sumi-e, ink painting). So, just as the character 洛 means "the Luo River", 濹 (and the variant Kafu used) must have meant "the Sumida River". "East of the River" means the Tamanoi district, on the eastern bank of the Sumida, opposite Asakusa.
Nowadays the Sumida River (sumidagawa) is written 隅田川 but Sumida Ward (sumida ku) where Tamanoi is situated is written 墨田区.
I was so hung up on the bokutō (ON) pronunciation that I didn't consider the KUN pronunciation: sumi-higashi.
Hey, that's great -- even a little surprising I guess -- I usually can't count on my Chinese instincts being of any use in Japanese. Exhibit A: 手紙, which means "letter" in Japanese and "toilet paper" in Chinese. Go figure.
I did try to look up the character in my Hanyu Dacidian, but it's not in there. This isn't all that strange since there's a point at which the line between "standard" and "nonstandard" characters is blurred and out beyond a certain level of rarity it's really hard to tell if a character is variant, obsolete, or just plain wrong. So I've got other dictionaries that document the different variant forms that are known for particular characters in particular historical periods, and even so I sometimes find "new" variants in my research -- legitimate variant forms of familiar characters that nonetheless haven't been observed/described yet. It's sort of like finding a new species of insect, but less significant.
I do wonder if the character we've been talking about here mightn't be a Japanese variant character -- that is, a character created in Japan by the combination of 墨 and the water radical, specifically for writing the name of the Sumidagawa. It's a pretty logical thing to do by analogy with characters like 洛, and that would explain its not appearing in the Hanyu Dacidian.
Another note: Languagehat, the phenomenon by which radicals are listed by the number of strokes of their "source" character (cf. Mathews) is actually somewhat political -- as part of the simplification process on the mainland, they started coming out with dictionaries where the radical is listed by the number of strokes it has itself, which is easier for the language learner, but totally anathema to the anti-simplification camp. Now, there are some good arguments against the simplification of Chinese characters, but ironically, the people who did the simplifying consulted just the kinds of books of historically attested variant characters that I've described above. Thus, many of the simplified characters so reviled by traditionalists are actually legitimate variants from the sixth and seventh centuries.
"It's sort of like finding a new species of insect, but less significant."
Personally, I find variant characters far more significant than new species of insects (unless those insects play some role in a human communication/symbol system).
"...they started coming out with dictionaries where the radical is listed by the number of strokes it has itself, which is easier for the language learner, but totally anathema to the anti-simplification camp."
And anything that's anathema to the anti-simplification camp is anathema to me, by hickory! Seriously, it took me about five minutes to learn that certain common radicals "have" more strokes than they appear to; I strongly disapprove of redoing entire reference systems to make life a little easier for learners. (On the other hand, I think complete romanization of Chinese would be a good idea, because it would make life a *lot* easier, and greatly increase the literacy rate in China.)
This is an absolutely fascinating discussion; I love this sort of arcana!
"I strongly disapprove of redoing entire reference systems to make life a little easier for learners."
LH -- On general principle I would have to agree with you, but there are exceptions. One of my dictionaries, published in 1953, is organized phonetically, according to the ping-zhe system which is the basis of the rhyme (and tone-pattern) schemes of Tang poetry. That is to say, the dictionary is organized according to the pronunciation and tone of the characters *in the eighth century.* Needless to say, what with sound change and all that, plenty of words that were homonyms back then are homonyms no longer. Sheesh.
I'd be interested to see if someone could come up with an effective scheme for romanizing Chinese. Systems I've seen in the past (hanyu pinyin etc.) tend to get hung up on the huge number of homonyms. And I wonder if people would write in different dialects? I just asked my officemate, who's from Sichuan, for the pronunciation of the character 誦 (song) and he said it should be "shong" (which is how it comes out in his accent, but which totally doesn't exist in standard pinyin). On top of this, people would have to learn to write in a more vernacular style than the partly classicizing usage that is currently considered proper for written Chinese. Interesting thought.
I have more thoughts but I'm worried that this is getting way way off topic. If this goes on much longer I am going to have to get my own blog.
Hear hear! (to your last remark)
You've nailed the main problem with romanization: different dialects. But the fact is it's very hard to write anything but Mandarin using the current system (and of course the government has no interest in making it easier; they want everybody to speak Mandarin). Probably the only way something as drastic as romanization would happen is in the context of such a major political upheaval that China broke up into regional units, and the "dialects" would finally be revealed for the separate languages they are.
And people having to write in a more vernacular style would be a Good Thing. It was only after Greece (for example) got away to some extent from the Attic-worshiping katharevousa that great poetry could be written. Trying to write like someone who lived centuries before your time is fatal for genuine literature.
Jonathon,
I am going to take you back to Rampo.
Edogawa Rampo was a popular writer when I was growing up in Tokyo.
To ignore A from Allan and make kawa out of Edgar and Ran (like ranbo) instead of Sanpo is anarchistic and fun.
We must thank Charles Baudelaire, who championed Poe long before Poe was appreciated in his own country; Baudelaire made Poe world famous.
Definitely Sino-Japanese is tremendous fun for people who like to do things the hard way (a group which includes me 95% of the time, but NOT for example when I'm looking up a word which is classified under an entirely arbitrary radical).
The Chinese simplification was really just a way of marking territory; if the complexity of the traditional system was counted as 100, the simplified system is at about 95. And many common characters were made terribly ugly.
I do approve of the Ch'in regularization, though. The three pre-Han manuscripts of Lao Tzu recently discovered included dozens of unique, never-seen-before characters, of which only about 90% could be easily explained. Rather few lead to new insights into the text, either. (Caveat: my number and percentage here are ballpark at best).
This discussion is now closed. My thanks to everyone who contributed.
© Copyright 2007 Jonathon Delacour