Tuesday 30 September 2003

Mojikyo fixes a bug (in me)

Reading the opening chapters of Piers Brendon’s The Dark Valley: A Panorama of the 1930s a couple of weeks ago prompted me to start work on an entry about George W. Bush’s aircraft carrier stunt and why it so greatly vexed me. But the writing hasn’t gone smoothly and, in dire need of distraction, I hit the jackpot: xiaolongnu, a Chinese language specialist who regularly comments at Languagehat (and occasionally here) had alerted Languagehat to the existence of the Mojikyō Institute, a Japanese organization that produces the Konjaku-Mojikyo, a dictionary of mainly Chinese Characters, with a free font set of about 110,000 characters plus an input program.

The Konjaku-Mojikyo includes about 20,000 Chinese characters defined by Unicode (ISO 10646), and about 50,000 Chinese characters collected in Professor Morohashi’s 13-volume Daikanwajiten (Great Kanji Japanese Dictionary), “the most comprehensive and authoritative reference work on the subject of Chinese characters”. The Mojikyo contains a wealth of other characters including Oracle Bone inscriptions, Siddham (Sanskrit) characters, Japanese Kana, Chu Nom (the original characters used in medieval Vietnam), Shui Script (characters used by the Shui, a Chinese ethnic minority), and Tangut (Xixia) Script.

The Mojikyō Character Map, to which xiaolongnu originally referred, is a freeware application developed from the profits of the commercial Konjaku-Mojikyo software published on CD-ROM by the Kinokuniya Bookstore (the commercial version makes it easier to search for characters and to find information about them).

Languagehat set the bait, confident that several of his readers would be interested “in all this great stuff”. As he later admitted, I was at the top of his list. Happily, I didn’t disappoint him—as I explained in my comment on his post, I could put the Mojikyo Character Map to immediate use:

Something that’s bugged me for ages is that Nagai Kafū’s Bokutō kidan (A Strange Tale from East of the River) uses an obsolete kanji for the boku character. Amazon lists the book as “墨東綺譚” but the first character is a much simplified version of the original that appears on the cover and title page of Kafū’s novel. Now it looks like I might be able to find the correct boku character.

Late last night, needing a break from Bush’s aviation exploits, I convinced myself that I should download the 34 files (totalling 52MB) needed for the installation. We’ll call that the thin edge of the wedge. This morning I decided it wouldn’t hurt to install the Mojikyo Character Map and quickly see if I could find Kafū’s boku character.

Although extracting the 34 files was a little tedious, Jack Wiedrick’s instructions made the actual installation a snap. I use Extensis Suitcase to manage my Japanese fonts so I simply activated the Mojikyo fonts with Suitcase and double-clicked on the Mojikyo Character Map application. I was in business:

Mojikyo Character Map application

These days, on the rare occasion that someone asks me why I continue to study Japanese, I answer: “So I can read Nagai Kafū’s A Strange Tale from East of the River in the original Japanese, rather than a translation.”

Cover of Nagai Kafū's Bokuto Kidan (A Strange Tale from East of the River)

Kafū’s Strange Tale is, in the words of his English translator Edward Seidensticker, “in many ways scarcely a novel at all”. Its nominal subject is an aging writer (Oe Tadasu) who, while researching a novel he is writing, wanders “east of the (Sumida) river” from Asakusa to the lower-class Tamanoi district.

Trapped by a sudden storm, he meets a prostitute, Oyuki, who invites herself under his umbrella and then him into her house. Oe embarks on an affair with Oyuki, spending the hot summer evenings with her in Tamanoi; when the cold weather returns he ends the affair.

The Strange Tale contains another story—the one Oe is struggling to write—about a retired teacher (Taneda Jumpei) who elopes with Osumi, a bar-girl who was once his maidservant. Part of the novella’s appeal lies in the skill with which Kafū plays one story off against the other—in Keiko I. McDonald’s words, “expand[ing] his ‘discourse time’ by telling two stories that interact and complement each other”.

I also admire Strange Tale because, as Seidensticker explains, “it belongs to the uniquely Japanese genre to which [Kafū’s] Quiet Rain also belongs, the leisurely, discursive ‘essay-novel’, its forebears the discursive essay and ‘poem story’ (utamonogatari) of the Heian Period, and the linked verse of the Muromachi Period and after”.

Japanese characters for Bokuto kidan (correct kanji for boku)

Put it down to my anal-retentive temperament, but it’s always irritated me that I couldn’t write Bokutō kidan correctly in Japanese because the Microsoft Japanese IME doesn’t support the first (boku) character. Worse still, I have three kanji dictionaries (Halpern’s New Japanese-English Character Dictionary, Spahn & Hadamitzky’s The Kanji Dictionary, and Haig & Nelson’s New Nelson Japanese-English Character Dictionary) and Kafū’s boku isn’t in any of them.

Why did Kafū use such an “obscure” character? Well, for one thing, such kanji were more commonly used in the first half of the 20th century, when Kafū was writing. His upbringing also played a part: his father and maternal grandfather were trained in the Chinese classics, and Kafū himself entered the Chinese department of the School of Foreign Studies in 1897 though, as Seidensticker explains, “he scarcely went near the place and failed to graduate”. That particular character may have evoked a specific feeling or impression in his readers, or he may even have used it because in Kafū’s time the use of uncommon Chinese characters in one’s writing was a sign of erudition (an attitude that persists amongst some contemporary Japanese).

Japanese characters for Bokuto kidan (simplified kanji for boku)

Not being able to represent the character correctly is not just a problem for me: as I explained before, Amazon in Japan uses a simplified version in its listing for the book.

[The primary meanings of the four characters in the Amazon title are, in order, “india ink”, “east”, “figured cloth; beautiful”, and “talk”. In his entry for the boku character that Amazon uses, Halpern includes a Chinese variant (mò), which looks like Kafū’s character minus the three-stroke radical on the left.]

But I found the correct character with the Mojikyo Character Map on my second attempt. Although Halpern’s dictionary uses a different method (SKIP, based on geometrical patterns), most kanji dictionaries require you to identify the radical (the primitive by which it is indexed), count the total number of strokes in the character (or the number of strokes less those in the radical), and finally locate the particular character within a list of characters with that radical and stroke count. It sounds more difficult than it actually is. Unless the dictionary doesn’t contain the character you’re looking for.

Kafū’s boku character has the three-stroke radical sanzui (#85) on the left—and a total of 18 strokes. My first match (Mojikyo 050021; below, left) was close but, as I realized almost immediately, not quite correct. And it only contains 17 strokes. Interestingly, this one is kind of “half-way” between the correct character and the simplified version that Amazon uses and that (not surprisingly) the Microsoft IME supports. I scanned through the grid of characters until I reached the 18-stroke section and there it was (Mojikyo 079131; below, right). Success!

Mojikyo characters 050021 (left) and 079131 (right)
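The radical-and-stroke-count search described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Entry` type, the `build_index` and `look_up` helpers, and the two-entry "dictionary" are hypothetical, and the only data are the Mojikyo numbers, radical number, and stroke counts mentioned in this post (I am assuming that Mojikyo 050021 is also indexed under radical #85). It says nothing about how the Mojikyo Character Map itself is implemented.

```python
from collections import defaultdict
from typing import NamedTuple


class Entry(NamedTuple):
    mojikyo_number: str  # catalogue number as quoted in the post
    radical: int         # radical number used by the dictionary index
    total_strokes: int   # stroke count including the radical


# Hypothetical two-entry "dictionary": just the two characters discussed here.
ENTRIES = [
    Entry("050021", radical=85, total_strokes=17),  # the near miss
    Entry("079131", radical=85, total_strokes=18),  # Kafu's character
]


def build_index(entries):
    """Group entries by (radical, total strokes), as a printed index does."""
    index = defaultdict(list)
    for entry in entries:
        index[(entry.radical, entry.total_strokes)].append(entry)
    return index


def look_up(index, radical, total_strokes):
    """Return the Mojikyo numbers filed under this radical and stroke count."""
    return [e.mojikyo_number for e in index.get((radical, total_strokes), [])]


INDEX = build_index(ENTRIES)
print(look_up(INDEX, 85, 17))  # candidates in the 17-stroke section
print(look_up(INDEX, 85, 18))  # candidates in the 18-stroke section
```

A real dictionary would return a long list for each key, which is why the last step is still scanning a grid of candidates by eye. And if the character was never indexed, the lookup simply comes back empty.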

But my elation rapidly turned to disappointment when I realized that I couldn’t represent the correct character (Mojikyo 079131) via Unicode.

Mojikyo contextual Copy menu

The Mojikyo Character Map provides a contextual menu that allows you to copy a character in a number of formats for pasting into other applications. Copying the Unicode tag for Mojikyo 079131 produces the Unicode (Decimal) tag &#28665;, which is also the Unicode tag for my first (incorrect) match, Mojikyo 050021. And &#28665; yields 濹.
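The decimal tag is simply the character's Unicode code point written in base 10. A minimal Python sketch of that relationship follows; it assumes the decimal value 28665 that the Character Map reported for both Mojikyo entries, and the variable names are mine, chosen only for illustration.

```python
import html

# Decimal value the Character Map reported for both Mojikyo 050021 and 079131.
DECIMAL_TAG = 28665

char = chr(DECIMAL_TAG)              # the character at that code point
code_point = f"U+{DECIMAL_TAG:04X}"  # the same number in the usual hex notation
entity = f"&#{DECIMAL_TAG};"         # HTML numeric character reference
decoded = html.unescape(entity)      # what a browser turns the entity into

print(char, code_point, entity, decoded == char)
```

The number identifies a code point, not a drawing of it. Which of the two shapes appears on screen is decided by the font that renders the character, not by the tag.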

At first I thought that this might be because Kafū’s boku character (Mojikyo 079131) is included in Morohashi’s Daikanwajiten but is not part of the current Unicode standard. But Jack Wiedrick’s documentation indicates that:

  • Gold characters are included in the JIS standard;
  • Cyan characters are included in the ISO10646 (=Unicode) specification; and
  • White characters are not included in either standard.

And Kafū’s boku character—perhaps xiaolongnu can suggest an alternative name—is rendered in cyan, which means it is part of the Unicode standard. So perhaps, on my first use of the Mojikyo Character Map, I’ve discovered a bug. I’ve emailed the Mojikyo Institute and am waiting on their reply. But at least, in finding the character Kafū used, I’ve fixed what was bugging me.



Oops: Between my trackback tool and your xmlrpc server, that — (—) got lost ...

Posted by blogal villager on 30 September 2003

after reading this entry I tried to convert 濹 to Big5 and EUC-TW; both failed. maybe iconv-2.2.5 is broken, or unicode is the only viable option to represent older kanji chars.

as far as I know, 17-stroke version and 18-stroke version differ only in the shape; inherent meaning should be the same, and such chars are given single codepoint in unicode. so there still might be a unicode mapped font with 18-stroke version of boku.

Posted by gaemon on 1 October 2003

Hi Jonathon,
The Unicode tag you list above (#28665) for boku displays the correct character on my system (Mac OS X).

After exploring the Japanese character map in OS X, I discovered that indeed the two variants of the character in question are identified by the same unicode number, but I'm able to select which of the two I'd like to use.

I posted a reduced screenshot of this selection process here:

I hope I'll not derail this into a Mac vs. PC discussion as that certainly is not my intention. I would say that the Mojikyō Character Map isn't at fault here; my guess would be that your unicode font may only contain a certain subset of unicode characters.

Posted by Brian on 1 October 2003

Jonathon, the character in question is pronounced mo in Chinese -- I don't know the tone offhand, but I'll look it up in my Hanyu Dacidian (Chinese equivalent of the OED) when I get home tonight, though experience with this kind of thing suggests to me that it will turn out to be fourth tone like the character for "ink." Actually, the HD will also give me a history of usage, which I'm kind of curious about: looking at the character, it's surprising to me that it can be translated as "the river" in general rather than as the name of a specific river (like the character 洛, which means "the Luo River" as in the famous poem 洛神賦 "The Nymph of the Luo River"). That said, the two characters for "river" in modern Chinese, 江 and 河, originally meant the Yangtze and the Yellow River respectively, and only later were generalized to mean rivers in general.

I'm sorry I can't help you with the Unicode question -- I don't really understand the system myself (hmm, time to go back to read the post on languagehat's site) but Brian's explanation sounds right to me -- I've had problems before (using Twinbridge!) with characters that existed in the encoding system but not in the font I was using.

Posted by xiaolongnu on 1 October 2003

"Kafū’s boku character has the three-stroke radical sanzui—氵 (#85) on the left"

For those who, like me, use a traditional radical-based Chinese dictionary (Mathews in my case), I hasten to point out that radical #85 is listed under the four-stroke radicals, being a scribbly form of the character for 'water':

The character is not actually in Mathews, needless to say.

Posted by language hat on 1 October 2003

gaemon and brian, it appears that your suggestions -- that the two variants indeed share the same unicode number -- are borne out by the reply I received from the Mojikyo Institute. I say "appears" since the reply, which is quite technical, is written in Japanese and, rather than spend a couple of hours translating it, I think I'll enlist the help of a Japanese friend.

brian, you don't have to apologize for "derailing this into a Mac vs. PC discussion" -- I really appreciate your taking the time to make and post the screenshot. Regular readers will be aware that I've been hovering on the brink of buying a Macintosh for ages. Seeing that character map sent me scampering off to the Apple site (the Windows character map is pathetic by comparison).

xiaolongnu, since none of my Japanese kanji dictionaries list the "mo" character, I wasn't able to check whether one of its meanings can be "river" -- but the "bokutou" in Kafu's title clearly means "east of the river". I'll be interested to see what you discover about it.

Lh, you're correct in pointing out that the sanzui radical (#85) is a form of the four-stroke character for water. Interestingly, Halpern's kanji dictionary lists it under the four-stroke radicals but Spahn & Hadamitzky and Haig & Nelson (both are traditional radical-based dictionaries) each list it with the three-stroke radicals, as do brian's Apple and my Windows character pads.

Posted by Jonathon on 1 October 2003

Here is a strange Kafu-doll photo page I found,
This must be kimi to kidan or double life
of Veronique Kafu.

Posted by Fung-Lin Hall on 1 October 2003

Fung-Lin, that's just great! Did you look at some of the other doll photo pages? I particularly liked the figurines of Edogawa Rampo (江戸川乱歩), the "father of the Japanese detective story":


Edogawa's real name was Hirai Taro but he based his writing name on Edgar Allan Poe (if you say "Edogawa Rampo" quickly, you'll see why).

Kimiaki-san also does photographic oil prints in the Pictorialist-style, which are quite reasonably priced. Unfortunately, he doesn't have any Kafu images for sale.

Posted by Jonathon on 1 October 2003

xiaolongnu, you were right!

The KUN pronunciation for 墨 is "sumi" -- as in 墨絵 (sumi-e, ink painting). So, just as the character 洛 means "the Luo River", 濹 (and the variant Kafu used) must have meant "the Sumida River". "East of the River" means the Tamanoi district, on the eastern bank of the Sumida, opposite Asakusa.

Nowadays the Sumida River (sumidagawa) is written 隅田川 but Sumida Ward (sumida ku) where Tamanoi is situated is written 墨田区.

I was so hung up on the bokutō (ON) pronunciation that I didn't consider the KUN pronunciation: sumi-higashi.

Posted by Jonathon on 2 October 2003

Hey, that's great -- even a little surprising I guess -- I usually can't count on my Chinese instincts being of any use in Japanese. Exhibit A: 手紙, which means "letter" in Japanese and "toilet paper" in Chinese. Go figure.

I did try to look up the character in my Hanyu Dacidian, but it's not in there. This isn't all that strange since there's a point at which the line between "standard" and "nonstandard" characters is blurred and out beyond a certain level of rarity it's really hard to tell if a character is variant, obsolete, or just plain wrong. So I've got other dictionaries that document the different variant forms that are known for particular characters in particular historical periods, and even so I sometimes find "new" variants in my research -- legitimate variant forms of familiar characters that nonetheless haven't been observed/described yet. It's sort of like finding a new species of insect, but less significant.

I do wonder if the character we've been talking about here mightn't be a Japanese variant character -- that is, a character created in Japan by the combination of 墨 and the water radical, specifically for writing the name of the Sumidagawa. It's a pretty logical thing to do by analogy with characters like 洛, and that would explain its not appearing in the Hanyu Dacidian.

Another note: Languagehat, the phenomenon by which radicals are listed by the number of strokes of their "source" character (cf. Mathews) is actually somewhat political -- as part of the simplification process on the mainland, they started coming out with dictionaries where the radical is listed by the number of strokes it has itself, which is easier for the language learner, but totally anathema to the anti-simplification camp. Now, there are some good arguments against the simplification of Chinese characters, but ironically, the people who did the simplifying consulted just the kinds of books of historically attested variant characters that I've described above. Thus, many of the simplified characters so reviled by traditionalists are actually legitimate variants from the sixth and seventh centuries.

Posted by xiaolongnu on 2 October 2003

"It's sort of like finding a new species of insect, but less significant."

Personally, I find variant characters far more significant than new species of insects (unless those insects play some role in a human communication/symbol system).

"...they started coming out with dictionaries where the radical is listed by the number of strokes it has itself, which is easier for the language learner, but totally anathema to the anti-simplification camp."

And anything that's anathema to the anti-simplification camp is anathema to me, by hickory! Seriously, it took me about five minutes to learn that certain common radicals "have" more strokes than they appear to; I strongly disapprove of redoing entire reference systems to make life a little easier for learners. (On the other hand, I think complete romanization of Chinese would be a good idea, because it would make life a *lot* easier, and greatly increase the literacy rate in China.)

This is an absolutely fascinating discussion; I love this sort of arcana!

Posted by language hat on 2 October 2003

Good to see your phosphors, Jonathon. I agree with language hat, interesting discussion.

detailing the mundane at my new satellite blog:

Posted by Lisa on 2 October 2003

"I strongly disapprove of redoing entire reference systems to make life a little easier for learners."

LH -- On general principle I would have to agree with you, but there are exceptions. One of my dictionaries, published in 1953, is organized phonetically, according to the ping-zhe system which is the basis of the rhyme (and tone-pattern) schemes of Tang poetry. That is to say, the dictionary is organized according to the pronunciation and tone of the characters *in the eighth century.* Needless to say, what with sound change and all that, plenty of words that were homonyms back then are homonyms no longer. Sheesh.

I'd be interested to see if someone could come up with an effective scheme for romanizing Chinese. Systems I've seen in the past (hanyu pinyin etc.) tend to get hung up on the huge number of homonyms. And I wonder if people would write in different dialects? I just asked my officemate, who's from Sichuan, for the pronunciation of the character 誦 (song) and he said it should be "shong" (which is how it comes out in his accent, but which totally doesn't exist in standard pinyin). On top of this, people would have to learn to write in a more vernacular style than the partly classicizing usage that is currently considered proper for written Chinese. Interesting thought.

I have more thoughts but I'm worried that this is getting way way off topic. If this goes on much longer I am going to have to get my own blog.

Posted by xiaolongnu on 2 October 2003

Hear hear! (to your last remark)

You've nailed the main problem with romanization: different dialects. But the fact is it's very hard to write anything but Mandarin using the current system (and of course the government has no interest in making it easier; they want everybody to speak Mandarin). Probably the only way something as drastic as romanization would happen is in the context of such a major political upheaval that China broke up into regional units, and the "dialects" would finally be revealed for the separate languages they are.

And people having to write in a more vernacular style would be a Good Thing. It was only after Greece (for example) got away to some extent from the Attic-worshiping katharevousa that great poetry could be written. Trying to write like someone who lived centuries before your time is fatal for genuine literature.

Posted by language hat on 2 October 2003

I am going to take you back to Rampo.
Edogawa Rampo was a popular writer when I was growing up in Tokyo.
To ignore A from Allan and make kawa
out of Edgar and Ran (like ranbo) instead
of Sanpo is anarchistic and fun.
We must thank Charles Baudelaire, who championed Poe long before Poe was appreciated in his own country; Baudelaire made Poe world famous.

Posted by Fung Lin Hall on 2 October 2003

Definitely Sino-Japanese is tremendous fun for people who like to do things the hard way (a group which includes me 95% of the time, but NOT for example when I'm looking up a word which is classified under an entirely arbitrary radical).

The Chinese simplification was really just a way of marking territory; if the complexity of the traditional system was counted as 100, the simplified system is at about 95. And many common characters were made terribly ugly.

I do approve of the Ch'in regularization, though. The three pre-Han manuscripts of Lao Tzu recently discovered included dozens of unique, never-seen-before characters, of which only about 90% could be easily explained. Rather few lead to new insights into the text, either. (Caveat: my number and percentage here are ballpark at best).

Posted by Zizka on 6 October 2003

This discussion is now closed. My thanks to everyone who contributed.

© Copyright 2007 Jonathon Delacour