Monday 28 January 2002

Chinese-English translations

Doc’s entry Raising the Red Flag reminded me of an experience I had in Beijing in 1980. Working with a Chinese photographer, Mr Hu, I was photographing Ming and Qing dynasty scrolls in the Imperial Museum (located in the Forbidden City, where much of Bertolucci’s The Last Emperor was filmed). This was at the tail end of the Cultural Revolution and everyone was still wearing Mao suits, although the high-level Party functionaries whom I occasionally met must have bought theirs at Versace or Ermenegildo Zegna.

I was only able to communicate with Mr Hu through a translator, Miss Feng, who was an expert in technical translation. She spoke flawless English and, to assist her with any specialized words, she had a Chinese-English/English-Chinese technical dictionary.

Since the voltage fluctuated wildly during the course of the day, altering the color temperature of our tungsten photo lights, we were forced to take regular breaks while waiting for it to stabilize. A large analog voltmeter mounted on the wall let us monitor the voltage. (One morning, when I asked Mr Hu why the voltage had held rock-steady at 220V for two hours, he told me that the power authority had given priority to the museum because the Yugoslav President was visiting.)

On this particular morning, though, the voltage had dropped to 212V and we were taking yet another enforced break. I saw Miss Feng’s dictionary on a table, picked it up, and began to idly flick through the pages. As I’d expected, there were sections on such subjects as marine biology, aeronautical engineering, orthopedic surgery, and of course photography. I amused myself for a while by looking up the Chinese characters for “aperture,” “shutter-speed,” and “developer.”

Then, turning to the front of the dictionary, I found a section of Chinese political phrases and their English equivalents. “Capitalist running dog.” “US hegemony.” “Imperialist lackey.” I’d always assumed that these terms had been coined by Western journalists as a way of sending up Chinese political speech. But here they were, in print. This was how the Chinese actually thought and spoke. Or some of them at least.

Suddenly I felt a presence behind me and turned to find Miss Feng. I was embarrassed to be caught reading her dictionary without having asked permission.

“This dictionary is very interesting, Miss Feng,” was all I could think to say as I handed it back.

“Yes, Mr Delacour,” she replied. “And some parts of it are more interesting than others.”


Wednesday 20 February 2002

Language skills

This morning I took the train to Penrith (50 kilometers, or 31 miles, from the center of Sydney, where I live) to do some work at FirmwareDesign. I always spend the 50-minute ride doing the same thing: reading a Japanese novel.

My Japanese is adequate. I can travel in Japan for weeks at a time without speaking English, always able to buy a ticket to the next destination, book a room at an inn, order drinks and food in the tiny yakitori-ya where I like to spend my evenings. Once the owner and customers recover from the shock of encountering a foreigner at close quarters, the evenings always turn out well. I have a conversational repertoire and, even if the topic strays from my areas of expertise, I can conduct quite a complex conversation with only 50% comprehension. I learned a long time ago that the only mistake is to let on that you don’t understand what’s going on. Bluff and you’ll eventually be able to make an appropriate remark.

Reading is a problem, however, so I try to practice reading in Japanese whenever I can. The train ride is ideal. My eventual goal is to read Nagai Kafu’s A Strange Tale from East of the River in Japanese, but that’s way beyond my present skills. Instead I’m reading a trashy novel, a kind of Mills & Boon or Harlequin novel for men. Imagine something eight or nine rungs down the literary ladder from Jackie Collins or Sidney Sheldon. I chose this book partly for lascivious reasons but mainly because I knew I’d understand most of the grammar and vocabulary.

Still, it’s something of a challenge. Japanese is an oblique language, in which 60% of sentences don’t have a subject. You have to infer from the context who’s doing what to whom. On the third or fourth page I realized that Reimi, the heroine, who was driving back to Tokyo with her colleague Junko at the start of the story, hadn’t bought and delivered food to the software engineers who were working back late at the office. “Shit,” I remember thinking to myself, “they’re still in the Volvo on the expressway. She’s only wondering about whether she should buy food.”

This morning the train arrived at Penrith station just as Wakura, the unscrupulous villain, had given Reimi a glass of wine laced with a hallucinogenic drug. I gathered up my book and dictionary and in less than a minute was outside, looking through the window of the first cab on the rank at a bucket of chips wedged between the driver’s and the passenger’s seats.

“Pardon my breakfast,” said the driver as I sat down beside him. “That’s OK,” I replied. I told him the destination and relaxed back in my seat. Then the strangest thing happened. He launched into a detailed explanation about something and I couldn’t understand a word he was saying. What language is he speaking, I asked myself. I listened closely but it seemed totally unrecognizable. We drove for two or three blocks, him chattering away and me nodding in reply. I didn’t have a clue what he was talking about until I heard the word “vinegar” and realized the conversation had something to do with his chips. “Oh,” I told him, “I love the taste of vinegar on chips.”

And he was off again. “…brown sauce…” I heard him say. “Do you mean like Worcestershire sauce,” I asked him, “or thicker?” And so I clawed my way back into the conversation. He was a Scotsman. He’d lived in Australia for 40 years and hadn’t lost his brogue, which was as thick as the brown sauce he used to pour on the chips he bought at the fish and chip shop halfway home from the Edinburgh pub where he used to go every Friday night in winter.

The fog lifted and we had a great chat. About how Australia gradually took over his heart and became home to him and his wife and three kids; about his five grandchildren and how their flat Aussie accents occasionally betray their Scottish heritage. I told him how I loved hearing Asian children in the supermarket speaking with Australian accents. He told me about having the same experience with the children of Pakistani immigrants back in Scotland. “You try to match the voice to the face and it doesn’t fit,” he said.

I’m not sure why I found it so hard to understand him at first. I might still have been in Tokyo, worrying about how Reimi could avoid Wakura’s clutches (and knowing that she wouldn’t). It didn’t matter. I discovered I could carry on a conversation with only 5% comprehension and everything would turn out fine.


Monday 06 May 2002

Heian women’s writing

During the entire Heian period… Chinese remained the language of scholars, priests, and officials, occupying a role analogous to that of Latin in the West. Despite the steady emancipation from foreign tutelage, Chinese characters retained their overwhelming prestige and were the exclusive medium for any serious form of writing among men.

Ivan Morris, The World of the Shining Prince

Upper-class Heian women were actively discouraged from learning to read and write in Chinese, no doubt to ensure that they posed no threat to male political dominance (although, as Morris points out, not until a thousand years later, after the Pacific War, would the status of Japanese women improve beyond that of their Heian ancestors).

In her diary, Murasaki Shikibu recounts that she would listen as her younger brother was learning the Chinese classics and that she

became unusually proficient at understanding those passages that he found too difficult to grasp and memorize. Father, a most learned man, was always regretting the fact: “Just my luck!” he would say. “What a pity she was not born a man.” But then I gradually realized that people were saying “It’s bad enough when a man flaunts his Chinese learning; she will come to no good,” and since then I have avoided writing the simplest character.

Even so, after her husband’s death in 1001, she continued to read the Chinese books that he had left in a cupboard “crammed to bursting point,” thereby attracting the disapproval of her servants. “‘It’s because she goes on like this that she is so miserable. What kind of lady is it who reads Chinese books?’ they whisper.”

This prohibition conferred on Heian women an unintended advantage, since it left them free to write in vernacular Japanese, employing an early variant of the hiragana script called onnade (women’s writing).

For a period of about 100 years, the main genres of classical Japanese literature — nikki (diaries), kiko (travel accounts), zuihitsu (essays), and monogatari (tales or romances) — were pioneered by women writers who, using a supposedly inferior writing system, mastered the difficult process of forging (in Richard Bowring’s words) “a flexible written style out of a language that [had] only previously existed in a spoken form.”

The Tale of Genji

Their writing speaks to us across the gulf of a thousand years with passion and immediacy, in works such as Murasaki Shikibu’s Tale of Genji, Sei Shonagon’s Pillow Book, the Izumi Shikibu Diary, the Gossamer Years, and Lady Sarashina’s As I Crossed a Bridge of Dreams.

Heian men persisted in writing in the Chinese-Japanese hybrid language which — as it was designed for the keeping of official records — was ill-suited to recording either spoken Japanese or the sad, sweet mysteries of everyday life. With one notable exception: Ki no Tsurayuki, a distinguished poet who adopted the persona of a woman to write the beautiful Tosa Diary in the hiragana script.


Monday 02 September 2002

An impoverishment of language

“Let me reiterate back to what I was saying previously…”

George W. Bush? No, though you’d be forgiven for thinking so. It’s Rex (a.k.a. The Moose) Mossop, a famous Australian Rugby League (football) star turned sports commentator whose mangling of the English language won him tens of thousands of loyal fans who couldn’t care less about Rugby League but tuned in every week in the hope of a new Rexism such as “They haven’t a hope of scoring unless they make some forward progress,” “A good punch never hurt anybody,” or “The game’s not over until the final whistle.”

Rex’s greatest linguistic moment occurred in 1972, when he made a citizen’s arrest on an alleged pervert he discovered spying on nude bathers at a beach near his home. Interviewed on TV that evening, Mossop—who had long campaigned against the nudist beach—famously remarked: “I’m sick and tired of having male genitalia thrust down my throat.”

I fondly recalled The Moose when reading Joseph Duemer’s comments on the linguistically challenged George W. Bush:

How, then, can I claim that the language of our current president is somehow lacking? The short answer is that GWB’s language is not a dialect of English—a variety spoken by a group—but the result of an individual affliction (though one aided & abetted by a class identification that makes him particularly insensitive to the relations between words & things, words & acts. This is a man who grew up insulated from consequences.) We can use language to either sharpen or dull our perceptions & concepts: whether we choose accuracy or muddle depends, not upon language, but upon how we use language. That is, our use of language reflects our moral & ethical constructions. By this argument, GWB’s morality is as incoherent as his syntax. Which is why American soldiers may very soon be engaged in house-to-house combat in Baghdad.

In his post, Joseph pointed to an opinion piece in the Boston Globe in which James Carroll wrote:

The United States, in fact, is in a crisis of language. This is what it means to have a president who, proudly inarticulate, has no real understanding of the relationship between words and acts, between rhetoric and intention.

His vacuous reflection of our mute anguish can be consoling because familiar - hence the high poll numbers - but it is the last thing the country needs. Mawkish bluster in cowboy clothes does nothing to nurture a community of purpose. It does the opposite.

As a candidate, Bush openly displayed his willful illiteracy. At a loss for words, and proud of it. Many voters were charmed. Others were appalled. Few understood, however, that this abdication of leadership by the intelligent use of language would be dangerous to democracy at home, a grievous threat to peace abroad.

Australian football fans were charmed by Rex too. But The Moose called a couple of football games each week—he wasn’t the leader of the greatest industrial and military power in human history.

At first glance, Bush’s presidency is incomprehensible to non-Americans: it’s not just that half the voters in the United States could take this amiable buffoon seriously enough to put him into the White House. Even more inexplicably, he manages to sustain high approval ratings. Just a month ago, an ABCNEWS/Washington Post poll found 69 percent of Americans in favor of Bush’s overall job performance (down from 92 percent in October last year). Even now, 95 percent of Republican voters approve of the job Bush is doing.

In Australia, we watch him on the evening news as he struggles to put together a coherent sentence, looking up with pride from his printed speech on the rare occasion he manages to say something vaguely sensible—like the dullest boy in the slowest class, desperate for the teacher’s approval.

We think he is a joke.

We are not Americans.

And because we are not Americans we have not undergone the intense social programming whereby Americans are constructed. As Richard Eyre wrote, “In many respects, the US is still a religious country with a strong streak of Christian fundamentalism, but the true religion of America is not Christianity: it is America itself.”

Non-Americans, needless to say, do not worship in the Church of America. We do not believe in the American flag, the Constitution, the Supreme Court, the Senate or the House of Representatives. Most importantly, we have no loyalty to the institution of the Presidency of the United States. For us, GWB is just another politician (although he seems more inept than most). The fact that he is President counts for nothing at all. Our loyalties, such as they are, lie elsewhere.

Yet, a significant majority of Americans support President Bush. I can only surmise it is because each incumbent is sheathed in the power and prestige of the Presidency, so that even the least deserving individual is accorded the respect due to the office. In many ways, this is admirable for it ensures that social and political institutions survive the incompetence and venality of individual office-holders.

Perhaps that’s why, while the rest of us listen in disbelief as Bush bumbles and stumbles his way through one linguistic debacle after another, most Americans hear an eloquent preacher extolling the truth and virtue of the American way. It’s difficult not to conclude, though, that Bush’s failure to enlist international support for the Iraqi adventure (apart from Australia, America’s lapdog) is due at least in part to his inability to speak eloquently and persuasively on behalf of his cause, a task that calls for the rhetorical skills of a Churchill or a Kennedy, not the down-home bonhomie of a West Texas good ole boy.

Joseph Duemer takes it a step further when he writes:

Simple-minded linguistic determinism clearly won’t do—we do not understand our world(s) exclusively through the medium of a single language; otherwise, I would not have been able to enter into the spirit of Vietnam before I began learning Vietnamese. But—& this is important—I learned much more about Vietnam in my bones after I began studying the language. Linguistic determinism ignores the fact that all languages are part of Language & the Language is among the most basic things that makes us human. You can get a lot done with even rudimentary elements of a shared language: get a meal, fall in love, arrange the price for something…

I learned much more about Japan in my bones after I began studying the language. And I learned much more about myself. When you attempt to describe your thoughts and feelings in another tongue, when you try to overcome the limitations of your upbringing and socialization in order to connect with someone who shares only a few of your primary values, you begin to comprehend not just how language forms us but also how fragile and arbitrary is the nature of belief.

That points to what I find most troubling about George W. Bush: his absolute certainty based upon a breathtaking insularity. Bush would be an infinitely more capable and effective President if he’d taken the time and trouble to learn another language. Given his inadequate grasp of his first language, however, it’s unlikely he could have ever mastered a second.

Update. In the comments to this post, Dorothea Salo and Burningbird pointed out that George W. Bush speaks “not half bad” Spanish. Given that I speak “not half bad” Japanese, this revelation rather undercuts my argument. “His accent sucks,” wrote Dorothea, “but no more than that of my third-semester students, all of whom were quite comprehensible.” My (Japanese) accent doesn’t suck: despite my relatively weak vocabulary, on the telephone I am frequently mistaken for a native Japanese speaker because my Japanese “sounds natural.” Yet, crappy accent or no, the fact that GWB speaks halfway decent Spanish amazes me. At this point, late on a Tuesday night, I’m tempted to email Joseph Duemer and ask him what he thinks it might mean.

Other commenters noted that, with just a 50% voter turnout, only about a quarter of eligible voters cast their ballots for George W. Bush. That doesn’t explain why his approval rating remains so high. Perhaps it means that, if voting were compulsory in the US, he would have been elected with a substantial majority.


Saturday 08 February 2003

LANG? Enough already!

A couple of posts (lang="ja" and the attribute selector and Can I CITE you on that?) and lots of comments later, we’re still no closer to resolving how to properly mark up Japanese words written in Romaji (Japanese transliterated using Roman characters).

I started marking up Japanese words—pretty much the only foreign words I include with any regularity—while implementing Mark Pilgrim’s Dive Into Accessibility tips:

Day 7: Identifying your language

You know what language you’re writing in, so tell your readers… and their software.

Who benefits?

  1. Jackie benefits. Her screen reader software (JAWS) needs to know what language your pages are written in, so it can pronounce your words properly when it reads them aloud. If you don’t identify your language, JAWS will try to guess what language you’re using, and it can guess incorrectly, especially if you quote source code or include other non-language content in your pages.
  2. Google benefits, even if you are writing in English, but especially if you are writing in some other language. According to the Google Zeitgeist, 50% of Google users search in languages other than English, and many of these users specify in their Google preferences to only search for pages in specific languages. Google’s language auto-detection algorithms are better than most, but why make Google’s job more difficult?

Except that, as the JAWS information page explains:

JAWS installs with an enhanced, multi-lingual software speech synthesizer, “Eloquence for JAWS”. Languages include: American English, British English, Castilian Spanish, Latin American Spanish, French, French Canadian, German, Italian, Brazilian Portuguese, and Finnish.

No Japanese. Similarly, my copy of IBM Home Page Reader supports the following languages: German, Spanish, French, Italian, Brazilian Portuguese, Suomi (Finnish), British English, and American English but not Japanese.

Even if Japanese were included, it’s doubtful how useful the “correct” pronunciation would be, since Japanese is frequently transliterated without the macrons that indicate long vowel sounds: for example, taiheiyo senso instead of taiheiyō sensō (or taiheiyou sensou, which is even worse).

If I set my Google preferences to search only for pages written in Japanese and do a search for “taiheiyo senso”, then apart from pages in which the phrase appears as a file or directory name, the result list only includes pages with taiheiyo senso written in Romaji.

But if I search for 太平洋戦争, the Japanese-script form of taiheiyo senso (Pacific War), the result list only includes pages in which taiheiyō sensō is written in Japanese script.

Thus, since anyone searching for a Romanized Japanese word will almost certainly want results from pages written in languages other than Japanese, it’s difficult to see how Google benefits from the inclusion of either lang="ja" or xml:lang="ja".

Bertil Wennergren provided further confirmation by taking the matter to “the high court” (comp.infosystems.www.authoring.html) but, as he noted in his comment, “no solution to the actual problem emerged.” Jukka Korpela summed it up:

This is depressing, but thanks for pointing this out. I think many of [us] have not met this problem yet, either because we have Japanese support installed or because we haven’t visited pages where Romanized Japanese has language markup. The observation reminds us that we should not write language markup in too much detail, until the definitions and implementations have matured. (For an entire document, or for a block quotation, and for a book title, for example, language markup is surely recommendable, and not much work. But even for them, maybe it’s better to suppress the lang markup, if the text is transliterated or transcribed.)

So that’s it for me. From now on I’ll wrap Romanized Japanese words in a span tag, use CSS to italicize them, and, where the meaning isn’t immediately clear from the context, add a title attribute to provide it. As in, taiheiyō sensō:

<span class="romaji" title="Pacific War">taiheiy&#333; sens&#333;</span>
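For anyone wanting to try the same approach, here is a minimal sketch of a matching stylesheet. It assumes the romaji class name used in the span above; the rules are an illustration, not the stylesheet actually used on this site, and the cursor rule is an optional extra. Because the span carries no lang attribute, the words simply inherit the language of the surrounding page.

```html
<style>
  /* Italicize Romanized Japanese words (class name assumed from the example span) */
  .romaji {
    font-style: italic;
  }

  /* Optional: show a help cursor when a gloss is available in the title attribute */
  .romaji[title] {
    cursor: help;
  }
</style>

<span class="romaji" title="Pacific War">taiheiy&#333; sens&#333;</span>
```

In most browsers, hovering over the italicized words then displays the title text, “Pacific War,” as a tooltip.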


Friday 18 April 2003

German assistance requested

In W.G. Sebald’s Austerlitz, the eponymous protagonist, in the course of investigating the fate of his mother, describes a book by H.G. Adler “on the subject of the setting up, development, and internal organization of the Theresienstadt ghetto.”

Reading this book, which line by line gave me an insight into matters I could never have imagined when I myself visited the fortified town, almost entirely ignorant as I was at that time, was a painstaking business because of my poor knowledge of German, and indeed, said Austerlitz, I might well say it was almost as difficult for me as deciphering an Egyptian or Babylonian text in hieroglyphic or cuneiform script. The long compounds, not listed in my dictionary, which were obviously being spawned the whole time by the pseudo-technical jargon governing everything in Theresienstadt had to be unravelled syllable by syllable.

Austerlitz cites the following German compound words:

  • Barackenbestandteillager
  • Zusatzkostenberechnungsschein
  • Bagatellreparaturwerkstätte
  • Menagetransportkolonnen
  • Küchenbeschwerdeorgane
  • Reinlichkeitsreihenuntersuchung
  • Entwesungsübersiedlung

If a German speaker could explain the meaning of each of these words in a comment, I’d be most grateful.


Sunday 27 April 2003

Relatively speaking

Like Stavros, I’m not “a full-fledged linguist (like languagehat)” but rather “an enthusiastic dabbler.” And a linguistic relativist it seems, in the sense that Stavros explains in his essay, Linguistic Relativism and Korean:

The Sapir-Whorf Hypothesis, which is variously referred to as the ‘Whorfian Hypothesis,’ ‘linguistic relativism,’ and ‘linguistic determinism’ (a description of the strong formulation meant by implication to be a bad thing, I think) concerns the relationship between language and thought, and suggests in its strongest form that the structure of a language determines the way in which speakers of that language perceive and understand the external world. This formulation is generally understood by many to be untenable, but the hypothesis also exists in a weaker form: that language structure and content does not determine a view of the world, but that it shapes thought to some degree, and is therefore a powerful impetus in influencing speakers of a given language to adopt a certain world-view.

Is it only those of us with, as Stavros puts it, “little knowledge of Hardcore Linguistics” to whom the weaker form of Sapir-Whorf (i.e. linguistic relativism) seems self-evident?

I doubt I’d ever thought about linguistics until I was in my mid-twenties, when I saw Godard’s Two or Three Things I Know About Her. There’s a conversation in the film between Juliette, the protagonist, and her child, Christophe, who comes to the doorway of her bedroom to tell his mother about his dream the night before: walking along a narrow path next to a precipice he encounters a pair of twins and wonders how they will manage to pass. Suddenly the twins merge into a single person and he realizes that they are North and South Viet Nam reuniting. Godard cuts to a close-up of Juliette and we hear Christophe’s voice asking: “Mummy, what does language mean?”

Juliette replies: “Language is the house in which man dwells.”

I remember being absolutely entranced, and I suppose I still am, by the beauty of this idea: that we live within language rather than language living within us. Not that language determines our thinking but that each language encourages its speakers to perceive the world in a particular way.

I didn’t realize until tonight—thanks to Google—that Godard was probably paraphrasing Heidegger:

Language is the house of Being. In its home man dwells. Those who think and those who create with words are the guardians of this home.

Yet Heidegger’s viewpoint would appear to contradict Sapir-Whorf, in the sense of the counter-argument that Stavros poses:

A possible opposite claim, from a sociolinguistic viewpoint, is that the thought (and thus culture) of a linguistic group is mirrored in the structure and content of their language, that because they behave and understand things in a certain way, their language reflects those behaviours and understandings - the idea that language is molded, if not determined, by culture.

Heidegger seems to be suggesting a far more active role in the construction of language (and therefore of culture) for those who think (philosophers?) and those who create (writers and poets?). Hopefully, a fully-fledged philosopher will clarify Heidegger’s intention.

After providing us with a concise yet thorough introduction to the origins of the Korean language, its distinguishing grammatical features, and the influence of Confucian ethics on the language, Stavros states that the question which most interests him is this:

Do structures and forms like these in the Korean language shape the way in which Koreans think, particularly in terms of their relationships not so much to the world but to the people in it, to such a degree that we can say that language has given them a world-view substantially different than, for example, my own, as an English native speaker? It certainly seems so, to me.

It certainly seems to me, too, that the structure of the Japanese language has given the Japanese a world-view substantially different to that of an English native speaker (or, for that matter, a Korean speaker). This difference in how the world is perceived has always been, for me, one of the great attractions of learning Japanese. Not in the sense that it has radically changed my world-view, since the level of my Japanese is such that I only occasionally “think” in Japanese (though I do more often dream in Japanese). It’s rather that communicating in another language is such a direct way of making the familiar strange (Shklovsky’s ostranenie or Brecht’s Verfremdungseffekt).

Given that linguistic relativism seems self-evident to Stavros and myself, a couple of questions immediately present themselves:

What kinds of people accept or reject linguistic relativism, and for what reasons? (I noticed that I quickly skipped through the section about Chomsky and Pinker in Stavros’s essay, wanting to get to the material about the Korean language.)

If an English native speaker achieves a high degree of fluency in another language, do they perceive or behave differently in any substantial sense when speaking one language rather than the other?


Monday 28 April 2003

The café universe

Stavros’s Linguistic Relativism and Korean essay continues to resonate, giving rise to some terrific comments on my previous entry, including a pointer from the Dynamic Driveler to a rather skeptical view of the Sapir-Whorf hypothesis by Mick Underwood which, nevertheless, contains a couple of fascinating references to Wittgenstein’s views on language. The first made me laugh out loud:

Wittgenstein said that he was once asked by one of his colleagues whether Germans think in the order they speak in or think normally first and then mix it all up afterwards.

Though I’ve never studied German, I do know that the verb often comes at the end of the clause (in subordinate clauses, for instance), as in Japanese. I’m not sure, however, what other characteristics Japanese shares with German.

For example, Japanese uses postpositional particles to indicate grammatical and interpersonal relationships, and these particles follow the element they mark (e.g. Tokyo ni, “Tokyo to”, meaning to Tokyo). Similarly, the basic word order in Japanese is the reverse of English in that modifying clauses precede the element being modified, as in the Japanese sentence Tokyo de katta hon o yonde iru.

Tokyo de | katta | hon o | yonde iru
Tokyo in | bought | book [object marker] | reading am
(Someone) is reading (a) book (they) bought in Tokyo.

The weird thing is that—as long as I don’t think about it too much—there’s no need to “think normally first then mix it all up afterwards” into Japanese. Somehow the “mixed up” order seems perfectly logical.

The other interesting reference is a quotation from Wittgenstein’s Tractatus Logico-Philosophicus:

The limits of my language indicate the limits of my world.

Mick Underwood comments:

This is often advanced in support of the Sapir-Whorf hypothesis. (Actually, given the context in Wittgenstein’s Tractatus, I’m not at all sure that that’s what he was saying, but it’s a good quote, anyway!)

Although I recognized this quotation immediately as coming from a voiceover commentary in Godard’s Two or Three Things I Know About Her (mentioned in my previous entry), what struck me was an observation that Language Hat had made about Godard’s films:

The funny thing is that as I read your post I had my video of Comment ça va? on the tv. I find it helps with Godard to not always watch intently but also to just have the movie going, picking up bits here and there that I might not notice watching in a more connected way. You’re never going to get everything in a Godard movie, after all. And the more I see the movies, the more I realize that a huge percentage of the dialog is a quotation of or reference to something else, poetry or philosophy or other movies. Tout se tient.

Of course you’ll need to understand French in order to “not always watch intently but… just have the movie going, picking up bits here and there that [one] might not notice watching in a more connected way.” I certainly can’t do that with Japanese movies though, now that I think about it, that sounds like something worth trying.

But Language Hat is absolutely correct about Godard’s movies being packed densely with quotations or references to poetry, philosophy, linguistics, and—of course—other movies.

Coffee cup still frame from Godard's Two or Three Things I Know About Her

The Wittgenstein quotation occurs in one of the most arresting sequences in any Godard movie, the one James Monaco, in his book The New Wave, calls “the café universe,” in which shots of Juliette watching a young couple alternate with close-ups of a cup of coffee, as a male voice (Godard himself?) speaks:

Perhaps an object like this will make it possible to link up… to move from one subject to another, from living in society, to being together. But then, since social relationships are always ambiguous, since my thought is only a unit, since my thoughts create rifts as much as they unite, since my words establish contacts by being spoken and create isolation by remaining unspoken, since an immense moat separates the subjective certitude that I have for myself from the objective reality that I represent to others, since I never stop finding myself guilty even though I feel I am innocent.

A spoon is stirring up the cup of coffee. It is withdrawn. A small circle of foam is left swirling round on the surface.

Given the fact that every event transforms my daily existence and that I invariably fail to communicate… I mean to understand, to love, to be loved, and as each failure makes me feel my loneliness more keenly, as… as… as I can’t tear myself away from the objectivity that is crushing me nor from the subjectivity which is driving me into exile, as I can neither raise myself into Being nor allow myself to sink back into Nothingness… I must go on listening. I must go on looking about me even more attentively than before… the world… my fellow creatures… my brothers.

…the world today, alone, where revolutions are impossible, where bloody wars haunt me, where capitalism isn’t even sure of its rights… and the working class is in retreat… where progress… the thundering progress of science gives to future centuries an obsessive, haunting presence… where the future is more present than the present, where distant galaxies are at my door. My fellow creatures… my brothers.

A lump of sugar tumbles into the coffee and breaks into crystals. The dark circle of the cup glistens with bubbles, like galaxies.

But where to begin? But where to begin with what? God created the heavens and the earth. Of course, but that’s an easy way out. There must be a better way of explaining it all… We could say that the limits of language are the limits of the world… that the limits of my language are the limits of my world. And in that respect, whatever I say must limit the world, must make it finite. And when logical, mysterious death finally abolishes these limits, and when there are, then, neither questions nor answers, everything will be blurred. But if, by chance, things become clear again, they would only become so through the phantom of conscience. Then, everything will fall into place.

It’s impossible to do justice to the spectacular beauty of this sequence, particularly in a cinema, where the coffee cup fills the gigantic TechniScope screen, acting as a counterpoint to the intimate tone of the narration. “This is not a film talking, it is a man,” writes James Monaco. “It is the most personal—and most painful—moment in all of Godard.”

I was surprised to learn from Language Hat’s comment that he loved Godard’s films, that he’d “wanted the video [of Two or Three Things] for years, and finally got it.” And yet I shouldn’t have been, since I cannot think of another filmmaker who cares as much about language—and, by extension, the ethics of film language—as Godard.

The limits of my language are the limits of my world.

Language is the house in which we dwell.

I bought the video of Two or Three Things I Know About Her ages ago and must have watched it a half dozen times. Might be time to watch it again.


Sunday 04 May 2003

Linguistic imperialism?

In 1972, after teaching science in a private high school for a couple of years, I wangled a job as a chemistry teacher in the state technical education system (TAFE), which was a far more congenial environment since the students were older and highly motivated. Better still, night classes counted as time and a half in one’s teaching load and the other staff members hated teaching at night, so I was able to compress my “full time” job into three calendar days (including ample time for preparation). A steady income to spend on cameras, tons of free time to take pictures: I was on my way to becoming a photographer.

On my first day at Sydney Technical College, the staff and students assembled in the auditorium to hear speeches of welcome from the principal, the Student Union president, and the registrar. Only one part of one speech made any impact on me. The registrar, after noting the high proportion of Asian students at the college and drawing attention to the fact that the university entrance exams would all be written in English, forbade the use of any language other than English in the classrooms, laboratories, library, cafeteria, corridors, and elevators. Students who disobeyed this rule faced disciplinary action. I was gobsmacked. “How could you deny people the freedom to speak their own language?” I later asked one of my colleagues. He laughed and said, “Welcome to TAFE, pal!”

I recount this anecdote as a way of indicating an awareness of linguistic injustice and, simultaneously, my inability to do much about it. Should I have approached the registrar to express my disapproval? Possibly. Would that have accomplished anything? Almost certainly not. What did I do, practically? Encouraged my students to speak English during class and lab time and made it clear that the rest of the time they could speak any language they chose.

The registrar’s speech came to mind when I read Baldur Bjarnason’s response to the wide-ranging discussion inspired by Stavros’ essay Linguistic Relativism and Korean:

Culture forms language. Language is a symptom, not a cause.

There has been an interesting discussion on linguistic relativism on several weblogs recently.

The discussion fails to recognise that language is a cultural product.

A Weapon.

Linguistic relativism is the equivalent of staring down the barrel of a gun while ignore the person whose finger is on the trigger.

Language, linguistic dominance, are the cannons of cultural warfare. Without a language, a culture is defenseless.

The linguistic relativists might be right in all of their observations, but they are simply staring at the bullet and mistaking it for the lock, stock, barrel and sniper all rolled into one convenient lump of lead.

Language is wielded, formed—your arms and armour.

It kills. Just ask the Welsh, Kenyans, Native Americans or South-American Natives.

Linguistic relativism is a nice idea to those who belong to a dominant, still imperialistic culture (and this applies to the English, Japanese, Koreans and Germans, all cultures that are strong and on the offensive in the war of globalisation).

But there is nothing relative about a bullet in the head.

Or fighting for the survival of your nation and culture.

The odds are stacked against us, in your favour.

Baldur’s first and third sentences, which straddle his sardonic reference to our “interesting” discussion, are merely assertions, quite capable of being rephrased so that their meanings are reversed:

Language forms culture. Culture is a symptom, not a cause…

The counter-argument fails to recognise that culture is a linguistic product.

Baldur admits as much when he writes: “linguistic relativists might be right in all of their observations.” However, his essential argument is not just that “without a language, a culture is defenseless” but also that:

Linguistic relativism is a nice idea to those who belong to a dominant, still imperialistic culture (and this applies to the English, Japanese, Koreans and Germans, all cultures that are strong and on the offensive in the war of globalisation).

In other words, it’s all very well for you English, Japanese, Korean, and German speakers to conduct a pleasant academic discourse about linguistic relativism but, in doing so, you are ignoring the fate of other languages, which are succumbing to the linguistic assault being mounted by your respective cultures.

[Stavros and I must be classed as serial offenders by virtue(?) of our being native speakers of one imperialistic language (English) and enthusiastic students of another (Korean in Stavros’s case, Japanese in mine). Language Hat’s status as a linguistic imperialist would seem to depend on whether or not his command of the languages of dominant cultures is balanced by his fluency in languages under threat.]

Why the flippancy? It’s not that I disagree with Baldur, since it’s almost self-evident that “without a language, a culture is defenseless.” And when I quoted Heidegger (“Language is the house of Being. In its home man dwells.”) and Wittgenstein (“The limits of my language indicate the limits of my world.”), there wasn’t even the most tenuous implication that “language” meant English, Japanese, Korean, German, or any other “dominant” language.

Nor is it that I’m disinclined to listen to a disapproving lecture delivered from a moral high horse when I basically agree with the key argument.

I guess it’s that I can’t see a way to put Baldur’s ideas to use. On the one hand there’s a vague accusation of indifference towards the gradual extinction of precious linguistic resources. On the other there’s no hint of a suggestion as to how I might assist in preserving these endangered languages.

Should I abandon my study of Japanese and turn my attention to Gikuyu or Icelandic? Hardly. I love the Japanese language: the sound of it, how it looks, the feelings it evokes, its obliqueness, its lack of subjects or agents, its tendency to “view the world as a natural state or a change brought about by some force.”

Even if I were to become a language activist, I’d devote my energy to fighting on behalf of English, given that a language can be threatened by internal as well as external enemies. As Baldur rightly observed:

The English are lazy when it comes to their own language. They treat it like a ten dollar hooker with no self-respect and a high tolerance for having the shit beaten out of her.

AKMA alluded to the problem when he cited Orwell’s 1984:

“Don’t you see that the whole aim of Newspeak is to narrow the range of thought? In the end we shall make thoughtcrime literally impossible, because there will be no words in which to express it.”

I don’t know whether AKMA had in mind Diane Ravitch’s new book, The Language Police: How Pressure Groups Restrict What Students Learn, when he quoted Orwell, but reviews in The New York Times and The Los Angeles Times CalendarLive (links via Arts & Letters Daily) leave little doubt that left-wing and right-wing pressure groups have nearly succeeded in turning Orwell’s imaginary Newspeak into a reality by getting a whole range of words banned from the textbooks and test questions used in American schools:

Among those rejected by the “bias and sensitivity” panel was a passage about the patchwork quilts made by 19th century frontier women: “The reviewers objected to the portrayal of women as people who stitch and sew, and who were concerned about preparing for marriage.” The fact that the passage was historically accurate was considered no defense for its “stereotypical” image of women and girls.

Another story about two young African American girls, one an athlete, the other a math whiz, who help each other learn new skills, was red-flagged for stereotyping blacks as athletic (even though one of the girls was not an athlete but a mathlete).

A passage on the uses and nutritional values of peanuts was rejected because some students are allergic to peanuts. Stranger still, a story about a heroic blind youth who climbed to the top of Mt. McKinley was rejected, not only because of its implicit suggestion that blind people might have a harder time than people with sight, but also because it was alleged to contain “regional bias”: According to the panel’s bizarre way of thinking, students who lived in non-mountainous areas would theoretically be at a “disadvantage” in comprehending a story about mountain climbing. Stories set in deserts, cold climates, tropical climates or by the seaside, Ravitch learned, are similarly verboten as test topics, since not all students have had personal experience of these regions.

Also forbidden: owls (the animals are taboo for Navajos), Mt. Rushmore (offensive to Lakotas), dinosaurs (suggestive of evolution, hence offensive to creationists), dolphins (regionally offensive because they live in the sea) and Mary McLeod Bethune (this early 20th century civil rights pioneer had the lack of foresight to use the no-longer-fashionable word “negro” in the school she founded).

Denis Dutton’s “review” of the Guidelines for Bias-Free Writing offers an even more dispiriting glimpse into the world of the Bias Persons and their attempts to sanitize the English language. The astonishing thing is that this censorship has not been directly imposed by governments. Rather, book publishers have voluntarily adopted “bias and sensitivity” guidelines which reflect the sensitivities of anyone who cares to complain about anything.

What these groups on both the right and left have in common, Ms. Ravitch notes, is that they all “demand that publishers shield children from words and ideas that contain what they deem the ‘wrong’ models for living.” Both sides “believe that reality follows language usage,” that if they “can stop people from ever seeing offensive words and ideas, they can prevent them from having the thought or committing the act that the words imply.”

(“In the end we shall make thoughtcrime literally impossible, because there will be no words in which to express it.”)

The results of a similar process are evident in conversations with young Japanese who know next to nothing about Japan’s colonization of Manchuria and Korea, the rape of Nanking, the inhumane treatment of Allied POWs, and war crimes in countries under Japanese occupation, let alone the biological warfare experiments by Unit 731 on Chinese prisoners and villagers.

Why are they so ignorant? Why do so many Japanese believe that, because of the atomic bombs dropped on Hiroshima and Nagasaki, they were the greatest victims of the Pacific War? Because they learned history from sanitized textbooks.

So, ultimately, as much as I sympathize with the disappeared languages of the Welsh, Kenyans, Native Americans or South-American Natives, I suspect that every language is under threat—from linguistic imperialism, from benign or malicious neglect, from Language Police acting out of the “best” intentions. And I can’t help but believe that reading and discussing Heidegger and Wittgenstein form at least part of an acceptable response to the problem.


Tuesday 06 May 2003

Ignorance bought and paid for (in Japanese too)

Golly, Blogaria’s a strange old world. Before dinner I started an entry about Heidegger’s On the Way to Language. But on Monday night SBS screens the English Premier League Highlights show—and what a delightful hour it turned out to be: eating a delicious home-cooked Thai chicken stir fry accompanied by a couple of glasses of Cabernet Merlot while watching Australians Harry Kewell and Mark Viduka save Leeds United from relegation and, simultaneously, thwart Arsenal’s last chance of staying in Premiership contention (which is not to say I’m celebrating Manchester United’s victory).

I returned to my Heidegger post only to find a trackback from Stavros responding in his usual forthright fashion to a New York Times article (link via Language Hat) about one “William C. Hannas, ‘a linguist who speaks 12 languages and works as a senior officer at the Foreign Broadcast Information Service,’ author of a newly released book which claims that Asian science has suffered because the main Asian languages are written in ‘character-based rather than alphabetic’ systems.” Stavros adds:

Not to get off on a rant here, but : in and of itself, this seems to me to be the most vile form of egregiously wrongheaded bullshit, and I suspect Mr Hannas is precisely the sort of person that I’d take great pleasure in pummelling until he whimpered like a frightened infant (a reaction that may reveal to some extent why I left academia many years ago, having dipped no more than a toe in its calm waters). But that’s not the thing that bothered me.

The article states, presumably parrotting Mr Dipshit, that “Western specialists are better informed today […and] now recognize that the writing systems of East Asia, including Chinese, Japanese and Korean, are “syllabaries,” in which each character corresponds to a syllable of sound.”

Now, I can’t speak for written Japanese (for which I think this may in part be true, depending on which way of writing the language one chooses - Jonathon may be the better person in the immediate neighbourhood to address that), and I’m only semi-certain it is true as far as my knowledge goes for Chinese, but this is completely and laughably wrong in the case of Korean.

Stavros is correct in saying that in Japanese this may be partly true, depending on which way of writing the language one chooses. Japanese can be written using the hiragana and katakana syllabaries—in which each character corresponds to a syllable of sound—but the only Japanese who regularly do so are kindergartners. By the end of their first year of elementary school, Japanese children are expected to have memorized and be using 80 kanji characters, many of which have multisyllabic pronunciations, such as:

migi (right), hidari (left), ame (rain), hana (flower), yasu (rest), sora (sky), tsuki/getsu (moon), yama (mountain), ito (thread), onna (woman), shita (below), ue (above), mori (wood/grove), mizu/sui (water), ao (blue), ishi (stone), aka (red), kawa (river), mura (village), shiro/haku (white).
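A quick way to check the point: a small Python sketch that counts how many kana it takes to spell the native (kun) reading of a few of the first-grade kanji listed above. The hiragana spellings are the standard dictionary readings; counting kana actually gives morae rather than strict syllables (おんな, onna, is three morae), so treat the numbers as an approximation of “more than one syllable.”

```python
# Native (kun) readings, in hiragana, for a few first-grade kanji from the list above
READINGS = {
    "右": "みぎ",    # migi (right)
    "左": "ひだり",  # hidari (left)
    "雨": "あめ",    # ame (rain)
    "花": "はな",    # hana (flower)
    "山": "やま",    # yama (mountain)
    "女": "おんな",  # onna (woman)
    "水": "みず",    # mizu (water)
    "川": "かわ",    # kawa (river)
}


def kana_per_kanji(readings: dict[str, str]) -> dict[str, int]:
    """Return how many kana (roughly, morae) each kanji's reading needs."""
    return {kanji: len(kana) for kanji, kana in readings.items()}


for kanji, n in kana_per_kanji(READINGS).items():
    print(kanji, READINGS[kanji], n)
```

Every entry needs at least two kana, which is exactly what a “one character, one syllable” description leaves out.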

By the end of elementary school, Japanese twelve-year-olds will be using 1,006 kanji characters, hundreds of which have multisyllabic pronunciations.

In other words, as far as Japanese is concerned, the assertion that the language is based on characters corresponding to a syllable of sound is utter nonsense. Unless you’re referring to five year olds—but then there aren’t too many five year olds of any nationality winning Nobel prizes.

And, since Stavros went to the trouble of rendering “a rude bit of English, sloppily and phonetically… into the Hangul alphabet in 5 letters and two syllables for Mr Hannas, sounding something like ‘puhk kyu!’”, here’s the equivalent in Japanese:

Katakana: fakku-yuu

In romanized form, that’s “fakku-yuu!” Naturally, since this is an English loan expression, I’ve used the katakana phonetic syllabary.


Wednesday 07 May 2003

Hannas revisited

Mark Griffith from pushed back against my post about William C. Hannas, the “Master Linguist” in such a pleasant and well-informed way that his comment deserves reproducing:

Mind you (trying to sound most demurring and non-confrontational here!) isn’t it the case that Japanese newspapers and books are crammed full of hiragana and katakana alongside the imported (and very occasionally home-grown) Chinese kanji?

The last time I did a rough count, something approaching sixty per cent of the ink marks on a Japanese newspaper page were from one of the two syllabaries. And Japanese dictionaries spell kanji for users using hiragana (most of which users I assume are above kindergarten age). Of course most kanji are multi-syllable, but a small number are also one-syllable in their spoken form.

Indeed Japanese text is crammed full of (monosyllabic) hiragana and katakana alongside the Chinese kanji characters, as can be seen in this introduction to the career of the actress Hara Setsuko, star of films by Ozu, Kurosawa, and Naruse (the hiragana appear in blue, katakana in red, kanji in black).

Introduction to Hara Setsuko's career in Japanese

And Mark’s estimate that syllabic hiragana and katakana characters outnumber the kanji is also correct, as Jack Halpern makes clear in the introduction to his New Japanese-English Character Dictionary:

A running Japanese text consists of a mixture of kanji and kana, with the latter normally outnumbering the former.

In the example, hiragana are used for particles such as wa, ga, nado, no, ni, mo as well as for verb endings such as -shita, -rete, and -shite imasu. Katakana are used for loan words such as terebi (television).
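Mark’s rough count can be reproduced mechanically. Here is a minimal Python sketch, assuming it is good enough to classify characters by their main Unicode blocks (Hiragana U+3040 to U+309F, Katakana U+30A0 to U+30FF, CJK Unified Ideographs U+4E00 to U+9FFF); it ignores punctuation, rarer kanji in the extension blocks, and so on. The sample sentence (“I watch television”) is my own illustration, not taken from the Hara Setsuko text.

```python
from collections import Counter


def script_of(ch: str) -> str:
    """Classify a character by Unicode block (main blocks only)."""
    cp = ord(ch)
    if 0x3040 <= cp <= 0x309F:
        return "hiragana"
    if 0x30A0 <= cp <= 0x30FF:
        return "katakana"
    if 0x4E00 <= cp <= 0x9FFF:
        return "kanji"
    return "other"


def script_counts(text: str) -> dict[str, int]:
    """Count kana and kanji in text, dropping everything else."""
    counts = Counter(script_of(ch) for ch in text)
    counts.pop("other", None)
    return dict(counts)


sample = "私はテレビを見ます"  # watashi wa terebi o mimasu
print(script_counts(sample))  # {'kanji': 2, 'hiragana': 4, 'katakana': 3}
```

Even in this short sentence the kana outnumber the kanji seven to two, in line with Halpern’s observation.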

However, it’s important to note that, since Japanese text is written without spaces, the process of reading involves skipping from one set of kanji characters to another. As Halpern explains:

Hiragana characters serve as natural borderlines that help the reader segment the text into meaningful units. For this reason, a Japanese text is easier to read than a running Chinese text, which consists of Chinese characters only.
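Halpern’s point about hiragana acting as borderlines can be illustrated crudely by breaking a sentence wherever the script changes. This Python sketch is only a toy heuristic of my own, not how real Japanese segmenters work (tools such as MeCab rely on dictionaries and grammar); note, for instance, that it splits the verb 見ます into its kanji stem and hiragana ending.

```python
from itertools import groupby


def script_of(ch: str) -> str:
    """Classify a character by Unicode block (main blocks only)."""
    cp = ord(ch)
    if 0x3040 <= cp <= 0x309F:
        return "hiragana"
    if 0x30A0 <= cp <= 0x30FF:
        return "katakana"
    if 0x4E00 <= cp <= 0x9FFF:
        return "kanji"
    return "other"


def split_on_script_changes(text: str) -> list[str]:
    """Group consecutive characters that belong to the same script."""
    return ["".join(run) for _, run in groupby(text, key=script_of)]


print(split_on_script_changes("私はテレビを見ます"))
# ['私', 'は', 'テレビ', 'を', '見', 'ます']
```

Even this naive rule isolates the particles は and を and the loan word テレビ, which is roughly what the eye does when skipping from one cluster of kanji to the next.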

It’s also important to stress that much of the “meaning” of the text comes from the Chinese characters (in the same way that one could get the gist of an English text even if the prepositions, pronouns, and verb endings were missing).

And while it is true that “Japanese dictionaries spell kanji for users using hiragana”, that only goes to show that Japanese writing is not fundamentally phonetic, since the hiragana are provided to give the pronunciation, not the meaning: a Japanese reader can often look at a forgotten or unfamiliar kanji and figure out its meaning without necessarily knowing its pronunciation. To be fair, the form of the kanji often suggests possible pronunciations, but that’s very different from the phonetic (“each character corresponds to a syllable of sound”) definition of Japanese that Mr Hannas suggests.

So isn’t it a bit steep to say it’s utter nonsense to claim Japanese uses characters representing syllables, and that only kindergarten children spell with kana? Sounds to me a rather reasonable simplification to introduce new readers to a tricky language with a fascinating hybrid script.

Not really, since only kindergarten children spell exclusively with kana whereas Hannas quite specifically (and wrongly) states that “the writing systems of East Asia, including Chinese, Japanese and Korean, are ‘syllabaries,’ in which each character corresponds to a syllable of sound.” That is only true for the writing system used by kindergarten students. As soon as they start elementary school, Japanese children are rapidly introduced to multisyllabic Chinese characters.

Is it Hannas’ main argument you don’t like? The bit about precision overriding innovation? That’s obviously a big claim of his, but the syllable component of Japanese seems slightly more than utter nonsense to me.

No, I don’t necessarily disagree with Hannas’ contention that East Asia has failed to make significant scientific and technological breakthroughs compared to Western nations (although, to be honest, there is considerable disagreement about whether or not this is actually true). But I think he’s absolutely wrong to blame the writing systems of China, Japan and Korea for that. (Does he also blame the Thai and Vietnamese writing systems for those countries having failed to make “significant scientific and technological breakthroughs”?)

To the contrary, I strongly believe that one of the reasons for Japan’s rapid and successful industrialization after the Meiji Restoration in the mid-nineteenth century is that their uniquely flexible writing system—coupled with a historical willingness to accept ideas from abroad—allowed the Japanese to easily import, comprehend, and put to use an astonishing range of Western cultural, political, aesthetic, and technological ideas.

I have an alternative (language-based) theory for why Japanese science and technology may be less innovative than their Western counterparts, but that will have to be the subject of another post.


Trevor Hill at Glome offers an absolutely first-rate rejoinder to Hannas’ nonsensical assertions about East Asian languages, including a persuasive argument that Chinese characters actually facilitate abstract thinking. (I’d actually been planning to write a post along the same lines, based on some ideas in the Halpern essay I cited above, but Trevor has done such a great job that I may not bother.)


Saturday 10 May 2003

Enabling CJK language support

Following the lead of Trevor Hill at Glome, Stavros has posted two entries that include Korean characters: This Is a Test of Korean and Seeing Asian Characters. The screenshot below shows how the Korean characters appear as question marks without Korean Language Support enabled (in Windows 2000):

Korean characters appear as question marks without Korean language support

With Korean Language Support enabled (instructions here), the Korean is rendered properly:

Korean characters appear correctly with Korean language support installed

Having enabled Korean, I also installed UniPad, a Unicode text editor that I expected would allow me to enter Korean text—I was hoping to eventually impress Stavros by displaying a Korean sentence in a weblog entry. No such luck. I just couldn’t figure out how to get the individual Hangul components (Jamo?) into a single syllable. (The UniPad Help indicates this isn’t supported so I tried Word 2000 and failed. Yet Stavros says he can do it with Microsoft’s execrable Notepad, which I also used without success.) So my Korean text-entry career is stalled. (Well, to be honest, it was stalled before I turned the key in the ignition, since the sum total of my Korean knowledge is what I’ve managed to glean from the introduction to the Berlitz Korean for Travellers phrasebook.)
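For what it’s worth, the step I was stuck on is normally handled by an input method editor, but the arithmetic underneath it is spelled out in the Unicode Standard: each precomposed Hangul syllable (U+AC00 to U+D7A3) is computed from the index of its leading consonant, its vowel, and an optional final consonant. A minimal Python sketch, assuming the jamo come from the conjoining Hangul Jamo block (U+1100 onward) rather than the standalone “compatibility” letters such as ㄱ and ㅏ; it is not a description of how UniPad, Word, or Notepad work internally.

```python
import unicodedata

# Constants from the Hangul syllable composition algorithm in the Unicode Standard
S_BASE = 0xAC00  # 가, the first precomposed syllable
L_BASE = 0x1100  # first leading consonant (choseong) jamo
V_BASE = 0x1161  # first vowel (jungseong) jamo
T_BASE = 0x11A7  # one before the first trailing consonant (jongseong) jamo
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28


def compose_syllable(lead: str, vowel: str, trail: str = "") -> str:
    """Combine conjoining jamo into a single precomposed Hangul syllable."""
    l_index = ord(lead) - L_BASE
    v_index = ord(vowel) - V_BASE
    t_index = ord(trail) - T_BASE if trail else 0  # 0 means "no final consonant"
    trail_ok = 0 < t_index < T_COUNT if trail else True
    if not (0 <= l_index < L_COUNT and 0 <= v_index < V_COUNT and trail_ok):
        raise ValueError("not a valid lead/vowel/trail jamo sequence")
    return chr(S_BASE + (l_index * V_COUNT + v_index) * T_COUNT + t_index)


# ᄒ + ᅡ + ᆫ gives 한 (han)
han = compose_syllable("\u1112", "\u1161", "\u11AB")
print(han, hex(ord(han)))  # 한 0xd55c
# Unicode normalization (NFC) performs the same composition
print(unicodedata.normalize("NFC", "\u1112\u1161\u11AB") == han)  # True
```

So three jamo (ㅎ, ㅏ, ㄴ) become one code point, which is presumably what Stavros’s editor was doing for him behind the scenes.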

In any case, enabling Korean (or Japanese) in the OS only gets me halfway there. If I’m to display CJK in my weblog posts, I also have to change my character encoding from charset=iso-8859-1 to charset=UTF-8. It seems there are three ways to do this:

  1. Hard code the character encoding sent in the HTTP headers as charset=UTF-8 in each Movable Type template.
  2. Change the character encoding by modifying the “Preferred Language” in my MT user profile (not an option because only “US English” is available).
  3. Set the PublishCharset flag in mt.cfg to UTF-8.

Since Stavros didn’t specify which procedure he adopted, any suggestions will be gratefully received (and adopted).
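Why does the charset matter so much? A rough Python illustration, assuming the question marks come from text being converted to ISO-8859-1 somewhere between typing and publishing: Latin-1 has no slots at all for kana, kanji, or Hangul, so an encoder told to “replace” what it can’t represent substitutes “?”, whereas UTF-8 can represent every Unicode character (three bytes each for these). Whether Movable Type itself behaves exactly this way is an assumption on my part; depending on where things go wrong, a browser may show boxes or garbled characters instead.

```python
def encode_page_text(text: str, charset: str) -> bytes:
    """Encode text for publishing; characters the charset lacks become '?'."""
    return text.encode(charset, errors="replace")


sample = "日本語"  # nihongo, "Japanese language"

latin1 = encode_page_text(sample, "iso-8859-1")
utf8 = encode_page_text(sample, "utf-8")

print(latin1)  # b'???'  (the characters are gone for good)
print(len(utf8), utf8.decode("utf-8") == sample)  # 9 True
```

The important detail is that the Latin-1 loss is irreversible: once the question marks are written, no later change of encoding can bring the characters back.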

Something else that puzzled me is that although I couldn’t originally read the Korean characters in Trevor Hill’s post (this was when I had Japanese enabled in Windows 2000 but not Korean), the Chinese and Japanese characters appeared to render correctly. I say “appeared” because even though the Japanese characters are correct, I’m only guessing that the Chinese characters are also correct (the same three characters appear in both the Chinese and Japanese words for SARS).

Explanation of how diseases such as diabetes and SARS are rendered using Chinese characters in both the Chinese and Japanese languages

This would seem to indicate that identical Chinese and Japanese characters share the same Unicode code points, which is not what I’d have expected. It could be that if I enable Chinese in Windows 2000, the Chinese characters for SARS will be different. Which leads me to another question: do I need to enable both Traditional and Simplified Chinese? I assume I do, since Traditional Chinese is used in Taiwan and Hong Kong while Simplified Chinese is used in mainland China, though naturally I’m curious as to which Trevor Hill has used.
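What I was seeing is what Unicode calls Han unification: a character shared by Chinese and Japanese is normally encoded once, with a single code point, and any difference in appearance is left to fonts (in HTML, a lang attribute such as lang="ja" or lang="zh" can help a browser pick an appropriate font). A small Python sketch that prints the code point and official Unicode name of each character in 糖尿病 (“diabetes”, written identically in both languages) makes this visible: nothing in the encoding marks the text as Chinese or Japanese.

```python
import unicodedata


def describe(text: str) -> list[str]:
    """List each character's code point and official Unicode name."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


# tángniàobìng in Chinese, tōnyōbyō in Japanese: same characters, same code points
for line in describe("糖尿病"):
    print(line)

print(describe("水"))  # ['U+6C34 CJK UNIFIED IDEOGRAPH-6C34']
```

Every name comes back as CJK UNIFIED IDEOGRAPH followed by the code point, which is why the Chinese characters in Trevor’s post displayed correctly with only Japanese support enabled.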

Finally, I should confess to having had some misgivings about this entire CJK enterprise. Even though it’s a pain to have to create images of Japanese text in Photoshop every time I want to include Japanese characters in a weblog entry, at least everyone can see the characters whether or not they have Japanese enabled in their OS. If I start to encode Japanese text in my entries, only visitors with Japanese support enabled will be able to see the Japanese characters. Everyone else would see the Japanese text as question marks or hollow boxes. This struck me as a significant problem, since I didn’t want to force visitors to enable Japanese in their browser or OS.

On the other hand, never in a million years would I render English text as an image since this causes major accessibility problems:

  • Text in an image can’t be resized.
  • The ALT text would have to replicate the text in the image.
  • Text in an image can’t be indexed by Google.

Therefore, if I wouldn’t use “image text” in English, why should I use it for Japanese (or any other language)?

Accordingly, I’ve come to the conclusion that it’s actually preferable to render Japanese text properly, using Unicode/UTF-8 encoding. Anyone who is sufficiently interested in seeing the Japanese characters can enable Japanese support in their OS (Windows, Macintosh, or Linux). Everyone else can tolerate the question marks or hollow boxes or skip that entry.

It occurs to me—and I’m sure that I’m not the first to come to this conclusion—that the best way to popularize Unicode and to celebrate the intrinsic beauty of language is to write our weblog posts not just in English but in any other language we understand and love. So thanks to Trevor and Stavros (apologies to anyone else I’ve missed) for leading the way.


Sunday 11 May 2003

This is a test of Japanese

I’ve implemented Trevor Hill’s Movable Type modifications (explained by Stavros in his comment on my previous post) by:

  • Turning on the PublishCharset UTF-8 and NoHTMLEntities 1 configuration settings in mt.cfg.
  • Ensuring that the character encoding in each of my templates is set to use MTPublishCharset rather than a hard-coded charset i.e. <meta http-equiv="Content-Type" content="text/html; charset=<$MTPublishCharset$>" />
  • Modifying the send_http_header code in lib/MT/ (using Trevor’s code sample from his post in the MT forum).
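Trevor’s actual change is Perl inside Movable Type, and I won’t guess at its details; but the point of all three steps is that the HTTP Content-Type header, the meta tag in the page, and the bytes actually sent all agree on UTF-8. Here is a minimal Python WSGI sketch of that agreement, in which PUBLISH_CHARSET is a hypothetical stand-in for the PublishCharset setting and the app itself is purely illustrative.

```python
PUBLISH_CHARSET = "UTF-8"  # hypothetical stand-in for PublishCharset in mt.cfg


def app(environ, start_response):
    """Minimal WSGI app whose header, meta tag, and body bytes all agree on one charset."""
    html = (
        "<html><head>"
        f'<meta http-equiv="Content-Type" content="text/html; charset={PUBLISH_CHARSET}" />'
        "</head><body><p>日本語</p></body></html>"
    )
    body = html.encode(PUBLISH_CHARSET)  # encode with the charset the header declares
    start_response("200 OK", [
        ("Content-Type", f"text/html; charset={PUBLISH_CHARSET}"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


# Call the app directly (no server needed) and inspect what it would send
captured = {}


def fake_start_response(status, headers):
    captured["status"] = status
    captured["headers"] = dict(headers)


body = b"".join(app({}, fake_start_response))
print(captured["headers"]["Content-Type"])  # text/html; charset=UTF-8
```

To view it in a browser, the standard library’s wsgiref.simple_server.make_server("localhost", 8000, app).serve_forever() would serve it locally.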

So now I should be good to go. Here’s the obligatory test post, with Japanese characters (I’m following John’s example and using a proverb). If you have Japanese support enabled in your OS and can see the characters (or not), please leave a comment. (Please note that you don’t have to go through the above rigmarole just to see the characters.)


Language Hat made the excellent suggestion that for visitors who can’t or don’t want to enable CJK support, “it would be good policy to always accompany [Asian text] with transliterations.” (I think I would probably have done that but it’s good to have it as a formal policy.)

So, the transliterated Japanese is San nin yoreba monju no chie, which means “three people together have the wisdom of a Buddha”; or as we would say in English, “two heads are better than one”. A related proverb plays on the fact that the Chinese character for kashimashii (“noisy, clamorous”) is made up of three small versions of the character for “woman”:


Or Onna san nin yoreba kashimashii (“where three women gather, there is a noisy clamor”). As Kittredge Cherry points out in her book Womansword: What Japanese Words Say About Women:

Of all the characters imported from China, [kashimashii] is almost always the first example that springs to mind when linguistic sex discrimination is discussed. Three women add up to a sin worse than noise when the same character is pronounced kan. This spells wickedness or mischief, and it can be stretched into the verb form kansuru, meaning to seduce, assault, or rape. The hidden corollary to the kashimashii character is that a trio of men getting together is nothing remarkable. There is no character composed of three male ideograms. In fact, the male symbol almost never appears as a component of other characters.

Other words reinforce the concept that women can cause a hubbub. In old Japan, the most likely spot for women to gather was beside the well (idobata) where they drew water and washed clothes, so the term “well-side conference” (idobata kaigi) is still used to describe a group of gossiping women. The word for chatterbox (oshaberi), which literally means “honorable talker,” is almost always used to describe—or put down—a woman. Gossip is considered something women do, while there are few similarly derogatory terms for men who babble about trivial topics.

Permalink | Comments (27)

Monday 12 May 2003

Unicode rocks

The feedback on my previous post, This is a test of Japanese, indicates that our East Asian languages experiment is proving to be surprisingly successful. The number of visitors—mainly on Mac OS X and Linux/Unix systems—who could see the Japanese characters without making any modifications to their system surpassed my expectations.

A couple of visitors reported success with Windows 98 and Windows XP, though Phil Ringnalda suffered a typically Microsoftian experience:

You two are killing me. I know how to enable CJK (I assume someone will be along with Chinese, anyway) characters, just fine. Then Windows says “show me your Windows CD”, and I say “how about this piece of crap ‘recovery disk’ that’s all I got instead?” Time to toss this laptop for something better. How’s OS X’s support for Japanese?

From the feedback so far, Phil, OS X’s support for Japanese is excellent. Looks like you might be another step closer to buying a PowerBook. If you decide on the 12-inch model though, make sure you buy a pair of asbestos gloves—it seems that cute little sucker runs hot.

Kurt Easterwood came up with a great piece of advice:

I wonder if it might not be a good idea to point users to how to install the Japanese IME from Microsoft. This page has good instructions and download links. (It’s from the same site Stavros linked to for the Korean IME.)

The page in question, Declan’s Guide to Installing and Using Microsoft’s Japanese IME, is a “comprehensive guide to installing and using the Microsoft Japanese IME for Window95/98/ME, Windows 2000 and Windows XP. The IME allows users of non-Japanese versions of Windows to read and enter Japanese hiragana, katakana and kanji scripts in IME enabled applications.” I’ll write a post with links to these and other East Asian language resources and link to it from my sidebar.

There were two reports that the Japanese characters didn’t appear in my RSS feed. That’s the fault of Burningbird’s Evil Twin who cast a spell that made me forget to change the character encoding from iso-8859-1 to UTF-8 in my RSS 1.0 and 2.0 templates. My apologies—it’s fixed. I’ve just downloaded and installed FeedReader and both RSS feeds look fine. (I thought about using the highly regarded SharpReader, but I’m trying to avoid the .NET framework for a little while longer.) Hopefully the Japanese characters are also displaying properly in other newsreaders.
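A newsreader decodes a feed using the encoding named in its XML declaration, so UTF-8 text under an `encoding="iso-8859-1"` declaration comes out garbled. Here is a minimal sketch using Python’s standard ElementTree parser. The two-element RSS 2.0 document built by the hypothetical `rss_bytes` helper stands in for a real MT template.

```python
import xml.etree.ElementTree as ET

TITLE = "三人寄れば文殊の知恵"  # the proverb from the test post

def rss_bytes(declared_encoding):
    """Hypothetical minimal RSS 2.0 feed: the body is always written as UTF-8,
    only the encoding named in the XML declaration changes."""
    doc = ('<?xml version="1.0" encoding="%s"?>'
           '<rss version="2.0"><channel><title>%s</title></channel></rss>'
           % (declared_encoding, TITLE))
    return doc.encode("utf-8")

def feed_title(data):
    # The parser decodes the raw bytes with the encoding the declaration names.
    return ET.fromstring(data).find("channel/title").text

print(feed_title(rss_bytes("UTF-8")) == TITLE)       # True
print(feed_title(rss_bytes("iso-8859-1")) == TITLE)  # False: mojibake
```

Changing only the declaration in the template, as described above, is enough to fix the feed, because the bytes MT writes were already UTF-8.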

I noticed one vaguely interesting glitch when I looked at the post in Mozilla 1.3.1 on my RedHat 7.2 installation: the period at the end of the sentence is near the top of the block of characters rather than near the baseline.

Mozilla 1.3.1/RedHat Linux 7.2: Japanese characters as seen in Mozilla on Linux
Mozilla 1.3.1/Windows 2000: Japanese characters as seen in Mozilla on Windows
IE 6/Windows 2000: Japanese characters as seen in IE on Windows

The IE Windows text also looks smoother than its Mozilla equivalent. Now I’m wondering whether it might be useful to include a font-family declaration in my stylesheet—though to do that I’ll have to find out the names of the Japanese fonts on Mac OS X, Linux, and BSD Unix. And I’d really rather write Japanese-related entries than continue fussing with the technicalities of East Asian typography, particularly now that Stavros has made such an impressive start to his long-promised review of Hangul, the Korean writing system.

Korean is a subject-object-verb language, for example, and has a rich system of postpositional case markers. Chinese, a subject-verb-object language, does not. Korean has a complicated system of honorifics, part of which is expressed as verb endings. Chinese does not, and doesn’t have any characters to represent these verb-ending morphemes.

I hadn’t realized that Korean and Japanese were so similar structurally: like Korean, Japanese is a subject-object-verb language with postpositional case markers and a system of honorifics. In one of my Japanese grammar books, Senko K. Maynard’s An Introduction to Japanese Grammar and Communication Strategies, it says that “Japanese is suggested to be distantly related to Korean, and therefore to the Altaic languages (among them, Mongolian and Turkish).” I’m looking forward to seeing how Stavros’ series unfolds and am hoping he’ll cover how to use the Korean IME to write Korean sentences (there’s one I’m dying to include in a post).

Permalink | Comments (5)

Wednesday 30 July 2003

Japanese Text


If the characters above look like the ones in this illustration

Japanese text: kore wa nihongo no tekisuto desu. yomemasu ka? (This is Japanese text. Can you read it?)

then you have Japanese support installed or enabled in your OS. If not, and you’d like to be able to see the Japanese characters, you’ll need to install or enable Japanese support.

Jim Breen, coordinator of the EDICT Project (Japanese-English dictionary), has a helpful Japanese page, with lots of information about his various dictionary projects and Japanese computing (from which I’ve extracted some of the following links).

Note that the processes for enabling Chinese and/or Korean support are similar to those described for Japanese.

Reading Japanese text on a Windows PC

If you are using a non-East Asian version of Windows, the procedure for setting up your PC to read and write Japanese depends on the flavor of Windows you are running.

Users of Windows 95/98/ME must install the Microsoft Global IME (Input Method Editor) and associated fonts. Users of Windows 2000 and Windows XP Pro do not have to download the IME or fonts, because they ship with the operating system, but they will still need to install them, since they are not installed by default.

The most complete set of instructions is available at:

Side-by-side instructions for Windows XP Pro and Windows 2000 are available at:

Reading Japanese text on a Macintosh

Although I started using Japanese on the Macintosh in the late eighties, I haven’t used a Macintosh for a long time. These instructions will probably be enough to get you going:

Reading Japanese text on a Linux system

Japanese just seemed to work automatically when I installed Red Hat Linux. Information about Japanese support in SuSE Linux, which—according to Jim Breen—applies to other Linux distributions too, is available here:

Creating CJK content in Movable Type

Publishing Movable Type posts containing CJK characters is a little more complicated. There is some background material in Trevor Hill’s post Asian Languages… and mine on Enabling CJK Language Support (particularly the comments). In summary, you will need to:

  1. Modify your mt.cfg file so that MT does not use the Perl module HTML::Entities to encode characters into HTML entities.
  2. Modify mt.cfg to override the default character encoding (which is based on your “Preferred Language” setting).
  3. Modify the send_http_header subroutine in lib/MT/ as suggested by Trevor Hill in the MT forum.
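The two mt.cfg changes are plain line edits. If you would rather script them than edit by hand, here is a minimal Python sketch. `enable_cjk` is a hypothetical helper, not part of MT, and it assumes the commented-out sample lines in mt.cfg look exactly like the ones quoted in the step-by-step list. Check the result by eye before saving it over your real mt.cfg.

```python
import re

# Hypothetical patterns: assume sample lines such as "# NoHTMLEntities 1"
# and "# PublishCharset Shift_JIS". [ \t] is used instead of \s so a match
# never runs across a line break.
NO_ENTITIES = re.compile(r"^#[ \t]*NoHTMLEntities[ \t]+1[ \t]*$", re.MULTILINE)
PUBLISH_CHARSET = re.compile(r"^#?[ \t]*PublishCharset[ \t]+\S+[ \t]*$", re.MULTILINE)

def enable_cjk(cfg_text):
    """Return the text of mt.cfg with the two CJK-related settings applied."""
    # Uncomment NoHTMLEntities so MT stops converting characters to HTML entities.
    cfg_text = NO_ENTITIES.sub("NoHTMLEntities 1", cfg_text)
    # Uncomment PublishCharset (whatever the sample value was) and publish as UTF-8.
    return PUBLISH_CHARSET.sub("PublishCharset UTF-8", cfg_text)

sample = "# NoHTMLEntities 1\n# PublishCharset Shift_JIS\n"
print(enable_cjk(sample), end="")
```

Running it twice leaves the file unchanged, so it is safe to re-run after an upgrade restores the default mt.cfg.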

Specifically, you will need to:

  1. Find the line in mt.cfg that says
    # NoHTMLEntities 1
    and remove the # so that it reads
    NoHTMLEntities 1
  2. Find the line in mt.cfg that says
    # PublishCharset Shift_JIS
    and modify it to read
    PublishCharset UTF-8
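  3. Find the sub send_http_header code block in lib/MT/ and replace it with:
    sub send_http_header {
        my $app = shift;
        my($type) = @_;
        # Default to UTF-8 HTML when no content type is passed in
        $type ||= 'text/html; charset=utf-8';
        # MT's original charset handling, disabled by this patch:
        # if (my $charset = $app->{charset}) {
        #     $type .= "; charset=$charset"
        #         if $type =~ m!^text/! && $type !~ /\bcharset\b/;
        # }
        if ($ENV{MOD_PERL}) {
            # mod_perl: leave MT's original code for this branch in place
        } else {
            # CGI: hand the content type to the CGI query object, which prints the header
            $app->{cgi_headers}{-type} = $type;
            print $app->{query}->header(%{ $app->{cgi_headers} });
        }
    }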
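The commented-out lines and Trevor’s replacement default behave differently. Here is a rough Python model of both; the function names are illustrative, not MT’s, and the Perl regular expressions are translated directly. The original code appends the app’s configured charset to any `text/` type that doesn’t already name one, while the patched version simply falls back to UTF-8 HTML when no type is given.

```python
import re

def original_header(content_type, charset):
    """Illustrative model of the lines the patch comments out."""
    # Perl: $type .= "; charset=$charset"
    #           if $type =~ m!^text/! && $type !~ /\bcharset\b/;
    needs_charset = (charset and content_type.startswith("text/")
                     and not re.search(r"\bcharset\b", content_type))
    if needs_charset:
        content_type += "; charset=%s" % charset
    return content_type

def patched_header(content_type=None):
    """Illustrative model of Trevor's default: $type ||= 'text/html; charset=utf-8'."""
    return content_type or "text/html; charset=utf-8"

print(original_header("text/html", "Shift_JIS"))  # text/html; charset=Shift_JIS
print(original_header("image/gif", "Shift_JIS"))  # image/gif
print(patched_header())                           # text/html; charset=utf-8
```

Note that with the patch an explicit type such as `text/plain` is passed through without any charset at all.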

After that, all you have to do is generate CJK text in a Unicode-compliant application and paste it into MT’s Entry field.

Permalink | Comments (15)

Friday 08 August 2003

True person table performance

Though I haven’t been to a strip club in a long time, I’ve been watching movies about stripping lately: Dancing at the Blue Iguana, shot by Michael Radford from a script “improvised” by the actors, with a stunning performance by Sandra Oh; Strip Notes, Daryl Hannah’s shapeless “video journal,” based on her research for the character she played in Blue Iguana; and Live Nude Girls Unite, a sharp, funny documentary by Julia Query and Vicky Funari about Ms Query’s campaign to unionize the dancers at the Lusty Lady in San Francisco.

Strip club neon sign, saying Live Show in English and Live Performance in Chinese

Late last week, on my way to visit a friend at St Luke’s hospital near Kings Cross, I noticed this neon sign at the entrance of a Darlinghurst Road strip club and was immediately curious about the Chinese characters (the sign on the other side of the entrance was in Korean and Japanese, the latter in katakana saying raibu shoh).

zhenren biaoyan, live performance

I’m not good at recognizing stylized characters but the second character is easy—hito (person)—and I was sure I’d quickly figure out the rest from one or other of my kanji dictionaries. After an hour or so of frustration I gave up and emailed Trevor Hill, sending him the photo on the left.

Trevor’s reply arrived promptly. “It’s zhenren biaoyan,” he told me, “or ‘live performance/show,’ just like the English to the left.”

I’d guessed that there were two compounds, but I’d never have got them in a million years because, although the characters would be read shinjin hyouen in Japanese, there are no such words in Japanese. The four characters are (in Japanese):

  •  makoto (true)
  •  hito (person)
  •  omote (table, surface)
  •  enjiru (to perform)

I got stuck on makoto because in the neon sign it has nine strokes but in Japanese it has ten. The bottom two characters had me totally baffled, though they were immediately obvious once Trevor had given me the answer. I guess the moral of the story is that I should take a break from my Japanese texts and spend more time walking around Chinatown.

[If the characters in the bulleted list above don’t appear correctly, you might want to enable Japanese support in your OS. Here’s how to do it.]

Permalink | Comments (16)

Wednesday 13 August 2003

Thunderbird is go!

Once I discovered the wonder of Unicode, I realized I needed a new email client. Eudora is my everyday email client but it’s not Unicode-aware, so for the past few years I’ve been using Rimarts Becky! to send and read Japanese email. I had no trouble creating and reading Chinese messages in Becky! (not that I understand Chinese, but I’ve developed an interest in how kanji are written differently in Chinese and Japanese). However, despite my best efforts, I’ve had no success with Korean (not that I understand Korean either, but I suspect the WonderChicken will come up with a reason). So I went hunting for another email client. A Google search on “unicode email client” yielded a Multilingual Browsers & Email Clients page, with recommendations for Outlook Express, Netscape, Mozilla, and Opera plus several standalone email clients:

  • Scribe
  • TabMail
  • The Bat
  • Becky!
  • LingoMAIL

I’d rather give up email than use Outlook Express and I don’t like browser-based email clients. I gave Scribe a spin but couldn’t even get it to work with Japanese, let alone Chinese or Korean.

Then I recalled Phil Ringnalda saying something about Thunderbird. Who in our neighborhood, apart from Phil, wouldn’t be wary about installing a 0.1 version of an application? But I dived in and, fifteen minutes later, had sent and received a series of test messages in Chinese, Japanese, and Korean. And ten of those fifteen minutes were spent locating my Berlitz Korean for Travellers Phrase Book and then figuring out how to get the Korean IME to work.

I’m used to simply typing romaji to enter Japanese (and it took ten seconds or so to suss out pinyin) so I thought I’d be able to type ch’an maek⋅chu⋅rŭl chu⋅se⋅yo (“I’d like a cold beer, please”) on my English keyboard—just as I’d type bīru o itadakitai’n desu ga in romaji—and that the IME would convert the hanglish to Hangul. But the only way I could enter Korean was by referring to this keyboard map. Maybe someone can tell me where I’m going astray.
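Whichever keys an IME expects, it ends up combining individual jamo (consonant and vowel letters) into precomposed syllable blocks. The Unicode Standard defines that arithmetic, and here is a minimal Python sketch of it. The `compose` helper and its jamo indices are illustrative; this is not how any particular IME is implemented. It builds the first syllable of the phrase above, ch’an (찬).

```python
import unicodedata

# Constants from the Unicode Standard's Hangul syllable composition algorithm.
S_BASE = 0xAC00  # first precomposed syllable
V_COUNT = 21     # number of medial vowels
T_COUNT = 28     # number of final consonants, counting "no final" as index 0

def compose(lead, vowel, tail=0):
    """Combine jamo indices (lead 0-18, vowel 0-20, tail 0-27) into one syllable."""
    return chr(S_BASE + (lead * V_COUNT + vowel) * T_COUNT + tail)

chan = compose(14, 0, 4)     # ch (lead 14) + a (vowel 0) + n (tail 4)
print("U+%04X" % ord(chan))  # U+CC2C

# NFC normalization performs the same composition on the conjoining jamo
# U+110E (lead ch), U+1161 (vowel a) and U+11AB (tail n).
print(unicodedata.normalize("NFC", "\u110E\u1161\u11AB") == chan)  # True
```

Each syllable block is therefore a single code point, which is why a Korean message stays readable in any Unicode-aware client once it has been composed.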

Anyway, if you’re looking for a CJK email client, look no further. The 0.1 version of Thunderbird is better than anything else I’ve tried.

Permalink | Comments (16)

Saturday 16 August 2003

A quotation for all seasons

Victor Klemperer’s diaries: book covers

I started to read—at Language Hat’s suggestion—Victor Klemperer’s diaries, a remarkable account of the everyday life of a Jew living in Hitler’s Germany from 1933 to 1945.

In early February 1945, Klemperer was one of 198 registered Jews in Dresden, having thus far escaped being deported to Riga, Auschwitz, or Theresienstadt because, like all the remaining Jews in the city, he had a non-Jewish spouse. If his wife, Eva, had died or had divorced him, Klemperer’s name would have been instantly placed on the list for deportation. In fact, on Tuesday 13 February 1945, all physically fit Jews were ordered to report on the following Friday. Klemperer was headed for a death camp.

But on the night of Tuesday 13 February 1945, RAF Bomber Command launched a twin attack on Dresden: an initial raid, which marked the target area and set it alight, was followed by a much heavier raid three hours later, when the German fighter defence had run out of fuel and the firefighters and rescue workers were struggling to contain the fires that had already taken hold in the center of the city. The resulting firestorm was responsible for most of the estimated 35,000 fatalities. On the following two days the US Eighth Air Force launched further attacks on the beleaguered city and in the ensuing confusion Victor Klemperer and his wife fled across Germany for the next three months “until finally the village they had reached in southern Bavaria was overrun by American forces.”

A couple of months ago, in a post titled Provocation and Retribution, I wrote:

As I continue to read books and watch films about the persecution and extermination of the Jews and the annihilation of German civilians in the Allied bombing raids, it’s difficult not to imagine one as retribution for the other.

The cover photographs of the two volumes of Klemperer’s diaries illustrate this cause and effect relationship with great economy: firstly, the enforcement of a boycott against Jewish shops; and then, two women moving rubble in the ruins of Dresden’s Frauenkirche.

I’d read less than a hundred pages of the first volume before realizing that I know too little of the history of the Third Reich to understand many of Klemperer’s references. So I went back to a book I’d bought around the same time, Robert Gellately’s Backing Hitler. I’ve already quoted a conversation Klemperer had with two of his students who, despite being anti-Nazi, had no sympathy for two young women executed for allegedly spying for Poland:

They saw no fault in the procedures of the secret trial, nor were they troubled in the least that the accused had been denied essential legal rights.

Klemperer’s first diary entry is for 14 January 1933. Hitler became Chancellor on 30 January 1933. Two months later to the day, on 30 March, Klemperer writes:

Frau Dember related the case of the ill-treatment of a Communist prisoner which had leaked out: torture, with castor oil, beatings, fear—attempted suicide. Dr Salzburg’s second son, a medical student, has been arrested—letters from him had been found in the home of a Communist.

The same entry ends:

In a toyshop a children’s ball with a swastika.

Gellately describes the ease with which the German people relinquished their civil liberties:

Hitler’s appointment as Chancellor on 30 January 1933 was followed next day by the dissolution of the Reichstag. His slogan for the elections called for 5 March, “Attack on Marxism”, was bound to appeal to solid citizens and property owners. Hermann Göring, one of the few Nazis in Hitler’s Cabinet, took immediate steps to introduce emergency police measures. Over the next weeks the Nazis did not need to use the kind of massive violence associated with modern takeovers like the Russian Revolution. There was little or no organized opposition, and historian Golo Mann said of those times that “it was the feeling that Hitler was historically right which made a large part of the nation ignore the horrors of the Nazi takeover…. People were ready for it.” To the extent that terror was used, it was selective, and it was initially aimed mainly at Communists and other (loosely defined) opposition individuals who were portrayed as the “enemies of the people”.

By mid-February 1933, Göring had replaced numerous police chiefs throughout Prussia because they belonged to the Social Democratic party.

Reading about the tacit complicity of ordinary Germans in Hitler’s rise to power, one is inevitably reminded of Martin Niemöller’s warning about the consequences of capitulation in the face of tyranny:

First they came for the Communists, but I was not a Communist, so I said nothing.
Then they came for the Social Democrats, but I was not a Social Democrat, so I did nothing.
Then came the trade unionists, but I was not a trade unionist.
And then they came for the Jews, but I was not a Jew, so I did little.
Then when they came for me, there was no one left to stand up for me.

Until I went searching for the correct wording on the Web—at first I thought that Dietrich Bonhoeffer had made the famous statement—I wasn’t aware that this quotation has, in Gerry Cordon’s words, “a life of its own”, that there is no “master” version.

The version above—the one quoted by Gerry Cordon—mentions Communists, Social Democrats, trade unionists, Jews, and me (Niemöller himself), in that order. A similar version is cited by the Jewish Virtual Library, with the explanation that the “exact phrasing was supplied by Sibylle Sarah Niemöller von Sell, Martin Niemöller’s wife”.

But, as Gerry Cordon points out, different people “use the quotation to imply different meanings—even altering it to suit their purpose”:

  • When Time magazine used the quotation, they moved the Jews to the first place, added Roman Catholics, and dropped both the Communists and the Social Democrats—Jews, trade unionists, Catholics, me.
  • Former Vice-President Al Gore also added the Catholics, but dropped the trade unionists—Communists, Social Democrats, Catholics, Jews, me.
  • In the quotation inscribed on the Holocaust memorial in the heavily Catholic city of Boston, Catholics were added, Social Democrats removed, and Jews moved into second place—Communists, Jews, trade unionists, Catholics, me.
  • The US Holocaust Museum includes the Social Democrats but drops the Communists—Social Democrats, trade unionists, Jews, me.
  • omits the Social Democrats and moves the Jews to first place—Jews, Communists, trade unionists, me.
  • The version read into the Congressional Record by Congressman Henry Reuss of Wisconsin (14 October 1968, page 31636) omits the Communists, moves Jews to first place, and adds Catholics and industrialists—Jews, Catholics, unions, industrialists, me (and the Protestant church).

Harold Marcuse, a UCSB historian and author of Legacies of Dachau, has extensively researched the famous quotation—his Niemöller page addresses the questions:

  • What did Niemöller really say?
  • Which groups did he name?
  • In what order?

Martin Niemöller

Professor Marcuse describes Martin Niemöller as a Lutheran pastor in a wealthy Berlin suburb—someone who, at least until the mid-1930s, was “a typical Christian antisemite who openly professed his belief that the Jews had been punished through the ages because they had ‘brought the Christ of God to the cross.’” Initially a supporter of Hitler, he became an opponent of the Nazis when they started to interfere in church affairs. As a consequence of his outspoken sermons Niemöller was arrested in 1937 and sent to Sachsenhausen concentration camp, then moved in 1941 to Dachau where he was confined until the war’s end.

Marcuse suggests that the quotation arose from a visit by Niemöller and his wife to Dachau:

Shortly after the end of the war Niemöller became convinced that the German people had a collective responsibility (he often used the word Schuld, guilt) for the Nazi atrocities. In October 1945 Niemöller was the prime mover behind the German Protestant Church’s “Confession of Guilt” (“Stuttgarter Schuldbekenntnis”). In later speeches Niemöller claimed that a November 1945 visit to Dachau, where the crematorium was being kept as a memorial site, began that process of recognition.

I think that it was in this context that Niemöller’s most quoted saying evolved. This early statement implies that he may have thought first of the Communists, then the disabled, then Jews, and finally countries conquered by Germany. However, it is also likely that he modified what he said for different audiences, perhaps including other groups, or changing the order depending on his goals. (I am suggesting that there may not be ONE SINGLE master quotation, but several versions used by MN himself.)

In the earliest texts that Harold Marcuse has been able to locate, Niemöller “spoke of the Communists, the disabled, and the Jews, in that order. He also mentioned Jehovah’s Witnesses”. Thus, despite the ambiguity, it seems certain that the Communists were named first—as suggested by Klemperer’s report of the Communist who was arrested and tortured in March 1933.

What is most interesting is not that Niemöller used different versions himself but rather the self-serving way the quotation has been “reworked” by others to suit their own ends: the version in the US Congressional Record being clearly the most egregious example of such distortion, since it replaces “Communists” with “industrialists”.

Ironically, it is just this kind of manipulation and subversion of language that Victor Klemperer exposed in his book The Language of the Third Reich, which describes how “the existing social culture was manipulated and subverted as the German people had their ethical values and their thoughts about politics, history and daily life recast in a new language.”

Happily, that is all in the past. As I recently heard George W. Bush say on television: “These are good days in the history of freedom”.

Permalink | Comments (6)

Thursday 04 September 2003

So euphonious to me

Apart from the foreign movies and documentaries that SBS broadcasts, there’s hardly anything worth watching on free-to-air television in Australia, now that the current season of The Sopranos has finished. And I’m not willing to shell out $78 (US$50/€46) per month for cable (which is what I’d have to pay for a package that includes classic and contemporary movies). So most of my time in front of the TV is spent watching movies on DVD or those I’ve taped from SBS.

I’ve seen a lot of Chinese movies lately: Wang Xiaoshuai’s Beijing Bicycle, Hsiao-hsien Hou’s City of Sadness, Zhang Yimou’s Not One Less and The Road Home, and—most recently—He Ping’s Red Firecracker, Green Firecracker.

All these movies, despite their disparate stories and styles, have one thing in common: the characters speak Mandarin. And, even though I don’t understand a word of Mandarin, I adore the sound of that lovely musical language.

Last night SBS broadcast From the Queen to the Chief Executive, Herman Yau’s movie about a young man held in gaol in Hong Kong, “at Her Majesty’s pleasure”; in other words, detained for an indefinite period without any expectation of release. The film was well written and well directed, with good acting and engaging characters, yet within ten minutes I was ready to turn off the TV. I persevered, and I’m glad I did, but it was tough going because the Cantonese dialog made it difficult for me to enjoy the film. I’ve never been interested in Hong Kong action movies either, partly because I find their mixture of humor and violence crass and predictable, but mainly because I dislike the sound of Cantonese. (Even Amy at the Chinese restaurant, who was born in Hong Kong, once admitted to me that Cantonese doesn’t sound particularly pleasant.)

Last week, in the train, I was sitting in front of two men who were speaking in (what I guessed was) a South-Asian language and I caught myself thinking, “What an unattractive-sounding language.” Immediately I started to wonder about what makes one language sound more pleasing than another to an individual ear.

Not surprisingly, Google searches on “beautiful sounding language” and “language sounds beautiful” yield conflicting opinions—though Tolkien appears near the top of each list of results: Quenya is described as “the most beautiful sounding model language, spoken by one of the most compelling fictional races ever portrayed” while Tolkien’s love of Welsh is frequently cited.

The languages I find most beautiful are Japanese, Mandarin, Spanish, German, and Vietnamese (of course Japanese is far and away my favorite, though I admit that Mandarin is more euphonious).

No doubt it’s politically incorrect to suggest that one language sounds better than another—such a preference is admittedly subjective. Perhaps every language sounds beautiful to its native speakers (Amy excepted). Yet the very existence of a word like “euphonious” suggests that some languages do sound better than others.

euphonious adjective (of sound, especially speech) pleasing to the ear

euphonious pleasant-sounding, sweet-sounding, mellow, mellifluous, dulcet, sweet, honeyed, lyrical, silvery, silver-toned, golden, bell-like, rhythmical, lilting, pleasant, agreeable, soothing; harmonious, melodious, melodic, tuneful, musical, symphonious; informal easy on the ear; rare mellifluent, canorous.
-opposite(s): cacophonous.

In the absence of objective criteria, I’ll continue to regard Mandarin as euphonious and Cantonese as cacophonous (as I await the barrage of complaints from Cantonese speakers).

Permalink | Comments (27)

Wednesday 10 September 2003

A Kanji learning tool, er, toy…

Dave Rogers just bought himself Another New Toy, er, Tool… a Sony Clié PDA. And though Dave insists that he “didn’t acquit [himself] too spectacularly as a bargain hunter”, I vehemently disagree. The Clié 665C that he picked up at a discount for US$199.97 has a list price in Australia of AU$749—that’s about US$493/€440 (at today’s exchange rate AU$100 = US$66.07 or €58.73).

Dave bought his Clié 665C to replace a Handspring Visor that he clearly used regularly, whereas I recently bought a Clié to replace a Palm III I hadn’t used for years. Dave’s post prompted me to think about the convolutions we go through to justify certain purchases.

Animation of KingKanji screen showing a kanji being drawn and the correct stroke order being illustrated

Why was I in the market for a new PDA? Because somehow I’d stumbled across KingKanji, “an award-winning Japanese/kanji flashcard program [for Palm OS and Pocket PC] that emphasizes writing as well as reading”. I installed KingKanji on the Palm III and was immediately impressed. In addition to the flashcard lessons that test your knowledge of kanji and vocabulary, the program allows you to practice writing kanji with the stylus and includes stroke animations for over 1200 characters, including the Jōyō kanji taught in grades one through six (the kanji Japanese children study in elementary school). This animation succinctly demonstrates the program’s intrinsic coolness. (I emailed Gakusoft, the developers, asking for permission to use it but haven’t heard back from them. If they object to my illustrating how great their product is, I’ll remove the animated GIF.) Gakusoft also offer a Chinese study program called KingHanzi but I don’t intend trying to learn Chinese in this lifetime.

Though KingKanji ran acceptably in the Palm III’s 2MB of RAM, I couldn’t even load a couple of other applications:

  • CJKOS (which allows users of the English version of Palm OS to read and enter Chinese, Japanese and Korean); and
  • Dokusha (a freeware “integrated Japanese text reader, Japanese-English dictionary, Kanji dictionary and study system for Palm OS”).

Dave Rogers legitimately justified his Clié purchase because he found last year’s model at a bargain price. I researched the available Palm and Clié models and went hunting for discounts. At the local OfficeWorks (the Australian equivalent of Office Depot), they had the grayscale Palm 125M on special for AU$300. While I was playing with a Palm Zire 71 (AU$599), a woman approached me and said, “Do you know much about these? I’m just not sure which one I should buy.”

I asked her what she wanted to use it for. “Organizing my appointments and addresses and taking notes,” she replied.

“Buy the cheapest model that has the features you need,” I advised her. A look of relief passed over her face, she plucked a Palm 125M box off the shelf, and walked straight to the checkout. I’d solved her problem though I didn’t realize I was on the way to solving my own.

Sony Clié PEG-SJ22

OfficeWorks didn’t have any Cliés so the following day I went to a Sony store to check them out. I’d already decided that AU$600 (US$394/€352) was my absolute limit. The PEGSJ33G—which Dave Rogers originally had his sights on—had a list price of AU$549 (US$360/€322) but I didn’t need MP3 playback. The PEGSJ22G—with a 33MHz processor instead of the PEGSJ33G’s 66MHz chip—was AU$449 (US$295/€264). Better still, the PEGSJ22G has a user-replaceable battery, accessed by unscrewing the backplate.

Unfortunately no one was offering discounts on the Clié, so I was going to have to pay full price. My justification process ran like this:

  1. I was obeying the essential rule of computer purchasing—don’t buy the hardware for its own sake but because it runs a piece of software you need.
  2. I’d already tested KingKanji on my old Palm III so I knew it was worth having.
  3. KingKanji looked miles better on a 320x320 pixel color screen.
  4. 16MB of RAM was essential to run CJKOS and Dokusha (although, to be honest, I didn’t yet know how useful these programs would be).
  5. The SJ22 was the cheapest model I’d been able to find with all the features I needed.

Justification enough, one might assume. But one sticking point remained: How could I be sure I’d use a $449 Sony Clié to learn kanji when my $25 (cardboard) Tuttle Kanji Cards were gathering dust in a drawer?

I had no answer to that question. It would require a leap of faith. I remembered a woman who told me she’d finally given up smoking after attending a $300 Stop Smoking course. “It must have been a really good course,” I said. “No,” she replied, “I kept thinking about all the money I’d wasted on cigarettes and couldn’t bear the thought of wasting the $300 course fee too.”

I pulled out my credit card.

I already had a couple of spare 16MB Memory Sticks (and relatively modest storage requirements) so, unlike Dave, I didn’t buy a 128MB Memory Stick. But I couldn’t resist the AU$100 external battery holder that uses four AA batteries to either directly power the Clié or recharge its internal battery. I love accessories. As Dave says, “it’s a character flaw”.

I’m delighted with my Clié, which I use constantly—even though I haven’t stored a single appointment or address. I bought it to learn kanji and vocabulary and I have no interest in using it for anything else. Traveling by bus or train, waiting in line for tickets, for friends to arrive at a restaurant, or for a movie to start, I’m steadily improving my Japanese vocabulary.

A tiny voice occasionally nags me that the cardboard kanji cards would have been just as effective, that despite all my protestations I’m just as materialistic as anyone else—I simply put myself through more hoops than someone for whom shopping is an unalloyed (and guiltless) pleasure. But maybe I’m just typical of a generation for whom applying computing power is the natural response to any conceivable problem.


Tuesday 30 September 2003

Mojikyo fixes a bug (in me)

Reading the opening chapters of Piers Brendon’s The Dark Valley: A Panorama of the 1930s a couple of weeks ago prompted me to start work on an entry about George W. Bush’s aircraft carrier stunt and why it so greatly vexed me. But the writing hasn’t gone smoothly and, in dire need of distraction, I hit the jackpot: xiaolongnu, a Chinese language specialist who regularly comments at Languagehat (and occasionally here), had alerted Languagehat to the existence of the Mojikyō Institute, a Japanese organization that produces the Konjaku-Mojikyo, a dictionary of mainly Chinese characters, with a free font set of about 110,000 characters plus an input program.

The Konjaku-Mojikyo includes about 20,000 Chinese characters defined by Unicode (ISO 10646), and about 50,000 Chinese characters collected in Professor Morohashi’s 13-volume Daikanwajiten (Great Kanji Japanese Dictionary), “the most comprehensive and authoritative reference work on the subject of Chinese characters”. The Mojikyo contains a wealth of other characters including Oracle Bone inscriptions, Siddham (Sanskrit) characters, Japanese Kana, Chu Nom (the original characters used in medieval Vietnam), Shui Script (characters used by that Chinese ethnic minority), and Tangut (Xixia) Script.

The Mojikyō Character Map, to which xiaolongnu originally referred, is a freeware application developed from the profits of the commercial Konjaku-Mojikyo software published on CD-ROM by the Kinokuniya Bookstore (the commercial version makes it easier to search for characters and to find information about them).

Languagehat set the bait, confident that several of his readers would be interested “in all this great stuff”. As he later admitted, I was at the top of his list. Happily, I didn’t disappoint him—as I explained in my comment on his post, I could put the Mojikyo Character Map to immediate use:

Something that’s bugged me for ages is that Nagai Kafū’s Bokutō kidan (A Strange Tale from East of the River) uses an obsolete kanji for the boku character. Amazon lists the book as “墨東綺譚” but the first character is a much simplified version of the original that appears on the cover and title page of Kafū’s novel. Now it looks like I might be able to find the correct boku character.

Late last night, needing a break from Bush’s aviation exploits, I convinced myself that I should download the 34 files (totalling 52MB) needed for the installation. We’ll call that the thin edge of the wedge. This morning I decided it wouldn’t hurt to install the Mojikyo Character Map and quickly see if I could find Kafū’s boku character.

Although extracting the 34 files was a little tedious, Jack Wiedrick’s instructions made the actual installation a snap. I use Extensis Suitcase to manage my Japanese fonts so I simply activated the Mojikyo fonts with Suitcase and double-clicked on the Mojikyo Character Map application. I was in business:

Mojikyo Character Map application

These days, on the rare occasion that someone asks me why I continue to study Japanese, I answer: “So I can read Nagai Kafū’s A Strange Tale from East of the River in the original Japanese, rather than a translation.”

Kafū’s Strange Tale is, in the words of his English translator Edward Seidensticker, “in many ways scarcely a novel at all”. Its nominal subject is an aging writer (Oe Tadasu) who, while researching a novel he is writing, wanders “east of the (Sumida) river” from Asakusa to the lower-class Tamanoi district.

Trapped by a sudden storm, he meets a prostitute, Oyuki, who invites herself under his umbrella and then him into her house. Oe embarks on an affair with Oyuki, spending the hot summer evenings with her in Tamanoi; when the cold weather returns he ends the affair.

The Strange Tale contains another story—the one Oe is struggling to write—about a retired teacher (Taneda Jumpei) who elopes with Osumi, a bar-girl who was once his maidservant. Part of the novella’s appeal lies in the skill with which Kafū plays one story off against the other—in Keiko I. McDonald’s words, “expand[ing] his ‘discourse time’ by telling two stories that interact and complement each other”.

I also admire Strange Tale because, as Seidensticker explains, “it belongs to the uniquely Japanese genre to which [Kafū’s] Quiet Rain also belongs, the leisurely, discursive ‘essay-novel’, its forebears the discursive essay and ‘poem story’ (utamonogatari) of the Heian Period, and the linked verse of the Muromachi Period and after”.

Put it down to my anal-retentive temperament, but it’s always irritated me that I couldn’t write Bokutō kidan correctly in Japanese because the Microsoft Japanese IME doesn’t support the first (boku) character. Worse still, I have three kanji dictionaries—Halpern’s New Japanese-English Character Dictionary, Spahn & Hadamitzky’s The Kanji Dictionary, and Haig & Nelson’s New Nelson Japanese-English Character Dictionary—and Kafū’s boku isn’t in any of them.

Why did Kafū use such an “obscure” character? Well, for one thing, such kanji were more commonly used in the first half of the 20th century, when Kafū was writing. Another reason was his upbringing: his father and maternal grandfather were trained in the Chinese classics and Kafū himself entered the Chinese department of the School of Foreign Studies in 1897 though, as Seidensticker explains, “he scarcely went near the place and failed to graduate”. That particular character may have evoked a specific feeling or impression in his readers, or he may even have used it because in Kafū’s time the use of uncommon Chinese characters in one’s writing was a sign of erudition (an attitude that persists amongst some contemporary Japanese).

Not being able to represent the character correctly is not just a problem for me—as I explained before, Amazon in Japan uses a simplified version in its listing for the book.

[The primary meanings of the four characters in the Amazon title are, in order, “india ink”, “east”, “figured cloth; beautiful”, and “talk”. In his entry for the boku character that Amazon uses, Halpern includes a Chinese variant (mò), which looks like Kafū’s character minus the three-stroke radical on the left.]

But I found the correct character with the Mojikyo Character Map on my second attempt. Although Halpern’s dictionary uses a different method (SKIP, based on geometrical patterns), most kanji dictionaries require you to identify the radical (the primitive under which the character is indexed), count the total number of strokes in the character (or the number of strokes less those in the radical), and finally locate the particular character within a list of characters with that radical and stroke count. It sounds more difficult than it actually is. Unless the dictionary doesn’t contain the character you’re looking for.

Kafū’s boku character has the three-stroke radical sanzui (#85) on the left—and a total of 18 strokes. My first match (Mojikyo 050021; below, left) was close but, as I realized almost immediately, not quite correct. And it only contains 17 strokes. Interestingly, this one is kind of “half-way” between the correct character and the simplified version that Amazon uses and that (not surprisingly) the Microsoft IME supports. I scanned through the grid of characters until I reached the 18-stroke section and there it was (Mojikyo 079131; below, right). Success!

Mojikyo characters 050021 (left) and 079131 (right)
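The radical-and-stroke-count method described above amounts to a lookup keyed on two numbers. Here is a minimal Python sketch, assuming a toy index that holds only the two Mojikyo entries mentioned in this post (radical #85, with 17 and 18 strokes); the names `index`, `add_entry`, and `look_up` are hypothetical and not part of any dictionary software:

```python
# Toy index keyed on (radical number, total stroke count).
# Hypothetical: it holds only the two Mojikyo entries discussed in this post.
index: dict[tuple[int, int], list[str]] = {}


def add_entry(radical: int, strokes: int, char_id: str) -> None:
    """File a character under its radical and total stroke count."""
    index.setdefault((radical, strokes), []).append(char_id)


def look_up(radical: int, strokes: int) -> list[str]:
    """Return every character filed under this radical and stroke count."""
    return list(index.get((radical, strokes), []))


# Radical #85 (sanzui), with the stroke counts given above.
add_entry(85, 17, "Mojikyo 050021")  # the near miss
add_entry(85, 18, "Mojikyo 079131")  # Kafū's boku

print(look_up(85, 17))  # ['Mojikyo 050021']
print(look_up(85, 18))  # ['Mojikyo 079131']
```

A real dictionary files many characters under each key, which is why the last step is still a visual scan of the candidates, and why a one-stroke miscount lands you on the wrong character.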

But my elation rapidly turned to disappointment when I realized that I couldn’t represent the correct character (Mojikyo 079131) via Unicode.

The Mojikyo Character Map provides a contextual menu that allows you to copy a character in a number of formats for pasting into other applications. Copying the Unicode tag for Mojikyo 079131 produces the Unicode (Decimal) tag &#28665;, which is also the Unicode tag for my first (incorrect) match, Mojikyo 050021. And &#28665; yields 濹.
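The decimal tag is simply the character's code point written in base 10. A quick check in Python, using only the standard-library `html` module, shows that &#28665; is U+6FF9 and decodes to a single character; the reference names a code point, not a particular glyph design:

```python
import html

entity = "&#28665;"

# A decimal character reference is the code point written in base 10.
code_point = int(entity[2:-1])      # strip the leading "&#" and trailing ";"
print(f"U+{code_point:04X}")         # U+6FF9

# Decoding the reference yields exactly one character. Nothing in it says
# which of the two designs (050021 or 079131) a font should draw.
character = html.unescape(entity)
print(character == chr(code_point))  # True
```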

At first I thought that this might be because Kafū’s boku character (Mojikyo 079131) is included in Morohashi’s Daikanwajiten but is not part of the current Unicode standard. But Jack Wiedrick’s documentation indicates that:

  • Gold characters are included in the JIS standard;
  • Cyan characters are included in the ISO10646 (=Unicode) specification; and
  • White characters are not included in either standard.

And Kafū’s boku character—perhaps xiaolongnu can suggest an alternative name—is rendered in cyan, which means it is part of the Unicode standard. So perhaps, on my first use of the Mojikyo Character Map, I’ve discovered a bug. I’ve emailed the Mojikyo Institute and am waiting on their reply. But at least, in finding the character Kafū used, I’ve fixed what was bugging me.


Sunday 05 October 2003

Not a bug in Mojikyo, but rather a feature of Unicode

It wasn’t a bug in Mojikyo, nor the fact that Windows is a sorry excuse for an operating system—rather it turned out to be the inherent design of Unicode that limits my ability to display (on a Web page) both variants of the Chinese character mentioned in my previous post.

A comprehensive explanation came via email from Mr Tanimoto of the Mojikyo Institute, confirming what Brian Hunziker and gaemon had suggested in their comments: that the two variants of the character “boku” (shown at the left) have the same Unicode number (or, in Unicode-speak, “share a single codepoint”). In his comment, Brian linked to a screenshot of the Macintosh Character Palette showing how Mac OS X allows one to choose between the two variants; in response to an email request, he graciously made new screenshots and gave me permission to reproduce them. In the illustration below, the green triangles to the right of certain characters indicate that alternatives exist to the character being currently displayed. At the bottom of the Character Palette, a button provides access to the character variants which share the same codepoint.

Macintosh Japanese character palette

In a BYTE article titled Unicode Evolves, Ken Fowles explains how codepoints work:

The Unicode/ISO10646 standard provides one uniform 16-bit encoding that can store information from all the world’s commonly used scripts. The key word here is “standard.” Unicode itself is a standard, not a technology. Where technology gets involved is how the software makes use of the standard.

The Unicode concept of parking characters into a 64-KB space sounds simple enough — until you realize there are three or four times that many characters in the world’s written languages. So a key part of Unicode’s design is to handle that 64-KB space as valuable real estate since it has to support a large number of scripts in one consistent encoding.

Several parts of Unicode’s design help it maximize this use of what’s called a codepoint, the permanent Unicode address of each character. For example, diacritic marks in most other character sets are not stored as unique characters, but in Unicode each diacritic can be separately tracked and shared among several characters. Codepoints are conserved through Han Unification, sort of like a highway carpool lane where two or three characters with similar appearance share the same space. To Unicode, small differences in appearance should be handled as a font issue, not by inventing another character encoding. Also, Unicode does not guarantee a particular sort order, since software should handle that separately.

Thus, the two variants of “Kafū’s boku character” share a single codepoint (&#28665;). The crucial concept—the one that led me to wonder if a bug in Mojikyo caused it to produce the same Unicode character entity for each variant—is that, as Mr Tanimoto explained in his email, Unicode does not differentiate between design differences within the same character—each character is assigned a codepoint and “the judgement of which design is adopted is left to the font maker”.

The Mojikyo system, on the other hand, takes an entirely different approach by separately registering all the different designs of a particular character and assigning to each variant a separate Mojikyo number. Mr Tanimoto illustrated the relationship between Unicode and Mojikyo—as it applies to the boku character—with an ASCII diagram in his email, which I’ve recreated here:

Relationship between Mojikyo and Unicode numbers for boku character
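Mr Tanimoto's diagram describes a many-to-one mapping: several Mojikyo numbers, one Unicode code point. As a rough Python sketch, using only the two Mojikyo numbers and the single code point from this post (the names `MOJIKYO_TO_UNICODE` and `variants_by_code_point` are hypothetical, not part of any Mojikyo software), inverting the mapping shows why a code point alone cannot tell the variants apart:

```python
from collections import defaultdict

# Mojikyo registers each glyph design separately; Unicode assigns one code
# point per character and leaves the choice of design to the font.
# Both entries are taken from this post.
MOJIKYO_TO_UNICODE = {
    "050021": 0x6FF9,  # the 17-stroke variant
    "079131": 0x6FF9,  # Kafū's 18-stroke variant
}


def variants_by_code_point(mapping):
    """Invert a many-to-one mapping: code point -> list of Mojikyo numbers."""
    grouped = defaultdict(list)
    for mojikyo_number, code_point in mapping.items():
        grouped[code_point].append(mojikyo_number)
    return dict(grouped)


for cp, numbers in variants_by_code_point(MOJIKYO_TO_UNICODE).items():
    print(f"U+{cp:04X} (&#{cp};): Mojikyo {', '.join(numbers)}")
# U+6FF9 (&#28665;): Mojikyo 050021, 079131
```

Going from Mojikyo to Unicode is easy; going back is ambiguous, which is exactly why copying either variant produced the same entity.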

As Brian Hunziker’s screenshot shows, the Hiragino Mincho Pro font includes both variants:

Detail of Macintosh Character Palette showing font variant selection

Unfortunately, as one might expect, the IME Pad (the Windows XP “equivalent” of the Macintosh Character Palette) and MS Mincho font combo leave a lot to be desired:

Windows XP IME pad

I wrapped the word “equivalent” in quotation marks because there is really no way that the butt-ugly Windows IME Pad can compete with the design, functionality, and appearance of the Macintosh Character Palette. Nor do any of the Windows Japanese fonts (MS Mincho, MS Gothic, and Arial Unicode MS) include the range of character variants included in Apple’s beautiful Hiragino Mincho Pro font.

“I hope I’ll not derail this into a Mac vs. PC discussion as that certainly is not my intention”, wrote Brian Hunziker in his comment. That’s OK, I’m sufficiently irritated to derail it myself:

<rant>The relentless mediocrity of Japanese support under Windows absolutely typifies Microsoft’s “near enough is good enough” approach to functionality and interface design. In fact, Windows Japanese support seems about as good as that offered by the Japanese Language Kit I was using on the Macintosh in the late eighties.

I get so tired of hearing about all the super-smart people who work for Microsoft when it’s abundantly clear that either they don’t have a clue about how to do things properly or else they don’t give a rat’s arse about anything but gouging money out of users and causing us grief.

Using the Windows operating system—as distinct from using Windows applications, many of which are superb—is like having to take photographs with a Soviet Zorki or Kiev camera when you could be using a Leica or a Hasselblad. Sure, you can take great pictures with a shitty camera but, since you’re constantly fighting the deficiencies in the equipment, there’s hardly any joy in the process. Elegance is one word that’s conspicuously absent from the Microsoft vocabulary.</rant>

Why don’t I switch? Primarily because I have thousands of dollars invested in Windows applications. Though, as I said to Brian Hunziker in an email, his screenshots “may have gently nudged me onto the slippery slope towards buying a Macintosh.”

Until then, I’ll rely on the Mojikyo Character Map to make up for the deficiencies in Windows, using Mojikyo’s RTF output to copy the character variant I need to Photoshop via Word (for some reason, the RTF output won’t paste directly into Photoshop for Windows). I only have access to all the character variants, of course, because I’ve installed the Mojikyo fonts. And, regardless of which operating system you use, the chances are you’ll see Mojikyo font 050021 instead of Mojikyo font 079131 when I include the &#28665; Unicode entity—like this: 濹.

That’s the reason that I’m using images to illustrate the characters—and to do that I’m taking advantage of another service offered by the Mojikyo Institute: links to 24x24 and 96x96 pixel GIF images of all the characters included in the Mojikyo character set. I’ve linked to the 24x24 pixel GIFs (in the previous paragraph), using these IMG tags:

<img src="" alt="Mojikyo font 050021" name="mojikyo_font_050021" width="24" height="24" />

<img src="" alt="Mojikyo font 079131" name="mojikyo_font_079131" width="24" height="24" />

The 96x96 pixel versions look like this:

Mojikyo font 050021    Mojikyo font 079131

and require the following links:

<img src="" alt="Mojikyo font 050021" name="mojikyo_font_050021" width="96" height="96" />

<img src="" alt="Mojikyo font 079131" name="mojikyo_font_079131" width="96" height="96" />

This means you can embed any of the Mojikyo characters in a Web page, without requiring that visitors have the Mojikyo fonts installed. (Note that the user license does not allow the GIF images to be downloaded, redistributed, or loaded onto another server.)

And, if you discover a Chinese character that is not currently contained in the Mojikyo character set, you can ask the Institute to create a new character (providing you tell them where you discovered the character).
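Because the IMG tags shown earlier follow a fixed pattern, they are easy to generate. Here is a minimal Python sketch, assuming the caller supplies the image URL (the Institute's actual URLs are not reproduced here); `mojikyo_img_tag` is a hypothetical helper that mirrors the `alt`, `name`, `width`, and `height` attributes of those tags:

```python
from html import escape


def mojikyo_img_tag(src: str, mojikyo_number: str, size: int) -> str:
    """Build an IMG tag shaped like the ones in this post.

    Hypothetical helper: the caller must supply the image URL (`src`).
    """
    if size not in (24, 96):
        raise ValueError("the Institute provides 24x24 and 96x96 pixel GIFs")
    return (
        f'<img src="{escape(src)}" alt="Mojikyo font {mojikyo_number}" '
        f'name="mojikyo_font_{mojikyo_number}" '
        f'width="{size}" height="{size}" />'
    )


print(mojikyo_img_tag("", "079131", 24))
```

Escaping `src` matters if the URL carries a query string, since a bare `&` inside an attribute is not valid HTML. The license terms still apply: the tag only points at the Institute's server and does not copy the image.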

So, to sum up, I couldn’t access the boku character that Kafū used because none of the default Windows Japanese fonts includes that particular variant. And I couldn’t display Kafū’s boku in a weblog entry because Unicode conserves codepoints (through Han Unification) so that it doesn’t run out of permanent addresses. And I was able to find Kafū’s boku on my Windows PC with the aid of the Mojikyo Character Map because the Mojikyo Institute regards Chinese characters as “a very important cultural asset of the human race” and—like Apple—is committed to making that wonderful variety of characters widely available. The fly in the ointment is, as one might expect, Microsoft. (I’m sure Dave Rogers would agree—I was amused (though hardly surprised) when I followed his pointer to these Dan Bricklin photographs of the BloggerCon audience.)


Saturday 11 October 2003

There ain’t no such thing as plain text

I wish Joel Spolsky had published his excellent introduction to Unicode and character encoding a week earlier, because then I wouldn’t have wasted a couple of hours trying to write a snippet of PHP code to convert Japanese characters to Unicode character entities. In the fourth paragraph of The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) Joel Spolsky reveals what finally provoked him into writing his essay:

When I discovered that the popular web development tool PHP has almost complete ignorance of character encoding issues, blithely using 8 bits for characters, making it darn near impossible to develop good international web applications, I thought, enough is enough.

That statement knocked me for a six. Historically—as Joel Spolsky implies—American programmers have been indifferent to dealing with languages other than English. But PHP started out in 1995 as a series of Perl scripts written by Rasmus Lerdorf who was born in Greenland, lived in Denmark for much of his childhood, then spent a number of years in Canada before moving to the United States. In 1997, Zeev Suraski and Andi Gutmans—both Israelis, who between them speak Hebrew, English, and German—completely rewrote the core PHP code, turning it into what became known as the Zend engine. If anyone would be sensitive to language and character set issues, you’d surely expect it to be these guys and their colleagues.

Yet a Google search on “unicode support in php” turned up an interesting, and ultimately dispiriting, series of threads. Firstly, this reply by Andi Gutmans to an October 2001 question on the PHP Internationalization Mailing List about “the current status of multi-byte character handling in PHP, and also some kind of forecast of when it is expected to work in a stable manner”:

No one seems to be working seriously on full Unicode support except for the mainly Japanese work Rui [Hirokawa] has done. I thought that the Email from Carl Brown was quite promising but adding good i18n support to PHP will require much more interest and volunteers. It seemed that not many people were very interested.

More recently, l0t3k replied to an August 2003 question about Unicode support:

i certainly am not an official voice of PHP, but some movement is happening (albeit slow and scattered) to provide some form of Unicode support. the Japanese i18N group have recently created a path to allow the engine to process scripts in various encodings, Unicode included [1].

[1] refers to another thread in which Masaki Fujimoto reported on progress with the i18n (internationalization) features of the Zend Engine 2, adding:

yes, I know most of you (== non-multibyte encoding users) do not care about this kind of i18n features (and somehow feel ‘more than enough’) as the comments in shows, so I paid close attention not to do any harm with original codes: everything is done in #ifdef ZEND_MULTIBYTE.

What’s really dispiriting is the conversation at PHP Bugs to which Masaki Fujimoto refers, where the issues of Unicode and internationalization are met with either indifference, hostility, or—as in this question—both:

And why on earth would you save PHP files in any other format than ascii?

Color me flabbergasted. If you tried to imagine the target audience for Joel Spolsky’s essay, this guy is standing right on the bullseye. As Joel explains:

If you completely forget everything I just explained, please remember one extremely important fact. It does not make sense to have a string without knowing what encoding it uses. You can no longer stick your head in the sand and pretend that “plain” text is ASCII.

There Ain’t No Such Thing As Plain Text.

If you have a string, in memory, in a file, or in an email message, you have to know what encoding it is in or you cannot interpret it or display it to users correctly.
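Joel's point is easy to demonstrate. In this Python sketch (standard library only; Shift_JIS is chosen simply as a common Japanese encoding to compare against), the same five-character title, Ozu's Ochazuke no aji, produces different bytes in UTF-8 and Shift_JIS, and decoding with the wrong encoding quietly produces nonsense rather than an error:

```python
title = "お茶漬の味"  # Ochazuke no aji

utf8 = title.encode("utf-8")
sjis = title.encode("shift_jis")

# Five characters, but a different number of bytes in each encoding.
print(len(title), len(utf8), len(sjis))   # 5 15 10

# The same text yields different bytes under different encodings...
print(utf8 == sjis)                       # False

# ...and bytes decoded with the wrong encoding come out as garbage.
# Latin-1 accepts every byte value, so there is not even an error.
print(utf8.decode("latin-1") == title)    # False
print(sjis.decode("shift_jis") == title)  # True
```

A language that assumes one byte per character would report this title as 15 "characters" long in UTF-8, or 10 in Shift_JIS.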

But why did I want to use PHP to convert Japanese characters to Unicode entities anyway? Procrastination mainly (anything to avoid my essay about the George W Bush aircraft carrier stunt). Curiosity too. While working on another essay, about Ozu Yasujiro, I wanted to make a table listing his films: their Japanese titles, translations of those titles, the actual English titles, and the year of release.

Since I have a PC exclusively devoted to Japanese (so that I can use some native Japanese applications), I wound up creating the table in Word 2000 and saving the document as HTML. When I examined the HTML in Dreamweaver on my main (English) PC, I noticed that Word had transformed the Japanese characters into the equivalent Unicode entities. When I type Japanese into Dreamweaver, on the other hand, the Japanese characters simply appear within the HTML.

As an example, the next two lines of text both read Ochazuke no aji (The Flavor of Green Tea over Rice, the title of one of Ozu’s films) and, if you have Japanese support enabled, should look the same:



But, if you check the source code in your browser, you’ll see Japanese characters in the first line and Unicode entities in the second, like this:

Japanese and Unicode characters for ochazuke no aji
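For the record, the conversion Word performed takes only a line or two. This sketch is in Python rather than PHP, and `to_entities` is a hypothetical helper; Python's built-in `xmlcharrefreplace` error handler does the same job, and `html.unescape` reverses it:

```python
import html


def to_entities(text: str) -> str:
    """Replace each non-ASCII character with a decimal character reference."""
    return "".join(ch if ord(ch) < 128 else f"&#{ord(ch)};" for ch in text)


title = "お茶漬の味"  # Ochazuke no aji
encoded = to_entities(title)
print(encoded)

# The standard library's codec error handler produces the same result...
assert encoded == title.encode("ascii", "xmlcharrefreplace").decode("ascii")
# ...and html.unescape turns the references back into characters.
assert html.unescape(encoded) == title
```

ASCII characters pass through untouched, so markup and Latin text stay readable in the source.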

I might be utterly mistaken, but I can’t help thinking that using the Unicode entities might be preferable to (i.e. more reliable than) using the actual Japanese characters. Though, as long as the character encoding is set to UTF-8, it may not make any difference. I’d be interested in what anyone else thinks about this. Since I thought it would be useful to get some advice from the experts, I’ve emailed Joel Spolsky and Masaki Fujimoto. (I didn’t think there was any point in bothering Mr ASCII.)


Masaki Fujimoto and Joel Spolsky graciously replied to my email, basically confirming the points that Michael Glaesemann made in his comment. Joel Spolsky wrote that he has been using UTF-8 for all the translations of Joel on Software (currently translated into 28 languages) and “has not had a single person complain about not being able to read it”.

Whilst favoring the use of characters rather than Unicode entities, Masaki Fujimoto pointed out that entities offer two additional advantages:

  • avoiding implicit encoding translation (some software, including PHP, can implicitly convert one encoding to another, and using entities allows you to skip this);
  • avoiding any null-byte problems (UTF-16 and UTF-32 can contain null bytes, which can cause various kinds of problems with Unicode-unaware software).
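The null-byte point can be seen by encoding a short mixed string three ways. In this Python sketch, UTF-8 produces no zero bytes, while UTF-16 and UTF-32 do; software that treats a zero byte as the end of a string (as C's string functions do) would cut such text short. Character references, being plain ASCII, avoid the problem:

```python
NUL = b"\x00"
text = "Ozu お茶"

for encoding in ("utf-8", "utf-16-le", "utf-32-le"):
    data = text.encode(encoding)
    print(encoding, len(data), NUL in data)
# utf-8 10 False
# utf-16-le 12 True
# utf-32-le 24 True

# Character references are plain ASCII, so they contain no NUL bytes either.
entities = text.encode("ascii", "xmlcharrefreplace")
print(entities, NUL in entities)
```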

Fujimoto-san also explained that the Japanese think of Unicode entities as a “kind of work around for the japanese-unavailable-environment” and so would never normally use entity references. He also noted that it is not really all that difficult to make PHP completely Unicode-aware, with the main roadblocks being that:

  • because PHP does not distinguish binary data from strings it is not possible to change “string type” to “unicode-aware string type” without breaking any binary contents;
  • most of the core PHP developers live in Europe, so they are not so interested in the Unicode issue.
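The first roadblock is about keeping text and raw bytes apart. Python 3 is one language that does so, with separate `str` and `bytes` types. The sketch below is Python, not PHP, and only illustrates the distinction: it shows what goes wrong when text is handled as bytes, for example by cutting it at a byte offset:

```python
data = "お茶".encode("utf-8")  # 2 characters become 6 bytes
print(len("お茶"), len(data))  # 2 6

# Cutting at a byte offset, as byte-oriented string code might,
# leaves half of the second character behind.
truncated = data[:4]
try:
    truncated.decode("utf-8")
except UnicodeDecodeError:
    print("not valid UTF-8 any more")

# Python refuses to mix the two types without an explicit decode.
try:
    combined = "text" + data
except TypeError:
    print("str + bytes is a TypeError")
```

Retrofitting that separation onto a language where one string type has always held both text and binary data is what makes the change hard without breaking existing code.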

I have a feeling that the second roadblock will be easier to dismantle than the first, that the interest of European PHP developers in Unicode will increase proportionately with the economic influence of China.


Sunday 12 October 2003

I’m not giving up my day job (to become a PHP programmer)

Having colored myself “flabbergasted”, I now need to color myself “embarrassed” since Scott Reynen has comprehensively demonstrated that PHP does have limited Unicode support, which he uses to create his Daily Japanese Lessons. Even more impressively, Scott followed up by doing what I couldn’t manage—writing a snippet of PHP code to convert Japanese characters to Unicode character entities. As I admitted in Scott’s comments, “I should leave PHP coding to those who actually know what they’re doing”.

Regarding the issue of which is better—CJK characters or Unicode entities—Michael Glaesemann’s comment has convinced me beyond any doubt that it’s best to stick with the characters.


Sunday 26 October 2003

Immersion Japanese (DVD style)

The first Ozu DVD box set, which arrived last week, turned out to be everything I’d hoped for and more. Six DVDs—Tokyo monogatari, Higanbana, Ohayō, Akibiyori, and Samma no aji, plus a bonus disk (Tokuten disuku)—as well as a lavishly illustrated booklet.

The box itself is covered with coarse-woven fabric, reminiscent of the cloth that forms the background to the opening titles of so many of Ozu’s films.

As soon as I unwrapped the package I popped Tokyo monogatari into the DVD player: the distinctive opening titles, combined with Saito Ichiro’s theme music, prompted an intense nostalgia.

As for many other Western viewers, this was the first Ozu film I ever saw. I never tire of watching Tokyo Story—usually regarded as Ozu’s masterpiece—even though Banshun (Late Spring) remains my favorite. Tokyo Story is lovely to look at despite the fact that, as David Bordwell explains, “it does not survive in good condition: the original negative was destroyed by a laboratory fire, and the internegative struck from positive prints does not render the chiaroscuro that Ozu and [cinematographer] Atsuta sought”.

Natsuko, who doesn’t admire Ozu’s films, asked me recently what it is about them that I love so much, and how I can watch the same films over and over again. I said something along the lines of “one can never grow tired of seeing the beauty and sadness of everyday life depicted with unflinching honesty”. What I struggled to convey to Natsuko, Donald Richie expresses with great economy in the preface to his book on the director:

What remains after an Ozu film is the feeling that, if only for an hour or two, you have seen the goodness and beauty of everyday things and everyday people; you have had experiences you cannot describe because only film, not words, can describe them; you have seen a few small, unforgettable actions, beautiful because real. You are left with a feeling of sadness too, because you will see them no more. They are already gone. In the feeling of transience, of the mutability and beauty of all life, Ozu joins the greatest Japanese artists. It is here that we taste, undiluted and authentic, the Japanese flavor.

Richie also points out that “Ozu’s method, like all poetic methods, is oblique”. Ozu offers a severely constrained vision of the world in order to transcend those constraints; his films are suffused with human emotion because they are so rigorously constructed; time is stretched in an Ozu film because his movies are longer than average even though they contain hardly any “story”. The cumulative effect of these formal strategies is that “technique restricted comes to make us see more, [while] tempo slowed comes to make us feel more.” The end result is that we are gradually drawn into the film and “invited to infer and deduce” its meaning which, because of the almost non-existent “plot”, resides in the characters and their behaviour. And so

…we are often given that rare spectacle of a character existing for himself alone. This we observe with the delight that precise verisimilitude always brings, and with a heightened awareness of the beauty and fragility of human beings.

This effect is not at all diminished by the lack of English subtitles. Firstly, I understand more of the Japanese than I expected to. And, perhaps more importantly, I’ve adopted the same strategy I use whenever I spend time with friends in Japan. Instead of worrying about the meaning of every word of a conversation, I content myself with absorbing the essence of what’s being said—which is much easier for them, since they don’t have to continually make allowances for my lack of fluency. As long as I behave as though I know what’s going on—which I usually do—they mostly talk to me as they would to another Japanese.

And since I’m no longer concerned with reading the subtitles I can pay closer attention to other aspects of the film and its characters. It’s strange that although I’ve done this with conversational Japanese (and now with watching films without subtitles), it wasn’t until I encountered Alaric Radosh’s advice that I realized I could apply the same strategy to reading:

When you read easy, don’t look up unfamiliar words. I mean, you can look them up occasionally when you just have to know. But, for the most part, skip those words, like you did when you were a kid reading in your native language. When you do look them up eventually, you will only understand them and remember them all the better for having become familiar with them beforehand in this way.

(I can’t begin to explain what a difference Alaric’s approach has made to my Japanese reading. Whereas I used to be fixated on learning kanji and vocabulary, I now spend much of my study time actually reading and am amazed at how many words I recognize in context, words that I would probably not have recognized in a vocabulary list.)

When I was ordering the first Ozu Box Set, I thought I’d check whether Toyoda Shirō’s 1960 film version of Nagai Kafū’s Bokutō kidan (A Strange Tale from East of the River) had been released on either DVD or VHS.

Toyoda, who never achieved the reputation or critical regard accorded his contemporaries Ozu, Naruse, and Mizoguchi, was a member of the jumbungaku movement, a group of directors with an interest in filming serious works of literature. In addition to Bokutō kidan, Toyoda adapted Mori Ogai’s Gan (The Wild Geese) and Kawabata’s Yukiguni (Snow Country) for the screen.

Toyoda’s Bokutō kidan was not available but, much to my surprise, Shindō Kaneto’s 1991 version has been released on DVD. I couldn’t resist and added it to the Ozu order. I haven’t had a chance to see it yet since I have a huge backlog of movies to watch: four Ozu movies, Kurosawa’s Seven Samurai, two versions of The Loyal 47 Ronin, and Takahata Isao’s Grave of the Fireflies (recommended by Language Hat). And it’s the end of the month, which means that the second Ozu Box Set has just been released.

I’ve stopped eating out, I’ve given up drinking, I no longer go to the cinema, and I’ve abandoned any hope of buying a new laptop (Macintosh or PC). All my spare cash is going towards Ozu DVDs. And I’m not sure where this is going to stop because I just discovered that there’s a Kurosawa Masterworks DVD Triple Box Set. Though, since I taped a dozen or so Kurosawa movies when they were shown on SBS, I might forget about Kurosawa and hold off for the Mizoguchi and Naruse Box Sets. That way I could look forward to drinking a couple of beers on New Year’s Eve, while I’m watching a film from the fourth (and final) Ozu Box Set.

Permalink | Comments (2)

Tuesday 28 October 2003


The sixth (Special Bonus) DVD in my recently delivered Ozu Box Set is fascinating too, as much for its title as for its contents. As you’d expect, the bonus DVD includes:

  • a profile of Ozu’s career (Ozu Yasujiro’s World);
  • an interview with German director Wim Wenders;
  • an interview with Ryū Chishū about working with Ozu at Shochiku’s Ōfuna studio (near Kamakura);
  • documentaries about the making of Tokyo Story and An Autumn Afternoon;
  • cinema previews of some of the later films;
  • footage of the press conference at which the Box Sets were announced;
  • a compilation called Ozu no fūkei: sentakumono, entotsu/denchū (Ozu’s Scenery: Items to Be Washed, Chimneys/Telegraph Poles), which contains a selection of the brief exterior shots with which Ozu punctuates his films.

Ozu Yasujiro's Scenery, Items to be washed, chimneys and telegraph poles

Ozu used interior “still life” shots for the same purpose: to separate the various sections of a film and to indicate a change in narrative direction (he stopped using fades and dissolves early in his career). Donald Richie calls these transitional shots “empty scenes”; Paul Schrader refers to them as “codas”; for David Bordwell they are “intermediate spaces”.

But the real surprise is that the bonus DVD is titled まほろば (Mahoroba), a word I’d never heard of. I asked Natsuko what it meant but it was a mystery to her too. Nor could I find mahoroba in any of my Japanese-English dictionaries, print or electronic. Even more surprisingly, it was not listed in Jim Breen’s EDICT Japanese-English Dictionary file, which currently has approximately 106,000 entries.

Yet a Google search in Japanese yields about 55,500 results whilst a search in English for its Romaji equivalent returns about 99,700 entries! The top site for both searches is Internet Mahoroba, an ISP and web hosting provider. Other Japanese results include a ski club, a band, a resort hotel, and a patisserie. Episode 6 of an anime called Iria: Zeiram is titled Mahoroba (Shangrila). And in Pinnacle—a high-level D&D campaign—there’s a character called Yuriko Mahoroba.

According to the Mahoroba Restaurant in Vernon, BC, mahoroba means:

Surrounded by mountains, and nice to live (from oldest Japanese book: Kojiki)

The Kojiki (古事記) or Record of Ancient Matters is a loose account of Japanese history from the mythical age of the gods to the reign of the Empress Suiko (592-628).

I emailed xiaolongnu and Jim Breen, asking them if they’d heard of mahoroba. xiaolongnu wrote back that it sounds like a Buddhist term:

“maho” being the Japanese pronunciation of Chinese “mohe” which translates the common Sanskrit prefix “maha” meaning “great” (as in Mahatma, “Great Soul,” Mohandas Gandhi’s epithet). I can’t make anything out of “roba” in the absence of kanji (it’s that old signal to noise problem again).

The absence of kanji does make interpretation difficult, yet mahoroba seems to be spelled almost exclusively in hiragana (a search for まほろ場 yields only 14 results).

In a follow-up email, xiaolongnu picked up the reference to the Kojiki, noting that mahoroba “is associated with the notion of Yamato (i.e. an idealized homeland in the mountains)”. Some further Googling revealed that a Japanese musician named Sojiro has released an album called Mahoroba

with a theme of deep respect and understanding for the Jōmon culture and people that had high technology and strong spirit more than 5000 years ago. Examining his own roots, SOJIRO elevated the album into a worldwide work. MAHOROBA, the old Japanese word, means Utopia.

Jim Breen replied that he’d found one online dictionary (available in three locations) with an entry for まほろば. I followed his suggestion and checked Excite, where the Sanseido Daijirin J-J dictionary provides a definition confirming mahoroba as an old Yamato word meaning “surrounded by mountains”, that it was used in the Kojiki, and that it is the same word as mahorama and mahora.

Jim also found it in the Fifth Edition of the (paper only) Kenkyusha J-E dictionary:

まほろば [すぐれた場所] an excellent [a splendid, an unsurpassed] location

And he mentioned that he is adding mahoroba/mahorama/mahora to EDICT as:

まほろば;まほらま;まほら /(n) excellent location (Yamato word)/splendid place/
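Assuming the usual EDICT line layout (one or more headwords separated by semicolons, an optional bracketed reading that is absent here because the headwords are already kana, then slash-delimited glosses with a part-of-speech tag such as (n) at the start of the first gloss), the entry above can be pulled apart mechanically. The following is a minimal Python sketch of such a parser; `parse_edict_line` and `EdictEntry` are hypothetical names, and the sketch ignores finer points of the real format such as sense numbers and trailing markers like (P).

```python
import re
from dataclasses import dataclass, field


@dataclass
class EdictEntry:
    """One parsed EDICT-style line (hypothetical structure, for illustration only)."""
    headwords: list[str]
    readings: list[str]
    glosses: list[str]
    pos: list[str] = field(default_factory=list)


# HEADWORD(S) [READING(S)] /gloss/gloss/   (the bracketed reading is optional)
LINE_RE = re.compile(r"^(?P<head>[^\[/]+?)\s*(?:\[(?P<read>[^\]]+)\])?\s*/(?P<body>.*)/$")
# A leading parenthesised tag such as "(n)" or "(adj-na,n)" on the first gloss
POS_RE = re.compile(r"^\((?P<tags>[^)]+)\)\s*")


def parse_edict_line(line: str) -> EdictEntry:
    """Split a simplified EDICT line into headwords, readings, POS tags and glosses."""
    m = LINE_RE.match(line.strip())
    if m is None:
        raise ValueError(f"not an EDICT-style line: {line!r}")
    headwords = m.group("head").split(";")
    readings = m.group("read").split(";") if m.group("read") else []
    glosses = [g for g in m.group("body").split("/") if g]
    pos: list[str] = []
    if glosses:
        tag = POS_RE.match(glosses[0])
        if tag:
            # Move the part-of-speech tag out of the first gloss.
            pos = tag.group("tags").split(",")
            glosses[0] = glosses[0][tag.end():]
    return EdictEntry(headwords, readings, glosses, pos)


entry = parse_edict_line("まほろば;まほらま;まほら /(n) excellent location (Yamato word)/splendid place/")
print(entry.headwords)  # ['まほろば', 'まほらま', 'まほら']
print(entry.glosses)    # ['excellent location (Yamato word)', 'splendid place']
```

The same function handles a kanji headword with a bracketed reading, such as `交通 [こうつう] /(n) traffic/transportation/`.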

Tonight, as I was finishing this entry, I called another friend, Nana, who loves Japanese art and literature. I asked if she’d heard of mahoroba. She hadn’t but, as I was midway through explaining that the word was used as the title for the Ozu bonus DVD, she suddenly said, Atta! (Got it!). As I’d been speaking, Nana had been looking it up in both her Sanseido dictionaries.

As Nana explained it, one of the dictionary definitions associates maho with two kanji: 真 (makoto: truth, reality, genuineness, a Buddhist sect originating in the 13th century) and 秀 (shū: excel, excellence, beauty, surpass). Adding the ra to maho, as in mahora, turns it into a place, she added.

“It’s an old Yamato word meaning ‘surrounded by hills or mountains’,” said Nana, “but the sense I get is that it’s beautiful and special, a kind of mythical place that’s perfect and complete.”

I remembered that in one of the Google results someone had mentioned “Arcadia” as well as “Utopia” so I looked up “Arcadia” in my electronic Oxford dictionary/thesaurus/encyclopedia.

“How does this sound?” I asked Nana. “‘A mountainous district in the Peloponnese of southern Greece. In ancient times Arcadia was the home of the god Pan and a noted centre of song and music. In poetic fantasy Arcadia is a rustic paradise, the idyllic pastoral home of song-loving shepherds.’”

Nana agreed that mahoroba had a similar connotation. A Japanese Arcadia.

I’d been wondering whether the Ozu bonus DVD had been titled Mahoroba to convey the sense that the world Ozu created in his films was a kind of beautiful, mythical place, remote from the reality of contemporary Japan. I asked Nana what she thought.

“I think once you’ve watched thirty Ozu films,” she replied, “you’ll have a better idea of what mahoroba means.”

Ozu Yasujiro in center-rear, working as an assistant-director at Shochiku's Kamata studio

Permalink | Comments (10)

Thursday 30 October 2003

Mega Memory™

Do you suffer from these symptoms?

  • Poor Concentration?
  • Short Term Memory Loss?
  • Slow Mental Ability?
  • Mental Exhaustion?
  • Mental Fatigue?
  • Clogged Mind?
  • Forgetfulness?
  • Blankness?
  • Poor Recall?

You may need Mega Memory™!

I doubt I would ever have heard of Mega Memory™ had I not mislaid my mobile phone a few months ago. Perhaps I left it in a hotel room—or it might be somewhere in my house. I was packing to go to Melbourne at the beginning of last month and couldn’t find the handset anywhere, so I called Telstra to see if anyone had been using it. The customer service representative said that no calls had been made from that phone for six weeks. I had her put a bar on the number, just in case. On Monday morning, having finally decided to replace it, I drove to the nearest Telstra shop to choose a new phone and (hopefully) a cheaper monthly plan.

On the way I tuned the car radio to 2UE so I could listen to John Laws, the thinking person’s Rush Limbaugh. I spend so much of my life in a left-liberal ghetto that if I’m in the car on a weekday between 9am and midday—I never turn on the radio at home—I like to catch up via Lawsie with what the majority of Australians think and believe. Which is how I heard the Mega Memory™ advertisement.

Since I started my crash course in reading Japanese a few months ago, I’ve been suffering from a low-level anxiety about how I’ll ever remember the twelve hundred kanji and thousands of compound words that I need to know in order to read even tolerably well. On the way home from the Telstra shop—with a new Nokia phone and a ten-dollar-a-month-cheaper plan—I thought briefly about stopping at a pharmacy but decided to wait. I’ve always been skeptical about vitamin supplements—they’re only of any use if your dietary vitamin intake is inadequate and I make sure my diet is healthy and well-balanced.

But on my way back from the pool yesterday afternoon, I dropped by the local pharmacy and walked over to the vitamin section where Karen, the pharmacist, was arranging the stock on the shelves. I asked her if she had any Mega Memory™.

“We sure do,” she replied, plucking a blue packet off the shelf and handing it to me. I was instantly reassured by the picture of the brain, which seems to be pulsing with billions of easily retrievable facts. I turned the package over and read the blurb on the side:

Mega Memory™ is a blend of traditional herbs combined with a special selection of vitamins and amino acids, which help nourish and support healthy brain and memory function. Mega Memory™ may also help to improve alertness, better recall, clear your mind, enhance mental ability, help you think quicker, improve your accuracy and memory retention. Great for students, or anyone who needs to retain a lot of information in a short space of time!

“Do you think it’s actually any good?” I asked her.

“I think it might be. A guy came back after taking it for a few weeks and told me he’d started to remember all these events from his childhood. It contains Ginkgo biloba and Brahmi, which are both supposed to enhance your memory.”

I read the blurb again: Great for students, or anyone who needs to retain a lot of information in a short space of time! That’s definitely me, I thought to myself. I need to retain a lot of kanji and compounds in a short space of time.

I looked at the Consumer Information Panel on the back and saw that Mega Memory™ also contains Schizandra chinensis and Gotu Kola, plus a dozen other ingredients.

“I’ll take it,” I told Karen and followed her back to the cash register.

“That’ll be $29.95,” she said. “Cash or credit?”

“Credit,” I replied, opening the pocket of my sports bag, only to find a $5 note and my gym membership card.

“Oh, I forgot to bring my credit card,” I explained. “I don’t like to take my wallet or a lot of cash to the pool.”

Karen burst out laughing. “You might need this more than you realize.”

I went back and picked up the Mega Memory™ pills later that afternoon. So far, I’ve taken two but I realize it might take a month or so until I start to see the benefit.

There’s also a book, Kevin Trudeau’s Mega Memory: How To Release Your Superpower Memory In 30 Minutes Or Less A Day.

I’m trying to read less in English and more in Japanese so the chemical approach is probably best for me. But Karen is a savvy businesswoman so it might be worth suggesting to her that she do a cross-marketing deal with the bookshop two doors up the street: buy a six-month supply of Mega Memory™ pills and get the Kevin Trudeau book free.

There’s even a Mega Memory watch, with an integrated 128MB flash drive and a built-in USB cable that tucks neatly into the watch band. Reliable Mass Storage Solution On Your Wrist. I could use it to back up my LexiKAN Flash Card files.

I’m feeling pretty optimistic about the whole Mega Memory concept. I’ll keep you posted on how it works out.

Permalink | Comments (6)

Sunday 23 November 2003


Thanks to Natsuko I learned some new words yesterday, including:

  • haggler (huckster, cadger)
  • wold (formerly-wooded hilly tracts in certain regions of England)
  • lath (a thin flat strip of wood)
  • black-pot (a beer mug, a toper; though I suspect, in this context, a kind of food, perhaps leftovers; no, as Language Hat explains in his comment, it’s black pudding, i.e. congealed pig’s blood in a length of intestine)
  • chitterling (fried smaller intestines of a pig)
  • vamp (to make one’s way on foot; to tramp or trudge).

Natsuko comes to my place for breakfast most Saturday mornings, then borrows the car for the rest of the day. It was her idea that we should help each other with our reading—hers in English, mine in Japanese—after I asked her last week to explain a sentence construction in an Akutagawa story called Hana (Nose). I had known Akutagawa only as the author of the stories upon which Kurosawa’s film Rashomon was based. But Natsuko was, of course, familiar with Akutagawa’s story of a priest with an excessively long nose, who is delighted to have it shortened only to be disappointed by the negative response to his good fortune.

We’d agreed to start with English, which is how, once I’d cleared away the breakfast dishes, I came to be sitting at the living room table with my own copy of Hardy’s Tess of the D’Urbervilles as Natsuko, comfortably ensconced on the sofa, read from hers.

“Then what might your meaning be in calling me ‘Sir John’ these different times, when I be plain Jack Durbeyfield, the haggler.” Natsuko paused. “What’s haggler?”

“It’s normally a customer who argues to get the price of something reduced but that doesn’t make sense here.” I went to my study and came back with the dictionary, which revealed that an older meaning is ‘huckster’ or ‘cadger’.

“What’s huckster and cadger?” Natsuko asked.

I flipped through the “H” section, from ‘haggler’ to ‘huckster’, thankful that twenty-five years ago I’d bought a copy of the two-volume Shorter Oxford English Dictionary on Historical Principles.

“A huckster can mean someone who bargains or haggles but it can also mean a small trader… now, this is better. A cadger is a carrier who travels between town and country with butter, eggs, and shop-wares or someone who sells things in the street. That makes sense because Jack Durbeyfield is carrying an empty egg basket when he meets the parson.”

Natsuko continued reading. I explained the meanings of “whim”, “antiquary”, “direct lineal representative”, “Knights Hospitallers”, and “baronetcy”.

“I’m wondering why you chose this book,” I told her, as I was looking up “wold”.

“Well, you know I’m trying to save money,” she replied. “I already had a copy on my shelf and classics are cheaper than contemporary books because there’s no royalty to pay the author. The one I bought for you was only $7.95 at Kinokuniya but a modern book would cost about $20. Why do you ask that?”

“To be honest, I hadn’t thought about it until we struck all these words I’d never heard of. I suppose I was thinking that a more modern book might be easier to start with.”

(Although my ambition is to read Kafū and Tanizaki in Japanese, at the rate my Japanese reading skills are improving I’ll be thrilled if I can finish the Japanese translation of an Agatha Christie novel.)

“But I like this author,” she said. “When I was living in Seattle, I read Far from the Madding Crowd and it made me cry. Reading Thomas Hardy gives me the same feelings I used to have when I read Yamamoto Shūgorō as a teenager.”

Natsuko wasn’t surprised that I’d never heard of Yamamoto Shūgorō.

“You only know about literary writers,” she said, a trace impatiently, “like Kawabata and Tanizaki and Enchi Fumiko. Yamamoto Shūgorō was a taishū writer. He wrote all kinds of books—frequently about the common people but also detective and samurai stories as well as jidai-mono.”

Taishū (大衆) means “general public” and taishū bungaku is popular literature (though I imagine that Yamamoto Shūgorō is a cut or two above Agatha Christie). Jidai-mono are historical novels.

“In any case,” Natsuko added, “I think it’s better to read a classic novel. If you can understand the classics, then you can understand contemporary books. But not the other way round.”

She was right, of course. You won’t encounter too many “lath-like striplings” in a John Grisham novel. Natsuko started reading again.

“The clergyman explained that, as far as he was aware, it had quite died out of knowledge, and could hardly be said to be known at all. His own investigations had begun on a day in the preceding spring when, having been engaged in tracing the vicissitudes of the D’Urberville family, he had observed Durbeyfield’s name on his waggon, and had thereupon been led to make inquiries, till he had no doubt on the subject.”

I’d expected her to stumble over “vicissitudes” as she had over a number of other uncommon words; but, to my delight, she pronounced it perfectly.

“What is this ‘vicissitudes’?” she asked.

“It means that someone’s situation changes,” I explained, “often in an unexpected or unpleasant way. They might be doing well and then things turn bad… people often talk about ‘life’s vicissitudes’, meaning life’s ups and downs.”

Natsuko thought for a while, then said: “Like ‘the vicissitudes of George Bush’? He barely won the election, then after September the 11th he became very popular but now, with the problems in Iraq, his popularity is falling.”

“That’s pretty much it.”

“I like this word, ‘vicissitudes’,” she said. “If I use words like this, people will think I’m educated.”

“People can already tell you are educated,” I told her, “whether or not you use words like ‘vicissitudes’.”

How strange, I thought to myself, that I’d read almost all of the eighteenth- and nineteenth-century classic English novels but nothing by Thomas Hardy. I haven’t even seen Polanski’s movie, Tess.

Natsuko reached the end of the first chapter, put down the book, and picked up the photocopy I’d made of Akutagawa’s story—Kinokuniya hadn’t had any copies of Dondon yomeru: iro-iro na hanashi (Selected Stories for Steadily Improving Your Reading).

“Now it’s your turn,” she said.

“OK,” I replied, “here we go… Ike-no-o no hitotachi wa, minna naigu no hana no koto o shitte ita. Sono hana wa, nagasa jū hachi senchi kurai de, sōsēji no yō na katachi o shite, kao no mannaka ni burasagatte ita.”

(“Everyone in Ike-no-o knew about the distinguished priest’s nose. About eighteen centimeters long and shaped like a sausage, it dangled down the center of his face…”)

Natsuko interrupted, saying “Interesting that you chose this story, isn’t it? About the vicissitudes of a monk with a long nose.”

Permalink | Comments (9)

Friday 31 December 2004


牝狐   めぎつね (megitsune), minx

In Mori Jun’ichi’s Laundry, Teru (Kubozuka Yōsuke), a young man with mild brain damage from a childhood accident, runs his grandmother’s coin laundry where he meets and falls for a customer, Mizue (played by Koyuki, Tom Cruise’s improbable love interest in The Last Samurai).

When Mizue leaves a scarf in the dryer, Teru runs after her and returns it as she reaches her apartment block. Mizue invites him in for a cup of tea.

Teru with a cup of tea and Mizue with a cigarette, sitting on Mizue's bed

On his way back to the laundrette Teru, in a voiceover, says:

For five minutes she didn’t say a word. My heart was beating faster. It was my first time touching a woman’s hand… other than grandma’s. That night I casually told grandma about what happened. Very casually. Grandma called her a minx.

It had been a long time since I’d heard someone use the word “minx” and I was curious about its Japanese equivalent.

The word Teru’s grandmother used is megitsune. (Baa-san wa, sono hito no koto wo megitsune to itta.)

I fired up my Canon WordTank G50. Though megitsune wasn’t listed in the Japanese-English (J/E) dictionary, it was in the Kōjien (J/J):


The character 牝 is a prefix for female while 狐 means fox.

The Japanese definition (Tenjite, otoko wo damasu warugashikoi onna wo nonoshitte iu go) means (roughly) “a derogatory term for a crafty woman who distracts or deceives men.” My (electronic) New Oxford Dictionary of English defines “minx” more broadly:

an impudent, cunning, or boldly flirtatious girl or young woman.

Whereas the Shorter Oxford English Dictionary offers this definition:

Minx. 1542. [Of unkn. origin.] 1. A pet dog. Udall. 2. A pert girl, hussy. Now often playful. 1592. b. A lewd woman —1728. 2.b. This is some Minxes token. Shaks.

I’d vaguely thought that “minx” was in some way related to an animal but was surprised to learn that it originally meant a pet dog. I was probably thinking of “manx”—a tail-less cat “believed to have originated hundreds of years ago on the Isle of Man.”

The New Oxford Thesaurus of English provides these alternatives for “minx,” most of which seem unsatisfactory in that they emphasize the sexual at the expense of the deceitful:

tease, seductress, coquette, trollop, slut, Lolita, loose woman, hussy; informal tramp, floozie, tart, puss; Brit. informal scrubber, madam; N. Amer. informal princess, vamp; vulgar slang cock-teaser, prick-teaser; archaic baggage, hoyden, fizgig, jade, quean, wanton, strumpet.

Nowadays an “impudent, cunning, or boldly flirtatious” young woman might simply be seen as assertive so perhaps the word has fallen out of favor. That seems a pity. There’ll never be a shortage of deceitful young women (nor of men eager to be deceived) and, in any case, “minx” has always struck me as a word whose lighthearted sound matches its meaning—a young woman whose sly behavior rarely has serious consequences.

Permalink | Comments (5) | TrackBacks (0)

Wednesday 19 January 2005


The first sentence of Seymour Hersh’s current New Yorker article, The Coming Wars, caught me by surprise:

George W. Bush’s reëlection was not his only victory last fall.

Not that I believed that Bush had enjoyed only a single victory in the autumn of 2004. Rather I was astonished to see the “e” with an umlaut in the English word “reëlection.”

Except in this case it wasn’t an umlaut but a diaeresis, whereas I had always assumed that the two were synonymous. Not so, according to the Wikipedia entry for umlaut:

In linguistics, the process of umlaut (from German um- “around”, “transformation” + Laut “sound”) is a modification of a vowel which causes it to be pronounced more to the front of the mouth to accommodate a vowel in the following syllable, especially when that syllable is an inflectional suffix. This process is found in many—especially Germanic—languages.

For example, the German noun Mann (man) with the a pronounced as in English “father” (but short), becomes Männer [m’En@r, m’En6] in the plural, with the ä pronounced like the e in “edit”, a front vowel sound that is assimilated to the vowel in the -er suffix.

The word is also used to refer to the diacritical mark composed of two small dots placed over a vowel ¨ to indicate this change in German. A similar mark is used to indicate diaeresis in other languages, but the umlaut dots are very close to the letter’s body in a well-designed font, while the diaeresis dots are a bit further above—in computer screen fonts the difference is usually not noticeable, but in printed material it is.

Whereas, regarding the diaeresis, the Wikipedia says:

In French, Greek, and Dutch, and in English borrowings from them, this is often done to indicate that the second of a pair of vowels is to be pronounced as a separate vowel rather than being treated as silent or as part of a diphthong, as in the word naïve or the names Chloë and Zoë. Welsh also uses the accent for this purpose, with the diaeresis usually indicating the stressed vowel. French also uses diaeresis over “i” [and “e”?] to indicate syllabification in, for example, Gaëlle and païen. It is called trema or deelteken in Dutch, tréma in French.

The diaeresis is also occasionally used on native English words for the above purposes (as in “coöperate”, “reënact”, and the surname “Brontë”), but this usage has become very rare since the 1940s. The New Yorker magazine is noted as one of the few sources that still spells “coöperate” with a diaeresis.

Mystery solved! In the Seymour Hersh article, the diaeresis is used not only in the word “reëlection,” but also in “preëmptive,” “coördinate,” and “coöperation.” Interestingly, “cooperating”—as in “Most have been cooperating in the war on terrorism”—appears without a diaeresis, which suggests either an editorial error or that the diaeresis is not used in a present participle. (Perhaps Language Hat can clarify this apparent inconsistency or advise whether he is aware of a different New Yorker rule that specifies the use of a diaeresis when “cooperating” appears as a gerund.)

Why am I interested in this? Because for a long time, I used the umlaut/diaeresis or the circumflex as a substitute for the macron ¯, to indicate long vowel sounds in Romanized Japanese:

A macron (from Gr. μακρός makros “large”) is a diacritic ¯ placed over a vowel originally to indicate that the vowel is long. The opposite is a breve ˘, used to indicate a short vowel. These distinctions are usually phonemic.

In modern Old English transliterations, the macron has been used in this way. In Latvian it is also used to indicate long vowels. In Hawaiian (where it is known as the kahakō) it is again used to indicate long vowels, which in turn influence the placement of accent stress in words. Early writing in Māori did not distinguish vowel length. Some have advocated that the double vowel orthography be used to distinguish vowel length. However, the Māori Language Commission (Te Taura Whiri) advocate a macron be used to designate a long vowel. The use of the macron is now wide spread in modern Māori writings, though many people use a diaeresis mark instead (e.g. Mäori instead of Māori) due to lack of support on computers.

It is also used in many dictionaries and textbooks to mark vowel length in languages that do not feature this diacritic in everyday use; for example it is used in the Hepburn transcription of Japanese to indicate a long vowel, as in kōtsū (交通) “traffic” as opposed to kotsu (骨) “bone” or “knack (fig.)”.

As it happens, I first found the Unicode character entities for the various macrons when a Google search for “macron characters” led me to this Māori macron characters in XHTML page. I was absolutely delighted since I had never been happy using the umlaut/diaeresis or circumflex and I’ve always hated the wapuro style—rendering ō as ou and ū as uu, which is how one enters long vowels into a Japanese word processor (wapuro) or an input method editor on a computer with CJK support.
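For readers who want to produce those references themselves, here is a minimal Python sketch that lists the precomposed Hepburn macron vowels with their code points and decimal character references, and converts a romanized word such as kōtsū into ASCII-only markup. The helper names `numeric_reference` and `to_ascii_markup` are hypothetical and not taken from the Māori macron page; the conversion relies on Python's built-in `xmlcharrefreplace` error handler.

```python
import unicodedata

# Precomposed Hepburn long vowels (lower and upper case).
MACRON_VOWELS = "āēīōūĀĒĪŌŪ"


def numeric_reference(ch: str) -> str:
    """Return the decimal character reference (e.g. &#333;) for one character."""
    return f"&#{ord(ch)};"


def to_ascii_markup(text: str) -> str:
    """Replace every non-ASCII character with a decimal character reference."""
    # The "xmlcharrefreplace" error handler performs the substitution during encoding.
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


for ch in MACRON_VOWELS:
    print(ch, f"U+{ord(ch):04X}", numeric_reference(ch), unicodedata.name(ch))

# A vowel followed by U+0304 COMBINING MACRON composes to the precomposed form under NFC.
word = unicodedata.normalize("NFC", "ko\u0304tsu\u0304")
print(word, to_ascii_markup(word))  # kōtsū k&#333;ts&#363;
```

Normalizing to NFC first matters because a macron typed as a separate combining character would otherwise be emitted as two references rather than one.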

But, back to the umlaut, whose most fascinating use is revealed by the first result in a Google search on “umlaut”—the Wikipedia entry for Heavy metal umlaut:

A heavy metal umlaut is an umlaut over letters in the name of a heavy metal band. Umlauts and other diacritics with a blackletter style typeface are a form of foreign branding intended to give a band’s logo a tough Germanic feel. They are also called röckdöts. The heavy metal umlaut is never referred to by the term diaeresis in this usage, nor does it affect the pronunciation of the band’s name.

The entry goes on to explain the history of the heavy metal umlaut, its use in popular literature, and other usages of diacritics in band or album naming, thus demonstrating one of the things I love most about the Wikipedia: a scholarly attention to detail applied to an arcane or trivial aspect of everyday life:

At one Mötley Crüe performance in Germany, the entire audience started chanting, “Moertley Creuh!” Queensrÿche frontman Geoff Tate stated, “The umlaut over the ‘y’ has haunted us for years. We spent eleven years trying to explain how to pronounce it.”

If anyone tells you they regard the Wikipedia as lacking authority, just point them to the Wikipedia entry for Heavy metal umlaut.

Permalink | Comments (5)

© Copyright 2007 Jonathon Delacour