Tuesday 04 February 2003

lang="ja" and the attribute selector

Following up on a comment Sean Conner made on my post Archive organization re-viewed, I stumbled upon a fascinating post titled I’m turning Japanese, I think I’m turning Japanese, I really think so, which deals with how certain browsers treat the lang attribute:

JeffK mentioned that over the past few days, when he views The Boston Diaries his browser asks if he wants to download and install Japanese language support.

Sean traced the “problem” to an entry he’d written that included the Japanese words, dojinshi and manga, which—rather than simply slapping some <i> tags around them—he’d treated semantically:

The <SPAN LANG="ja" TITLE="fan art">dojinshi</SPAN> market .... <SPAN LANG="ja" TITLE="comic book">Manga</SPAN> publishers ...

I’ve followed a similar practice, though not as elegantly as Sean. For example, my recent entry, Tansu, includes:

A <i xml:lang="ja" lang="ja">tansu</i> is a Japanese chest of drawers or a cabinet with deep drawers at the bottom.

Note that I’ve added the lang attribute to the <i> tag, whereas Sean includes it in a <span> tag wrapped around each Japanese word.

He neatly explains the browser’s request to install Japanese language support:

Since I seem to already have the Japanese language support installed I didn’t notice anything odd when I loaded the page to proof read the entry. But it seems that other browsers that don’t have the Japanese language support saw the language attribute for “Japanese,” realized they weren’t installed, so decided to ask the user if it was okay to install Japanese language support. But I’m using an Anglicized spelling for a Japanese word so there’s no real need to download Japanese language support for what I used, so how do I get around that?

Sean figured out a workaround: “fudging it… by using lang="x-ja" which is allowed (any language code starting with “x” is for private use).”

I also have Japanese language support enabled on all my computers, so I wouldn’t have noticed a request to install it when I checked my post in a browser. I’m wondering whether anyone who read that post encountered a similar request to install Japanese language support. Or is my inclusion of the xml:lang="ja" attribute acting as an auxiliary fudge?

But that’s not all. In Sean’s original post, the words dojinshi and manga appear italicized, yet there’s no class attribute within the <span> tag—though this is how I would italicize the text, as in:

<span class="lang-attr" lang="ja" title="fan art">dojinshi</span>


.lang-attr {font-style: italic;}

Instead, Sean’s stylesheet contains the following declaration:

span[lang] { font-style: italic;}

Sean Conner is using an attribute selector! How cool is that? I’d never even heard of attribute selectors but there they are in Eric Meyer’s Cascading Style Sheets 2.0: Programmer’s Reference:

X[attr]  Selects any element X with the attribute attr.

X[attr="val"]  Selects any element X whose attribute attr has the value val.

X[attr~="val"]  Selects any element X whose attribute attr contains a space-separated list of values which includes val.

X[attr|="val"]  Selects any element X whose attribute attr has a value which is a hyphen-separated list that begins with val.
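
As a rough illustration of the four forms, here is a minimal sketch applied to markup like Sean's. The colors, the border, and the class name "gloss" are my own assumptions, chosen only to make each kind of match visible; none of them comes from Sean's stylesheet.

```css
/* [lang]: any span carrying a lang attribute, whatever its value. */
span[lang] { font-style: italic; }

/* [lang="ja"]: exact match only, so lang="ja-JP" and lang="x-ja" are skipped. */
span[lang="ja"] { color: maroon; }

/* [class~="gloss"]: "gloss" must be one whole word in a space-separated list.
   class="term gloss" matches; class="glossary" does not.
   (The class name is hypothetical.) */
span[class~="gloss"] { border-bottom: 1px dotted; }

/* [lang|="ja"]: the value is exactly "ja" or begins with "ja-".
   lang="ja" and lang="ja-JP" match; lang="x-ja" does not. */
span[lang|="ja"] { background-color: #ffffcc; }
```

Note that the lang="x-ja" workaround is still caught by the first rule, because span[lang] tests only for the presence of the attribute, but it falls outside both of the rules that test for "ja".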

I can see myself putting attribute selectors to good use from now on, thanks to Sean. And I’m still curious about whether I can trigger a request by your browser to install Japanese language support by mentioning kimono, sushi, geisha, haiku, and anime.



Attribute selectors are a fantastically useful addition to CSS, so naturally IE for Windows doesn't support them at all :/

Posted by Simon Willison on 5 February 2003 (Comment Permalink)

A combination of attribute selectors and CSS-generated content: http://www.royal-ts.de/mtarchives/000917.php

Posted by RoyalTS on 5 February 2003 (Comment Permalink)
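
The linked post is not reproduced here, but a combination along those lines might look like the following sketch, which prints each term's TITLE gloss after the term itself. The selector span[lang][title] and the parentheses are assumptions made for illustration, not taken from the linked page.

```css
/* For any span that has both a lang and a title attribute,
   append the title text in parentheses after the element's content. */
span[lang][title]:after {
  content: " (" attr(title) ")";
}
```

With Sean's markup this would render "dojinshi (fan art)" in a browser that implements both attribute selectors and generated content; a browser that supports neither simply ignores the rule.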

Thanks for pointing out the problem with 'lang="ja"'. I'm going to face the same problem when I launch the new version of my site.

Now, writing 'lang="x-ja"' is no solution, since "x-ja" does not mean "Japanese" (it means nothing). So if you do that, you might just as well ditch the whole "lang" attribute.

There is perhaps an alternative way to avoid this problem, if you use XHTML.

In XHTML we're to use both "lang" and "xml:lang", if we serve the pages as "text/html" (hoping that old browsers can understand the code). If we're bleeding-edge however, we'll serve the XHTML as true XML with the media type "application/xhtml+xml" (or "text/xml"), and in that case we should use "xml:lang" only.

This all leads up to the idea that we could use _"xml:lang" only_ for languages such as "ja" that could trigger this problem with downloading of language packs. I'm quite sure that MSIE (probably the only browser doing this) won't understand 'xml:lang="ja"' (this needs to be tested, of course...), and hopefully when some day it does understand it, it will have remedied this bug (I wouldn't bet on it though...). And so, hopefully, the needless bother with the language pack download will be avoided.

So we could write e.g. <span xml:lang="sv" lang="sv">Hej!</span> for Swedish content, but make do with <span xml:lang="ja">Sayonara</span> for Japanese.

That would not be altogether in line with the HTML compatibility guidelines of the XHTML 1.0 recommendation, but I think it can easily be excused, and the code will be perfectly valid, and the meaning will be "this is in Japanese", and good modern browsers will understand that.

Posted by Bertilo on 5 February 2003 (Comment Permalink)
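
One side effect of dropping "lang" is that a rule such as span[lang] no longer matches the Japanese spans, so the styling would need a different hook. A minimal sketch of two options follows; the italic styling mirrors Sean's rule, and whether a given browser honors either selector is something that would need testing.

```css
/* For XHTML served as text/html: the parser sees a single attribute
   literally named "xml:lang", so the colon is escaped in the selector. */
span[xml\:lang] { font-style: italic; }

/* The CSS2 :lang() pseudo-class matches on the element's language rather
   than on one particular attribute. Which attribute the browser consults
   depends on how the page is served (lang for HTML, xml:lang for XML). */
span:lang(ja) { font-style: italic; }
```

When the page is served as true XML, the escaped form is not expected to match, since "xml:lang" is then a namespaced attribute; a namespace-aware selector would be needed instead, which is not shown here.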

You should really be using <cite lang="ja"> and not span or i. You're citing the foreign term by using it. And yes, this is vegan dog food I do in fact eat myself. I just wrote <cite lang="mk"> yesterday.

Posted by Joe Clark on 5 February 2003 (Comment Permalink)


That's not a canonical use of cite. Here's the example from the W3C:

As <CITE>Harry S. Truman</CITE> said,
<Q lang="en-us">The buck stops here.</Q>

I also disagree that using a loanword is a "citation" of it at all.

Frankly, I think this case is exactly what the <i> tag was invented for: loanwords are italicized in English. Until we come up with a special "loanword" tag, that's it.

On a philosophical level, it's kind of funny that using the lang="ja" attribute brought up the Japanese pack install issue, even though the word is rendered in romaji. Yes, it is a Japanese word, and I suppose that the tagging is appropriate, in an abstract sense. In a practical sense, I suspect those tags were really intended to bracket those funny-looking not-English squiggles.

Posted by Adam Rice on 5 February 2003 (Comment Permalink)

OK, I wasn't going to say anything, just keep my ignorance to myself, but since Adam Rice brings it up, I feel emboldened to ask: what's wrong with just using italics? What do you gain with all those complicated spans and langs and attributes? My HTML is very primitive, and I know I'll never use all that stuff; I'm just curious why you do. Doesn't the reader just see the word in itals anyway?

Posted by language hat on 5 February 2003 (Comment Permalink)

I apologize first for my English (not quite good).

What about using the dfn tag instead of a span?
All the examples are in fact definitions of Japanese words, so is this possible:
<dfn lang="ja" title="fan art">dojinshi</dfn> etc.
I don't know if this is legal.

Posted by spoutnik on 5 February 2003 (Comment Permalink)

Well, it buys me a way to embed the meaning of the word as well, using the TITLE attribute; MS-IE and Mozilla will display that as a tooltip (a small text pop-up) when you hover the mouse over the word. The reason for using the LANG attribute is a bit more obscure, but I found a reference saying that IBM has a screen reader that will use the LANG attribute to properly inflect the word. Capisca?

Now, as to which tag to use? It never occurred to me to use for this---I just went ahead and used <SPAN> and tossed in the LANG attribute. Go figure ...

Originally in my style sheet, I used:

*[lang] { font-style: italic; }

And Mozilla handled it as I expected (since it seems to only apply the style sheet starting with <BODY>) but IE had italicized everything, since I had started the page with:

<HTML LANG="en">

and IE was applying the entire style sheet starting with <HTML>, not with <BODY>. I wasn't interested in figuring out who was following the standards, but I thought adding:

*[lang="en"] { font-style: normal; }

would fix that (as according to my understanding of CSS, the more specific style wins) but nope---IE was still italicizing the entire page. So I switched to

span[lang] { font-style: italic; }

since the only tag I had used with the LANG attribute was <SPAN>. But now IE (at least 5.0) isn't italicizing <SPAN>s with the LANG attribute, but I'm not going to fight Microsoft since I'm now using Mozilla almost exclusively.

Posted by Sean Conner on 5 February 2003 (Comment Permalink)
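
On "the more specific style wins": under the CSS2 specificity rules, *[lang] and *[lang="en"] are equally specific, since each contains one attribute selector and the universal selector adds nothing. In a tie the later rule wins, so the order of the two rules matters. A minimal sketch, assuming a page whose root element is <html lang="en">:

```css
/* Italicizes every element that has a lang attribute, including the
   root element; font-style is inherited, so the whole page follows. */
*[lang] { font-style: italic; }

/* Equal specificity, so this rule must come after the one above.
   It resets elements marked as English, leaving only the
   non-English elements (such as lang="ja") in italics. */
*[lang="en"] { font-style: normal; }
```

This describes only what a browser that implements attribute selectors should do; it does not explain the behavior Sean saw in IE.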

Um ... I guess I get this for not previewing, but when it says “HTML is not allowed,” I expected it to turn <SPAN> into &lt;SPAN&gt; ... not to actually remove the tag entirely ...

My mistake ...

Posted by Sean Conner on 5 February 2003 (Comment Permalink)

I can confirm that when I view this site now Phoenix prompts me to install Japanese language support.

Posted by john on 5 February 2003 (Comment Permalink)

Same with Mozilla, which makes sense, I guess.

Posted by David on 5 February 2003 (Comment Permalink)

I agree with Joe Clark that CITE should be used instead of SPAN or I on references like these. At least, the first time they appear. Using a SPAN afterwards would be okay. That "loanwords are italicized in English" is purely a matter of style (not just English style, but CSS style as well).

And Joe, at C, we're waiting for you.

Posted by Kris on 5 February 2003 (Comment Permalink)

Oh... Phoenix and Mozilla do it too... On Windows, of course, not on Linux. So it's a Windows thing... And that ruins my solution with "xml:lang" since Phoenix and Mozilla understand that attribute perfectly. :(

Anyway, lots of comments here confuse things badly. A foreign word can be most anything inside an HTML page. It does not become "cite" or "dfn" just by being foreign. If it actually is a citation, or a defining instance, then do mark it up as such - and put 'lang="whatever"' (and/or 'xml:lang="whatever"' if you use XHTML). But the _semantic role_ of the foreign word is a completely different thing, and that has nothing to do with its being foreign. If it's just another word in the text, and has no special meaning for which there is HTML markup, then we need to use a "span" element in order to have somewhere to put the "lang" (and/or "xml:lang") attributes.

Posted by Bertilo on 5 February 2003 (Comment Permalink)

This discussion is now closed. My thanks to everyone who contributed.

© Copyright 2007 Jonathon Delacour