Tuesday 18 June 2002

Accessibility tip 01: DOCTYPE

Seems like grammar is in the air. Mark Pilgrim’s first accessibility tip begins:

You start your sentences with a capital letter; start your HTML with a DOCTYPE. It’s just basic grammar.

Well, I’m off to a promising start. All my Movable Type templates begin with a DOCTYPE:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"

Since I write most of my posts in Dreamweaver MX, I’ve noticed that the default Dreamweaver XHTML document actually starts with the line:

<?xml version="1.0" encoding="iso-8859-1"?>

(I think I removed this because it could mess up the RSS Auto-discovery mechanism. Or something. In any case, the character encoding is specified in a meta-tag in the document HEAD.)

And Dreamweaver’s HTML tag reads:

<html xmlns="http://www.w3.org/1999/xhtml">

whereas mine says:

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">

No doubt, someone will explain whether the added lang="en" attributes are necessary.

Uh-Oh! Here’s trouble!

After reading the W3C’s explanation of the various flavors of markup, I’ve realized I didn’t have to settle for XHTML Transitional.

XHTML 1.0 Strict - Use this when you want really clean structural mark-up, free of any tags associated with layout. Use this together with W3C’s Cascading Style Sheet language (CSS) to get the font, color, and layout effects you want.

XHTML 1.0 Transitional - Many people writing Web pages for the general public to access might want to use this flavor of XHTML 1.0. The idea is to take advantage of XHTML features including style sheets but nonetheless to make small adjustments to your mark-up for the benefit of those viewing your pages with older browsers which can’t understand style sheets. These include using BODY with bgcolor, text and link attributes.

But, before changing the DOCTYPE in my templates, I thought I’d check that my weblog index page still validated as XHTML Transitional. It doesn’t (although it used to). The W3C’s validator found dozens of errors, all related to links to other sites.

For example, in my previous post I included Steve Himmer’s link to the Amazon search results for “Georges Perec.”

<a href="http://www.amazon.com/exec/obidos/search-handle-url/index=books&field-keywords=georges%20perec&bq=1/103-7928379-7451003">

The validator returned four errors associated with this URL:

I understand that (quoting A List Apart) “DOCTYPEs are a key component of compliant web pages: your markup and CSS won’t validate without them” and “DOCTYPES are also essential to the proper rendering and functioning of web documents in compliant browsers like Mozilla, IE5/Mac, and IE6/Win.”

But if a link to a page at Amazon causes my weblog page not to validate, what’s to be done?

<later>Answer: escape the ampersands.</later>
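For anyone generating links from a script rather than typing them by hand, here is a minimal Python sketch of that escaping step. The helper name href_attribute is made up for illustration; html.escape is the standard library call doing the real work.

```python
from html import escape

def href_attribute(url: str) -> str:
    """Build an href attribute, escaping &, <, > and quotes in the URL.

    href_attribute is a hypothetical helper name used only for this sketch.
    """
    return 'href="' + escape(url, quote=True) + '"'

# The Amazon search URL quoted in the post, with its raw ampersands.
url = ("http://www.amazon.com/exec/obidos/search-handle-url/"
       "index=books&field-keywords=georges%20perec&bq=1/103-7928379-7451003")

print(href_attribute(url))
```

The helper should be applied once, to the raw URL. Running it over a URL that is already escaped would turn each &amp; into &amp;amp;.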



I can't help with the lang="en" bit, but a good reason for leaving out the xml version line is that it drops IE6 out of standards mode and back into quirks mode - IE needs the first line of the document to be a doctype if it's going to use standards mode. This can be quite useful in that it lets you create standards compliant pages but still have IE6 render them in quirks mode (should you want to do such a thing).

Posted by: Simon Willison on 18 June 2002 at 09:16 AM
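Simon's rule of thumb can be turned into a quick template check. The sketch below only models the rule as he states it (a DOCTYPE has to come before anything else); it is not a description of how IE6 actually decides between modes. Ignoring leading whitespace is an assumption, and the function names are invented for illustration.

```python
def first_construct(document: str) -> str:
    """Classify what a document opens with.

    Assumption for this sketch: leading whitespace is ignored.
    """
    head = document.lstrip().lower()
    if head.startswith("<!doctype"):
        return "doctype"
    if head.startswith("<?xml"):
        return "xml declaration"
    return "other"

def doctype_comes_first(document: str) -> bool:
    """True when nothing precedes the DOCTYPE (per the rule in the comment above)."""
    return first_construct(document) == "doctype"

DOCTYPE = ('<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" '
           '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">')

my_template = DOCTYPE + '\n<html xmlns="http://www.w3.org/1999/xhtml">'
dreamweaver_default = '<?xml version="1.0" encoding="iso-8859-1"?>\n' + my_template

print(doctype_comes_first(my_template))
print(first_construct(dreamweaver_default))
```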

With those errors, your pages aren't just invalid, they're malformed. This means that even non-validating XML parsers will fail to parse them.

Since the ampersand is the character that starts an entity or character reference in XML, you have to escape the characters that you want to be treated as literal ampersands in your document.

Change your link to look like this:

<a href="http://www.amazon.com/exec/obidos/search-handle-url/index=books&amp;field-keywords=georges%20perec&amp;bq=1/103-7928379-7451003">

Note that when I typed that in, I had to escape the ampersand twice, in a sense, to get it to appear the way that you need to type it in.

This FAQ might help if you have more questions:


Posted by: Jason Diamond on 18 June 2002 at 11:05 AM
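Jason's point about well-formedness is easy to check with any XML parser. The sketch below uses Python's xml.etree.ElementTree, which does not validate against a DTD, and feeds it the link both ways. The is_well_formed helper and the "Georges Perec" link text are illustrative additions, not part of the original markup.

```python
import xml.etree.ElementTree as ET

def is_well_formed(fragment: str) -> bool:
    """Return True if a non-validating XML parser accepts the fragment."""
    try:
        ET.fromstring(fragment)
    except ET.ParseError:
        return False
    return True

# The link with raw ampersands, as it first appeared in the post.
raw_link = ('<a href="http://www.amazon.com/exec/obidos/search-handle-url/'
            'index=books&field-keywords=georges%20perec&bq=1/103-7928379-7451003">'
            'Georges Perec</a>')

# The same link with each ampersand written as &amp;.
escaped_link = raw_link.replace("&", "&amp;")

print(is_well_formed(raw_link))
print(is_well_formed(escaped_link))
```

The raw version is rejected because the parser reads each & as the start of an entity reference and never finds the closing semicolon.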

It's actually all part of my secret diabolical plan to ruin your chances of validations.

<strike>Enron</strike> My employers like to call it 'Rolling Invalidations'.

Posted by: steve on 18 June 2002 at 11:29 AM

Simon, thanks for explaining why one should remove the XML line.
Jason, thank you too, particularly for pointing me to the Ampersands for URLs explanation. What made it really clear was his statement that "the document is interpreted before it is acted on."

The thing you have to keep in mind is that the document is interpreted before it is acted on. This is true for XML and HTML. When <a href="http://foo/bar?field1=hello&amp;field2=world"> is in the HTML, it is interpreted to mean
an "a" element with an "href" attribute having value "http://foo/bar?field1=hello&field2=world"
and this is later used to construct an anchor that, when activated, will take the user to http://foo/bar?field1=hello&field2=world.

Steve, thanks for nothing. ;-)

Posted by: Jonathon Delacour on 18 June 2002 at 12:36 PM
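The "interpreted before it is acted on" idea can be seen directly by asking a parser what it thinks the href value is. Here is a small Python sketch using the standard library's html.parser; the HrefCollector class and extract_hrefs function are names invented for this example.

```python
from html.parser import HTMLParser

class HrefCollector(HTMLParser):
    """Collect href values from <a> tags, as the parser interprets them."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    # The parser has already replaced entity references
                    # in the attribute value by the time we see it.
                    self.hrefs.append(value)

def extract_hrefs(markup: str) -> list:
    collector = HrefCollector()
    collector.feed(markup)
    collector.close()
    return collector.hrefs

markup = '<a href="http://foo/bar?field1=hello&amp;field2=world">link</a>'
print(extract_hrefs(markup))
# prints ['http://foo/bar?field1=hello&field2=world']
```

The &amp; exists only in the source text. By the time anything follows the link, the value holds a plain ampersand.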

Anytime. As they say in 'Porgy and Bess', I've got plenty of nothing.

Well, they say nuttin', but I don't.

Posted by: steve on 18 June 2002 at 09:22 PM

On lang="en":

HTML added the lang attribute to the html tag long before XML came around. When XML came around, internationalization was a big deal in its construction, so it explicitly created an xml:lang attribute that would work not only in (X)HTML, but in *any* XML vocabulary.

The xml: prefix (as well as any prefix beginning with "xml", "XML", or any mixed-case flavor of those letters) is reserved for the XML working group on the W3C to create such XML-wide attributes. (xmlns: is another such.)

So doing both is another case of belt and suspenders. Non-XML-grokking HTML engines will grab the lang attribute; XML-grokking engines can grab either.

Posted by: Dorothea Salo on 19 June 2002 at 12:03 AM
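To see how an XML-aware parser treats the two attributes Dorothea describes, here is a rough Python sketch using xml.etree.ElementTree. The declared_languages helper is hypothetical, and the head and body content in the sample documents is filler added so they parse on their own.

```python
import xml.etree.ElementTree as ET

# The namespace that the reserved xml: prefix is bound to.
XML_NAMESPACE = "http://www.w3.org/XML/1998/namespace"

def declared_languages(xhtml: str) -> dict:
    """Report the lang and xml:lang values on the root element (None if absent)."""
    root = ET.fromstring(xhtml)
    return {
        "lang": root.get("lang"),
        "xml:lang": root.get("{%s}lang" % XML_NAMESPACE),
    }

BODY = "<head><title>Example</title></head><body></body></html>"

# Root element as written in my templates.
mine = ('<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">'
        + BODY)

# Root element as Dreamweaver writes it by default.
dreamweaver = '<html xmlns="http://www.w3.org/1999/xhtml">' + BODY

print(declared_languages(mine))
print(declared_languages(dreamweaver))
```

An XML parser sees lang as an ordinary attribute and xml:lang as an attribute in the reserved XML namespace, which is why the two can sit side by side without clashing.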

This discussion is now closed. My thanks to everyone who contributed.

© Copyright 2002-2003 Jonathon Delacour