XHTML and the sixth day of creation

12 March 2001

On the first day God created SGML as a way of structuring documents so that they would have something to live up to. (Any resemblance of God to Charles Goldfarb or any other creature is unintentional.) Tim Berners-Lee was shown SGML and saw that it was good but waaaay too complex.

So, on the second day, Berners-Lee created HTML and saw that it was good and actually usable. Because HTML had a fixed and determined set of elements (paragraphs, headings, bulleted lists, etc.), a browser could look at HTML and know what to expect. Thus was the world of documents made simpler. Much simpler. Much much simpler. Too simple. Over-simplified. And inflexible.

So, on the third day, Jon Bosak, Tim Bray, Michael Sperberg-McQueen and some others created XML, which, like SGML, enabled page designers to create their own types of elements and their own pre-defined document types. And they looked at XML and said that it was way cool and just what we need, for XML documents can be validated against their document type definitions (DTDs) and can be structured so that a machine can read them and know which piece of text is a part number and which is a dollar amount.
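To make the part-number-versus-dollar-amount point concrete, here is a minimal sketch in Python using the standard library's ElementTree parser. The element names (order, item, partNumber, price), the currency attribute and the sample values are invented for illustration; they come from no real document type. Note too that ElementTree only checks that the document is well-formed and does not validate it against a DTD.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# A made-up order document; the tag names are assumptions for this sketch.
ORDER_XML = """
<order>
  <item>
    <partNumber>A-1138</partNumber>
    <description>Toaster-oven heating element</description>
    <price currency="USD">19.95</price>
  </item>
</order>
"""


def read_items(xml_text):
    """Return the part number and price of every <item> in the order."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.findall("item"):
        price = item.find("price")
        items.append({
            # The markup, not guesswork, says which text is which.
            "part_number": item.findtext("partNumber"),
            "price": Decimal(price.text),
            "currency": price.get("currency"),
        })
    return items


if __name__ == "__main__":
    for entry in read_items(ORDER_XML):
        print(entry["part_number"], entry["price"], entry["currency"])
```

The program never has to guess that "19.95" is money and "A-1138" is a part number; the tags say so.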

On the fourth day, the world looked at how the Web was developing and looked at XML and saw that maybe they needed something more. XML was unwieldy for some of the non-PC applications that were getting plugged into the Web -- everything from cable boxes to refrigerators -- and that XML stuff was still pretty hard to do. Plus, XML isn't backwards compatible with the older Web browsers. Even HTML, because it's so flexible and people write it so sloppily, requires multi-megabyte interpreters (called "browsers") to be understood.

And so, on the sixth day (on the fifth day everyone downloaded everything they could before Napster was shut off), XHTML was created. XHTML is compatible with HTML 4, so if you develop your pages using it those pages will still work in browsers that aren't so old that they choke on JavaScript. And, of course, XHTML can be read by anything that can read XML, for it is technically an XML document specification. While XHTML is less flexible than the XML it's written in (for it has a fixed tagset), it's a stricter disciplinarian than HTML; browsers are currently happy to read even the sloppiest of HTML pages, but to produce a valid instance of XHTML, authors have to remember to do things like match every start-tag with the appropriate end-tag, use only lowercase for the tag names, get tag-nesting right and put attribute values into quotes. The free ride is over. But this discipline is required if Web pages are going to be read by low-wattage applications like household appliances. (Yes, our toaster-ovens are now calling the tune.)
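The discipline described above can be sketched with an ordinary XML parser, here Python's standard ElementTree. The two helper functions and the sample snippets are hypothetical, written for this illustration. The first helper tests only XML well-formedness (matched end-tags, proper nesting, quoted attributes), which is a necessary part of XHTML validity but not the whole of it; the second adds the lowercase rule, which a generic XML parser does not enforce because to XML a tag named P is as good as one named p.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """True if an XML parser accepts the markup: every tag closed,
    nesting correct, attribute values quoted."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


def has_lowercase_tags(markup):
    """True if every element name is lowercase, as XHTML requires.
    Assumes the markup is already well-formed."""
    root = ET.fromstring(markup)
    return all(el.tag == el.tag.lower() for el in root.iter())


# The kind of HTML browsers forgive: unquoted attribute, no end-tags
# for P and BR, and B/I closed in the wrong order.
SLOPPY_HTML = '<P class=intro>Hello<BR><B><I>world</B></I>'

# The same content with the free ride over.
TIDY_XHTML = '<p class="intro">Hello<br /><b><i>world</i></b></p>'

# Well-formed XML, but still not XHTML because of the uppercase tag.
SHOUTING = '<P>Hello</P>'

if __name__ == "__main__":
    print(is_well_formed(SLOPPY_HTML))   # False
    print(is_well_formed(TIDY_XHTML))    # True
    print(has_lowercase_tags(SHOUTING))  # False
```

A toaster-oven that only has to cope with the second kind of page needs a far smaller parser than one that must second-guess the first.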

There are three initial basic types of XHTML documents. "Strict" is a minimal set of tags, shorn of the presentational markup. "Transitional" will let you do all your fancy-ass HTML formatting tricks. "Frameset" is for the loser pages that use HTML frames. You can use style sheets with any of these, thus regaining the formatting capabilities that have been stripped out.

Will XHTML replace HTML? While some tag junkies may think so because they believe the universe is ultimately rational, there's not a chance in hell, if only because the first browser that refuses to show you an HTML page because it's not properly done in XHTML is the browser you'll throw off your desktop. But XHTML is very likely to become the standard for people creating Web pages for a living, for it adds enough rigor to make their work reusable both by a wide range of devices and by computing applications trying to make sense of the loveable mess we call the Web.
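A page announces which of the three document types it follows with a DOCTYPE declaration on its first lines. The sketch below lists the three declarations from the XHTML 1.0 specification and builds a bare-bones page around one of them. The minimal_page helper is hypothetical, and as a simplification it handles only the Strict and Transitional types, since a Frameset page has a frameset element where the body would be.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Public identifiers and DTD locations as given in XHTML 1.0.
XHTML_DOCTYPES = {
    "strict": '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
              '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">',
    "transitional": '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" '
                    '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">',
    "frameset": '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN" '
                '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">',
}

XHTML_NAMESPACE = "http://www.w3.org/1999/xhtml"


def minimal_page(flavor, title, text):
    """Build a tiny XHTML page of the given type (hypothetical helper)."""
    if flavor == "frameset":
        # Frameset pages replace <body> with <frameset>; out of scope here.
        raise ValueError("this sketch covers only strict and transitional")
    return (
        XHTML_DOCTYPES[flavor] + "\n"
        + '<html xmlns="' + XHTML_NAMESPACE + '">\n'
        + "<head><title>" + escape(title) + "</title></head>\n"
        + "<body><p>" + escape(text) + "</p></body>\n"
        + "</html>"
    )


if __name__ == "__main__":
    page = minimal_page("strict", "Sixth day", "Toasters & browsers")
    print(page)
    # An XML parser reads the result without complaint.
    print(ET.fromstring(page).tag)
```

Parsing the page this way confirms only that it is well-formed; checking it against the named DTD takes a validating parser, which ElementTree is not.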

Resources:

General XHTML and reference:

http://www.xhtml.org/

http://www.w3.org/TR/xhtml1/

http://www.wdvl.com/Authoring/Languages/XML/XHTML/dif.html

XML:

http://www.xml.com

http://www.xml.org

David Weinberger is editor of the Journal of the Hyperlinked Organization.