Archives - Strategy


7 September 2000 - Three flavors of XHTML 1.0

XHTML 1.0 defines its task as "A Reformulation of HTML 4 in XML 1.0", and in achieving that it continues the precedent set by HTML 4 of having three different definitions of the language. All three definitions can be used to create valid XHTML, as each has its own Document Type Definition (DTD).

The Transitional DTD is probably the closest to HTML as commonly practiced on the Web. It includes a full range of formatting-oriented markup and supports the target attribute for linking between frames.

The Frameset DTD provides the markup needed to build frame-based sites, like the frameset, frame, and noframes elements. The Frameset DTD is intended for documents containing frames, not documents which appear inside of frames but don't contain frames themselves. Otherwise, it is very much like the Transitional DTD.

The Strict DTD represents XHTML the way the W3C would like to see it. Deprecated elements (like isindex) have been removed, formatting elements and attributes (like the font element and the align attribute) stripped, and all support for frames (including the target attribute) removed. Formatting and presentation are largely left to Cascading Style Sheets (CSS).

Moving forward into XHTML 1.1, it looks like the W3C is going to use the Strict DTD as its foundation, though they've gone to the trouble of creating modules representing the features the Strict DTD lacks. When XHTML 1.1 comes out, this may be a powerful motivation for learning more about XHTML Modularization.

An even simpler version of XHTML, XHTML Basic, strips down the XHTML 1.1 vocabulary to a minimum level for communicating in environments where full support for XHTML may not be available. This could include cellphones, PDAs, embedded browsers, or even simple XML programs that need to reuse simple textual markup from XHTML.

Developers can choose which version of XHTML to use on a document-by-document basis. Most sites converting from legacy HTML will probably find it easiest to move to the Transitional DTD. Sites using frames will likely have to use the Frameset and Transitional DTDs. Developers who want a head start moving toward XHTML 1.1 - and can live without support for frames - can use the Strict DTD.
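As a rough illustration of that document-by-document choice, the Python sketch below lists the three DOCTYPE declarations from the XHTML 1.0 Recommendation and guesses which one a document needs. The suggest_flavor helper and its short lists of "legacy" elements and attributes are assumptions made for this example; they cover only a few of the features the Strict DTD drops, and the function is no substitute for a validating parser.

```python
import xml.etree.ElementTree as ET

# DOCTYPE declarations as listed in the XHTML 1.0 Recommendation.
DOCTYPES = {
    "strict": '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
              '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">',
    "transitional": '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" '
                    '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">',
    "frameset": '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN" '
                '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">',
}

XHTML = "{http://www.w3.org/1999/xhtml}"

# Illustrative, deliberately incomplete lists of markup the Strict DTD omits.
LEGACY_ELEMENTS = {"font", "center", "isindex"}
LEGACY_ATTRIBUTES = {"target", "bgcolor"}


def suggest_flavor(xhtml_source):
    """Hypothetical helper: guess which XHTML 1.0 DTD a document needs."""
    root = ET.fromstring(xhtml_source)
    names, attrs = set(), set()
    for el in root.iter():
        names.add(el.tag.replace(XHTML, ""))  # drop the namespace prefix
        attrs.update(el.attrib)               # collect attribute names
    if "frameset" in names:
        return "frameset"
    if names & LEGACY_ELEMENTS or attrs & LEGACY_ATTRIBUTES:
        return "transitional"
    return "strict"


page = ('<html xmlns="http://www.w3.org/1999/xhtml"><head><title>t</title></head>'
        '<body><p><a href="menu.html" target="nav">Menu</a></p></body></html>')
print(suggest_flavor(page))            # this page uses the target attribute
print(DOCTYPES[suggest_flavor(page)])
```

A page that sits inside a frame and links into another frame ends up with the Transitional DTD here, while the document that actually contains the frameset element gets the Frameset DTD.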


11 September 2000 - XML's perspective on information

For all of its limitations, HTML has done a remarkable job of presenting information in a form that both users and content developers can understand. Everything is a document, and is described in terms of a document - text, headings, images, tables, etc. While the markup might be a little obscure sometimes, and can certainly get obfuscated, the general structure of documents made it fairly easy to figure out how to put information into an HTML document, and left an enormous amount of room for designers to present information in creative ways.

XML is a tool for creating documents, but these aren't necessarily documents in the HTML or paper senses of the word. An XML document is a series of characters that contains structured and labeled information, with a well-defined beginning and end. While these 'documents' can be used to convey traditional document-like structures, they can also convey programming object structures, database tables, lists of information, and nearly any other structure you can create with a computer.
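To make the idea of a non-document 'document' concrete, here is a small Python sketch that reads an XML document shaped like a database table into a list of records. The inventory vocabulary (inventory, item, sku, name, quantity) is invented purely for this example.

```python
import xml.etree.ElementTree as ET

# A made-up vocabulary: each <item> plays the role of one table row.
source = """<inventory>
  <item sku="A100"><name>Widget</name><quantity>12</quantity></item>
  <item sku="B200"><name>Gadget</name><quantity>3</quantity></item>
</inventory>"""

rows = [
    {
        "sku": item.get("sku"),
        "name": item.findtext("name"),
        "quantity": int(item.findtext("quantity")),
    }
    for item in ET.fromstring(source).iter("item")
]

for row in rows:
    print(row)
```

Nothing here looks like a page of text, yet the same parser that reads XHTML reads it without complaint.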

XHTML takes advantage of XML's orderly structures and clear labels to clean up HTML a bit, but it's also preparing the way for a world in which developers - both application developers and Web document developers - can include their own structures and labels within an HTML framework, or even include HTML's structures and labels within whatever XML framework they come up with.

Developers moving to XHTML from HTML may want to take a closer look at the possibilities XML opens up, and consider how they might like to extend the familiar HTML vocabulary for their own projects. Creating a vocabulary doesn't magically create applications capable of processing that vocabulary, but it does make it possible to build more interesting applications that go beyond the traditional (perhaps eventually even legacy) 'Web browser'.


9 October 2000 - Finding your way among XHTML specs, Part I

The W3C has created a number of different Recommendations which rely on, cross-reference, and influence each other. Developers trying to work with XHTML may find that they need to learn Cascading Style Sheets (CSS) as well, while some developers may need to extend their understanding of the Document Object Model (DOM), and others may be exploring Namespaces in XML.

In this two-part tip, we'll start by looking at XHTML 1.0 and its supporting specs, and then move on in the next tip to XHTML 1.1 and the new features and specs developers may (or may not) need to learn.

It's possible to use XHTML 1.0 exactly the same way as HTML 4.0. Most HTML developers never learned to read a Document Type Definition (DTD), and built documents using their understanding of document structures gleaned from tools, experience, and reference material rather than by reading the formal outlines presented in the HTML 4.0 DTDs.

Similarly, there's no requirement that developers understand DTDs for them to use XHTML 1.0. While strictly conforming validating XHTML processors will check the document structures against those DTDs, developers can base their document structures on reference material that is friendlier to humans and included in many XHTML books.

Developers do need to understand the rules for making their XHTML documents into well-formed XML. Nesting tags properly and using the empty tag syntax appropriately will take care of most of these needs, and the XHTML 1.0 specification is a fairly self-contained description of them.
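A quick way to see these rules in action is to hand a fragment to any XML parser. The Python sketch below uses the standard library's ElementTree for that purpose; the is_well_formed helper and the sample fragments are made up for illustration, and the check covers well-formedness only, not validity against an XHTML DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Return True if an XML parser accepts the markup, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


samples = {
    "unclosed br": "<p>one<br>two</p>",
    "misnested tags": "<p><b><i>text</b></i></p>",
    "unquoted attribute": "<p class=intro>text</p>",
    "xhtml style": '<p class="intro">one<br />two <b><i>text</i></b></p>',
}

for label, markup in samples.items():
    print(label, "->", is_well_formed(markup))
```

Only the last sample passes: it closes the empty br element with the empty tag syntax, nests b and i properly, and quotes its attribute value.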

It's probably a very good idea for developers planning to make the transition to XHTML to become acquainted with Cascading Style Sheets, if they haven't already. CSS gives developers very flexible control over how their information is presented while requiring very little modification of the (X)HTML document itself. Developers planning projects that will adhere to the Strict DTD will probably need to use CSS if their formatting plans go beyond the extremely basic.

Developers already using the Document Object Model (DOM) will find themselves at home in XHTML, which uses the same DOM features as HTML. DOM programmers can also apply their skills to generating and manipulating XHTML on the server, using XML parsers to read and modify documents before sending them to users. Many of the DOM implementations (notably Internet Explorer 5) currently include extensions which may not work in strictly-conforming environments, but the basic rules and structures are the same.
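As one possible illustration of server-side work, the Python sketch below uses the standard library's minidom, which exposes the familiar DOM method names, to read an XHTML page, change its title, and append a paragraph before serializing it again. The sample page and the "footer" class name are assumptions made for the example.

```python
from xml.dom.minidom import parseString

source = ('<html xmlns="http://www.w3.org/1999/xhtml">'
          '<head><title>Draft</title></head>'
          '<body><p>Hello</p></body></html>')

doc = parseString(source)

# Change the text inside <title> using ordinary DOM navigation.
title = doc.getElementsByTagName("title")[0]
title.firstChild.data = "Published"

# Build a new paragraph and append it to <body>.
body = doc.getElementsByTagName("body")[0]
note = doc.createElement("p")
note.setAttribute("class", "footer")
note.appendChild(doc.createTextNode("Generated on the server"))
body.appendChild(note)

result = doc.documentElement.toxml()
print(result)
```

The calls used here (getElementsByTagName, createElement, setAttribute, appendChild) are the same ones a browser script would use, which is the point of sharing a DOM between client and server.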

The DOM also comes in multiple levels (1, 2, and 3 so far) and is being broken into modules. DOM Level One covers an XML Core and HTML Extensions, while DOM Level Two covers far more and has been broken into modules. DOM Level Three is still getting started, but it finally addresses the key issues of loading and saving documents from a DOM tree. Developers can pick and choose which of these pieces they need, though only DOM Level One has widespread implementation at present.

XHTML 1.0 raises (in Section 3.1.2) the possibility of mixing different XML vocabularies (like MathML, SVG, or SMIL) with XHTML, and notes the use of XML Namespaces to identify different vocabularies. These multi-vocabulary documents are not strictly conforming, however, and it will likely take XHTML Modularization's arrival to make them generally useful. Developers who want to get a head start on vocabulary-mixing may want to take a look at the Namespaces in XML specification, but most developers can stick with XHTML 1.0's single default namespace declaration.
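For those curious about what vocabulary-mixing looks like to a program, here is a small Python sketch that parses an XHTML page containing a MathML fragment and counts the elements belonging to each namespace. The sample page and the vocabulary helper are invented for the example; the two namespace URIs are the ones the W3C assigns to XHTML and MathML.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
MATHML_NS = "http://www.w3.org/1998/Math/MathML"

source = f"""<html xmlns="{XHTML_NS}">
  <head><title>Mixed vocabularies</title></head>
  <body>
    <p>The sum is:</p>
    <math xmlns="{MATHML_NS}"><mi>x</mi><mo>+</mo><mn>1</mn></math>
  </body>
</html>"""


def vocabulary(element):
    """Return the namespace URI of an ElementTree element, or '' if none."""
    tag = element.tag  # ElementTree stores names as '{namespace}local'
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""


counts = {}
for el in ET.fromstring(source).iter():
    ns = vocabulary(el)
    counts[ns] = counts.get(ns, 0) + 1

for ns, n in counts.items():
    print(n, "elements in", ns)
```

The default namespace declaration on html covers ordinary XHTML elements, while the second declaration on math switches vocabularies for that subtree only.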

Developers who want to move forward into mixed vocabularies (or create their own vocabularies) have a lot more to deal with, as we'll explore in the next tip.


13 October 2000 - Finding your way among XHTML specs, Part II

While figuring out XHTML 1.0 can be fairly difficult for many developers, XHTML 1.1 and its likely successors demand a lot more learning, at least for the core of developers who want to take advantage of its capabilities for extending the vocabularies used in XHTML documents.

XHTML 1.1 still uses the familiar HTML vocabulary, though the main thrust of XHTML 1.1 moves far more deeply into XML and its promise of Extensible Markup Language. While XHTML 1.0 described itself as "Extensible Hypertext Markup Language", it did very little to live up to the promise of extensibility, leaving that to non-standard implementations (described in 3.1.2) and future drafts.

The W3C itself has at least three specifications that are prime candidates as extensions to XHTML: MathML, which defines markup for representing mathematical equations; SMIL, the Synchronized Multimedia Integration Language; and SVG, Scalable Vector Graphics, which allows developers to describe images as vectors rather than as bitmaps. All three of these languages are built on an XML base which can be easily mixed with XHTML.

Making this integration work requires the tools provided in Namespaces in XML, the DTDs of XML 1.0, and eventually XML Schemas. These standards are not well-known for their ease of use. Namespaces remain burdened with controversies, DTDs are criticized as unnecessarily complex for their capabilities, and XML Schemas have both fans and opponents.

These heavy requirements may require that Web developers fragment into two classes of markup creators: those whose work sticks to established vocabularies, using documented combinations of XHTML and other vocabularies, and those who create their own vocabularies. The second group of developers will need to know the ins and outs of vocabulary development as well as the integration mechanisms described in Modularization of XHTML, while the first may stick with the definitions provided by XHTML 1.1 and XHTML Basic.

Eventually, XHTML may also come to include XLink and XPointer as critical specifications for creating hypertext links. For now, XHTML developers may continue to treat them as exciting curiosities rather than critical tools, but XHTML development will likely come to include far more than today's HTML.


20 October 2000 - What to convert to XHTML first

While developers are starting to poke at XHTML, most existing Web sites already have enormous quantities of information stored in HTML which may or may not be XHTML-friendly. Worse, pages generated by CGI, ASP, or other scripting technologies may not be easy to change. Where should developers start?

In general, there are two rules that can provide some guidance. First, content that may need to be delivered in multiple formats can take enough advantage of XHTML to justify the cost. Second, some content is simply easy to convert to XHTML - simple documents where Tidy can do all of the work, or new documents where using XHTML from the start inflicts less cost than conversions from older HTML.

The biggest advantage XHTML 1.0 gives developers is the ability to repurpose content using XML tools - you can, for instance, create transformations from XHTML to Wireless Markup Language (WML) using XSLT stylesheets or DOM scripts. If you have documents for which delivery in multiple formats is critical, you may want to consider storing them in XHTML, or even generating XHTML from custom XML vocabularies.
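A DOM-style script for this kind of repurposing might look roughly like the Python sketch below, which pulls the title and paragraphs out of an XHTML page and rebuilds them as a single WML card. The xhtml_to_wml function and the "main" card id are assumptions made for this example; a real WML deck would also need its own DOCTYPE and more careful handling of links, images, and inline markup.

```python
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"


def xhtml_to_wml(xhtml_source):
    """Hypothetical converter: one XHTML page becomes one WML card."""
    page = ET.fromstring(xhtml_source)
    title = page.findtext(f"{XHTML}head/{XHTML}title", default="")

    wml = ET.Element("wml")
    card = ET.SubElement(wml, "card", {"id": "main", "title": title})

    # Copy each paragraph as plain text, flattening inline markup.
    for para in page.iter(f"{XHTML}p"):
        p = ET.SubElement(card, "p")
        p.text = "".join(para.itertext())

    return ET.tostring(wml, encoding="unicode")


page = ('<html xmlns="http://www.w3.org/1999/xhtml">'
        '<head><title>News</title></head>'
        '<body><h1>News</h1><p>Hello <b>world</b></p><p>More soon.</p></body></html>')

deck = xhtml_to_wml(page)
print(deck)
```

This only works because the input is well-formed XML; the same script pointed at typical legacy HTML would fail at the parsing step.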

Converting documents because they're 'easy' is pretty subjective, since everyone seems to have a different definition of 'easy.' Simple HTML, where layout is mostly headlines and paragraphs, can generally be converted with very few blips. Complex layouts using tables, gifs, and the occasional creative whitespace hack can be much more difficult, especially if you feel constrained to preserve the exact look of the original document in every browser environment it served. (Recent browsers make XHTML work much easier.)

New projects can benefit immediately from XHTML's cleaner structures. The discipline of cleanly nested structures will have an effect on the level of discipline in code used to generate documents, and may help Web developers build more maintainable projects. (Yes, I know that's likely a dream.) Developers using client-side scripting to build dynamic HTML sites will also find XHTML easy to work with. Many of these developers have already adopted some of XHTML's strictures to mark structures more precisely.

Even projects which won't benefit immediately may want to transition to XHTML for new work - if nothing else, it means that future transitions should be much simpler.



Copyright 2000 by Simon St.Laurent