“Why is your website HTML 4.01?” (9th May 2006)
Sean Fraser from Elementary Group Standards e-mailed me that question. Here’s my reply, as I sent it. Following it is my lengthy comment on 456 Berea Street, on a page which raised much awareness of Sean’s survey of 50 ‘elite’ standardista websites.
Response to Sean Fraser
Hi Sean, thanks for your interest. I’ll be happy to answer questions on this subject.
It takes me a long time to write articles because I have to fit them in around my day job. I’m not a professional blogger!
Site Surgeon actually uses XHTML 1.1 sent as application/xhtml+xml to devices which support it. Devices which do not mention that MIME type in their HTTP Accept header get an HTML 4.01 Strict document. The script doesn’t check “q values” and has potential bugs, but is available here:
It always sends the XHTML 1.1 version to the W3C validator.
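As a rough illustration, the negotiation described above might look something like the following in Python. This is only a sketch and not the Site Surgeon script: the function names are made up for the example and, unlike that script, it does read q values. Wildcards such as */* are deliberately not counted as "mentioning" the XHTML MIME type.

```python
XHTML = "application/xhtml+xml"
HTML = "text/html"


def parse_accept(header):
    """Return a {media_type: q} mapping for an HTTP Accept header value."""
    prefs = {}
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_type = parts[0].lower()
        if not media_type:
            continue
        q = 1.0  # a missing q parameter means q=1
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat an unreadable q value as "not acceptable"
        prefs[media_type] = q
    return prefs


def choose_content_type(accept_header):
    """Pick XHTML only when the client names it explicitly and does not prefer HTML."""
    prefs = parse_accept(accept_header or "")
    xhtml_q = prefs.get(XHTML, 0.0)
    html_q = prefs.get(HTML, 0.0)
    if xhtml_q > 0 and xhtml_q >= html_q:
        return XHTML
    return HTML
```

A server-side script would call choose_content_type with the request's Accept header, send the result as the Content-Type, and serve the matching version of the document.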
I apply this technique on Site Surgeon to demonstrate the (lack of) differences between correctly authored XHTML and the equivalent HTML. It also shows clients that I can work with modularised XHTML if they need to. In truth, HTML 4.01 Strict would be the better choice for this website since it only uses the elements and attributes available in HTML.
I do use HTML exclusively on my other websites, though:
I do this because HTML performs slightly better than XHTML when sent as text/html. XHTML 1.0 “may” be sent as text/html, but it has:
1. A slightly longer DOCTYPE.
2. Extra baggage in the form of XML-specific attributes such as xml:lang. These are ignored by HTML devices, so are dead weight.
3. “Space-slash” character pairs added to elements which do not need to be closed in the text/html environment. More dead weight.
Items 2 and 3 are treated as invalid markup by HTML user agents. These slightly slow rendering times, as the user agent must perform some (fairly simple) markup corrections. They add filesize, too.
In the text/html environment, many end tags are optional (or even forbidden). Leaving these out can reduce page sizes by a significant amount, especially if there are lots of list items or any data tables.
As such, an XHTML 1.0 page will always be slightly less efficient than the equivalent HTML, even without using all the optimisations HTML allows. Because XHTML 1.0 uses the same elements and attributes as HTML, it can only do the things HTML can do. There is no advantage to using XHTML in the text/html environment.
Since HTML is more efficient in terms of download speeds and processing times, that tips the balance for me.
Longer Reply on 456 Berea Street
To cut a long story short, HTML is the best choice. Indeed, the short list of proofs below is what this longer reply became after I boiled it down.
- An XHTML DOCTYPE doesn’t make browsers process your document using XHTML rules. Only Content-Type can do that, and only in browsers which support it (many don’t).
- XHTML 1.0 compatible with Appendix C is limited to the elements, attributes and techniques of HTML 4.01.
- XHTML 1.1 must not be sent with a text/html Content-Type because it is not compatible with HTML rules.
- HTML 4.01 is always more efficient than the equivalent XHTML.
- HTML does everything XHTML 1.0 compatible with Appendix C can do, yet is more efficient.
- HTML is the better format for use in text/html documents.
The key to this is the Content-Type header being used.
When a server sends a file to a browser, it first sends a few lines of text explaining what it is about to send. These lines of text are called the HTTP Response Headers.
The HTTP Response Headers for this page are:
Transfer-Encoding: chunked
Date: Tue, 20 Jun 2006 11:02:23 GMT
Content-Type: text/html; charset=iso-8859-1
Server: Apache/2.2.0
X-Powered-By: PHP/5.1.2
Vary: Accept,User-Agent
200 OK
You can set these up using your server configuration files. For Apache, these include the .htaccess files. For example, to make the server include a Content-Type header for all HTML files (.html), you’d do something like this:
AddType 'text/html; charset=utf-8' .htm .html
This applies to all types of file. To send all Cascading Style Sheet (CSS) files (.css) with the correct Content-Type header, you’d use something like this:
AddType 'text/css; charset=utf-8' .css
For PNG images (.png) you’d use something like this:
AddType 'image/png' .png
For XHTML documents (.xhtml) you’d use something like this:
AddType 'application/xhtml+xml' .xhtml
The Content-Type header tells the browser what format the data it is about to receive is in. The browser decides how to handle the data according to this header.
When a text/html header is used, the browser processes the document using the rules of HTML.
When an application/xhtml+xml header is used, the +xml part means browsers which support it will process the document using XML rules. The /xhtml part means they can treat the elements as being part of the XHTML namespace (<p> means ‘paragraph’, etc) as standardised in RFC3023.
An XHTML DOCTYPE does not make browsers switch from HTML rules to XHTML rules. Only Content-Type has this effect. If you send a document which uses XHTML markup but uses a text/html Content-Type, browsers will attempt to process it using HTML rules.
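The decision described above can be sketched in a few lines of Python. This is a simplified model rather than how any particular browser is implemented, and the function name parser_for is invented for the example. Note that the function never looks at the document itself, so a DOCTYPE cannot influence the result.

```python
# Media types that select an XML parser even without a "+xml" suffix.
PLAIN_XML_TYPES = {"application/xml", "text/xml"}


def parser_for(content_type):
    """Return which parser a supporting browser would pick for a Content-Type value."""
    # Drop parameters such as "; charset=utf-8" and normalise case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type.endswith("+xml") or media_type in PLAIN_XML_TYPES:
        return "xml"
    if media_type == "text/html":
        return "html"
    return "other"
```

For example, parser_for("text/html; charset=iso-8859-1") selects the HTML parser regardless of whether the markup that follows was written as XHTML.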
HTML is Here to Stay
Many devices do not support application/xhtml+xml, so you must provide a text/html version to make sure everyone can access your website. Your website will mainly be processed using HTML rules because most people are using devices which do not support the rules of XHTML.
IE 7.0 will not support the rules of XHTML, so HTML will remain the mainstream for some years. HTML browsers will be using the web indefinitely.
Furthermore, HTML5 could become a more practical format for commercial use than XHTML 2. This means that XHTML rules may never become the mainstream. Instead, the HTML rules may simply be developed every few years in new versions of HTML, much as they were during the 1990s.
HTML is more Efficient
When you write an XHTML 1.0 page compatible with Appendix C and send it as text/html, your markup is processed using HTML rules. This means your pages have a fair amount of needless baggage:
- Slashes in <meta> tags are treated as invalid attributes or as garbage characters.
- The DOCTYPE is a little longer than the HTML equivalent.
- You have an xmlns attribute adding filesize. This attribute is not valid HTML, so it is ignored when sent in a text/html environment.
- You have xml:lang as well as lang. All of the xml:lang attributes are redundant since HTML browsers will use the lang attribute.
- You have lots of tags which are not required (especially end tags).
In the HTML 4 Elements Table, you can see that the “Start Tag” and “End Tag” of many elements are “Optional”. Optional tags have been allowed in HTML since HTML 2.0 and are a fundamental part of the language, so they are safe to use.
On pages with many paragraphs, tables or lists these add up to a significant amount. Around 5% of filesize can sometimes be saved by using HTML and removing the optional end tags.
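To get a feel for the numbers on your own pages, a sketch like the following can compare byte counts before and after. It is a naive regular-expression pass, not a real HTML processor: it assumes no stray text follows a removed end tag inside the same parent (where dropping the tag would change the meaning), and the function names and sample markup are invented for the example. The saving it reports depends entirely on the page being measured.

```python
import re

# End tags that HTML 4.01 marks as optional (html, head and body left alone here).
OPTIONAL_END_TAGS = ("p", "li", "dt", "dd", "tr", "td", "th",
                     "thead", "tbody", "tfoot", "colgroup", "option")

END_TAG_RE = re.compile(r"</(?:%s)\s*>" % "|".join(OPTIONAL_END_TAGS), re.IGNORECASE)
EMPTY_SLASH_RE = re.compile(r"\s*/>")  # the "space-slash" pairs, e.g. <br />


def to_lean_html(markup):
    """Strip space-slash pairs and optional end tags from XHTML-style markup."""
    markup = EMPTY_SLASH_RE.sub(">", markup)
    return END_TAG_RE.sub("", markup)


def saving_percent(original, lean):
    """Percentage of bytes saved, measured on the UTF-8 encoded text."""
    before = len(original.encode("utf-8"))
    after = len(lean.encode("utf-8"))
    return 100.0 * (before - after) / before


sample = "<ul>\n<li>One</li>\n<li>Two</li>\n</ul>\n<p>Text<br /></p>\n"
lean = to_lean_html(sample)
print(lean)
print("Saved %.1f%% of the bytes" % saving_percent(sample, lean))
```

A tiny sample like this one exaggerates the effect because it is almost all markup; real pages with more text content will show a smaller percentage.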
Correction from Tommy Olsson
The correction below was needed because I misread RFC3023, which I accept. (The bottom of page 5, specifically.)
@Ben: Good summary, but there’s one error in what you wrote. The Content-Type header is only used (by browsers) to choose which parser to use (XML or SGML). That header does not make an XHTML document XHTML, only XML (despite the xhtml in the MIME media subtype).
The thing that says that an XHTML document is really XHTML is the xmlns attribute, with the correct value, on the root element. Of course, that’s ignored for non-XML documents, so it takes a combination of Content-Type: application/xhtml+xml and the proper xmlns.
You can use application/xml, or even text/xml, as the Content-Type and still have the document recognised as XHTML, provided that you have the correct xmlns.
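Tommy’s point can be illustrated with Python’s standard XML parser. In this sketch, which models only the XML-parsed case, a document counts as XHTML when it is well-formed XML and its root element is html in the XHTML namespace. The function name is_xhtml is made up for the example, and the sample documents carry no DOCTYPE to keep things simple.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def is_xhtml(xml_text):
    """True if the text parses as XML and its root is <html> in the XHTML namespace."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return False  # not well-formed, so an XML parser would reject it outright
    # ElementTree spells namespaced tags as "{namespace}localname".
    return root.tag == "{%s}html" % XHTML_NS


with_ns = ('<html xmlns="http://www.w3.org/1999/xhtml">'
           '<head><title>t</title></head><body><p>x</p></body></html>')
without_ns = '<html><head><title>t</title></head><body><p>x</p></body></html>'

print(is_xhtml(with_ns))
print(is_xhtml(without_ns))
```

The second document is perfectly good XML, but without the xmlns attribute its html element is just an element that happens to be called html.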