“Why is your website HTML 4.01?” (9th May 2006)
Sean Fraser from Elementary Group Standards [website offline since early 2011] e-mailed me that question. Here’s my reply, as I sent it. Following it is my lengthy comment on 456 Berea Street, on a page which raised much awareness about Sean’s survey of 50 ‘elite’ standardista websites.
Response to Sean Fraser
Hi Sean, thanks for your interest. I’ll be happy to answer questions on this subject.
It takes me a long time to write articles because I have to fit them in around my day job. I’m not a professional blogger!
Site Surgeon actually uses XHTML 1.1 sent as application/xhtml+xml to devices which support it. Devices which do not mention that MIME type in their HTTP Accept header get an HTML 4.01 Strict document. The script doesn’t check “q values” and has potential bugs, but is available here: [These scripts are pointless so I removed the page some time later.]
It always sends the XHTML 1.1 version to the W3C validator.
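The negotiation described above can be sketched in a few lines of Python. This is not the removed Site Surgeon script, only an illustration: the function names parse_accept and choose_content_type are hypothetical, and unlike the original script it makes a minimal q-value check (a type listed with q=0 counts as refused). It does not rank types against each other.

```python
def parse_accept(header):
    """Return {media_type: q} for an HTTP Accept header value."""
    prefs = {}
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_type = parts[0].lower()
        if not media_type:
            continue
        q = 1.0  # a missing q parameter means full preference
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # assumption: an unparsable q-value counts as refused
        prefs[media_type] = q
    return prefs


def choose_content_type(accept_header):
    """Pick XHTML only when the client names its MIME type with q > 0."""
    prefs = parse_accept(accept_header or "")
    if prefs.get("application/xhtml+xml", 0.0) > 0.0:
        return "application/xhtml+xml"
    return "text/html"  # everyone else gets the HTML 4.01 Strict version
```

A client whose Accept header never mentions application/xhtml+xml, or sends no Accept header at all, falls through to text/html.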
I apply this technique on Site Surgeon to demonstrate the (lack of) differences between correctly authored XHTML and the equivalent HTML. It also shows clients that I can work with modularised XHTML if they need to. In truth, HTML 4.01 Strict would be the better choice for this website since it only uses the elements and attributes available in HTML.
I do use HTML exclusively on my other websites, though:
I do this because HTML performs slightly better than XHTML when sent as text/html. XHTML 1.0 “may” be sent as text/html, but it has:
- A slightly longer DOCTYPE than HTML.
- Extra baggage in the form of XML-specific attributes such as xmlns and xml:lang. These are ignored by HTML devices, so are dead weight.
- The “space-slash” character pairs added to elements which do not need to be closed in the text/html environment. More dead weight.
The second and third items are treated as invalid markup by HTML user agents. These slightly slow rendering times as the user agent must perform some (fairly simple) markup corrections. They add filesize, too.
In the text/html environment, many end tags are optional (or even forbidden). Leaving these out can reduce page sizes by a significant amount, especially if there are lots of list items or any data tables.
As such, an XHTML 1.0 page will always be slightly less efficient than the equivalent HTML, even without using all the optimisations HTML allows. Because XHTML 1.0 uses the same elements and attributes as HTML, it can only do the things HTML can do. There is no advantage to using XHTML in the text/html environment.
Since HTML is more efficient in terms of download speeds and processing times, that tips the balance for me.
Longer Reply on 456 Berea Street
To cut a long story short, HTML is the best choice. Indeed, that short list of proofs is what this longer reply became after I boiled it down.
Summary
- An XHTML DOCTYPE doesn’t make browsers process your document using XHTML rules. Only Content-Type can do that and only in browsers which support it (many don’t).
- XHTML 1.0 compatible with Appendix C is limited to the elements, attributes and techniques of HTML 4.01.
- XHTML 1.1 must not be sent with a Content-Type of text/html because it is not compatible with HTML rules.
- HTML 4.01 is always more efficient than the equivalent XHTML.
- HTML does everything XHTML 1.0 compatible with Appendix C can do, yet is more efficient.
- HTML is the better format for use in text/html documents.
Content-Type
The key to this is the Content-Type header being used.
When a server sends a file to a browser, it first sends a few lines of text explaining what it is about to send. These lines of text are called the HTTP Response Headers.
The HTTP Response Headers for this page are:
Transfer-Encoding: chunked
Date: Tue, 20 Jun 2006 11:02:23 GMT
Content-Type: text/html; charset=iso-8859-1
Server: Apache/2.2.0
X-Powered-By: PHP/5.1.2
Vary: Accept,User-Agent
200 OK
You can set these up using your server configuration files. For Apache, these include the httpd.conf and .htaccess files. For example, to make the server include a Content-Type header for all HTML files (.htm or .html), you’d do something like this:
    AddType 'text/html; charset=utf-8' .htm .html
This applies to all types of file. To send all Cascading Style Sheet (CSS) files (.css) with the correct Content-Type header, you’d use something like this:
    AddType 'text/css; charset=utf-8' .css
For PNG images (.png) you’d use something like this:
    AddType 'image/png' .png
For XHTML documents (.xhtml) you’d use something like this:
    AddType 'application/xhtml+xml' .xhtml
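For readers serving files from a script rather than through Apache, the same extension-to-type mapping can be sketched with Python's standard mimetypes module. This is only an illustration of the idea behind the AddType lines, not a replacement for them: the helper name content_type_for, the choice of which types receive a charset parameter, and the fallback type for unknown extensions are all assumptions.

```python
import mimetypes

# A private table, so the sketch does not change the interpreter-wide defaults.
table = mimetypes.MimeTypes()
for media_type, extensions in {
    "text/html": (".htm", ".html"),
    "text/css": (".css",),
    "image/png": (".png",),
    "application/xhtml+xml": (".xhtml",),
}.items():
    for ext in extensions:
        table.add_type(media_type, ext)

# Assumption: only these text formats get a charset parameter,
# mirroring the AddType examples above.
TEXT_TYPES = {"text/html", "text/css"}


def content_type_for(filename, charset="utf-8"):
    """Build a Content-Type header value from a file name's extension."""
    media_type, _encoding = table.guess_type(filename)
    if media_type is None:
        return "application/octet-stream"  # assumed fallback for unknown files
    if media_type in TEXT_TYPES:
        return f"{media_type}; charset={charset}"
    return media_type
```

Calling content_type_for("page.xhtml") yields the XHTML media type, while content_type_for("index.html") yields text/html with the charset appended.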
The Content-Type header tells the browser what format the data it is about to receive is in. The browser decides how to handle the data according to this header.
When the text/html header is used, the browser processes the document using the rules of HTML.
When the application/xhtml+xml header is used, the +xml part means browsers which support it will process the document using XML rules. The /xhtml part means they can treat the elements as being part of the XHTML namespace (<p> means ‘paragraph’, etc) as standardised in RFC 3023.
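The decision a browser makes from this header can be sketched as follows. This is a simplified model and not how any particular browser is implemented: the function names are hypothetical, the labels "html", "xml" and "other" are invented for the example, and it assumes the browser supports XML media types at all.

```python
from email.message import Message


def parse_content_type(header_value):
    """Split a Content-Type value into (media type, charset or None)."""
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_type(), msg.get_param("charset")


def processing_rules(header_value):
    """Return which set of rules the document would be processed with."""
    media_type, _charset = parse_content_type(header_value)
    if media_type == "text/html":
        return "html"
    # The +xml suffix, or a generic XML type, selects XML processing.
    if media_type.endswith("+xml") or media_type in ("application/xml", "text/xml"):
        return "xml"
    return "other"  # images, stylesheets and so on
```

For the response headers shown earlier, parse_content_type("text/html; charset=iso-8859-1") gives the media type text/html and the charset iso-8859-1, so the page is handled with HTML rules whatever its DOCTYPE says.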
DOCTYPE
The DOCTYPE does not make browsers switch from HTML rules to XHTML rules. Only Content-Type has this effect. If you send a document which uses XHTML markup but uses a Content-Type of text/html, browsers will attempt to process it using HTML rules.
HTML is Here to Stay
Many devices do not support application/xhtml+xml, so you must provide a text/html version to make sure everyone can access your website. Your website will mainly be processed using HTML rules because most people are using devices which do not support the rules of XHTML.
IE 7.0 will not support the rules of XHTML, so HTML will remain the mainstream for some years. HTML browsers will be using the web indefinitely.
Furthermore, HTML5 could become a more practical format for commercial use than XHTML 2. This means that XHTML rules may never become the mainstream. Instead, the HTML rules may simply be developed every few years in new versions of HTML, much as they were during the 1990s.
HTML is more Efficient
When you write an XHTML 1.0 page compatible with Appendix C and send it as text/html, your markup is processed using HTML rules. This means your pages have a fair amount of needless baggage:
- Slashes in <img> and <meta> tags are treated as invalid attributes or as garbage characters.
- Your DOCTYPE is a little longer than the HTML equivalent.
- You have an xmlns attribute adding filesize. This attribute is not valid HTML, so it is ignored when sent in a text/html document.
- You have xml:lang as well as lang. All of the xml:lang attributes are redundant since HTML browsers will use the lang attribute.
- You have lots of tags which are not required (especially end tags).
In the HTML 4 Elements Table, you can see that the “Start Tag” and “End Tag” of many elements are “Optional”. Optional tags have been allowed in HTML since HTML 2.0 and are a fundamental part of the language, so they are safe to use.
On pages with many paragraphs, tables or lists these add up to a significant amount. Around 5% of filesize can sometimes be saved by using HTML and removing the optional end tags.
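To get a feel for where the saving comes from, here is a small Python sketch that renders the same list with and without the optional </li> end tags and counts the bytes. The list contents and function names are made up for the example, and the result for a real page depends entirely on its markup.

```python
def render_list(items, close_tags):
    """Render a <ul>; the </li> end tag is optional in HTML 4.01, </ul> is not."""
    end = "</li>" if close_tags else ""
    lines = ["<ul>"] + [f"<li>{item}{end}" for item in items] + ["</ul>"]
    return "\n".join(lines)


def bytes_saved(items):
    """Bytes saved by omitting the optional </li> end tags."""
    with_end_tags = render_list(items, close_tags=True).encode("utf-8")
    without_end_tags = render_list(items, close_tags=False).encode("utf-8")
    return len(with_end_tags) - len(without_end_tags)


# Hypothetical sample data: each omitted </li> saves five bytes.
sample = ["Apples", "Bananas", "Cherries"]
saving = bytes_saved(sample)
```

Each omitted </li> is worth five bytes, so the saving grows in step with the number of list items; the same reasoning applies to </p>, </td> and </tr>.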
Correction from Tommy Olsson
The correction below addresses my misreading of RFC 3023, which I accept. (See the bottom of page 5, specifically.)
@Ben: Good summary, but there’s one error in what you wrote. The Content-Type header is only used (by browsers) to choose which parser to use (XML or SGML). That header does not make an XHTML document XHTML, only XML (despite the xhtml in the MIME media subtype).
The thing that says that an XHTML document is really XHTML is the xmlns attribute, with the correct value, on the root element. Of course, that’s ignored for non-XML documents, so it takes a combination of Content-Type: application/xhtml+xml and the proper xmlns attribute.
You can use application/xml, or even text/xml, as the Content-Type and still have the document recognised as XHTML, provided that you have the correct xmlns attribute.
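Tommy's rule, an XML media type plus the right xmlns on the root element, can be expressed as a short Python check. This is a sketch under stated assumptions: the function name is hypothetical, the set of XML media types is limited to the three named above, and requiring the root element to be called html is a narrowing added for the example.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
# Only the three media types mentioned in the correction above.
XML_TYPES = {"application/xhtml+xml", "application/xml", "text/xml"}


def is_recognised_as_xhtml(media_type, document):
    """True when an XML parser is chosen and the root is in the XHTML namespace."""
    if media_type not in XML_TYPES:
        return False  # the HTML parser is used, so xmlns is ignored
    root = ET.fromstring(document)
    # ElementTree reports namespaced tags as "{namespace}localname".
    return root.tag == f"{{{XHTML_NS}}}html"
```

The same markup therefore gives different answers depending on the header: with text/html it is never XHTML, and with application/xml it is XHTML only if the xmlns attribute is present and correct.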