Smart Headers & HTML5 (30th November 2008)

This comparison is for HTMLWG, to inform further changes to the header association features for data tables in HTML5. The idea arose during TPAC 2008. It is tracked as Action 85 in HTMLWG.

Quick Reference

  1. Similarities
  2. Differences
  3. Ben’s Advice

That’s All, Folks!

This comparison has influenced HTML5:

Ben's research was instrumental to the changes made here; his research probably had more of an effect on the spec than all the other discussions put together. I cannot emphasise enough how much more important objective analysis and logical argumentation is compared to opinions and assertions.

Ian Hickson

Final update was on 20th December 2008.

Feedback Finished

Feedback was welcomed until 15th December 2008. I extended this to 20th December 2008.

Spread the word to any other lists, websites or individuals you think are relevant.

Introduction

Smart Headers is compared with HTML5. James Graham has built a prototype for both.

HTML4 is not included due to ambiguities which become significant when assessing exactly what happens in genuine data tables.

Similarities

3 Mechanisms Applied

Data Cells without Header Cells

Table Header Cells are <th>

Regular Associations are Header Cell → Data Cells

Optimised for Regular Associations

A regular data table has each header cell directly above or to the left of all the data cells it must be associated with. Smart Headers and HTML5 are optimised for this case by making associations automatically, without requiring scope or headers+id. Specifically:

Irregular Associations are Header Cells ← Data Cell

Upon finding a data cell with the headers attribute, both algorithms will search for the corresponding header cells:

  1. Split the headers attribute value into its constituent tokens.
  2. For each token, find the first element in the document with a matching id (as getElementById would).
  3. If that element is a <th> in the current <table>, associate it with the data cell.
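
The lookup steps above can be sketched in a few lines of Python. This is only an illustration, not code from either prototype: the Element class, its table field and the function names are hypothetical stand-ins for a real DOM, and the document is modelled as a flat list in document order.

```python
from dataclasses import dataclass


@dataclass
class Element:
    """Hypothetical stand-in for a DOM element; not a real DOM API."""
    tag: str
    id: str = ""
    table: str = ""      # name of the table the element belongs to, if any
    headers: str = ""    # raw value of the headers attribute


def first_element_with_id(document, token):
    """Return the first element in document order whose id equals token."""
    for element in document:
        if element.id == token:
            return element
    return None


def explicit_headers(document, cell):
    """Resolve a cell's headers attribute to header cells in its own table."""
    found = []
    for token in cell.headers.split():                    # step 1: split into tokens
        target = first_element_with_id(document, token)   # step 2: first match by id
        if target is None or target is cell:
            continue
        # Final step: only a <th> in the same table is associated.
        if target.tag == "th" and target.table == cell.table:
            found.append(target)
    return found


doc = [
    Element("div", id="price"),              # an earlier element reuses the id
    Element("th", id="name", table="t1"),
    Element("th", id="price", table="t1"),
    Element("th", id="qty", table="t2"),     # header cell in a different table
]
cell = Element("td", table="t1", headers="name price qty missing")
print([h.id for h in explicit_headers(doc, cell)])   # prints ['name']
```

In this sketch, "price" is lost because the first element with that id is not a header cell, and "qty" is lost because its header cell sits in another table.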

Irregular Associations via headers+id

An irregular data table has one or more header cells in a position which is not directly above or to the left of all the data cells it must be associated with. Smart Headers and HTML5 support these cases but require headers+id. Specifically:

Incremental Association

James Graham: “In principle I believe either algorithm could be written to run incrementally.”

Arbitrary Levels of Header Cells

Specialist Markup Takes Precedence

Differently Spanned Header Cells (Adjacent or Distant)

Data Cells Spanning Different Spans of Header Cells

Both HTML5 and Smart Headers use 4.9.13.1 Forming a table to determine which slots in a table each cell covers, including spanned cells. As such, each header cell is associated with all cells that cover one or more slots in the area that header cell applies to.
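
To make the idea of slots concrete, here is a much simplified sketch of how spanned cells can be mapped onto a grid. It is not the Forming a table algorithm itself: it assumes well-formed input, ignores overlapping cells and special span values, and the names form_table and cells_in_area are invented for this example.

```python
def form_table(rows):
    """Map each cell name to the set of (column, row) slots it covers.

    rows is a list of rows; each row is a list of (name, colspan, rowspan).
    """
    occupied = {}   # (x, y) -> name of the cell covering that slot
    covers = {}     # cell name -> set of (x, y) slots
    for y, row in enumerate(rows):
        x = 0
        for name, colspan, rowspan in row:
            # Skip slots already taken by a cell spanning down from above.
            while (x, y) in occupied:
                x += 1
            slots = {(x + dx, y + dy)
                     for dx in range(colspan)
                     for dy in range(rowspan)}
            for slot in slots:
                occupied[slot] = name
            covers[name] = slots
            x += colspan
    return covers


def cells_in_area(covers, area):
    """Names of cells covering at least one slot of area, sorted."""
    return sorted(name for name, slots in covers.items() if slots & area)


rows = [
    [("H1", 2, 1), ("H2", 1, 1)],
    [("a", 1, 2), ("b", 1, 1), ("c", 1, 1)],
    [("d", 2, 1)],
]
covers = form_table(rows)
# Assume H1 applies to columns 0-1 in the two rows beneath it.
h1_area = {(x, y) for x in (0, 1) for y in (1, 2)}
print(cells_in_area(covers, h1_area))   # prints ['a', 'b', 'd']
```

Cell d spans columns 1 and 2, so it covers only one slot in the assumed H1 area, yet that is enough for it to be picked up.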

Differences

Header Cells for Header Cells

Header Cells Blocking Header Cells

Equally Spanned Header Cells, Adjacent

Equally Spanned Header Cells, Data Cells Between

Empty Cells

Broken Tables are Unsupported

Tables with incorrect semantics inevitably lead to incorrect results.

Ben’s Advice for HTML5

Both algorithms reflect extensive feedback, research and testing. Each makes design choices which are not universally agreed on. Below are my suggested changes for HTML5. They are informed by:

Header Cells for Header Cells

Tables can have multiple levels of header cells for columns or rows. Sometimes a higher-level header cell is the only way to disambiguate lower-level header cells when moving along that lower level.

<td> with Header Cell Semantics

Let <td> act the same as <th> when given header cell semantics via the scope or headers+id features.

Equally Spanned Header Cells, Adjacent

Ignore the sizes of adjacent header cells until you break out of header cells and into data cells.

Equally Spanned Header Cells, Distant

Block the current header cell from associating any further along that axis if you find a header cell with the same span after one or more data cells.
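
A minimal sketch of this blocking rule, together with the preceding advice about adjacent header cells, might look like the following. It models a single axis of a table as a flat list and is not taken from either prototype; the function name and the (kind, span) representation are assumptions, as is the choice to pass over a later header cell whose span differs.

```python
def associated_data_cells(cells):
    """Indices of data cells reached by the header cell at index 0.

    cells is one axis of a table as (kind, span) pairs, where kind is
    "th" or "td" and span is the cell's size across the other axis.
    """
    _, header_span = cells[0]
    seen_data = False
    result = []
    for index, (kind, span) in enumerate(cells[1:], start=1):
        if kind == "td":
            seen_data = True
            result.append(index)
        elif seen_data and span == header_span:
            break   # an equally spanned header after data cells blocks us
        # Otherwise this is a header cell adjacent to ours, or one with a
        # different span; this sketch simply passes over it (an assumption).
    return result


row = [("th", 1), ("th", 1), ("td", 1), ("td", 1), ("th", 1), ("td", 1)]
print(associated_data_cells(row))   # prints [2, 3]
```

The second header cell is adjacent, so its size is ignored. The header cell at index 4 has the same span and follows data cells, so it stops the first header cell from reaching the last data cell.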

Equally Spanned Header Cells Using scope, Distant

Header cells using scope should have the same blocking logic as the auto state.

Emptiness of Cells

Define emptiness the same way for header cells and data cells.

Empty Data Cells

These must get header cells associated with them.

Empty Header Cells

These should not create associations, since such an association is effectively a no-op.

Heuristics

Early on, I considered <td><b> and <td><strong> as aliases of <th>. I now think the algorithm should not use heuristics due to their unreliability in real data tables.

Wide Header Cell Heuristic

Do not make a header cell at the start of a row with empty data cells act as if it spanned all those empty data cells.