Smart Headers & HTML5 (30th November 2008)
This comparison is for HTMLWG, to inform further changes to the header association features for data tables in HTML5. The idea arose during TPAC 2008. It is tracked as Action 85 in HTMLWG.
Quick Reference
That’s All, Folks!
This comparison has influenced HTML5:
Ben's research was instrumental to the changes made here; his research probably had more of an effect on the spec than all the other discussions put together. I cannot emphasise enough how much more important objective analysis and logical argumentation is compared to opinions and assertions.
Ian Hickson
Final update was on 20th December 2008.
Feedback Finished
Feedback was welcomed until 15th December 2008. I extended this to 20th December 2008.
- E-mail me directly.
- Reply to the thread on Public-HTML, if you’re in HTMLWG.
- Use the #whatwg IRC channel or any other channel logged by Krijn Hoetmer. (If you direct a line at BenMillard, I’ll read that day of logs.)
- E-mail me a link if it’s in a blog entry, forum topic or other mailing list.
Spread the word to any other lists, websites or individuals you think are relevant.
Introduction
Smart Headers is compared with HTML5. James Graham has built a prototype for both.
- Smart Headers prototype description by James Graham, called Smart Headers for convenience.
- Forming relationships between data cells and header cells in HTML5, called HTML5 for convenience.
HTML4 is not included due to ambiguities which become significant when assessing exactly what happens in genuine data tables.
Similarities
3 Mechanisms Applied
- A data cell with the headers attribute gets each header cell with a corresponding id attribute associated with it.
- <th scope> is associated according to which of the 4 supported values was used:
  - row: associates rightwards, recognising rowspan.
  - rowgroup: associates rightwards and downwards to the end of that table section.
  - col: associates downwards, recognising colspan.
  - colgroup: associates rightwards and downwards to the end of that column group.
- Plain <th> (or <th scope> with an unsupported value) is automatically associated downwards or rightwards, depending on its position in the table.
Data Cells without Header Cells
- If there are no table header cells, clearly no associations will be made.
- Step 1 in Smart Headers says “If this returns any headers”, so none may be returned.
- Step 3.1 in Smart Headers may “[…] return an empty cell_list […]”.
- HTML5 says: “Each data cell can be assigned zero or more header cells.”
Table Header Cells are <th>
- isHeading in Smart Headers only enables <th> elements to be applied as header cells.
- Smart Headers does not treat <td scope> as a header cell.
- Processing Model in HTML5 says: “Header cells correspond to th elements.”
- <td headers> is present in HTML5 but must only point to <th>.
Regular Associations are Header Cell → Data Cells
- Both algorithms start at a header cell and run across all data cells until reaching the last data cell it applies to, in both their scope and automatic modes.
Optimised for Regular Associations
A regular data table has each header cell directly above or to the left of all the data cells it must be associated with. Smart Headers and HTML5 are optimised for this case by making associations automatically, without requiring scope or headers+id. Specifically:
- Smart Headers has “the smart span algorithm” as a list of substeps.
- HTML5 has “the auto state” in Step 1.5.
Irregular Associations are Header Cells ← Data Cell
Upon finding a data cell with the headers attribute, both algorithms will search for the corresponding header cells:
- Split the headers attribute value into its constituent tokens.
- For each token, both algorithms scan the document (via getElementById) for the first element with a matching id.
- For each token, search for a header cell with a matching id in that table.
- If the element is a <th> in the current <table>, associate it with the data cell.
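The getElementById variant of these steps can be sketched in Python. This is a toy model, not either prototype: the `Cell` class, its `table` field and both helper functions are hypothetical, the document is assumed to be a flat list of cells in source order, and Python's `str.split()` stands in for HTML's splitting on whitespace.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Cell:
    tag: str                  # "th" or "td"
    table: int                # which <table> owns the cell (hypothetical field)
    id: Optional[str] = None
    headers: str = ""         # raw value of the headers attribute


def get_element_by_id(document: List[Cell], token: str) -> Optional[Cell]:
    """Mimic getElementById: the first element in document order with that id."""
    return next((el for el in document if el.id == token), None)


def resolve_headers(data_cell: Cell, document: List[Cell]) -> List[Cell]:
    found = []
    # Split the headers attribute value into its constituent tokens.
    for token in data_cell.headers.split():
        match = get_element_by_id(document, token)
        # Only a <th> in the same table is associated; anything else is ignored.
        if match is not None and match.tag == "th" and match.table == data_cell.table:
            found.append(match)
    return found
```

One consequence worth noticing: because only the first element with a given id is examined, a <th> that shares its id with an earlier <td> is never found.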
Irregular Associations via headers+id
An irregular data table has one or more header cells in a position which is not directly above or to the left of all the data cells it must be associated with. Smart Headers and HTML5 support these cases but require headers+id. Specifically:
- Smart Headers has “the basic algorithm”.
- HTML5 has Step 2, specifically Step 2.2 onwards.
Incremental Association
James Graham: “In principle I believe either algorithm could be written to run incrementally.”
Arbitrary Levels of Header Cells
- The number of table header cells which may associate with a data cell by any and all means is unlimited in Smart Headers:
  - The number of adjacent <th> elements is unlimited in Smart Headers.
  - The number of tokens in the headers attribute is unlimited in Smart Headers.
  - There is no limit on how many <th scope> may cover a data cell.
- HTML5 says: “Each data cell can be assigned zero or more header cells.”
Specialist Markup Takes Precedence
- Smart Headers has The Basic Algorithm, where the first step checks for headers+id. If it finds any corresponding header cells, they are used and no further associations take place for that cell.
- Smart Headers has The Smart Span Algorithm, where step 2 determines the value of scope. If it is one of the 4 supported values, it uses those associations and does not use the automatic association mode.
- HTML5 step 1.5 specifies that <th scope="foo"> will associate with everything covered by foo, avoiding the automatic association steps.
- HTML5 step 2.2.2 lets headers+id associate an arbitrary data cell with arbitrary header cells, regardless of other headers’ positions and spans.
Differently Spanned Header Cells (Adjacent or Distant)
- Smart Headers has The Get Cells From Axis Algorithm, where step 3.4.2 is unchanged by the data_cell_found flag:
  - Headers of different height in the same row associate.
  - Headers of different width in the same column associate.
- HTML5 step 1.5 specifies that <th scope="foo"> will associate for the entirety of foo, where foo is one of the 4 defined states, regardless of other headers and their spans within foo.
- Smart Headers has The Smart Span Algorithm, where step 3 gives supported scope states the same priority as HTML5 does in the previous bullet point.
Data Cells Spanning Different Spans of Header Cells
Both HTML5 and Smart Headers use 4.9.13.1 Forming a table to determine which slots in a table each cell covers, including spanned cells. As such, each header cell is associated with all cells that cover one or more slots in the area that header cell applies to.
- Discussion of timelines, including Ferrari road car timeline with expected results, covers tables with multiple levels of complex spans on both axes which are nonetheless regular in their positioning of header cells relative to the corresponding data cells.
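The idea of cells covering slots can be illustrated with a heavily simplified Python sketch of the same idea as 4.9.13.1 Forming a table. It is not a transcription of that section: `form_grid` and `cells_covering` are hypothetical names, and the sketch ignores rowspan="0", row groups and overlapping cells.

```python
def form_grid(rows):
    """Place cells into slots, left to right and top to bottom.

    rows: list of rows, each a list of (name, colspan, rowspan) tuples.
    Returns a dict mapping (row, col) slots to the name of the covering cell.
    """
    grid = {}
    for y, row in enumerate(rows):
        x = 0
        for name, colspan, rowspan in row:
            # Skip slots already claimed by a rowspan from an earlier row.
            while (y, x) in grid:
                x += 1
            for dy in range(rowspan):
                for dx in range(colspan):
                    grid[(y + dy, x + dx)] = name
            x += colspan
    return grid


def cells_covering(grid, area):
    """Names of all cells covering at least one slot in area."""
    return {grid[slot] for slot in area if slot in grid}
```

With made-up rows `[("Q1", 2, 1), ("Q2", 1, 1)]`, `[("a", 1, 2), ("b", 1, 1), ("c", 1, 1)]` and `[("d", 1, 1), ("e", 1, 1)]`, the header Q1 applies to the slots below it, and every cell touching those slots is covered: a (which spans two rows), b and d.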
Differences
Header Cells for Header Cells
- Smart Headers has The Get Cells From Axis Algorithm, where Steps 3.4 and 3.4.2 associate header cells with header cells when they lie in particular positions with particular spans.
- Smart Headers has The Algorithm to Extract Table Headers From a Headers Attribute, where Step 3 lets a matching <th id> act as a header cell for the <th headers> which pointed to it.
- HTML5 says “Each data cell can be assigned zero or more header cells.” It does not say “Each data cell or header cell [...].” As such, header cells cannot be associated with header cells in HTML5. Header cells can “leapfrog” each other to continue associating with data cells, though.
Header Cells Blocking Header Cells
- Smart Headers blocks the current header cell from applying any further when it reaches a header cell of equal size in line with it on the same axis when there are one or more data cells between them.
- HTML5 blocks the current header cell from applying any further when it reaches a header cell of equal size in line with it on the same axis, with zero or more data cells between them.
Equally Spanned Header Cells, Adjacent
- Smart Headers has The Get Cells From Axis Algorithm, where steps 3.4 and 3.4.1 run with the data_cell_found flag set to false:
  - Adjacent headers of equal height in the same row associate.
  - Adjacent headers of equal width in the same column associate.
- Step 1.5.5 in HTML5 stops the current header associating any further when a header of equal height is found in the same row, whether or not they are adjacent.
- Step 1.5.10 in HTML5 stops the current header associating any further when a header of equal width is found in the same column, whether or not they are adjacent.
Equally Spanned Header Cells, Data Cells Between
- Smart Headers has The Get Cells From Axis Algorithm, where steps 3.4 and 3.4.1 are run with the data_cell_found flag set to true:
  - A header of equal height to the current header cell will stop the current header cell from applying any further along the row if there were data cells between them.
  - A header of equal width to the current header cell will stop the current header cell from applying any further down the column if there were data cells between them.
- Step 1.5.5 in HTML5 stops the current header associating any further when a header of equal height is found in the same row, whether or not they are adjacent.
- Step 1.5.10 in HTML5 stops the current header associating any further when a header of equal width is found in the same column, whether or not they are adjacent.
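The blocking behaviour compared in the last three sections can be reduced to a walk along one axis. This Python sketch is a simplification under stated assumptions, not a transcription of Get Cells From Axis steps 3.4 to 3.4.2 or HTML5 steps 1.5.5 and 1.5.10: the function names are hypothetical, each cell after the current header is a `(tag, span)` pair, and "equal size" is reduced to an equal span (height when walking a row, width when walking a column).

```python
def smart_headers_axis(header_span, cells):
    """Indices of the cells the current header covers, Smart Headers style."""
    covered = []
    data_cell_found = False
    for i, (tag, span) in enumerate(cells):
        if tag == "th" and span == header_span and data_cell_found:
            break  # an equal header after data cells blocks further association
        if tag == "td":
            data_cell_found = True
        covered.append(i)  # header cells are covered too (headers for headers)
    return covered


def html5_axis(header_span, cells):
    """Indices of the cells the current header covers, HTML5 style."""
    covered = []
    for i, (tag, span) in enumerate(cells):
        if tag == "th":
            if span == header_span:
                break  # an equal header blocks, adjacent or not
            continue  # different-sized headers are leapfrogged, not associated
        covered.append(i)
    return covered
```

For the row `[("th", 1), ("td", 1), ("td", 1), ("th", 1), ("td", 1)]` and a current header of span 1, Smart Headers covers the adjacent header and both data cells before the repeated header blocks it, whereas HTML5 stops at the adjacent header and covers nothing.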
Empty Cells
- Smart Headers treats empty cells the same as any other.
- HTML5 defines emptiness for data cells but not for header cells.
- HTML5 says: “User agents may remove empty data cells when analyzing data in a table.”
Broken Tables are Unsupported
Tables with incorrect semantics inevitably lead to incorrect results.
- Tables made in word processors where column widths don’t line up create arbitrary spans in the table. Data cells end up in columns whose header cells should not apply to them. This is not supported.
- Tables which use <td> for header cells do not provide the necessary semantics. This is unsupported, even for simple tables.
Ben’s Advice for HTML5
Both algorithms reflect extensive feedback, research and testing. Each makes design choices which are not universally agreed on. Below are my suggested changes for HTML5. They are informed by:
- Research I’ve done during 2007 and 2008.
- Other people’s research which I’ve seen.
- My experience in professional auditing and retrofitting of accessibility to websites.
- Feedback I’ve read on forums, mailing lists and IRC.
- The common design patterns I’ve noticed on the web and in print during my life.
Header Cells for Header Cells
Tables can have multiple levels of header cells for columns or rows. Sometimes, when moving along the lower level of header cells, the higher-level header cell is the only way to disambiguate them.
- #whatwg discussion & examples with Hixie, September 2008.
- E-mail summarising use cases, follow-up with more use cases.
- Multiple levels of header cell are already associated with data cells.
<td> with Header Cell Semantics
Let <td> act the same as <th> when given header cell semantics via the scope or headers+id features.
- In my data tables research from 2007, I found 18% of tables gave <td> header cell semantics in a way which seemed it would work.
- Row header cells which have a column header cell use <td> with header cell semantics as often as they use <th>. (Compared to 20% which use <th> throughout.)
- HTML4 (along with the books, tutorials, blogs and forum threads influenced by it) has been telling authors to apply header cell semantics to <td> in situations like the above for over a decade.
- <td headers> pointing to <td id> occurs in specialist tables. <th id> works just as well, of course. But specialists can be very attached to existing practice and it is supported by some common ATs.
- <td scope="foo"> is sometimes used to avoid the centred bold styling of <th>. It’s a lousy practice but at least it would remain accessible.
- Increasing the proportion of fully accessible data tables from about 20% to about 40% at the cost of changing n lines in the HTML5 specification seems far more viable than retrofitting <th> to the n million documents currently using <td> with header association markup. (Estimated from the proportions in my research and my understanding of how HTML5 would change.)
- For header cell semantics to be there and for us not to support them is a terrible waste. (It’s rare for authors to even try making things accessible.)
- Support Existing Content.
- The HTML5 authoring guide can recommend authors use <th> and explain why this is better.
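The proposal amounts to widening the test for what counts as a header cell. A minimal Python sketch, assuming a single table: the `Cell` class and `header_cells` are hypothetical, and treating only the 4 supported scope values as giving header cell semantics is an assumption about how the proposal would be worded.

```python
from dataclasses import dataclass
from typing import Optional

SUPPORTED_SCOPES = {"row", "col", "rowgroup", "colgroup"}


@dataclass
class Cell:
    tag: str                     # "th" or "td"
    id: Optional[str] = None
    scope: Optional[str] = None
    headers: str = ""            # raw value of the headers attribute


def header_cells(table_cells):
    """Return the cells that would act as header cells under the proposal."""
    # Every id named by some headers attribute in this table.
    referenced = {tok for c in table_cells for tok in c.headers.split()}
    result = []
    for c in table_cells:
        if c.tag == "th":
            result.append(c)
        elif c.scope in SUPPORTED_SCOPES or (c.id is not None and c.id in referenced):
            result.append(c)  # a <td> given header cell semantics
    return result
```

A plain <td>, or one with an unsupported scope value, stays a data cell; a <td scope="row"> or a <td id> named by some headers attribute is promoted.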
Equally Spanned Header Cells, Adjacent
Ignore the sizes of adjacent header cells until you break out of header cells and into data cells.
- Tables with multiple levels of column headers usually have the widest header cells in the upper levels. But sometimes the lower levels are the same width or wider. All the column header cells spanning that column are expected to apply.
- Tables with multiple levels of row headers may only have 1 row in some sections. Each row header cell is still expected to associate across that row.
- Tables with multiple levels of row header cells sometimes repeat the 1st level row header cell for each row it applies to.
Equally Spanned Header Cells, Distant
Block the current header cell from associating any further along that axis if you find a header cell with the same span after one or more data cells.
- Tables which repeat their column header cells every n rows expect the next set of repeated header cells to block the current set of header cells from going further since they share the same spans. Multiple levels of header cells may be repeated.
- A table has one or more levels of column headers and the table is also split into several sections. Each section starts by using one (or, occasionally, more) header cells which span the entire width of the table. Each section header cell is expected to combine with the column header cells because they have different spans. But each section header cell expects the next section header cell to block it from going further since they have the same span.
- (HTML5 already does this. It should remain even if the changes are made to Equally Spanned Header Cells, Adjacent.)
Equally Spanned Header Cells Using scope, Distant
Should have the same blocking logic as the auto state.
- FTSE 100 Listings has 2 levels of column header cells. Both levels are wider than the lower level. They all use <th scope="col">, some with colspan as well. Each repeated header cell should block the earlier instances of it to avoid duplication, just as plain <th> would.
Emptiness of Cells
Define emptiness the same way for header cells and data cells.
- Currently, HTML5 only defines emptiness of data cells.
Empty Data Cells
These must get header cells associated with them.
- The permission to remove empty data cells is unclear. I take it to mean UAs are not required to make associations with empty data cells.
- When navigating a table non-visually, orientation is difficult. Being able to query the current cell for all its header cells helps orientation, even when the user is on an empty cell.
- Tables can have many adjacent empty cells. If no associations are given to them, there will be no announcements in ATs as users move between them. Users will get nothing if they query an empty cell for its header cells. This will be disorienting.
Empty Header Cells
These should not create associations since it’s effectively a no-op.
- Typically, an empty header cell is there to fill a slot. It is not expected to make associations.
- Associating an empty header cell with the current cell adds no orienting information to the current cell.
- An AT might reasonably provide a count for the number of header cells associated with the current cell. If one or more header cells are empty, a user will receive fewer pieces of header text than they expected. Preventing empty header cells from making associations makes the UI more straightforward.
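The advice on empty cells can be summarised as one emptiness test applied to both kinds of cell, with only header cells being dropped. A minimal Python sketch: the `Cell` class, `is_empty` and `associated_headers` are hypothetical, and the definition of "empty" used here (no child elements and whitespace-only text) is an assumption, not HTML5's wording.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Cell:
    tag: str
    text: str = ""
    has_child_elements: bool = False  # e.g. an <img> inside the cell


def is_empty(cell: Cell) -> bool:
    # Assumed definition, applied equally to header cells and data cells.
    return not cell.has_child_elements and cell.text.strip() == ""


def associated_headers(candidates: List[Cell]) -> List[Cell]:
    """Drop empty header cells. The data cell itself may be empty and
    still receives every non-empty header."""
    return [h for h in candidates if not is_empty(h)]
```

A header cell containing only an image is kept, since it is not empty under this definition; a header cell holding only spaces is dropped.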
Heuristics
Early on, I considered <td><b> and <td><strong> as aliases of <th>. I now think the algorithm should not use heuristics due to their unreliability in real data tables.
- Although <td><b> is clearly used as an alias for <th>, it is used for other things, too. Making <td><b> an alias for <th> would improve about 16% of the tables I surveyed in 2007.
- However, it would break other uses in reasonable tables. <td><strong> for important cells or <td><b> for every data cell in an important row would create false associations with subsequent cells.
- Adding extra heuristics to handle such cases opens a can of worms.
Wide Header Cell Heuristic
Do not make a header cell at the start of a row with empty data cells act as if it spanned all those empty data cells.
- It does fit existing table arrangements where an author has not spanned a section header across the whole width of a table.
- However, it is risky because data may be unavailable. Often a placeholder character is used but these can just as easily be empty.
- Calendars exist where each row is one day but some days have no events. If the row header uses <th>, that whole row becomes a wide header and creates false associations with subsequent cells.
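To show why the heuristic is risky, here is a minimal Python sketch of the test it would apply, written only to demonstrate the failure. The function name and the `(tag, text)` row format are hypothetical, and the rows in the example are made up.

```python
def looks_like_wide_header(row):
    """The heuristic advised against: treat a leading <th> followed only by
    empty data cells as if it spanned the whole row.

    row: list of (tag, text) pairs for one table row.
    """
    first_tag = row[0][0]
    rest = row[1:]
    return first_tag == "th" and len(rest) > 0 and all(
        tag == "td" and text.strip() == "" for tag, text in rest)
```

The markup alone cannot separate the cases: an unspanned section header such as `[("th", "Citrus fruit"), ("td", ""), ("td", "")]` and a calendar day with no events such as `[("th", "Tuesday 2nd"), ("td", ""), ("td", "")]` both match, so the second becomes a false wide header.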