Collections of Interesting Data Tables

Genuine data tables found on the web which seem complex or otherwise noteworthy, by Ben ‘Cerbera’ Millard.

Feedback

Corrections of any size and links to other collections are welcome. In order of preference:

  1. Participate in the Data Table Collections (Research) thread of W3C’s public-html mailing list.
  2. Add to the Accessify Forum topic.
  3. E-mail me: cerbera@projectcerbera.com.

Please include “Table Collections” in e-mail subject lines to help me track feedback.

Goals and Deliverables

This document contains no products or recommendations. It documents how tables get authored in reality so specifications can make HTML data table features more realistic and robust. The tables can also be used as real-world trials of prototypes and implementations.

So far, it has influenced the design of HTML5’s table header association algorithm (draft).

A slightly different approach has been developed by Simon Pieters, James Graham and me. James Graham’s Table Inspector is the prototype. It was demonstrated in Boston, November 2007 at a W3C HTMLWG Unconference session.

Progress

Began on 19th May 2007 with the most recent update on 6th November 2007. Some editorial changes on 7th April 2008.

This research is often inactive while I earn a living. I am seeking sponsorship to make this research more sustainable.

Sample Size

How Authors Indicate Headers in Data Tables

Collections

Simulated Retrofitting

Ways tables can be modified to become more accessible whilst keeping their meaning. Check the Method for Retrofitting Simulations.

astro:
6 genuine tables from the U.S. Naval Observatory’s Astronomical Applications Department Data Services.
clark2006:
19 genuine tables from Joe Clark’s Table examples for PDF/UA 1 (2006.01.27). (PDF/UA.)
finance:
2 genuine tables about money, with notes in the next section.
form:
1 genuine table with form controls in it.
odi:
7 genuine tables from Office for Disability Issues (ODI) research, New Zealand.
thatcher:
2 genuine table examples from the USA government, sent to me by Jim Thatcher.
sports:
1 genuine table, with notes in the next section.
tides:
1 genuine Gorleston tide table, UK Broads Authority.

On the Web

From browsing the web, including deliberate searches for interesting tables. I biased the search towards the more popular websites for any given query.

Astronomy

The Astronomical Almanac from the US Naval Observatory:
  • Solar system measurements and constants.
  • PDF or ASCII. Are HTML tables so hard? E-mail them.

Computing

APIs Usage in VB6 “FileInfo” Project by Karl E. Peterson:
  • Col headers use <th align="left">. So we can’t rely on align to tell us if a <th> is being used for data?
  • Row headers use <td><b>. So this is used for both types of header.
  • <br> instead of rowspan. Make a variant to simulate retrofitting this.
  • Uses the frame attribute for controlling borders.
The Best Gaming Video Cards for the Money: May 2007 from Tom’s Hardware:
  • Column headers use plain <th>.
  • Column headers are styled to look identical to data cells. We can’t use visual appearance to tell when <th> is being used for data?
  • Each cell has a list of 0 or more graphics cards:
    • Empty cells (0 cards) use <td>&nbsp;</td>.
    • Items are separated with a comma.
    • Could use <ul> and <li>. Make a variant.
    • Could use individual cells. Make a variant.
Harmonia GUI Framework by Andrew Fedoniouk:
  • Headers use plain <td>.
  • Table has three layers:
    1. A cell spanning the entire table width indicates the first layer of sections.
    2. A cell in column 1 indicates the next layer.
    3. Column 2 indicates layers within those indicated by column 1.
    Rowspanning is used in columns 1 and 2 when these inner layers contain more than one row.
  • An interesting anomaly is when “Module” and “Class/struct/type declaration” are the same:
    • Rather than repeat the same name twice, the module name is spanned into the next column.
    • Since it acts as a row header, it should be marked up as a header.
    • This might prevent the “smart colspan” algorithm working.
    • Make a variant where the name is repeated.
    • Make a variant where it uses <th>.
    • Make a variant where the first row of column headers use scope="col".
    • Make a variant where the first row of column headers are inside a <thead>, implying scope="col".
  • Some cells use <td><strong> but are not headers. We cannot imply <td><strong> is a table header?
Keryx (X)HTML Elements Best Practice Sheet by Lars Gunther:
  • Is an XML page:
    • Empty cells use <td />. This is the same as <td></td>.
  • Column headers use <th>.
  • Column headers are in a <thead>.
  • No columnar values of scope are used.
  • Uses <colgroup> but only for controlling borders.
  • Row headers use <th scope="row">.
  • Table is split into sections:
    • First section uses <tbody> and <th scope="rowgroup">. Make a variant where each section does this.
    • The other sections use <td colspan> as a header which stretches across the table. Can this be told apart from a wide data row? Make a variant using <th colspan>.
  • Some data cells span columns, some span rows, some span rows and columns.
  • Abbreviations are expanded with a mix of punctuation and <b>. Make a variant using <abbr title>. Make a variant using <dfn>.
Layout height attributes on body and html elements by Anne van Kesteren:
  • Uses <caption> correctly.
  • Column headers use <th>.
  • Column headers are inside a <thead>.
  • 3 levels of column headers with different column spans:
    1. Top level spans the furthest.
    2. Next level spans less.
    3. Final level does not span.
  • Column span boundaries have a regular alignment.
  • HTML4’s scope cannot express this table because it would need nested <colgroup>? Make a variant.
  • Browser names use plain <td>. But you need these as headers to understand the data? Make a variant.
  • Shortened terms expanded after the table. Make a variant using <abbr title>.
  • Review it against HTML4’s header search algorithm. Ask Leif Halvard Silli to do this?
Optimize string handling in VB6 - Part II by Tuomas Salste:
  • Tables as diagrams of memory structures.
  • Regular data tables.
  • Comparisons.
  • Headers usually done with <th>. Sometimes done with <td align="center">.
  • Inconsistencies between markup and styling on one page by one author. E-mail them about this.
  • Clean, minimal markup on the whole. Maybe authors will be happy to write new tables this simply?
Linkback by Wikipedia:
  • Column headers use <th>.
  • Row headers use plain <td>.
  • The empty header cell uses <th></th>.
  • Empty data cells use <td>None</td>.
  • At least 2 data cells contain a <ul> in 3 of the 4 columns. Block-level markup does not indicate a layout table.
The QA Matrix by W3C QA:
  • Four distinct columns sharing the “Properties” header (or a list of 4 items, depending how you look at it).
  • 0 or 1 lists in the final cell of each row.
  • Empty cells marked “-”.
date Parameters from the PHP Manual:
  • Column headers use <th>.
  • Column headers are in a <thead>.
  • Row headers use <td><var>.
  • Table is split into sections:
    • Section headers use <td align="center"><span class><em>.
    • Section headers only span the first column. E-mail them about it.
    • Other cells in the section header row use <td>---</td>. Can this be considered an empty cell?
    • Make a variant with <th>.
    • Make a variant with <td colspan>.
    • Make a variant with <th colspan>.
    • Make a variant with <tbody><th scope="rowgroup">.
    • Make a variant with <tbody><th colspan scope="rowgroup">.
  • Uses a single <colgroup> for the table with a <col> element for each column. Why? E-mail them about it.
  • Supplies a summary which repeats the previous paragraph. Why? E-mail them about it.
  • The strangeness seems symptomatic of someone who is trying too hard without fully understanding the markup. E-mail them about it.

Education

A table of worldwide ages of consent, including US states by Avert:
  • <th> used for column headers.
  • Column headers are in the same <tbody> as all the data. Didn’t use <thead>.
  • <td class> used for row headers. Why not <th>? E-mail them.
  • The column of row headers has a column header.
  • Row headers get 2 layers deep in several places but are never hierarchical.
  • Footnotes are numbered in the table and wrapped in <sup> which corresponds to an <ol> later in the page. Perhaps this could be built on to produce a robust footnotes system leveraging existing elements for HTML5?
  • A row of averages is placed at the bottom in the same row group as the data. Didn’t use <tfoot>.
School Teachers’ Review Body Statistical tables as annex to the 2005 written evidence from the DfES by teachernet:
  • All done as Excel spreadsheets.
  • Some have hierarchical row headers. Did they choose Excel because they couldn’t figure out the HTML to do this? E-mail them.
  • Most of these tables are dead simple. So why not use HTML? E-mail them.
Science and engineering departmental population at doctorate-granting institutions, by field: 1987-94 by the National Science Foundation:
  • ASCII used in a <pre> instead of HTML table elements. E-mail them. Make a variant.
  • All their most recent tables are done in Excel and PDF. For example, Graduate Students and Postdoctorates in Science and Engineering: Fall 2005. Is HTML so hard? E-mail them.
  • One row of column headers.
  • Indented table sections where most rows are 4 levels deep! Are their headers supposed to accumulate? E-mail them.
  • If they must accumulate, probably needs the headers+id patch technique.
  • Row header text is too long for rowspan to be practical? Make a variant.
  • Totals and subtotals appear at the start of each top-level table section.
  • No column spans or row spans.
  • Footnotes appear immediately after the table. This seems to be a strong convention in print, ASCII, HTML and other formats?

Finance

FTSE 100 Listings from Money Extra with loads more UK stock tables:
  • Column headers cover two rows.
  • Entire headers block gets repeated after every 20 rows of data.
  • Uses scope="col", so the scope has to stop after it runs down some data rows and hits another header with the same scope.
  • As scope="col" is used in cells with colspan, to accommodate this table we would need to:
  • Maybe it’s too funky to accommodate? Removing the scope attributes would be an easy retrofit. Make a variant.
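The idea that a column header’s scope "has to stop" when it meets a repeated header block can be sketched roughly as below. This is only an illustration under loud assumptions: the grid model, the `Cell` type and the `column_headers` function are hypothetical, spans are assumed to have been expanded into a plain rectangular grid beforehand, and it is not the algorithm from any specification.

```python
from typing import List, Tuple

# Hypothetical model: each cell is (kind, text), where kind is "th" or "td".
Cell = Tuple[str, str]


def column_headers(grid: List[List[Cell]], row: int, col: int) -> List[str]:
    """Collect the nearest block of header cells above grid[row][col].

    Walking upward, data cells are skipped until a header is found. Once
    headers have been collected, the next data cell ends the search, so an
    earlier repeat of the header block is never associated with this cell.
    """
    found: List[str] = []
    for r in range(row - 1, -1, -1):
        kind, text = grid[r][col]
        if kind == "th":
            found.append(text)
        elif found:
            # Data above a header block: that block's scope ends here.
            break
    found.reverse()  # report headers top to bottom
    return found


# One column of a table whose two-row header block repeats part way down.
column = [
    [("th", "Share")],
    [("th", "Price")],
    [("td", "101.5")],
    [("td", "98.2")],
    [("th", "Share (repeat)")],
    [("th", "Price (repeat)")],
    [("td", "77.0")],
]
print(column_headers(column, 3, 0))
print(column_headers(column, 6, 0))
```

A real implementation would also need to decide what colspan on the header cells means, which is exactly the open question in this table.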
Departmental financial statements from Disability Services Queensland:
Uses the same headers+id hierarchical row header patching technique as Stephen Ferg in the USA. E-mail them about any influence.
FTSE ACT 250 by Yahoo! Finance:
  • <td align="center"> for column headers. Maybe this should be an alias for <th> in certain situations? Such as in a row which only contains <td align="center">?
  • Some uses of <td align="center"> and <td> containing <b> for different purposes:
    • Row headers use <td><b><a>...</a></b></td>. So <b> is the only child of <td> for these headers.
    • Columns 3 and 4 use <td align="center">. Need to be very careful if we allow this as an alias for <th>.
    • Column 3 uses <td><b>...</b> ...</td> and column 4 uses <td><img> <b>...</b></td>. So <b> is not the only child of <td> for these data cells.
    We must be very careful about when things can be interpreted as <th>?
  • Why aren’t they using <th>? E-mail them.
  • There are 2 layout tables as ancestors of this data table.
  • There are no layout tables as descendants of this data table. A data table can only be at the bottom of nested tables?
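The question of treating <td align="center"> as an alias for <th> only "in a row which only contains <td align="center">" can be tried out with a small sketch. The `RowScanner` class and `header_like_rows` function are hypothetical names, the code uses Python’s standard html.parser, and it assumes a single table with no nested tables.

```python
from html.parser import HTMLParser
from typing import List


class RowScanner(HTMLParser):
    """For each <tr>, record whether each cell is a <td align="center">."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[List[bool]] = []

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th") and self.rows:
            # Attribute values may be None when the attribute has no value.
            align = (dict(attrs).get("align") or "").lower()
            self.rows[-1].append(tag == "td" and align == "center")


def header_like_rows(markup: str) -> List[int]:
    """Return indexes of rows made up entirely of <td align="center">."""
    scanner = RowScanner()
    scanner.feed(markup)
    scanner.close()
    return [i for i, cells in enumerate(scanner.rows) if cells and all(cells)]


sample = """
<table>
<tr><td align="center">Name</td><td align="center">Price</td></tr>
<tr><td><b><a href="#">ABC</a></b></td><td align="center">1.0</td></tr>
</table>
"""
print(header_like_rows(sample))
```

Even this row-wide rule can misfire: in the Bigger Bras table listed under Products, data cells use <td align="center"> as well, so a row of centred data would be reported as headers.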
University of Wisconsin–Madison Facts: Budget:
  • Snapshots taken on 29th September 2007, with retrofitting simulations:
  • Attempted to use headers+id but got it wrong:
    • Bogus reference to acprog in a headers attribute value.
    • Empty string for headers values in the “2005-2006 Budget: allocation by program” table from the “Student support” section onwards.
  • Tables are captioned with <caption>.
  • Purpose of table is summarised in the summary attribute.
  • Column headers use <th>.
  • Long header text is abbreviated with the abbr attribute.
  • Two headers use <td>.
  • Table is split into headed sections.
  • Section headers use <th> with a colspan which covers the full table width.
  • Cell arrangement is regular and doesn’t really need headers+id? Make a variant.

Government

Bolton Museums - Contact Us:
  • No column headers. Make a variant with column headers.
  • Table begins with a <th colspan><i> across the whole width:
    • It is the first section header.
    • Implying this is a caption would be wrong.
    • <i> does not hint at a semantic intention, unlike <b>.
  • Subsequent sections also begin with <th colspan> across the whole width.
  • Sections end with a row of cells using <td><br /></td>. A cell containing a <br> is an empty cell which should be ignored?
  • All sections are in the same <tbody>. Make a variant using one <tbody> for each section.
  • Row headers use <td>. Make a variant using <th>.
  • Row headers sometimes span more than 1 row.
  • The row header spans more than one column when there are no names of people. Make a variant where the person’s name cell is there but is empty.
  • People’s names use <td><b>. Implying these are headers would be wrong. Make a variant where this boldness is done via CSS.
Bureau of Labor Statistics, particularly these areas influenced by Stephen Ferg:
Minimal <th> and <td> are used. Minimal headers+id is added to patch up the HTML4 header search algorithm where needed.
National Statistics Online (UK)
It’s all PDF except for commentary and graphs?
TABLE Z-2 - 1910.1000 TABLE Z-2 from the US Department of Labor, Occupational Safety & Health Administration:
Try saying that three times quickly.
Expanded Homicide Data Table 2 from the Federal Bureau of Investigation:
  • Column headers use <th> with scope="col".
  • They expect col to be sensitive to the colspan. I once thought this, too. Probably unaware of the colgroup value, which is also rather strange to set up.
  • <th> with scope="row" for row headers, augmented with headers+id for the hierarchical row header.
  • headers+id for every cell which has headers applied to it.
  • A unified header algorithm needs to drop duplicate associations caused by the overlapping association methods in tables like this.
  • Footnotes in a <ul> after the table.
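Dropping duplicate associations could be as simple as an order-preserving merge. A minimal sketch, assuming each association method has already produced a list of header cell ids for one data cell; the function name `merge_header_ids` and the example ids are hypothetical.

```python
from typing import Iterable, List


def merge_header_ids(*sources: Iterable[str]) -> List[str]:
    """Merge header ids from several association methods.

    Earlier sources win on ordering; an id that a later source repeats
    is dropped, so each header is associated with the cell only once.
    """
    seen = set()
    merged: List[str] = []
    for source in sources:
        for header_id in source:
            if header_id not in seen:
                seen.add(header_id)
                merged.append(header_id)
    return merged


# Hypothetical ids: one list found via scope, one via the headers attribute.
from_scope = ["year", "total"]
from_headers_attr = ["total", "year", "weapon"]
print(merge_header_ids(from_scope, from_headers_attr))
```

Which method should take priority when the two disagree is a separate design question that this sketch does not answer.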

Interactive

Dog selector test:
Faking a table for a form.

Timetables

Events - Lions Club of Fleet:
  • <td><h3> for header cells.
  • If you could recognise these as headers, you’d need to be smart about colspan even though the headers are defaulting to colspan=1.
  • Endnotes are in the final row of the data section.
Timetables - Isle of Man Steam Packet Company Ferry Services:
  • Several PDF documents, each of which contains several pages of colour-coded timetables.
  • Why did they use PDF? Are HTML tables so much harder? E-mail them.

Products

Fitting Bras, Correct Bra Size and Comparisons from Bigger Bras:
  • 2 levels of column headers:
    • Column headers use <td align="center">.
    • Table is split into 2 sections, with column headers for each section.
  • 3 levels of row headers:
    • Row headers use <td align="center">.
    • Data cells also use <td align="center">.
  • Cannot imply <td align="center"> is a header.
  • Some cells legitimately contain two pieces of data.

Sports

Detailed Review

I wrote a detailed review of sports tables which included:

The AFB reviewed some sports sites in early 2006, finding problems with data tables. Disabled people can be sports fans, if you hadn’t realised. Heard of the Paralympics?

ESPN

None of their tables use <th>. Their column headers use <td> with CSS to make it bold! But at least retrofitting <th> would be easy. E-mail them about it.

Their data tables are usually given a caption by placing a <td colspan> in the first row which spans all columns in that table. I call this an “embedded caption”. Is it so hard to style <caption>? Test it.

NHL Player Card for Daniel Alfredsson:
  • Abbreviations for headers described by a glossary, which is the next table in this review.
  • Embedded caption.
  • Data is mostly numerical and laid out regularly. Pretty tame.
NHL Statistics Glossary:
  • Embedded caption.
  • Only 2 columns of data. A small number of columns does not indicate a layout table.
  • No cells acting as column headers.
  • First column kind of acts as row headers. Should authors bother with row headers in 2-column tables? The user probably just heard it and it can be heard again by moving one cell left.
  • If it were retrofitted with <th>, would our algorithms work? Make a variant.
NHL Boxscore:
  • 14 tables styled to look like data tables.
  • 2 of these are layout tables. Each contains 2 of the other 14 data tables.
  1. Untitled table showing scores per quarter:
    • No caption.
    • Row headers include some data.
    • Final column uses bold styling applied via CSS to indicate importance.
    • Top left cell is completely empty.
    • Seems indistinguishable from a layout table.
    • Make a variant.
  2. Three Stars:
    • 3 column layout table.
    • Multiple details per cell.
    • There are no column headers, just an embedded caption.
    • Probably won’t hurt if this was erroneously identified as a data table?
  3. Game Information:
    • 2 column layout table.
    • Multiple details per cell.
    • No column headers, just an embedded caption.
  4. Team Statistical Comparison:
    • Layout table.
    • Contains 6 tables in one cell.
    • Each of these tables is a diagram and not really a data table.
    • Need to see the colours and tell them apart to understand the data.
    • Make a variant where these are genuine data tables without depending on colour.
  5. 1st Period Summary:
    • Uses a <td> spanning the entire table width using align=center in sections where there is no data to report. Implying that it is a header would break this table.
    • Regular data table with one detail per cell.
    • Column headers are repeated.
    • Columns 3 and 4 start with individual headers but are replaced by a spanned header. “Smart colspan” wouldn’t recognise this because it would fail in other tables, IIRC.
  6. 2nd Period Summary, 3rd Period Summary and OT Summary are the same as 1st Period Summary.
  7. Player Summary is a layout table which contains 2 data tables which are the same:
    • 2 rows of column headers.
    • First column header is actually a caption for the table and shouldn’t be alongside the other two table headers. Make a variant.
    • First row headers span several columns.
    • Column headers span a single column.
    • Column headers use abbreviations which are not expanded. Make a variant. Can the text content of an <abbr> element in a column header be an alias for an abbr attribute value?
    • Row headers use <td>.
    • Data is very regular with one detail per cell, except player positions which are in the same column as player names. Make a variant.
  8. Goaltending Summary is a layout table which contains 2 data tables which are the same:
    • Column headers use some abbreviations which are not expanded. Make a variant.
    • 3 rows in total, 1 row of data. A small number of rows does not indicate a layout table.
    • Row header is marked up using <td>.
    • Very regular with one detail per cell.
  9. Shots on Goal:
    • Caption is embedded into the row of headers. Make a variant.
    • Column headers use abbreviations which are not expanded. Make a variant.
    • Row headers use <td>.
    • Very regular with one detail per cell.
MLB Stats 2007:
  • The tables nested inside the layout table all follow the same pattern:
    • First column has a picture of the player.
    • Second column has 3 data items lumped in together:
      • Player rank for this category.
      • Player name, linked to their player card.
      • Abbreviated name of the team they play for.
    • The first two columns start with the table caption and don’t have a real column header.
    • Rightmost column in each is a column of data with a header.
    • In practice, these are also layout tables?
  • Can these complex mixtures of data tables, layout tables and hybrid tables be told apart?
  • How common are situations like this?
  • Is retrofitting accessibility to this even possible? Make a variant.
  1. Sortables:
    • Embedded caption.
    • Column headers are needed to disambiguate the link in each data cell.
    • Using <td colspan="2"> instead of <th colspan="2">. Make a variant. E-mail them about it.
  2. Two-column layout table:
    • First cell in each column uses same markup as genuine table headers elsewhere.
    • The key difference is this table contains other tables. That means it cannot be a data table.
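The observation that a table containing other tables cannot be a data table suggests a simple check. The sketch below flags each table that has another table somewhere inside it, using Python’s standard html.parser. The names `TableNesting` and `tables_containing_tables` are hypothetical, and reasonably well-formed markup with matching </table> tags is assumed.

```python
from html.parser import HTMLParser
from typing import List


class TableNesting(HTMLParser):
    """Flag each <table>, in document order, that contains another <table>."""

    def __init__(self) -> None:
        super().__init__()
        self.contains_table: List[bool] = []
        self._open: List[int] = []  # stack of indexes of currently open tables

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            if self._open:
                # Mark the parent; its own ancestors were marked when it opened.
                self.contains_table[self._open[-1]] = True
            self._open.append(len(self.contains_table))
            self.contains_table.append(False)

    def handle_endtag(self, tag):
        if tag == "table" and self._open:
            self._open.pop()


def tables_containing_tables(markup: str) -> List[bool]:
    parser = TableNesting()
    parser.feed(markup)
    parser.close()
    return parser.contains_table


nested = (
    "<table><tr>"
    "<td><table><tr><td>1</td></tr></table></td>"
    "<td><table><tr><td>2</td></tr></table></td>"
    "</tr></table>"
    "<table><tr><td>3</td></tr></table>"
)
print(tables_containing_tables(nested))
```

A True flag rules a table out as a data table under this heuristic. A False flag proves nothing, since several of the innermost tables in this review are themselves layout tables or hybrids.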
PGA Tour Statistics:
  • Column headers are repeated after every 10 data rows.
  • Row headers are either the number, the player name, or both.
  • Candidates for row headers are marked up as plain <td>.
  • Empty cells use <td>--</td>:
    • PHP Manual uses 3.
    • Other places use 1.
    • Yet to find a place where they are significant.
    • So maybe a cell which only contains hyphens is always intended as an empty cell?
  • Server-side sorting via a hyperlink in the column header.
    • Sorted row is styled like a table using CSS but uses <td class> rather than <td><b>.
  • Very regular data with one detail per cell.
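The guess that a cell containing only hyphens is intended as an empty cell can be written down as a tiny test. A minimal sketch, assuming the cell’s text content has already been extracted and entities such as &nbsp; decoded; the name `looks_empty` is hypothetical.

```python
import re

# Whitespace (including a decoded non-breaking space) and hyphens only.
_PLACEHOLDER = re.compile(r"[\s\u00a0\-]*")


def looks_empty(cell_text: str) -> bool:
    """True if the cell text is blank or consists only of hyphens."""
    return _PLACEHOLDER.fullmatch(cell_text) is not None


for text in ["", "\u00a0", "-", "--", "---", "None", "-3"]:
    print(repr(text), looks_empty(text))
```

This covers the 1, 2 and 3 hyphen placeholders seen so far and <td>&nbsp;</td>. It deliberately does not cover textual placeholders such as the <td>None</td> cells in the Wikipedia table, and a value like "-3" is still treated as data.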
Tiger Woods - Player Card:
  • 3 data tables.
  1. PGA Season Overview - 2007:
    • Row headers use plain <td>.
    • 4 rows in total and only 2 are for data. A small number of rows does not indicate a layout table.
    • Very regular data with one detail per cell.
  2. PGA Tour Stat Ranks - 2007:
    • First column header spans two columns even though they contain different details. Make a variant which gives the second column a “value” header.
    • Row headers use plain <td>.
    • Very regular data with one detail per cell.
  3. 2007 Tournaments:
    • Most useful row headers are probably the event names, in column 2:
      • Make a variant where these use <th>.
      • Make a variant where column 2 is swapped with column 1.
    • Regular data but each cell in column 2 and column 4 contains multiple details.
    • Table ends with a full-width row which contains an endnote.
Indy Racing League Race Schedule:
  • Borderline layout table.
  • Column 2 uses <td><b> but the <b> does not contain everything. It is not intended as a header cell. This trend is consistent with other tables.
  • Regular data but column 2 and column 3 have several details in each cell.
  • Column 2 has track name and location because they are closely related.
NHRA Results:
  • Regular data but cell 3 has loads of details dumped into it:
    • Borderline layout table because of this.
    • Contains 3 rows of data, each consists of:
      • Vehicle class, which should be a header.
      • Winning driver’s name, inside <b>, which should be a data cell.
      • Winning top speed, which should be a data cell.
      • Winning time, which should be a data cell.
      • Make a variant.
      • Is the data packed in this way so it fits in the website’s layout? E-mail them.
  • Probably doesn’t need row headers as there are only 3 columns.
Eurosport
Overall Team Standings: Stage 20:
  • scope for column headers which are using <th>. Amazing!
  • scope for row headers in middle of row. Seems they think this applies it leftward as well as rightward. I used to think that, too.
  • The tabs above the table are links to 5 other tables built the same way.
  • Zebra rows using <tr class> with a value alternating between row and alt.
Soccer
League Table - Premier League Soccer (UK):
  • I made a snapshot of the Premier League table on 16th September 2007.
  • Entire table is written to the page using JavaScript, specifically:
    • .innerHTML rather than document.write.
    • JavaScript constructs the table markup from an XMLHttpRequest.
    • There is no table if JavaScript is unavailable. You get alert boxes if features are unsupported or an error occurs.
    • It starts at about line 800, all embedded into the page.
  • <td> only with CSS to make the headers bold, just like ESPN.
  • Probably the most widely recognised data table in the UK.

Elsewhere

Collections I’ve seen but not worked on:

If you send in a collection I shall add it to this list but I might not work on it.

Method for Retrofitting Simulations

For each table found on the web:

  1. If it is part of an existing collection:
    1. Create a subdirectory for this table.
  2. Otherwise:
    1. Create a new directory for this new collection.
    2. Create a subdirectory for this table.
  3. Create an original.html file with the table markup from the original page.
  4. Create some variants of it, usually these:
    minimal.html:
    Strip the original to the simplest markup without changing cell arrangements. Add border=1 to make structure visible.
    scope.html:
    Add scope attributes to the minimal.html example, with grouping elements as necessary.
    scope-abbr.html:
    Add abbr attributes to the scope.html example where appropriate.
    Special variants:
    • Simpler header arrangements.
    • Adjacent empty cells as spanned empty cells.
    • Translate to English.
    • Add <abbr title>.
    • Non-conformant markup where conformant markup is inadequate.
    • Etc.
  5. Get a feel for conformance and sanity using:
  6. Upload to the web (duh).
  7. Update this page if a new collection was created.
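The directory and file steps above could be scripted along these lines. This is only a sketch of the layout described in the list: the `scaffold` function, its parameters and the example collection and table names are hypothetical, and the variant files begin as plain copies of the original that still have to be edited by hand.

```python
from pathlib import Path
from typing import List

# The usual variants named in the method; special variants are added by hand.
VARIANTS = ("minimal.html", "scope.html", "scope-abbr.html")


def scaffold(root: Path, collection: str, table: str, original_markup: str) -> List[Path]:
    """Create collection/table/ with original.html and the usual variant files."""
    table_dir = root / collection / table
    # Creates the collection directory too if this is a new collection.
    table_dir.mkdir(parents=True, exist_ok=True)

    original = table_dir / "original.html"
    original.write_text(original_markup, encoding="utf-8")

    files = [original]
    for name in VARIANTS:
        path = table_dir / name
        if not path.exists():
            # Start each variant as a copy; never overwrite hand-edited work.
            path.write_text(original_markup, encoding="utf-8")
        files.append(path)
    return files
```

For example, `scaffold(Path("collections"), "sports", "nhl-boxscore", markup)` would leave four files in collections/sports/nhl-boxscore/, ready for the minimal, scope and scope-abbr edits.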

Future?

No more original.html files; they are too big a bottleneck. Dumping links with a summary is more useful for categorising the use cases. It also helps other Participants find things to do.