Collections of Interesting Data Tables
Genuine data tables found on the web which seem complex or otherwise noteworthy, by Ben ‘Cerbera’ Millard.
Feedback
Corrections of any size and links to other collections are welcome. In order of preference:
- Participate in the Data Table Collections (Research) thread of W3C’s public-html mailing list.
- Add to the Accessify Forum topic.
- E-mail me: cerbera@projectcerbera.com.
Please include “Table Collections” in e-mail subject lines to help me track feedback.
Goals and Deliverables
This document contains no products or recommendations. It documents how tables get authored in reality so specifications can make HTML data table features more realistic and robust. The tables can also be used as real-world trials of prototypes and implementations.
So far, it has influenced the design of HTML5’s table header association algorithm (draft).
A slightly different approach has been developed between myself, Simon Pieters and James Graham. James Graham’s Table Inspector is the prototype. It was demonstrated in Boston, November 2007 at a W3C HTMLWG Unconference session.
Progress
Began on 19th May 2007 with the most recent update on 6th November 2007. Some editorial changes on 7th April 2008.
This research is often inactive while I earn a living. I am seeking sponsorship to make this research more sustainable.
Sample Size
- 150 variants to simulate retrofitting techniques.
- 96 genuine tables on the Web collected and analysed.
- 27 more tables awaiting review.
How Authors Indicate Headers in Data Tables
- 15% did not use HTML table elements.
- 1% have no header cells but are data tables.
- Using <th> with any attributes and any nested elements:
- 20% for all headers.
- 18% for some headers.
- 45% for no headers.
- (Equals 99% with the first two groups due to rounding.)
- Using <td> with any attributes and a <b> or a <strong> as the only child:
- 5% for all headers.
- 11% for some headers.
- 70% for no headers.
- (Equals 102% with the first two groups due to rounding.)
- Using <td> with any attributes except scope or headers and any nested elements except <b> or <strong> as the only child:
- 35% for all headers.
- 20% for some headers.
- 31% for no headers.
- (Equals 102% with the first two groups due to rounding.)
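The three markup groups above can be told apart mechanically. What follows is only a minimal sketch in Python, assuming the standard library's html.parser is good enough for the markup at hand. The names CellClassifier and classify_cells and the labels it returns are made up for illustration and are not part of this research. It labels cells by their markup only; whether a cell really acts as a header was a human judgement in the figures above.

```python
from html.parser import HTMLParser

VOID = {"br", "img", "input", "hr"}  # elements which never have an end tag


class CellClassifier(HTMLParser):
    """Label each table cell by the markup technique it uses (illustrative)."""

    def __init__(self):
        super().__init__()
        self.results = []   # one label per cell, in document order
        self._cell = None   # description of the cell currently being read
        self._depth = 0     # element nesting depth inside that cell

    def handle_starttag(self, tag, attrs):
        if tag in ("td", "th"):
            self._finish()  # cope with omitted </td> and </th>
            self._cell = {"tag": tag, "attrs": dict(attrs),
                          "children": [], "text": False}
            self._depth = 0
        elif self._cell is not None:
            if self._depth == 0:
                self._cell["children"].append(tag)  # direct child element
            if tag not in VOID:
                self._depth += 1

    def handle_endtag(self, tag):
        if tag in ("td", "th", "tr", "table"):
            self._finish()
        elif self._cell is not None and tag not in VOID and self._depth > 0:
            self._depth -= 1

    def handle_data(self, data):
        # Text sitting directly in the cell means <b> is not the only child.
        if self._cell is not None and self._depth == 0 and data.strip():
            self._cell["text"] = True

    def _finish(self):
        cell, self._cell = self._cell, None
        if cell is None:
            return
        if cell["tag"] == "th":
            label = "th"
        elif "scope" in cell["attrs"] or "headers" in cell["attrs"]:
            label = "td-explicit"  # outside the three groups surveyed
        elif cell["children"] in (["b"], ["strong"]) and not cell["text"]:
            label = "td-bold"
        else:
            label = "td-other"
        self.results.append(label)


def classify_cells(markup):
    parser = CellClassifier()
    parser.feed(markup)
    parser.close()
    parser._finish()  # flush a final cell which was never closed
    return parser.results
```

A cell such as <td><b>x</b> y</td> comes out as "td-other" because the <b> does not contain everything, which matches how such cells are treated in the notes further down.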
Collections
Simulated Retrofitting
Ways tables can be modified to become more accessible whilst keeping their meaning. Check the Method for Retrofitting Simulations.
astro: 6 genuine tables from the U.S. Naval Observatory’s Astronomical Applications Department Data Services.
clark2006: 19 genuine tables from Joe Clark’s Table examples for PDF/UA 1 (2006.01.27). (PDF/UA.)
finance: 2 genuine tables about money, with notes in the next section.
form: 1 genuine table with form controls in it.
odi: 7 genuine tables from Office for Disability Issues (ODI) research, New Zealand.
thatcher: 2 genuine table examples from the USA government, sent to me by Jim Thatcher.
sports: 1 genuine table, with notes in the next section.
tides: 1 genuine Gorleston tide table, UK Broads Authority.
On the Web
From browsing the web, including deliberate searches for interesting tables. I biased the search towards the more popular websites for any given query.
Astronomy
- The Astronomical Almanac from the US Naval Observatory:
-
- Solar system measurements and constants.
- PDF or ASCII. Are HTML tables so hard? E-mail them.
Computing
- APIs Usage in VB6 “FileInfo” Project by Karl E. Peterson:
-
- Col headers use <th align="left">. So we can’t rely on align to tell us if a <th> is being used for data?
- Row headers use <td><b>. So this is used for both types of header.
- <br> instead of rowspan. Make a variant to simulate retrofitting this.
- Uses the frames attribute for controlling borders.
- The Best Gaming Video Cards for the Money: May 2007 from Tom’s Hardware:
-
- Column headers use plain <th>.
- Column headers are styled to look identical to data cells. We can’t use visual appearance to tell when <th> is being used for data?
- Each cell has a list of 0 or more graphics cards:
- Empty cells (0 cards) use <td> </td>.
- Items are separated with a comma.
- Could use <ul> and <li>. Make a variant.
- Could use individual cells. Make a variant.
- Harmonia GUI Framework by Andrew Fedoniouk:
-
- Headers use plain <td>.
- Table has three layers:
- A cell spanning the entire table width indicates the first layer of sections.
- A cell in column 1 indicates the next layer.
- Column 2 indicates layers within those indicated by column 1.
- An interesting anomaly is when “Module” and “Class/struct/type declaration” are the same:
- Rather than repeat the same name twice, the module name is spanned into the next column.
- Since it acts as a row header, it should be marked up as a header.
- This might prevent the “smart colspan” algorithm working.
- Make a variant where the name is repeated.
- Make a variant where it uses <th>.
- Make a variant where the first row of column headers use scope="col".
- Make a variant where the first row of column headers are inside a <thead>, implying scope="col".
- Some cells use <td><strong> but are not headers. We cannot imply <td><strong> is a table header?
- Keryx (X)HTML Elements Best Practice Sheet by Lars Gunther:
-
- Is an XML page:
- Empty cells use <td />. Is the same as <td></td>.
- Column headers use <th>.
- Column headers are in a <thead>.
- No columnar values of scope are used.
- Uses <colgroup> but only for controlling borders.
- Row headers use <th scope="col">.
- Table is split into sections:
- First section uses <tbody> and <th scope="rowspan">. Make a variant where each section does this.
- The other sections use <td colspan> as a header which stretches across the table. Can this be told apart from a wide data row? Make a variant using <th colspan>.
- Some data cells span columns, some span rows, some span rows and columns.
- Abbreviations are expanded with a mix of punctuation and <b>. Make a variant using <abbr title>. Make a variant using <dfn>.
- Layout height attributes on body and html elements by Anne van Kesteren:
-
- Uses <caption> correctly.
- Column headers use <th>.
- Column headers are inside a <thead>.
- 3 levels of column headers with different column spans:
- Top level spans the furthest.
- Next level spans less.
- Final level does not span.
- Column span boundaries have a regular alignment.
- HTML4’s scope cannot express this table because it would need nested <colgroup>? Make a variant.
- Browser names use plain <td>. But you need these as headers to understand the data? Make a variant.
- Shortened terms expanded after the table. Make a variant using <abbr title>.
- Review it against HTML4’s header search algorithm. Ask Leif Halvard Silli to do this?
- Optimize string handling in VB6 - Part II by Tuomas Salste:
-
- Tables as diagrams of memory structures.
- Regular data tables.
- Comparisons.
- Headers usually done with <th>. Sometimes done with <td align="center">.
- Inconsistencies between markup and styling on one page by one author. E-mail them about this.
- Clean, minimal markup on the whole. Maybe authors will be happy to write new tables this simply?
- Linkback by Wikipedia:
-
- Column headers use <th>.
- Row headers use plain <td>.
- The empty header cell uses <th></th>.
- Empty data cells use <td>None</td>.
- At least 2 data cells contain a <ul> in 3 of the 4 columns. Block-level markup does not indicate a layout table.
- The QA Matrix by W3C QA:
-
- Four distinct columns sharing the “Properties” header (or a list of 4 items, depending how you look at it).
- 0 or 1 lists in the final cell of each row.
- Empty cells marked “-”.
- date Parameters from the PHP Manual:
-
- Column headers use <th>.
- Column headers are in a <thead>.
- Row headers use <td><var>.
- Table is split into sections:
- Section headers use <td align="center"><span class><em>.
- Section headers only span the first column. E-mail them about it.
- Other cells in the section header row use <td>---</td>. Can this be considered an empty cell?
- Make a variant with <th>.
- Make a variant with <td colspan>.
- Make a variant with <th colspan>.
- Make a variant with <tbody><th scope="rowgroup">.
- Make a variant with <tbody><th colspan scope="rowgroup">.
- Uses a single <colgroup> for the table with a <col> element for each column. Why? E-mail them about it.
- Supplies a summary which repeats the previous paragraph. Why? E-mail them about it.
- The strangeness seems symptomatic of someone who is trying too hard without fully understanding the markup. E-mail them about it.
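The QA Matrix marks empty cells with “-” and the PHP Manual uses <td>---</td>, so one possible heuristic is to treat a cell whose text is only hyphens and whitespace as empty. This is a minimal sketch in Python. The name looks_empty is made up for illustration, and the rule itself is an assumption which these notes have not confirmed.

```python
import re

# Only whitespace (including non-breaking spaces) and ASCII hyphens.
_PLACEHOLDER = re.compile(r"^[\s\-]*$")


def looks_empty(cell_text):
    """Guess whether a cell's text content is just an 'empty' placeholder."""
    return bool(_PLACEHOLDER.match(cell_text))
```

It deliberately leaves “None” alone, as used by the Linkback table, and it does not misfire on negative numbers such as “-5” because a digit is present.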
Education
- A table of worldwide ages of consent, including US states by Avert:
-
- <th> used for column headers.
- Column headers are in the same <tbody> as all the data. Didn’t use <thead>.
- <td class> used for row headers. Why not <th>? E-mail them.
- The column of row headers has a column header.
- Row headers get 2 layers deep in several places but are never hierarchical.
- Footnotes are numbered in the table and wrapped in <sup> which corresponds to an <ol> later in the page. Perhaps this could be built on to produce a robust footnotes system leveraging existing elements for HTML5?
- A row of averages is placed at the bottom in the same row group as the data. Didn’t use <tfoot>.
- School Teachers’ Review Body Statistical tables as annex to the 2005 written evidence from the DfES by teachernet:
-
- All done as Excel spreadsheets.
- Some have hierarchical row headers. Did they choose Excel because they couldn’t figure out the HTML to do this? E-mail them.
- Most of these tables are dead simple. So why not use HTML? E-mail them.
- Science and engineering departmental population at doctorate-granting institutions, by field: 1987-94 by the National Science Foundation:
-
- ASCII used in a <pre> instead of HTML table elements. E-mail them. Make a variant.
- All their most recent tables are done in Excel and PDF. For example, Graduate Students and Postdoctorates in Science and Engineering: Fall 2005. Is HTML so hard? E-mail them.
- One row of column headers.
- Indented table sections where most rows are 4 levels deep! Are their headers supposed to accumulate? E-mail them.
- If they must accumulate, probably needs the headers + id patch technique.
- Row header text is too long for rowspan to be practical? Make a variant.
- Totals and subtotals appear at the start of each top-level table section.
- No column spans or row spans.
- Footnotes appear immediately after the table. This seems to be a strong convention in print, ASCII, HTML and other formats?
Finance
- FTSE 100 Listings from Money Extra with loads more UK stock tables:
-
- Column headers cover two rows.
- Entire headers block gets repeated after every 20 rows of data.
- Uses scope="col", so the scope has to stop after it runs down some data rows and hits another header with the same scope.
- As scope="col" is used in cells with colspan, to accommodate this table we would need to:
- be smart about colspan inside the scope algorithm;
- or allow smart colspan to fill in the gaps afterwards.
- Maybe it’s too funky to accommodate? Removing the scope attributes would be an easy retrofit. Make a variant.
- Departmental financial statements from Disability Services Queensland:
- Uses the same headers + id hierarchical row header patching technique as Stephen Ferg in the USA. E-mail them about any influence.
- FTSE ACT 250 by Yahoo! Finance:
-
- <td align="center"> for column headers. Maybe this should be an alias for <th> in certain situations? Such as in a row which only contains <td align="center">?
- Some uses of <td align="center"> and <td> containing <b> for different purposes:
- Row headers use <td><b><a>...</a></b></td>. So <b> is the only child of <td> for these headers.
- Columns 3 and 4 use <td align="center">. Need to be very careful if we allow this as an alias for <th>.
- Column 3 uses <td><b>...</b> ...</td> and column 4 uses <td><img> <b>...</b></td>. So <b> is not the only child of <td> for these data cells.
<th>
? - Row headers use
- Why aren’t they using <th>? E-mail them.
- There are 2 layout tables as ancestors of this data table.
- There are no layout tables as descendants of this data table. A data table can only be at the bottom of nested tables?
- University of Wisconsin–Madison Facts: Budget:
-
- Snapshots taken on 29th September 2007, with retrofitting simulations:
- Attempted to use headers + id but got it wrong:
- Bogus reference to acprog in a headers attribute value.
- Empty string for headers values in the “2005-2006 Budget: allocation by program” table from the “Student support” section onwards.
- Tables are captioned with <caption>.
- Purpose of table is summarised in the summary attribute.
- Column headers use <th>.
- Long header text is abbreviated with the abbr attribute.
- Two headers use <td>.
- Table is split into headed sections.
- Section headers use <th> with a colspan which covers the full table width.
- Cell arrangement is regular and doesn’t really need headers + id? Make a variant.
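Both mistakes noted for the Wisconsin tables (a headers token which matches no id, and empty headers values) are easy to detect automatically. Here is a minimal sketch in Python using only the standard library. The names HeaderRefChecker and check_headers are made up for illustration, and for simplicity it assumes the markup passed in is a single table, so ids are not checked per table.

```python
from html.parser import HTMLParser


class HeaderRefChecker(HTMLParser):
    """Collect cell ids and the tokens used in headers attributes."""

    def __init__(self):
        super().__init__()
        self.ids = set()   # id values found on <td> and <th> cells
        self.refs = []     # every token found in a headers attribute
        self.empty = 0     # number of headers attributes with no tokens

    def handle_starttag(self, tag, attrs):
        if tag not in ("td", "th"):
            return
        attrs = dict(attrs)
        if attrs.get("id"):
            self.ids.add(attrs["id"])
        if "headers" in attrs:
            tokens = (attrs["headers"] or "").split()
            if not tokens:
                self.empty += 1
            self.refs.extend(tokens)


def check_headers(markup):
    """Return (sorted dangling tokens, count of empty headers attributes)."""
    checker = HeaderRefChecker()
    checker.feed(markup)
    checker.close()
    dangling = sorted({t for t in checker.refs if t not in checker.ids})
    return dangling, checker.empty
```

For a table where one cell says headers="prog acprog" but only id="prog" exists (the id prog is invented here), it reports acprog as dangling.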
Government
- Bolton Museums - Contact Us:
-
- No column headers. Make a variant with column headers.
- Table begins with a <th colspan><i> across the whole width:
- It is the first section header.
- Implying this is a caption would be wrong.
- <i> does not hint at a semantic intention, unlike <b>.
- Subsequent sections also begin with <th colspan> across the whole width.
- Sections end with a row of cells using <td><br /></td>. A cell containing a <br> is an empty cell which should be ignored?
- All sections are in the same <tbody>. Make a variant using one <tbody> for each section.
- Row headers use <td>. Make a variant using <th>.
- Row headers sometimes span more than 1 row.
- The row header spans more than one column when there are no names of people. Make a variant where the person’s name cell is there but is empty.
- People’s names use <td><b>. Implying these are headers would be wrong. Make a variant where this boldness is done via CSS.
- Bureau of Labor Statistics, particularly these areas influenced by Stephen Ferg:
-
- New England information office, Boston.
- New York-New Jersey information office, New York City.
- Mid-Atlantic information office, Philadelphia.
- Southeast information office, Atlanta.
- Midwest information office, Chicago.
- <th> and <td> are used. Minimal headers + id is added to patch up the HTML4 header search algorithm where needed.
- National Statistics Online (UK)
- It’s all PDF except for commentary and graphs?
- TABLE Z-2 - 1910.1000 TABLE Z-2 from the US Department of Labor, Occupational Safety & Health Administration:
- Try saying that three times quickly.
- Expanded Homicide Data Table 2 from the Federal Bureau of Investigation:
-
- Column headers use <th> with scope="col".
- They expect col to be sensitive to the colspan. I once thought this, too. Probably unaware of the colgroup value, which is also rather strange to set up.
- <th> with scope="row" for row headers, augmented with headers + id for the hierarchical row header.
- headers + id for every cell which has headers applied to it.
- A unified header algorithm needs to drop duplicate associations caused by the overlapping association methods in tables like this.
- Footnotes in a <ul> after the table.
Interactive
- Dog selector test:
- Faking a table for a form.
Timetables
- Events - Lions Club of Fleet:
-
- <td><h3> for header cells.
- If you could recognise these as headers, you’d need to be smart about colspan even though the headers are defaulting to colspan=1.
- Endnotes are in the final row of the data section.
- Timetables - Isle of Man Steam Packet Company Ferry Services:
-
- Several PDF documents, each of which contains several pages of colour-coded timetables.
- Why did they use PDF? Are HTML tables so much harder? E-mail them.
Products
- Fitting Bras, Correct Bra Size and Comparisons from Bigger Bras:
-
- 2 levels of column headers:
- Column headers use <td align="center">.
- Table is split into 2 sections, with column headers for each section.
- 3 levels of row headers:
- Row headers use <td align="center">.
- Data cells also use <td align="center">.
- Cannot imply <td align="center"> is a header.
- Some cells legitimately contain two pieces of data.
Sports
Detailed Review
I wrote a detailed review of sports tables which included:
- 6 more from ESPN;
- 5 more from Eurosport;
- and 2 from the Intercontinental Rally Challenge website, including the weirdest approach to coding a table I’ve ever seen.
The AFB reviewed some sports sites in early 2006, finding problems with data tables. Disabled people can be sports fans, if you hadn’t realised. Heard of the Paralympics?
ESPN
None of their tables use <th>. Their column headers use <td> with CSS to make it bold! But at least retrofitting <th> would be easy. E-mail them about it.
Their data tables are usually given a caption by placing a <td colspan> in the first row which spans all columns in that table. I call this an “embedded caption”. Is it so hard to style <caption>? Test it.
- NHL Player Card for Daniel Alfredsson:
-
- Abbreviations for headers described by a glossary, which is the next table in this review.
- Embedded caption.
- Data is mostly numerical and laid out regularly. Pretty tame.
- NHL Statistics Glossary:
-
- Embedded caption.
- Only 2 columns of data. A small number of columns does not indicate a layout table.
- No cells acting as column headers.
- First column kind of acts as row headers. Should authors bother with row headers in 2-column tables? The user probably just heard it and it can be heard again by moving one cell left.
- If it were retrofitted with <th>, would our algorithms work? Make a variant.
- NHL Boxscore:
-
- 14 tables styled to look like data tables.
- 2 of these are layout tables. Each contains 2 of the other 14 data tables.
- Untitled table showing scores per quarter:
- No caption.
- Row headers include some data.
- Final column uses bold styling applied via CSS to indicate importance.
- Top left cell is completely empty.
- Seems indistinguishable from a layout table.
- Make a variant.
- Three Stars:
- 3 column layout table.
- Multiple details per cell.
- There are no column headers, just an embedded caption.
- Probably won’t hurt if this was erroneously identified as a data table?
- Game Information:
- 2 column layout table.
- Multiple details per cell.
- No column headers, just an embedded caption.
- Team Statistical Comparison:
- Layout table.
- Contains 6 tables in one cell.
- Each of these tables is a diagram and not really a data table.
- Need to see the colours and tell them apart to understand the data.
- Make a variant where these are genuine data tables without depending on colour.
- 1st Period Summary:
- Uses a <td> spanning the entire table width using align=center in sections where there is no data to report. Implying that is a header would break this table.
- Regular data table with one detail per cell.
- Column headers are repeated.
- Columns 3 and 4 start with individual headers but are replaced by a spanned header. “Smart colspan” wouldn’t recognise this because it would fail in other tables, IIRC.
- 2nd Period Summary, 3rd Period Summary and OT Summary are the same as 1st Period Summary.
- Player Summary is a layout table which contains 2 data tables which are the same:
- 2 rows of column headers.
- First column header is actually a caption for the table and shouldn’t be alongside the other two table headers. Make a variant.
- First row headers span several columns.
- Column headers span a single column.
- Column headers use abbreviations which are not expanded. Make a variant. Can the text content of an <abbr> element in a column header be an alias for an abbr attribute value?
- Row headers use <td>.
- Data is very regular with one detail per cell, except player positions which are in the same column as player names. Make a variant.
- Goaltending Summary is a layout table which contains 2 data tables which are the same:
- Column headers use some abbreviations which are not expanded. Make a variant.
- 3 rows in total, 1 row of data. A small number of rows does not indicate a layout table.
- Row header is marked up using <td>.
- Very regular with one detail per cell.
- Shots on Goal:
- Caption is embedded into the row of headers. Make a variant.
- Column headers use abbreviations which are not expanded. Make a variant.
- Row headers use <td>.
- Very regular with one detail per cell.
- MLB Stats 2007:
-
- The tables nested inside the layout table all follow the same pattern:
- First column has a picture of the player.
- Second column has 3 data items lumped in together:
- Player rank for this category.
- Player name, linked to their player card.
- Abbreviated name of the team they play for.
- The first two columns start with the table caption and don’t have a real column header.
- Rightmost column in each is a column of data with a header.
- In practice, these are also layout tables?
- Can these complex mixtures of data tables, layout tables and hybrid tables be told apart?
- How common are situations like this?
- Is retrofitting accessibility to this even possible? Make a variant.
- Sortables:
- Embedded caption.
- Column headers are needed to disambiguate the link in each data cell.
- Using <td colspan="2"> instead of <th colspan="2">. Make a variant. E-mail them about it.
- Two-column layout table:
- First cell in each column uses same markup as genuine table headers elsewhere.
- The key difference is this table contains other tables. That means it cannot be a data table.
- PGA Tour Statistics:
-
- Column headers are repeated after every 10 data rows.
- Row headers are either the number, the player name, or both.
- Candidates for row headers are marked up as plain <td>.
- Similar to an RNIB example table.
- Empty cells use <td>--</td>:
- PHP Manual uses 3.
- Other places use 1.
- Yet to find a place where they are significant.
- So maybe a cell which only contains hyphens is always intended as an empty cell?
- Server-side sorting via a hyperlink in the column header.
- Sorted row is styled like a table using CSS but uses <td class> rather than <td><b>.
- Very regular data with one detail per cell.
- Tiger Woods - Player Card:
-
- 3 data tables.
- PGA Season Overview - 2007:
- Row headers use plain <td>.
- 4 rows in total and only 2 are for data. A small number of rows does not indicate a layout table.
- Very regular data with one detail per cell.
- PGA Tour Stat Ranks - 2007:
- First column header spans two columns even though they contain different details. Make a variant which gives the second column a “value” header.
- Row headers use plain <td>.
- Very regular data with one detail per cell.
- 2007 Tournaments:
- Most useful row headers are probably the event names, in column 2:
- Make a variant where these use <th>.
- Make a variant where column 2 is swapped with column 1.
- Regular data but each cell in column 2 and column 4 contains multiple details.
- Table ends with a full-width row which contains an endnote.
- Indy Racing League Race Schedule:
-
- Borderline layout table.
- Column 2 uses <td><b> but the <b> does not contain everything. It is not intended as a header cell. This trend is consistent with other tables.
- Column 2 has track name and location because they are closely related.
- NHRA Results:
-
- Regular data but cell 3 has loads of details dumped into it:
- Borderline layout table because of this.
- Contains 3 rows of data, each consists of:
- Vehicle class, which should be a header.
- Winning driver’s name, inside <b>, which should be a data cell.
- Winning top speed, which should be a data cell.
- Winning time, which should be a data cell.
- Make a variant.
- Is the data packed in this way so it fits in the website’s layout? E-mail them.
- Probably doesn’t need row headers as there are only 3 columns.
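Several of the ESPN notes above lean on one observation: a table which contains other tables is being used for layout. That check is simple to automate. Below is a minimal sketch in Python with the standard library. The names NestingScanner and tables_containing_tables are made up for illustration, and the rule is only the working heuristic from these notes, not a proven test.

```python
from html.parser import HTMLParser


class NestingScanner(HTMLParser):
    """Record, for each <table>, whether another table is nested inside it."""

    def __init__(self):
        super().__init__()
        self.contains_table = []  # one flag per table, in document order
        self._open = []           # indexes of the tables currently open

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            for index in self._open:
                self.contains_table[index] = True  # every ancestor nests a table
            self._open.append(len(self.contains_table))
            self.contains_table.append(False)

    def handle_endtag(self, tag):
        if tag == "table" and self._open:
            self._open.pop()


def tables_containing_tables(markup):
    scanner = NestingScanner()
    scanner.feed(markup)
    scanner.close()
    return scanner.contains_table
```

Tables flagged True would be layout candidates under this heuristic. Tables flagged False still need other checks, since some borderline layout tables above contain no nested tables at all.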
Eurosport
- Overall Team Standings: Stage 20:
-
- scope for column headers which are using <th>. Amazing!
- scope for row headers in middle of row. Seems they think this applies it leftward as well as rightward. I used to think that, too.
- The tabs above the table are links to 5 other tables built the same way.
- Zebra rows using <tr class> with a value alternating between row and alt.
Soccer
- League Table - Premier League Soccer (UK):
-
- I made a snapshot of the Premier League table on 16th September 2007.
- Entire table is written to the page using Javascript, specifically:
- .innerHTML rather than document.write.
- Javascript constructs the table markup from an XMLHttpRequest.
- There is no table if Javascript is unavailable. You get alert boxes if features are unsupported or an error occurs.
- It starts at about line 800, all embedded into the page.
- <td> only with CSS to make the headers bold, just like ESPN.
- Probably the most widely recognised data table in the UK.
Elsewhere
Collections I’ve seen but not worked on:
- Project Gutenberg eBook tables from the late Laura Wisewell. Possibly ones from other authors, too.
- USA Federal Statistics, suggested by Steve Faulkner.
- New Zealand government tables collection from Terrence Wood.
- Educational tables from Laura Carlson.
- HTMLWG table examples from public-html, gathered by Laura Carlson.
- Re: Data Table Collections (Research) from Karl Dubost.
- Examples created solely to demonstrate a particular HTML feature are generally not useful.
If you send in a collection I shall add it to this list but I might not work on it.
Method for Retrofitting Simulations
For each table found on the web:
- If it is part of an existing collection:
- Create a subdirectory for this table.
- Otherwise:
- Create a new directory for this new collection.
- Create a subdirectory for this table.
- Create an original.html file with the table markup from the original page.
- Create some variants of it, usually these:
- minimal.html: Strip the original to the simplest markup without changing cell arrangements. Add border=1 to make structure visible.
- scope.html: Add scope attributes to the minimal.html example, with grouping elements as necessary.
- scope-abbr.html: Add abbr attributes to the scope.html example where appropriate.
- Special variants:
- Simpler header arrangements.
- Adjacent empty cells as spanned empty cells.
- Translate to English.
- Add <abbr title>.
- Non-conformant markup where conformant markup is inadequate.
- Etc.
- Get a feel for conformance and sanity using:
- Table Inspector Extension by Gez Lemon.
- HTML Validator for Mozilla by Marc Gueury.
- Validation Service for RELAX NG by Henri Sivonen.
- Upload to the web (duh).
- Update this page if a new collection was created.
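The directory and file steps above could be scripted. This is a minimal sketch in Python, assuming one directory per collection and one subdirectory per table as described. The function name scaffold and its parameters are made up for illustration. The variants themselves still have to be written by hand.

```python
from pathlib import Path

# The usual variant file names from the method above.
VARIANTS = ("minimal.html", "scope.html", "scope-abbr.html")


def scaffold(root, collection, table, original_markup):
    """Create collection/table under root and save original.html there.

    Returns the paths where the usual variants would be written.
    """
    table_dir = Path(root) / collection / table
    # parents=True covers both cases: a new collection or an existing one.
    table_dir.mkdir(parents=True, exist_ok=True)
    (table_dir / "original.html").write_text(original_markup, encoding="utf-8")
    return [table_dir / name for name in VARIANTS]
```

For example, scaffold(".", "sports", "example-table", markup) would save ./sports/example-table/original.html, where example-table is a placeholder name.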
Future?
No more original.html files; they are too big a bottleneck. Dumping links with a summary is more useful for categorising the use cases. It also helps other Participants find things to do.