Untangling Bad Tables (5th September 2008)

When I sent detailed analysis about headers+id, I thought it would put the thread back on track. There have been 10 replies since mine, several of considerable length. Precisely 0 were replies to the facts my analysis provided.

I echo a request made during the telecon to see the real table these tests were derived from. Complete and “in situ”, as they’d say on Time Team. This would enable better analysis about which columns are actually redundant and how the data is best arranged.

Story of the Redesign

In the small hours of 31st August 2008, I realised nobody had put forward an actual redesign of the Test File 3 data table. I spent several hours over the next couple of nights figuring out how it could be untangled. I kept notes of each observation, design change and rationale.

During the next week, I gathered peer review while writing headers Issue “Resolved” and also this blog entry.

Peer Review

Ranging from a quick skim-read to detailed fact-checking:

Lachlan Hunt and James Graham recommended that I include all the columns in some Redesign tables, just in case they had highly varying values. I did this and covered a couple of other likely scenarios.

Comparison of Techniques

The improvements to a table can be studied numerically even before users test it.

Table | Total Cells | Header Cells | Data Cells | Empty Cells | Rows | Size in Bytes
Test File 1: No scope or headers | 50 | 9 | 41 | 0 | 8 | 1,704
Test File 2: Only scope | 50 | 14 | 36 | 0 | 8 | 1,872
Test File 3: headers+id | 50 | 20 | 30 | 0 | 8 | 2,625
Corrected 1: <th> | 50 | 20 | 30 | 0 | 8 | 1,659
Corrected 2: <th scope> & <td scope> | 50 | 20 | 30 | 0 | 8 | 2,020
Corrected 3: headers+id pointing to <th> and <td> | 50 | 20 | 30 | 0 | 8 | 2,927
Corrected 4: 10 rows, headers+id pointing to <th> and <td> | 202 | 52 | 150 | 0 | 32 | 9,615
Redesign 1: Redesigned <th> | 52 | 21 | 30 | 1 | 5 | 1,212
Redesign 2: Section headers replace 2 columns, redesigned <th> | 47 | 20 | 26 | 1 | 6 | 1,139
Redesign 3: Section headers replace 2 columns, footnotes replace 3 columns, redesigned <th> | 38 | 17 | 20 | 1 | 6 | 905
Redesign 4: <caption> replaces 2 columns, footnotes replace 3 columns, redesigned <th> | 37 | 16 | 20 | 1 | 5 | 879
Redesign 5: 10 rows, redesigned <th> | 180 | 29 | 150 | 1 | 13 | 3,188
Redesign 6: 10 rows, section headers replace 2 columns, redesigned <th> | 160 | 29 | 130 | 1 | 15 | 2,935
Redesign 7: 10 rows, section headers replace 2 columns, footnotes replace 3 columns, redesigned <th> | 127 | 26 | 100 | 1 | 15 | 2,461
Redesign 8: 10 rows, <caption> replaces 2 columns, footnotes replace 3 columns, redesigned <th> | 125 | 24 | 100 | 1 | 13 | 2,383

Footnotes

Overall, Redesign 3 and Redesign 7 (its 10-row version) are my favourites.

Cell Count

Each investment in Test File 3 has a row header and 3 row subheaders. Corrected 4 extends Test File 3 (with corrections) from 2 investments to 10 investments. The table goes from having 2 row headers and 6 row subheaders to having 10 row headers and an astonishing 30 row subheaders! This demonstrates the terrible effect bad design can have on table complexity.

Redesign 1 and Redesign 2 stay the most faithful to Test File 3. Because the row subheaders were converted into column headers, each investment has 1 row header and 0 row subheaders. Their 10-row versions, Redesign 5 and Redesign 6, go from having 2 row headers and 0 row subheaders to having 10 row headers and 0 row subheaders.

Moving those row subheaders into column headers may have seemed like a false economy. But as the amount of data in the table becomes plausible, the improvement is remarkable.
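The counts quoted in this section follow from simple multiplication. Here is a minimal sketch of that arithmetic; the function name is made up for illustration and it assumes every investment has the same number of row subheaders.

```python
def row_header_counts(investments, subheaders_per_investment):
    """Return (row headers, row subheaders) for a table with
    one row header per investment."""
    return investments, investments * subheaders_per_investment


# Test File 3 layout: 3 row subheaders per investment.
print(row_header_counts(2, 3))   # 2 investments
print(row_header_counts(10, 3))  # 10 investments, as in Corrected 4

# Redesigned layout: the subheaders have become column headers.
print(row_header_counts(2, 0))
print(row_header_counts(10, 0))
```

With the original layout the subheader count grows three times as fast as the number of investments; with the redesigned layout it stays at zero.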

Byte Count

The tests had an HTML4 DOCTYPE and were delivered as text/html. Removing optional end tags is consistent with that.

Notice that Redesign 7 has 10 investments and still takes fewer bytes than Corrected 3, which has 2 investments but uses headers+id. Corrected 4 is just Corrected 3 with 10 investments. It takes nearly 4 times as much markup as Redesign 7 (9,615 bytes against 2,461).
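The comparison can be checked with a couple of lines of arithmetic. This sketch uses the byte sizes as they appear in the Comparison of Techniques table; the constant and function names are only illustrative.

```python
# Byte sizes taken from the Comparison of Techniques table.
CORRECTED_3_BYTES = 2927  # 2 investments, headers+id
CORRECTED_4_BYTES = 9615  # 10 investments, headers+id
REDESIGN_7_BYTES = 2461   # 10 investments, redesigned


def size_ratio(larger, smaller):
    """How many times the size of `smaller` is `larger`?"""
    return larger / smaller


ratio = size_ratio(CORRECTED_4_BYTES, REDESIGN_7_BYTES)
print(f"Corrected 4 is about {ratio:.1f} times the size of Redesign 7")
print("Redesign 7 smaller than Corrected 3:",
      REDESIGN_7_BYTES < CORRECTED_3_BYTES)
```

Put another way, Corrected 4 carries roughly 2.9 times more markup on top of what Redesign 7 needs.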

Test File 3 uses row numbers as part of the values in id and headers. The amount of markup per row increases when the row number reaches double figures. It increases again if the row number reaches 3 digits. This accelerating growth is avoided in Redesigns and they use less markup in the first place.
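To illustrate how the attribute size creeps up, here is a rough sketch. The id scheme `r<row>c<column>` is an assumption made for this example and is not necessarily the naming used in Test File 3; the point is only that any scheme embedding the row number gets longer as the row number gains digits.

```python
def headers_attribute(row, header_columns):
    """Build a headers attribute whose ids embed the row number.

    The "r<row>c<column>" id scheme is hypothetical.
    """
    ids = " ".join(f"r{row}c{col}" for col in header_columns)
    return f'headers="{ids}"'


# Compare the attribute length for 1-, 2- and 3-digit row numbers.
for row in (9, 10, 100):
    attribute = headers_attribute(row, (1, 2))
    print(row, len(attribute), attribute)
```

Every data cell in a row pays this cost, so the extra bytes per row multiply by the number of cells that carry a headers attribute.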

Judgement Calls

Each pair of Redesigns suits a different scenario for the original data.

If the Running Costs area is about how the Actual value changes each week, I think the column headers can be flattened to 2 levels and 4 more columns can be removed (the 2 duplicates per investment of Budgeted and Forecasted).

Redundant Columns

Columns where all the values are identical add unhelpful repetition to the table:

  1. Type
  2. Status
  3. Allocation
  4. ROI
  5. NPV

Columns are useful for comparing values. If the values are identical, no comparison is required. The shared values could be provided as a footnote below the table, removing 5 whole columns.
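Spotting such columns is easy to automate. The following is a minimal sketch, assuming the table is available as a list of dictionaries keyed by column name; the cell values shown are invented placeholders, not the real figures from the Test Files.

```python
def redundant_columns(rows):
    """Return the names of columns whose value is identical in every row."""
    if not rows:
        return []
    return [
        name
        for name in rows[0]
        if len({row[name] for row in rows}) == 1
    ]


# Placeholder data: only the column and investment names come from the table.
rows = [
    {"Investment": "Partner Portal", "Type": "x", "Status": "y", "ROI": "z"},
    {"Investment": "Partner Portal 2", "Type": "x", "Status": "y", "ROI": "z"},
]

print(redundant_columns(rows))
```

Any column this reports is a candidate for a footnote rather than a column of repeated cells.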

If the Type or Status vary, I imagine there would be many shared values. Instead of being in columns, full-width sectional headers could be used to group the table. This is compatible with Smart Headers and is a design pattern on the web and in print. The 5 columns are still removed.

If the other 3 columns (Allocation, ROI and NPV) vary, I imagine they would have many unique values. In this case, it would be best to retain those columns.

Column Headers above Row Headers

The top-left cell is the Child Investment header. The <title> of the page already says the data is about child investment portfolios, and the cells below it are header cells. This header can be removed.

After pondering the table for some time, the Properties header cell started to look weirder and weirder. Each cell below it is a sort of row subheader. They seem unrelated to land ownership and real estate, so why “Properties”? Eventually I realised it’s meant in the sense of programmer jargon: each row subheader introduces one aspect of the Running Cost column group.

By removing the Properties header this area starts looking like a standalone table, with column headers and row headers. The row headers in column 1 are only required to tell which investment each aspect is for. But each investment shares the same aspects. These aren’t row subheaders, they’re column headers!

Row Subheaders to Column Headers

Making Budget, Acquired and Forecast into column headers beneath each date would create 3 levels of headers. Since column groups cannot be nested in HTML4 and scope="col" isn’t smart about colspan, this cannot be expressed using scope in HTML4. Here’s what WCAG suggests:

HTML Technique 5.1.2 of WCAG 1: headers+id is required for 2 or more header levels, even though HTML4 scope can cover 2 levels in a regular table.
H51 for WCAG 2: Plain <th> for a single level of column and row headers.
H63 for WCAG 2: <td scope> is fine for a similar table.
H43 for WCAG 2: headers+id is used when multiple header cells are associated with a data cell. The example uses a regular arrangement of cells.

The “Smart Colspan” aspect of the Smart Headers algorithm proposed for HTML5 understands the significance of colspan. As does the algorithm presently in HTML5. They require neither scope nor <colgroup> for this case to work. A fine example of how accessibility features in HTML5 look set to make accessible content genuinely easier to author than HTML4, WCAG 1 or WCAG 2 currently permit.
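The idea of a colspan-aware lookup can be sketched in a few lines. This is a deliberately simplified illustration and not the Smart Headers algorithm or the HTML5 algorithm: it ignores rowspan, scope and headers entirely, and only asks which header cell in each header row spans a given column. The dates and the Budget/Actual/Forecast labels are placeholders.

```python
def column_headers(header_rows, column):
    """Return the header text covering `column` in each header row.

    `header_rows` is a list of rows; each row is a list of
    (text, colspan) pairs. Rowspan is ignored in this sketch.
    """
    found = []
    for row in header_rows:
        start = 0
        for text, colspan in row:
            if start <= column < start + colspan:
                found.append(text)
                break
            start += colspan
    return found


# Three levels of column headers over six data columns (placeholder labels).
header_rows = [
    [("Running Cost", 6)],
    [("1st December", 3), ("8th December", 3)],
    [("Budget", 1), ("Actual", 1), ("Forecast", 1)] * 2,
]

print(column_headers(header_rows, 4))
```

No scope attribute or column group is consulted here; the spans alone are enough to find all three levels for a cell.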

Header Proliferation?

3 levels of column headers may seem like a terrible proliferation of header cells. But it’s only 1 more header cell than the table had originally, since I removed Child Investment and Properties.

The rowspan="2" for each column header preceding Running Cost is incremented to 3. This does not change markup size. In the data area, 14 rowspan attributes are removed from only 2 rows of data, reducing markup size. This reduction grows with the number of rows.

Test Files use 1 row branching into 3 after several columns. The absence of rowspan in the redesigned data means each investment uses 1 continuous row. This simplifies the markup, making it easier to read and to generate. The structure is also simplified, which I imagine makes it more intuitive to move around in non-visually.
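As a rough idea of how little code a flat row needs, here is a sketch of a generator. The function name and the sample values are hypothetical, and optional end tags are left out on the assumption that the output is HTML4 served as text/html.

```python
from html import escape


def flat_row(name, values):
    """Render one investment as a single continuous table row.

    One <th> row header followed by one <td> per value; no rowspan
    is needed because nothing branches into sub-rows.
    """
    cells = "".join(f"<td>{escape(str(value))}" for value in values)
    return f"<tr><th>{escape(name)}{cells}"


# Placeholder values, not real figures from the Test Files.
print(flat_row("Partner Portal", [1, 2, 3]))
print(flat_row("Partner Portal 2", [4, 5, 6]))
```

Generating the branching layout would instead mean tracking which cells open a rowspan and which sub-rows must skip them.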

Arrangement Differences in Detail

In the Test Files, the value for an aspect of an investment has 2 rows between it and the same aspect of a different investment. For example, there are 2 rows between the Forecast value of Partner Portal and the Forecast value of Partner Portal 2. You must skip past 2 rows each time, which causes slight delay. I found doing this for a minute straight really draining, even as a sighted mouse user.

I expect the difficulties would be worse still with a screen magnifier. If the structure of the table were understood, perhaps it wouldn’t be so bad in a screen reader. Then again, figuring out the structure would be that much harder in the first place.

In contrast, flat rows mean 1 type of comparable data down each column. This makes like-for-like comparison between adjacent investments a single movement.

Unhelpful summary

Test File tables provide summary="Child investment portfolios with budgeted, actual and forecast running costs for dates". The <title> of the page already says it’s about “Child investment portfolios”, so this is redundant. The Running Cost column header has dates below it, making clear it’s “for dates”.

The row subheaders are a peculiar arrangement. Warning non-visual users about them may well be useful. Avoiding peculiar arrangements in the first place removes the need for any warning, though.

As such, summary is unnecessary for the Redesign table.

Useful <caption>

The Test Files are given a <title> and <h1> which only make sense for a Test File.

My example adds a <caption> which seems plausible for what the table would use in a genuine web page. By including the year in the <caption>, it becomes redundant in the column headers. I imagine this would cut out a fair bit of repetition in text-to-speech and visibly saves space.

Human Date Format

Test File tables use MM/DD/YYYY format for dates, which is commonplace in the USA. To a Brit like myself, putting the middle unit (months) before the smallest unit (days) and putting the biggest unit (years) after that is an unnatural sequence. To a European, putting the biggest unit at the end may not make sense. If the now redundant /YYYY portion is removed, the remaining MM/DD becomes nonsense to just about everyone.

You could keep the /YYYY portion in the header, with the repetition that goes with it.

Instead of that, I wondered if this was an opportunity to pick a less ambiguous representation of dates in the table. When I noticed how wide the date column headers were, I realised ordinal dates with full month names would fit quite well. So that’s what I used.

Later, I realised they were all in December. But moving “December” to the <caption> leaves behind a simple ordinal number for each cell. This might not look or sound like a date. So I kept the month names to help these cells stay clear and usable for all.
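Producing ordinal dates with full month names takes only a small helper. This is a minimal sketch with made-up function names; the sample dates are placeholders, since the actual dates in the Test Files are not reproduced here.

```python
from datetime import date

MONTHS = [
    "January", "February", "March", "April", "May", "June", "July",
    "August", "September", "October", "November", "December",
]


def ordinal(day):
    """Return the day number with its English ordinal suffix."""
    if 11 <= day % 100 <= 13:
        suffix = "th"  # 11th, 12th, 13th are irregular
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(day % 10, "th")
    return f"{day}{suffix}"


def human_date(d):
    """Format a date as an ordinal day plus full month name, without the year."""
    return f"{ordinal(d.day)} {MONTHS[d.month - 1]}"


# Placeholder dates for illustration only.
for d in (date(2008, 12, 1), date(2008, 12, 22)):
    print(human_date(d))
```

The month names are listed explicitly rather than taken from the system locale, so the output does not change from one machine to another.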

Thoughts

Other approaches to redesigning the table probably exist and may be even more effective. This is just the route I took (during the small hours of 2 mornings, I hasten to add!) with what context is available from the Test File versions (which isn’t very much). I encourage others to try redesigning the original table Test Files…and provide corrections to the numbers in these notes. c{:¬P

As the number of rows increases, the improvements I devised had a greater and greater impact. Even with a 2-row Test File, I was able to make significant simplifications to the markup and arrangement of the cells. Growing it to plausible size (10 rows of data) increased the benefits dramatically.

Redesigning a table does require expertise. Then again, so does headers+id. But where headers+id can only paper over the cracks for users of one AT, an expert redesign can help everyone.

Conclusion

All the above considered, I’d call turning bad tables into good tables a plausible and desirable solution.