Untangling Bad Tables (5th September 2008)
When I sent detailed analysis about headers+id, I thought it would put the thread back on track. There have been 10 replies since mine, several of considerable length. Precisely 0 were replies to the facts my analysis provided.
I echo a request made during the telecon to see the real table these tests were derived from. Complete and “in situ”, as they’d say on Time Team. This would enable better analysis about which columns are actually redundant and how the data is best arranged.
Story of the Redesign
In the small hours of 31st August 2008, I realised nobody had put forward an actual redesign of the Test File 3 data table. I spent several hours over the next couple of nights figuring out how it could be untangled. I kept notes of each observation, design change and rationale.
During the next week, I gathered peer review while writing headers Issue “Resolved” and also this blog entry.
Peer Review
Ranging from a quick skim-read to detailed fact-checking:
- Anne van Kesteren
- Henri Sivonen
- Ian Hickson
- James Graham
- Lachlan ‘Lachy’ Hunt
- Shawn Medero
- Simon ‘zcorpan’ Pieters
Lachlan Hunt and James Graham recommended that I include all the columns in some Redesign tables, just in case they had highly varying values. I did this and covered a couple of other likely scenarios.
Comparison of Techniques
The improvements to a table can be studied numerically even before users test it.
Table | Total Cells | Header Cells | Data Cells | Empty Cells | Rows | Size in Bytes |
---|---|---|---|---|---|---|
Test File | | | | | | |
No scope or headers | 50 | 9 | 41 | 0 | 8 | 1,704 |
Only scope | 50 | 14 | 36 | 0 | 8 | 1,872 |
headers+id | 50 | 20 | 30 | 0 | 8 | 2,625 |
Corrected | | | | | | |
<th> | 50 | 20 | 30 | 0 | 8 | 1,659 |
<th scope> & <td scope> | 50 | 20 | 30 | 0 | 8 | 2,020 |
headers+id pointing to <th> and <td> | 50 | 20 | 30 | 0 | 8 | 2,927 |
10 rows, headers+id pointing to <th> and <td> | 202 | 52 | 150 | 0 | 32 | 9,615 |
Redesign | | | | | | |
Redesigned <th> | 52 | 21 | 30 | 1 | 5 | 1,212 |
Section headers replace 2 columns, redesigned <th> | 47 | 20 | 26 | 1 | 6 | 1,139 |
Section headers replace 2 columns, footnotes replace 3 columns, redesigned <th> | 38 | 17 | 20 | 1 | 6 | 905 |
<caption> replaces 2 columns, footnotes replace 3 columns, redesigned <th> | 37 | 16 | 20 | 1 | 5 | 879 |
10 rows, redesigned <th> | 180 | 29 | 150 | 1 | 13 | 3,188 |
10 rows, section headers replace 2 columns, redesigned <th> | 160 | 29 | 130 | 1 | 15 | 2,935 |
10 rows, section headers replace 2 columns, footnotes replace 3 columns, redesigned <th> | 127 | 26 | 100 | 1 | 15 | 2,461 |
10 rows, <caption> replaces 2 columns, footnotes replace 3 columns, redesigned <th> | 125 | 24 | 100 | 1 | 13 | 2,383 |
Footnotes
- Test File tables are from headers+id Testing (Bug 5822).
- Corrected tables are what I consider fairer versions of them.
- Redesign tables have a variety of cell re-arrangements and footnotes.
- Redesign 1 is the nearest to the original and they get further away after that.
- Comparing tables with the same number of investments makes the most sense. All tables have 2 investments unless the link text starts “10 rows”.
Overall, Redesign 3 and Redesign 7 (its 10-row version) are my favourites.
Cell Count
Each investment in Test File 3 has a row header and 3 row subheaders. Corrected 4 extends Test File 3 (with corrections) from 2 investments to 10 investments. The table goes from having 2 row headers and 6 row subheaders to having 10 row headers and an astonishing 30 row subheaders! This demonstrates the terrible effect bad design can have on table complexity.
Redesign 1 and Redesign 2 stay the most faithful to Test File 3. Because the row subheaders were converted into column headers, each investment has 1 row header and 0 row subheaders. Their 10-row versions, Redesign 5 and Redesign 6, go from having 2 row headers and 0 row subheaders to having 10 row headers and 0 row subheaders.
Moving those row subheaders into column headers may have seemed like false economy. But as the amount of data in the table becomes plausible, the improvement is remarkable.
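The header counts in the comparison table follow a simple pattern, which this small Python sketch spells out. The two column-header constants are my own inference from the table's figures (they are not stated in the Test Files), and the function names are hypothetical.

```python
# Header cell counts as a function of the number of investments.
# The constants are inferred from the comparison table above,
# not taken from the Test Files themselves.

ORIGINAL_COLUMN_HEADERS = 12  # assumption: 20 headers - (2 row headers + 6 row subheaders)
REDESIGN_COLUMN_HEADERS = 19  # assumption: 21 headers - 2 row headers


def original_headers(investments: int) -> int:
    """Test File 3 layout: 1 row header and 3 row subheaders per investment."""
    return ORIGINAL_COLUMN_HEADERS + investments * (1 + 3)


def redesign_headers(investments: int) -> int:
    """Redesign 1 layout: 1 row header per investment, no row subheaders."""
    return REDESIGN_COLUMN_HEADERS + investments


for n in (2, 10, 50):
    print(n, original_headers(n), redesign_headers(n))
```

Under these assumptions the redesign has 1 more header cell at 2 investments, fewer from 3 investments onwards, and the gap widens by 3 header cells for every extra investment.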
Byte Count
The tests had an HTML4 DOCTYPE and were delivered as text/html. Removing optional end tags is consistent with that.
Notice that Redesign 7 has 10 investments and still takes fewer bytes than Corrected 3, which has 2 investments but uses headers+id. Corrected 4 is just Corrected 3 with 10 investments. It takes nearly 3 times more markup than Redesign 7.
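For anyone who wants to check these comparisons, here is a minimal Python sketch using the byte sizes from the comparison table. The dictionary keys follow the numbering used in this entry; the helper name is hypothetical.

```python
# Byte sizes copied from the comparison table above.
SIZES = {
    "Corrected 3": 2927,  # 2 investments, headers+id
    "Corrected 4": 9615,  # 10 investments, headers+id
    "Redesign 3": 905,    # 2 investments
    "Redesign 7": 2461,   # 10 investments
}


def size_ratio(larger: str, smaller: str) -> float:
    """How many times the size of `smaller` the table `larger` is."""
    return SIZES[larger] / SIZES[smaller]


print(round(size_ratio("Corrected 4", "Redesign 7"), 2))
print(round(size_ratio("Corrected 3", "Redesign 3"), 2))
```

Corrected 4 comes out at about 3.9 times the size of Redesign 7, which is roughly 2.9 times more markup.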
Test File 3 uses row numbers as part of the values in id and headers. The amount of markup per row increases when the row number reaches double figures. It increases again if the row number reaches 3 digits. This accelerating growth is avoided in Redesigns and they use less markup in the first place.
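The effect of row numbers on markup size can be sketched in a few lines of Python. The id naming scheme here is hypothetical (I have not copied Test File 3's real id values), but any scheme that embeds the row number should grow in the same way.

```python
# Sketch: how a headers attribute grows when ids embed the row number.
# The id scheme ("r<row>-<suffix>") and the suffixes are hypothetical;
# they are not the values used in Test File 3.

SUFFIXES = ("name", "date", "aspect")  # hypothetical id suffixes


def headers_attribute(row: int) -> str:
    """Build the headers attribute for a data cell in the given row."""
    ids = " ".join(f"r{row}-{suffix}" for suffix in SUFFIXES)
    return f'headers="{ids}"'


for row in (9, 10, 100):
    attribute = headers_attribute(row)
    print(row, len(attribute), attribute)
```

Each extra digit in the row number adds 1 byte per referenced id, and that cost is paid again in every data cell of the row.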
Judgement Calls
Each pair of Redesigns suits a different scenario for the original data.
If the Running Costs area is about how the Actual value changes each week, I think the column headers can be flattened to 2 levels and 4 more columns can be removed (the 2 duplicates per investment of Budgeted and Forecasted).
Redundant Columns
Columns where all the values are identical add unhelpful repetition to the table:
- Type
- Status
- Allocation
- ROI
- NPV
Columns are useful for comparing values. If the values are identical, no comparison is required. The shared values could be provided as a footnote below the table, removing 5 whole columns.
If the Type or Status vary, I imagine there would be many shared values. Instead of being in columns, full-width sectional headers could be used to group the table. This is compatible with Smart Headers and is a design pattern on the web and in print. The 5 columns are still removed.
If the other 3 columns (Allocation, ROI and NPV) vary, I imagine they would have many unique values. In this case, it would be best to retain those columns.
Column Headers above Row Headers
The top-left cell is the Child Investment header. The <title> of the page already says the data is about child investment portfolios and the cells below it are header cells. This header can be removed.
After pondering the table for some time, the Properties header cell started to look weirder and weirder. Each cell below it is a sort of row subheader. They seem unrelated to land ownership and real estate, so why “Properties”? Eventually I realised it’s meant in the sense of programmer jargon: each row subheader introduces one aspect of the Running Cost column group.
By removing the Properties header this area starts looking like a standalone table, with column headers and row headers. The row headers in column 1 are only required to tell which investment each aspect is for. But each investment shares the same aspects. These aren’t row subheaders, they’re column headers!
Row Subheaders to Column Headers
Making Budget, Actual and Forecast into column headers beneath each date would create 3 levels of headers. Since column groups cannot be nested in HTML4 and scope="col" isn’t smart about colspan, this cannot be expressed using scope in HTML4. Here’s what WCAG suggests:
- HTML Technique 5.1.2 of WCAG 1: headers+id is required for 2 or more header levels, even though HTML4 scope can cover 2 levels in a regular table.
- H51 for WCAG 2: Plain <th> for a single level of column and row headers.
- H63 for WCAG 2: <td scope> is fine for a similar table.
- H43 for WCAG 2: headers+id is used when multiple header cells are associated with a data cell. The example uses a regular arrangement of cells.
The “Smart Colspan” aspect of the Smart Headers algorithm proposed for HTML5 understands the significance of colspan. As does the algorithm presently in HTML5. They require neither scope nor <colgroup> for this case to work. A fine example of how accessibility features in HTML5 look set to make accessible content genuinely easier to author than HTML4, WCAG 1 or WCAG 2 currently permit.
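To illustrate why colspan alone carries enough information for this case, here is a much-simplified Python sketch. It is not the Smart Headers algorithm, nor the algorithm in the HTML5 draft; it only shows the core idea that a column header applies to every column its colspan covers. The dates are placeholders, the function name is hypothetical, and rowspan, row headers and scope are ignored entirely.

```python
# Simplified sketch: associate column headers with a data column using
# only colspan. This is NOT the HTML5 algorithm; rowspan, row headers and
# the scope attribute are all ignored. The dates are placeholders.

from typing import List, Tuple

HeaderRow = List[Tuple[str, int]]  # (label, colspan) pairs, left to right

HEADER_ROWS: List[HeaderRow] = [
    [("Running Cost", 9)],
    [("1st December", 3), ("8th December", 3), ("15th December", 3)],
    [("Budget", 1), ("Actual", 1), ("Forecast", 1)] * 3,
]


def headers_for_column(rows: List[HeaderRow], column: int) -> List[str]:
    """Return the label from each header row whose span covers the column."""
    found = []
    for row in rows:
        start = 0
        for label, colspan in row:
            if start <= column < start + colspan:
                found.append(label)
                break
            start += colspan
    return found


print(headers_for_column(HEADER_ROWS, 4))
```

Column positions are counted from 0 within the Running Cost area. No id, headers, scope or <colgroup> information is needed to find all 3 levels of header for a cell.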
Header Proliferation?
3 levels of column headers may seem like a terrible proliferation of header cells. But it’s only 1 more header cell than the table had originally, since I removed Child Investment and Properties.
The rowspan="2" for each column header preceding Running Cost is incremented to 3. This does not change markup size. In the data area, 14 rowspan attributes are removed from only 2 rows of data, reducing markup size. This reduction grows with the number of rows.
Test Files use 1 row branching into 3 after several columns. The absence of rowspan in the redesigned data means each investment uses 1 continuous row. This simplifies the markup, making it easier to read and to generate. The structure is also simplified, which I imagine makes it more intuitive to move around in non-visually.
Arrangement Differences in Detail
In the Test Files, the value for an aspect of an investment has 2 rows between it and the same aspect of a different investment. For example, there are 2 rows between the Forecast value of Partner Portal and the Forecast value of Partner Portal 2. You must skip past 2 rows each time, which causes a slight delay. I found doing this for a minute straight really draining, even as a sighted mouse user.
I expect the difficulties would be worse still with a screen magnifier. If the structure of the table were understood, perhaps it wouldn’t be so bad in a screen reader. Then again, figuring out the structure would be that much harder in the first place.
In contrast, flat rows mean 1 type of comparable data down each column. This makes like-for-like comparison between adjacent investments a single movement.
Unhelpful summary
Test File tables provide summary="Child investment portfolios with budgeted, actual and forecast running costs for dates". The <title> of the page already says it’s about “Child investment portfolios”, so this is redundant. The Running Cost column header has dates below it, making clear it’s “for dates”.
The row subheaders are a peculiar arrangement. Warning non-visual users about them may well be useful. Avoiding peculiar arrangements in the first place removes the need for any warning, though.
As such, summary is unnecessary for the Redesign table.
Useful <caption>
The Test Files are given a <title> and <h1> which only make sense for a Test File.
My example adds a <caption> which seems plausible for what the table would use in a genuine web page. By including the year in the <caption>, it becomes redundant in the column headers. I imagine this would cut out a fair bit of repetition in text-to-speech and visibly saves space.
Human Date Format
Test File tables use MM/DD/YYYY format for dates, which is commonplace in the USA. To a Brit like myself, putting the middle unit (months) before the smallest unit (days) and putting the biggest unit (years) after that is an unnatural sequence. To a European, putting the biggest unit at the end may not make sense. If the now redundant /YYYY portion is removed, the remaining MM/DD becomes nonsense to just about everyone.
You could keep the /YYYY portion in the header, with the repetition that goes with it.
Instead of that, I wondered if this was an opportunity to pick a less ambiguous representation of dates in the table. When I noticed how wide the date column headers were, I realised ordinal dates with full month names would fit quite well. So that’s what I used.
Later, I realised they were all in December. But moving “December” to the <caption> leaves behind a simple ordinal number for each cell. This might not look or sound like a date. So I kept the month names to help these cells stay clear and usable for all.
Thoughts
Other approaches to redesigning the table probably exist and may be even more effective. This is just the route I took (during the small hours of 2 mornings, I hasten to add!) with what context is available from the Test File versions (which isn’t very much). I encourage others to try redesigning the original table Test Files…and provide corrections to the numbers in these notes. c{:¬P
As the number of rows increases, the improvements I devised had a greater and greater impact. Even with a 2-row Test File, I was able to make significant simplifications to the markup and arrangement of the cells. Growing it to plausible size (10 rows of data) increased the benefits dramatically.
Redesigning a table does require expertise. Then again, so does headers+id. But where headers+id can only paper over the cracks for users of one AT, an expert redesign can help everyone.
Conclusion
All the above considered, I’d call turning bad tables into good tables a plausible and desirable solution.