Syntax Highlighting with HTML for Diverse Formats
What is significant and universal to the code samples we publish? This article defines just that and provides the HTML and CSS to present it online.
Code and file formats contain data by using structures and syntax. Making the structures stand out helps the user navigate around the sample. Editing marks by the author draw attention to particular points of interest. Both deserve formatting, but general syntax and the bulk of the data do not.
Group | Type | HTML | Description |
---|---|---|---|
Features | Tokens | (Unmarked) | Basic syntax including operators, separators and maths. |
Features | Comment | <i> | Read by coders but usually not processed. |
Features | Magic Comment | <i><i> | Prologs, document type and format boundaries. |
Features | Element | <b> | Main structures of the language or format. |
Features | Magic Element | <b><b> | Labels, includes and compile-time conditions. |
Features | Value | <span> | Literal strings, patterns and numbers. |
Features | Magic Value | <span><span> | Constants and enumerations, such as HTML NCRs. |
Edits | Truncation | … | Something was removed to save space. |
Edits | Important | <strong> | Draw the reader’s attention to this part. |
Edits | Error | <del> | Mistake or incorrect usage. |
Edits | Changed | <ins> | Differences between samples. |
Edits | Varies | <var> | Expected to change by some means. |
Edits | Note | title | Short explanation of a specific part. |
Edits | Reference | <a href> | Hyperlink to further documentation. |
Each keyword of a language usually represents a Feature, such as an Element or Value.
Portable Markup
All samples have some of these basic structures. All authors make some edits. The same HTML can be shared by all samples and authors.
Differences of Style
Different languages and IDEs have different styles. One style for all samples is more coherent on a website but may be less recognisable. Use <pre class> to style samples differently:
<pre class="css">…</pre>
<pre class="python">…</pre>
Styling only those things in the table means the regular stuff is framed by the special stuff, with only the special stuff standing out. That is what makes a highlit sample more legible than a plain sample.
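As a rough sketch of how that could work, the stylesheet below gives Elements a different colour in each kind of sample while leaving unmarked tokens alone. The colours and the sample contents are assumptions for illustration, not taken from this site's stylesheet.

```html
<style>
  /* Shared defaults: only marked-up structures get any styling. */
  pre b { font-weight: bold; }
  pre i { font-style: italic; color: #666; }   /* hypothetical colour */

  /* Per-language overrides, keyed on the class of the <pre>. */
  pre.css b    { color: #036; }   /* hypothetical colour for CSS selectors */
  pre.python b { color: #630; }   /* hypothetical colour for Python keywords */
</style>

<pre class="css"><b>#footer li</b> { display: inline; }</pre>
<pre class="python"><b>if</b> width &gt; max_width: width = max_width</pre>
```

Everything outside a b or i element inherits the plain pre style, so the bulk of each sample stays unstyled.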
Examples of Features
Presentational elements have short names, weak semantics and useful default styling.
Comment: <i>
The distinctive comment syntax of HTML:
the <code><a href="forms.html#the-optgroup-element">optgroup</a></code> element <!--has an ancestor <code>select</code> element and--> is immediately followed by
PHP has 3 ways to make a comment:
echo 'This is a test'; // end-of-line c++ style comment
/* Multi line comment starts up here
and continues over another line */
echo 'String value';
echo 'One Final Test'; # End-of-line shell-style comment
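In the page source, each comment in a sample like this would be wrapped in <i> inside the <pre>. The following is a sketch of that markup; the class name php is an assumption that follows the <pre class> pattern, and the strings are left unmarked to keep the focus on comments.

```html
<pre class="php">echo 'This is a test'; <i>// end-of-line c++ style comment</i>
<i>/* Multi line comment starts up here
and continues over another line */</i>
echo 'String value';
echo 'One Final Test'; <i># End-of-line shell-style comment</i></pre>
```

A single <i> can span several lines, so the multi-line comment needs only one pair of tags.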
Magic Comment: <i><i>
A complete HTML doctype switches from Quirks Mode into Standards Mode:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
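A sketch of how the doctype might be marked up as a Magic Comment: one <i> nested in another, with the angle brackets escaped so the browser displays them instead of parsing them. The class name and the styling rule are assumptions.

```html
<style>
  /* An <i> nested inside another <i> marks a Magic Comment. */
  pre i i { font-weight: bold; }   /* hypothetical styling */
</style>

<pre class="html"><i><i>&lt;!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd"&gt;</i></i></pre>
```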
Element: <b>
Selectors in CSS apply styles to matching elements. Type selectors use an element’s name. Selectors are like the elements of CSS:
.tags li, #footer li { display: inline; padding: 0 0.5em; }
Section titles are the elements of .ini configuration files on Windows:
[boot loader]
timeout=30
Control structures are the elements of programming languages like Visual Basic:
If Width > MAX_WIDTH Then Width = MAX_WIDTH
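The three samples above could be marked up along these lines. Exactly which tokens count as the Element is a judgement call, so treat the placement of <b> here, and the class names, as assumptions.

```html
<pre class="css"><b>.tags li</b>, <b>#footer li</b> { display: inline; padding: 0 0.5em; }</pre>

<pre class="ini"><b>[boot loader]</b>
timeout=30</pre>

<pre class="vb"><b>If</b> Width &gt; MAX_WIDTH <b>Then</b> Width = MAX_WIDTH</pre>
```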
Magic Element: <b><b>
GTA 2 mission scripts can make different instructions compile for PC and PlayStation:
#ifdef PC
DECLARE_DOOR_INFO (928, 935, 2)
DECLARE_DOOR_INFO (232, 239, 2)
#endif
#ifdef PSX
DECLARE_DOOR_INFO (366, 367, 2)
#endif
Python scripts can reference other scripts:
from html5lib import treebuilders, serializer, treewalkers
An XML stylesheet processing instruction associates a CSS file with the document:
<?xml-stylesheet href="mystyle.css" media="screen,projection,tv,handheld"?>
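A possible markup for two of these samples, doubling up <b> for the compile-time condition and the import keywords. The class names and the choice of which tokens to wrap are assumptions.

```html
<pre class="gta2"><b><b>#ifdef PC</b></b>
DECLARE_DOOR_INFO (928, 935, 2)
DECLARE_DOOR_INFO (232, 239, 2)
<b><b>#endif</b></b></pre>

<pre class="python"><b><b>from</b></b> html5lib <b><b>import</b></b> treebuilders, serializer, treewalkers</pre>
```

A stylesheet can then target the nested form with a descendant selector such as pre b b.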
Value: <span>
Constants, variables and properties can store values but are not really values themselves. This is more about literal strings and numbers.
Quoted and unquoted values in .htaccess:
AddType 'text/html; charset=utf-8' .html .htm
Redirect permanent "/foo/bar baz.quux" http://example.org/foo/bar-baz.quux
Literal strings as parameters of function calls in ECMAScript:
this.setAttribute('for', 'status');
filterWord = this.value.toLowerCase().replace('<', '&lt;').replace('>', '&gt;');
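A sketch of literal values wrapped in <span>, using the first line of each sample. The class names are assumptions; quoted strings include their quotes inside the span, which is one reasonable convention among several.

```html
<pre class="htaccess">AddType <span>'text/html; charset=utf-8'</span> <span>.html</span> <span>.htm</span></pre>

<pre class="ecmascript">this.setAttribute(<span>'for'</span>, <span>'status'</span>);</pre>
```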
Magic Value: <span><span>
Constants are a type of magic value, usually written in uppercase:
SWP_FRAMECHANGED = &H20 'The frame changed: send WM_NCCALCSIZE
The hexadecimal value here would be formatted as a Value, just like strings and ‘magic numbers’.
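One way the constant declaration could be marked up, combining a Magic Value, a Value and a Comment. Note that the ampersand of the Visual Basic hexadecimal literal has to be escaped as &amp;amp; in the HTML source. The class name is an assumption.

```html
<pre class="vb"><span><span>SWP_FRAMECHANGED</span></span> = <span>&amp;H20</span> <i>'The frame changed: send WM_NCCALCSIZE</i></pre>
```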
Examples of Edits
Phrase elements have short names and useful semantics, and their default styling can be helpful.
Truncation: …
Showing the Python syntax for a multi-line string but only showing the start of the value:
long_description="""Multi-line string starts here…"""
Important: <strong>
Highlighting how short the href values in HTML can be:
<ul>
<li><a href='/'>Home</a>
<li><a href='/blog/'>Life of Ben (Blog)</a>
…
</ul>
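In the source of such a sample, <strong> would sit around just the attribute values being pointed out. A sketch, assuming the quotes are included in the emphasis:

```html
<pre class="html">&lt;ul&gt;
&lt;li&gt;&lt;a href=<strong>'/'</strong>&gt;Home&lt;/a&gt;
&lt;li&gt;&lt;a href=<strong>'/blog/'</strong>&gt;Life of Ben (Blog)&lt;/a&gt;
…
&lt;/ul&gt;</pre>
```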
Error: <del>
Ending a PHP string by accident:
$foo = 'Ben's arbitrary string.';
Also suitable for mistakes, such as redundant class and title attributes:
<li><a href="search.php" class="menu" title="Search this site">Search</a></li>
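A sketch of both errors marked with <del>: the stray apostrophe in the PHP string and the redundant attributes in the HTML. The exact extent of each <del> is an assumption.

```html
<pre class="php">$foo = 'Ben<del>'</del>s arbitrary string.';</pre>

<pre class="html">&lt;li&gt;&lt;a href="search.php"<del> class="menu" title="Search this site"</del>&gt;Search&lt;/a&gt;&lt;/li&gt;</pre>
```

Browsers strike through <del> by default, so the mistake is visible even without a stylesheet.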
Changed: <ins>
Correcting the HTML sample above:
<li><a href="/search">Search</a></li>
Marking changes to a UA string:
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; Trident/4.0)
Step-by-step samples can show what has changed between steps.
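For the corrected HTML sample, the changed part could be wrapped in <ins> like this. Treating only the new href value as the change is an assumption; an author might equally mark the whole attribute.

```html
<pre class="html">&lt;li&gt;&lt;a href="<ins>/search</ins>"&gt;Search&lt;/a&gt;&lt;/li&gt;</pre>
```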
Varies: <var>
File paths where segments are likely to be different on the reader’s machine:
C:\Documents and Settings\Windows Login Name\Application Data\Mozilla\Firefox\Profiles\profile name\chrome\
HTML samples which intend dummy content to be replaced:
<title>CSS 2.1 Test Suite: Description of test</title>
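Both samples could be marked up by placing <var> around only the segments the reader is expected to replace. A sketch, with assumed class names:

```html
<pre class="path">C:\Documents and Settings\<var>Windows Login Name</var>\Application Data\Mozilla\Firefox\Profiles\<var>profile name</var>\chrome\</pre>

<pre class="html">&lt;title&gt;CSS 2.1 Test Suite: <var>Description of test</var>&lt;/title&gt;</pre>
```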
Note: title
Normally used in tandem with a Feature or Edit to clarify it. Explaining corrections in a sample:
<li><a href="/search.php" class="menu" title="Search this site">Search</a></li>
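A sketch of notes attached to edits through the title attribute. The note texts below are hypothetical, written only to show where a title would go; they are not the author's own explanations.

```html
<pre class="html">&lt;li&gt;&lt;a href="<ins title="Hypothetical note: root-relative, so the link works from any directory">/</ins>search.php"<del title="Hypothetical note: the class is redundant"> class="menu"</del><del title="Hypothetical note: this repeats the link text"> title="Search this site"</del>&gt;Search&lt;/a&gt;&lt;/li&gt;</pre>
```

Most desktop browsers show the title as a tooltip when the pointer hovers over the marked part.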
Notes can help readers match specific parts of the sample with explanations outside of the sample.
Reference: <a href>
Links should be rare and have a visible style, so users can see what is clickable without distraction. Even linking the first instance of each unique keyword would be excessive, especially in short listings.
The first named character reference is linked to the full specification from HTML4. A title is also given, to make this destination clearer:
<p>There&rsquo;s a voice,<br>
keeps on callin&rsquo; me.<br>
Down the road,<br>
that&rsquo;s where I&rsquo;ll always be.
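A sketch of how only the first character reference might be linked. The link target shown is the character entity references page of the HTML 4.01 specification, which is an assumption about where the original link pointed; the class name is also assumed.

```html
<pre class="html">&lt;p&gt;There<a href="http://www.w3.org/TR/html4/sgml/entities.html" title="Character entity references in HTML 4">&amp;rsquo;</a>s a voice,&lt;br&gt;
keeps on callin&amp;rsquo; me.&lt;br&gt;
Down the road,&lt;br&gt;
that&amp;rsquo;s where I&amp;rsquo;ll always be.</pre>
```

The later references stay unlinked, which keeps the listing free of repeated links.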
Detailed enumerations and experiments for particularly esoteric functions in programming can be an interesting tangent for the curious coder to explore.
Examples of Combinations
Structures can be combined by nesting them as appropriate. HTML is good at nesting.
Important Errors: <strong><del>
If marking an error as important you’d use both <del> and <strong>:
.filtered {
filter:progid:DXImageTransform.Microsoft.Alpha(opacity=50;
border: 1px dotted black;
Important parts within an error use the opposite nesting order and can be given a note:
.filtered {
filter:progid:DXImageTransform.Microsoft.Alpha(opacity=50;
border: 1px dotted black;
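The two nesting orders might look like this in the source. In the first, the whole error is important; in the second, only one part of the error is important and carries a note. Which character to emphasise and the note text are assumptions for illustration.

```html
<!-- An important error: <strong> outside, <del> inside. -->
<pre class="css">.filtered {
<strong><del>filter:progid:DXImageTransform.Microsoft.Alpha(opacity=50;</del></strong>
border: 1px dotted black;</pre>

<!-- An important part within an error: <del> outside, <strong> inside. -->
<pre class="css">.filtered {
<del>filter:progid:DXImageTransform.Microsoft.Alpha(opacity=50<strong title="Hypothetical note: the closing parenthesis is missing">;</strong></del>
border: 1px dotted black;</pre>
```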
Formats & Languages
On this website, I provide samples of some widely known technologies:
- .htaccess to configure mod_gzip in Apache 1.3
- BBcode
- CSS for Horizontal Lists
- HTML for Horizontal Lists
- ECMAScript for Krijn Hoetmer’s IRC Logs
- PHP I wrote in 2007
- Regular Expressions in a .htaccess file
- robots.txt to control Google Image Search
- Visual Basic 6 I wrote in 2007 (Visual Basic 6 I wrote in 2010)
- W3C Document Type Definitions for ‘spec lawyering’ the scope attribute
Some lesser-known technologies, too:
Terminal Output
- Obscure as this type of content is, the <samp> element exists specifically to address it.
- Each line of terminal output usually contains one action or item, so whitespace is significant. The <pre> element matches that.
- Terminals usually show what you typed in, so <kbd> can be used around those bits.
- Filenames, paths, URLs and so forth can use <code>.
Using <pre><samp> around the whole thing, with other elements inside, provides a surprising amount of semantic richness. Probably more than is needed, in fact. Either way, semantic elements are just as convenient for styling as their formatting cousins.
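A minimal sketch of that structure, using a made-up directory listing. The prompt, command and filenames are hypothetical and only show where each element would sit.

```html
<pre><samp>$ <kbd>ls <code>samples/</code></kbd>
<code>horizontal-lists.css</code>
<code>horizontal-lists.html</code></samp></pre>
```

Here <samp> wraps all of the output, <kbd> wraps what was typed at the prompt, and <code> wraps each path or filename.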
Researching the State of the Art
Syntax highlighting on the web is usually done in ways I dislike:
<font color="foo"> / <font color="#rrggbb">
- Not valid HTML and for good reason! Very inefficient and can be a nightmare if you want to change colours.
<span style="color: #rrggbb"> / <span style="color: rgb(rrr,ggg,bbb)">
- Validates but is even less efficient than <font color>!
<span class="foo"> / <span class="foo bar baz">
- Validates and is more maintainable. But this is barely as efficient as <font color>, requires CSS to have any visible effect and has no semantics.
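To make the comparison concrete, here is the same comment marked up in each of the three ways above, followed by the short-element approach this article proposes. The colour value and the class name are placeholders chosen for illustration.

```html
<!-- Presentational attribute: the colour is repeated on every token. -->
<font color="#666666">// comment</font>

<!-- Inline style: valid, but longer still and just as hard to restyle. -->
<span style="color: #666666">// comment</span>

<!-- Class: restyled in one place, but invisible without CSS. -->
<span class="comment">// comment</span>

<!-- Short element: restyled in one place, italic by default without CSS. -->
<i>// comment</i>
```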
Client-Side Scripted Highlighting Techniques
The HTML can be added by a script. But this must still be downloaded. Would scripting provide a better user experience than baking the HTML into each page? There are several aspects to consider:
- the HTTP delay upon the user’s first page view;
- the delay upon subsequent page views;
- any delays between content being rendered and the highlighting being applied;
- any mistakes in what the script does (can be hard to fix, especially if the author didn’t write the script);
- and the user’s sensitivity to the above.
Baking the HTML into each page seems quite favourable:
- No HTTP delay upon the user’s first page view.
- No delay upon subsequent page views.
- No delays between content being rendered and the highlighting being applied.
- Any mistakes of the author can be fixed by the author.
- The user has no delays to be sensitive about and mistakes can be fixed if reported.
Did You Mean “Highlighted”?
To me, it feels like ‘highlighted’ is an error and ‘highlit’ is correct. Consider the words ‘floodlit’ and ‘backlit’ for comparison.