Syntax Highlighting with HTML for Diverse Formats

Started on 2008-12-09, made public on 2009-08-03. Updated on 2009-11-30, 2010-09-07, 2011-04-17, 2011-06-13, 2012-04-05, 2012-07-01 and 2014-09-01.

What is significant and universal to the code samples we publish? This article defines just that and provides the HTML and CSS to present it online.

Code and file formats hold data using structures and syntax. Making the structures stand out helps the user navigate the sample. Editing marks by the author draw attention to particular points of interest. Both deserve formatting, but general syntax and the bulk of the data do not.

Mapping Features and Edits to HTML
Features
    Tokens          (Unmarked)      Basic syntax including operators, separators and maths.
    Comment         <i>             Read by coders but usually not processed.
    Magic Comment   <i><i>          Prologs, document type and format boundaries.
    Element         <b>             Main structures of the language or format.
    Magic Element   <b><b>          Labels, includes and compile-time conditions.
    Value           <span>          Literal strings, patterns and numbers.
    Magic Value     <span><span>    Constants and enumerations, such as HTML NCRs.
Edits
    Truncation      &hellip;        Something was removed to save space.
    Important       <strong>        Draw the reader’s attention to this part.
    Error           <del>           Mistake or incorrect usage.
    Changed         <ins>           Differences between samples.
    Varies          <var>           Expected to change by some means.
    Note            title           Short explanation of a specific part.
    Reference       <a href>        Hyperlink to further documentation.

Each keyword of a language usually represents a Feature, such as an Element or Value.
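
The mapping can be sketched as a small lookup table. This is only a minimal illustration in Python, not how the samples on this site are produced; the names MARKUP and mark are hypothetical, and the Truncation, Note and Reference rows are left out because they are not simple wrappers.

```python
from html import escape

# Feature and Edit names from the mapping, each with the element names
# that wrap it. The "Magic" rows list the same element twice (nested).
MARKUP = {
    "Comment": ["i"],
    "Magic Comment": ["i", "i"],
    "Element": ["b"],
    "Magic Element": ["b", "b"],
    "Value": ["span"],
    "Magic Value": ["span", "span"],
    "Important": ["strong"],
    "Error": ["del"],
    "Changed": ["ins"],
    "Varies": ["var"],
}


def mark(kind: str, text: str) -> str:
    """Escape text and wrap it in the element(s) mapped to kind.

    Kinds missing from MARKUP (plain tokens) are escaped but left unmarked.
    """
    html = escape(text, quote=False)
    for tag in reversed(MARKUP.get(kind, [])):
        html = f"<{tag}>{html}</{tag}>"
    return html
```

For instance, mark("Magic Element", "#ifdef PC") nests two <b> elements around the escaped text, while an unlisted kind such as "Tokens" passes through with only the escaping applied.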

Portable Markup

All samples have some of these basic structures. All authors make some edits. The same HTML can be shared by all samples and authors.

Differences of Style

Different languages and IDEs have different styles. One style for all samples is more coherent on a website but may be less recognisable. Use <pre class> to style samples differently:

<pre class="css">…</pre>
<pre class="python">…</pre>

Styling only those things in the table means the regular stuff is framed by the special stuff, with only the special stuff standing out. That is what makes a highlit sample more legible than a plain sample.

Examples of Features

Presentational elements have short names, weak semantics and useful default styling.

Comment: <i>

The distinctive comment syntax of HTML:

the <code><a href="forms.html#the-optgroup-element">optgroup</a></code> element <!--has an ancestor <code>select</code> element and--> is immediately followed by

PHP has 3 ways to make a comment:

echo 'This is a test'; // end-of-line c++ style comment
/* Multi line comment starts up here
   and continues over another line */
echo 'String value';
echo 'One Final Test'; # End-of-line shell-style comment
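
Baking the <i> elements into a sample like this could be automated. The following is a rough sketch in Python under simplifying assumptions: it only knows about quoted strings and the three comment styles, and it ignores heredocs and the way ?> ends a one-line comment. The names TOKEN and italicise_comments are made up for illustration.

```python
import re
from html import escape

# Strings are matched first so that comment markers inside a string
# literal are not mistaken for comments.
TOKEN = re.compile(
    r"""
    (?P<string>'(?:\\.|[^'\\])*'|"(?:\\.|[^"\\])*")
  | (?P<comment>/\*.*?\*/|//[^\n]*|\#[^\n]*)
    """,
    re.DOTALL | re.VERBOSE,
)


def italicise_comments(source: str) -> str:
    """Return source as escaped HTML with each comment wrapped in <i>."""
    out = []
    pos = 0
    for m in TOKEN.finditer(source):
        # Plain code between tokens is escaped and left unmarked.
        out.append(escape(source[pos:m.start()], quote=False))
        text = escape(m.group(), quote=False)
        out.append(f"<i>{text}</i>" if m.lastgroup == "comment" else text)
        pos = m.end()
    out.append(escape(source[pos:], quote=False))
    return "".join(out)
```

The multi-line comment becomes a single <i> element that spans both lines, which suits a <pre> block where line breaks are preserved.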

Magic Comment: <i><i>

A complete HTML doctype switches from Quirks Mode into Standards Mode:


Element: <b>

Selectors in CSS apply styles to matching elements. Type selectors use an element’s name. Selectors are like the elements of CSS:

.tags li, #footer li { display: inline; padding: 0 0.5em; }

Section titles are the elements of .ini configuration files on Windows:

[boot loader]

Control structures are the elements of programming languages like Visual Basic:

If Width > MAX_WIDTH Then Width = MAX_WIDTH

Magic Element: <b><b>

GTA 2 mission scripts can make different instructions compile for PC and Playstation:

#ifdef PC
DECLARE_DOOR_INFO (928, 935, 2) 
DECLARE_DOOR_INFO (232, 239, 2)
#ifdef PSX
DECLARE_DOOR_INFO (366, 367, 2) 

Python scripts can reference other scripts:

from html5lib import treebuilders, serializer, treewalkers

An XML stylesheet processing instruction associates a CSS file with the document:

<?xml-stylesheet href="mystyle.css" media="screen,projection,tv,handheld"?>

Value: <span>

Constants, variables and properties can store values but are not really values themselves. This is more about literal strings and numbers.

Quoted and unquoted values in .htaccess:

AddType 'text/html; charset=utf-8' .html .htm
Redirect permanent "/foo/bar baz.quux"

Literal strings as parameters of function calls in ECMAScript:

this.setAttribute('for', 'status');
filterWord = this.value.toLowerCase().replace('<', '&lt;').replace('>', '&gt;');

Magic Value: <span><span>

Constants are a type of magic value, usually written in uppercase:

SWP_FRAMECHANGED = &H20 'The frame changed: send WM_NCCALCSIZE

The hexadecimal value here would be formatted as a Value, just like strings and ‘magic numbers’.

Examples of Edits

Phrase elements have short names, useful semantics and default styling that can be helpful.

Truncation: &hellip;

Showing the Python syntax for a multi-line string but only showing the start of the value:

long_description="""Multi-line string starts here…"""

Important: <strong>

Highlighting how short the href values in HTML can be:

 <li><a href='/'>Home</a>
 <li><a href='/blog/'>Life of Ben (Blog)</a>

Error: <del>

Ending a PHP string by accident:

$foo = 'Ben's arbitrary string.';

Also suitable for mistakes, such as redundant class and title attributes:

<li><a href="search.php" class="menu" title="Search this site">Search</a></li>

Changed: <ins>

Correcting the HTML sample above:

<li><a href="/search">Search</a></li>

Marking changes to a UA string:

Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; Trident/4.0)

Step-by-step samples can show what has changed between steps.

Varies: <var>

File paths where segments are likely to be different on the reader’s machine:

C:\Documents and Settings\Windows Login Name\Application Data\Mozilla\Firefox\Profiles\profile name\chrome\

HTML samples which intend dummy content to be replaced:

<title>CSS 2.1 Test Suite: Description of test</title>

Note: title

Normally used in tandem with a Feature or Edit to clarify it. Explaining corrections in a sample:

<li><a href="/search.php" class="menu" title="Search this site">Search</a></li>

Notes can help readers match specific parts of the sample with explanations outside of the sample.

Reference: <a href>

Links should be rare and have a visible style, so users can see what is clickable without distraction. Even linking the first instance of each unique keyword would be excessive, especially in short listings.

The first named character reference is linked to the full specification from HTML4. A title is also given, to make this destination clearer:

<p>There&rsquo;s a voice,<br>
 keeps on callin&rsquo; me.<br>
Down the road,<br>
 that&rsquo;s where I&rsquo;ll always be.

Detailed enumerations and experiments for particularly esoteric functions in programming can be an interesting tangent for the curious coder to explore.

Examples of Combinations

Structures can be combined by nesting them as appropriate. HTML is good at nesting.

Important Errors: <strong><del>

If marking an error as important you’d use both <del> and <strong>:

.filtered {
    border: 1px dotted black;

Important parts within an error use the opposite nesting order and can be given a note:

.filtered {
    border: 1px dotted black;
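
The two nesting orders can be sketched in Python. This is an illustration only: the helper name nest is hypothetical, and the choice of which part to stress, along with the note text, is made up rather than taken from the samples above.

```python
from html import escape


def nest(tags, text, title=None):
    """Wrap text in tags, outermost first.

    An optional title (the Note) is placed on the innermost element.
    """
    html = escape(text, quote=False)
    for depth, tag in enumerate(reversed(tags)):
        attr = f' title="{escape(title)}"' if title and depth == 0 else ""
        html = f"<{tag}{attr}>{html}</{tag}>"
    return html


# An important error: <strong> outside, <del> inside.
important_error = nest(["strong", "del"], "border: 1px dotted black;")

# An important part within an error: the opposite order, built in two steps.
# The stressed word and the note text are hypothetical.
part = nest(["strong"], "dotted", title="Hypothetical note")
error_with_part = f"<del>border: 1px {part} black;</del>"
```

The first result stresses the whole error, while the second stresses only one word inside it and attaches the note to that word.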

Formats & Languages

On this website, I provide samples of some widely known technologies:

Some lesser-known technologies, too:

Terminal Output

Using <pre><samp> around the whole thing, with other elements inside, provides a surprising amount of semantic richness. Probably more than is needed, in fact. Either way, semantic elements are just as convenient for styling as their formatting cousins.

Researching the State of the Art

Syntax highlighting on the web is usually done in ways I dislike:

<font color="foo"> / <font color="#rrggbb">
    Not valid HTML and for good reason! Very inefficient and can be a nightmare if you want to change colours.
<span style="color: #rrggbb"> / <span style="color: rgb(rrr,ggg,bbb)">
    Validates but is even less efficient than <font color>!
<span class="foo"> / <span class="foo bar baz">
    Validates and is more maintainable. But this is barely as efficient as <font color>, requires CSS to have any visible effect and has no semantics.
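
The efficiency comparison can be made concrete by counting the characters each approach adds around a single token. This is a back-of-envelope sketch in Python: it uses the placeholder values from the list above, ignores compression, and the names overhead, candidates and costs are only for illustration.

```python
def overhead(open_tag: str, close_tag: str) -> int:
    """Number of characters the markup adds around one highlighted token."""
    return len(open_tag) + len(close_tag)


# One representative start and end tag for each approach, plus <b> from
# the Features mapping for comparison.
candidates = {
    "font color": ('<font color="#rrggbb">', "</font>"),
    "span style": ('<span style="color: #rrggbb">', "</span>"),
    "span class": ('<span class="foo">', "</span>"),
    "b": ("<b>", "</b>"),
}

costs = {name: overhead(*tags) for name, tags in candidates.items()}
# costs == {"font color": 29, "span style": 36, "span class": 25, "b": 7}
```

With a three-letter class name the <span class> form costs 25 characters per token against 29 for <font color>, and a longer name or several classes soon closes that gap. A bare <b> costs 7.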

Client-Side Scripted Highlighting Techniques

The HTML can be added by a script. But this must still be downloaded. Would scripting provide a better user experience than baking the HTML into each page? There are several aspects to consider:

Baking the HTML into each page seems quite favourable:

Did You Mean “Highlighted”?

To me, it feels like ‘highlighted’ is an error and ‘highlit’ is correct. Consider the words ‘floodlit’ and ‘backlit’ for comparison.