Creating a Table of Contents Generator in PHP

Recently I had to create a “Table of Contents” (aka “Contents”) generator for Brugbart. I was inspired by several other websites that provide similar functionality through JavaScript. I just don’t like JavaScript, mainly because I find it difficult to get it to work the same across browsers without first checking whether the browser supports the functions.

Anyway, I decided it was about time I included a Contents generator, to make it easier for users to navigate our Articles, Tutorials and References. I first tried to use the PHP DOM extension, but I ran into two problems: I had no idea how to work with the DOM, and I later found that my host apparently didn’t support the DOM functions.

That was when I decided to return to the familiar territory of regular expressions.

A “Contents” list can be a list of subsections on a page, or links to other pages. But it’s usually a list of links to sections on the same page. These sections are marked up with headings, each with a unique id, which can be used when linking to them.

An example can be seen below.
<a href="http://example.com/Links.html#Section1">Section 1</a>

Note the “#Section1” part of the URL. It refers to the id of the corresponding heading, as in the example below.
<h2 id="Section1">How to Create Links in HTML</h2>
Now, what we want is to find the headings, take the text of each, and use that as the anchor text of the links in our “Contents” list. I wrote a function in PHP which does exactly that, and I even found a CSS-based solution to the nested list bug, which I will provide after explaining my function, shown below.
function TableOfContents($html) {
  // Find every heading that contains plain text only (no nested tags).
  preg_match_all("/(<h([1-6]{1})[^<>]*>)([^<>]+)(<\/h[1-6]{1}>)/", $html, $matches, PREG_SET_ORDER);

  if (empty($matches)) {
    return $html; // No headings found, so there is nothing to list.
  }

  $LI = 0; // List Item Count
  $HL = 2; // Heading Level
  $List = array();
  $Sections = array();
  $SectionWIDs = array();
  foreach ($matches as $val) {
    ++$LI;
    $Level = (int) $val[2];
    $Link  = '<a href="#Sec' . $LI . '">' . $val[3] . '</a>';

    if ($Level == $HL) { // If the heading level didn't change.
      $List[$LI] = '<li>' . $Link . '</li>';
    } else if ($Level > $HL) { // If bigger than last heading level, open a nested list per level gained.
      $List[$LI] = str_repeat('<li><ul>', $Level - $HL) . '<li>' . $Link . '</li>';
    } else { // If less than last heading level, end a nested list per level dropped.
      $List[$LI] = str_repeat('</ul></li>', $HL - $Level) . '<li>' . $Link . '</li>';
    }

    $Sections[$LI]    = $val[0]; // Original heading to be replaced.
    $SectionWIDs[$LI] = '<h' . $Level . ' id="Sec' . $LI . '">' . $val[3] . $val[4]; // This is the new heading.

    $HL = $Level;
  }
  switch ($HL) { // Final markup fix, used if the list ended on a subheading, such as h3, h4, etc.
    case 3:
      $List[$LI] .= '</ul></li>';
      break;
    case 4:
      $List[$LI] .= '</ul></li></ul></li>';
      break;
    case 5:
      $List[$LI] .= '</ul></li></ul></li></ul></li>';
      break;
    case 6:
      $List[$LI] .= '</ul></li></ul></li></ul></li></ul></li>';
      break;
  }
  $Settu = implode('', $List); // Puts together the list.

  return '<div id="TOC"><p>Contents:</p><ul>' . $Settu . '</ul></div>' . str_replace($Sections, $SectionWIDs, $html); // Returns the content
}

One of the first problems you will face when doing something like this is how to find all the headings on a page without knowing beforehand how many there are. This is where I decided to use “preg_match_all”. This function finds every part of the page that matches the given regular expression and puts the matches in an array.
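As a rough illustration of that step, the sketch below runs preg_match_all over a short, made-up piece of content. The $html sample and the simplified pattern are assumptions for this example only, not the exact code used on the site.

```php
<?php
// Hypothetical sample content, used only for this illustration.
$html = '<h2>Intro</h2><p>Text</p><h3>Details</h3><p>More</p><h2>Summary</h2>';

// PREG_SET_ORDER groups the results per match, so each $match
// holds one complete heading together with its captured parts.
$count = preg_match_all('/<h([1-6])[^<>]*>([^<>]+)<\/h[1-6]>/', $html, $matches, PREG_SET_ORDER);

$lines = array();
foreach ($matches as $match) {
    // $match[1] is the heading level, $match[2] is the heading text.
    $lines[] = 'h' . $match[1] . ': ' . $match[2];
}

echo $count . " headings found\n";
echo implode("\n", $lines) . "\n";
```

The return value of preg_match_all is the number of full matches, which is handy when you only want to know whether a page has enough headings to justify a list at all.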

The regular expression I used is “/(<h([1-6]{1})[^<>]*>)([^<>]+)(<\/h[1-6]{1}>)/”. Each pair of parentheses is a capture group, which basically remembers whatever was matched within it. The slash in the end tag is escaped because “/” is also the pattern delimiter.

The first group, “(<h([1-6]{1})[^<>]*>)”, matches the entire start tag of the heading. The second, “([1-6]{1})”, sits inside the first and matches the heading level: a single digit from 1 to 6.

The third group, “([^<>]+)”, the one in the middle, matches the text of the heading. It allows all characters except “<” and “>”, which have special meaning in HTML. This makes sure that we don’t accidentally match the end tag. It also means that a heading containing other tags, such as a link, will not be matched.

Finally, the fourth group, “(<\/h[1-6]{1}>)”, matches the end tag.
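To make the four groups concrete, here is a small sketch that applies the pattern to a single heading with preg_match. The sample heading and its class attribute are made up for the example, and the slash in the end tag is escaped because “/” is the delimiter.

```php
<?php
$pattern = '/(<h([1-6]{1})[^<>]*>)([^<>]+)(<\/h[1-6]{1}>)/';

// Hypothetical heading, used only to show what each group captures.
$heading = '<h2 class="title">How to Create Links in HTML</h2>';

preg_match($pattern, $heading, $m);

// $m[0] is the whole match, $m[1] to $m[4] are the four groups.
foreach ($m as $index => $value) {
    echo $index . ' => ' . $value . "\n";
}
```

Groups are numbered by the position of their opening parenthesis, which is why the heading level ends up as group 2 even though it is nested inside group 1.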

The variables below are used by the loop.
$LI = 0; // List Item Count
$HL = 2; // Heading Level
The $HL variable sets the starting heading level. I used h2, since my own site only uses h2 and lower-level headings (h3, h4, and so on) in its content. You will need to set this to 1 if you have multiple h1s that you would like listed, even though it’s usually a bad idea to have more than one h1. If your design requires it, you should also change the switch to account for the extra heading level.
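If you would rather not edit the switch every time the starting level changes, the closing markup can be computed instead. The sketch below is one possible way to do that. closeOpenLists is a hypothetical helper that is not part of the function above, and it assumes one nested list was opened for every level below the starting level.

```php
<?php
// Hypothetical helper: returns the markup needed to close every nested
// list that is still open after the last heading has been handled.
function closeOpenLists(int $lastLevel, int $baseLevel = 2): string
{
    // One "</ul></li>" pair per level below the starting level.
    $depth = max(0, $lastLevel - $baseLevel);

    return str_repeat('</ul></li>', $depth);
}

echo closeOpenLists(4) . "\n";    // Same markup as "case 4" in the switch.
echo closeOpenLists(3, 1) . "\n"; // Starting at h1, an h3 is two levels deep.
```

With a starting level of 2 this gives the same strings as the four cases in the switch, so it could stand in for them if you prefer.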

The “foreach ($matches as $val) {” loop takes each match and puts it into “$val”, which we then use in the if statement. We need this if statement for our nested lists, to make sure that each element is properly opened and closed. Read the comments on the if statement for further information.

You may already know about the “space above” bug that occurs with nested lists. The easiest solution is to apply display: inline; to the nested list itself, as in the example below.
<!doctype html>
<html lang="en">

<head>
<title>The solution to The Nested List Bug!</title>
<style type="text/css">
ul {
list-style-type: none;
margin: 0; padding: 0;
}

ul ul {
display: inline;
}
ul ul li {
margin-left: 0.5em;
}
</style>
</head>

<body>
<ul>
<li>List Item</li>
<li>List Item</li>
<li>List Item</li>
<li>
<ul>
<li>Sub Item</li>
<li>Sub Item</li>
<li>Sub Item</li>
</ul>
</li>
<li>List Item</li>
<li>List Item</li>
</ul>
</body>

</html>

The CSS we need for this list is included below. You can change it as you like, so that it fits into your own design.

#TOC { /* Table Of Content */
float: right; /* Makes the content text wrap nicely around. */
padding: 0 2em 1em 2em !important;
}
#TOC p {
margin: 0 !important;padding: 0 !important;
}
#TOC ul {
list-style-type: none;
margin: 0; padding: 0;
}

#TOC ul ul {
display: inline;
}
#TOC ul ul li {
margin-left: 0.5em;
}

All we need to do now is call the function on a variable containing the content, presumably fetched from our database.

$Content = TableOfContents($Content);
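One thing to watch for: str_replace swaps every occurrence of a search string, so two headings with identical markup would end up sharing one id. If that matters for your content, a possible alternative is to number the headings in a single pass with preg_replace_callback. The sketch below assumes the same “Sec” id scheme. addHeadingIds is a hypothetical name, and like the function above it discards any attributes the heading already had.

```php
<?php
// Hypothetical helper: gives every plain-text heading its own "SecN" id,
// numbering the headings in the order they appear in the content.
function addHeadingIds(string $html): string
{
    $n = 0; // Running heading count, shared with the callback by reference.

    return preg_replace_callback(
        '/<h([1-6])[^<>]*>([^<>]+)<\/h[1-6]>/',
        function (array $m) use (&$n): string {
            ++$n;
            // $m[1] is the heading level, $m[2] is the heading text.
            return '<h' . $m[1] . ' id="Sec' . $n . '">' . $m[2] . '</h' . $m[1] . '>';
        },
        $html
    );
}

// Two identical headings still receive different ids.
echo addHeadingIds('<h2>Notes</h2><p>a</p><h2>Notes</h2>') . "\n";
```

Because the callback runs once per match, the counter lines up with the order in which the headings appear, which is the same order the list items are built in.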

Jacob Kristensen (aka BlueBoden) is the developer and CEO of Brugbart Webdesign.
