Groff: Relocate table of contents

2023-03-14

I recently discovered the program groff, a minimal, simple-to-use text formatter (somewhat like LaTeX).

One quirk of groff is that the table of contents (toc) has to be generated at the end of the document, because groff is a single-pass text formatter: it passes over the file only once. To generate a toc it needs to know which page each heading is on, and it cannot know the page numbers before it has seen the headings.
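To make this concrete, here is a minimal sketch of an ms document in which the toc request comes last. The file name, title and author are made up for illustration; .XS/.XE collect a toc entry and .TC prints the collected entries. groff is only run if it is installed.

```shell
#!/bin/sh
# Write a tiny ms document into a scratch directory (hypothetical content).
dir=$(mktemp -d)
cat > "$dir/example.ms" <<'MS'
.TL
Example Document
.AU
A. Author
.NH 1
Introduction
.XS
Introduction
.XE
.PP
Some body text.
.TC
MS

# .TC is the last request, so the contents page is emitted last.
if command -v groff >/dev/null 2>&1; then
    groff -ms -Tps "$dir/example.ms" > "$dir/example.ps"
fi
echo "wrote $dir/example.ms"
```

Viewing the resulting PostScript file should show the contents page after the body, which is the behaviour the script below works around.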

So I wrote a little script that relocates the toc after running groff:

#!/bin/sh

file="$1"
base="${file%.*}"

if ! grep -q '^\.TC' "$file"; then
    preconv "$file" | refer -PS -e | groff -me -ms -kept -T pdf > "$base".pdf
    exit 0
fi

groff -m ms -k -Tps "$file" > "$base".ps

csplit --prefix=/tmp/TOC "$base".ps /ble\ of\ Contents/-5 # --> TOC00, TOC01
csplit --prefix=/tmp/CONT /tmp/TOC00 /^%%Page:\ 1\ 2/ # --> CONT00, CONT01
csplit --prefix=/tmp/TOC /tmp/TOC01 /^%%Trailer/ # --> TOC00, TOC01

cat /tmp/CONT00 /tmp/TOC00 /tmp/CONT01 /tmp/TOC01 | ps2pdf - "$base".pdf

rm "$base".ps
rm /tmp/TOC*
rm /tmp/CONT*

The first part of the script checks whether the .ms file contains the “.TC” request, which is used to generate the toc. If grep cannot find it, the script simply compiles the pdf using groff and exits right away.

if ! grep -q '^\.TC' "$file"; then
    preconv "$file" | refer -PS -e | groff -me -ms -kept -T pdf > "$base".pdf
    exit 0
fi
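A pattern for this kind of check is easy to get subtly wrong: in grep an unescaped dot matches any character, and -i ignores case, so a loose pattern like '.TC' also matches ordinary words such as "etc". The sketch below compares a loose and an anchored pattern on two throwaway files; has_toc and the file names are hypothetical, and it assumes the .TC request starts at the beginning of a line.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if some line starts with the .TC request.
has_toc() {
    grep -q '^\.TC' "$1"
}

tmpdir=$(mktemp -d)
printf '.NH\nIntro\n.PP\nFonts, margins, etc.\n' > "$tmpdir/no_toc.ms"
printf '.NH\nIntro\n.PP\nText\n.TC\n' > "$tmpdir/with_toc.ms"

# Loose pattern: the dot is a wildcard, so "etc" in the body text matches.
if grep -qi '.TC' "$tmpdir/no_toc.ms"; then loose=match; else loose=nomatch; fi

# Anchored pattern: only a real .TC request line matches.
if has_toc "$tmpdir/no_toc.ms"; then strict_no=match; else strict_no=nomatch; fi
if has_toc "$tmpdir/with_toc.ms"; then strict_yes=match; else strict_yes=nomatch; fi

echo "$loose $strict_no $strict_yes"   # prints: match nomatch match
rm -r "$tmpdir"
```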

If the script did not exit, it continues by compiling a PostScript document using groff, which can then be converted into a pdf later.

groff -m ms -k -Tps "$file" > "$base".ps

The postscript file is then split into several parts.

First it looks for the sequence “ble of Contents”, which is part of the toc heading, and splits the file 5 lines above the matching line.

csplit --prefix=/tmp/TOC "$base".ps /ble\ of\ Contents/-5 # --> TOC00, TOC01

Then the first file resulting from the split is split again just above the line %%Page: 1 2, which is where the title page ends.

csplit --prefix=/tmp/CONT /tmp/TOC00 /^%%Page:\ 1\ 2/ # --> CONT00, CONT01

Finally, the second file from the first split is split at the line %%Trailer, which marks the start of the trailer of the PostScript file.

csplit --prefix=/tmp/TOC /tmp/TOC01 /^%%Trailer/ # --> TOC00, TOC01

These pieces are then concatenated in their new order and converted into a pdf file.

cat /tmp/CONT00 /tmp/TOC00 /tmp/CONT01 /tmp/TOC01 | ps2pdf - "$base".pdf
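The three splits and the final concatenation can be tried without groff on a mock file. This is only a sketch: the file below is a heavily simplified stand-in for groff's PostScript output, not real output, and the prefixes head_, cont_ and toc_ are made-up names chosen so that no split overwrites its own input. -f is the short form of --prefix and -s suppresses the byte counts csplit normally prints.

```shell
#!/bin/sh
work=$(mktemp -d)

# Mock PostScript: title page, one body page, then the toc page at the end.
cat > "$work/mock.ps" <<'MOCK'
%!PS-Adobe-3.0
%%EndProlog
%%Page: 1 1
(title page) show
%%Page: 1 2
(body text) show
%%Page: i 3
%%BeginPageSetup
BP
%%EndPageSetup
/F0 SF
(Ta)(ble of Contents) show
(1. Intro ..... 1) show
%%Trailer
end
%%EOF
MOCK

# 1. Cut 5 lines above the toc heading: head_00 = everything before the toc page.
csplit -s -f "$work/head_" "$work/mock.ps" '/ble of Contents/-5'
# 2. Cut where the title page ends: cont_00 = prolog + title, cont_01 = body.
csplit -s -f "$work/cont_" "$work/head_00" '/^%%Page: 1 2/'
# 3. Cut at the trailer: toc_00 = toc page, toc_01 = trailer.
csplit -s -f "$work/toc_" "$work/head_01" '/^%%Trailer/'

# Reassemble with the toc page right after the title page.
cat "$work/cont_00" "$work/toc_00" "$work/cont_01" "$work/toc_01" > "$work/reordered.ps"

order=$(grep '^%%Page' "$work/reordered.ps")
echo "$order"
rm -r "$work"
```

In the reordered mock file the page labelled i now sits between the title page and the body page, and the trailer is still at the very end.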

The input has to meet some requirements, otherwise the relocation won’t work properly (yet): the string “ble of Contents” must not appear in the file before the actual table of contents, and the title page (or abstract) must not take up more than one page.
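These requirements could be checked before splitting. The following is a rough sketch, not part of the script: preflight is a hypothetical helper, the two mock files are simplified stand-ins for groff output, and the heuristics (exactly one line containing “ble of Contents”, and a line starting with %%Page: 1 2) are assumptions derived from the patterns the script uses.

```shell
#!/bin/sh
# Hypothetical pre-flight check for the PostScript file passed as $1.
preflight() {
    ps="$1"
    hits=$(grep -c 'ble of Contents' "$ps")
    if [ "$hits" -ne 1 ]; then
        echo "expected exactly one 'ble of Contents' line, found $hits" >&2
        return 1
    fi
    # The title page must be a single page, so page label 1 is the 2nd page.
    if ! grep -q '^%%Page: 1 2' "$ps"; then
        echo "no '%%Page: 1 2' line found" >&2
        return 1
    fi
    return 0
}

dir=$(mktemp -d)
# Mock file that satisfies both requirements.
printf '%s\n' '%%Page: 1 1' '(title)' '%%Page: 1 2' '(body)' \
    '%%Page: i 3' '(Ta)(ble of Contents)' '%%Trailer' > "$dir/good.ps"
# Mock file whose body text mentions the toc heading as well.
printf '%s\n' '%%Page: 1 1' '(title)' '%%Page: 1 2' '(see the Table of Contents)' \
    '%%Page: i 3' '(Ta)(ble of Contents)' '%%Trailer' > "$dir/bad.ps"

if preflight "$dir/good.ps"; then good=ok; else good=fail; fi
if preflight "$dir/bad.ps" 2>/dev/null; then bad=ok; else bad=fail; fi
echo "$good $bad"   # prints: ok fail
rm -r "$dir"
```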

You’ll find the script on my GitLab and you are free to suggest any changes or improvements.