Petter Reinholdtsen

2013-03-24-docbook-shortstory
24th March 2013

A few days ago, during a discussion in EFN about interesting books to read about copyright and the data retention directive, a suggestion came up to read the 1968 short story Kodémus by Tore Åge Bringsværd. The text is only available in old paper books, and is thus hard to get hold of for current and future generations. Some of the people participating in the discussion contacted the author, and reported back on 2013-03-19 that he was OK with releasing the short story under a Creative Commons license (CC-BY-NC-ND). The text was quickly scanned and OCR-ed, and we were ready to start on the editing and typesetting.

As I already had some experience formatting text in my project to provide a Norwegian version of the Free Culture book by Lawrence Lessig, I chipped in and set up a DocBook processing framework to generate PDF, HTML and EPUB versions of the short story. The tools to transform DocBook to different formats are already in my Linux distribution of choice, Debian, so all I had to do was to use the dblatex, dbtoepub and xmlto tools to do the conversion. After a few days, we decided to replace dblatex with xsltproc/fop (aka docbook-xsl), to get the copyright information to show up in the PDF and to get nicer <variablelist> typesetting.

There were a few challenges, of course. We want to typeset the short story to look like the original, and that requires fairly good control over the layout. The original short story has three parts/scenes separated by a single horizontally centred star (*), and the paragraphs do not contain only flowing text, but also dialogue and text that starts on a new line in the middle of the paragraph.

I initially solved the first challenge by using a paragraph with a single star in it, i.e. <para>*</para>. This did not look too good without the centring. The next approach was to create a new processing instruction <?newscene?>, mapping to "<hr/>" for HTML and "<fo:block text-align="center"><fo:leader leader-pattern="rule" rule-thickness="0.5pt"/></fo:block>" for FO/PDF output (I did not try to implement this in dblatex, as we had switched by this time). The HTML XSL file looked like this:

<?xml version='1.0'?> 
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version='1.0'>
  <xsl:template match="processing-instruction('newscene')">
    <hr/>
  </xsl:template>
</xsl:stylesheet> 

And the FO/PDF XSL file looked like this:

<?xml version='1.0'?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version='1.0'
  xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <!-- The fo: prefix must be declared for the stylesheet to be well-formed -->
  <xsl:template match="processing-instruction('newscene')">
    <fo:block text-align="center">
      <fo:leader leader-pattern="rule" rule-thickness="0.5pt"/>
    </fo:block>
  </xsl:template>
</xsl:stylesheet>
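
Both listings show only the override itself. To work as a customization layer, such a file would normally also import the stock DocBook XSL stylesheet, so that everything other than the processing instruction is still formatted as usual. A minimal sketch for the FO case follows. The import line is an assumption and not part of the files shown above; it presumes the canonical docbook-xsl URL is resolved locally through the XML catalog, which should be the case on Debian when the docbook-xsl package is installed.

```xml
<?xml version='1.0'?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version='1.0'
  xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <!-- Assumption: pull in the stock FO stylesheet; templates below override it -->
  <xsl:import href="http://docbook.sourceforge.net/release/xsl/current/fo/docbook.xsl"/>

  <!-- Render <?newscene?> as a centred horizontal rule -->
  <xsl:template match="processing-instruction('newscene')">
    <fo:block text-align="center">
      <fo:leader leader-pattern="rule" rule-thickness="0.5pt"/>
    </fo:block>
  </xsl:template>
</xsl:stylesheet>
```

Because imported templates have lower import precedence, the local template wins for <?newscene?> while all other elements fall through to the stock rules.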

Finally, I came across the <bridgehead> tag, which seems to be a good fit for the task at hand, and I replaced <?newscene?> with <bridgehead>*</bridgehead>. It isn't centred, but we can fix that with XSL rules if the current visual layout isn't enough.
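
Centring the star could be done with a small override in the FO customization layer. The following is an untested sketch, not what the project uses. It assumes the non-namespaced docbook-xsl FO stylesheet is imported first, so that this template takes precedence over the stock bridgehead handling, and the spacing values are arbitrary placeholders.

```xml
<?xml version='1.0'?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version='1.0'
  xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <!-- Assumption: stock FO stylesheet resolved through the local XML catalog -->
  <xsl:import href="http://docbook.sourceforge.net/release/xsl/current/fo/docbook.xsl"/>

  <!-- Hypothetical override: centre every bridgehead and add some air around it -->
  <xsl:template match="bridgehead">
    <fo:block text-align="center" space-before="1em" space-after="1em">
      <xsl:apply-templates/>
    </fo:block>
  </xsl:template>
</xsl:stylesheet>
```

Note that this affects every <bridgehead> in the document, which is only acceptable as long as the element is used for scene breaks and nothing else.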

I did not find a good DocBook-compliant way to solve the linebreak/paragraph challenge, so I ended up creating a new processing instruction <?linebreak?>, mapping to <br/> in HTML and <fo:block/> in FO/PDF. I suspect there are better ways to do this, and welcome ideas and patches on GitHub. The HTML XSL file now looks like this:

<?xml version='1.0'?> 
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version='1.0'>
  <xsl:template match="processing-instruction('linebreak')">
    <br/>
  </xsl:template>
</xsl:stylesheet> 

And the FO/PDF XSL file looks like this:

<?xml version='1.0'?> 
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version='1.0'
  xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <xsl:template match="processing-instruction('linebreak')">
    <fo:block/>
  </xsl:template>
</xsl:stylesheet> 
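
In the DocBook source, the processing instruction is simply placed at the point where the line should break. A made-up paragraph could look like this (the text is a placeholder, not taken from the short story):

```xml
<para>He looked up from the screen.<?linebreak?>
"Did you say something?"<?linebreak?>
Nobody answered.</para>
```

Tools that do not know about <?linebreak?> simply ignore it, so the paragraph stays valid DocBook and degrades to ordinary flowing text.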

One unsolved challenge is our wish to expose a different ISBN per publication format, while keeping all of them in some conditional structure in the DocBook source. I have no idea how to do this, so we ended up listing all the ISBNs next to their format on the colophon page.
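
One possible approach, not tried in this project, is the profiling (conditional text) support in the DocBook XSL stylesheets: elements can carry a condition attribute, and the profiling variants of the stylesheets keep only the elements that match the profile.condition parameter. A sketch of the source side, with placeholder ISBN values and made-up condition names:

```xml
<!-- Hypothetical: one biblioid per output format, selected by profiling -->
<biblioid class="isbn" condition="pdf">ISBN-FOR-PDF-EDITION</biblioid>
<biblioid class="isbn" condition="html">ISBN-FOR-HTML-EDITION</biblioid>
<biblioid class="isbn" condition="epub">ISBN-FOR-EPUB-EDITION</biblioid>
```

Running xsltproc with --stringparam profile.condition pdf against fo/profile-docbook.xsl instead of fo/docbook.xsl should then keep only the first entry in the PDF. Whether dbtoepub can be made to do the same for the EPUB build is an open question.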

If you want to see the finished result, check out the source repository at GitHub (future/new repository).
