Title: Typesetting a short story using DocBook for PDF, HTML and EPUB
Tags: english, docbook, freeculture, opphavsrett
<p>A few days ago, during a discussion in
<a href="http://www.efn.no/">EFN</a> about interesting books to read
about copyright and the data retention directive, a suggestion to read
the 1968 short story Kodémus by
<a href="http://web2.gyldendal.no/toraage/">Tore Åge Bringsværd</a>
came up. The text was only available in old paper books, and thus not
easily available for current and future generations. Some of the
people participating in the discussion contacted the author, and
reported back on 2013-03-19 that the author was OK with releasing the
short story using a <a href="http://www.creativecommons.org/">Creative
Commons</a> license. The text was quickly scanned and OCR-ed, and we
were ready to start on the editing and typesetting.</p>
<p>As I already had some experience formatting text in my project to
provide a Norwegian version of the Free Culture book by Lawrence
Lessig, I chipped in and set up a
<a href="http://www.docbook.org/">DocBook</a> processing framework to
generate PDF, HTML and EPUB versions of the short story. The tools to
transform DocBook to different formats are already in my Linux
distribution of choice, <a href="http://www.debian.org/">Debian</a>, so
all I had to do was to use the
<a href="http://dblatex.sourceforge.net/">dblatex</a>,
<a href="http://docbook.sourceforge.net/release/xsl/current/epub/README">dbtoepub</a>
and <a href="https://fedorahosted.org/xmlto/">xmlto</a> tools to do the
conversion. After a few days, we decided to replace dblatex with
xmlto (using
<a href="http://wiki.docbook.org/DocBookXslStylesheets">docbook-xsl</a>),
to get the copyright information to show up in the PDF and to get a
nicer &lt;variablelist&gt; typesetting, but that is just a minor
detail.</p>
<p>There were a few challenges, of course. We want to typeset the
short story to look like the original, and that requires fairly good
control over the layout. The original short story has three
parts/scenes separated by a single horizontally centred star (*), and
the paragraphs contain not only flowing text, but also dialogue and text
that starts on a new line in the middle of the paragraph.</p>
<p>I initially solved the first challenge by using a paragraph with a
single star in it, i.e. &lt;para&gt;*&lt;/para&gt;, which at least made sure a
placeholder indicated where the scene shifted. This did not look too
good without the centring. The next approach was to create a new
processing instruction &lt;?newscene?&gt;, mapping to "&lt;hr/&gt;"
for HTML and "&lt;fo:block text-align="center"&gt;&lt;fo:leader
leader-pattern="rule" rule-thickness="0.5pt"/&gt;&lt;/fo:block&gt;"
for FO/PDF output (I did not try to implement this in dblatex, as we had
switched by that time). The HTML XSL file looked like this:</p>
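<p><blockquote><pre>
&lt;?xml version='1.0'?&gt;
&lt;xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version='1.0'&gt;
  &lt;!-- Render the newscene processing instruction as a horizontal rule --&gt;
  &lt;xsl:template match="processing-instruction('newscene')"&gt;
    &lt;hr/&gt;
  &lt;/xsl:template&gt;
&lt;/xsl:stylesheet&gt;
</pre></blockquote></p>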
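<p>And the FO/PDF XSL file looked like this:</p>
<p><blockquote><pre>
&lt;?xml version='1.0'?&gt;
&lt;xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version='1.0'
  xmlns:fo="http://www.w3.org/1999/XSL/Format"&gt;
  &lt;!-- Render the newscene processing instruction as a centred rule --&gt;
  &lt;xsl:template match="processing-instruction('newscene')"&gt;
    &lt;fo:block text-align="center"&gt;
      &lt;fo:leader leader-pattern="rule" rule-thickness="0.5pt"/&gt;
    &lt;/fo:block&gt;
  &lt;/xsl:template&gt;
&lt;/xsl:stylesheet&gt;
</pre></blockquote></p>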
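<p>To make the mechanism concrete, here is a minimal sketch of how such
a processing instruction could sit in the DocBook source. The DocBook
version, the title and the paragraph text are placeholders picked for
the example, not taken from the actual source file:</p>
<p><blockquote><pre>
&lt;?xml version='1.0'?&gt;
&lt;!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
  "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd"&gt;
&lt;article&gt;
  &lt;title&gt;Placeholder title&lt;/title&gt;
  &lt;para&gt;Last paragraph of one scene (placeholder text).&lt;/para&gt;
  &lt;!-- Scene shift, picked up by the processing-instruction('newscene') template --&gt;
  &lt;?newscene?&gt;
  &lt;para&gt;First paragraph of the next scene (placeholder text).&lt;/para&gt;
&lt;/article&gt;
</pre></blockquote></p>
<p>With xmlto, a template like the ones above can be layered on top of
the stock stylesheets using the -m option, for example "xmlto -m
newscene-html.xsl html-nochunks story.xml". Both file names are made up
for this example.</p>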
<p>Finally, I came across the &lt;bridgehead&gt; tag, which seems to be
a good fit for the task at hand, and I replaced &lt;?newscene?&gt;
with &lt;bridgehead&gt;*&lt;/bridgehead&gt;. It isn't centred, but we
can fix it with some XSL rule if the current visual layout isn't
good enough.</p>
<p>I did not find a good DocBook compliant way to solve the
linebreak/paragraph challenge, so I ended up creating a new processing
instruction &lt;?linebreak?&gt;, mapping to &lt;br/&gt; in HTML, and
&lt;fo:block/&gt; in FO/PDF. I suspect there are better ways to do
this, and welcome ideas and patches on GitHub. The HTML XSL file now
looks like this:</p>
<p><blockquote><pre>
&lt;?xml version='1.0'?&gt;
&lt;xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version='1.0'&gt;
  &lt;!-- Render the linebreak processing instruction as an HTML line break --&gt;
  &lt;xsl:template match="processing-instruction('linebreak')"&gt;
    &lt;br/&gt;
  &lt;/xsl:template&gt;
&lt;/xsl:stylesheet&gt;
</pre></blockquote></p>
<p>And the FO/PDF XSL file looks like this:</p>
<p><blockquote><pre>
&lt;?xml version='1.0'?&gt;
&lt;xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version='1.0'
  xmlns:fo="http://www.w3.org/1999/XSL/Format"&gt;
  &lt;!-- An empty block forces a line break in the FO output --&gt;
  &lt;xsl:template match="processing-instruction('linebreak')"&gt;
    &lt;fo:block/&gt;
  &lt;/xsl:template&gt;
&lt;/xsl:stylesheet&gt;
</pre></blockquote></p>
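<p>As an illustration, a paragraph mixing flowing text and dialogue
could be marked up roughly like this. The title and the sentences are
invented placeholders, not text from the short story:</p>
<p><blockquote><pre>
&lt;?xml version='1.0'?&gt;
&lt;article&gt;
  &lt;title&gt;Placeholder title&lt;/title&gt;
  &lt;!-- One paragraph, with forced line breaks around the dialogue line --&gt;
  &lt;para&gt;He stopped in front of the door and listened.&lt;?linebreak?&gt;
- Is anyone there?&lt;?linebreak?&gt;
Nobody answered, so he went inside.&lt;/para&gt;
&lt;/article&gt;
</pre></blockquote></p>
<p>The text stays inside a single &lt;para&gt;, so the paragraph structure
of the original is kept. A stylesheet without a matching template should
simply skip the instruction, since the built-in XSLT rule for processing
instructions outputs nothing.</p>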
<p>One unsolved challenge is our wish to expose different ISBN numbers
per publication format, while keeping all of them in some conditional
structure in the DocBook source. I have no idea how to do this, so we ended
up listing all the ISBN numbers next to their format in the colophon.</p>
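<p>One avenue that might be worth testing is the profiling (conditional
text) support in docbook-xsl: mark each ISBN with a condition attribute
and select one value per output format. The sketch below is untested
against this project. The condition values and the ISBN strings are
placeholders:</p>
<p><blockquote><pre>
&lt;colophon&gt;
  &lt;!-- Placeholder text: one paragraph per format, selected by profiling --&gt;
  &lt;para condition="pdf"&gt;ISBN PLACEHOLDER-FOR-PDF (PDF)&lt;/para&gt;
  &lt;para condition="epub"&gt;ISBN PLACEHOLDER-FOR-EPUB (EPUB)&lt;/para&gt;
  &lt;para condition="html"&gt;ISBN PLACEHOLDER-FOR-HTML (HTML)&lt;/para&gt;
&lt;/colophon&gt;
</pre></blockquote></p>
<p>A small customization layer for the PDF build could then import the
profiling variant of the FO stylesheet and pick the condition:</p>
<p><blockquote><pre>
&lt;?xml version='1.0'?&gt;
&lt;xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version='1.0'&gt;
  &lt;!-- Profiling variant of the docbook-xsl FO stylesheet --&gt;
  &lt;xsl:import href="http://docbook.sourceforge.net/release/xsl/current/fo/profile-docbook.xsl"/&gt;
  &lt;!-- Keep elements with condition="pdf" and elements without a condition --&gt;
  &lt;xsl:param name="profile.condition"&gt;pdf&lt;/xsl:param&gt;
&lt;/xsl:stylesheet&gt;
</pre></blockquote></p>
<p>The HTML and EPUB builds would need the same kind of layer on top of
their own profiling stylesheets, each with its own profile.condition
value.</p>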
<p>If you want to see the finished result, check out the
<a href="https://github.com/sickel/kodemus">source repository at
GitHub</a>
(<a href="https://github.com/EFN/kodemus">future/new/official
repository</a>). We expect it to be ready and announced in a few