- <title>Speeding up the Debian installer using eatmydata and dpkg-divert</title>
- <link>http://people.skolelinux.org/pere/blog/Speeding_up_the_Debian_installer_using_eatmydata_and_dpkg_divert.html</link>
- <guid isPermaLink="true">http://people.skolelinux.org/pere/blog/Speeding_up_the_Debian_installer_using_eatmydata_and_dpkg_divert.html</guid>
- <pubDate>Tue, 16 Sep 2014 14:00:00 +0200</pubDate>
- <description><p>The <a href="https://www.debian.org/">Debian</a> installer could be
-a lot quicker. When we install more than 2000 packages in
-<a href="http://www.skolelinux.org/">Skolelinux / Debian Edu</a> using
-tasksel in the installer, unpacking the binary packages takes forever.
-A part of the slow I/O issue was discussed in
-<a href="https://bugs.debian.org/613428">bug #613428</a> about too
-much file system sync-ing done by dpkg, which is the package
-responsible for unpacking the binary packages. Other parts (like code
-executed by postinst scripts) might also sync to disk during
-installation. All this sync-ing to disk does not really make sense to
-me. If the machine crashes half-way through, I start over; I do not try
-to salvage the half-installed system. So the failure that sync-ing is
-supposed to protect against, a hardware or system crash, is not really
-relevant while the installer is running.</p>
-
-<p>A few days ago, I thought of a way to get rid of all the file
-system sync()-ing in a fairly non-intrusive way, without the need to
-change the code in several packages. The idea is not new, but I have
-not heard anyone propose the approach using dpkg-divert before. It
-depends on the small and clever package
-<a href="https://packages.qa.debian.org/eatmydata">eatmydata</a>, which
-uses LD_PRELOAD to replace the system functions for syncing data to
-disk with functions doing nothing, thus allowing programs to live
-dangerously while speeding up disk I/O significantly. Instead of
-modifying the implementation of dpkg, apt and tasksel (which are the
-packages responsible for selecting, fetching and installing packages),
-it occurred to me that we could just divert the programs away and replace
-them with a simple shell wrapper calling
-"eatmydata&nbsp;$program&nbsp;$@", to get the same effect.
-Two days ago I decided to test the idea, and wrapped up a simple
-implementation for the Debian Edu udeb.</p>
-
-<p>The effect was stunning. In my first test it reduced the running
-time of the pkgsel step (installing tasks) from 64 to less than 44
-minutes (more than 20 minutes shaved off the installation) on an old Dell
-Latitude D505 machine. I am not quite sure what the optimised time
-would have been, as I messed up the testing a bit, causing the debconf
-priority to get low enough for two questions to pop up during
-installation. As soon as I saw the questions I moved the installation
-along, but I do not know how long the questions were holding up the
-installation. I did some more measurements using Debian Edu Jessie,
-and got these results. The time measured is the time between the
-"pkgsel: starting tasksel" and the "pkgsel: finishing up" lines in
-/var/log/syslog, if you want to do the same measurement yourself. In
-Debian Edu, the tasksel dialog does not show up, and the timing thus
-does not depend on how quickly the user handles the tasksel
-dialog.</p>
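If you want to repeat the measurement, you can compute the elapsed time from the log instead of reading it off by hand. The POSIX shell function below is a minimal sketch of that. It assumes classic syslog lines of the form "Sep 16 07:46:12 pkgsel: starting tasksel", with the time of day in the third field. The function name pkgsel_duration is made up for this example, and the function defaults to /var/log/syslog as mentioned above.

```shell
#!/bin/sh
# Hypothetical helper: print how long the pkgsel step took, using the time
# stamps of the "pkgsel: starting tasksel" and "pkgsel: finishing up" lines.
# Assumption: syslog lines look like "Sep 16 07:46:12 pkgsel: ...", so the
# third whitespace-separated field is the time of day (HH:MM:SS).
pkgsel_duration() {
    awk '
        # Convert "HH:MM:SS" into seconds since midnight.
        function secs(t,    p) { split(t, p, ":"); return p[1] * 3600 + p[2] * 60 + p[3] }
        /pkgsel: starting tasksel/ { t_start = secs($3) }
        /pkgsel: finishing up/     { t_end = secs($3) }
        END {
            if (t_start == "" || t_end == "") {
                print "pkgsel start/finish lines not found" > "/dev/stderr"
                exit 1
            }
            d = t_end - t_start
            if (d < 0) d += 86400    # the step ran past midnight
            printf "pkgsel took %d min %d s\n", int(d / 60), d % 60
        }
    ' "${1:-/var/log/syslog}"
}
```

Run it as "pkgsel_duration /var/log/syslog". If several installation runs are logged in the same file, it uses the last start and finish lines it finds. It only handles a single midnight crossing.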
-
-<p><table>
-
-<tr>
-<th>Machine/setup</th>
-<th>Original tasksel</th>
-<th>Optimised tasksel</th>
-<th>Reduction</th>
-</tr>
-
-<tr>
-<td>Latitude D505 Main+LTSP LXDE</td>
-<td>64 min (07:46-08:50)</td>
-<td>&lt;44 min (11:27-12:11)</td>
-<td>&gt;20 min &gt;31%</td>
-</tr>
-
-<tr>
-<td>Latitude D505 Roaming LXDE</td>
-<td>57 min (08:48-09:45)</td>
-<td>34 min (07:43-08:17)</td>
-<td>23 min 40%</td>
-</tr>
-
-<tr>
-<td>Latitude D505 Minimal</td>
-<td>22 min (10:37-10:59)</td>
-<td>11 min (11:16-11:27)</td>
-<td>11 min 50%</td>
-</tr>
-
-<tr>
-<td>Thinkpad X200 Minimal</td>
-<td>6 min (08:19-08:25)</td>
-<td>4 min (08:04-08:08)</td>
-<td>2 min 33%</td>
-</tr>
-
-<tr>
-<td>Thinkpad X200 Roaming KDE</td>
-<td>19 min (09:21-09:40)</td>
-<td>15 min (10:25-10:40)</td>
-<td>4 min 21%</td>
-</tr>
-
-</table></p>
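The Reduction column is the time saved divided by the original running time. To check those figures, you can use a small POSIX shell helper like the one below. The name reduction is hypothetical, and the helper assumes whole minutes as input, as in the table.

```shell
#!/bin/sh
# Hypothetical helper: given the original and optimised running time in
# whole minutes, print the saving in minutes and as a percentage of the
# original time, rounded to the nearest whole percent.
reduction() {
    orig=$1
    opt=$2
    saved=$((orig - opt))
    # Adding orig/2 before the integer division rounds to the nearest percent.
    pct=$(( (saved * 100 + orig / 2) / orig ))
    echo "$saved min $pct%"
}
```

For example, "reduction 57 34" prints "23 min 40%". For the first row, only bounds are known, so "reduction 64 44" prints "20 min 31%". This means the real saving was at least that large.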
-
-<p>The test is done using a netinst ISO on a USB stick, so some of the
-time is spent downloading packages. The connection to the Internet
-was 100Mbit/s during testing, so downloading should not be a
-significant factor in the measurement. Downloading typically took a few
-seconds to a few minutes, depending on the number of packages being
-installed.</p>
-
-<p>The speedup is implemented by using two hooks in
-<a href="https://www.debian.org/devel/debian-installer/">Debian
-Installer</a>, the pre-pkgsel.d hook to set up the diversions, and the
-finish-install.d hook to remove them at the end of the
-installation. I picked the pre-pkgsel.d hook instead of the
-post-base-installer.d hook because I test using an ISO without the
-eatmydata package included, and the post-base-installer.d hook in
-Debian Edu can only operate on packages included in the ISO. The
-negative effect of this is that I am unable to activate this
-optimization for the kernel installation step in d-i. If the code were
-moved to the post-base-installer.d hook, the speedup for the entire
-installation would be larger.</p>
-
-<p>I've implemented this in the
-<a href="https://packages.qa.debian.org/debian-edu-install">debian-edu-install</a>
-git repository, and plan to provide the optimization as part of the
-Debian Edu installation. If you want to test this yourself, you can
-create two files in the installer (or in a udeb). One shell script
-needs to go into /usr/lib/pre-pkgsel.d/, with content like this:</p>
-
-<p><blockquote><pre>
-#!/bin/sh
-set -e
-. /usr/share/debconf/confmodule
-info() {
- logger -t my-pkgsel "info: $*"
-}
-error() {
- logger -t my-pkgsel "error: $*"
-}
-override_install() {
- apt-install eatmydata || true
- if [ -x /target/usr/bin/eatmydata ] ; then
- for bin in dpkg apt-get aptitude tasksel ; do
- file=/usr/bin/$bin
- # Only divert binaries that exist in the target system.
- if [ -f /target$file ] ; then
- info "diverting $file using eatmydata"
- printf "#!/bin/sh\neatmydata $bin.distrib \"\$@\"\n" \
- > /target$file.edu
- chmod 755 /target$file.edu
- in-target dpkg-divert --package debian-edu-config \
- --rename --quiet --add $file
- ln -sf ./$bin.edu /target$file
- else
- error "unable to divert $file, as it is missing."
- fi
- done
- else
- error "unable to find /usr/bin/eatmydata after installing the eatmydata package"
- fi
-}
-
-override_install
-</pre></blockquote></p>
-
-<p>To clean up, another shell script should go into
-/usr/lib/finish-install.d/ with code like this:
+ <title>Free software archive system Nikita now able to store documents</title>
+ <link>http://people.skolelinux.org/pere/blog/Free_software_archive_system_Nikita_now_able_to_store_documents.html</link>
+ <guid isPermaLink="true">http://people.skolelinux.org/pere/blog/Free_software_archive_system_Nikita_now_able_to_store_documents.html</guid>
+ <pubDate>Sun, 19 Mar 2017 08:00:00 +0100</pubDate>
+ <description><p>The <a href="https://github.com/hiOA-ABI/nikita-noark5-core">Nikita
+Noark 5 core project</a> is implementing the Norwegian standard for
+keeping an electronic archive of government documents.
+<a href="http://www.arkivverket.no/arkivverket/Offentlig-forvaltning/Noark/Noark-5/English-version">The
+Noark 5 standard</a> documents the requirements for data systems used by
+the archives in the Norwegian government, and the Noark 5 web interface
+specification documents a REST web service for storing, searching and
+retrieving documents and metadata in such an archive. I've been involved
+in the project since a few weeks before Christmas, when the Norwegian
+Unix User Group
+<a href="https://www.nuug.no/news/NOARK5_kjerne_som_fri_programvare_f_r_epostliste_hos_NUUG.shtml">announced
+it supported the project</a>. I believe this is an important project,
+and hope it can make it possible for the government archives in the
+future to use free software to keep the archives we citizens depend
+on. But as I do not hold such an archive myself, my personal first use
+case is to store and analyse public mail journal metadata published
+from the government. I find it useful to have a clear use case in
+mind when developing, to make sure the system scratches one of my
+itches.</p>
+
+<p>If you would like to help make sure there is a free software
+alternative for the archives, please join our IRC channel
+(<a href="irc://irc.freenode.net/%23nikita">#nikita on
+irc.freenode.net</a>) and
+<a href="https://lists.nuug.no/mailman/listinfo/nikita-noark">the
+project mailing list</a>.</p>
+
+<p>When I got involved, the web service could store metadata about
+documents. But a few weeks ago, a new milestone was reached when it
+became possible to store full text documents too. Yesterday, I
+completed an implementation of a command line tool
+<tt>archive-pdf</tt> to upload a PDF file to the archive using this
+API. The tool is very simple at the moment, and finds existing
+<a href="https://en.wikipedia.org/wiki/Fonds">fonds</a>, series and
+files while asking the user to select which one to use if more than
+one exists. Once a file is identified, the PDF is associated with the
+file and uploaded, using the title extracted from the PDF itself. The
+process is fairly similar to visiting the archive, opening a cabinet,
+locating a file and storing a piece of paper in the archive. Here is
+a test run directly after populating the database with test data using
+our API tester:</p>