- <div class="entry">
- <div class="title"><a href="http://people.skolelinux.org/pere/blog/Speeding_up_the_Debian_installer_using_eatmydata_and_dpkg_divert.html">Speeding up the Debian installer using eatmydata and dpkg-divert</a></div>
- <div class="date">16th September 2014</div>
- <div class="body"><p>The <a href="https://www.debian.org/">Debian</a> installer could be
-a lot quicker. When we install more than 2000 packages in
-<a href="http://www.skolelinux.org/">Skolelinux / Debian Edu</a> using
-tasksel in the installer, unpacking the binary packages take forever.
-A part of the slow I/O issue was discussed in
-<a href="https://bugs.debian.org/613428">bug #613428</a> about too
-much file system sync-ing done by dpkg, which is the package
-responsible for unpacking the binary packages. Other parts (like code
-executed by postinst scripts) might also sync to disk during
-installation. All this sync-ing to disk do not really make sense to
-me. If the machine crash half-way through, I start over, I do not try
-to salvage the half installed system. So the failure sync-ing is
-supposed to protect against, hardware or system crash, is not really
-relevant while the installer is running.</p>
-
-<p>A few days ago, I thought of a way to get rid of all the file
-system sync()-ing in a fairly non-intrusive way, without the need to
-change the code in several packages. The idea is not new, but I have
-not heard anyone propose the approach using dpkg-divert before. It
-depend on the small and clever package
-<a href="https://packages.qa.debian.org/eatmydata">eatmydata</a>, which
-uses LD_PRELOAD to replace the system functions for syncing data to
-disk with functions doing nothing, thus allowing programs to live
-dangerous while speeding up disk I/O significantly. Instead of
-modifying the implementation of dpkg, apt and tasksel (which are the
-packages responsible for selecting, fetching and installing packages),
-it occurred to me that we could just divert the programs away, replace
-them with a simple shell wrapper calling
-"eatmydata $program $@", to get the same effect.
-Two days ago I decided to test the idea, and wrapped up a simple
-implementation for the Debian Edu udeb.</p>
-
-<p>The effect was stunning. In my first test it reduced the running
-time of the pkgsel step (installing tasks) from 64 to less than 44
-minutes (20 minutes shaved off the installation) on an old Dell
-Latitude D505 machine. I am not quite sure what the optimised time
-would have been, as I messed up the testing a bit, causing the debconf
-priority to get low enough for two questions to pop up during
-installation. As soon as I saw the questions I moved the installation
-along, but do not know how long the question were holding up the
-installation. I did some more measurements using Debian Edu Jessie,
-and got these results. The time measured is the time stamp in
-/var/log/syslog between the "pkgsel: starting tasksel" and the
-"pkgsel: finishing up" lines, if you want to do the same measurement
-yourself. In Debian Edu, the tasksel dialog do not show up, and the
-timing thus do not depend on how quickly the user handle the tasksel
-dialog.</p>
-
-<p><table>
-
-<tr>
-<th>Machine/setup</th>
-<th>Original tasksel</th>
-<th>Optimised tasksel</th>
-<th>Reduction</th>
-</tr>
-
-<tr>
-<td>Latitude D505 Main+LTSP LXDE</td>
-<td>64 min (07:46-08:50)</td>
-<td><44 min (11:27-12:11)</td>
-<td>>20 min 18%</td>
-</tr>
-
-<tr>
-<td>Latitude D505 Roaming LXDE</td>
-<td>57 min (08:48-09:45)</td>
-<td>34 min (07:43-08:17)</td>
-<td>23 min 40%</td>
-</tr>
-
-<tr>
-<td>Latitude D505 Minimal</td>
-<td>22 min (10:37-10:59)</td>
-<td>11 min (11:16-11:27)</td>
-<td>11 min 50%</td>
-</tr>
-
-<tr>
-<td>Thinkpad X200 Minimal</td>
-<td>6 min (08:19-08:25)</td>
-<td>4 min (08:04-08:08)</td>
-<td>2 min 33%</td>
-</tr>
-
-<tr>
-<td>Thinkpad X200 Roaming KDE</td>
-<td>19 min (09:21-09:40)</td>
-<td>15 min (10:25-10:40)</td>
-<td>4 min 21%</td>
-</tr>
-
-</table></p>
-
-<p>The test is done using a netinst ISO on a USB stick, so some of the
-time is spent downloading packages. The connection to the Internet
-was 100Mbit/s during testing, so downloading should not be a
-significant factor in the measurement. Download typically took a few
-seconds to a few minutes, depending on the amount of packages being
-installed.</p>
-
-<p>The speedup is implemented by using two hooks in
-<a href="https://www.debian.org/devel/debian-installer/">Debian
-Installer</a>, the pre-pkgsel.d hook to set up the diverts, and the
-finish-install.d hook to remove the divert at the end of the
-installation. I picked the pre-pkgsel.d hook instead of the
-post-base-installer.d hook because I test using an ISO without the
-eatmydata package included, and the post-base-installer.d hook in
-Debian Edu can only operate on packages included in the ISO. The
-negative effect of this is that I am unable to activate this
-optimization for the kernel installation step in d-i. If the code is
-moved to the post-base-installer.d hook, the speedup would be larger
-for the entire installation.</p>
-
-<p>I've implemented this in the
-<a href="https://packages.qa.debian.org/debian-edu-install">debian-edu-install</a>
-git repository, and plan to provide the optimization as part of the
-Debian Edu installation. If you want to test this yourself, you can
-create two files in the installer (or in an udeb). One shell script
-need do go into /usr/lib/pre-pkgsel.d/, with content like this:</p>
-
-<p><blockquote><pre>
-#!/bin/sh
-set -e
-. /usr/share/debconf/confmodule
-info() {
- logger -t my-pkgsel "info: $*"
-}
-error() {
- logger -t my-pkgsel "error: $*"
-}
-override_install() {
- apt-install eatmydata || true
- if [ -x /target/usr/bin/eatmydata ] ; then
- for bin in dpkg apt-get aptitude tasksel ; do
- file=/usr/bin/$bin
- # Test that the file exist and have not been diverted already.
- if [ -f /target$file ] ; then
- info "diverting $file using eatmydata"
- printf "#!/bin/sh\neatmydata $bin.distrib \"\$@\"\n" \
- > /target$file.edu
- chmod 755 /target$file.edu
- in-target dpkg-divert --package debian-edu-config \
- --rename --quiet --add $file
- ln -sf ./$bin.edu /target$file
- else
- error "unable to divert $file, as it is missing."
- fi
- done
- else
- error "unable to find /usr/bin/eatmydata after installing the eatmydata pacage"
- fi
-}
-
-override_install
-</pre></blockquote></p>
-
-<p>To clean up, another shell script should go into
-/usr/lib/finish-install.d/ with code like this:
-
-<p><blockquote><pre>
-#! /bin/sh -e
-. /usr/share/debconf/confmodule
-error() {
- logger -t my-finish-install "error: $@"
-}
-remove_install_override() {
- for bin in dpkg apt-get aptitude tasksel ; do
- file=/usr/bin/$bin
- if [ -x /target$file.edu ] ; then
- rm /target$file
- in-target dpkg-divert --package debian-edu-config \
- --rename --quiet --remove $file
- rm /target$file.edu
- else
- error "Missing divert for $file."
- fi
- done
- sync # Flush file buffers before continuing
-}
-
-remove_install_override
-</pre></blockquote></p>
-
-<p>In Debian Edu, I placed both code fragments in a separate script
-edu-eatmydata-install and call it from the pre-pkgsel.d and
-finish-install.d scripts.</p>
-
-<p>By now you might ask if this change should get into the normal
-Debian installer too? I suspect it should, but am not sure the
-current debian-installer coordinators find it useful enough. It also
-depend on the side effects of the change. I'm not aware of any, but I
-guess we will see if the change is safe after some more testing.
-Perhaps there is some package in Debian depending on sync() and
-fsync() having effect? Perhaps it should go into its own udeb, to
-allow those of us wanting to enable it to do so without affecting
-everyone.</p>
-
-<p>Update 2014-09-24: Since a few days ago, enabling this optimization
-will break installation of all programs using gnutls because of
-<a href="https://bugs.debian.org/702711">bug #702711</a>. An updated
-eatmydata package in Debian will solve it.</p>
-
-<p>Update 2014-10-17: The bug mentioned above is fixed in testing and
-the optimization work again. And I have discovered that the
-dpkg-divert trick is not really needed and implemented a slightly
-simpler approach as part of the debian-edu-install package. See
-tools/edu-eatmydata-install in the source package.</p>
-</div>
- <div class="tags">
-
-
- Tags: <a href="http://people.skolelinux.org/pere/blog/tags/debian">debian</a>, <a href="http://people.skolelinux.org/pere/blog/tags/debian edu">debian edu</a>, <a href="http://people.skolelinux.org/pere/blog/tags/english">english</a>.
-
-
- </div>
- </div>
- <div class="padding"></div>
-