The Debian installer could be a lot quicker. When we install more
than 2000 packages in Skolelinux / Debian Edu using tasksel in the
installer, unpacking the binary packages takes forever. A part of the
slow I/O issue was discussed in bug #613428 about too much file system
sync-ing done by dpkg, which is the package responsible for unpacking
the binary packages. Other parts (like code executed by postinst
scripts) might also sync to disk during installation. All this
sync-ing to disk does not really make sense to me. If the machine
crashes half-way through, I start over; I do not try to salvage the
half-installed system. So the failure that sync-ing is supposed to
protect against, a hardware or system crash, is not really relevant
while the installer is running.

A few days ago, I thought of a way to get rid of all the file system
sync()-ing in a fairly non-intrusive way, without the need to change
the code in several packages. The idea is not new, but I have not
heard anyone propose the approach using dpkg-divert before. It
depends on the small and clever package eatmydata, which uses
LD_PRELOAD to replace the system functions for syncing data to disk
with functions doing nothing, thus allowing programs to live
dangerously while speeding up disk I/O significantly. Instead of
modifying the implementation of dpkg, apt and tasksel (which are the
packages responsible for selecting, fetching and installing packages),
it occurred to me that we could just divert the programs away and
replace them with a simple shell wrapper calling
"eatmydata $program $@" to get the same effect. Two days ago I
decided to test the idea, and wrapped up a simple implementation for
the Debian Edu udeb.

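To get a feel for the effect outside the installer, any command can
be run under eatmydata by hand. A minimal sketch, assuming the
eatmydata package may or may not be installed; the function name
run_nosync is made up for this example, and it falls back to running
the command unchanged when eatmydata is missing:

```shell
#!/bin/sh
# run_nosync is a hypothetical helper name, not part of eatmydata.
# It runs the given command under eatmydata when that program is
# available, and runs it unchanged otherwise.
run_nosync() {
    if command -v eatmydata >/dev/null 2>&1 ; then
        eatmydata "$@"
    else
        "$@"
    fi
}
```

For example, "run_nosync dpkg -i some-package.deb" (the file name is
a placeholder) would unpack a package with the sync calls turned into
no-ops, provided eatmydata is present.
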
The effect was stunning. In my first test it reduced the running
time of the pkgsel step (installing tasks) from 64 to less than 44
minutes (20 minutes shaved off the installation) on an old Dell
Latitude D505 machine. I am not quite sure what the optimised time
would have been, as I messed up the testing a bit, causing the debconf
priority to get low enough for two questions to pop up during
installation. As soon as I saw the questions I moved the installation
along, but I do not know how long the questions were holding up the
installation. I did some more measurements using Debian Edu Jessie,
and got the results in the table below. The time measured is the
difference between the time stamps in /var/log/syslog on the
"pkgsel: starting tasksel" and "pkgsel: finishing up" lines, if you
want to do the same measurement yourself. In Debian Edu, the tasksel
dialog does not show up, and the timing thus does not depend on how
quickly the user handles the tasksel dialog.

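The measurement described above can be scripted. This is only a
sketch: it assumes the traditional syslog line layout with the
HH:MM:SS time stamp in the third field, and that both marker lines are
present in the same file. The function name pkgsel_minutes is made up
for this example.

```shell
#!/bin/sh
# pkgsel_minutes (hypothetical name) prints the number of minutes
# between the two pkgsel marker lines in the syslog file given as $1.
# Assumes lines like "Sep 16 07:46:05 host pkgsel: starting tasksel".
pkgsel_minutes() {
    awk '
        function secs(t,   p) {
            split(t, p, ":")
            return p[1] * 3600 + p[2] * 60 + p[3]
        }
        /pkgsel: starting tasksel/ { start = secs($3); have_start = 1 }
        /pkgsel: finishing up/     { stop = secs($3); have_stop = 1 }
        END {
            # Fail if either marker line is missing.
            if (!have_start || !have_stop) exit 1
            d = stop - start
            if (d < 0) d += 86400   # the run crossed midnight
            printf "%d\n", int(d / 60 + 0.5)
        }
    ' "$1"
}
```

Calling "pkgsel_minutes /var/log/syslog" on the installed system
prints the duration rounded to whole minutes, or fails if one of the
two lines is not found.
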
Machine/setup                | Original tasksel     | Optimised tasksel     | Reduction
-----------------------------|----------------------|-----------------------|-------------
Latitude D505 Main+LTSP LXDE | 64 min (07:46-08:50) | <44 min (11:27-12:11) | >20 min >31%
Latitude D505 Roaming LXDE   | 57 min (08:48-09:45) | 34 min (07:43-08:17)  | 23 min 40%
Latitude D505 Minimal        | 22 min (10:37-10:59) | 11 min (11:16-11:27)  | 11 min 50%
Thinkpad X200 Minimal        | 6 min (08:19-08:25)  | 4 min (08:04-08:08)   | 2 min 33%
Thinkpad X200 Roaming KDE    | 19 min (09:21-09:40) | 15 min (10:25-10:40)  | 4 min 21%

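The reduction column can be recomputed from the two measured times.
A minimal sketch using plain shell arithmetic; the function name
reduction is made up for this example, and the percentage is rounded
to the nearest whole number:

```shell
#!/bin/sh
# reduction (hypothetical name): $1 is the original time in minutes,
# $2 the optimised time in minutes. Prints minutes saved and the
# saving as a percentage of the original time.
reduction() {
    saved=$(( $1 - $2 ))
    # Add half the divisor before dividing to round to nearest.
    percent=$(( (saved * 100 + $1 / 2) / $1 ))
    printf '%d min %d%%\n' "$saved" "$percent"
}
```

For the Roaming LXDE row on the Latitude D505, "reduction 57 34"
prints "23 min 40%".
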
The test is done using a netinst ISO on a USB stick, so some of the
time is spent downloading packages. The connection to the Internet
was 100 Mbit/s during testing, so downloading should not be a
significant factor in the measurement. Download typically took a few
seconds to a few minutes, depending on the number of packages being
installed.

The speedup is implemented by using two hooks in Debian Installer,
the pre-pkgsel.d hook to set up the diverts, and the finish-install.d
hook to remove the diverts at the end of the installation. I picked
the pre-pkgsel.d hook instead of the post-base-installer.d hook
because I test using an ISO without the eatmydata package included,
and the post-base-installer.d hook in Debian Edu can only operate on
packages included in the ISO. The negative effect of this is that I
am unable to activate this optimization for the kernel installation
step in d-i. If the code were moved to the post-base-installer.d
hook, the speedup would be larger for the entire installation.

I've implemented this in the debian-edu-install git repository, and
plan to provide the optimization as part of the Debian Edu
installation. If you want to test this yourself, you can create two
files in the installer (or in a udeb). One shell script needs to go
into /usr/lib/pre-pkgsel.d/, with content like this:

#!/bin/sh
set -e
. /usr/share/debconf/confmodule
info() {
    logger -t my-pkgsel "info: $*"
}
error() {
    logger -t my-pkgsel "error: $*"
}
override_install() {
    # Install eatmydata into /target; carry on even if that fails.
    apt-install eatmydata || true
    if [ -x /target/usr/bin/eatmydata ] ; then
        for bin in dpkg apt-get aptitude tasksel ; do
            file=/usr/bin/$bin
            # Test that the file exists.
            if [ -f /target$file ] ; then
                info "diverting $file using eatmydata"
                # Write the wrapper script next to the original.
                printf "#!/bin/sh\neatmydata $bin.distrib \"\$@\"\n" \
                    > /target$file.edu
                chmod 755 /target$file.edu
                # Move the original away and point its name at the wrapper.
                in-target dpkg-divert --package debian-edu-config \
                    --rename --quiet --add $file
                ln -sf ./$bin.edu /target$file
            else
                error "unable to divert $file, as it is missing."
            fi
        done
    else
        error "unable to find /usr/bin/eatmydata after installing the eatmydata package"
    fi
}

override_install

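The printf line in the script above is a bit dense. When dpkg-divert
is called with --rename and no explicit --divert target, it moves the
original program to the same path with ".distrib" appended, which is
why the wrapper calls "$bin.distrib". A small sketch that prints the
same wrapper text for one program name; the function name
wrapper_text is made up for illustration:

```shell
#!/bin/sh
# wrapper_text (hypothetical name) prints the wrapper script that the
# hook above writes to /target/usr/bin/$1.edu. The single quotes keep
# "$@" literal, so it ends up unexpanded in the wrapper.
wrapper_text() {
    printf '#!/bin/sh\neatmydata %s.distrib "$@"\n' "$1"
}
```

Running "wrapper_text dpkg" prints a two-line script: the "#!/bin/sh"
line followed by the line: eatmydata dpkg.distrib "$@"
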
To clean up, another shell script should go into
/usr/lib/finish-install.d/ with code like this:

#! /bin/sh -e
. /usr/share/debconf/confmodule
error() {
    logger -t my-finish-install "error: $*"
}
remove_install_override() {
    for bin in dpkg apt-get aptitude tasksel ; do
        file=/usr/bin/$bin
        if [ -x /target$file.edu ] ; then
            # Remove the symlink, move the original back, drop the wrapper.
            rm /target$file
            in-target dpkg-divert --package debian-edu-config \
                --rename --quiet --remove $file
            rm /target$file.edu
        else
            error "Missing divert for $file."
        fi
    done
    sync # Flush file buffers before continuing
}

remove_install_override

In Debian Edu, I placed both code fragments in a separate script
edu-eatmydata-install and call it from the pre-pkgsel.d and
finish-install.d scripts.

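One way such a shared script could be laid out is sketched here. This
is not the actual edu-eatmydata-install: the "setup" and "cleanup"
argument names and the function name eatmydata_hook are assumptions
for illustration, and the two functions from the hooks above are
replaced by stand-ins that only print a message.

```shell
#!/bin/sh
# Stand-ins for the real functions shown in the two hook scripts.
override_install() { echo "setting up diverts" ; }
remove_install_override() { echo "removing diverts" ; }

# eatmydata_hook (hypothetical name) picks the action from its first
# argument, so both hook scripts can call the same code.
eatmydata_hook() {
    case "$1" in
        setup)   override_install ;;
        cleanup) remove_install_override ;;
        *)
            echo "usage: eatmydata_hook {setup|cleanup}" >&2
            return 1
            ;;
    esac
}
```

With this layout the pre-pkgsel.d script would pass "setup" and the
finish-install.d script would pass "cleanup".
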
By now you might ask if this change should get into the normal Debian
installer too. I suspect it should, but am not sure the current
debian-installer coordinators find it useful enough. It also depends
on the side effects of the change. I'm not aware of any, but I guess
we will see if the change is safe after some more testing. Perhaps
there is some package in Debian depending on sync() and fsync() having
an effect? Perhaps it should go into its own udeb, to allow those of
us wanting to enable it to do so without affecting everyone.

Update 2014-09-24: Since a few days ago, enabling this optimization
will break installation of all programs using gnutls because of
bug #702711. An updated eatmydata package in Debian will solve it.

Update 2014-10-17: The bug mentioned above is fixed in testing and
the optimization works again. And I have discovered that the
dpkg-divert trick is not really needed and implemented a slightly
simpler approach as part of the debian-edu-install package. See
tools/edu-eatmydata-install in the source package.