From dc916a80b63234b7fa80aa565be58a40f9f9ec98 Mon Sep 17 00:00:00 2001
From: Petter Reinholdtsen
Date: Tue, 16 Sep 2014 13:32:23 +0200
Subject: [PATCH] New post.

---
 blog/data/2014-09-16-d-i-eatmydata.txt | 197 +++++++++++++++++++++++++
 1 file changed, 197 insertions(+)
 create mode 100644 blog/data/2014-09-16-d-i-eatmydata.txt

diff --git a/blog/data/2014-09-16-d-i-eatmydata.txt b/blog/data/2014-09-16-d-i-eatmydata.txt
new file mode 100644
index 0000000000..48f7cedb01
--- /dev/null
+++ b/blog/data/2014-09-16-d-i-eatmydata.txt
@@ -0,0 +1,197 @@
+Title: Speeding up the Debian installer using eatmydata and dpkg-divert
+Tags: english, debian, debian edu
+Date: 2014-09-16 13:30
+
+

The Debian installer could be +a lot quicker. When we install more than 2000 packages in +Skolelinux / Debian Edu using +tasksel in the installer, unpacking the binary packages takes forever. +Part of the slow I/O issue was discussed in +bug #613428 about too +much file system sync-ing done by dpkg, which is the package +responsible for unpacking the binary packages. Other parts (like code +executed by postinst scripts) might also sync to disk during +installation. All this sync-ing to disk does not really make sense to +me. If the machine crashes half-way through, I start over; I do not try +to salvage the half-installed system. So the failure that sync-ing is +supposed to protect against, a hardware or system crash, is not really +relevant while the installer is running.

+ +

A few days ago, I thought of a way to get rid of all the file +system sync()-ing in a fairly non-intrusive way, without the need to +change the code in several packages. The idea is not new, but I have +not heard anyone propose the approach using dpkg-divert before. It +depends on the small and clever package +eatmydata, which +uses LD_PRELOAD to replace the system functions for syncing data to +disk with functions doing nothing, thus allowing programs to live +dangerously while speeding up disk I/O significantly. Instead of +modifying the implementation of dpkg, apt and tasksel (which are the +packages responsible for selecting, fetching and installing packages), +it occurred to me that we could just divert the programs away and replace +them with a simple shell wrapper calling +"eatmydata $program $@" to get the same effect. +Yesterday I decided to test the idea, and wrapped up a simple +implementation for the Debian Edu udeb.
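To make the wrapper idea concrete, here is a minimal sketch that writes such a wrapper into a scratch directory instead of a real system. The function name make_wrapper is made up for this illustration, and the .distrib suffix is assumed because it is the default name dpkg-divert gives a diverted file.

```shell
#!/bin/sh
# Sketch only: create a wrapper named <program>.edu in a directory.
# make_wrapper is a hypothetical helper, not part of any package.
make_wrapper() {
    dir=$1
    bin=$2
    # The wrapper hands all its arguments to the diverted original,
    # with eatmydata turning the sync calls into no-ops.
    printf '#!/bin/sh\nexec eatmydata %s.distrib "$@"\n' "$bin" \
        > "$dir/$bin.edu"
    chmod 755 "$dir/$bin.edu"
}

# Try it in a temporary directory and show the generated wrapper.
scratch=$(mktemp -d)
make_wrapper "$scratch" dpkg
cat "$scratch/dpkg.edu"
```

Using exec here is a small design choice for the sketch: it avoids leaving an extra shell process around for every wrapped call.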

+ +

The effect was stunning. In my first test it reduced the running +time of the pkgsel step (installing tasks) from 64 to less than 44 +minutes (more than 20 minutes shaved off the installation) on an old Dell +Latitude D505 machine. I am not quite sure what the optimised time +would have been, as I messed up the testing a bit, causing the debconf +priority to get low enough for two questions to pop up during +installation. As soon as I saw the questions I moved the installation +along, but do not know how long the questions were holding up the +installation. I did some more measurements using Debian Edu Jessie, +and got these results. The time measured is the difference between the time stamps in +/var/log/syslog for the "pkgsel: starting tasksel" and the +"pkgsel: finishing up" lines, if you want to do the same measurement +yourself. In Debian Edu, the tasksel dialog does not show up, and the +timing thus does not depend on how quickly the user handles the tasksel +dialog.
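If you want to repeat the measurement without reading the log by hand, something like the following sketch could do it. It assumes traditional syslog lines where the third field is HH:MM:SS, and that both marker lines are logged on the same day. The function name pkgsel_minutes is made up for this example.

```shell
#!/bin/sh
# Sketch only: print the whole minutes between the two pkgsel marker
# lines in a syslog file. Assumes "Mon DD HH:MM:SS ..." lines and no
# midnight crossing between the two markers.
pkgsel_minutes() {
    awk '
        # Convert HH:MM:SS to seconds since midnight.
        function secs(t,    p) {
            split(t, p, ":")
            return p[1] * 3600 + p[2] * 60 + p[3]
        }
        /pkgsel: starting tasksel/ { start = secs($3) }
        /pkgsel: finishing up/     { stop = secs($3) }
        END {
            # Fail if either marker is missing from the log.
            if (start == "" || stop == "") exit 1
            printf "%d\n", (stop - start) / 60
        }
    ' "$1"
}
```

On an installed system it would be called as pkgsel_minutes /var/log/syslog, or on the copy of the installer log if the syslog has been rotated away.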

+ +

+Machine/setup                  Original tasksel       Optimised tasksel       Reduction
+Latitude D505 Main+LTSP LXDE   64 min (07:46-08:50)   <44 min (11:27-12:11)   >20 min  31%
+Latitude D505 Roaming LXDE     57 min (08:48-09:45)   34 min (07:43-08:17)    23 min   40%
+Latitude D505 Minimal          22 min (10:37-10:59)   11 min (11:16-11:27)    11 min   50%
+Thinkpad X200 Minimal          6 min (08:19-08:25)    4 min (08:04-08:08)     2 min    33%
+Thinkpad X200 Roaming KDE      19 min (09:21-09:40)   15 min (10:25-10:40)    4 min    21%
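The reduction column is simply the difference between the two timings, also given as a percentage of the original time. A small sketch of that arithmetic, with a made-up function name, using the Roaming LXDE numbers from the Latitude D505:

```shell
#!/bin/sh
# Sketch only: print saved minutes and the saving as a percentage of
# the original time. reduction is a hypothetical helper name.
reduction() {
    # $1 = original minutes, $2 = optimised minutes
    awk -v before="$1" -v after="$2" 'BEGIN {
        printf "%d min %.0f%%\n", before - after,
            (before - after) * 100 / before
    }'
}

reduction 57 34   # Latitude D505 Roaming LXDE: prints "23 min 40%"
```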

+ +

The tests were done using a netinst ISO on a USB stick, so some of the +time is spent downloading packages. The connection to the Internet +was 100 Mbit/s during testing, so downloading should not be a +significant factor in the measurement. Download typically took a few +seconds to a few minutes, depending on the number of packages being +installed.

+ +

The speedup is implemented by using two hooks in +Debian +Installer, the pre-pkgsel.d hook to set up the diverts, and the +finish-install.d hook to remove the diverts at the end of the +installation. I picked the pre-pkgsel.d hook instead of the +post-base-installer.d hook because I test using an ISO without the +eatmydata package included, and the post-base-installer.d hook in +Debian Edu can only operate on packages included in the ISO. The +negative effect of this is that I am unable to activate this +optimization for the kernel installation step in d-i. If the code were +moved to the post-base-installer.d hook, the speedup would be larger +for the entire installation.

+ +

I've implemented this in the +debian-edu-install +git repository, and plan to provide the optimization as part of the +Debian Edu installation. If you want to test this yourself, you can +create two files in the installer (or in a udeb). One shell script +needs to go into /usr/lib/pre-pkgsel.d/, with content like this:

+ +

+#!/bin/sh
+set -e
+. /usr/share/debconf/confmodule
+info() {
+    logger -t my-pkgsel "info: $*"
+}
+error() {
+    logger -t my-pkgsel "error: $*"
+}
+override_install() {
+    apt-install eatmydata || true
+    if [ -x /target/usr/bin/eatmydata ] ; then
+        for bin in dpkg apt-get aptitude tasksel ; do
+            file=/usr/bin/$bin
+            # Test that the file exists before diverting it.
+            if [ -f /target$file ] ; then
+                info "diverting $file using eatmydata"
+                printf "#!/bin/sh\neatmydata $bin.distrib \"\$@\"\n" \
+                    > /target$file.edu
+                chmod 755 /target$file.edu
+                in-target dpkg-divert --package debian-edu-config \
+                    --rename --quiet --add $file
+                ln -sf ./$bin.edu /target$file
+            else
+                error "unable to divert $file, as it is missing."
+            fi
+        done
+    else
+        error "unable to find /usr/bin/eatmydata after installing the eatmydata package"
+    fi
+}
+
+override_install
+

+ +

To clean up, another shell script should go into +/usr/lib/finish-install.d/ with code like this: + +

+#! /bin/sh -e
+. /usr/share/debconf/confmodule
+error() {
+    logger -t my-finish-install "error: $*"
+}
+remove_install_override() {
+    for bin in dpkg apt-get aptitude tasksel ; do
+        file=/usr/bin/$bin
+        if [ -x /target$file.edu ] ; then
+            rm /target$file
+            in-target dpkg-divert --package debian-edu-config \
+                --rename --quiet --remove $file
+            rm /target$file.edu
+        else
+            error "Missing divert for $file."
+        fi
+    done
+    sync # Flush file buffers before continuing
+}
+
+remove_install_override
+

+ +

By now you might ask if this change should get into the normal +Debian installer too. I suspect it should, but am not sure the +current debian-installer coordinators find it useful enough. It also +depends on the side effects of the change. I'm not aware of any, but I +guess we will see if the change is safe after some more testing. +Perhaps there is some package in Debian depending on sync() and +fsync() having an effect? Perhaps it should go into its own udeb, to +allow those of us wanting to enable it to do so without affecting +everyone.

-- 2.47.2