Title: Speeding up the Debian installer using eatmydata and dpkg-divert
Tags: english, debian, debian edu
Date: 2014-09-16 14:00

<p>The <a href="https://www.debian.org/">Debian</a> installer could be
a lot quicker.  When we install more than 2000 packages in
<a href="http://www.skolelinux.org/">Skolelinux / Debian Edu</a> using
tasksel in the installer, unpacking the binary packages takes forever.
Part of the slow I/O issue was discussed in
<a href="https://bugs.debian.org/613428">bug #613428</a> about too
much file system sync-ing done by dpkg, which is the package
responsible for unpacking the binary packages.  Other parts (like code
executed by postinst scripts) might also sync to disk during
installation.  All this sync-ing to disk does not really make sense to
me.  If the machine crashes half-way through, I start over; I do not
try to salvage the half-installed system.  So the failure that sync-ing
is supposed to protect against, a hardware or system crash, is not
really relevant while the installer is running.</p>

<p>A few days ago, I thought of a way to get rid of all the file
system sync()-ing in a fairly non-intrusive way, without the need to
change the code in several packages.  The idea is not new, but I have
not heard anyone propose the approach using dpkg-divert before.  It
depends on the small and clever package
<a href="https://packages.qa.debian.org/eatmydata">eatmydata</a>, which
uses LD_PRELOAD to replace the system functions for syncing data to
disk with functions doing nothing, thus allowing programs to live
dangerously while speeding up disk I/O significantly.  Instead of
modifying the implementation of dpkg, apt and tasksel (which are the
packages responsible for selecting, fetching and installing packages),
it occurred to me that we could just divert the programs away and
replace them with a simple shell wrapper calling
"eatmydata&nbsp;$program&nbsp;$@" to get the same effect.
Two days ago I decided to test the idea, and wrapped up a simple
implementation for the Debian Edu udeb.</p>

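<p>To make the wrapper idea concrete, here is a minimal sketch that
writes such a wrapper into a directory of your choice.  The helper name
write_wrapper is made up for this illustration, and the sketch assumes
the real program has been renamed to NAME.distrib, which is the default
name dpkg-divert gives a diverted file.</p>

```shell
#!/bin/sh
# write_wrapper DIR NAME (hypothetical helper): create DIR/NAME as a small
# wrapper that runs the renamed original, NAME.distrib, under eatmydata
# and passes all arguments through unchanged.
write_wrapper() {
    dir=$1
    name=$2
    # Unquoted here-document: $name is expanded now, \$@ is written as "$@".
    cat > "$dir/$name" <<EOF
#!/bin/sh
eatmydata $name.distrib "\$@"
EOF
    chmod 755 "$dir/$name"
}
```

<p>Calling <tt>write_wrapper /tmp/demo dpkg</tt> (with /tmp/demo
already existing) leaves an executable two-line script in
/tmp/demo/dpkg.  It only does something useful once eatmydata is
installed and dpkg.distrib exists next to it.</p>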
<p>The effect was stunning.  In my first test it reduced the running
time of the pkgsel step (installing tasks) from 64 to less than 44
minutes (20 minutes shaved off the installation) on an old Dell
Latitude D505 machine.  I am not quite sure what the optimised time
would have been, as I messed up the testing a bit, causing the debconf
priority to get low enough for two questions to pop up during
installation.  As soon as I saw the questions I moved the installation
along, but I do not know how long the questions were holding up the
installation.  I did some more measurements using Debian Edu Jessie,
and got these results.  The time measured is the difference between
the time stamps in /var/log/syslog on the "pkgsel: starting tasksel"
and "pkgsel: finishing up" lines, if you want to do the same
measurement yourself.  In Debian Edu, the tasksel dialog does not show
up, so the timing does not depend on how quickly the user handles the
tasksel dialog.</p>

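<p>If you want to repeat the measurement, the two syslog lines can be
turned into a duration with a little awk.  This is only a sketch: the
function name pkgsel_minutes is made up, and it assumes traditional
syslog lines where the third field is HH:MM:SS and both markers are
logged on the same day.</p>

```shell
#!/bin/sh
# pkgsel_minutes LOGFILE (hypothetical helper): print the whole minutes
# between the "starting tasksel" and "finishing up" pkgsel markers.
# Assumes "Mon DD HH:MM:SS ..." syslog lines and no midnight rollover.
pkgsel_minutes() {
    awk '
        # Convert HH:MM:SS into seconds since midnight.
        function seconds(ts,   p) {
            split(ts, p, ":")
            return p[1] * 3600 + p[2] * 60 + p[3]
        }
        /pkgsel: starting tasksel/ { start = seconds($3) }
        /pkgsel: finishing up/     { stop = seconds($3) }
        END {
            # Fail if either marker is missing from the log.
            if (start == "" || stop == "") exit 1
            printf "%d\n", (stop - start) / 60
        }
    ' "$1"
}
```

<p>Run it as <tt>pkgsel_minutes /var/log/syslog</tt> on the installed
system, or on a copy of the installer log.</p>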
<p><table>

<tr>
  <th>Machine/setup</th>
  <th>Original tasksel</th>
  <th>Optimised tasksel</th>
  <th>Reduction</th>
</tr>

<tr>
  <td>Latitude D505 Main+LTSP LXDE</td>
  <td>64 min (07:46-08:50)</td>
  <td>&lt;44 min (11:27-12:11)</td>
  <td>&gt;20 min, 31%</td>
</tr>

<tr>
  <td>Latitude D505 Roaming LXDE</td>
  <td>57 min (08:48-09:45)</td>
  <td>34 min (07:43-08:17)</td>
  <td>23 min, 40%</td>
</tr>

<tr>
  <td>Latitude D505 Minimal</td>
  <td>22 min (10:37-10:59)</td>
  <td>11 min (11:16-11:27)</td>
  <td>11 min, 50%</td>
</tr>

<tr>
  <td>Thinkpad X200 Minimal</td>
  <td>6 min (08:19-08:25)</td>
  <td>4 min (08:04-08:08)</td>
  <td>2 min, 33%</td>
</tr>

<tr>
  <td>Thinkpad X200 Roaming KDE</td>
  <td>19 min (09:21-09:40)</td>
  <td>15 min (10:25-10:40)</td>
  <td>4 min, 21%</td>
</tr>

</table></p>

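<p>The reduction column is simply the time saved, both in minutes and
as a percentage of the original time.  A small sketch of that
arithmetic, using a made-up helper name and rounding to the nearest
whole percent:</p>

```shell
#!/bin/sh
# reduction BEFORE AFTER (hypothetical helper): print the minutes saved and
# the saving as a percentage of BEFORE, rounded to the nearest whole percent.
# Uses integer shell arithmetic only; BEFORE must be greater than zero.
reduction() {
    before=$1
    after=$2
    saved=$((before - after))
    # Adding before/2 before dividing rounds to nearest instead of truncating.
    percent=$(( (saved * 100 + before / 2) / before ))
    echo "$saved min $percent%"
}
```

<p>For the Roaming LXDE row, <tt>reduction 57 34</tt> prints
"23 min 40%".</p>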
<p>The tests were done using a netinst ISO on a USB stick, so some of
the time is spent downloading packages.  The connection to the Internet
was 100 Mbit/s during testing, so downloading should not be a
significant factor in the measurement.  Downloading typically took a
few seconds to a few minutes, depending on the number of packages being
installed.</p>

<p>The speedup is implemented by using two hooks in
<a href="https://www.debian.org/devel/debian-installer/">Debian
Installer</a>: the pre-pkgsel.d hook to set up the diverts, and the
finish-install.d hook to remove the diverts at the end of the
installation.  I picked the pre-pkgsel.d hook instead of the
post-base-installer.d hook because I test using an ISO without the
eatmydata package included, and the post-base-installer.d hook in
Debian Edu can only operate on packages included in the ISO.  The
negative effect of this is that I am unable to activate this
optimization for the kernel installation step in d-i.  If the code were
moved to the post-base-installer.d hook, the speedup would be larger
for the entire installation.</p>

<p>I've implemented this in the
<a href="https://packages.qa.debian.org/debian-edu-install">debian-edu-install</a>
git repository, and plan to provide the optimization as part of the
Debian Edu installation.  If you want to test this yourself, you can
create two files in the installer (or in a udeb).  One shell script
needs to go into /usr/lib/pre-pkgsel.d/, with content like this:</p>

<p><blockquote><pre>
#!/bin/sh
set -e
. /usr/share/debconf/confmodule
info() {
    logger -t my-pkgsel "info: $*"
}
error() {
    logger -t my-pkgsel "error: $*"
}
override_install() {
    # Install eatmydata into /target; carry on even if that fails.
    apt-install eatmydata || true
    if [ -x /target/usr/bin/eatmydata ] ; then
        for bin in dpkg apt-get aptitude tasksel ; do
            file=/usr/bin/$bin
            # Test that the file exists before diverting it.
            if [ -f /target$file ] ; then
                info "diverting $file using eatmydata"
                # Wrapper that runs the diverted original under eatmydata.
                printf "#!/bin/sh\neatmydata $bin.distrib \"\$@\"\n" \
                    > /target$file.edu
                chmod 755 /target$file.edu
                in-target dpkg-divert --package debian-edu-config \
                    --rename --quiet --add $file
                ln -sf ./$bin.edu /target$file
            else
                error "unable to divert $file, as it is missing."
            fi
        done
    else
        error "unable to find /usr/bin/eatmydata after installing the eatmydata package"
    fi
}

override_install
</pre></blockquote></p>

<p>To clean up, another shell script should go into
/usr/lib/finish-install.d/ with code like this:</p>

<p><blockquote><pre>
#! /bin/sh -e
. /usr/share/debconf/confmodule
error() {
    logger -t my-finish-install "error: $*"
}
remove_install_override() {
    for bin in dpkg apt-get aptitude tasksel ; do
        file=/usr/bin/$bin
        if [ -x /target$file.edu ] ; then
            # Remove the symlink so dpkg-divert can move the original back.
            rm /target$file
            in-target dpkg-divert --package debian-edu-config \
                --rename --quiet --remove $file
            rm /target$file.edu
        else
            error "Missing divert for $file."
        fi
    done
    sync # Flush file buffers before continuing
}

remove_install_override
</pre></blockquote></p>

<p>In Debian Edu, I placed both code fragments in a separate script,
edu-eatmydata-install, and call it from the pre-pkgsel.d and
finish-install.d scripts.</p>

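<p>One way to structure such a shared script is to pick the half to
run from an argument.  The sketch below is not the actual
edu-eatmydata-install interface: the function name eatmydata_hook and
the "enable" and "disable" action names are assumptions made for this
example, and the two functions are replaced by stand-ins that only
print a message.</p>

```shell
#!/bin/sh
# Stand-ins for the two functions shown above.  In a real script these
# would contain the divert setup and cleanup code.
override_install() { echo "diverts enabled"; }
remove_install_override() { echo "diverts removed"; }

# eatmydata_hook ACTION (hypothetical): run one half of the optimisation.
# "enable" would be called from pre-pkgsel.d, "disable" from finish-install.d.
eatmydata_hook() {
    case "$1" in
        enable)  override_install ;;
        disable) remove_install_override ;;
        *)
            echo "usage: eatmydata_hook enable|disable" >&2
            return 1
            ;;
    esac
}
```

<p>Each hook script then shrinks to a single call, which keeps the list
of diverted programs in one place.</p>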
<p>By now you might ask whether this change should get into the normal
Debian installer too.  I suspect it should, but I am not sure the
current debian-installer coordinators find it useful enough.  It also
depends on the side effects of the change.  I'm not aware of any, but I
guess we will see if the change is safe after some more testing.
Perhaps there is some package in Debian depending on sync() and
fsync() having an effect?  Perhaps it should go into its own udeb, to
allow those of us wanting to enable it to do so without affecting
everyone.</p>

<p>Update 2014-09-24: As of a few days ago, enabling this optimization
breaks installation of all programs using gnutls because of
<a href="https://bugs.debian.org/702711">bug #702711</a>.  An updated
eatmydata package in Debian will solve it.</p>

<p>Update 2014-10-17: The bug mentioned above is fixed in testing and
the optimization works again.  I have also discovered that the
dpkg-divert trick is not really needed, and implemented a slightly
simpler approach as part of the debian-edu-install package.  See
tools/edu-eatmydata-install in the source package.</p>

<p>Update 2014-11-11: Unfortunately, a new
<a href="http://bugs.debian.org/765738">bug #765738</a> in eatmydata,
which only triggers on i386, made it into testing and broke this
installation optimization again.  If <a href="http://bugs.debian.org/768893">unblock
request 768893</a> is accepted, it should be working again.</p>