3 <title>Speeding up the debian/etch boot, a short report
</title>
6 <h1>Speeding up the debian/etch boot, a short report
</h1>
8 <p>The test machine in question is a COMPAQ ProLiant DL360 with
2CPU
9 800/
133 MHz,
1.5GiB RAM and
2x18GiB SCSI disks in HW-RAID.
11 <p>Installed etch using a sarge netinst CD and pointing APT to
12 'testing'. Install the desktop task. The kernel is
2.4.27-
2-
386.
14 <p>Installed the bootchart package from
15 http://www.minet.uni-jena.de/~joergs/bootchart/bootchart_0.8-
2_all.deb
17 <p>Booted grub after adding 'init=/sbin/bootchartd' to the kernel
18 arguments. Archive the resulting /var/log/bootchart.tgz as
19 <a href=
"2005-08-01T18:07-bootchart.png">2005-
08-
01T18:
07-bootchart.tgz
</a>. This boot took
45 seconds.
20 S20modutils is the largest single contributor and used about
13
21 seconds of this period.
23 <p>Installed kernel kernel-image-
2.4.27-
2-
686-smp and rebooted with
24 bootchart, to see if this make a difference. The stats were saved as
25 <a href=
"2005-08-01T18:16-bootchart.png">2005-
08-
01T18:
16-bootchart.tgz
</a>.
26 This boot took
1 minute
33 seconds. Again S20modutils is the slowest
27 part of the boot, spending
23 seconds of it.
29 <p>Time to reorder the boot sequence according to dependencies. Document
30 the default boot order:
34 README S20module-init-tools S36udev-mtab S48console-screen.sh
35 S02mountvirtfs S20modutils S38pppd-dns S50hwclock.sh
36 S04udev S25libdevmapper1.01 S39dns-clean S55bootmisc.sh
37 S05bootlogd S26lvm S39ifupdown S55urandom
38 S05initrd-tools.sh S30checkfs.sh S40hostname.sh S70nviboot
39 S05keymap.sh S30procps.sh S40hotplug S70xfree86-common
40 S10checkroot.sh S35mountall.sh S40networking S75sudo
41 S18hwclockfirst.sh S36discover S43portmap
42 S18ifupdown-clean S36mountvirtfs S45mountnfs.sh
44 S10sysklogd S18portmap S20exim4 S20makedev S21nfs-common S99gdm
45 S11klogd S20dbus-
1 S20inetd S20ssh S89atd S99rmnologin
46 S14ppp S20dirmngr S20lpd S21fam S89cron S99stop-bootlogd
50 <p>Discovering that a few packages was not upgraded to from sarge to
51 testing. Doing a 'aptitude dist-upgrade' to get these packages
52 upgrade: dash eject fdutils grub hotplug initrd-tools
53 kernel-image-
2.4.27-
2-
386 lvm2 ppp pppconfig pppoeconf psmisc usbutils
55 <p>Realises that a new boot is needed to get status for the packages in
56 etch and not in sarge, as several of the upgraded packages have init.d
59 <p>Booted first using -
686-smp kernel. This took
1 minute
31 seconds
60 (
<a href=
"2005-08-01T18:31-bootchart.png">2005-
08-
01T18:
31-bootchart.tgz
</a>).
61 Still S20modutils spending most of the time.
63 <p><a href=
"2005-08-01T18:36-bootchart.png"><img src=
"2005-08-01T18:36-bootchart.png" width=
"40%" align=
"right"></a>Next
64 booted with the -
383 kernel. This took
46 seconds
65 (
<a href=
"2005-08-01T18:36-bootchart.png">2005-
08-
01T18:
36-bootchart.tgz
</a>).
67 <p>Someone claimed switching to dash instead of bash should speed up
68 the boot process. I decided to test it with the -
386 kernel. The
69 default was sh-
>bash. Changed this to sh-
>dash and rebooted. The
70 boot still took
46 seconds.
72 <p>Finally, I installed and executed insserv -v and saved the output
73 (
2005-
08-
01T18:
45-insserv-
1.log). To verify that the result was
74 stable, I ran it again and saved the output
75 (
2005-
08-
01T18:
45-insserv-
2.log). This time
76 rc[
2345].d/S19stop-bootlogd was moved to S20stop-bootlogd. I suspect
77 the dependencies of stop-bootlogd are a bit strange, as this is a
78 script expected to be executed several times in the boot process.
79 Running insserv a third time didn't give any changes, so I did not
82 <p>Checking the resulting boot sequence seemed to make fairly good sense
87 README S03initrd-tools.sh S08checkfs.sh S13hotplug-net
88 S01hostname.sh S04keymap.sh S09mountall.sh S13mountnfs.sh
89 S01hwclockfirst.sh S05checkroot.sh S11bootmisc.sh S15console-screen.sh
90 S01mountvirtfs S06discover S11dns-clean S15nviboot
91 S02bootlogd S06hotplug S11networking S15xfree86-common
92 S02module-init-tools S06ifupdown-clean S11sudo S16hwclock.sh
93 S02modutils S07ifupdown S11udev-mtab S17ntpdate
94 S02procps.sh S07libdevmapper1.01 S11urandom S43portmap
95 S02udev S07lvm S12pppd-dns
97 S11makedev S15nfs-common S19klogd S21cron S21exim4 S21lpd
98 S14portmap S18rmnologin S20stop-bootlogd S21dbus-
1 S21gdm S21ppp
99 S15fam S18sysklogd S21atd S21dirmngr S21inetd S21ssh
103 <p>Time to reboot again. Used the
386 kernel as this was the fastest
104 boot. gdm came up with a login promt way before the machine was
105 finished booting. The network failed to start as the network modules
106 was not loaded. Running '/etc/init.d/discover start &&
107 /etc/init.d/networking start' fixes this problem. The new boot took
109 (
<a href=
"2005-08-01T18:55-bootchart.png">2005-
08-
01T18:
55-bootchart.tgz
</a>).
111 <p>Tried modifying the dependencies of discover, lvm and hotplug to
112 get the network drivers loaded properly. Succeeded and the new boot
113 order ended up like this:
117 README S05checkroot.sh S10libdevmapper1.01
118 S01hostname.sh S06checkfs.sh S10networking
119 S01hwclockfirst.sh S06ifupdown-clean S10pppd-dns
120 S01mountvirtfs S07ifupdown S12hotplug-net
121 S02bootlogd S07mountall.sh S12mountnfs.sh
122 S02module-init-tools S09bootmisc.sh S14console-screen.sh
123 S02modutils S09discover S14nviboot
124 S02procps.sh S09dns-clean S14xfree86-common
125 S02udev S09hotplug S15hwclock.sh
126 S03initrd-tools.sh S09sudo S16ntpdate
127 S03lvm S09udev-mtab S43portmap
128 S04keymap.sh S09urandom
132 <p>This new boot took
36 seconds (
2005-
08-
01T19:
22-bootchart.tgz), and
133 the network started as it should.
135 <p>Tested with the -
686-smp kernel as well. This new boot with the SMP
136 kernel took
58 seconds
137 (
<a href=
"2005-08-01T19:33-bootchart.png">2005-
08-
01T19:
33-bootchart.tgz
</a>) .
139 <p>It was time to test the parallel booting. I ran 'cp
140 /usr/share/doc/insserv/examples/rc* /etc/init.d/', and rebooted with
141 the
386 kernel. This time to boot took
34 seconds
142 (
<a href=
"2005-08-01T19:38-bootchart.png">2005-
08-
01T19:
38-bootchart.tgz
</a>).
143 The gain was not very impressive, but it was slightly faster.
145 <p>The SMP boot on the other hand surprised me quite a bit. The boot
146 took
1 minute
9 seconds
147 (
<a href=
"2005-08-01T19:41-bootchart.png">2005-
08-
01T19:
41-bootchart.tgz
</a>),
148 and there is a large section of the boot time unaccounted for. From
149 10 to
20 seconds into the boot, nothing is running but the CPU is used
150 quite a bit. I was so surprised I had to try again to see if this
151 happened every time. The new boot took the exact same amount of time
152 (
<a href=
"2005-08-01T19:49-bootchart.png">2005-
08-
01T19:
49-bootchart.tgz
</a>),
153 and had the same gap in the boot process where nothing is using an
154 increasing amount of CPU. I suspect some kernel module must be
155 spending the resources.
157 <p>Trying to squease more parallelism in the boot process, I adjusted
158 the dependencies slightly to end up with this sequence:
162 README S03lvm S08networking
163 S01hostname.sh S04checkfs.sh S08pppd-dns
164 S01hwclockfirst.sh S04ifupdown-clean S10hotplug-net
165 S01mountvirtfs S05ifupdown S10mountnfs.sh
166 S02bootlogd S05mountall.sh S12console-screen.sh
167 S02keymap.sh S07bootmisc.sh S12hwclock.sh
168 S02module-init-tools S07discover S12nviboot
169 S02modutils S07dns-clean S12sudo
170 S02procps.sh S07hotplug S12xfree86-common
171 S02udev S07udev-mtab S13ntpdate
172 S03checkroot.sh S07urandom S13portmap
173 S03initrd-tools.sh S08libdevmapper1.01
177 <p>The new boot with the SMP kernel still took
1 minute
9 seconds
178 (
<a href=
"2005-08-01T20:08-bootchart.png">2005-
08-
01T20:
08-bootchart.tgz
</a>).
180 <p>Next, I tried to optimize the gdm startup, moving it ahead in the
181 rc2.d sequence. After some dependency adjustements, I ended up with
186 S07makedev S12fam S14sysklogd S17atd S17dirmngr S17ppp
187 S07rmnologin S12gdm S15klogd S17cron S17exim4 S17ssh
188 S11portmap S12nfs-common S16stop-bootlogd S17dbus-
1 S17lpd S20inetd
192 <p>This new sequence was
1 second slower for the SMP kernel. It took
194 (
<a href=
"2005-08-01T20:35-bootchart.png">2005-
08-
01T20:
35-bootchart.tgz
</a>).
195 I suspect the issue here are IO contention, were several processes
196 want to read from the disk at the same time.
198 <p>Next, I tested with the
386 kernel, and this boot took
34 seconds
199 (
<a href=
"2005-08-01T20:38-bootchart.png">2005-
08-
01T20:
38-bootchart.tgz
</a>).
201 <h2>Testing
2.6 kernels
</h2>
203 <p>To see if the results were modified by using a
2.6 kernel, I installed
204 kernel-image-
2.6.11-
1-
386 and kernel-image-
2.6.11-
1-
686-smp. I was
205 asked how to configure irqbalance. I selected the default (no).
206 After installing these kernels, I booted the
386 kernel first. This
208 (
<a href=
"2005-08-01T20:55-bootchart.png">2005-
08-
01T20:
55-bootchart.tgz
</a>).
209 This new boot included more stats in the graphs, and a few new
210 processes (udevd, hald).
212 <p>Next, I booted the
2.6 SMP kernel, and this boot took
1 minute
6
214 (
<a href=
"2005-08-01T20:58-bootchart.png">2005-
08-
01T20:
58-bootchart.tgz
</a>).
215 Still almost double the time of the
2.4 kernel.
219 <p>Portmap tries to start twice. Once from rcS.d, and another from
220 rc2.d. It does a sleep
1 every time. By modifying the script to exit
221 imediately if portmap already is running, I should be able to cut down
222 the boot time by one second. Testing with this patch.
225 --- /etc/init.d/portmap.orig
2005-
09-
01 21:
03:
40.000000000 +
0200
226 +++ /etc/init.d/portmap
2005-
09-
01 21:
05:
40.000000000 +
0200
231 + pid=`pidof portmap`
232 + if [ -n "$pid" ] ; then
233 + echo "Not starting portmap daemon. Already running."
236 echo -n "Starting portmap daemon:"
238 start-stop-daemon --start --quiet --oknodo --exec /sbin/portmap -- $OPTIONS
241 <p>I booted the
2.4 386 kernel, and this worked just fine. Now gdm
242 started imediately as the rc2.d scripts started, and the boot time is
244 (
<a href=
"2005-08-01T21:09-bootchart.png">2005-
08-
01T21:
09-bootchart.tgz
</a>).
245 The total boot time to be the same as before, which is rather strange.
246 Anyway, the change seem sensible enough, so I submitted it to BTS as
247 <a href=
"http://bugs.debian.org/326100">bug #
326100</a>.
249 <p>Next, I realise that ntpdate and portmap do not really need to run
250 in rcS.d/. Moving them both to rc2.d and running insserv -v again, it
251 is time to try a new boot.
253 <p>The new boot worked fine, but it still took
34 seconds to boot with
255 (
<a href=
"2005-08-01T21:21-bootchart.png">2005-
08-
01T21:
21-bootchart.tgz
</a>).
257 <p>Time to look at readahead? I tried with a trivial implementation,
258 using 'nice cat
<files> &'. The files were all scripts in /etc/rcS.d/
259 /etc/rc2.d/ and all .o files under /lib/modules/. This had no effect
260 whatsoever. The boot still took
34 seconds
261 (
<a href=
"2005-08-01T21:45-bootchart.png">2005-
08-
01T21:
45-bootchart.tgz
</a>).
263 <p>It takes almost
10 seconds to start gdm. Why is this? Will it
264 help to move to xdm or kdm? Testing xdm first, as it is supposed to
265 be the trivial implementation. Copied the dependencies from gdm, ran
266 insserv, and booted the
386 2.4 kernel. This boot took
36 seconds
267 (
<a href=
"2005-08-01T21:57-bootchart.png">2005-
08-
01T21:
57-bootchart.tgz
</a>).
269 <p>Next try was to comment out all the entries in /etc/modules, as I
270 suspect the IDE bus probing is spending a considerable amount of time
271 at the start of the boot. Commented out ide-cd, ide-detect, ide-disk
272 and psmouse. If these modules are needed on the machine in question,
273 I expect hotplug and discover to take care of the loading. I must
274 have been right, as the boot took
29 seconds
275 (
<a href=
"2005-08-01T22:01-bootchart.png">2005-
08-
01T22:
01-bootchart.tgz
</a>).
276 Neither of the modules were loaded after boot, and the machine was
277 still working and the mouse still responsive.
279 <p>I switched back to gdm, and rebooted with the
2.4 SMP kernel to see
280 how this change affected SMP. This boot took
60 seconds
281 (
<a href=
"2005-08-01T22:06-bootchart.png">2005-
08-
01T22:
06-bootchart.tgz
</a>).
283 <p>At this point, it was about time to pack up and leave the test lap, so
284 I ended the testing here.