The test machine in question is a COMPAQ ProLiant DL360 with 2CPU 800/133 MHz, 1.5GiB RAM and 2x18GiB SCSI disks in HW-RAID.
Installed etch using a sarge netinst CD and pointing APT to 'testing'. Install the desktop task. The kernel is 2.4.27-2-386.
Installed the bootchart package from http://www.minet.uni-jena.de/~joergs/bootchart/bootchart_0.8-2_all.deb
Booted grub after adding 'init=/sbin/bootchartd' to the kernel arguments. Archive the resulting /var/log/bootchart.tgz as 2005-08-01T18:07-bootchart.tgz. This boot took 45 seconds. S20modutils is the largest single contributor and used about 13 seconds of this period.
Installed kernel kernel-image-2.4.27-2-686-smp and rebooted with bootchart, to see if this make a difference. The stats were saved as 2005-08-01T18:16-bootchart.tgz. This boot took 1 minute 33 seconds. Again S20modutils is the slowest part of the boot, spending 23 seconds of it.
Time to reorder the boot sequence according to dependencies. Document the default boot order:
# ls /etc/rcS.d/ README S20module-init-tools S36udev-mtab S48console-screen.sh S02mountvirtfs S20modutils S38pppd-dns S50hwclock.sh S04udev S25libdevmapper1.01 S39dns-clean S55bootmisc.sh S05bootlogd S26lvm S39ifupdown S55urandom S05initrd-tools.sh S30checkfs.sh S40hostname.sh S70nviboot S05keymap.sh S30procps.sh S40hotplug S70xfree86-common S10checkroot.sh S35mountall.sh S40networking S75sudo S18hwclockfirst.sh S36discover S43portmap S18ifupdown-clean S36mountvirtfs S45mountnfs.sh # ls /etc/rc2.d/ S10sysklogd S18portmap S20exim4 S20makedev S21nfs-common S99gdm S11klogd S20dbus-1 S20inetd S20ssh S89atd S99rmnologin S14ppp S20dirmngr S20lpd S21fam S89cron S99stop-bootlogd #
Discovering that a few packages was not upgraded to from sarge to testing. Doing a 'aptitude dist-upgrade' to get these packages upgrade: dash eject fdutils grub hotplug initrd-tools kernel-image-2.4.27-2-386 lvm2 ppp pppconfig pppoeconf psmisc usbutils
Realises that a new boot is needed to get status for the packages in etch and not in sarge, as several of the upgraded packages have init.d scripts.
Booted first using -686-smp kernel. This took 1 minute 31 seconds (2005-08-01T18:31-bootchart.tgz). Still S20modutils spending most of the time.
Next
booted with the -383 kernel. This took 46 seconds
(2005-08-01T18:36-bootchart.tgz).
Someone claimed switching to dash instead of bash should speed up the boot process. I decided to test it with the -386 kernel. The default was sh->bash. Changed this to sh->dash and rebooted. The boot still took 46 seconds.
Finally, I installed and executed insserv -v and saved the output (2005-08-01T18:45-insserv-1.log). To verify that the result was stable, I ran it again and saved the output (2005-08-01T18:45-insserv-2.log). This time rc[2345].d/S19stop-bootlogd was moved to S20stop-bootlogd. I suspect the dependencies of stop-bootlogd are a bit strange, as this is a script expected to be executed several times in the boot process. Running insserv a third time didn't give any changes, so I did not save that output.
Checking the resulting boot sequence seemed to make fairly good sense to me:
# ls /etc/rcS.d README S03initrd-tools.sh S08checkfs.sh S13hotplug-net S01hostname.sh S04keymap.sh S09mountall.sh S13mountnfs.sh S01hwclockfirst.sh S05checkroot.sh S11bootmisc.sh S15console-screen.sh S01mountvirtfs S06discover S11dns-clean S15nviboot S02bootlogd S06hotplug S11networking S15xfree86-common S02module-init-tools S06ifupdown-clean S11sudo S16hwclock.sh S02modutils S07ifupdown S11udev-mtab S17ntpdate S02procps.sh S07libdevmapper1.01 S11urandom S43portmap S02udev S07lvm S12pppd-dns # ls /etc/rc2.d/ S11makedev S15nfs-common S19klogd S21cron S21exim4 S21lpd S14portmap S18rmnologin S20stop-bootlogd S21dbus-1 S21gdm S21ppp S15fam S18sysklogd S21atd S21dirmngr S21inetd S21ssh #
Time to reboot again. Used the 386 kernel as this was the fastest boot. gdm came up with a login promt way before the machine was finished booting. The network failed to start as the network modules was not loaded. Running '/etc/init.d/discover start && /etc/init.d/networking start' fixes this problem. The new boot took 28 seconds (2005-08-01T18:55-bootchart.tgz).
Tried modifying the dependencies of discover, lvm and hotplug to get the network drivers loaded properly. Succeeded and the new boot order ended up like this:
# ls /etc/rcS.d/ README S05checkroot.sh S10libdevmapper1.01 S01hostname.sh S06checkfs.sh S10networking S01hwclockfirst.sh S06ifupdown-clean S10pppd-dns S01mountvirtfs S07ifupdown S12hotplug-net S02bootlogd S07mountall.sh S12mountnfs.sh S02module-init-tools S09bootmisc.sh S14console-screen.sh S02modutils S09discover S14nviboot S02procps.sh S09dns-clean S14xfree86-common S02udev S09hotplug S15hwclock.sh S03initrd-tools.sh S09sudo S16ntpdate S03lvm S09udev-mtab S43portmap S04keymap.sh S09urandom #
This new boot took 36 seconds (2005-08-01T19:22-bootchart.tgz), and the network started as it should.
Tested with the -686-smp kernel as well. This new boot with the SMP kernel took 58 seconds (2005-08-01T19:33-bootchart.tgz) .
It was time to test the parallel booting. I ran 'cp /usr/share/doc/insserv/examples/rc* /etc/init.d/', and rebooted with the 386 kernel. This time to boot took 34 seconds (2005-08-01T19:38-bootchart.tgz). The gain was not very impressive, but it was slightly faster.
The SMP boot on the other hand surprised me quite a bit. The boot took 1 minute 9 seconds (2005-08-01T19:41-bootchart.tgz), and there is a large section of the boot time unaccounted for. From 10 to 20 seconds into the boot, nothing is running but the CPU is used quite a bit. I was so surprised I had to try again to see if this happened every time. The new boot took the exact same amount of time (2005-08-01T19:49-bootchart.tgz), and had the same gap in the boot process where nothing is using an increasing amount of CPU. I suspect some kernel module must be spending the resources.
Trying to squease more parallelism in the boot process, I adjusted the dependencies slightly to end up with this sequence:
# ls /etc/rcS.d/ README S03lvm S08networking S01hostname.sh S04checkfs.sh S08pppd-dns S01hwclockfirst.sh S04ifupdown-clean S10hotplug-net S01mountvirtfs S05ifupdown S10mountnfs.sh S02bootlogd S05mountall.sh S12console-screen.sh S02keymap.sh S07bootmisc.sh S12hwclock.sh S02module-init-tools S07discover S12nviboot S02modutils S07dns-clean S12sudo S02procps.sh S07hotplug S12xfree86-common S02udev S07udev-mtab S13ntpdate S03checkroot.sh S07urandom S13portmap S03initrd-tools.sh S08libdevmapper1.01 #
The new boot with the SMP kernel still took 1 minute 9 seconds (2005-08-01T20:08-bootchart.tgz).
Next, I tried to optimize the gdm startup, moving it ahead in the rc2.d sequence. After some dependency adjustements, I ended up with this sequence:
# ls /etc/rc2.d/ S07makedev S12fam S14sysklogd S17atd S17dirmngr S17ppp S07rmnologin S12gdm S15klogd S17cron S17exim4 S17ssh S11portmap S12nfs-common S16stop-bootlogd S17dbus-1 S17lpd S20inetd #
This new sequence was 1 second slower for the SMP kernel. It took 1 minute 10 seconds (2005-08-01T20:35-bootchart.tgz). I suspect the issue here are IO contention, were several processes want to read from the disk at the same time.
Next, I tested with the 386 kernel, and this boot took 34 seconds (2005-08-01T20:38-bootchart.tgz).
To see if the results were modified by using a 2.6 kernel, I installed kernel-image-2.6.11-1-386 and kernel-image-2.6.11-1-686-smp. I was asked how to configure irqbalance. I selected the default (no). After installing these kernels, I booted the 386 kernel first. This boot took 36 seconds (2005-08-01T20:55-bootchart.tgz). This new boot included more stats in the graphs, and a few new processes (udevd, hald).
Next, I booted the 2.6 SMP kernel, and this boot took 1 minute 6 seconds (2005-08-01T20:58-bootchart.tgz). Still almost double the time of the 2.4 kernel.
Portmap tries to start twice. Once from rcS.d, and another from rc2.d. It does a sleep 1 every time. By modifying the script to exit imediately if portmap already is running, I should be able to cut down the boot time by one second. Testing with this patch.
--- /etc/init.d/portmap.orig 2005-09-01 21:03:40.000000000 +0200 +++ /etc/init.d/portmap 2005-09-01 21:05:40.000000000 +0200 @@ -13,6 +13,11 @@ case "$1" in start) + pid=`pidof portmap` + if [ -n "$pid" ] ; then + echo "Not starting portmap daemon. Already running." + exit 0 + fi echo -n "Starting portmap daemon:" echo -n " portmap" start-stop-daemon --start --quiet --oknodo --exec /sbin/portmap -- $OPTIONS
I booted the 2.4 386 kernel, and this worked just fine. Now gdm started imediately as the rc2.d scripts started, and the boot time is 34 seconds (2005-08-01T21:09-bootchart.tgz). The total boot time to be the same as before, which is rather strange. Anyway, the change seem sensible enough, so I submitted it to BTS as bug #326100.
Next, I realise that ntpdate and portmap do not really need to run in rcS.d/. Moving them both to rc2.d and running insserv -v again, it is time to try a new boot.
The new boot worked fine, but it still took 34 seconds to boot with the 386 kernel (2005-08-01T21:21-bootchart.tgz).
Time to look at readahead? I tried with a trivial implementation,
using 'nice cat It takes almost 10 seconds to start gdm. Why is this? Will it
help to move to xdm or kdm? Testing xdm first, as it is supposed to
be the trivial implementation. Copied the dependencies from gdm, ran
insserv, and booted the 386 2.4 kernel. This boot took 36 seconds
(2005-08-01T21:57-bootchart.tgz).
Next try was to comment out all the entries in /etc/modules, as I
suspect the IDE bus probing is spending a considerable amount of time
at the start of the boot. Commented out ide-cd, ide-detect, ide-disk
and psmouse. If these modules are needed on the machine in question,
I expect hotplug and discover to take care of the loading. I must
have been right, as the boot took 29 seconds
(2005-08-01T22:01-bootchart.tgz).
Neither of the modules were loaded after boot, and the machine was
still working and the mouse still responsive.
I switched back to gdm, and rebooted with the 2.4 SMP kernel to see
how this change affected SMP. This boot took 60 seconds
(2005-08-01T22:06-bootchart.tgz).
At this point, it was about time to pack up and leave the test lap, so
I ended the testing here.