+++
date = 2024-02-08
title = "Arch Linux: Improve boot time performance"
tags = ["homelab", "arch", "linux"]
+++

{{% figure src="/images/2024/02/arch-linux-logo.png" class="side-image" %}}

I run Debian on all my servers. It's a great, stable OS and I love it. Proxmox, [which I run on my homelab server](https://www.devroom.io/2020/11/12/the-big-diy-nas-update/#proxmox), is also based on Debian. However, on my desktop I run [Arch Linux](https://archlinux.org/). It's a great distro to tinker with. Not only does it ship very _up to date_ packages, it also has the AUR - the Arch User Repository. So for almost any app you can find, there's probably an easy way to install it.

### Slllooooowwww...

Lately I noticed that boot times on my system were getting longer, which is strange, because I run some pretty decent hardware. As it turns out, cold booting this box takes 1min 7.538s, according to my logs.

Luckily, the [Arch Wiki](https://wiki.archlinux.org/) offers a [nice guide on how to troubleshoot boot performance](https://wiki.archlinux.org/title/Improving_performance/Boot_process). There's `systemd-analyze blame`, which shows the time each service takes to start up. I've copied the top 10 here, which incidentally are also all the units with >1 second start-up times.
```
❯ systemd-analyze blame
20.771s docker.service
 3.514s dev-sdb3.device
 2.459s systemd-journal-flush.service
 1.880s upower.service
 1.806s ldconfig.service
 1.687s systemd-tmpfiles-setup.service
 1.587s containerd.service
 1.287s systemd-modules-load.service
 1.032s systemd-fsck@dev-disk-by\x2duuid-96EB\x2d4C82.service
 1.028s cups.service
```

Docker is a clear offender here. `dev-sdb3` also seems quite slow. Another command recommended in the wiki is `systemd-analyze critical-chain`, which shows the critical chain of units your system goes through to boot. Again, docker is clearly a big offender.
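If you want a feel for how much time these top offenders burn in total, the blame output is easy to tot up mechanically. A small sketch with a hardcoded sample of the listing above (on a live system you'd pipe in `systemd-analyze blame | head -n 10` instead; note this assumes times print as plain seconds, so units taking over a minute would need extra handling):

```
#!/bin/sh
# Sum per-unit startup times from `systemd-analyze blame`-style output.
# Sample lines are hardcoded from the listing above.
blame='20.771s docker.service
3.514s dev-sdb3.device
2.459s systemd-journal-flush.service'

printf '%s\n' "$blame" |
  awk '{ sub(/s$/, "", $1); total += $1 }
       END { printf "top offenders total: %.3fs\n", total }'
```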
```
❯ systemd-analyze critical-chain
The time when unit became active or started is printed after the "@" character.
The time the unit took to start is printed after the "+" character.

graphical.target @33.660s
└─multi-user.target @33.660s
  └─docker.service @12.888s +20.771s
    └─containerd.service @11.264s +1.587s
      └─network.target @11.236s
        └─wpa_supplicant.service @27.465s +268ms
          └─basic.target @10.366s
            └─dbus-broker.service @9.822s +541ms
              └─dbus.socket @9.793s
                └─sysinit.target @9.759s
                  └─systemd-update-done.service @9.722s +36ms
                    └─systemd-journal-catalog-update.service @9.375s +326ms
                      └─systemd-tmpfiles-setup.service @7.657s +1.687s
                        └─local-fs.target @7.587s
                          └─boot.mount @7.458s +128ms
                            └─systemd-fsck@dev-disk-by\x2duuid-96EB\x2d4C82.service @6.398s +1.032s
                              └─dev-disk-by\x2duuid-96EB\x2d4C82.device @6.397s
```

But wait, there's more. `systemd-analyze plot > plot.svg` generates an SVG image showing the entire boot process on a timeline. It's big, but there are some clear red markers that indicate issues. At the bottom right you'll find `graphical.target`, where we want to end up as quickly as possible. And it's clear `docker` is in the way.

![](/images/2024/02/pre-plot.svg)

_Open the SVG in a new window to see more detail._

## Fixed it!

So, with `docker` as a clear offender in slowing down the boot process, let's fix that. There are two systemd units: `docker.service` and `docker.socket`.

- `docker.service` starts docker and makes sure it is up and running.
- `docker.socket` listens on `/run/docker.sock` (or `/var/run/docker.sock` through a symlink) and starts `docker.service` when needed.

I think you know where this is going. By default, `docker.socket` is disabled and `docker.service` is enabled. Which makes sense: when you boot your machine, you usually want docker up and running as well. Especially for servers. For my desktop, not so much.
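This on-demand behaviour is systemd's socket activation: a `.socket` unit tells systemd what to listen on, and the first connection starts the matching `.service`. As a sketch of what such a socket unit looks like (a hypothetical `example.socket`, not Docker's actual unit file):

```
# example.socket — hypothetical illustration of socket activation.
# systemd itself listens on the socket; the first connection
# starts the matching example.service.
[Unit]
Description=Example activation socket

[Socket]
ListenStream=/run/example.sock

[Install]
WantedBy=sockets.target
```

Docker ships an equivalent socket unit listening on `/run/docker.sock`, which is what makes the fix below possible.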
I use docker, but not always, and I prefer to log in and check my email while docker boots in the background anyway. The trick, then, is to stop `docker.service` from starting automatically and to make sure `docker.socket` is enabled. That takes docker out of the critical chain at boot and starts it only when I'm logged in and ready to use it.

```
$ sudo systemctl disable docker.service
$ sudo systemctl enable docker.socket
```

So, what does that look like in `systemd-analyze`?
```
❯ systemd-analyze critical-chain
The time when unit became active or started is printed after the "@" character.
The time the unit took to start is printed after the "+" character.

graphical.target @3.893s
└─multi-user.target @3.893s
  └─cups.service @3.672s +220ms
    └─nss-user-lookup.target @3.763s
```
```
❯ systemd-analyze blame
2.152s systemd-modules-load.service
1.295s dev-sdb3.device
 622ms boot.mount
 385ms NetworkManager.service
 310ms systemd-udev-trigger.service
 280ms udisks2.service
 258ms systemd-remount-fs.service
 220ms cups.service
 203ms user@1000.service
 189ms systemd-tmpfiles-setup.service
```

![](/images/2024/02/post_plot.svg)

_Open the SVG in a new window to see more detail._
Finally, let's check on the two docker units:

```
❯ systemctl status docker.socket
● docker.socket - Docker Socket for the API
     Loaded: loaded (/usr/lib/systemd/system/docker.socket; enabled; preset: disabled)
     Active: active (running) since Thu 2024-02-08 10:38:47 CET; 5min ago
   Triggers: ● docker.service
     Listen: /run/docker.sock (Stream)
      Tasks: 0 (limit: 38400)
     Memory: 0B (peak: 516.0K)
        CPU: 1ms
     CGroup: /system.slice/docker.socket
```

and
```
❯ systemctl status docker.service
● docker.service - Docker Application Container Engine
     Loaded: loaded (/usr/lib/systemd/system/docker.service; disabled; preset: disabled)
     Active: active (running) since Thu 2024-02-08 10:39:33 CET; 5min ago
TriggeredBy: ● docker.socket
       Docs: https://docs.docker.com
   Main PID: 2522 (dockerd)
      Tasks: 42
     Memory: 222.1M (peak: 235.7M)
        CPU: 797ms
     CGroup: /system.slice/docker.service
             └─2522 /usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock
```

## Was it worth it?

Before:

> Startup finished in 14.729s (firmware) + 6.386s (loader) + 12.761s (kernel) + 33.661s (userspace) = 1min 7.538s
> graphical.target reached after 33.660s in userspace.

After:

> Startup finished in 13.735s (firmware) + 4.074s (loader) + 6.744s (kernel) + 3.893s (userspace) = 28.448s
> graphical.target reached after 3.893s in userspace.

Total boot time went down from 1m 8s to 28s. I cannot explain the difference in kernel boot time, but the userspace savings are significant. From here I could probably optimize further by compiling a custom kernel or using a different bootloader. Suspend to RAM would be even faster, but that feels like cheating compared to a hard boot.

Hopefully this gives you some pointers on how to troubleshoot slow boot times on your own machine.
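As a closing sanity check on those numbers: the per-phase times printed by `systemd-analyze` roughly sum to the reported totals (each displayed phase is rounded, so the sums land a few milliseconds off the reported 1min 7.538s and 28.448s). A quick shell sketch, with values copied from the output above:

```
#!/bin/sh
# Sum the boot phases reported by `systemd-analyze` before and after the fix.
# systemd rounds each displayed phase, so the totals differ from the reported
# 67.538s / 28.448s by a few milliseconds.
before=$(awk 'BEGIN { printf "%.3f", 14.729 + 6.386 + 12.761 + 33.661 }')
after=$(awk 'BEGIN { printf "%.3f", 13.735 + 4.074 + 6.744 + 3.893 }')
saved=$(awk -v b="$before" -v a="$after" 'BEGIN { printf "%.3f", b - a }')
echo "before: ${before}s, after: ${after}s, saved: ${saved}s"
```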