advanced network booting

I use network booting a lot. Very convenient, I rarely have to fiddle with boot iso images these days. My setup has evolved over the years, and I'm going to describe here how it looks today.

general intro

The first thing needed is a machine which runs the dhcp server. In a typical setup the home internet router also provides the dhcp service, but usually you can't configure that dhcp server much. So I turned off the dhcp server in the router and configured a linux machine with dhcpd instead. These days this is a raspberry pi, running fedora 24, serving dhcp and dns.

I used to run this on my x86 server acting as NAS, but that turned out to be a bad idea. We have power failures now and then, after which the NAS checks its filesystems at boot. That can easily take half an hour, and there was no dhcp and dns service during that time. In contrast, the raspberry pi is back in service in less than a minute.

Obviously the raspberry pi itself can't get an ip address via dhcp, it needs a static address assigned instead. systemd-networkd does the job with this ethernet.network config file:

[Match]
Name=eth0

[Network]
Address=192.168.2.10/24
Gateway=192.168.2.1
DNS=127.0.0.1
Domains=home.kraxel.org

Address=, Gateway= and Domains= must be adjusted according to your local network setup of course. The same applies to all other config file snippets below.

dhcp setup

Ok, let's walk through the dhcpd.conf file:

option arch code 93 = unsigned integer 16;      # RFC4578

Defines the option arch (dhcp option 93, the client system architecture), which will be used later to pick a boot file.

default-lease-time 86400;                       # 1d
max-lease-time 604800;                          # 7d

With long lease times there will be less chatter in the log file. Also nobody will lose network connectivity in case the dhcpd server is down for a moment (for example when fiddling with the config and dhcpd not restarting properly due to syntax errors).

ddns-update-style none;
authoritative;

No ddns used here. I assign static ip addresses instead (more on this below).

subnet 192.168.2.0 netmask 255.255.255.0 {
        # network
        range 192.168.2.200 192.168.2.249;
        option routers 192.168.2.1;

This is the default ip address and network of my router (Fritzbox). The small range configured here is used for dynamic ip addresses only, the space below 200 is left for static ip addresses.

        # dns, ntp
        option domain-name "home.kraxel.org";
        option domain-name-servers 192.168.2.10;
        option ntp-servers ntp.home.kraxel.org;

Oh, right, almost forgot to mention: the raspberry pi also runs ntp, so not every machine has to talk to the ntp pool servers. We announce that here (together with the dns server), so the dhcp clients can pick up that information.

Nothing tricky so far, now the network boot configuration starts:

        if (option arch = 00:0b) {
                # EFI aarch64
                next-server 192.168.2.14;
                filename "efi-a64/grubaa64.efi";

Here I use the option arch to figure out what kind of client is booting, and to pick a matching boot file. This branch is for 64bit arm machines, loading the grub2 efi binary. grub in turn will look for a grub.cfg file in the efi-a64/ directory, so placing stuff in sub-directories separates things nicely.

        } else if (option arch = 00:09) {
                # EFI x64 -- ipxe
                next-server 192.168.2.14;
                filename "efi-x64/shim.efi";
        } else if (option arch = 00:07) {
                # EFI x64 -- ovmf
                next-server 192.168.2.14;
                filename "efi-x64/shim.efi";

Same for x86_64. For some reason ovmf (with the builtin virtio-net driver) and ipxe efi drivers use different arch ids to signal x86_64. I simply list both here to get things going no matter what.

Update (Nov 9th): 7 is the correct value for x86_64. Looks like RFC 4578 got this wrong initially. There is an erratum for this. ipxe has been fixed in the meantime.
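As a quick reference for reading the arch values in this config, here is a small sketch that decodes them. It is not part of the setup described here: the function name arch_name is made up, and the table follows the RFC 4578 list with the correction mentioned in the update applied (so 7 is x64 and 9 is EFI byte code).

```shell
#!/bin/sh
# Hypothetical helper: map the two-byte arch value, written the way
# dhcpd.conf writes it (for example 00:0b), to a human-readable name.
arch_name() {
        case "$1" in
        00:00) echo "x86 BIOS" ;;
        00:06) echo "EFI ia32" ;;
        00:07) echo "EFI x64" ;;
        00:09) echo "EFI byte code (sent by older ipxe builds for x64)" ;;
        00:0a) echo "EFI arm 32bit" ;;
        00:0b) echo "EFI aarch64" ;;
        *)     echo "unknown" ;;
        esac
}
```

Calling arch_name 00:0b prints "EFI aarch64", which matches the first branch of the if chain.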

        } else if ((exists vendor-class-identifier) and
                   (option vendor-class-identifier = "U-Boot.armv8")) {
                # rpi 3
                next-server 192.168.2.108;
                filename "rpi3/boot.conf";

This is uboot (64bit) on a raspberry pi 3 trying to netboot. Well, any aarch64 to be exact, but I don't own any other device. The boot.conf file is a dummy and doesn't exist. uboot will pick up the rpi3/ sub-directory though, and after failing to load boot.conf it will try to boot pxelinux style, i.e. look for config files in the rpi3/pxelinux.cfg/ directory.

        } else if ((exists vendor-class-identifier) and
                   (option vendor-class-identifier = "U-Boot.armv7")) {
                # rpi 2
                next-server 192.168.2.108;
                filename "rpi2/boot.conf";

Same for armv7 (32bit) uboot, raspberry pi 2 in my case. raspberry pi 3 with 32bit uboot will land here too.

        } else if ((exists user-class) and ((option user-class = "iPXE") or
                                            (option user-class = "gPXE"))) {
                # bios -- gpxe/ipxe
                next-server 192.168.2.14;
                filename "http://192.168.2.14/tftpboot/pxelinux.0";

This is for ipxe/gpxe on classic bios-based x86 machines. ipxe can load files over http, and pxelinux running with ipxe as network driver can do this too. So we can specify a http url as filename, and it'll work fine and faster than using tftp.

        } else {
                # bios -- chainload gpxe
                next-server 192.168.2.14;
                filename "gpxe-undionly.kpxe";
        }

Everything else chainloads gpxe. I think this has been unused for ages, new physical machines have EFI these days and virtual machines already use gpxe/ipxe roms when running with seabios.

}
include "/etc/dhcp/home.4.dhcp.inc";
include "/etc/dhcp/xeni.4.dhcp.inc";

Here the static ip addresses are assigned. This is in an include file because these files are generated by a script. I basically have a file with a table listing hostname, mac address and ip address, and my script generates include files for dhcpd.conf and dns zone files from that. Inside the include there are lots of entries looking like this:

host photosmart {
        hardware ethernet 10:60:4b:11:71:34;
        fixed-address 192.168.2.53;
}

So all machines get a static ip address assigned, based on the mac address. Together with the dns configuration this makes it possible to connect to the machines by hostname.
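The generator script itself is not shown in this post. A minimal sketch of the dhcpd half of the idea could look like the following. It assumes a whitespace-separated table with one "hostname mac ip" triple per line and # for comments; both the table format and the function name gen_dhcp_hosts are hypothetical, not the actual script.

```shell
#!/bin/sh
# Hypothetical sketch: turn a "hostname mac ip" table into dhcpd host entries.
# Reads the table from stdin and writes dhcpd.conf syntax to stdout.
gen_dhcp_hosts() {
        awk '
                # skip blank lines and comment lines
                NF == 0 || $1 ~ /^#/ { next }
                # one host block per complete table row
                NF >= 3 {
                        printf "host %s {\n", $1
                        printf "        hardware ethernet %s;\n", $2
                        printf "        fixed-address %s;\n", $3
                        printf "}\n"
                }
        '
}
```

With a table file (hosts.table is a placeholder name) it could be used as gen_dhcp_hosts < hosts.table > /etc/dhcp/home.4.dhcp.inc. Generating the dns zone files from the same table would work along the same lines.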

In case you want a different netboot configuration for a specific host it is also possible to add next-server and filename entries here to override the network defaults configured above.

That's it for dhcp.

tftp setup

Ok, a tftp server is needed too. That can run on the same machine, but doesn't have to. You can even have multiple machines serving tftp. You might have noticed that not all entries above have the same next-server. The raspberry pi entries point to my workstation where I cross-compile arm kernels now and then, so I can boot them directly. All other entries point to the NAS where all the install trees are located. Getting tftp running is easy these days:

dnf install tftp-server
systemctl enable tftp
Place the boot files in /var/lib/tftpboot

Placing the boot files is left as an exercise to the reader, otherwise this becomes too long. Maybe I'll do a separate post on the topic later.

For now just a general hint: in.tftpd has a --verbose switch. Turning that on and watching the log is a good way to see which files clients are trying to load. Helpful for trouble-shooting.
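One way to turn the switch on is a systemd drop-in for the tftp service. The sketch below is an assumption-laden example, not a tested recipe: it assumes the packaged unit is called tftp.service and starts /usr/sbin/in.tftpd -s /var/lib/tftpboot (check with systemctl cat tftp), and the function name and the file name verbose.conf are made up.

```shell
#!/bin/sh
# Hypothetical sketch: write a drop-in that adds --verbose to in.tftpd.
# The target directory can be passed as an argument; the default is the
# usual drop-in location for a unit named tftp.service (an assumption).
tftp_verbose_dropin() {
        dir="${1:-/etc/systemd/system/tftp.service.d}"
        mkdir -p "$dir"
        cat > "$dir/verbose.conf" <<'EOF'
[Service]
# An empty ExecStart= clears the packaged command before setting a new one.
ExecStart=
ExecStart=/usr/sbin/in.tftpd -s --verbose /var/lib/tftpboot
EOF
}

# As root:
#   tftp_verbose_dropin
#   systemctl daemon-reload
#   systemctl try-restart tftp.service
#   journalctl -u tftp -f        # watch which files clients request
```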

httpd setup

Only needed if you want to boot over http as mentioned above. Unless you have apache httpd already running, you have to install and enable it of course. Then you can drop this tftpboot.conf file into /etc/httpd/conf.d to make /var/lib/tftpboot available over http too:

<Directory "/var/lib/tftpboot">
        Options Indexes FollowSymLinks Includes
        AllowOverride None
        Require ip 127.0.0.1/8 192.168.0.0/16
</Directory>
Alias   "/tftpboot"      "/var/lib/tftpboot"

libvirt setup

So, of course I want to netboot my virtual machines too, even if they are on a NAT-ed libvirt network. In that case they are not using the dhcp server running on my raspberry pi, but the dnsmasq server started by libvirt on the virtualization host. Luckily it is possible to configure network booting in the network configuration, like this:

<network>
  <name>default</name>
  <forward mode='nat'/>
  <ip address='192.168.123.1' netmask='255.255.255.0'>
    <dhcp>
      <range start='192.168.123.10' end='192.168.123.99'/>
      <bootp file='http://192.168.2.14/tftpboot/pxelinux.0'/>
    </dhcp>
  </ip>
</network>

That'll work just fine for seabios guests, but not when running ovmf. Unfortunately libvirt doesn't support serving different boot files depending on the client architecture. But there is an easy way out: just define a separate network for ovmf guests:

<network>
  <name>efi-x64</name>
  <forward mode='nat'/>
  <ip address='192.168.132.1' netmask='255.255.255.0'>
    <dhcp>
      <range start='192.168.132.10' end='192.168.132.99'/>
      <bootp file='efi-x64/shim.efi' server='192.168.2.14'/>
    </dhcp>
  </ip>
</network>

Ok, almost there. While the tcp-based http protocol goes through the NAT forwarding without a hitch, the udp-based tftp protocol doesn't. It needs some extra help. The nf_nat_tftp kernel module handles that. You can use the systemd module loading service (systemd-modules-load, configured via files in /etc/modules-load.d) to make sure it gets loaded on each boot.
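A minimal sketch of that, assuming the standard /etc/modules-load.d directory. The function name and the file name nf_nat_tftp.conf are arbitrary choices for this example:

```shell
#!/bin/sh
# Sketch: have systemd-modules-load load nf_nat_tftp at boot by dropping
# a one-line config file into the modules-load.d directory.
# The directory can be passed as an argument (handy for trying it out).
enable_tftp_nat_helper() {
        dir="${1:-/etc/modules-load.d}"
        printf 'nf_nat_tftp\n' > "$dir/nf_nat_tftp.conf"
}

# As root on the virtualization host:
#   enable_tftp_nat_helper
#   modprobe nf_nat_tftp      # load it right away, without rebooting
```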

using qemu directly

The qemu user network stack has netboot support too, you point it to the boot file this way:

qemu-system-x86_64 \
        -netdev user,id=net0,bootfile=http://192.168.2.14/tftpboot/pxelinux.0 \
        -device virtio-net-pci,netdev=net0 \
        [ more args follow here ]

In case the tftpboot directory is available on the local machine it is also possible to use the builtin tftp server instead:

qemu-system-x86_64 \
        -netdev user,id=net0,tftp=/var/lib/tftpboot,bootfile=pxelinux.0 \
        -device virtio-net-pci,netdev=net0 \
        [ more args follow here ]