Advanced network booting for virtual machines

Network booting is cool. Once you have set everything up you can stop juggling ISO images in your virtual machine configs. Instead you just kick off a network boot and pick whatever you want to install from the boot menu delivered by the boot server.

This article is not about the basics of setting up a boot server. The internet has tons of tutorials on how to install a tftp server and how to boot your favorite OS from tftp. This article will focus on configuring network boot for libvirt-managed virtual machines.

Before we get started ...

The config file snippets are examples from my home network: home.kraxel.org is the local domain and 192.168.2.14 is the machine acting as boot server here. You have to replace those to match your setup, of course. The same is true for the boot file names.

The default libvirt network uses 192.168.122.0/24. In case you use that unmodified, these addresses will work fine for you, and in fact they should already be in your libvirt network configuration. If you have changed the default libvirt network I expect you know what you have to do 😎.

Step one: very basic netboot setup

That is pretty simple. libvirt has support for that, so all you have to do is add a bootp tag with the ip address of your tftp server and the boot file name to the network config.

<network>
  [ ... ]
  <ip address='192.168.122.1' netmask='255.255.255.0'>
    <dhcp>
      <range start='192.168.122.2' end='192.168.122.254'/>
      <bootp file='pxelinux.0' server='192.168.2.14'/>
    </dhcp>
  </ip>
</network>

You can edit the network configuration using virsh net-edit name. The default libvirt network is simply named default. The network needs a restart to apply any changes (virsh net-destroy name; virsh net-start name).
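If you would rather script the change than edit by hand, here is a minimal Python sketch that adds the bootp tag to a network definition. It assumes a plain network XML without extra namespaces; the function name add_bootp and the SAMPLE text are made up for illustration and are not part of libvirt.

```python
import xml.etree.ElementTree as ET


def add_bootp(network_xml, boot_file, server=None):
    """Return the network XML with exactly one <bootp> element in <ip>/<dhcp>."""
    root = ET.fromstring(network_xml)
    dhcp = root.find("./ip/dhcp")
    if dhcp is None:
        raise ValueError("network has no <ip>/<dhcp> section")
    # Drop any existing bootp entry so the result has exactly one.
    for old in dhcp.findall("bootp"):
        dhcp.remove(old)
    attrs = {"file": boot_file}
    if server is not None:
        attrs["server"] = server
    ET.SubElement(dhcp, "bootp", attrs)
    return ET.tostring(root, encoding="unicode")


# Hypothetical input, modelled on the default network shown above.
SAMPLE = """<network>
  <name>default</name>
  <ip address='192.168.122.1' netmask='255.255.255.0'>
    <dhcp>
      <range start='192.168.122.2' end='192.168.122.254'/>
    </dhcp>
  </ip>
</network>"""

if __name__ == "__main__":
    print(add_bootp(SAMPLE, "pxelinux.0", "192.168.2.14"))
```

The input would typically come from virsh net-dumpxml name, and you still have to load the result back into libvirt and restart the network as described above.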

That was easy, right? Well, maybe not. In case this is not working for you, try running modprobe nf_nat_tftp. tftp uses udp, which means there are no connections at ip level, so the kernel has to look into the tftp packets to figure out how to route them correctly for a masqueraded network. The nf_nat_tftp kernel module does exactly that.

Note: Recent libvirt versions seem to take care to load nf_nat_tftp if needed, so there is a chance this works out-of-the-box for you.
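To see whether the module is already loaded, you can look at /proc/modules. The following is a small Python sketch of that check; the helper names are invented for this example. A helper that is compiled into the kernel rather than built as a module would not show up in that file, so a negative result is only a hint.

```python
def loaded_modules(proc_modules_text):
    """Return the set of module names found in /proc/modules-style text."""
    names = set()
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields:
            # The first whitespace-separated field is the module name.
            names.add(fields[0])
    return names


def tftp_nat_helper_loaded(proc_modules_text):
    """True if nf_nat_tftp appears in the given module list."""
    return "nf_nat_tftp" in loaded_modules(proc_modules_text)


if __name__ == "__main__":
    try:
        with open("/proc/modules") as fh:
            text = fh.read()
    except OSError:
        # Not on Linux, or /proc is not available.
        text = ""
    print("nf_nat_tftp loaded:", tftp_nat_helper_loaded(text))
```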

Nevertheless, that leads straight to the question: do we actually need tftp?

Step two: replace tftp with http

As you might have guessed the answer is no.

The ipxe boot roms support booting from http by simply specifying a URL instead of a filename as bootfile. This was never formally specified though, so unfortunately you can't expect this to work with every boot rom. For qemu-powered virtual machines this isn't a problem at all because the qemu boot roms are built from ipxe. With physical machines you might have to jump through some extra hoops to chainload ipxe (not covered here).

The easiest way to get this going is to install apache on your tftp boot server, then configure a virtual host with the tftproot as document root. You can do so by dropping a snippet like this into /etc/httpd/conf.d/:

<Directory "/var/lib/tftpboot">
        Options Indexes FollowSymLinks
        AllowOverride None
        Require all granted
</Directory>
<VirtualHost *:80>
        ServerName boot.home.kraxel.org
        DocumentRoot /var/lib/tftpboot
</VirtualHost>

Enabling Indexing is not needed for boot server functionality, but might be handy if you want to access the boot server with your web browser for troubleshooting.

Using the tftproot as document root has the advantage that the paths are identical for both tftp and http boot, so your pxelinux and grub configuration files should continue to work unmodified.

Now you can go edit your libvirt network config and replace the bootp configuration with this:

<bootp file='http://boot.home.kraxel.org/pxelinux.0'/>

Done. Don't forget to restart the network to apply the changes. Booting should be noticeably faster now (especially when fetching larger initrds), and any NAT traversal problems should be gone too.

Extra tip for lazy people

When using http you can boot from pretty much any server on the internet; there is no need to set up your own. For example, you can use the boot server provided by netboot.xyz, which offers a large collection of operating systems, both as live systems and for install. Here is the bootp snippet for this:

<bootp file='http://boot.netboot.xyz/ipxe/netboot.xyz.lkrn'/>

In most cases you probably want to have a local boot server for faster installs. But for a one-time test install of a new distro this might be handier than downloading the install iso.

Step three: what about UEFI?

For EFI guests pxelinux.0 is pretty much useless, so we must do something else for them. The first question is: how do we figure out that it is an EFI guest asking for a boot file? Let's have a look at the dhcp request, BIOS guest first. Captured using tcpdump -i virbr0 -v port bootps:

[ ... ] 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from 52:54:00:89:32:47 [ ... ]
	  Client-Ethernet-Address 52:54:00:89:32:47 (oui Unknown)
	  Vendor-rfc1048 Extensions
            [ ... ]
	    ARCH Option 93, length 2: 0
	    Vendor-Class Option 60, length 32: "PXEClient:Arch:00000:UNDI:002001"

Now a request from a (x64) EFI guest:

[ ... ] 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from 52:54:00:89:32:47 [ ... ]
	  Client-Ethernet-Address 52:54:00:89:32:47 (oui Unknown)
	  Vendor-rfc1048 Extensions
            [ ... ]
	    ARCH Option 93, length 2: 7
	    Vendor-Class Option 60, length 32: "PXEClient:Arch:00007:UNDI:003001"

See? The EFI guest uses arch 7 instead of 0, in both option 93 and option 60. So we will use that.
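The option 60 string is easy to take apart if you want to check a capture in a script. Here is a small Python sketch; it assumes the five-field layout seen in the two captures above, and parse_vendor_class is a name made up for this example.

```python
def parse_vendor_class(value):
    """Split an option 60 string such as 'PXEClient:Arch:00007:UNDI:003001'."""
    parts = value.split(":")
    if len(parts) != 5 or parts[1] != "Arch" or parts[3] != "UNDI":
        raise ValueError(f"unexpected vendor class: {value!r}")
    return {
        "client": parts[0],
        # The arch field is zero-padded decimal, matching option 93.
        "arch": int(parts[2], 10),
        "undi": parts[4],
    }


if __name__ == "__main__":
    for sample in (
        "PXEClient:Arch:00000:UNDI:002001",
        "PXEClient:Arch:00007:UNDI:003001",
    ):
        print(parse_vendor_class(sample))
```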

Unfortunately libvirt has no direct support for that. But libvirt uses dnsmasq as dhcp (and dns) server for the virtual networks. dnsmasq has support for this, and starting with libvirt version 5.6.0 it is possible to specify any dnsmasq config option in your libvirt network configuration using the dnsmasq xml namespace.

dnsmasq uses the concept of tags to implement this. Requests can be tagged using matches, and configuration directives can be applied to requests with certain tags. So, here is what it looks like, using the efi-x64-pxe tag for x64 efi guests and /arch-x86_64/grubx64.efi as bootfile.

<network xmlns:dnsmasq='http://libvirt.org/schemas/network/dnsmasq/1.0'>
  [ ... ]
  <ip address='192.168.122.1' netmask='255.255.255.0'>
    <dhcp>
      <range start='192.168.122.2' end='192.168.122.254'/>
      <bootp file='http://boot.home.kraxel.org/pxelinux.0'/>
    </dhcp>
  </ip>
  <dnsmasq:options>
    <dnsmasq:option value='#'/>
    <dnsmasq:option value='dhcp-match=set:efi-x64-pxe,option:client-arch,7'/>
    <dnsmasq:option value='dhcp-boot=tag:efi-x64-pxe,/arch-x86_64/grubx64.efi,,192.168.2.14'/>
  </dnsmasq:options>
</network>

dnsmasq uses '#' for comments, and it is here only to visually separate entries a bit. It will also be in the dnsmasq config files created by libvirt (in /var/lib/libvirt/dnsmasq/).

Step four: Can UEFI guests use http too?

Sure. You might have already noticed that the UEFI boot manager has both UEFI PXEv4 and UEFI HTTPv4 entries. Here is what happens when you pick the latter:

[ ... ] 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from 52:54:00:89:32:47 [ ... ]
	  Client-Ethernet-Address 52:54:00:89:32:47 (oui Unknown)
	  Vendor-rfc1048 Extensions
            [ ... ]
	    ARCH Option 93, length 2: 16
	    Vendor-Class Option 60, length 33: "HTTPClient:Arch:00016:UNDI:003001"

It's arch 16 now. Also option 60 starts with HTTPClient instead of PXEClient. So we can simply add another arch match to identify http clients.

Another detail we need to take care of is that the UEFI http boot client expects a reply with option 60 set to HTTPClient; otherwise the reply will be ignored. We handle that using dhcp-option-force. Here we go, using tag efi-x64-http for http clients:

<network xmlns:dnsmasq='http://libvirt.org/schemas/network/dnsmasq/1.0'>
  [ ... ]
  <dnsmasq:options>
    <dnsmasq:option value='#'/>
    <dnsmasq:option value='dhcp-match=set:efi-x64-pxe,option:client-arch,7'/>
    <dnsmasq:option value='dhcp-boot=tag:efi-x64-pxe,/arch-x86_64/grubx64.efi,,192.168.2.14'/>
    <dnsmasq:option value='#'/>
    <dnsmasq:option value='dhcp-match=set:efi-x64-http,option:client-arch,16'/>
    <dnsmasq:option value='dhcp-boot=tag:efi-x64-http,http://boot.home.kraxel.org/arch-x86_64/grubx64.efi'/>
    <dnsmasq:option value='dhcp-option-force=tag:efi-x64-http,60,HTTPClient'/>
  </dnsmasq:options>
</network>
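The option lines follow a fixed pattern per tag, so they can be generated. Here is a Python sketch that prints the dnsmasq:option elements for one tag. The function name arch_boot_options is invented for this example, and it uses double quotes around attribute values where the snippets above use single quotes (both are valid xml).

```python
from xml.sax.saxutils import quoteattr


def arch_boot_options(tag, arch, bootfile, tftp_server=None):
    """Build <dnsmasq:option> lines for one client-arch value (decimal)."""
    values = [f"dhcp-match=set:{tag},option:client-arch,{arch}"]
    if bootfile.startswith(("http://", "https://")):
        values.append(f"dhcp-boot=tag:{tag},{bootfile}")
        # UEFI http clients ignore replies without option 60 = HTTPClient.
        values.append(f"dhcp-option-force=tag:{tag},60,HTTPClient")
    else:
        if tftp_server is None:
            raise ValueError("tftp boot needs a server address")
        values.append(f"dhcp-boot=tag:{tag},{bootfile},,{tftp_server}")
    return [f"<dnsmasq:option value={quoteattr(v)}/>" for v in values]


if __name__ == "__main__":
    lines = arch_boot_options(
        "efi-x64-pxe", 7, "/arch-x86_64/grubx64.efi", "192.168.2.14")
    lines += arch_boot_options(
        "efi-x64-http", 16,
        "http://boot.home.kraxel.org/arch-x86_64/grubx64.efi")
    print("\n".join(lines))
```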

Extra tip for lazy people, now with UEFI

Complete example, defining a new libvirt network named netboot.xyz. You can store that in some file, then use virsh net-define file to create the network.

<network xmlns:dnsmasq='http://libvirt.org/schemas/network/dnsmasq/1.0'>
  <name>netboot.xyz</name>
  <forward mode='nat'/>
  <bridge name='netboot0' stp='on' delay='0'/>
  <ip address='192.168.123.1' netmask='255.255.255.0'>
    <dhcp>
      <range start='192.168.123.10' end='192.168.123.99'/>
      <bootp file='http://boot.netboot.xyz/ipxe/netboot.xyz.lkrn'/>
    </dhcp>
  </ip>
  <dnsmasq:options>
    <dnsmasq:option value='dhcp-match=set:efi-x64-http,option:client-arch,16'/>
    <dnsmasq:option value='dhcp-boot=tag:efi-x64-http,http://boot.netboot.xyz/ipxe/netboot.xyz.efi'/>
    <dnsmasq:option value='dhcp-option-force=tag:efi-x64-http,60,HTTPClient'/>
  </dnsmasq:options>
</network>

Then, in your guest domain configuration, use <source network='netboot.xyz'/> to use the new network. With this both BIOS and UEFI guests can netboot from netboot.xyz. With UEFI you have to take care to pick the UEFI HTTPv4 entry from the firmware boot menu.

Step five: architecture experiments

There is a world beyond x86. The arch field specifies not only the firmware type (BIOS vs. UEFI) and the boot protocol (PXE vs. HTTP), but also the CPU architecture. Here are the ones relevant for qemu:

Code (hex)	Decimal	Architecture
0x00	0	BIOS pxeboot (both i386 and x86_64)
0x06	6	EFI pxeboot, IA32 (i386)
0x07	7	EFI pxeboot, X64 (x86_64)
0x0a	10	EFI pxeboot, ARM (v7)
0x0b	11	EFI pxeboot, AA64 (v8 / aarch64)
0x0c	12	powerpc64
0x10	16	EFI httpboot, X64
0x12	18	EFI httpboot, ARM
0x13	19	EFI httpboot, AA64
0x1f	31	s390x
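For scripting it can be handy to have the x86 and ARM entries as a lookup table. Here is a Python sketch keyed by the decimal value, which is what tcpdump prints and what the dnsmasq matches above use (7 and 16, for example). The names CLIENT_ARCH and describe_arch are made up for this example, and the table only covers the entries where firmware and boot protocol are spelled out.

```python
# Decimal option 93 value -> (firmware, boot protocol, cpu architecture)
CLIENT_ARCH = {
    0: ("BIOS", "pxe", "x86"),
    6: ("EFI", "pxe", "IA32"),
    7: ("EFI", "pxe", "X64"),
    10: ("EFI", "pxe", "ARM"),
    11: ("EFI", "pxe", "AA64"),
    16: ("EFI", "http", "X64"),
    18: ("EFI", "http", "ARM"),
    19: ("EFI", "http", "AA64"),
}


def describe_arch(code):
    """Return a short description for a decimal client-arch value."""
    entry = CLIENT_ARCH.get(code)
    if entry is None:
        return f"unknown ({code})"
    firmware, protocol, cpu = entry
    return f"{firmware} {protocol}boot, {cpu}"


if __name__ == "__main__":
    for code in (0, 7, 16, 19):
        print(code, describe_arch(code))
```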

So, if you want to play with arm or powerpc without owning such a machine, you can let qemu emulate it with tcg. If you want to netboot it, no problem: just add a few more lines to your network configuration. Here is an example for aarch64:

<network xmlns:dnsmasq='http://libvirt.org/schemas/network/dnsmasq/1.0'>
  [ ... ]
  <dnsmasq:options>
    [ ... ]
    <dnsmasq:option value='#'/>
    <dnsmasq:option value='dhcp-match=set:efi-aa64-pxe,option:client-arch,11'/>
    <dnsmasq:option value='dhcp-boot=tag:efi-aa64-pxe,/arch-aarch64/grubaa64.efi,,192.168.2.14'/>
    <dnsmasq:option value='#'/>
    <dnsmasq:option value='dhcp-match=set:efi-aa64-http,option:client-arch,19'/>
    <dnsmasq:option value='dhcp-boot=tag:efi-aa64-http,http://boot.home.kraxel.org/arch-aarch64/grubaa64.efi'/>
    <dnsmasq:option value='dhcp-option-force=tag:efi-aa64-http,60,HTTPClient'/>
  </dnsmasq:options>
</network>

In case you are wondering why I place the grub binaries in subdirectories: grub tries to fetch the config file from the same directory, so that way I get per-arch config files and they are named /arch-aarch64/grub.cfg, /arch-x86_64/grub.cfg and so on. A nice side effect is that the toplevel directory is a bit less cluttered with files.

And beyond libvirt?

Well, the fundamental idea doesn't change. Look at the arch option, then send different replies depending on what you find there. With other dhcp servers the syntax is different, but the pattern is the same. Here is a sample snippet for the isc dhcp server shipped with most linux distributions:

option arch code 93 = unsigned integer 16;

subnet 192.168.2.0 netmask 255.255.255.0 {
        [ ... ]

        if (option arch = 00:10) {
                option vendor-class-identifier "HTTPClient";
                filename "http://boot.home.kraxel.org/arch-x86_64/grubx64.efi";
        } else if (option arch = 00:07) {
                next-server 192.168.2.14;
                filename "/arch-x86_64/grubx64.efi";
        } else {
                next-server 192.168.2.14;
                filename "/pxelinux.0";
        }
}
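
Stripped of any server-specific syntax, the decision boils down to a few lines. Here is the same pattern as a Python sketch, using the decimal arch values seen in the tcpdump captures (16 for x64 http boot, 7 for x64 pxe boot). The function name boot_reply and the dictionary keys are invented for illustration and do not correspond to any dhcp server api.

```python
TFTP_SERVER = "192.168.2.14"
HTTP_BASE = "http://boot.home.kraxel.org"


def boot_reply(arch):
    """Pick boot parameters for a decimal client-arch (option 93) value."""
    if arch == 16:
        # UEFI http boot, x64: full URL plus option 60 set to HTTPClient.
        return {
            "filename": f"{HTTP_BASE}/arch-x86_64/grubx64.efi",
            "vendor_class": "HTTPClient",
        }
    if arch == 7:
        # UEFI pxe boot, x64: grub via tftp.
        return {
            "next_server": TFTP_SERVER,
            "filename": "/arch-x86_64/grubx64.efi",
        }
    # Everything else is treated as a BIOS guest.
    return {"next_server": TFTP_SERVER, "filename": "/pxelinux.0"}


if __name__ == "__main__":
    for arch in (0, 7, 16):
        print(arch, boot_reply(arch))
```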