Advanced network booting for virtual machines
Network booting is cool. Once you have set everything up you can stop juggling iso images in your virtual machine configs. Instead you just kick off a network boot and pick whatever you want to install from the boot menu delivered by the boot server.
This article is not about the basics of setting up a boot server. The internet has tons of tutorials on how to install a tftp server and how to boot your favorite OS from tftp. This article will focus on configuring network boot for libvirt-managed virtual machines.
Before we get started ...
The config file snippets are examples from my home network: home.kraxel.org is the local domain and 192.168.2.14 is the machine acting as boot server here. You have to replace those to match your setup of course. The same is true for the boot file names.
The default libvirt network uses 192.168.122.0/24. In case you use that unmodified, these addresses will work fine for you, and in fact they should already be in your libvirt network configuration. If you have changed the default libvirt network I expect you know what you have to do 😎.
Step one: very basic netboot setup
That is pretty simple. libvirt has support for that, so all you have to do is add a bootp tag with the ip address of your tftp server and the boot file name to the dhcp section of the network config.
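As a rough sketch, this is what the default network could look like with netboot enabled. The bridge name and addresses assume the unmodified default libvirt network; 192.168.2.14 and pxelinux.0 are the boot server and boot file of my setup and have to be adapted.

```xml
<network>
  <name>default</name>
  <forward mode='nat'/>
  <bridge name='virbr0' stp='on' delay='0'/>
  <ip address='192.168.122.1' netmask='255.255.255.0'>
    <dhcp>
      <range start='192.168.122.2' end='192.168.122.254'/>
      <!-- tftp boot: "server" is the tftp server, "file" the boot file name -->
      <bootp file='pxelinux.0' server='192.168.2.14'/>
    </dhcp>
  </ip>
</network>
```

Your existing configuration will most likely carry a few more elements (uuid, mac address); leave those alone and only add the bootp line.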
You can edit the network configuration using virsh net-edit name. The default libvirt network is simply named "default". The network needs a restart to apply any changes (virsh net-destroy name; virsh net-start name).
That was easy, right? Well, maybe not. In case this is not working for you, try running modprobe nf_nat_tftp. tftp uses udp, which means there are no connections at ip level, so the kernel has to look into the tftp packets to figure out how to route them correctly for a masqueraded network. The nf_nat_tftp kernel module does exactly that.
Note: Recent libvirt versions seem to take care to load nf_nat_tftp if needed, so there is a chance this works out-of-the-box for you.
Nevertheless, that leads straight to the question: do we actually need tftp?
Step two: replace tftp with http
As you might have guessed the answer is no.
The ipxe boot roms support booting from http, by simply specifying a URL instead of a filename as bootfile. This was never formally specified though, so unfortunately you can't expect this to work with every boot rom. For qemu-powered virtual machines this isn't a problem at all because the qemu boot roms are built from ipxe. With physical machines you might have to jump through some extra hoops to chainload ipxe (not covered here).
The easiest way to get this going is to install apache on your tftp boot server, then configure a virtual host with the tftproot as document root. You can do so by dropping a virtual host snippet into /etc/httpd/conf.d/.
Enabling directory indexes (Options Indexes) for the document root is not needed for boot server functionality, but might be handy if you want to access the boot server with your web browser for troubleshooting.
Using the tftproot as document root has the advantage that the paths are identical for both tftp and http boot, so your pxelinux and grub configuration files should continue to work unmodified.
Now you can go edit your libvirt network config and replace the tftp server and file name in the bootp configuration with an http URL pointing to the same boot file.
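A minimal sketch of the changed dhcp section, assuming the apache virtual host answers as boot.home.kraxel.org and the range of the default network is kept as-is:

```xml
<dhcp>
  <range start='192.168.122.2' end='192.168.122.254'/>
  <!-- http boot: the URL goes into "file", no "server" attribute needed -->
  <bootp file='http://boot.home.kraxel.org/pxelinux.0'/>
</dhcp>
```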
Done. Don't forget to restart the network to apply the changes. Booting should be noticeably faster now (especially when fetching larger initrds), and any NAT traversal problems should be gone too.
Extra tip for lazy people
When using http you can boot from pretty much any server on the internet; there is no need to set up your own. You can use, for example, the boot server provided by netboot.xyz, with a large collection of operating systems available as live systems and for install. All it takes is a bootp entry pointing to the netboot.xyz boot loader.
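Such a bootp entry might look like the sketch below. The file name is taken from the boot files netboot.xyz publishes for chainloading from ipxe on BIOS systems; treat the exact URL as an assumption and check the netboot.xyz documentation in case it has changed.

```xml
<dhcp>
  <range start='192.168.122.2' end='192.168.122.254'/>
  <!-- boot loader for BIOS guests, fetched straight from netboot.xyz -->
  <bootp file='http://boot.netboot.xyz/ipxe/netboot.xyz.lkrn'/>
</dhcp>
```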
In most cases you will probably want to have a local boot server for faster installs. But for a one-time test install of a new distro this might be more handy than downloading the install iso.
Step three: what about UEFI?
For EFI guests pxelinux.0 is pretty much useless indeed, so we must do something else for them. The first question is: how do we figure out that it is an EFI guest asking for a boot file? Let's have a look at the dhcp request, BIOS guest goes first. Captured using tcpdump -i virbr0 -v port bootps:
    [ ... ]
    0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from 52:54:00:89:32:47 [ ... ]
      Client-Ethernet-Address 52:54:00:89:32:47 (oui Unknown)
      Vendor-rfc1048 Extensions
        [ ... ]
        ARCH Option 93, length 2: 0
        Vendor-Class Option 60, length 32: "PXEClient:Arch:00000:UNDI:002001"
Now a request from a (x64) EFI guest:
    [ ... ]
    0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from 52:54:00:89:32:47 [ ... ]
      Client-Ethernet-Address 52:54:00:89:32:47 (oui Unknown)
      Vendor-rfc1048 Extensions
        [ ... ]
        ARCH Option 93, length 2: 7
        Vendor-Class Option 60, length 32: "PXEClient:Arch:00007:UNDI:003001"
See? The EFI guest uses arch 7 instead of 0, in both option 93 and option 60. So we will use that.
Unfortunately libvirt has no direct support for that. But libvirt uses dnsmasq as dhcp (and dns) server for the virtual networks. dnsmasq has support for this, and starting with libvirt version 5.6.0 it is possible to specify any dnsmasq config option in your libvirt network configuration using the dnsmasq xml namespace.
dnsmasq uses the concept of tags to implement this. Requests can be tagged using matches, and configuration directives can be applied to requests with certain tags. In this setup the efi-x64-pxe tag is used for x64 efi guests, with /arch-x86_64/grubx64.efi as bootfile.
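A sketch of how this could be written down in the libvirt network configuration, assuming libvirt 5.6.0 or newer and the boot server from my setup. dhcp-match tags requests carrying client-arch 7, dhcp-boot then hands the grub binary and the tftp server address to requests with that tag:

```xml
<network xmlns:dnsmasq='http://libvirt.org/schemas/network/dnsmasq/1.0'>
  <name>default</name>
  <!-- forward, bridge and ip elements stay unchanged -->
  <dnsmasq:options>
    <dnsmasq:option value='#'/>
    <dnsmasq:option value='# efi x64 pxe'/>
    <dnsmasq:option value='dhcp-match=set:efi-x64-pxe,option:client-arch,7'/>
    <dnsmasq:option value='dhcp-boot=tag:efi-x64-pxe,/arch-x86_64/grubx64.efi,,192.168.2.14'/>
  </dnsmasq:options>
</network>
```

The bootp entry in the dhcp section stays in place and continues to serve BIOS guests, which do not get the tag.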
dnsmasq uses '#' for comments, so option values starting with '#' are only there to visually separate entries a bit. They will also be in the dnsmasq config files created by libvirt (in /var/lib/libvirt/dnsmasq/).
Step four: Can UEFI guests use http too?
Sure. You might have already noticed that the UEFI boot manager has both UEFI PXEv4 and UEFI HTTPv4 entries. Here is what happens when you pick the latter:
    [ ... ]
    0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from 52:54:00:89:32:47 [ ... ]
      Client-Ethernet-Address 52:54:00:89:32:47 (oui Unknown)
      Vendor-rfc1048 Extensions
        [ ... ]
        ARCH Option 93, length 2: 16
        Vendor-Class Option 60, length 33: "HTTPClient:Arch:00016:UNDI:003001"
It's arch 16 now. Also option 60 starts with HTTPClient instead of PXEClient.
So we can simply add another arch match to identify http clients. Another detail we need to take care of is that the UEFI http boot client expects a reply with option 60 set to HTTPClient, otherwise the reply will be ignored. So we need to take care of that too, using dhcp-option-force. The tag efi-x64-http is used for http clients here.
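The additional entries might look like this sketch, placed inside the dnsmasq:options element of the network configuration. The URL assumes the http boot server answers as boot.home.kraxel.org and serves the same directory layout as the tftp server:

```xml
<dnsmasq:options>
  <dnsmasq:option value='#'/>
  <dnsmasq:option value='# efi x64 http'/>
  <!-- tag requests from UEFI http boot clients (client-arch 16) -->
  <dnsmasq:option value='dhcp-match=set:efi-x64-http,option:client-arch,16'/>
  <!-- the client insists on option 60 = HTTPClient in the reply -->
  <dnsmasq:option value='dhcp-option-force=tag:efi-x64-http,60,HTTPClient'/>
  <dnsmasq:option value='dhcp-boot=tag:efi-x64-http,http://boot.home.kraxel.org/arch-x86_64/grubx64.efi'/>
</dnsmasq:options>
```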
Extra tip for lazy people, now with UEFI
Putting it all together, you can define a new libvirt network named netboot.xyz. Store the network definition in some file, then use virsh net-define file to create the network.
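A sketch of such a network definition follows. The 192.168.123.0/24 subnet is an arbitrary pick that does not clash with the default network, and the two boot file URLs are the BIOS and EFI ipxe binaries netboot.xyz publishes; both are assumptions to verify against your setup and the netboot.xyz documentation. No bridge element is given, so libvirt picks a bridge name on its own.

```xml
<network xmlns:dnsmasq='http://libvirt.org/schemas/network/dnsmasq/1.0'>
  <name>netboot.xyz</name>
  <forward mode='nat'/>
  <ip address='192.168.123.1' netmask='255.255.255.0'>
    <dhcp>
      <range start='192.168.123.2' end='192.168.123.254'/>
      <!-- BIOS guests: ipxe boot rom fetches this via http -->
      <bootp file='http://boot.netboot.xyz/ipxe/netboot.xyz.lkrn'/>
    </dhcp>
  </ip>
  <dnsmasq:options>
    <!-- UEFI guests using http boot (client-arch 16) -->
    <dnsmasq:option value='dhcp-match=set:efi-x64-http,option:client-arch,16'/>
    <dnsmasq:option value='dhcp-option-force=tag:efi-x64-http,60,HTTPClient'/>
    <dnsmasq:option value='dhcp-boot=tag:efi-x64-http,http://boot.netboot.xyz/ipxe/netboot.xyz.efi'/>
  </dnsmasq:options>
</network>
```

After virsh net-define the network still has to be started with virsh net-start netboot.xyz before guests can use it.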
Then, in your guest domain configuration, use <source network='netboot.xyz'/> to use the new network. With this both BIOS and UEFI guests can netboot from netboot.xyz. With UEFI you have to take care to pick the UEFI HTTPv4 entry from the firmware boot menu.
Step five: architecture experiments
There is a world beyond x86. The arch field does not only specify the system architecture (bios vs. uefi) or the boot protocol (pxe vs. http), but also the cpu architecture. Here are the ones relevant for qemu:
Code (hex) | Code (decimal) | Architecture |
---|---|---|
0x00 | 0 | BIOS pxeboot (both i386 and x86_64) |
0x06 | 6 | EFI pxeboot, IA32 (i386) |
0x07 | 7 | EFI pxeboot, X64 (x86_64) |
0x0a | 10 | EFI pxeboot, ARM (v7) |
0x0b | 11 | EFI pxeboot, AA64 (v8 / aarch64) |
0x0c | 12 | powerpc64 |
0x10 | 16 | EFI httpboot, X64 |
0x12 | 18 | EFI httpboot, ARM |
0x13 | 19 | EFI httpboot, AA64 |
0x1f | 31 | s390x |
So, if you want to play with arm or powerpc without owning such a machine you can let qemu emulate it with tcg. If you want to netboot it -- no problem, just add a few more lines to your network configuration, matching the arch codes of the architecture in question.
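For aarch64 the extra lines could look like the sketch below, again inside the dnsmasq:options element. The tag names efi-aa64-pxe and efi-aa64-http are made up to follow the x64 naming, and grubaa64.efi is assumed as the name of the grub binary; dnsmasq expects the arch codes as decimal numbers (11 for pxe, 19 for http).

```xml
<dnsmasq:options>
  <dnsmasq:option value='#'/>
  <dnsmasq:option value='# efi aa64 pxe'/>
  <dnsmasq:option value='dhcp-match=set:efi-aa64-pxe,option:client-arch,11'/>
  <dnsmasq:option value='dhcp-boot=tag:efi-aa64-pxe,/arch-aarch64/grubaa64.efi,,192.168.2.14'/>
  <dnsmasq:option value='#'/>
  <dnsmasq:option value='# efi aa64 http'/>
  <dnsmasq:option value='dhcp-match=set:efi-aa64-http,option:client-arch,19'/>
  <dnsmasq:option value='dhcp-option-force=tag:efi-aa64-http,60,HTTPClient'/>
  <dnsmasq:option value='dhcp-boot=tag:efi-aa64-http,http://boot.home.kraxel.org/arch-aarch64/grubaa64.efi'/>
</dnsmasq:options>
```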
In case you are wondering why I place the grub binaries in subdirectories: grub tries to fetch the config file from the same directory, so that way I get per-arch config files and they are named /arch-aarch64/grub.cfg, /arch-x86_64/grub.cfg and so on. A nice side effect is that the toplevel directory is a bit less cluttered with files.
And beyond libvirt?
Well, the fundamental idea doesn't change. Look at the arch option, then send different replies depending on what you find there. With other dhcp servers the syntax is different, but the pattern is the same. Here is a sample snippet for the isc dhcp server shipped with most linux distributions:
    option arch code 93 = unsigned integer 16;

    subnet 192.168.2.0 netmask 255.255.255.0 {
        [ ... ]
        # values are written as hex bytes: 00:10 is arch 16, 00:07 is arch 7
        if (option arch = 00:10) {
            option vendor-class-identifier "HTTPClient";
            filename "http://boot.home.kraxel.org/arch-x86_64/grubx64.efi";
        } else if (option arch = 00:07) {
            next-server 192.168.2.14;
            filename "/arch-x86_64/grubx64.efi";
        } else {
            next-server 192.168.2.14;
            filename "/pxelinux.0";
        }
    }