local vgpu display – status update

After a looong time things are finally landing upstream. Both the vfio API update for vgpu local display support and the support for it in the intel drm driver have been merged during the 4.16 merge window. It should not take too long to get the qemu changes merged upstream now; if all goes well the next release (2.12) will have it.

Tina Zhang wrote a nice blog article with both technical background and instructions for those who want to play with it.

Running macOS as guest in kvm.

There are various approaches to run macOS as a guest under kvm. One is to add apple-specific features to OVMF, as described by Gabriel L. Somlo. I’ve chosen to use the Clover EFI bootloader instead. Here is what my setup looks like.

What you need

First, a bootable installer disk image. You can create a bootable usb stick using the createinstallmedia tool shipped with the installer, then dd the usb stick to a raw disk image.
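For example (a sketch; the installer app name and the device paths are placeholders for whatever you actually have):

# on the mac: write the installer to the usb stick
sudo /Applications/Install\ macOS\ Sierra.app/Contents/Resources/createinstallmedia \
    --volume /Volumes/MyUSB \
    --applicationpath /Applications/Install\ macOS\ Sierra.app
# on the linux box: copy the stick to a raw disk image
sudo dd if=/dev/sdb of=macos-installer.raw bs=1M status=progress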

Next, a clover disk image. I’ve created a script which uses guestfish to generate a disk image from a clover iso image, with a custom config file. The advantage of having a separate disk just for clover is that you can easily update clover, downgrade clover and tweak the clover configuration without booting the virtual machine. So, if something goes wrong, recovering is a lot easier.
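The rough idea looks like this (a minimal guestfish sketch, not the actual script; iso name and sizes are made up):

guestfish <<'EOF'
# create an empty image and add it plus the clover iso
disk-create clover.img raw 256M
add clover.img
add-ro clover.iso
run
# partition + format the image, mount it
part-disk /dev/sda mbr
mkfs vfat /dev/sda1
mount /dev/sda1 /
# mount the iso and copy the EFI tree over
mkdir /iso
mount-ro /dev/sdb /iso
cp-a /iso/EFI /
# drop in the custom config file
upload config.plist /EFI/CLOVER/config.plist
EOF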

Qemu. Version 2.10 (or newer) is strongly recommended. macOS versions up to 10.12.3 work fine in qemu 2.9; macOS 10.12.4 requires fixes for the qemu applesmc emulation which were merged for qemu 2.10.

OVMF. Latest git works fine for me. Older OVMF versions trip over a recent qemu update and then provide broken ACPI tables to the OS, with the result that macOS doesn’t boot, even though OVMF itself shows no signs of trouble.

Configuring your virtual machine

Here are snippets of my libvirt config, with comments explaining the important things:

<domain type='kvm' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>

The xmlns:qemu entry is needed for qemu-specific tweaks; that way we can ask libvirt to add extra arguments to the qemu command line.

    <type arch='x86_64' machine='pc-q35-2.9'>hvm</type>
    <loader readonly='yes' type='pflash'>/usr/share/edk2.git/ovmf-x64/OVMF_CODE-pure-efi.fd</loader>
    <nvram template='/usr/share/edk2.git/ovmf-x64/OVMF_VARS-pure-efi.fd'>/var/lib/libvirt/qemu/nvram/macos-test-org-base_VARS.fd</nvram>
    <bootmenu enable='yes'/>

Using the q35 machine type here, and the cutting edge edk2 builds from my firmware repo.

  <cpu mode='custom' match='exact' check='partial'>
    <model fallback='allow'>Penryn</model>
    <feature policy='require' name='invtsc'/>

CPU. Penryn is known-good. The invtsc feature is needed because macOS uses the TSC for timekeeping. When asked to provide a fixed TSC frequency, qemu stores the frequency in a hypervisor cpuid leaf, and macOS picks it up there. Without that macOS makes a wild guess, likely gets it very wrong, and the wall clock in your guest runs either way too fast or way too slow.

    <disk type='block' device='disk'>
      <driver name='qemu' type='raw' cache='none' io='native' discard='unmap'/>
      <source dev='/dev/path/to/lvm/volume'/>
      <target dev='sda' bus='sata'/>

This is the system disk macOS will be installed on, attached as a sata disk to the q35 ahci controller.

You also need the installmedia image. On a real macintosh you’ll typically use a usb stick. macOS doesn’t care much where the installer is stored though, so you can attach the image as a sata disk too and it’ll work fine.
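For example, something along these lines (the path is a placeholder):

    <disk type='file' device='disk'>
      <driver name='qemu' type='raw'/>
      <source file='/var/lib/libvirt/images/macos-installer.raw'/>
      <target dev='sdb' bus='sata'/>
    </disk>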

Finally you need the clover disk image. edk2 alone isn’t able to boot from the system or installer disk, so it’ll start clover. clover in turn loads the hfs+ filesystem driver so the other disks can be accessed, offers a boot menu, and boots macOS.

    <interface type='network'>
      <source network='default'/>
      <model type='e1000-82545em'/>

Ethernet card. macOS has drivers for this model. It seems to have problems with link detection now and then; setting the link status to down for a moment, then up again (using virsh domif-setlink) gets the virtual machine online.
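Something like this (guest name and mac address are placeholders):

virsh domif-setlink macos-test 52:54:00:12:34:56 down
sleep 1
virsh domif-setlink macos-test 52:54:00:12:34:56 up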

    <input type='tablet' bus='usb'/>
    <input type='keyboard' bus='usb'/>

USB tablet and keyboard as input devices. The tablet allows operating the mouse without pointer grabs, which is much more convenient than using a virtual mouse.

      <model type='vga' vram='65536'/>

Qemu standard vga.

    <qemu:arg value='-readconfig'/>
    <qemu:arg value='/path/to/macintosh.cfg'/>

This is the extra configuration item for stuff not supported by libvirt. The macintosh.cfg file looks like this, adding the emulated smc device:

[device "smc"]
driver = "isa-applesmc"
osk = "<insert-real-osk-here>"

You can run Gabriel’s SmcDumpKey tool on a macintosh to figure out what the osk is.

Configuring clover

I’m using config.plist.stripped.qemu as starting point. Here are the most important settings:

  • Kernel command line. There are lots of options you can start the kernel with. You might want to try "-v" to start the kernel in verbose mode, where it prints boot messages to the screen (like the linux kernel without "quiet"). For trouble-shooting, or to impress your friends.
  • Name of the volume clover should boot from by default. Put your system disk name here, otherwise clover will wait forever for you to pick a boot menu item.
  • As the name says, the display resolution. Note that OVMF has a display resolution setting too. Hit ESC at the splash screen to enter the setup, then go to Device Manager / OVMF Platform Configuration / Change Preferred Resolution. The two settings must match, otherwise macOS will boot with a scrambled display.
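In config.plist terms the three settings above are Boot/Arguments, Boot/DefaultVolume and GUI/ScreenResolution. Roughly like this (the values are just examples):

<key>Boot</key>
<dict>
    <key>Arguments</key>
    <string>-v</string>
    <key>DefaultVolume</key>
    <string>macOS</string>
</dict>
<key>GUI</key>
<dict>
    <key>ScreenResolution</key>
    <string>1280x800</string>
</dict>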


That should be it. Boot the virtual machine. Installing and using macOS should work as usual.

Final comment

Gabriel also has some comments on the legal side of this. Summary: probably allowed by Apple on macintosh hardware, i.e. when running linux on your macintosh you may run macOS as a guest there. If in doubt check with your lawyer.

Fresh Fedora 26 images uploaded

Fedora 26 is out of the door, and here are fresh fedora 26 images.

There are raspberry pi images. The aarch64 image requires a model 3; the armv7 image boots on both model 2 and model 3. Unlike the images for previous fedora releases, the new images use the standard fedora kernels instead of a custom kernel. So, the kernel update service for the older images will stop within the next few weeks.

There are efi images for qemu. The i386 and x86_64 images use systemd-boot as bootloader; grub2 doesn’t work due to bug 1196114 (unless you create a boot menu entry manually in uefi setup). The arm images use grub2 as bootloader: armv7 isn’t supported by systemd-boot in the first place, and the aarch64 version throws an exception. The efi images can also be booted as containers, using "systemd-nspawn --boot --image <file>", but you have to convert them to raw first.
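The conversion itself is a qemu-img one-liner (image names are placeholders):

qemu-img convert -O raw fedora-26-efi-x86_64.qcow2 fedora-26-efi-x86_64.raw
sudo systemd-nspawn --boot --image fedora-26-efi-x86_64.raw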

The images don’t have a root password. You have to set one using virt-customize -a <image> --root-password "password:<secret>", otherwise you can’t log in after boot.

The images have been created with imagefish.

meson experiments

Seems the world of build systems is going to change. The traditional approach is make, pimped up with autoconf and automake. But some newcomers provide new approaches to building your projects.

First, there is ninja-build. It’s a workhorse, roughly comparable to make. It isn’t really designed to be used standalone though. Typically the lowlevel ninja build files are generated by some highlevel build tool, similar to how Makefiles are generated by autotools.

Second, there is meson, a build tool which (on unix) by default uses ninja as backend. meson appears to be getting pretty popular.

So, let’s have a closer look at it. I’m working on drminfo right now, a tool to dump information about drm devices, which also comes with a simple test tool rendering a test image to the display. It is pretty small and doesn’t even use autotools, so it is perfect for trying out something new. Also nice for this post, as the build files are pretty small.

So, here is the Makefile:

CC      ?= gcc
CFLAGS  ?= -Os -g -std=c99
CFLAGS  += -Wall

TARGETS := drminfo drmtest gtktest

drminfo : CFLAGS += $(shell pkg-config --cflags libdrm cairo pixman-1)
drminfo : LDLIBS += $(shell pkg-config --libs libdrm cairo pixman-1)

drmtest : CFLAGS += $(shell pkg-config --cflags libdrm gbm epoxy cairo cairo-gl pixman-1)
drmtest : LDLIBS += $(shell pkg-config --libs libdrm gbm epoxy cairo cairo-gl pixman-1)
drmtest : LDLIBS += -ljpeg

gtktest : CFLAGS += $(shell pkg-config --cflags gtk+-3.0 cairo pixman-1)
gtktest : LDLIBS += $(shell pkg-config --libs gtk+-3.0 cairo pixman-1)
gtktest : LDLIBS += -ljpeg

all: $(TARGETS)

clean:
        rm -f $(TARGETS)
        rm -f *~ *.o

drminfo: drminfo.o drmtools.o
drmtest: drmtest.o drmtools.o render.o image.o
gtktest: gtktest.o render.o image.o

Thanks to pkg-config there is no need to use autotools just to figure out the cflags and libraries needed, and the Makefile is short and easy to read. The only thing here you might not be familiar with are target-specific variables.

Now, compare with the meson.build file:

project('drminfo', 'c')

# pkg-config deps
libdrm_dep    = dependency('libdrm')
gbm_dep       = dependency('gbm')
epoxy_dep     = dependency('epoxy')
cairo_dep     = dependency('cairo')
cairo_gl_dep  = dependency('cairo-gl')
pixman_dep    = dependency('pixman-1')
gtk3_dep      = dependency('gtk+-3.0')

# libjpeg dep
jpeg_dep      = declare_dependency(link_args : '-ljpeg')

drminfo_srcs  = [ 'drminfo.c', 'drmtools.c' ]
drmtest_srcs  = [ 'drmtest.c', 'drmtools.c', 'render.c', 'image.c' ]
gtktest_srcs  = [ 'gtktest.c', 'render.c', 'image.c' ]

drminfo_deps  = [ libdrm_dep, cairo_dep, pixman_dep ]
drmtest_deps  = [ libdrm_dep, gbm_dep, epoxy_dep,
                  cairo_dep, cairo_gl_dep, pixman_dep, jpeg_dep ]
gtktest_deps  = [ gtk3_dep,
                  cairo_dep, cairo_gl_dep, pixman_dep, jpeg_dep ]

executable('drminfo',
           sources      : drminfo_srcs,
           dependencies : drminfo_deps)
executable('drmtest',
           sources      : drmtest_srcs,
           dependencies : drmtest_deps)
executable('gtktest',
           sources      : gtktest_srcs,
           dependencies : gtktest_deps,
           install      : false)

A pretty straightforward translation. So, what are the differences?

First, meson and ninja have built-in support for a bunch of features. No need to put anything into your build files to use them, they are just there:

  • Automatic header dependencies. When a header file changes all source files which include it get rebuilt.
  • Automatic build system dependencies: When meson.build changes the ninja build files are updated.
  • Rebuilds on command changes: When the build command line for a target changes the target is rebuilt.

Sure, you can do all that with make too; the linux kernel build system does it, for example. But then your Makefiles will be an order of magnitude larger than the one shown above, because all the clever stuff is in the build files instead of the build tool.

Second, meson keeps the object files strictly separated by target. The project has some source files shared by multiple executables; drmtools.c for example is used by both drminfo and drmtest. With the Makefile above it gets built once. meson builds it separately for each target, with the cflags for the specific target.

Another nice feature is that ninja automatically does parallel builds. It figures the number of processors available and runs (by default) that many jobs.
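For reference, the whole workflow boils down to this ('build' is just the name of the build directory):

meson build
ninja -C build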

Overall I’m pretty pleased, and I’ll probably use meson more frequently in the future. If you want to try it out too, I’d suggest starting with the tutorial.

Fedora 25 images for qemu and raspberry pi 3 uploaded

I’ve uploaded three new images to https://www.kraxel.org/repos/rpi2/images/.

The fedora-25-rpi3 image is for the raspberry pi 3.
The fedora-25-efi images are for qemu (virt machine type with edk2 firmware).

The images don’t have a root password set. You must use libguestfs-tools to set the root password …

virt-customize -a <image> --root-password "password:<your-password-here>"

… otherwise you can’t login after boot.

The rpi3 image is partitioned similar to the official (armv7) fedora 25 images: the firmware and uboot are on a separate vfat partition (mounted at /boot/fw), and /boot is an ext2 filesystem now which holds only the kernels. Well, for compatibility with the f24 images (all images use the same kernel rpms) firmware files are in /boot too, but they are not used. So, if you want to tweak something in config.txt, go to /boot/fw, not /boot.

The rpi3 images also have swap commented out in /etc/fstab. The reason is that the swap partition must be reinitialized: swap partitions generated when running on a 64k pages kernel (CONFIG_ARM64_64K_PAGES=y) are not compatible with 4k pages (CONFIG_ARM64_4K_PAGES=y). Fix this by running “swapon --fixpgsz <device>” once, then you can uncomment the swap line in /etc/fstab.
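Something like this (the device name is an example, check your /etc/fstab or lsblk for the real one):

swapon --fixpgsz /dev/mmcblk0p3
# now uncomment the swap line in /etc/fstab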

local display for intel vgpu starts working

Intel vgpu guest display shows up in the qemu gtk window. This is a linux kernel booted to the dracut emergency shell prompt, with the framebuffer console running @ inteldrmfb. Booting a full linux guest with X11 and/or wayland is not tested yet. There are rendering glitches too: running “dmesg” produces visibly corrupted output on the screen.

So, a bunch of issues to solve before this is ready for users, but it’s a nice start.

For the brave:

host kernel: https://www.kraxel.org/cgit/linux/log/?h=intel-vgpu
qemu: https://www.kraxel.org/cgit/qemu/log/?h=work/intel-vgpu

Take care, both branches are moving targets (aka: rebasing at times).

tweak arm images with libguestfs-tools

So, when using the official fedora arm images on your raspberry pi (or any other arm board) you might have faced the problem that it is not easy to use them for a headless machine (i.e. one with no keyboard and display connected). There is no default password; fedora asks you to set one on the first boot instead. Which is surely better from a security point of view than shipping with a fixed password, but for headless machines it is quite inconvenient …

Luckily there is an easy way out: libguestfs-tools. The tools were created to configure virtual machine images (this is where the name comes from), but they work fine with sdcards too.

I’m using a usb sdcard reader which shows up as /dev/sdc on my system. I can just pass /dev/sdc as image to the tools (take care, the device is probably something else for you). For example, to set a root password:

virt-customize -a /dev/sdc --root-password "password:<your-password-here>"

The initial setup on the first boot is a systemd service, and it can be turned off by simply removing the symlinks which enable the service:

virt-customize -a /dev/sdc \
  --delete /etc/systemd/system/multi-user.target.wants/initial-setup.service \
  --delete /etc/systemd/system/graphical.target.wants/initial-setup.service

You can use virt-copy-in (or virt-tar-in) to copy config files to the disk image. Small (or empty) configuration files can also be created with the write command:

virt-customize -a /dev/sdc --write "/.autorelabel:"

Adding the .autorelabel file will force selinux relabeling on the first boot (takes a while). It is a good idea to do that in case you copy files to the sdcard, to make sure the new files are labeled correctly. Especially in case you copy security-sensitive things like ssh keys or ssh config files. Without relabeling selinux will not allow sshd to access those files, which in turn can break remote logins.

There is a lot more the virt-* tools can do for you. Check out the manual pages for more info. And you can easily script things: virt-customize has a --commands-from-file switch which accepts a file with a list of commands.
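A quick sketch of how that can look (the file name is made up; the file holds one customize command per line, written like the command line options but without the leading dashes):

cat > setup.cmds <<EOF
root-password password:<your-password-here>
write /.autorelabel:
EOF
virt-customize -a /dev/sdc --commands-from-file setup.cmds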

virtual gpu support landing upstream

The upstreaming process of virtual gpu support (vgpu) made a big step forward with the 4.10 merge window. Two important pieces have been merged:

First, the mediated device framework (mdev). Basically this allows kernel drivers to present virtual pci devices, using the vfio framework and interfaces. Both nvidia and intel will use mdev to partition the physical gpu of the host into multiple virtual devices which then can be assigned to virtual machines.

Second, intel landed initial mdev support in the i915 driver too. There is quite some work left to do in future kernel releases though. Access to the guest display is not supported yet, so you must run x11vnc or similar tools in the guest to see the screen. Also there are some stability issues to find and fix.

If you want to play with this nevertheless, here is how to do it. But be prepared for crashes, and better don’t try this on a production machine.

On the host: create virtual devices

On the host machine you obviously need a 4.10 kernel. Also the intel graphics device (igd) must be broadwell or newer. In the kernel configuration enable vfio and mdev (all CONFIG_VFIO_* options). Enable CONFIG_DRM_I915_GVT and CONFIG_DRM_I915_GVT_KVMGT for intel vgpu support. Building the mtty sample driver (CONFIG_SAMPLE_VFIO_MDEV_MTTY, a virtual serial port) can be useful too, for testing.

Boot the new kernel. Load all modules: vfio-pci, vfio-mdev, optionally mtty. Also i915 and kvmgt of course, but that probably happened during boot already.
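That boils down to:

modprobe vfio-pci
modprobe vfio-mdev
modprobe mtty        # optional, the sample serial port driver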

Go to the /sys/class/mdev_bus directory. This should look like this:

kraxel@broadwell ~# cd /sys/class/mdev_bus
kraxel@broadwell .../class/mdev_bus# ls -l
total 0
lrwxrwxrwx. 1 root root 0 17. Jan 10:51 0000:00:02.0 -> ../../devices/pci0000:00/0000:00:02.0
lrwxrwxrwx. 1 root root 0 17. Jan 11:57 mtty -> ../../devices/virtual/mtty/mtty

Each driver with mdev support has a directory there. Go to $device/mdev_supported_types to check what kind of virtual devices you can create.

kraxel@broadwell .../class/mdev_bus# cd 0000:00:02.0/mdev_supported_types
kraxel@broadwell .../0000:00:02.0/mdev_supported_types# ls -l
total 0
drwxr-xr-x. 3 root root 0 17. Jan 11:59 i915-GVTg_V4_1
drwxr-xr-x. 3 root root 0 17. Jan 11:57 i915-GVTg_V4_2
drwxr-xr-x. 3 root root 0 17. Jan 11:59 i915-GVTg_V4_4

As you can see, intel supports three different configurations on my machine. The configurations differ in the amount of video memory and in the number of instances you can create. Check the description and available_instance files in the directories:

kraxel@broadwell .../0000:00:02.0/mdev_supported_types# cd i915-GVTg_V4_2
kraxel@broadwell .../mdev_supported_types/i915-GVTg_V4_2# cat description 
low_gm_size: 64MB
high_gm_size: 192MB
fence: 4
kraxel@broadwell .../mdev_supported_types/i915-GVTg_V4_2# cat available_instance 

Now it is possible to create virtual devices by writing a UUID into the create file:

kraxel@broadwell .../mdev_supported_types/i915-GVTg_V4_2# uuid=$(uuidgen)
kraxel@broadwell .../mdev_supported_types/i915-GVTg_V4_2# echo $uuid
f321853c-c584-4a6b-b99a-3eee22a3919c
kraxel@broadwell .../mdev_supported_types/i915-GVTg_V4_2# sudo sh -c "echo $uuid > create"

The new vgpu device will show up as subdirectory of the host gpu:

kraxel@broadwell .../mdev_supported_types/i915-GVTg_V4_2# cd ../../$uuid
kraxel@broadwell .../0000:00:02.0/f321853c-c584-4a6b-b99a-3eee22a3919c# ls -l
total 0
lrwxrwxrwx. 1 root root    0 17. Jan 12:31 driver -> ../../../../bus/mdev/drivers/vfio_mdev
lrwxrwxrwx. 1 root root    0 17. Jan 12:35 iommu_group -> ../../../../kernel/iommu_groups/10
lrwxrwxrwx. 1 root root    0 17. Jan 12:35 mdev_type -> ../mdev_supported_types/i915-GVTg_V4_2
drwxr-xr-x. 2 root root    0 17. Jan 12:35 power
--w-------. 1 root root 4096 17. Jan 12:35 remove
lrwxrwxrwx. 1 root root    0 17. Jan 12:31 subsystem -> ../../../../bus/mdev
-rw-r--r--. 1 root root 4096 17. Jan 12:35 uevent

You can see the device landed in iommu group 10. We’ll need that in a moment.

On the host: configure guests

Ideally this would be as simple as adding a <hostdev> entry to your guest’s libvirt xml config. The mdev devices don’t have a pci address on the host though, and because of that they must be passed to qemu using the sysfs device path instead of the pci address. libvirt doesn’t (yet) support sysfs paths though, so it is a bit more complicated for now. A lot of the setup libvirt does automatically for hostdevs must be done manually instead.

First, we must allow qemu access to /dev. By default libvirt uses control groups to restrict access; that must be turned off. Edit /etc/libvirt/qemu.conf, uncomment the cgroup_controllers line and remove "devices" from the list, then restart libvirtd.
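After the edit the line should look something like this (the exact list depends on your libvirt version; the point is that "devices" is gone):

cgroup_controllers = [ "cpu", "memory", "blkio", "cpuset", "cpuacct" ]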

Second, we must allow qemu access to the iommu group (10 in my case). A simple chmod will do:

kraxel@broadwell ~# chmod 666 /dev/vfio/10

Third, we must update the guest configuration:

<domain type='kvm' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
  [ ... ]
  <currentMemory unit='KiB'>1048576</currentMemory>
  [ ... ]
    <qemu:arg value='-device'/>
    <qemu:arg value='vfio-pci,addr=05.0,sysfsdev=/sys/class/mdev_bus/0000:00:02.0/f321853c-c584-4a6b-b99a-3eee22a3919c'/>

There is a special qemu namespace which can be used to pass extra command line arguments to qemu. We use it here for a qemu feature not yet supported by libvirt (sysfs paths for vfio-pci). Also we must explicitly allow qemu to lock down guest memory.
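The memory locking part isn’t visible in the snippet above; in libvirt that is typically expressed with the memoryBacking element (my assumption here, not copied from the original config):

  <memoryBacking>
    <locked/>
  </memoryBacking>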

Now we are ready to go:

kraxel@broadwell ~# virsh start --console $guest

In the guest

It is a good idea to prepare the guest a bit before adding the vgpu to the guest configuration. Set up a serial console, so you can talk to it even in case graphics are broken. Blacklist the i915 module and load it manually, at least until you have a known-working configuration. Also booting to runlevel 3 (aka multi-user.target) instead of 5 (aka graphical.target) and starting the xorg server manually is better for now.

For the guest machine intel recommends the 4.8 kernel. In theory newer kernels should work too; in practice they didn’t last time I tested (4.10-rc2). Also make sure the xorg server uses the modesetting driver; the intel driver didn’t work in my testing. This config file will do:

root@guest ~# cat /etc/X11/xorg.conf.d/intel.conf 
Section "Device"
        Identifier  "Card0"
#       Driver      "intel"
        Driver      "modesetting"
        BusID       "PCI:0:5:0"
EndSection

I’m starting the xorg server with x11vnc, xterm and mwm (motif window manager) using this little script:


#!/bin/sh

# debug
echo "# $0: DISPLAY=$DISPLAY"

# start server
if test "$DISPLAY" != ":4"; then
        echo "# $0: starting Xorg server"
        exec startx $0 -- /usr/bin/Xorg :4
        exit 1
fi

echo "# $0: starting session"

# configure session
xrdb $HOME/.Xdefaults

# start clients
x11vnc -rfbport 5904 &
xterm &
exec mwm

The session runs on display 4, so you should be able to connect from the host this way:

kraxel@broadwell ~# vncviewer $guest_ip:4

Have fun!

raspberry pi status update

By now you might have noticed that Fedora 25 ships with raspberry pi support, and you might have wondered what this means for my packages and images.

The fedora images use a different partition layout than my images. Specifically, the fedora images have a separate vfat partition for the firmware and uboot, and the /boot partition with the linux kernels lives on ext2. My images have a vfat /boot partition with everything (firmware, uboot, kernels), and the rpms in my repo will only work properly on such an sdcard. You can’t mix & match stuff, and there is no easy way to switch from my sdcard layout to the fedora one.

Current plan forward:

I will continue to build rpms for armv7 (32bit) for a while, for existing installs. There will be no new fedora 25 images though. For new devices or reinstalls I recommend using the official fedora images instead.

Fedora 25 has no aarch64 (64bit) support, although it is expected to land in one of the next releases. Most likely I’ll create new Fedora 25 images for aarch64 (after final release), and of course I’ll continue to build kernel updates too.

Finally some words on the upstream kernel status:

The 4.8 dwc2 usb host adapter driver has some serious problems on the raspberry pi. 4.7 works ok, and so do the 4.9-rc kernels. But 4.7 doesn’t get stable updates any more, so I jumped straight to the 4.9-rc kernels for mainline. You might have noticed already if you updated your rpi recently. The raspberry pi foundation kernels don’t suffer from that issue as they use a different (not upstream) driver for the dwc.