FreeNAS VM disk access errors, bisected to commit 6f1a4891a592

From: Marc Dionne
Date: Fri Apr 17 2020 - 12:06:55 EST


Hi,

Commit 6f1a4891a592 ("x86/apic/msi: Plug non-maskable MSI affinity
race") causes Linux VMs hosted on FreeNAS (bhyve hypervisor) to lose
access to their disk devices shortly after boot. The disks are zfs
zvols on the host, presented to each VM.

Background: I recently updated some fedora 31 VMs running under the
bhyve hypervisor (hosted on a FreeNAS mini), and they moved to a
distro 5.5 kernel (5.5.15). Shortly after reboot, the disks became
inaccessible with any operation getting EIO errors. Booting back into
a 5.4 kernel, everything was fine. I built a 5.7-rc1 kernel, which
showed the same symptoms, and was then able to bisect it down to
commit 6f1a4891a592. Note that the symptoms do not occur on every
boot, but often enough (roughly 80%) to make bisection possible.

Applying a manual revert of 6f1a4891a592 on top of mainline from
yesterday gives me a kernel that works fine.

Not sure which details are useful, but here are some bits that might
be relevant:
- FreeNAS host is running FreeNAS-11.3-U2
- efi/bios details:
efi: EFI v2.40 by BHYVE
efi: SMBIOS=0x7fb5b000 ACPI=0x7fb88000 ACPI 2.0=0x7fb88014
DMI: BHYVE, BIOS 1.00 03/14/2014
- A sample disk:
ata4: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
ata4.00: ATA-9: BHYVE SATA DISK, 001, max UDMA/133
ata4.00: 2147483680 sectors, multi 128: LBA48 NCQ (depth 32)
ata4.00: configured for UDMA/133
scsi 3:0:0:0: Direct-Access ATA BHYVE SATA DISK 001 PQ: 0 ANSI: 5
scsi 3:0:0:0: Attached scsi generic sg3 type 0
- The first sign of a problem on a "bad" kernel shows up as:
ata1.00: exception Emask 0x0 SAct 0x78000001 SErr 0x0 action 0x6 frozen

Thanks,
Marc