[PATCH 0/1] KVM: x86/mmu: don't kill the VM on access to a disabled passthrough BAR
From: mike . malyshev
Date: Sun Jun 21 2026 - 09:37:19 EST
From: Mikhail Malyshev <mike.malyshev@xxxxxxxxx>
A guest with an assigned PCI device can crash its own VM by toggling
PCI_COMMAND.MEM on that device while another vCPU accesses the device's
BAR. KVM_RUN returns -EFAULT, which userspace (QEMU) treats as fatal.
This is a guest-triggerable, host-side VM kill, so I think it is worth
addressing in KVM rather than papering over it in userspace.
The window
==========
A passed-through BAR is mapped into the guest via a VM_IO/VM_PFNMAP VMA
whose fault handler (vfio_pci_mmap_fault()) refuses to install a PTE while
the device's memory space is disabled. When the guest clears
PCI_COMMAND.MEM:
  - the kernel vfio config-write path zaps the BAR's userspace mapping;
  - userspace's memory listener later removes the corresponding KVM
    memslot.
A vCPU that faults on the BAR after the mapping is zapped but before the
memslot is removed enters the page fault path with a *valid* memslot whose
backing VMA has a fault handler that declines to install a PTE.
hva_to_pfn_remapped() returns an error, the gfn resolves to
KVM_PFN_ERR_FAULT, and kvm_handle_error_pfn() returns -EFAULT.
On bare metal the same access is an Unsupported Request (reads all ones,
writes dropped), not a fatal error. This series makes KVM emulate the
access as MMIO in that case, matching hardware, while leaving genuine
faults (e.g. a vanished anonymous backing, vma == NULL) returning -EFAULT
as before -- consistent with what tools/testing/selftests/kvm/
mmu_stress_test.c already asserts.
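To make the intended split concrete, here is a minimal userspace model of
the decision described above. It is a sketch, not the patch itself: the
struct, the enum, and both function names are hypothetical stand-ins, and
the real code works on KVM's fault state rather than on booleans.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins; not the kernel's definitions. */
enum fault_action { ACT_MAP_PFN, ACT_EMULATE_MMIO, ACT_RETURN_EFAULT };

struct fault_model {
	bool has_memslot;   /* a memslot covers the gfn */
	bool vma_present;   /* a VMA covers the hva (vma != NULL) */
	bool vma_is_pfnmap; /* the VMA has VM_IO or VM_PFNMAP set */
	bool pfn_resolved;  /* the backing produced a pfn */
};

/* Decide what the fault path does with one guest access. */
static enum fault_action classify_fault(const struct fault_model *f)
{
	assert(f != NULL);

	if (!f->has_memslot)
		return ACT_EMULATE_MMIO;  /* no memslot: ordinary emulated MMIO */
	if (f->pfn_resolved)
		return ACT_MAP_PFN;       /* normal case: install the mapping */
	if (f->vma_present && f->vma_is_pfnmap)
		return ACT_EMULATE_MMIO;  /* handler declined: decode disabled */
	return ACT_RETURN_EFAULT;         /* genuine fault, unchanged behaviour */
}

/* Unsupported Request semantics: a read returns all ones for its width. */
static uint64_t ur_read_value(unsigned int size_bytes)
{
	assert(size_bytes >= 1 && size_bytes <= 8);
	if (size_bytes == 8)
		return UINT64_MAX;
	return (1ULL << (size_bytes * 8)) - 1;
}
```

In this model a write to a decode-disabled BAR is simply discarded, so only
the read side needs a value.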
How it was found / confirmed
============================
The crash was originally hit in production on edge devices that pass an
Intel iGPU (Raptor Lake-P) through to a guest; the guest's display driver
clears PCI_COMMAND.MEM on one vCPU while another vCPU is mid-MMIO to BAR0.
To study it deterministically I built a reduced, hardware-light reproducer
(no specific guest OS is required; the race is host-side):
  - a Linux guest with any assigned PCI device whose BAR0 is mmap'd from
    userspace (/sys/.../resource0) and hammered with a tight MMIO write
    loop on one vCPU;
  - a second thread that toggles PCI_COMMAND.MEM 1->0->1 on that device
    via the VFIO config region.
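For readers who want to build something similar, the toggling half can be
sketched roughly as below. This is not the reproducer used for the numbers
in this letter; set_mem_decode() is a hypothetical helper. It assumes a file
descriptor that exposes the device's config space at byte offset cfg_base
(for a VFIO device fd, the offset reported by VFIO_DEVICE_GET_REGION_INFO
for VFIO_PCI_CONFIG_REGION_INDEX). The two register constants match
include/uapi/linux/pci_regs.h.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

#define PCI_COMMAND        0x04 /* offset of the 16-bit command register */
#define PCI_COMMAND_MEMORY 0x2  /* memory space decode enable bit */

/*
 * Set or clear the memory decode bit with a read-modify-write of the
 * command register. Config space is little-endian, so the register is
 * assembled from bytes rather than read as a host-endian integer.
 * Returns 0 on success, -1 on a short or failed read/write.
 */
static int set_mem_decode(int fd, off_t cfg_base, int enable)
{
	uint8_t raw[2];
	uint16_t cmd;

	assert(fd >= 0);

	if (pread(fd, raw, sizeof(raw), cfg_base + PCI_COMMAND) !=
	    (ssize_t)sizeof(raw)) {
		perror("pread PCI_COMMAND");
		return -1;
	}

	cmd = (uint16_t)(raw[0] | (raw[1] << 8));
	if (enable)
		cmd |= PCI_COMMAND_MEMORY;
	else
		cmd &= (uint16_t)~PCI_COMMAND_MEMORY;

	raw[0] = (uint8_t)(cmd & 0xff);
	raw[1] = (uint8_t)(cmd >> 8);

	if (pwrite(fd, raw, sizeof(raw), cfg_base + PCI_COMMAND) !=
	    (ssize_t)sizeof(raw)) {
		perror("pwrite PCI_COMMAND");
		return -1;
	}
	return 0;
}
```

One 1->0->1 cycle is then set_mem_decode(fd, cfg_base, 0) followed by
set_mem_decode(fd, cfg_base, 1), run in a loop while the other vCPU keeps
writing to BAR0.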
Without this patch the VM dies within ~1 s. eBPF on the fault path showed
the -EFAULT originating in the faultin path (kvm_mmu_faultin_pfn ->
kvm_handle_error_pfn) with the memslot valid (flags=0, not
KVM_MEMSLOT_INVALID), no mmu_notifier invalidation in progress, and the
pfn equal to the GUP-error value -- i.e. the VM_PFNMAP fault handler
declining, exactly the case this patch targets. (An earlier attempt to
treat it as a stale-mapping race and retry on mmu_invalidate_retry_gfn()
did not help, because by the time of the fault the invalidation has
already completed and the seq is stable; that confirmed the failure is a
steady-state "device decoding disabled" condition, not a transient
invalidation, and led to the MMIO approach here.)
With the patch the same reproducer survived 200k toggle cycles, and a
fleet of 17 devices ran 48h with no recurrence.
Open questions for reviewers
============================
  - hva_to_pfn_remapped() can in principle return an error for reasons
    other than "fault handler declined" (e.g. an OOM from
    fixup_user_fault()). Treating all of them as MMIO is what this patch
    does for simplicity; I can instead plumb the specific condition through
    if you'd prefer to narrow it.
  - I could not find a clean way to add a selftest: a faithful regression
    test needs a VM_PFNMAP backing whose fault handler can be toggled,
    which from pure userspace means /dev/mem or a real assigned device
    (neither CI-portable), or a dedicated test module (outside selftests).
    Guidance on the preferred shape would be welcome.
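For the first question, a narrowed variant could look something like the
model below. It is only a sketch of the alternative, not what the patch
does. The helper name is hypothetical, and the errno mapping is an
assumption for illustration: a declined VM_PFNMAP fault is taken to surface
as -EFAULT and an allocation failure as -ENOMEM.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical narrowing: emulate as MMIO only when the error looks like
 * "fault handler declined" on a VM_IO/VM_PFNMAP backing. Every other
 * error keeps the existing -EFAULT behaviour.
 * ASSUMPTION: declined fault -> -EFAULT, OOM -> -ENOMEM.
 */
static bool error_means_decode_disabled(int err, bool vma_is_pfnmap)
{
	assert(err < 0);

	if (!vma_is_pfnmap)
		return false;  /* e.g. anonymous memory: a genuine fault */

	switch (err) {
	case -EFAULT:
		return true;   /* handler refused to install a PTE */
	case -ENOMEM:
	default:
		return false;  /* resource failure, do not hide it as MMIO */
	}
}
```

The cost of narrowing is that the specific error has to be carried from
hva_to_pfn_remapped() up to the point where the MMIO decision is made.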
Mikhail Malyshev (1):
  KVM: x86/mmu: Emulate, don't kill the VM, on access to a disabled
    passthrough BAR

 arch/x86/kvm/mmu/mmu.c   | 16 +++++++++++++++-
 include/linux/kvm_host.h |  8 ++++++++
 virt/kvm/kvm_main.c      |  9 ++++++++-
 3 files changed, 31 insertions(+), 2 deletions(-)
--
2.43.0