[PATCH 0/5] Introduce a quirk to control memslot zap behavior

From: Yan Zhao
Date: Thu Jun 13 2024 - 02:08:33 EST


Today, KVM does not zap only the leaf SPTEs of a memslot when that memslot
is moved or deleted. Instead, it invalidates all page tables and generates
fresh ones based on the new memslot layout (referred to as "zap all" for
short). The "zap all" behavior has low overhead for most use cases, and was
adopted primarily to work around a bug that caused VM instability when an
Nvidia GeForce GPU was assigned to the VM (see link in patch 1).

However, the "zap all" behavior is undesirable in certain scenarios, e.g.
- It's not viable for TDX:
  a) TDX requires that the root page of the private page table remain
     unaltered throughout the TD life cycle.
  b) TDX mandates that leaf entries in the private page table be zapped
     before non-leaf entries.
  c) TDX requires private pages to be re-accepted after they are dropped.
- It's not performant for scenarios involving frequent deletion and
  re-adding of numerous small memslots.

This series therefore introduces the KVM_X86_QUIRK_SLOT_ZAP_ALL quirk,
enabling users to control the behavior of memslot zapping when a memslot is
moved/deleted.

The quirk is turned on by default, which results in all SPTEs being
invalidated/zapped when a memslot is moved/deleted.

Users have the option to turn off the quirk. Doing so limits the zapping
to only the leaf SPTEs within the range of the memslot being moved/deleted.
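
The two behaviors can be illustrated with a small userspace model. This is
only a sketch, not the code in patch 1: the toy_* types and helpers are
hypothetical, the page tables are flattened into an array of leaf entries,
and the bit value used for KVM_X86_QUIRK_SLOT_ZAP_ALL is an assumption (the
real value is defined by the uapi header change in this series).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed bit value, for illustration only. */
#define KVM_X86_QUIRK_SLOT_ZAP_ALL (1u << 7)

/* Hypothetical stand-ins; these are not the kernel's structures. */
struct toy_spte {
	uint64_t gfn;      /* guest frame mapped by this leaf entry */
	bool present;
};

struct toy_memslot {
	uint64_t base_gfn;
	uint64_t npages;
};

struct toy_vm {
	uint64_t disabled_quirks;  /* bit set => quirk turned off by userspace */
	struct toy_spte *sptes;
	size_t nr_sptes;
};

/* A quirk is in effect unless userspace has disabled it. */
static bool toy_has_quirk(const struct toy_vm *vm, uint64_t quirk)
{
	return !(vm->disabled_quirks & quirk);
}

/*
 * Model of a memslot move/delete. With the quirk enabled every present
 * entry is zapped; with it disabled only entries inside the slot's GFN
 * range are zapped. Returns the number of entries zapped.
 */
static size_t toy_flush_memslot(struct toy_vm *vm,
				const struct toy_memslot *slot)
{
	bool zap_all;
	size_t zapped = 0;

	assert(vm && slot);
	zap_all = toy_has_quirk(vm, KVM_X86_QUIRK_SLOT_ZAP_ALL);

	for (size_t i = 0; i < vm->nr_sptes; i++) {
		struct toy_spte *e = &vm->sptes[i];
		bool in_slot = e->gfn >= slot->base_gfn &&
			       e->gfn < slot->base_gfn + slot->npages;

		if (e->present && (zap_all || in_slot)) {
			e->present = false;
			zapped++;
		}
	}
	return zapped;
}
```

In this model, deleting a two-page slot from a VM with three mapped pages
zaps all three entries when the quirk is enabled, and only the two entries
inside the slot when it is disabled.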

This series has been tested with
- Normal VMs
  w/ and w/o device assignment, and KVM selftests

- TDX guests.
  Memslot deletion typically does not occur without device assignment for a
  TD. Therefore, it is tested with shared device assignment.

Note: For TDX integration, the quirk is currently disabled via TDX code in
QEMU rather than being automatically disabled based on VM type in
KVM. This is not safe: a malfunctioning QEMU that fails to disable
the quirk could result in the shared EPT being invalidated while the
private EPT remains unaffected, as kvm_mmu_zap_all_fast() only
targets the shared EPT.

However, kvm->arch.disabled_quirks is currently entirely
user-controlled, and there is no mechanism for users to verify whether
a quirk has been disabled by the kernel.
We are therefore wondering which of the options below is better for TDX:

a) Add a condition for the TDX VM type in kvm_arch_flush_shadow_memslot()
in addition to the kvm_check_has_quirk() check. This is similar to
"all new VM types have the quirk disabled". e.g.

static inline bool kvm_memslot_flush_zap_all(struct kvm *kvm)
{
     return kvm->arch.vm_type != KVM_X86_TDX_VM &&
         kvm_check_has_quirk(kvm, KVM_X86_QUIRK_SLOT_ZAP_ALL);
}

b) Initialize disabled_quirks based on VM type in the kernel, and extend
the disabled_quirks querying/setting interface so that the quirk is
forced to be disabled for TDX.
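
For comparison with the helper in option a), option b) could look roughly
like the userspace model below. This is only a sketch of the idea: the
toy_* names, the VM type enum and the setter/getter are hypothetical, no
such interface exists in this series, and the quirk's bit value is an
assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit value, for illustration only. */
#define KVM_X86_QUIRK_SLOT_ZAP_ALL (1u << 7)

/* Hypothetical VM types and VM state; not the kernel's definitions. */
enum toy_vm_type { TOY_DEFAULT_VM, TOY_TDX_VM };

struct toy_vm {
	enum toy_vm_type type;
	uint64_t disabled_quirks;
};

/* Quirks the kernel forces off for a given VM type. */
static uint64_t toy_forced_disabled_quirks(enum toy_vm_type type)
{
	return type == TOY_TDX_VM ? KVM_X86_QUIRK_SLOT_ZAP_ALL : 0;
}

/* At VM creation, seed disabled_quirks from the VM type. */
static void toy_vm_init(struct toy_vm *vm, enum toy_vm_type type)
{
	assert(vm);
	vm->type = type;
	vm->disabled_quirks = toy_forced_disabled_quirks(type);
}

/* Setting interface: a userspace request can never clear forced bits. */
static void toy_set_disabled_quirks(struct toy_vm *vm, uint64_t requested)
{
	vm->disabled_quirks = requested | toy_forced_disabled_quirks(vm->type);
}

/* Querying interface: lets userspace see what is actually in effect. */
static uint64_t toy_get_disabled_quirks(const struct toy_vm *vm)
{
	return vm->disabled_quirks;
}

/* The flush path then needs only the generic quirk check. */
static bool toy_slot_zap_all(const struct toy_vm *vm)
{
	return !(vm->disabled_quirks & KVM_X86_QUIRK_SLOT_ZAP_ALL);
}
```

The trade-off in this model is that the flush path stays free of VM-type
checks, at the cost of new state handling in the quirk setting and
querying paths.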

Patch 1: KVM changes.
Patches 2-5: Selftests updates. Verify memslot move/deletion functionality
with the quirk enabled/disabled.


Yan Zhao (5):
KVM: x86/mmu: Introduce a quirk to control memslot zap behavior
KVM: selftests: Test slot move/delete with slot zap quirk
enabled/disabled
KVM: selftests: Allow slot modification stress test with quirk
disabled
KVM: selftests: Test memslot move in memslot_perf_test with quirk
disabled
KVM: selftests: Test private access to deleted memslot with quirk
disabled

Documentation/virt/kvm/api.rst | 6 ++++
arch/x86/include/asm/kvm_host.h | 3 +-
arch/x86/include/uapi/asm/kvm.h | 1 +
arch/x86/kvm/mmu/mmu.c | 36 ++++++++++++++++++-
.../kvm/memslot_modification_stress_test.c | 19 ++++++++--
.../testing/selftests/kvm/memslot_perf_test.c | 12 ++++++-
.../selftests/kvm/set_memory_region_test.c | 29 ++++++++++-----
.../kvm/x86_64/private_mem_kvm_exits_test.c | 11 ++++--
8 files changed, 102 insertions(+), 15 deletions(-)

base-commit: dd5a440a31fae6e459c0d6271dddd62825505361
--
2.43.2