[PATCH 6.1] KVM: x86/mmu: Fix an sign-extension bug with mmu_seq that hangs vCPUs

From: Sean Christopherson
Date: Wed Aug 23 2023 - 21:02:13 EST

Next message: Vineet Gupta: "Re: [PATCH] arc: Explicitly include correct DT includes"
Previous message: Masahiro Yamada: "Re: [RFC][PATCH] kconfig: introduce listunknownconfig"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

Take the vCPU's mmu_seq snapshot as an "unsigned long" instead of an "int"
when checking to see if a page fault is stale, as the sequence count is
stored as an "unsigned long" everywhere else in KVM. This fixes a bug
where KVM will effectively hang vCPUs due to always thinking page faults
are stale, which results in KVM refusing to "fix" faults.

mmu_invalidate_seq (née mmu_notifier_seq) is a sequence counter used when
KVM is handling page faults to detect if userspace mappings relevant to
the guest were invalidated between snapshotting the counter and acquiring
mmu_lock, i.e. to ensure that the userspace mapping KVM is using to
resolve the page fault is fresh. If KVM sees that the counter has
changed, KVM simply resumes the guest without fixing the fault.

What _should_ happen is that the source of the mmu_notifier invalidations
eventually goes away, mmu_invalidate_seq becomes stable, and KVM can once
again fix guest page fault(s).

But for a long-lived VM and/or a VM that the host just doesn't particularly
like, it's possible for a VM to be on the receiving end of 2 billion (with
a B) mmu_notifier invalidations. When that happens, bit 31 will be set in
mmu_invalidate_seq. This causes the value to be turned into a 32-bit
negative value when implicitly cast to an "int" by is_page_fault_stale(),
and then sign-extended into a 64-bit unsigned when the signed "int" is
implicitly cast back to an "unsigned long" on the call to
mmu_invalidate_retry_hva().

As a result of the casting and sign-extension, given a sequence counter of
e.g. 0x8002dc25, mmu_invalidate_retry_hva() ends up doing

if (0x8002dc25 != 0xffffffff8002dc25)

and signals that the page fault is stale and needs to be retried even
though the sequence counter is stable, and KVM effectively hangs any vCPU
that takes a page fault (EPT violation or #NPF when TDP is enabled).

Note, upstream commit ba6e3fe25543 ("KVM: x86/mmu: Grab mmu_invalidate_seq
in kvm_faultin_pfn()") unknowingly fixed the bug in v6.3 when refactoring
how KVM tracks the sequence counter snapshot.

Reported-by: Brian Rak <brak@xxxxxxxxx>
Reported-by: Amaan Cheval <amaan.cheval@xxxxxxxxx>
Reported-by: Eric Wheeler <kvm@xxxxxxxxxxxxxxxxxx>
Closes: https://lore.kernel.org/all/f023d927-52aa-7e08-2ee5-59a2fbc65953@xxxxxxxxxxxxxxx
Fixes: a955cad84cda ("KVM: x86/mmu: Retry page fault if root is invalidated by memslot update")
Signed-off-by: Sean Christopherson <seanjc@xxxxxxxxxx>
---
arch/x86/kvm/mmu/mmu.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 230108a90cf3..beca03556379 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4212,7 +4212,8 @@ static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
* root was invalidated by a memslot update or a relevant mmu_notifier fired.
*/
static bool is_page_fault_stale(struct kvm_vcpu *vcpu,
- struct kvm_page_fault *fault, int mmu_seq)
+ struct kvm_page_fault *fault,
+ unsigned long mmu_seq)
{
struct kvm_mmu_page *sp = to_shadow_page(vcpu->arch.mmu->root.hpa);

base-commit: 802aacbbffe2512dce9f8f33ad99d01cfec435de
--
2.42.0.rc2.253.gd59a3bf2b4-goog

Next message: Vineet Gupta: "Re: [PATCH] arc: Explicitly include correct DT includes"
Previous message: Masahiro Yamada: "Re: [RFC][PATCH] kconfig: introduce listunknownconfig"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]