Re: [PATCH] KVM: x86/mmu: fix potential races when walking host page table

From: Paolo Bonzini
Date: Fri Apr 29 2022 - 05:10:49 EST


On 4/29/22 05:17, Mingwei Zhang wrote:
@@ -2838,11 +2836,7 @@ static int host_pfn_mapping_level(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn,
*/
hva = __gfn_to_hva_memslot(slot, gfn);
- pte = lookup_address_in_mm(kvm->mm, hva, &level);
- if (unlikely(!pte))
- return PG_LEVEL_4K;
-
- return level;
+ return kvm_lookup_address_level_in_mm(kvm, hva);
}

The function can be just inlined in host_pfn_mapping_level.

int kvm_mmu_max_mapping_level(struct kvm *kvm,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 951d0a78ccdae..61406efe4ea7f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -13044,6 +13044,76 @@ int kvm_sev_es_string_io(struct kvm_vcpu *vcpu, unsigned int size,
}
EXPORT_SYMBOL_GPL(kvm_sev_es_string_io);
+/*
+ * Lookup the valid mapping level for a virtual address in the current mm.
+ * Return the level of the mapping if there is present one. Otherwise, always
+ * return PG_LEVEL_NONE.

This is a change in semantics, because host_pfn_mapping_level never returned PG_LEVEL_NONE. Returning PG_LEVEL_4K for a non-present entry is safe; if it happens, MMU notifiers will force a retry. If the function is inlined in host_pfn_mapping_level, returning PG_LEVEL_4K would allow making the semantic change in a separate patch.

In fact, kvm_mmu_hugepage_adjust will go on and set fault->req_level and fault->goal_level to PG_LEVEL_NONE, which is wrong even if it does not cause havoc.

+ * Note: the information retrieved may be stale. Use it with causion.

The comment should point out that mmu_notifier_retry make it safe to use the stale value---of course this is only true if kvm_lookup_address_level_in_mm is used where mmu_notifier_retry is used, and might be another point in favor of inlining.

+ ptep = pte_offset_map(&pmd, address);
+ pte = ptep_get(ptep);
+ if (pte_present(pte)) {
+ pte_unmap(ptep);
+ level = PG_LEVEL_4K;
+ goto out;
+ }
+ pte_unmap(ptep);

Not needed as long as PG_LEVEL_4K is returned for a non-present PTE.

+out:
+ local_irq_restore(flags);
+ return level;
+}
+EXPORT_SYMBOL_GPL(kvm_lookup_address_level_in_mm);

Exporting is not needed.

Thanks for writing the walk code though. I'll adapt it and integrate the patch.

Paolo

EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_entry);
EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_exit);
EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_fast_mmio);
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 588792f003345..f1cdcc8483bd0 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -454,4 +454,6 @@ int kvm_sev_es_string_io(struct kvm_vcpu *vcpu, unsigned int size,
unsigned int port, void *data, unsigned int count,
int in);
+int kvm_lookup_address_level_in_mm(struct kvm *kvm, unsigned long address);
+
#endif

base-commit: 2a39d8b39bffdaf1a4223d0d22f07baee154c8f3